Hello, this is the official website for EBenchAttacker!
EBenchAttacker is a simple and easy-to-use tool for evaluating the alignment of large language models (LLMs). You can easily integrate various open-source or commercial models into EBenchAttacker, as long as you have the relevant access and computing resources. Development of the tool is supported by the Beijing Advanced Innovation Center for Big Data and Brain Computing.
✨ Update 2024.5.4: We have added experimental results for the LLaMA-3-8B-Instruct model!
EBenchAttacker has the following design components:
Scenarios
EBenchAttacker covers scenarios including cybercrime, fraud, political sensitivity, and more. The scenario design draws on the publicly available usage policies of providers such as OpenAI and Meta.
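As a rough illustration, the scenario taxonomy could be represented as a simple enum. The sketch below is a hypothetical encoding, not EBenchAttacker's actual code, and lists only the three categories named above (EBench defines 10 in total):

```python
from enum import Enum

class Scenario(Enum):
    """Hypothetical scenario labels; only the categories named on this
    page are shown -- the full EBench set contains 10 scenarios."""
    CYBERCRIME = "cybercrime"
    FRAUD = "fraud"
    POLITICAL_SENSITIVITY = "political_sensitivity"
    # ...the remaining scenarios would be defined analogously
```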
EBench - Harmful Question Set
EBench is our specialized dataset designed for EBenchAttacker. It contains 1,000 harmful questions spanning 10 scenarios, with each question written in eight languages.
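To make the structure concrete, one plausible record layout is sketched below. The field names and JSON-lines format are our assumptions for illustration, not the actual EBench schema:

```python
import json
from dataclasses import dataclass

@dataclass
class EBenchItem:
    """One harmful question; field names are illustrative."""
    question_id: int
    scenario: str                 # one of the 10 scenario labels
    translations: dict[str, str]  # language code -> question text (8 languages)

def load_ebench(path: str) -> list[EBenchItem]:
    """Load EBench records from a hypothetical JSON-lines file."""
    items = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            items.append(EBenchItem(**json.loads(line)))
    return items
```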
Target Models
EBenchAttacker integrates four open-source models and three commercial models, and it can be easily extended.
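New models can typically be plugged in behind a thin wrapper. The sketch below shows what such an interface could look like; the class names and signatures are our assumptions rather than EBenchAttacker's actual API, and the commercial-model example uses the official openai Python client:

```python
from abc import ABC, abstractmethod

class TargetModel(ABC):
    """Hypothetical wrapper interface a new target model would implement."""

    @abstractmethod
    def generate(self, prompt: str) -> str:
        """Return the model's response to a single prompt."""

class OpenAIChatModel(TargetModel):
    """Sketch of a commercial-model wrapper (requires the `openai` package)."""

    def __init__(self, model: str, api_key: str):
        from openai import OpenAI
        self.client = OpenAI(api_key=api_key)
        self.model = model

    def generate(self, prompt: str) -> str:
        resp = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content
```

An open-source model would implement the same `generate` method, e.g. on top of a locally hosted Hugging Face model.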
Attack Methods
EBenchAttacker integrates attack methods such as Default Attack, GCG, and PAIR to attack LLMs and collect experimental data.
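As an illustration, each attack method can be viewed as a way of turning a harmful question into the prompt actually sent to the target, plus a record of the outcome. The interface and field names below are our assumptions, not EBenchAttacker's real code:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class AttackResult:
    """Data collected for one attacked question; fields are illustrative."""
    question_id: int
    prompt: str       # final (possibly adversarial) prompt sent to the model
    response: str     # model output
    jailbroken: bool  # whether the model complied with the harmful request

class AttackMethod(ABC):
    """Hypothetical interface shared by Default Attack, GCG, PAIR, etc."""

    @abstractmethod
    def build_prompt(self, question: str) -> str:
        """Turn a harmful question into the attack's actual prompt."""

class DefaultAttack(AttackMethod):
    """Baseline: send the harmful question verbatim, with no manipulation."""

    def build_prompt(self, question: str) -> str:
        return question
```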
Analysis
After running attacks with EBenchAttacker, we analyze the experimental results and present them.
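For example, one basic analysis step is computing the attack success rate (ASR) per scenario from the collected records. This minimal sketch assumes the hypothetical AttackResult records from the previous example, paired with their scenario labels:

```python
from collections import defaultdict

def attack_success_rate(results):
    """Compute per-scenario ASR from (scenario, AttackResult) pairs.
    A minimal sketch; the actual analysis pipeline may differ."""
    hits, totals = defaultdict(int), defaultdict(int)
    for scenario, result in results:
        totals[scenario] += 1
        hits[scenario] += int(result.jailbroken)
    return {s: hits[s] / totals[s] for s in totals}
```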
By using EBenchAttacker to analyze LLMs, we can gain a more comprehensive understanding of their security. This makes it possible to refine a model in ways that improve its safety and compliance. We also hope our tool can inspire future research.