EBenchAttacker

Welcome to the EBenchAttacker Leaderboard 😄

Here we show the results of EBenchAttacker attacks against different LLMs. We tested both open-source and commercial LLMs and calculated the ASR (Attack Success Rate, %) for each. Note that some attacks may not work directly on commercial LLMs, so we may apply a "Transfer Attack" to those models instead. The results here use the "EBench-small" dataset. You can run a more comprehensive experiment on the larger datasets we provide.
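As a rough illustration of how an ASR figure like those in the table can be computed, here is a minimal sketch. It assumes ASR is the share of attack attempts judged successful, expressed as a percentage. The function name `attack_success_rate` and the sample outcomes are hypothetical and are not taken from the EBenchAttacker code; how each attempt is judged successful is also outside the scope of this sketch.

```python
from typing import Sequence


def attack_success_rate(outcomes: Sequence[bool]) -> float:
    """Return ASR as a percentage: successful attempts / total attempts * 100."""
    if not outcomes:
        raise ValueError("outcomes must contain at least one attack attempt")
    successes = sum(1 for ok in outcomes if ok)
    return 100.0 * successes / len(outcomes)


# Hypothetical judgments for 10 attack prompts (True = attack judged successful).
example_outcomes = [True, False, False, True, False, False, False, True, False, False]
print(f"ASR: {attack_success_rate(example_outcomes):.1f}%")  # prints "ASR: 30.0%"
```

With a dataset of 100 prompts, each successful attack would move the ASR by one percentage point under this definition.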

Model Name	Publisher	Default Attack	Multilingual	GCG	GCG Transfer	PAIR	GPTFuzz	AutoDAN
Baichuan2-7B-Chat	Baichuan-inc	34.0%	29.0%	100.0%	23.0%	37.0%	100.0%	85.0%
ChatGLM3-6B	THUDM	27.0%	28.8%	—	28.3%	31.0%	100.0%	34.0%
Gemma-2B	Google	46.0%	31.2%	87.0%	21.0%	39.0%	84.0%	31.0%
LLaMA-2-7B-chat-hf	Meta	36.0%	20.4%	35.0%	28.5%	46.0%	18.0%	3.0%
LLaMA-3-8B-Instruct	Meta	9.0%	33.7%	—	4.3%	45.0%	97.0%	6.0%
GPT-3.5-Turbo-0125	OpenAI	27.0%	38.6%	—	31.7%	37.0%	92.0%	—
GPT-4	OpenAI	14.0%	32.0%	—	13.7%	27.0%	92.0%	—
Claude-Instant-1.2	Anthropic	4.0%	11.0%	—	—	—	—	—

In addition, we have included several radar charts below to make it easier to compare the models' alignment capabilities. If you use the data provided here, please attribute the source.

ASR of attacks in different scenarios - Default Attack (English)

©️ Enqurance · 2024

Beihang University, School of Computer Science and Engineering (SCSE)

Beijing Advanced Innovation Center for Big Data and Brain Computing
