Welcome to the EBenchAttacker Leaderboard 😄
Here we show the results of EBenchAttacker attacks on different LLMs. We tested both open-source and commercial LLMs and calculated the ASR (Attack Success Rate, %) for each. Note that some attacks may not work on commercial LLMs, so we may apply a "Transfer Attack" to these models instead. The results here use the "EBench-small" dataset. You may conduct a more comprehensive experiment on the larger datasets we provide.
Model Name           Publisher     Default Attack  Multilingual  GCG     GCG Transfer  PAIR   GPTFuzz  AutoDAN
Baichuan2-7B-Chat    Baichuan-inc  34.0%           29.0%         100.0%  23.0%         37.0%  100.0%   85.0%
ChatGLM3-6B          THUDM         27.0%           28.8%         -       28.3%         31.0%  100.0%   34.0%
Gemma-2B             Google        46.0%           31.2%         87.0%   21.0%         39.0%  84.0%    31.0%
LLaMA-2-7B-chat-hf   Meta          36.0%           20.4%         35.0%   28.5%         46.0%  18.0%    3.0%
LLaMA-3-8B-Instruct  Meta          9.0%            33.7%         -       4.3%          45.0%  97.0%    6.0%
GPT-3.5-Turbo-0125   OpenAI        27.0%           38.6%         -       31.7%         37.0%  92.0%    -
GPT-4                OpenAI        14.0%           32.0%         -       13.7%         27.0%  92.0%    -
Claude-Instant-1.2   Anthropic     4.0%            11.0%         -       -             -      -        -
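As a rough illustration of what an ASR percentage means, the sketch below computes it as the share of attack prompts judged successful. This is a minimal sketch and not EBenchAttacker's own code: the function name `attack_success_rate`, the boolean outcome list, and the 100-prompt example run are all assumptions made for illustration, and how a single attack is judged successful is left outside the sketch.

```python
def attack_success_rate(outcomes):
    """Return the ASR as a percentage, rounded to one decimal place.

    `outcomes` holds one boolean per attack prompt: True if the attack was
    judged successful, False otherwise. The judging step itself is assumed
    to happen elsewhere.
    """
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("need at least one attack outcome")
    # Booleans sum as 0/1, so sum() counts the successful attacks.
    return round(100.0 * sum(outcomes) / len(outcomes), 1)


# Hypothetical run: 34 successful attacks out of 100 prompts.
example_outcomes = [True] * 34 + [False] * 66
example_asr = attack_success_rate(example_outcomes)
print(f"ASR: {example_asr:.1f}%")  # prints "ASR: 34.0%"
```

A higher ASR means the attack bypassed the model's alignment more often, so lower values in a model's row indicate stronger resistance to that attack.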
In addition, we have included several radar charts below to make it easier to compare the models' alignment capabilities. When using the provided data, please attribute the source.
  • Result 1
ASR of attacks in different scenarios - Default Attack (English)