Causal reasoning is no simple task for an LLM, especially for smaller open-source LLMs. Today: a special test of 7B to 70B models in causal reasoning.
I define a testing scenario and record all online results from my top 7 LLMs from Hugging Face's LLM Leaderboard.
Thanks to LMSYS.org for providing the Chatbot Arena to the AI community for free. Visit: https://chat.lmsys.org/
All my top LLM candidates are available on Hugging Face:
pplx-70b-online
openhermes-2.5-mistral-7b
Yi-34B-Chat
Claude 2.1
qwen-14b-chat
zephyr-7b-beta
GPT-4 Turbo
00:00 HF Open Source LLM
05:48 My Instruction Prompt
10:36 PPLX-70B-Chat_LLama2_70B
11:22 OpenHermes-2.5-Mistral-7B
13:25 Yi-34B-Chat
17:52 Claude 2.1
18:46 Qwen-14B-Chat
20:01 Zephyr-7B-Beta
22:57 GPT-4 Turbo
#test
#benchmark
#ai
#reasoning