Mistral Large vs GPT-4 - Practical Benchmarking!

Published: 28 February 2024
on the channel: Trelis Research

➡️ One-click Fine-tuning & Inference Templates: https://github.com/TrelisResearch/one...
➡️ Trelis Function-calling Models (incl. OpenChat 3.5): https://trelis.com/function-calling/
➡️ ADVANCED-fine-tuning Repo: https://trelis.com/advanced-fine-tuni...
➡️ ADVANCED-inference Repo: https://trelis.com/enterprise-server-...
➡️ Trelis Newsletter: https://Trelis.Substack.com
➡️ Tip Jar and Discord: https://ko-fi.com/trelisresearch

Affiliate Link (supports the channel):
RunPod - https://tinyurl.com/4b6ecbbn

VIDEO RESOURCES:
-Carlini Website: https://nicholas.carlini.com/
-Carlini Github: https://github.com/carlini/yet-anothe...
-Trelis fork w/ Custom LLM: https://github.com/TrelisResearch/yet...

TIMESTAMPS:
0:00 A practitioner's guide to evaluating LLMs
00:45 Nicholas Carlini's LLM Benchmark Blog Post
1:15 Benchmarking results of GPT-4 vs Claude vs Gemini vs Mistral
4:16 Mistral Large vs Mixtral vs OpenChat vs Qwen
11:32 Running custom evaluations using Runpod
26:26 Final Thoughts