➡️ ADVANCED-inference Repo (incl. the context caching scripts from this video): https://trelis.com/ADVANCED-inference/
➡️ Runpod Affiliate Link: https://runpod.io?ref=jmfkcdio
➡️ One-click GPU templates: https://github.com/TrelisResearch/one...
OTHER TRELIS LINKS:
➡️ Trelis Newsletter: https://blog.Trelis.com
➡️ Trelis Resources and Support: https://Trelis.com/About
VIDEO LINKS:
Slides: https://docs.google.com/presentation/...
TIMESTAMPS:
0:00 Introduction to context caching for LLMs
1:06 Video Overview
3:24 How does context caching work?
13:20 Two types of caching
17:42 Context caching with Claude and Google Gemini
19:48 Context caching with Claude
24:13 Context caching with Gemini Flash or Gemini Pro
27:45 Context caching with SGLang (also works with vLLM)
32:00 Cost Comparison
34:56 Video Resources