➡️ ADVANCED-inference Repo (incl. the context caching scripts from this video): https://trelis.com/ADVANCED-inference/
➡️ Runpod Affiliate Link: https://runpod.io?ref=jmfkcdio
➡️ One-click GPU templates: https://github.com/TrelisResearch/one...
OTHER TRELIS LINKS:
➡️ Trelis Newsletter: https://blog.Trelis.com
➡️ Trelis Resources and Support: https://Trelis.com/About
VIDEO LINKS:
Slides: https://docs.google.com/presentation/...
TIMESTAMPS:
0:00 Introduction to context caching for LLMs
1:06 Video Overview
3:24 How does context caching work?
13:20 Two types of caching
17:42 Context caching with Claude and Google Gemini
19:48 Context caching with Claude
24:13 Context caching with Gemini Flash or Gemini Pro
27:45 Context caching with SGLang (also works with vLLM)
32:00 Cost Comparison
34:56 Video Resources