How to Serve a Text to Speech Model with vLLM

Published: 25 June 2025
on channel: Trelis Research

📜Get repo access at Trelis.com/ADVANCED-audio

Tip: If you subscribe here on YouTube, click the bell to be notified of new videos

🛠️ (NEW) Trelis Fine-tuning Workshops: https://trelis.com/workshops-and-semi...

💡 Need Technical or Market Assistance?
Book a Consult Here: https://forms.gle/wJXVZXwioKMktjyVA

🤝 Are You a Top Developer?
Work for Trelis: https://trelis.com/jobs/

💸 Starting a New Project/Venture?
Apply for a Trelis Grant: https://trelis.com/trelis-ai-grants/

📧 Get Trelis AI Tutorials by Email
Subscribe on Substack: https://trelis.substack.com

Video Links:
one-click-llms repo: https://github.com/TrelisResearch/one...

TIMESTAMPS:
0:00 Serving Orpheus Text-to-Speech model with continuous batching
0:44 Setup Demo with a one-click template from Runpod
4:12 Running inference on a fine-tuned model (poor quality, maybe don’t use fp8, and tune more)
5:25 Inference on the default orpheus model, “tara”
7:37 How vLLM works with Orpheus and how to decode audio tokens
12:38 Conclusion and Resources
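The decoding step covered at 7:37 works because Orpheus-style TTS models emit audio as special text tokens that must be mapped back to codec codes before synthesis. Below is a minimal, hedged sketch of that post-processing: the token pattern (`<custom_token_N>`), the offset value, and the codes-per-frame count are assumptions for illustration only; check the model card and repo for the real values.

```python
import re

# Hedged sketch: an Orpheus-style model served by vLLM returns audio as
# special text tokens like "<custom_token_123>". To feed a SNAC-style codec,
# those IDs are shifted by an offset and grouped into fixed-size frames.
# Both constants below are placeholders, NOT the model's real values.

AUDIO_TOKEN_OFFSET = 10  # placeholder; the actual model defines its own offset
CODES_PER_FRAME = 7      # assumption: SNAC-style codecs take 7 codes per frame

def extract_audio_codes(text: str) -> list[list[int]]:
    """Pull <custom_token_N> IDs out of generated text and group into frames."""
    ids = [int(m) for m in re.findall(r"<custom_token_(\d+)>", text)]
    codes = [i - AUDIO_TOKEN_OFFSET for i in ids]
    # Drop any trailing partial frame so the codec only sees complete frames.
    usable = len(codes) - (len(codes) % CODES_PER_FRAME)
    return [codes[i:i + CODES_PER_FRAME] for i in range(0, usable, CODES_PER_FRAME)]
```

For example, 15 generated tokens would yield two complete 7-code frames, with the final stray token discarded; the resulting frames would then go to the audio decoder to produce a waveform.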