XTTS V2: Real-Time TTS Streaming
Learn to stream audio from a text-to-speech model in real time for conversational voice agents, focusing on code implementation and production use cases.
When you chat with an LLM, the model starts responding as soon as the first characters of output are ready rather than making you wait for it to write the entire reply. You can do the same with text-to-speech models. Streaming audio output in real time, with a time to first chunk of roughly 200 ms, opens up many use cases across conversational user interfaces, because playback can begin while the rest of the reply is still being synthesized.
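To make "time to first chunk" concrete, here is a minimal sketch of the streaming pattern in Python. It does not call XTTS V2: `fake_tts_stream` is a hypothetical stand-in that sleeps briefly and yields silent bytes, and `consume` is an illustrative helper, so the timings it prints say nothing about real model latency. The point is only that a generator lets the caller start handling audio before synthesis has finished.

```python
import time
from typing import Iterator, List, Tuple


def fake_tts_stream(text: str, chunk_words: int = 3, delay_s: float = 0.01) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS model (not the XTTS V2 API).

    Yields one chunk of fake audio for every `chunk_words` words of input.
    """
    words = text.split()
    for i in range(0, len(words), chunk_words):
        time.sleep(delay_s)  # simulated synthesis time for this chunk
        piece = " ".join(words[i:i + chunk_words])
        # Two zero bytes per character, standing in for 16-bit PCM samples.
        yield b"\x00\x00" * len(piece)


def consume(stream: Iterator[bytes]) -> Tuple[float, float, List[bytes]]:
    """Drain a chunk stream and return (time_to_first_chunk, total_time, chunks)."""
    start = time.perf_counter()
    first = None
    chunks: List[bytes] = []
    for chunk in stream:
        if first is None:
            first = time.perf_counter() - start
        chunks.append(chunk)  # a real client would pass this to an audio player
    total = time.perf_counter() - start
    return (first if first is not None else total), total, chunks


if __name__ == "__main__":
    text = "streaming lets playback begin before synthesis has finished"
    ttfc, total, chunks = consume(fake_tts_stream(text))
    print(f"chunks={len(chunks)} first={ttfc * 1000:.0f}ms total={total * 1000:.0f}ms")
```

With a real model in place of the stand-in, the gap between the first-chunk time and the total time is the wait that streaming removes for the listener.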
In this demo, I’ll walk through the code for implementing a streaming endpoint for XTTS V2 and calling the endpoint in production. We’ll use the model to generate real-time speech from audience suggestions.
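As a rough illustration of the client side, the sketch below collects streamed audio chunks into a playable WAV file using only the Python standard library. The wire format is an assumption, not something stated in the talk: it presumes the endpoint emits raw 16-bit mono PCM at 24 kHz, and `pcm_chunks_to_wav` is a hypothetical helper name. In practice the chunks might come from something like `requests.get(url, stream=True).iter_content(chunk_size=4096)`; here they are any iterable of bytes.

```python
import io
import wave
from typing import Iterable

SAMPLE_RATE = 24000  # assumed output rate; verify against the actual endpoint


def pcm_chunks_to_wav(chunks: Iterable[bytes], sample_rate: int = SAMPLE_RATE) -> bytes:
    """Wrap streamed raw PCM chunks (assumed 16-bit mono) in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)      # mono (assumption)
        wav.setsampwidth(2)      # 2 bytes per sample = 16-bit PCM (assumption)
        wav.setframerate(sample_rate)
        for chunk in chunks:
            wav.writeframes(chunk)  # append each chunk as it arrives
    return buf.getvalue()


if __name__ == "__main__":
    # Two fake chunks of silence stand in for a streamed response body.
    wav_bytes = pcm_chunks_to_wav([b"\x00\x00" * 12000, b"\x00\x00" * 12000])
    with open("reply.wav", "wb") as f:
        f.write(wav_bytes)
```

Saving to a file is the simplest way to check the output. A conversational agent would instead feed each chunk to an audio device as it arrives so that playback overlaps with synthesis.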