Testing out some speech-to-text (STT) and text-to-speech (TTS) from Nvidia's Riva as an option for a current project. The real-time speech transcription runs pretty well. I won't be using the text-to-speech, but figured I'd have a bit of fun with it anyway. Not entirely sure why the audio from my mic is so bad, but whatever.
In this setup, the Riva models are running in a Docker container on my ML dev box, and the client is a Python app I threw together from their samples, running on my Windows machine. The real thing will stream audio from Unreal Engine on Windows over a FastAPI WebSocket connection. FUN!
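For anyone curious about the streaming side: before audio goes over a WebSocket to a transcription service, it usually gets split into small, fixed-duration chunks. Here's a minimal sketch of that step. It assumes raw 16 kHz, 16-bit, mono PCM and 100 ms chunks. Those numbers are my assumptions, not Riva requirements, so check what your ASR model actually expects. `chunk_pcm` is a hypothetical helper, not part of the Riva or FastAPI APIs.

```python
from typing import Iterator


def chunk_pcm(audio: bytes, sample_rate: int = 16000, sample_width: int = 2,
              channels: int = 1, chunk_ms: int = 100) -> Iterator[bytes]:
    """Split raw PCM bytes into chunks of roughly chunk_ms milliseconds each.

    Assumed format: interleaved PCM with sample_width bytes per sample.
    The final chunk may be shorter than the rest.
    """
    bytes_per_frame = sample_width * channels
    frames_per_chunk = sample_rate * chunk_ms // 1000
    chunk_size = frames_per_chunk * bytes_per_frame

    # Drop any trailing partial frame so every chunk holds whole samples.
    usable = len(audio) - len(audio) % bytes_per_frame

    for start in range(0, usable, chunk_size):
        yield audio[start:min(start + chunk_size, usable)]


if __name__ == "__main__":
    one_second = bytes(16000 * 2)  # 1 s of silence at 16 kHz, 16-bit mono
    chunks = list(chunk_pcm(one_second))
    print(len(chunks), len(chunks[0]))
```

Each chunk would then be sent as one binary WebSocket message. Keeping chunks short (around 100 ms here) keeps latency low without flooding the connection with tiny messages.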