May 27, 2026
KugelAudio Launches Real-Time Text-to-Speech Model with Self-Hosting Option

KugelAudio, a Berlin-based startup, launched a real-time text-to-speech model today. It offers self-hosted and API deployment, sub-60 ms latency, voice cloning, and grammar-aware normalization across more than 25 languages.
Quick Facts
Who: KugelAudio (four-person team)
What: Launched real-time text-to-speech model
When: 2026-05-27
Where: Berlin
- Launched real-time text-to-speech model
- Offers voice cloning capability
- Provides self-hosting and API deployment
- Includes grammar-aware normalization
- Supports word-level timestamps and IPA
KugelAudio, a new text-to-speech platform developed by a four-person team based in Berlin, launched today on Product Hunt. The service offers real-time TTS with voice cloning and sub-60-millisecond latency, and can be deployed either on-premises or via API.
The platform's main differentiator is grammar-aware normalization, which reads phone numbers, IBANs, addresses, and medication names naturally across more than 25 languages. Additional features include word-level timestamps, IPA (International Phonetic Alphabet) support, and pre-built integrations with voice infrastructure platforms including LiveKit, Pipecat, and Vapi. Users can also customize pronunciations through dictionaries, which helps with mixed-language text such as German sentences containing English product names.
The developers say European languages are currently the primary focus, and the team is actively gathering diverse voices and accents. Although the platform lists other languages such as Hindi, stable support for them is still under development. The service also includes a multi-context endpoint compatible with the ElevenLabs SDK, which is used in particular by the LiveKit integration.
Why This Matters
KugelAudio gives developers a low-latency, self-hosted alternative to cloud-dependent TTS services. The sub-60 ms latency and grammar-aware processing suit real-time conversational AI and accessibility applications, while the self-hosting option gives enterprises data privacy and cost control, which is particularly valuable for organizations handling sensitive communications across multiple languages.
Timeline & Sources
May 27, 2026
Wire: KugelAudio launches on Product Hunt