French AI company Mistral on Thursday released a new open-source text-to-speech model aimed at enterprise use cases such as voice AI assistants and customer support. The model lets businesses build voice agents for sales and customer engagement, putting Mistral in direct competition with the likes of ElevenLabs, Deepgram, and OpenAI.
The new model, called Voxtral TTS, supports nine languages: English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi and Arabic.
“Our customers have asked for a voice model, so we built a small voice model that fits into smartwatches, smartphones, laptops, and other edge devices. It costs a fraction of other products on the market, but provides cutting-edge performance,” Pierre Stock, vice president of science operations at Mistral AI, told TechCrunch in a phone interview.

Mistral said the new model can adapt to custom voices from samples of less than five seconds and can capture features such as subtle accents, intonation and irregularities in the flow of speech. Built on Ministral 3B, the model can switch between languages without losing the characteristics of the voice, making it useful for use cases such as dubbing and real-time translation. Stock said the company wanted the model to sound like a human, not a robot.
The company says the model is built with real-time performance in mind. Time to first audio (TTFA), a measure of how long the model takes to "start speaking" after receiving input, is 90 ms for a 10-second, 500-character sample. The model also has a real-time factor (RTF) of 6x, meaning it generates audio about six times faster than playback speed, so a 10-second clip can be rendered in approximately 1.6 seconds.
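The real-time figure is simple to check. The sketch below assumes that "6x" means audio is generated six times faster than it plays back. The function name `render_seconds` is purely illustrative and is not part of any Mistral API.

```python
def render_seconds(audio_seconds: float, speedup: float) -> float:
    """Compute time needed to render a clip, given a real-time speedup.

    Assumption: a "6x real-time factor" means the model produces
    audio six times faster than playback speed.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


# A 10-second clip at a 6x speedup
print(f"{render_seconds(10, 6):.2f} s")  # prints "1.67 s"
```

At 6x, a 10-second clip works out to about 1.67 seconds of compute, in line with the roughly 1.6 seconds the company cites.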

Earlier this year, Mistral announced two transcription models: one for large-scale batch processing and one for low-latency, real-time use cases. With the new voice model, the company appears to be aiming to offer businesses a complete suite of voice products.
“We are also planning an end-to-end platform that can process multimodal input and output streams such as audio, text, and images. The main benefit is that we can get more information in an end-to-end agent system that supports audio as input or output,” Stock said.
Mistral’s pitch is that its open-source, customizable approach will help businesses adopt voice models more readily than competitors’ offerings, because they can adapt the model however they want.
