Mistral releases new open source model for speech generation

French AI company Mistral on Thursday released a new open-source text-to-speech model aimed at enterprise use cases such as voice AI assistants and customer support. The model lets businesses build voice agents for sales and customer engagement, putting Mistral in direct competition with the likes of ElevenLabs, Deepgram, and OpenAI.

The new model, called Voxtral TTS, supports nine languages: English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi, and Arabic.

“Our customers have asked for a voice model, so we built a small voice model that fits into smartwatches, smartphones, laptops, and other edge devices. It costs a fraction of other products on the market, but provides cutting-edge performance,” Pierre Stock, vice president of science operations at Mistral AI, told TechCrunch in a phone interview.

Mistral said the new model can adapt to a custom voice from a sample of less than five seconds and can also capture features such as subtle accents, intonation, and irregularities in the flow of speech. Built on Ministral 3B, the model can switch between languages without losing the characteristics of the voice, making it useful for use cases such as dubbing and real-time translation. Stock said the company wanted the model to sound like a human, not a robot.

The company says the model is built with real-time performance in mind. Time to first audio (TTFA), a measure of how long the model takes to “start speaking” after receiving input, is 90 ms for a 10-second sample of 500 characters. The model also has a 6x real-time factor (RTF), meaning a 10-second clip can be rendered in approximately 1.6 seconds.
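The arithmetic behind the real-time factor can be checked with a short sketch. It assumes, as the article does, that "6x" means audio is generated six times faster than it plays back; the function name `render_time_seconds` is hypothetical and is not part of any Mistral API. Note that RTF is often defined the other way around, as processing time divided by audio duration, in which case the same figure would be written as roughly 0.17.

```python
def render_time_seconds(audio_seconds: float, speedup: float) -> float:
    """Seconds needed to generate a clip when synthesis runs
    `speedup` times faster than playback (assumed meaning of "6x")."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


if __name__ == "__main__":
    # Figures quoted in the article: a 10-second clip at a 6x factor.
    seconds = render_time_seconds(10.0, 6.0)
    print(f"{seconds:.2f} s to render a 10 s clip")  # prints "1.67 s to render a 10 s clip"
```

The result, about 1.67 seconds, is consistent with the roughly 1.6 seconds the company cites.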

Earlier this year, Mistral announced two transcription models: one for large-scale batch processing and one for low-latency, real-time use cases. With the new voice model, the company appears to be aiming to offer businesses a complete suite of voice products.

“We are also planning an end-to-end platform that can handle multimodal input and output streams such as audio, text, and images. The main benefit is that we can get more information in an end-to-end agent system that supports audio as input or output,” Stock said.

Mistral is betting that being open source and customizable will give it an edge over competitors, since businesses can adapt the model however they want.
