Enterprise AI company Cohere announced its first voice model on Thursday. Transcribe is an open source automatic speech recognition model that can be used for tasks such as note-taking and speech analysis.
The model is relatively lightweight at 2 billion parameters and is intended for users who want to self-host it on consumer-grade GPUs. It currently supports 14 languages: English, French, German, Italian, Spanish, Portuguese, Greek, Dutch, Polish, Chinese, Japanese, Korean, Vietnamese, and Arabic.
According to Cohere, Transcribe outperformed models such as Zoom Scribe v1, IBM Granite 4.0 1B, ElevenLabs Scribe v2, and Qwen3-ASR-1.7B Speech on the Hugging Face Open ASR leaderboard, achieving an average word error rate (WER) of 5.42%, lower than any other model in the benchmark.
According to the company, when human raters evaluated transcription accuracy, consistency, and ease of use, Transcribe had an average win rate of 61% against other models. However, it lagged behind its rivals when transcribing Portuguese, German, and Spanish.
Cohere says Transcribe can process 525 minutes of audio per minute, which is high for a model in its class.
The company plans to integrate Transcribe into North, its enterprise agent orchestration platform, and is making the model available for free through its API. The model will also be available in Model Vault, Cohere’s managed inference platform.
Speech recognition models are becoming more popular as demand for note-taking and dictation apps such as Granola and Wispr Flow increases.
Earlier this year, Cohere reportedly told investors it expected to have annual recurring revenue of $240 million in 2025, and CEO Aidan Gomez reportedly said the startup could go public “soon.”
