ElevenLabs has released Scribe V2, a new speech-to-text model that combines batch and real-time transcription with ultra-low latency, entity detection, speaker diarization, and significantly reduced word error rates across dozens of languages.

As a blogger who is always recording audio to transcribe for my AI tool reviews, I was excited to see ElevenLabs release Scribe V2.
This hybrid of batch and real-time speech-to-text is exciting because it should speed up the workflow of creators like me who need consistently high-quality transcription without the complexity.
Released on January 9, 2026, Scribe V2 builds on the high-fidelity AI audio ElevenLabs is known for, adding what the company calls its most accurate transcription yet and a suite of features that covers use cases from podcasts all the way up to enterprise meetings.
> Today we’re introducing Scribe v2: the most accurate transcription model ever released.
>
> While Scribe v2 Realtime is optimized for ultra low latency and agents use cases, Scribe v2 is built for batch transcription, subtitling, and captioning at scale.
>
> ElevenLabs (@elevenlabsio), January 9, 2026
Standout Features and Enhancements
Scribe V2 comes in two modes: batch (Scribe v2) for in-depth transcription and real-time (Scribe v2 Realtime) for lightning-fast results. Key highlights include:
- Ultra-Low Latency: Real-time mode returns transcription results in under 150 ms and predicts upcoming words (so-called "negative latency"), which makes it ideal for live captioning or voice agents.
- Advanced Detection: Entity recognition across 56 categories (such as PII and health data) with timestamps, plus keyterm prompting with up to 100 custom terms to handle domain-specific jargon.
- Speaker Diarization: Distinguishes up to 48 speakers, with optional punctuation, which is ideal for interviews or group recordings.
- Audio Tagging: Identifies non-speech elements like laughter or pauses for richer transcripts.
- Compliance and Scale: Enterprise-ready with GDPR, HIPAA, SOC 2, and zero-retention options; handles files up to 10 hours and 3GB.
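Diarized output is most useful once it reads like an interview transcript. The sketch below groups consecutive tokens by speaker into turns and keeps audio tags such as laughter inline. It assumes a response shaped like ElevenLabs' Scribe output, a `words` list whose items carry `text`, `start`, `end`, `type` (`"word"`, `"spacing"` or `"audio_event"`) and `speaker_id`; treat those field names as an assumption and check the current API reference. The `Turn` class and `group_turns` helper are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    start: float
    end: float
    text: str


def group_turns(words: list[dict]) -> list[Turn]:
    """Merge consecutive tokens from the same speaker into turns.

    Assumes (unverified) that each token has 'text', 'start', 'end',
    'type' and 'speaker_id' keys, similar to Scribe's word-level output.
    """
    turns: list[Turn] = []
    for w in words:
        speaker = w.get("speaker_id") or "unknown"
        text = w["text"]
        if w.get("type") == "audio_event":
            # Render non-speech tags like "(laughter)" as "[laughter]".
            text = f" [{text.strip('()[] ')}] "
        if turns and turns[-1].speaker == speaker:
            turns[-1].text += text
            turns[-1].end = w["end"]
        elif w.get("type") != "spacing":
            # A new speaker starts a new turn; leading whitespace is dropped.
            turns.append(Turn(speaker, w["start"], w["end"], text))
    for t in turns:
        t.text = " ".join(t.text.split())  # collapse runs of whitespace
    return turns


# Hypothetical sample tokens for illustration.
words = [
    {"text": "Hi", "start": 0.0, "end": 0.3, "type": "word", "speaker_id": "speaker_0"},
    {"text": " ", "start": 0.3, "end": 0.35, "type": "spacing", "speaker_id": "speaker_0"},
    {"text": "there", "start": 0.35, "end": 0.7, "type": "word", "speaker_id": "speaker_0"},
    {"text": " ", "start": 0.7, "end": 0.8, "type": "spacing", "speaker_id": "speaker_1"},
    {"text": "(laughter)", "start": 0.8, "end": 1.2, "type": "audio_event", "speaker_id": "speaker_1"},
    {"text": "Hello", "start": 1.2, "end": 1.5, "type": "word", "speaker_id": "speaker_1"},
]

for t in group_turns(words):
    print(f"{t.speaker} ({t.start:.1f}s): {t.text}")
# speaker_0 (0.0s): Hi there
# speaker_1 (0.8s): [laughter] Hello
```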
Compared to the previous Scribe V1, V2 significantly cuts the word error rate (WER), reaching 5% or lower in over 35 languages and performing better on noisy, complex samples.
It also handles accents, silences, and mid-conversation language switching better, contributing to up to a 20% error reduction on benchmarks.
For me, that translates to less manual cleanup of interview transcripts and more time saved.
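WER counts the word-level edits (substitutions S, deletions D, insertions I) needed to turn the model's output into a reference transcript, divided by the number of reference words N: WER = (S + D + I) / N. A relative "20% error reduction" means, for example, a 5% WER falling to 4%. The sketch below is a plain dynamic-programming edit distance for spot-checking your own transcripts; it applies no text normalization (casing, punctuation), which benchmark suites usually do before scoring, so its numbers won't be directly comparable to ElevenLabs' figures.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev holds row i-1 of the DP table: edit distances between the first
    # i-1 reference words and every prefix of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[-1] / len(ref)


print(wer("the quick brown fox jumps", "the quick brown box jumps"))  # 0.2
```

One substituted word out of five reference words gives a WER of 20%.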
Language Support and API Integration
Scribe V2 supports more than 90 languages, from widely spoken ones like English and Mandarin to less commonly supported ones like Zulu and Wolof.

Scribe V2 automatically detects mixed-language speech and can transcribe complex cross-talk, which makes it a good fit for international projects.
The API is developer-friendly, supporting WebSocket for real-time streaming and REST for batch uploads.
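A batch upload is a multipart/form-data POST. The sketch below builds (but does not send) such a request with only the standard library. The endpoint `https://api.elevenlabs.io/v1/speech-to-text`, the `xi-api-key` header, and the `model_id`, `diarize` and `file` form fields reflect ElevenLabs' published speech-to-text REST API as I understand it; `"scribe_v2"` as the model identifier is an assumption, so confirm all of these against the current API reference (ElevenLabs also ships official SDKs). `build_transcription_request` is a hypothetical helper name.

```python
import urllib.request
import uuid

# Assumption: endpoint, auth header and form-field names follow ElevenLabs'
# speech-to-text REST API; "scribe_v2" as model_id is unverified.
STT_URL = "https://api.elevenlabs.io/v1/speech-to-text"


def build_transcription_request(api_key: str, audio: bytes, filename: str,
                                model_id: str = "scribe_v2",
                                diarize: bool = True) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for batch transcription."""
    boundary = uuid.uuid4().hex
    fields = {"model_id": model_id, "diarize": str(diarize).lower()}
    parts: list[bytes] = []
    for name, value in fields.items():
        # Plain text form fields.
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    # The audio/video file itself.
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'.encode()
        + audio
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return urllib.request.Request(
        STT_URL,
        data=b"".join(parts),
        headers={
            "xi-api-key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


req = build_transcription_request("YOUR_API_KEY", b"\x00\x01", "interview.mp3")
# To send it: json.loads(urllib.request.urlopen(req).read()) returns the transcript JSON.
```

Building the request separately from sending it keeps the upload logic easy to inspect and test without spending API credits.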
The API accepts common audio and video formats (e.g., MP3, MP4) and supports webhooks for asynchronous results. ElevenLabs’ documentation makes it straightforward to integrate into custom apps such as voice assistants or subtitling tools.
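For subtitling, word-level timestamps can be packed into SubRip (SRT) cues. This sketch assumes word tokens are dicts with `text`, `start` and `end` in seconds, similar to Scribe's word-level output (verify the field names in the API reference). The 42-character line limit and 1-second pause threshold are arbitrary choices, and `words_to_srt` and `fmt_ts` are hypothetical helper names.

```python
def fmt_ts(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"


def words_to_srt(words: list[dict], max_chars: int = 42, max_gap: float = 1.0) -> str:
    """Pack word tokens into SRT cues, starting a new cue on length or long pauses."""
    cues: list[tuple[float, float, str]] = []
    for w in words:
        text = w["text"].strip()
        if not text:
            continue  # skip pure-whitespace tokens
        if cues:
            start, end, line = cues[-1]
            fits = len(line) + 1 + len(text) <= max_chars
            if fits and w["start"] - end <= max_gap:
                # Extend the current cue with this word.
                cues[-1] = (start, w["end"], f"{line} {text}")
                continue
        cues.append((w["start"], w["end"], text))
    blocks = [
        f"{i}\n{fmt_ts(s)} --> {fmt_ts(e)}\n{t}\n"
        for i, (s, e, t) in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)


# Hypothetical sample tokens; the 1.6 s pause before "Today" starts a new cue.
words = [
    {"text": "Welcome", "start": 0.0, "end": 0.5},
    {"text": "back", "start": 0.55, "end": 0.9},
    {"text": "everyone.", "start": 0.95, "end": 1.6},
    {"text": "Today", "start": 3.2, "end": 3.6},
]
print(words_to_srt(words))
```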
Pricing Breakdown
ElevenLabs uses a subscription model with included hours and per-hour billing. Here’s a quick tier overview:
| Tier | Monthly Cost | Included Batch Hours | Included Real-Time Hours | Additional Batch $/Hour | Additional Real-Time $/Hour |
|---|---|---|---|---|---|
| Free | $0 | 2.5 | 10 | $0.48 | N/A |
| Starter | $5 | 12.5 | 48 | $0.63 | $0.53 |
| Creator | $11 | 62.5 | 225 | $0.07 | $0.63 |
| Pro | $99 | 300 | 786 | $0.40 | $0.46 |
| Scale | $330 | 1100 | 3385 | $0.33 | $0.39 |
| Business | Custom | 6000 | Custom | $0.22 | Custom |
At roughly 40 cents an hour on average, pricing is competitive with alternatives such as OpenAI’s hosted Whisper API ($0.006 per minute, or $0.36 an hour), while Scribe V2 reportedly delivers better accuracy in tests.
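Which tier is cheapest depends on how far past the included hours you go: monthly cost = base fee + max(0, hours used − included hours) × overage rate. The sketch below plugs in the batch-transcription figures from the pricing table for three of the paid tiers. It ignores real-time usage, taxes, and any plan details the table doesn't show, so treat it as a back-of-the-envelope estimate rather than a billing tool; `Tier`, `monthly_cost` and `cheapest` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    monthly_fee: float
    included_batch_hours: float
    extra_batch_per_hour: float


# Batch-transcription figures taken from the pricing table above.
TIERS = [
    Tier("Starter", 5.0, 12.5, 0.63),
    Tier("Pro", 99.0, 300.0, 0.40),
    Tier("Scale", 330.0, 1100.0, 0.33),
]


def monthly_cost(tier: Tier, hours: float) -> float:
    """Base fee plus overage for batch hours beyond the included allowance."""
    overage_hours = max(0.0, hours - tier.included_batch_hours)
    return tier.monthly_fee + overage_hours * tier.extra_batch_per_hour


def cheapest(hours: float) -> tuple[str, float]:
    """Return the name and cost of the tier with the lowest monthly bill."""
    best = min(TIERS, key=lambda t: monthly_cost(t, hours))
    return best.name, monthly_cost(best, hours)


for hours in (10, 100, 400, 1500):
    name, cost = cheapest(hours)
    print(f"{hours:>5} h/month -> {name}: ${cost:.2f} (${cost / hours:.2f}/h)")
```

With these figures, Starter stays cheaper than Pro up to roughly 160 batch hours a month, and Scale only undercuts Pro beyond roughly 880 hours.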
In my experience, the Starter tier offers great value for occasional users, while Pro suits heavy transcribers. This launch cements ElevenLabs' position as a leader in AI audio, and I can’t wait to see how big a leap the next model makes.