FunASR

Name: FunASR
Author: Zelili AI

From Alibaba / ModelScope

Fundamental End-to-End Speech Recognition Toolkit – Industrial-Grade ASR with Real-Time Streaming, Multilingual Support, and SOTA Pretrained Models

Category: Audio & Music
Pricing Model: Free
Starting Price: $0/Month

Last Updated: January 2, 2026

About This AI

FunASR is a comprehensive open-source speech recognition toolkit developed by Alibaba’s DAMO Academy and ModelScope team, providing an end-to-end pipeline for automatic speech recognition (ASR) and related tasks.

It supports both non-streaming and streaming ASR with high accuracy, low latency, and efficient deployment for industrial applications.

Core capabilities include voice activity detection (VAD), punctuation restoration, speaker diarization, timestamp prediction, keyword spotting, emotion recognition, and multilingual transcription.

The toolkit ships SOTA pretrained models, including Paraformer (non-autoregressive, high efficiency), Conformer, Whisper variants, SenseVoice (a multi-task speech foundation model), and the latest Fun-ASR-Nano-2512 (800M parameters, 31 languages, low-latency real-time transcription, trained on tens of millions of hours of speech).

It excels in noisy environments, real-world enterprise scenarios (conferences, call centers, in-car speech), and supports Chinese (with dialects/accents), English, Japanese, Korean, and more via multilingual models.

FunASR offers runtime services for offline file transcription (GPU- and CPU-optimized, with a single-thread real-time factor, RTF, as low as 0.0076 on GPU), real-time streaming (chunk-based, low latency), and ONNX export for edge deployment.
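
RTF is processing time divided by audio duration, so lower means faster. The sketch below works through what the quoted 0.0076 figure would mean for an hour of audio, assuming that figure holds on your hardware and audio (it may not). The function names are illustrative and not part of FunASR.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent decoding / duration of the audio decoded."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def estimated_processing_seconds(audio_seconds: float, rtf: float) -> float:
    """Invert the definition: expected decoding time for a given RTF."""
    return audio_seconds * rtf


# One hour of audio at the RTF figure quoted above (an assumption, not a measurement).
one_hour = 3600.0
print(round(estimated_processing_seconds(one_hour, 0.0076), 2))  # 27.36
```

In other words, at that RTF a one-hour recording would take roughly half a minute to transcribe on a single thread.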

First released around 2023 and updated through 2025-2026 (Fun-ASR-Nano in Dec 2025, ongoing fixes in Jan 2026), it is open-source with MIT-licensed code, over 14.8k GitHub stars, an active community, and a model zoo on ModelScope and Hugging Face.

Installation is simple via pip, with tutorials for inference, training, fine-tuning, and production deployment.

Ideal for developers building ASR applications, researchers in speech AI, enterprises needing robust transcription services, and anyone requiring customizable, high-performance speech-to-text without proprietary dependencies.

Key Features

End-to-End ASR Pipeline: Integrates VAD, ASR, punctuation, speaker diarization, timestamps, and ITN in one toolkit
Streaming and Non-Streaming Support: Real-time low-latency transcription with configurable chunk sizes
Multilingual and Multi-Dialect: Covers Chinese (7 dialects, 26 accents), English, Japanese, Korean, and more via SenseVoice/Whisper
High-Accuracy Models: Paraformer-large, Fun-ASR-Nano-2512 (31 languages), Conformer, Whisper-large-v3
Voice Activity Detection: FSMN-VAD for accurate audio segmentation
Punctuation Restoration: CT-Punc model adds natural punctuation and inverse text normalization
Speaker Diarization: CAM++ for speaker identification and separation
Emotion Recognition: Emotion2vec+large for detecting angry, happy, neutral, sad emotions
Keyword Spotting: FSMN-KWS for real-time keyword detection
Runtime Services: Optimized offline file (GPU RTF 0.0076) and real-time transcription servers
ONNX Export: Deploy models on edge devices or non-PyTorch environments
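
To make the "configurable chunk sizes" point concrete, here is a minimal streaming sketch modeled on the pattern in FunASR's README. It assumes 16 kHz mono audio, where one frame is 960 samples (60 ms), so chunk_size [0, 10, 5] corresponds to steps of about 600 ms. The helper names (chunk_stride, iter_chunks, stream_transcribe) are hypothetical, and the look-back values are illustrative choices rather than recommended settings.

```python
from typing import Iterator, Sequence, Tuple

SAMPLES_PER_FRAME = 960  # 60 ms at 16 kHz; assumes 16 kHz mono input


def chunk_stride(chunk_size: Sequence[int]) -> int:
    """Samples per streaming step; chunk_size[1] counts 60 ms frames."""
    return chunk_size[1] * SAMPLES_PER_FRAME


def iter_chunks(samples, chunk_size: Sequence[int]) -> Iterator[Tuple[object, bool]]:
    """Yield (chunk, is_final) pairs covering the whole signal."""
    stride = chunk_stride(chunk_size)
    total = max(1, (len(samples) + stride - 1) // stride)  # ceiling division
    for i in range(total):
        yield samples[i * stride:(i + 1) * stride], i == total - 1


def stream_transcribe(samples, chunk_size=(0, 10, 5)) -> str:
    """Feed chunks to a streaming model, carrying decoder state in `cache`."""
    # Imported lazily so the helpers above work without FunASR installed.
    from funasr import AutoModel

    model = AutoModel(model="paraformer-zh-streaming")
    cache = {}  # the model keeps its streaming state here between calls
    pieces = []
    for chunk, is_final in iter_chunks(samples, chunk_size):
        res = model.generate(
            input=chunk,
            cache=cache,
            is_final=is_final,  # tells the model to flush on the last chunk
            chunk_size=list(chunk_size),
            encoder_chunk_look_back=4,  # illustrative value
            decoder_chunk_look_back=1,  # illustrative value
        )
        pieces.extend(item["text"] for item in res if item.get("text"))
    return "".join(pieces)
```

Smaller chunk sizes lower latency at some cost in accuracy, so the right setting depends on the application. In practice `samples` would be a NumPy array read from a microphone or file.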

Price Plans

Free ($0): Open-source toolkit with MIT-licensed code, pretrained models (under their own model license), runtime services, and inference code; no usage fees
Cloud (Alibaba Cloud ModelScope): Optional hosted inference/API via Alibaba Cloud (pay-per-use, not part of core toolkit)
Enterprise Support (Custom): Potential premium consulting or deployment support from Alibaba (not specified in repo)

Pros

Industrial-Grade Performance: Optimized for real-world noisy scenarios with high accuracy and low RTF
Extensive Model Zoo: Over 100 pretrained models on ModelScope/Hugging Face for various tasks
Active Development: Frequent updates (latest in Jan 2026) and strong community (14.8k GitHub stars)
Full Open-Source: MIT license for code, permissive for models, easy pip install and deployment
Multilingual Excellence: Strong support for Chinese dialects/accents and global languages
Efficient Deployment: Streaming, ONNX, GPU/CPU runtime services for production use
Comprehensive Tasks: Beyond ASR, covers VAD, punctuation, diarization, emotion recognition, and keyword spotting (KWS)

Cons

Hardware Requirements: Large models (e.g., 800M+ parameters) need a capable GPU for real-time or high-throughput use
Setup for Advanced Use: Runtime services and fine-tuning require additional configuration
Primarily Chinese-Focused: Best performance on Mandarin and Chinese dialects; some other languages are secondary
Dependency Heavy: Relies on PyTorch, torchaudio, and other libraries; no one-click installer
Documentation Language: Many resources are in Chinese, alongside English
No Hosted Service: Self-hosted only; the toolkit itself includes no cloud API (Alibaba Cloud hosting is separate)
Edge Cases: Very noisy audio or strong accents may still require fine-tuning

Use Cases

Real-time transcription: Live captioning for meetings, calls, or broadcasts with low latency
Offline batch processing: Transcribe long audio/video files (podcasts, lectures, interviews) with punctuation and timestamps
Call center analytics: Speaker diarization, emotion detection, and keyword spotting for customer service
Multilingual applications: ASR for global products supporting Chinese, English, Japanese, Korean
Research and development: Fine-tune models or benchmark new speech techniques
Edge device deployment: Use ONNX exports for mobile or embedded ASR
Content creation: Subtitle generation, lyric/rap recognition, emotion-aware voice analysis
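
For the subtitle-generation use case, timestamps usually need converting into a subtitle format. The sketch below assumes you already have sentence-level segments as (start_ms, end_ms, text) tuples. How you extract those from FunASR's output depends on the model and options you use, so that step is deliberately left out. The helper names are hypothetical.

```python
from typing import Iterable, Tuple


def ms_to_srt_time(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(segments: Iterable[Tuple[int, int, str]]) -> str:
    """Render (start_ms, end_ms, text) segments as SRT subtitle blocks."""
    blocks = []
    for index, (start_ms, end_ms, text) in enumerate(segments, start=1):
        time_line = f"{ms_to_srt_time(start_ms)} --> {ms_to_srt_time(end_ms)}"
        blocks.append(f"{index}\n{time_line}\n{text.strip()}\n")
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


print(to_srt([(0, 1500, "Hello."), (1500, 3000, "Welcome to the meeting.")]))
```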

Target Audience

Developers and AI engineers: Building custom ASR pipelines or integrating speech features
Researchers in speech AI: Experimenting with SOTA models and multilingual ASR
Enterprises and startups: Needing robust, deployable transcription for products/services
Call center operators: Real-time monitoring and analytics with diarization/emotion
Content platforms: Auto-subtitling videos/podcasts in multiple languages
Open-source contributors: Extending or fine-tuning the toolkit

How To Use

Install toolkit: Run pip install -U funasr (or from source: git clone and pip install -e .)
Load model: Use AutoModel with a pretrained model name (e.g., paraformer-zh, SenseVoiceSmall)
Transcribe audio: Call model.generate(input=audio_file) for offline or streaming inference
Streaming example: Use paraformer-zh-streaming with a chunk_size setting for real-time output
Add VAD/Punc: Pass fsmn-vad and ct-punc as the vad_model and punc_model arguments of AutoModel
Deploy runtime: Follow runtime/readme for offline file or real-time server setup
Export ONNX: Use funasr-export command for edge deployment
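
The load-and-transcribe steps above can be sketched as follows. This is a minimal offline example assuming funasr is installed and the models can be downloaded on first use. The function names and the batch_size_s value are illustrative choices, and the result is assumed to be a list of dicts with a "text" field, as in the README examples.

```python
def join_text(results) -> str:
    """Collect the "text" field from each result entry, one line per input."""
    return "\n".join(item["text"] for item in results if item.get("text"))


def transcribe_file(audio_path: str, device: str = "cpu") -> str:
    """Transcribe one audio file with ASR, VAD segmentation, and punctuation."""
    # Imported lazily so join_text can be used without FunASR installed.
    from funasr import AutoModel

    model = AutoModel(
        model="paraformer-zh",   # ASR model
        vad_model="fsmn-vad",    # splits long audio into speech segments
        punc_model="ct-punc",    # restores punctuation in the transcript
        device=device,           # e.g. "cpu" or "cuda:0"
    )
    results = model.generate(input=audio_path, batch_size_s=300)
    return join_text(results)
```

Calling transcribe_file on a local recording (for example a hypothetical meeting.wav) returns the punctuated transcript as a string. The first call is slow because the models are downloaded and loaded.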

How we rated FunASR

Performance: 4.8/5
Accuracy: 4.7/5
Features: 4.9/5
Cost-Efficiency: 5.0/5
Ease of Use: 4.4/5
Customization: 4.8/5
Data Privacy: 5.0/5
Support: 4.5/5
Integration: 4.6/5
Overall Score: 4.8/5

FunASR integration with other tools

ModelScope and Hugging Face: Direct model downloads and inference pipelines from both hubs
PyTorch Ecosystem: Built on PyTorch for training, fine-tuning, and custom model development
ONNX Runtime: Export models to ONNX for cross-platform/edge deployment (CPU/GPU/Web)
Runtime Services: CPU/GPU servers for offline file and real-time streaming transcription
Third-Party Frameworks: Compatible with Whisper ecosystem, Qwen-Audio, and other speech libs

Best prompts optimised for FunASR

Not applicable: FunASR is a speech recognition toolkit that takes audio input (WAV, MP3, etc.) directly for transcription; it does not use text prompts the way LLM or image/video generation tools do.

FunASR is a powerful open-source ASR toolkit from Alibaba, delivering industrial-grade transcription with real-time streaming, multilingual support, and advanced features like diarization and emotion recognition. Fully free with excellent pretrained models and easy deployment, it’s ideal for developers and enterprises building speech applications. Minor setup hurdles aside, it stands out for efficiency and breadth in noisy/real-world scenarios.

FAQs

What is FunASR?
FunASR is an open-source end-to-end speech recognition toolkit from Alibaba/ModelScope, supporting ASR, VAD, punctuation, diarization, emotion recognition, and more with SOTA pretrained models.
Is FunASR free to use?
Yes. The toolkit is free and open-source: the code is MIT-licensed, and the code, pretrained models, and deployment tools are available on GitHub and ModelScope (pretrained models carry their own model license).
What languages does FunASR support?
It is strongest in Chinese (Mandarin plus dialects and accents) and also supports English, Japanese, and Korean; multilingual coverage comes from models such as SenseVoice and Whisper, and Fun-ASR-Nano covers 31 languages.
When was FunASR released or last updated?
It was first released around 2023 and is still actively updated, with Fun-ASR-Nano-2512 arriving in Dec 2025 and fixes as recent as Jan 2026.
What are the key models in FunASR?
Popular ones include Paraformer-large (high accuracy/efficiency), Fun-ASR-Nano-2512 (31 languages, real-time), SenseVoiceSmall (multi-task), and Whisper variants.
Does FunASR support real-time transcription?
Yes, it offers low-latency streaming ASR with configurable chunks, VAD for segmentation, and runtime services for real-time deployment.
How popular is FunASR?
It has over 14.8k GitHub stars, 1.6k forks, and is widely used in research and industry as one of the leading open-source ASR toolkits.
How do I install and use FunASR?
Install via pip install -U funasr; load models with AutoModel and run inference on audio files for transcription, or set up runtime for production servers.

FunASR Alternatives

Synthflow AI: Audio & Music, $0/Month
Fireflies: Audio & Music, $10/Month
Notta AI: Audio & Music, $9/Month

About Author

Hi guys! We are a group of ML engineers by profession with years of experience exploring and building AI tools, LLMs, and generative technologies. We analyze new tools not just as users, but as people who understand their technical depth and real-world value. We know how overwhelming these tools can be for most people, which is why we break down complex AI concepts into simple, practical insights. Our goal is to help you discover AI tools that actually save you time and make everyday work smarter, not harder. “We don’t just write about AI: we build, test, and simplify it for you.”
