What is MiMo V2 Flash?
MiMo V2 Flash is Xiaomi’s open-source Mixture-of-Experts model with 309B total and 15B active parameters, optimized for fast reasoning, agentic tasks, coding, and long-context processing.
When was MiMo V2 Flash released?
It was officially released and open-sourced on December 16, 2025, with the technical report published shortly after.
Is MiMo V2 Flash free to use?
Yes, it’s fully open source under the MIT license, with weights and code available on Hugging Face; there are no usage fees for local deployment.
What are the key strengths of MiMo V2 Flash?
It excels at high-speed inference (~150 tokens/s), strong benchmark results (e.g., 84.9 on MMLU-Pro, 94.1 on AIME 2025), a 256K context window, agentic/tool use, and efficiency thanks to its small active-parameter count.
What hardware is required for MiMo V2 Flash?
Full real-time performance requires a multi-GPU setup (e.g., 8+ high-end GPUs); quantized versions (4-bit, GGUF) run on consumer hardware with quality and speed trade-offs.
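As a rough back-of-the-envelope check (an illustration, not official sizing guidance), weight-only memory scales with total parameter count and quantization width; for a 309B-parameter MoE, all experts must fit in memory even though only 15B parameters are active per token:

```python
def weight_memory_gb(total_params_billions: float, bits_per_param: float) -> float:
    """Rough weight-only memory estimate in GB. Ignores KV cache,
    activations, and runtime overhead, so real requirements are higher."""
    bytes_per_param = bits_per_param / 8
    return total_params_billions * bytes_per_param  # 1e9 params * bytes / 1e9 bytes-per-GB

# 309B total parameters at common precisions
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(309, bits):.1f} GB")
```

This is why FP16/BF16 serving needs a multi-GPU node (~618 GB of weights alone), while 4-bit quantization (~155 GB) brings it closer to high-end workstation range.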
How does MiMo V2 Flash compare to other models?
It matches or exceeds DeepSeek-V3.2 and Kimi-K2 on many benchmarks while activating fewer parameters, and MTP-based speculative decoding gives it faster inference.
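Speculative decoding with a multi-token-prediction (MTP) head works, in spirit, by draft-and-verify: a cheap head proposes several tokens and the full model confirms them in a single forward pass. The toy sketch below (greedy and deterministic; not MiMo's actual implementation, which verifies probabilistically) shows why more than one token can be accepted per full-model pass:

```python
def greedy_target(context):
    """Stand-in for the full model's greedy next-token choice."""
    return (context[-1] + 1) % 100

def draft(context, k):
    """Stand-in for a cheap MTP head: proposes k tokens, with one
    deliberate mistake injected at position 2 for illustration."""
    out, last = [], context[-1]
    for _ in range(k):
        nxt = (last + 1) % 100
        if len(out) == 2:          # inject a wrong guess
            nxt = (nxt + 7) % 100
        out.append(nxt)
        last = nxt
    return out

def speculative_step(context, k=4):
    """Accept the longest prefix of draft tokens the target agrees with,
    then append the target's own token at the first mismatch. All of
    this costs one target forward pass instead of len(result) passes."""
    accepted = []
    for tok in draft(context, k):
        if greedy_target(context + accepted) == tok:
            accepted.append(tok)
        else:
            break
    accepted.append(greedy_target(context + accepted))
    return accepted

print(speculative_step([0]))
```

Here three tokens are emitted for the cost of one full-model step; when the draft head is accurate, acceptance rates stay high and decoding throughput rises accordingly.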
Does MiMo V2 Flash support tool use?
Yes, it features strong agentic capabilities with tool calling, multi-step reasoning, and outputs including reasoning_content and tool_calls for complex workflows.
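The reasoning_content and tool_calls fields come from the description above; the surrounding payload shape below is an assumption based on OpenAI-compatible chat APIs, with a hypothetical get_weather tool for illustration:

```python
import json

# Hypothetical OpenAI-compatible response; reasoning_content and
# tool_calls match the fields described above, but the exact payload
# layout is an assumption.
response = {
    "choices": [{
        "message": {
            "reasoning_content": "User asked about weather; call the tool.",
            "content": None,
            "tool_calls": [{
                "id": "call_0",
                "type": "function",
                "function": {
                    "name": "get_weather",               # hypothetical tool
                    "arguments": '{"city": "Beijing"}',  # JSON-encoded string
                },
            }],
        }
    }]
}

msg = response["choices"][0]["message"]
print("reasoning:", msg["reasoning_content"])
for call in msg.get("tool_calls") or []:
    fn = call["function"]
    args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
    print(f"tool requested: {fn['name']}({args})")
```

An agent loop would execute each requested tool, append the result as a tool-role message, and call the model again until it returns final content instead of tool_calls.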
Where can I run MiMo V2 Flash?
You can run it locally via SGLang (recommended) or MLX on Apple silicon, try it on Hugging Face Spaces, or access it as a cloud API through Xiaomi’s platform with per-token pricing.
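A minimal local-deployment sketch using SGLang's standard launch flags; the model path is a placeholder (substitute the actual MiMo V2 Flash weights repo or local directory), and the 8-way tensor parallelism assumes the multi-GPU node described above:

```shell
# Serve the model with SGLang on an 8-GPU node (tensor parallel).
# <hf-repo-or-local-path> is a placeholder for the real weights location.
python -m sglang.launch_server \
  --model-path <hf-repo-or-local-path> \
  --tp 8 \
  --port 30000

# SGLang exposes an OpenAI-compatible endpoint once the server is up:
curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "default", "messages": [{"role": "user", "content": "Hello"}]}'
```

Quantized GGUF builds instead target llama.cpp-style runtimes on consumer hardware, at the cost of some quality and speed.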




