XiaomiMiMo · 💬 Text Generation · New

MiMo-V2.5

Reasoning · Vision · Code · Function Calling · Web Search · Audio · fp8 · private
Quick reference
MiMo-V2.5 — TLDR
  • 🆕 Xiaomi's native omnimodal model: text, image, video, audio.
  • 🧠 Sparse MoE backbone, 310B total with 15B active parameters.
  • 📏 Context window extends up to 1 million tokens.
  • 👁️ Unified architecture for multimodal perception and reasoning.
  • 🔧 Function calling, web search, and code-oriented capabilities.
  • 💬 Accepts audio input alongside text, image, and video.
  • 🔒 Open weights released under the MIT license.
  • ⚡ Distributed in FP8 quantization on Hugging Face.
💰 Pricing: $0.175 input / $0.350 output, per 1M tokens
📏 Context: 1M tokens
📅 On Venice since: Jun 11, 2026
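
As a rough illustration of the listed rates, the sketch below estimates the cost of a single request. It assumes simple linear per-token billing at the prices shown above; the function name `estimate_cost_usd` is hypothetical, and any minimums, caching discounts, or surcharges are not modeled.

```python
# Listed prices in USD per 1M tokens (taken from the pricing shown above).
INPUT_USD_PER_M = 0.175
OUTPUT_USD_PER_M = 0.350


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Rough request cost, assuming plain linear per-token billing."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Example: a 200,000-token prompt with a 2,000-token reply.
example_cost = estimate_cost_usd(200_000, 2_000)
print(f"${example_cost:.4f}")  # prints "$0.0357"
```

Under these assumptions, long prompts dominate the bill: the 200,000 input tokens account for $0.0350 of the $0.0357 total.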
Provider

XiaomiMiMo is the large language model initiative from Xiaomi, the Chinese electronics and technology company, dedicated to developing capable open language models under the MiMo name. The effort reflects Xiaomi's broader push into foundational AI research…

1 model on Venice
1 text
Added Jun 11, 2026

About this model

MiMo-V2.5 is Xiaomi's native omnimodal model, designed to understand text, images, video, and audio within a single unified architecture. It uses a sparse Mixture-of-Experts backbone with 310B total parameters and roughly 15B active per token, and supports a context window of up to 1 million tokens. Xiaomi released the model in 2026 and open-sourced the weights and tokenizer, along with a separate Base checkpoint, under the MIT license on Hugging Face.
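
To make the sparse MoE figures concrete, the following back-of-the-envelope sketch computes the share of parameters active per token. The parameter counts come from the description above; the compute estimate uses the common rule of thumb of about 2 FLOPs per active parameter per token, which is an assumption here and not a published figure for this model.

```python
TOTAL_PARAMS = 310e9   # total parameters, from the description
ACTIVE_PARAMS = 15e9   # roughly active per token, from the description

# Fraction of the network that participates in each token's forward pass.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Rule-of-thumb assumption: ~2 FLOPs per active parameter per token.
approx_flops_per_token = 2 * ACTIVE_PARAMS

print(f"active fraction: {active_fraction:.1%}")                   # 4.8%
print(f"approx forward FLOPs/token: {approx_flops_per_token:.1e}")  # 3.0e+10
```

In other words, under these numbers each token touches roughly one twentieth of the weights, which is why per-token compute is closer to that of a 15B dense model even though all 310B parameters must still be stored.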

Beyond perception, the model is oriented toward agentic and developer workflows. Its documented capabilities include reasoning, function calling, web search, and code-focused use, with audio accepted as a native input modality alongside text, images, and video. The weights are distributed in FP8 quantization, which lowers the memory footprint for serving the large MoE network.
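
The effect of FP8 on memory can be sketched with simple arithmetic. The example assumes every parameter is stored at the stated width (1 byte for FP8, 2 bytes for BF16) and ignores quantization scale factors, any layers kept in higher precision, the KV cache, and activations, so real serving footprints will differ. The helper name `weight_gb` is hypothetical.

```python
TOTAL_PARAMS = 310e9  # total parameters, from the description


def weight_gb(bytes_per_param: float) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return TOTAL_PARAMS * bytes_per_param / 1e9


fp8_gb = weight_gb(1)   # assumption: 1 byte per parameter
bf16_gb = weight_gb(2)  # assumption: 2 bytes per parameter

print(f"FP8 weights:  ~{fp8_gb:.0f} GB")   # ~310 GB
print(f"BF16 weights: ~{bf16_gb:.0f} GB")  # ~620 GB
```

Under these assumptions FP8 roughly halves weight storage relative to 16-bit formats. Because every expert must be resident in memory regardless of how few are active per token, the total parameter count, not the active count, drives this figure.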

MiMo-V2.5 belongs to Xiaomi's broader MiMo series of open models, and an accompanying Base variant is published for further fine-tuning and research. As an omnimodal release with a long-context MoE design, it moves the family's focus from text-only generation toward unified multimodal understanding and agentic tool use. Because the catalog lists no sibling models, this description draws on the model's own card and configuration rather than on head-to-head comparisons within the family.

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies; verify critical details against the data sources listed below.

Data sources: Venice API · HuggingFace · Wikipedia — enrichment updated 1d ago