Not currently listed in Venice's public API catalog — last listed Jun 22, 2026. Delisted models may still respond to direct API calls.

MiniMax · 💬 Text Generation · VS Pick

MiniMax M3

Quick reference

MiniMax M3 — TLDR

🆕 MiniMax's frontier M-series model for coding and agentic work.
📏 Up to 1M-token context, guaranteed minimum 512K tokens.
🔧 New MiniMax Sparse Attention (MSA) architecture for long context.
👁️ Natively multimodal: text, image, and video input.
🧠 Toggleable thinking mode for reasoning or fast responses.
⚡ 9× prefill and 15× decode speedups vs M2 at 1M context.
🏢 Reported as a 428B-parameter Mixture-of-Experts model.
🌐 Built for tool use, function calling, and web-style retrieval.

📅 On Venice since

Jun 1, 2026

Provider

MiniMax

MiniMax is an AI company building generative models across multiple modalities, with a focus that spans both language understanding and audio creation. Their rapid release cadence in early 2026—delivering several new models within just a few months—reflects…

7 models on Venice

3 text · 3 music · 1 tts

Since Feb 12, 2026

About this model

MiniMax M3 is the latest model in MiniMax's M-series, positioned for coding, agentic workflows, and complex reasoning. It unifies three capabilities in a single checkpoint: frontier coding and agentic performance, a context window of up to 1 million tokens (with a guaranteed minimum of 512K), and native multimodality covering text, image, and video input. NVIDIA describes it as a 428-billion-parameter Mixture-of-Experts model. M3 also supports a toggleable thinking mode: it is intended to be enabled for long-horizon agentic tasks and complex reasoning, and disabled for latency-sensitive uses such as conversation and code completion.
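To make the two context figures concrete, here is a minimal sketch of a pre-flight check that sorts a request into the guaranteed 512K range, the larger range up to 1M, or over the limit. The exact token counts behind "512K" and "1M", the function name, and the tier labels are assumptions for illustration; none of them come from MiniMax or Venice documentation.

```python
# Assumed limits: the page only says "up to 1M" and "guaranteed minimum 512K".
# The exact integer values below are an assumption, not a documented figure.
GUARANTEED_CONTEXT = 512_000
MAX_CONTEXT = 1_000_000


def context_tier(prompt_tokens: int, reserved_output_tokens: int = 0) -> str:
    """Classify a request by how its total token budget compares to the limits.

    Hypothetical helper: returns "guaranteed", "not guaranteed", or "too large".
    """
    if prompt_tokens < 0 or reserved_output_tokens < 0:
        raise ValueError("token counts must be non-negative")

    total = prompt_tokens + reserved_output_tokens
    if total <= GUARANTEED_CONTEXT:
        return "guaranteed"      # fits inside the guaranteed minimum window
    if total <= MAX_CONTEXT:
        return "not guaranteed"  # above 512K but within the stated 1M maximum
    return "too large"           # exceeds the stated maximum


if __name__ == "__main__":
    for tokens in (100_000, 700_000, 1_200_000):
        print(tokens, context_tier(tokens, reserved_output_tokens=8_000))
```

A caller could use the "not guaranteed" tier as a signal to trim or chunk the input rather than rely on the full 1M window being available.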

The headline change over its predecessors is the new MiniMax Sparse Attention (MSA) architecture, which enables native ultra-long-context pretraining. According to MiniMax's model card, MSA delivers 9× prefill and 15× decode speedups compared with M2 at 1M context, cutting per-token compute to roughly one-twentieth while preserving quality versus full attention. This is a substantial step up from earlier M-series entries like MiniMax M2.7 and MiniMax M2.5, which operated within shorter context windows.
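As a rough illustration of what the reported speedups would mean in practice, the sketch below scales a baseline M2 timing by the 9× prefill and 15× decode factors. The baseline seconds are made-up inputs, not measurements, and the factors are only claimed at 1M context, so this should not be read as a latency prediction for shorter prompts.

```python
# Speedup factors as reported in the text, stated for 1M-token context only.
PREFILL_SPEEDUP = 9.0
DECODE_SPEEDUP = 15.0


def estimate_m3_seconds(m2_prefill_s: float, m2_decode_s: float) -> float:
    """Scale hypothetical M2 prefill and decode times by the reported factors."""
    if m2_prefill_s < 0 or m2_decode_s < 0:
        raise ValueError("durations must be non-negative")
    return m2_prefill_s / PREFILL_SPEEDUP + m2_decode_s / DECODE_SPEEDUP


if __name__ == "__main__":
    # Hypothetical baseline: 90 s of prefill and 150 s of decode on M2.
    baseline_prefill, baseline_decode = 90.0, 150.0
    estimate = estimate_m3_seconds(baseline_prefill, baseline_decode)
    print(f"M2 baseline: {baseline_prefill + baseline_decode:.0f} s")
    print(f"M3 estimate: {estimate:.0f} s")
```

Because the two phases speed up by different factors, the overall gain depends on the prefill-to-decode mix of the workload rather than being a single multiplier.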

MiniMax reports strong results on coding and agentic benchmarks spanning software engineering, terminal execution, and tool orchestration, including tasks that require autonomous task decomposition and multi-step reasoning. Per the MiniMax docs, earlier M-series models including M2.7 and M2 remain available for existing workflows. A preview build, MiniMax M3 Preview, was also released as part of the same family. Weights are published on Hugging Face, and a technical report accompanies the release.

Sources

MiniMax M3: Frontier Coding, 1M Context, Native Multimodality — All in One Model - MiniMax Research | MiniMax (minimax.io)

Model Invocation - MiniMax API Docs (platform.minimax.io)

Deploy Long-Context Reasoning and Agentic Workflows with MiniMax M3 on NVIDIA Accelerated Infrastructure | NVIDIA Technical Blog (developer.nvidia.com)

MiniMaxAI/MiniMax-M3 · Hugging Face (huggingface.co)

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Data sources: Venice API · HuggingFace · Wikipedia