About this model
Mercury 2 is a diffusion-based large language model released in 2026 by Inception Labs, an AI startup focused on bringing the diffusion paradigm to language modeling. Unlike conventional autoregressive transformers that generate text strictly one token at a time, diffusion language models produce and iteratively refine spans of output in parallel — a coarse-to-fine approach adapted from the denoising process used in image generation.
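To make the coarse-to-fine idea concrete, here is a toy Python sketch of confidence-based parallel unmasking, one decoding scheme discussed in the diffusion-LM research literature. It is not Mercury 2's documented algorithm. The `oracle_denoiser` function is a hypothetical stand-in for a trained network (it simply reads the answer from `target`). The confidence heuristic and the "commit k tokens per step" schedule are illustrative assumptions. Real systems may also revise tokens they have already produced, but this sketch never re-masks a committed token.

```python
import math

MASK = "<mask>"


def oracle_denoiser(seq, target):
    """Hypothetical stand-in for a trained denoising network.

    For every still-masked position, return (proposed_token, confidence).
    The proposal is read straight from `target`; only the confidence
    heuristic is meant to be illustrative.
    """
    proposals = {}
    for i, tok in enumerate(seq):
        if tok != MASK:
            continue
        left_known = i > 0 and seq[i - 1] != MASK
        right_known = i < len(seq) - 1 and seq[i + 1] != MASK
        # Assumed heuristic: positions next to committed tokens are "easier";
        # the small index term breaks ties in favor of earlier positions.
        confidence = 0.5 + 0.25 * left_known + 0.25 * right_known - 0.001 * i
        proposals[i] = (target[i], confidence)
    return proposals


def diffusion_decode(target, steps):
    """Start fully masked and commit several tokens per step in parallel."""
    seq = [MASK] * len(target)
    history = [list(seq)]
    for step in range(steps):
        proposals = oracle_denoiser(seq, target)
        if not proposals:
            break
        # Spread the remaining masked positions evenly over the remaining steps.
        k = math.ceil(len(proposals) / (steps - step))
        ranked = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)
        for i in ranked[:k]:
            seq[i] = proposals[i][0]
        history.append(list(seq))
    return seq, history


if __name__ == "__main__":
    target = "diffusion models refine many tokens per step".split()
    final, history = diffusion_decode(target, steps=3)
    for snapshot in history:
        print(" ".join(snapshot))
```

With a 7-token target and 3 steps, the number of masked positions goes from 7 to 4, then 2, then 0. Several tokens are committed per step, whereas a strictly token-by-token autoregressive decoder would need 7 steps for the same output.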
According to its catalog description, Mercury 2 is positioned as a reasoning-capable model rather than a purely speed-optimized one, combining parallel diffusion decoding with reasoning, tool use, and structured output. Inception describes throughput exceeding 1,000 tokens per second, which the company frames as the model's headline advantage for workloads where generation latency matters.
The model ships with a 128,000-token context window and supports function calling and integrated web search, making it suited to agentic loops, retrieval pipelines, and extraction tasks where many sequential model calls compound latency.
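The point about compounding latency comes down to simple arithmetic. When each call in a loop must finish before the next begins, decode time scales with calls × output tokens per call ÷ throughput. This sketch takes Inception's stated figure of 1,000 tokens per second at face value. The 100 tokens-per-second comparison, the call count, the tokens per call, and the fixed per-call overhead are hypothetical values chosen for illustration, not measurements.

```python
def chain_latency_s(calls, output_tokens_per_call, tokens_per_second, overhead_s=0.0):
    """Wall-clock seconds for `calls` model calls made one after another.

    Each call's decode time is output tokens / throughput; `overhead_s`
    is a hypothetical lump for network, prompt processing and tool time.
    """
    per_call = output_tokens_per_call / tokens_per_second + overhead_s
    return calls * per_call


if __name__ == "__main__":
    # Hypothetical agent loop: 20 sequential calls, 400 output tokens each.
    for tps in (1_000, 100):
        print(f"{tps:>5} tok/s: {chain_latency_s(20, 400, tps):.1f} s decode-only, "
              f"{chain_latency_s(20, 400, tps, overhead_s=0.3):.1f} s with 0.3 s overhead")
```

Under these assumptions, 20 sequential calls take 8.0 s of decode time at 1,000 tokens per second versus 80.0 s at 100 tokens per second. A fixed per-call overhead (0.3 s here, giving 14.0 s versus 86.0 s) narrows the relative gap, because it does not shrink when decoding gets faster.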
Within Inception's own lineage, Mercury 2 extends the company's earlier diffusion language-model efforts, which began with code-generation systems, into a more general reasoning model with agentic tooling. Independent, third-party benchmark replication was not available among the sources reviewed here, so specific quality figures are omitted; the description above reflects the model's documented architecture and capabilities rather than performance rankings.
This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies; verify critical details against the data sources listed below.
Data sources: Venice API · HuggingFace · Wikipedia