Inception Labs·💬 Text Generation

Mercury 2

ReasoningFunction CallingWeb Searchanonymized

🧠 Try in Intelligence →

Try on Venice.ai ↗

Quick reference

Mercury 2 — TLDR

- 🆕 Diffusion-based reasoning LLM from Inception Labs.
- ⚡ Provider describes throughput exceeding 1,000 tokens per second.
- 🧠 Built for reasoning, not just fast generation.
- 📏 128K-token context window.
- 🔧 Native function calling and structured output support.
- 🌐 Adds integrated web search to its capability set.
- 🎯 Generates and refines tokens in parallel rather than one at a time.
- 🏢 Aimed at latency-sensitive agent and pipeline workloads.

💰 Pricing

$0.313 / $0.938

per 1M · input / output

📏 Context

128K tokens

📅 On Venice since

Feb 20, 2026

149 days ago

Provider

Inception Labs

Inception Labs builds large language models powered by diffusion rather than the autoregressive generation most LLMs rely on. Instead of producing text one token at a time, its diffusion LLMs (dLLMs) generate many tokens in parallel, which makes them several…

Read full profile →

1 model on Venice

1 text

Added Feb 20, 2026

About this model

Mercury 2 is a diffusion-based large language model released in 2026 by Inception Labs, an AI startup focused on bringing the diffusion paradigm to language modeling. Unlike conventional autoregressive transformers that generate text strictly one token at a time, diffusion language models produce and iteratively refine spans of output in parallel — a coarse-to-fine approach adapted from the denoising process used in image generation.

According to its catalog description, Mercury 2 is positioned as a reasoning-capable model rather than a purely speed-optimized one, combining parallel diffusion decoding with reasoning, tool use, and structured output. Inception describes throughput exceeding 1,000 tokens per second, which the company frames as the model's headline advantage for workloads where generation latency matters.

The model ships with a 128,000-token context window and supports function calling and integrated web search, making it suited to agentic loops, retrieval pipelines, and extraction tasks where many sequential model calls compound latency.

Within Inception's own lineage, Mercury 2 extends the company's earlier diffusion language-model efforts, which began with code-generation systems, into a more general reasoning model with agentic tooling. Independent, third-party benchmark replication was not available among the sources reviewed here, so specific quality figures are omitted; the description above reflects the model's documented architecture and capabilities rather than performance rankings.

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Data sources: Venice API · HuggingFace · Wikipedia — enrichment updated 4d ago