About this model
DeepSeek V4 Flash is the lightweight half of DeepSeek's V4 preview series, launched alongside DeepSeek V4 Pro on April 24, 2026. Where the Pro model carries 1.6 trillion total parameters with 49 billion active, Flash uses a much smaller 284-billion-parameter Mixture-of-Experts design activating just 13 billion parameters per token — positioning it as DeepSeek's economical, high-throughput option. Both models share a one-million-token context window, which the company states is now the default across its services.
The V4 family introduces a new hybrid attention mechanism combining Compressed Sparse Attention and Heavily Compressed Attention, plus DeepSeek Sparse Attention, to cut long-context compute and memory cost. DeepSeek reports that, at the 1M-token setting, the Pro variant needs only 27% of single-token inference FLOPs and 10% of the KV cache compared with the prior-generation DeepSeek V3.2, illustrating the architectural efficiency gains this generation targets.
Both V4 models support Thinking and Non-Thinking modes and an OpenAI- and Anthropic-compatible API. DeepSeek notes that Flash's maximum-effort mode can reach reasoning quality comparable to Pro when given a larger thinking budget, though its smaller scale leaves it slightly behind on pure-knowledge tasks and the most complex agentic workflows.
The model targets advanced reasoning, software engineering, tool use, and enterprise assistants, and ships under the MIT license.
This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.
Data sources: Venice API · HuggingFace · Wikipedia — enrichment updated 5d ago