DeepSeek·💬 Text Generation

DeepSeek V4 Flash

ReasoningCodeFunction CallingWeb Searchanonymized

🧠 Try in Intelligence →

Try on Venice.ai ↗

Quick reference

DeepSeek V4 Flash — TLDR

🆕 Efficiency-optimized member of DeepSeek's V4 preview series, released April 2026.
🧠 284B-parameter Mixture-of-Experts with only 13B active per token.
📏 One-million-token context window, now DeepSeek's default standard.
🔧 Hybrid attention pairs Compressed Sparse and Heavily Compressed Attention.
⚡ Tuned for fast, high-throughput, cost-efficient inference.
💬 Dual Thinking and Non-Thinking modes via one model.
🎯 Capable in reasoning, coding, function-calling, and agentic tool use.
🔒 Released under the permissive MIT license.

💰 Pricing

$0.138 / $0.275

per 1M · input / output

📏 Context

1M tokens

📅 On Venice since

Apr 24, 2026

86 days ago

Provider

DeepSeek

DeepSeek is a Chinese artificial intelligence company specializing in large language model development, founded in July 2023 by Liang Wenfeng. Based in Hangzhou, Zhejiang, the company is backed by High-Flyer, a prominent Chinese hedge fund also co-founded by…

Read full profile →

4 models on Venice

4 text

Since Dec 4, 2025

Wikipedia ↗Official site ↗

See 3 other models from DeepSeek →

About this model

DeepSeek V4 Flash is the lightweight half of DeepSeek's V4 preview series, launched alongside DeepSeek V4 Pro on April 24, 2026. Where the Pro model carries 1.6 trillion total parameters with 49 billion active, Flash uses a much smaller 284-billion-parameter Mixture-of-Experts design activating just 13 billion parameters per token — positioning it as DeepSeek's economical, high-throughput option. Both models share a one-million-token context window, which the company states is now the default across its services.

The V4 family introduces a new hybrid attention mechanism combining Compressed Sparse Attention and Heavily Compressed Attention, plus DeepSeek Sparse Attention, to cut long-context compute and memory cost. DeepSeek reports that, at the 1M-token setting, the Pro variant needs only 27% of single-token inference FLOPs and 10% of the KV cache compared with the prior-generation DeepSeek V3.2, illustrating the architectural efficiency gains this generation targets.

Both V4 models support Thinking and Non-Thinking modes and an OpenAI- and Anthropic-compatible API. DeepSeek notes that Flash's maximum-effort mode can reach reasoning quality comparable to Pro when given a larger thinking budget, though its smaller scale leaves it slightly behind on pure-knowledge tasks and the most complex agentic workflows.

The model targets advanced reasoning, software engineering, tool use, and enterprise assistants, and ships under the MIT license.

🤗View model card on HuggingFace ↗View source on GitHub ↗

Sources

DeepSeek V4 Preview Release | DeepSeek API Docsapi-docs.deepseek.com ↗

deepseek-v4-flash Model by Deepseek-aibuild.nvidia.com ↗

deepseek-ai/DeepSeek-V4-Flash · Hugging Facehuggingface.co ↗

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Research & Papers

Primary reference paper for this model family, sourced from the HuggingFace model card.

arXiv2606.19348Apr 2026

DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence(2026)

DeepSeek-AI, Anyi Xu, Bangcai Lin et al.

We present a preview version of DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models -- DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated) -- both supporting a context length of one million…

Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 1d ago