DeepSeekDeepSeek·💬 Text Generation

DeepSeek V4 Flash

ReasoningCodeFunction CallingWeb Searchanonymized
🧠 Try in Intelligence →Try on Venice.ai ↗
Quick reference
DeepSeek V4 Flash — TLDR
  • 🆕 Efficiency-optimized member of DeepSeek's V4 preview series, released April 2026.
  • 🧠 284B-parameter Mixture-of-Experts with only 13B active per token.
  • 📏 One-million-token context window, now DeepSeek's default standard.
  • 🔧 Hybrid attention pairs Compressed Sparse and Heavily Compressed Attention.
  • ⚡ Tuned for fast, high-throughput, cost-efficient inference.
  • 💬 Dual Thinking and Non-Thinking modes via one model.
  • 🎯 Capable in reasoning, coding, function-calling, and agentic tool use.
  • 🔒 Released under the permissive MIT license.
💰 Pricing
$0.170 / $0.350
per 1M · input / output
📏 Context
1M tokens
📅 On Venice since
Apr 24, 2026
40 days ago
Provider

DeepSeek is a Chinese artificial intelligence company specializing in large language model development, founded in July 2023 by Liang Wenfeng. Based in Hangzhou, Zhejiang, the company is backed by High-Flyer, a prominent Chinese hedge fund also co-founded by…

Read full profile →
3 models on Venice
3 text
Since Dec 4, 2025

About this model

DeepSeek V4 Flash is the lightweight half of DeepSeek's V4 preview series, launched alongside DeepSeek V4 Pro on April 24, 2026. Where the Pro model carries 1.6 trillion total parameters with 49 billion active, Flash uses a much smaller 284-billion-parameter Mixture-of-Experts design activating just 13 billion parameters per token — positioning it as DeepSeek's economical, high-throughput option. Both models share a one-million-token context window, which the company states is now the default across its services.

The V4 family introduces a new hybrid attention mechanism combining Compressed Sparse Attention and Heavily Compressed Attention, plus DeepSeek Sparse Attention, to cut long-context compute and memory cost. DeepSeek reports that, at the 1M-token setting, the Pro variant needs only 27% of single-token inference FLOPs and 10% of the KV cache compared with the prior-generation DeepSeek V3.2, illustrating the architectural efficiency gains this generation targets.

Both V4 models support Thinking and Non-Thinking modes and an OpenAI- and Anthropic-compatible API. DeepSeek notes that Flash's maximum-effort mode can reach reasoning quality comparable to Pro when given a larger thinking budget, though its smaller scale leaves it slightly behind on pure-knowledge tasks and the most complex agentic workflows.

The model targets advanced reasoning, software engineering, tool use, and enterprise assistants, and ships under the MIT license.

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Data sources: Venice API · HuggingFace · Wikipedia — enrichment updated 5d ago