About this model
DeepSeek V4 Pro is the flagship of DeepSeek's two-tier V4 preview series, a Mixture-of-Experts language model with 1.6 trillion total parameters and 49 billion activated per token, supporting a one-million-token context window. It launched alongside its lighter sibling DeepSeek V4 Flash (284B total / 13B active) under the MIT license on April 24, 2026, and is positioned for advanced reasoning, coding, and long-horizon agentic tasks.
The headline change over DeepSeek V3.2 is architectural. V4 Pro introduces a hybrid attention design combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to make long-context processing far cheaper. Per DeepSeek's own model card, at a 1M-token context the model requires only 27% of the single-token inference FLOPs and 10% of the KV cache compared with V3.2. This sub-linear scaling is what makes million-token context economically practical, and 1M context is now the default across DeepSeek's official services.
Training adds a two-stage post-training pipeline: domain-specific experts are first cultivated independently through supervised fine-tuning and reinforcement learning with GRPO, then consolidated into one model via on-policy distillation. The model was pre-trained on tens of trillions of tokens and uses an FP4/FP8 mixed-precision scheme for its expert parameters.
V4 Pro exposes three reasoning effort levels, including a maximum-effort "Think Max" mode, and supports function calling and web search for tool-driven agents.
This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.
Data sources: Venice API · HuggingFace · Wikipedia — enrichment updated 5d ago