DeepSeek·💬 Text Generation

DeepSeek V4 Pro

ReasoningCodeFunction CallingWeb Searchanonymized

🧠 Try in Intelligence →

Try on Venice.ai ↗

Quick reference

DeepSeek V4 Pro — TLDR

🧠 Flagship 1.6T-parameter Mixture-of-Experts model with 49B active per token.
📏 1M-token context window, now standard across DeepSeek services.
🆕 Hybrid attention pairs Compressed Sparse Attention with Heavily Compressed Attention.
⚡ Uses 27% of V3.2's per-token FLOPs and 10% of its KV cache at 1M context.
💬 Offers non-thinking, thinking, and Think Max reasoning modes.
🔧 Function calling and tool use for long-horizon agentic workflows.
🔒 MIT-licensed, open-weights; released April 24, 2026.
📚 Post-trained via domain-expert SFT/RL plus on-policy distillation.

💰 Pricing

$1.65 / $3.30

per 1M · input / output

📏 Context

1M tokens

📅 On Venice since

Apr 24, 2026

86 days ago

Provider

DeepSeek

DeepSeek is a Chinese artificial intelligence company specializing in large language model development, founded in July 2023 by Liang Wenfeng. Based in Hangzhou, Zhejiang, the company is backed by High-Flyer, a prominent Chinese hedge fund also co-founded by…

Read full profile →

4 models on Venice

4 text

Since Dec 4, 2025

Wikipedia ↗Official site ↗

See 3 other models from DeepSeek →

About this model

DeepSeek V4 Pro is the flagship of DeepSeek's two-tier V4 preview series, a Mixture-of-Experts language model with 1.6 trillion total parameters and 49 billion activated per token, supporting a one-million-token context window. It launched alongside its lighter sibling DeepSeek V4 Flash (284B total / 13B active) under the MIT license on April 24, 2026, and is positioned for advanced reasoning, coding, and long-horizon agentic tasks.

The headline change over DeepSeek V3.2 is architectural. V4 Pro introduces a hybrid attention design combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to make long-context processing far cheaper. Per DeepSeek's own model card, at a 1M-token context the model requires only 27% of the single-token inference FLOPs and 10% of the KV cache compared with V3.2. This sub-linear scaling is what makes million-token context economically practical, and 1M context is now the default across DeepSeek's official services.

Training adds a two-stage post-training pipeline: domain-specific experts are first cultivated independently through supervised fine-tuning and reinforcement learning with GRPO, then consolidated into one model via on-policy distillation. The model was pre-trained on tens of trillions of tokens and uses an FP4/FP8 mixed-precision scheme for its expert parameters.

V4 Pro exposes three reasoning effort levels, including a maximum-effort "Think Max" mode, and supports function calling and web search for tool-driven agents.

🤗View model card on HuggingFace ↗View source on GitHub ↗

Sources

deepseek-v4-pro Model by Deepseek-aibuild.nvidia.com ↗

DeepSeek V4 Preview Release | DeepSeek API Docsapi-docs.deepseek.com ↗

deepseek-ai/DeepSeek-V4-Pro · Hugging Facehuggingface.co ↗

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Research & Papers

Primary reference paper for this model family, sourced from the HuggingFace model card.

arXiv2606.19348Apr 2026

DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence(2026)

DeepSeek-AI, Anyi Xu, Bangcai Lin et al.

We present a preview version of DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models -- DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated) -- both supporting a context length of one million…

Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 1d ago