DeepSeekDeepSeek·💬 Text Generation

DeepSeek V4 Pro

ReasoningCodeFunction CallingWeb Searchanonymized
🧠 Try in Intelligence →Try on Venice.ai ↗
Quick reference
DeepSeek V4 Pro — TLDR
  • 🧠 Flagship 1.6T-parameter Mixture-of-Experts model with 49B active per token.
  • 📏 1M-token context window, now standard across DeepSeek services.
  • 🆕 Hybrid attention pairs Compressed Sparse Attention with Heavily Compressed Attention.
  • ⚡ Uses 27% of V3.2's per-token FLOPs and 10% of its KV cache at 1M context.
  • 💬 Offers non-thinking, thinking, and Think Max reasoning modes.
  • 🔧 Function calling and tool use for long-horizon agentic workflows.
  • 🔒 MIT-licensed, open-weights; released April 24, 2026.
  • 📚 Post-trained via domain-expert SFT/RL plus on-policy distillation.
💰 Pricing
$1.73 / $3.80
per 1M · input / output
📏 Context
1M tokens
📅 On Venice since
Apr 24, 2026
40 days ago
Provider

DeepSeek is a Chinese artificial intelligence company specializing in large language model development, founded in July 2023 by Liang Wenfeng. Based in Hangzhou, Zhejiang, the company is backed by High-Flyer, a prominent Chinese hedge fund also co-founded by…

Read full profile →
3 models on Venice
3 text
Since Dec 4, 2025

About this model

DeepSeek V4 Pro is the flagship of DeepSeek's two-tier V4 preview series, a Mixture-of-Experts language model with 1.6 trillion total parameters and 49 billion activated per token, supporting a one-million-token context window. It launched alongside its lighter sibling DeepSeek V4 Flash (284B total / 13B active) under the MIT license on April 24, 2026, and is positioned for advanced reasoning, coding, and long-horizon agentic tasks.

The headline change over DeepSeek V3.2 is architectural. V4 Pro introduces a hybrid attention design combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to make long-context processing far cheaper. Per DeepSeek's own model card, at a 1M-token context the model requires only 27% of the single-token inference FLOPs and 10% of the KV cache compared with V3.2. This sub-linear scaling is what makes million-token context economically practical, and 1M context is now the default across DeepSeek's official services.

Training adds a two-stage post-training pipeline: domain-specific experts are first cultivated independently through supervised fine-tuning and reinforcement learning with GRPO, then consolidated into one model via on-policy distillation. The model was pre-trained on tens of trillions of tokens and uses an FP4/FP8 mixed-precision scheme for its expert parameters.

V4 Pro exposes three reasoning effort levels, including a maximum-effort "Think Max" mode, and supports function calling and web search for tool-driven agents.

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Data sources: Venice API · HuggingFace · Wikipedia — enrichment updated 5d ago