NvidiaNvidia·💬 Text Generation

Nemotron Cascade 2 30B A3B

ReasoningFunction CallingWeb Searchfp8private
🧠 Try in Intelligence →Try on Venice.ai ↗
Quick reference
Nemotron Cascade 2 30B A3B — TLDR
  • - 🧠 Reasoning-optimized 30B Mixture-of-Experts model with only 3B activated parameters
  • - 🏢 Built by NVIDIA, post-trained from Nemotron-3-Nano-30B-A3B-Base
  • - 💬 Operates in both thinking and instruct (non-thinking) modes
  • - 📏 256K-token context in this build; model card cites support up to 1M tokens
  • - 🔧 Targets function-calling and agentic tasks via OpenHands
  • - 🎯 NVIDIA reports gold-medal IMO 2025 and IOI 2025 performance
  • - ⚡ Released under the NVIDIA Open Model License, FP8 quantized
💰 Pricing
$0.140 / $0.800
per 1M · input / output
📏 Context
256K tokens
📅 On Venice since
Mar 24, 2026
71 days ago
Provider

Nvidia Corporation is an American technology company founded in 1993 by Jensen Huang, Chris Malachowsky, and Curtis Priem, headquartered in Santa Clara, California. Long recognized as the dominant force in graphics processing units, Nvidia has expanded into a…

Read full profile →
4 models on Venice
2 text · 1 embedding · 1 asr
Since Oct 10, 2025

About this model

Nemotron Cascade 2 30B A3B is NVIDIA's reasoning-focused open-weight model, structured as a 30B Mixture-of-Experts network that activates only about 3B parameters per token for efficient inference. It is post-trained from the NVIDIA Nemotron 3 Nano 30B base (Nemotron-Nano-V3), inheriting that architecture while layering on a new reasoning recipe. The model follows a ChatML template and can switch between extended chain-of-thought "thinking" mode and a direct instruct mode by prepending an empty reasoning block.

The headline change over its base predecessor is the post-training method NVIDIA calls Cascade RL, expanded to a broader spectrum of reasoning and agentic domains, plus multi-domain on-policy distillation from the strongest intermediate teacher models. NVIDIA reports that Cascade 2 improves on the Nemotron-Nano-V3 base across nearly every benchmark, and that it reaches gold-medal-level results on the 2025 IMO, IOI, and ICPC World Finals. These olympiad figures are vendor-reported.

On context and modality, this catalog build exposes a 256K-token window, matching NVIDIA's default vLLM configuration, though the model card states support up to 1M tokens. The model is text-only and does not handle image input.

Independently, Artificial Analysis places Cascade 2 at 28 on its Intelligence Index. The collection ships under the NVIDIA Open Model License, which permits commercial use, with checkpoints and training data released.

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Research & Papers

Primary reference paper for this model family, sourced from the HuggingFace model card.

Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 1d ago