Nvidia·💬 Text Generation

Nemotron Cascade 2 30B A3B

ReasoningFunction CallingWeb Searchfp8private

🧠 Try in Intelligence →

Try on Venice.ai ↗

Quick reference

Nemotron Cascade 2 30B A3B — TLDR

- 🧠 Reasoning-optimized 30B Mixture-of-Experts model with only 3B activated parameters
- 🏢 Built by NVIDIA, post-trained from Nemotron-3-Nano-30B-A3B-Base
- 💬 Operates in both thinking and instruct (non-thinking) modes
- 📏 256K-token context in this build; model card cites support up to 1M tokens
- 🔧 Targets function-calling and agentic tasks via OpenHands
- 🎯 NVIDIA reports gold-medal IMO 2025 and IOI 2025 performance
- ⚡ Released under the NVIDIA Open Model License, FP8 quantized

💰 Pricing

$0.140 / $0.800

per 1M · input / output

📏 Context

256K tokens

📅 On Venice since

Mar 24, 2026

117 days ago

Provider

Nvidia

Nvidia Corporation is an American technology company founded in 1993 by Jensen Huang, Chris Malachowsky, and Curtis Priem, headquartered in Santa Clara, California. Long recognized as the dominant force in graphics processing units, Nvidia has expanded into a…

Read full profile →

5 models on Venice

3 text · 1 embedding · 1 asr

Since Oct 10, 2025

Wikipedia ↗Official site ↗

See 4 other models from Nvidia →

About this model

Nemotron Cascade 2 30B A3B is NVIDIA's reasoning-focused open-weight model, structured as a 30B Mixture-of-Experts network that activates only about 3B parameters per token for efficient inference. It is post-trained from the NVIDIA Nemotron 3 Nano 30B base (Nemotron-Nano-V3), inheriting that architecture while layering on a new reasoning recipe. The model follows a ChatML template and can switch between extended chain-of-thought "thinking" mode and a direct instruct mode by prepending an empty reasoning block.

The headline change over its base predecessor is the post-training method NVIDIA calls Cascade RL, expanded to a broader spectrum of reasoning and agentic domains, plus multi-domain on-policy distillation from the strongest intermediate teacher models. NVIDIA reports that Cascade 2 improves on the Nemotron-Nano-V3 base across nearly every benchmark, and that it reaches gold-medal-level results on the 2025 IMO, IOI, and ICPC World Finals. These olympiad figures are vendor-reported.

On context and modality, this catalog build exposes a 256K-token window, matching NVIDIA's default vLLM configuration, though the model card states support up to 1M tokens. The model is text-only and does not handle image input.

Independently, Artificial Analysis places Cascade 2 at 28 on its Intelligence Index. The collection ships under the NVIDIA Open Model License, which permits commercial use, with checkpoints and training data released.

🤗View model card on HuggingFace ↗View source on GitHub ↗

Sources

Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation - NVIDIA Nemotronresearch.nvidia.com ↗

nvidia/Nemotron-Cascade-2-30B-A3B yet another model to test - DGX Spark / GB10 - NVIDIA Developer Forumsforums.developer.nvidia.com ↗

Nemotron Cascade 2 30B A3B - Intelligence, Performance & Price Analysisartificialanalysis.ai ↗

nvidia/Nemotron-Cascade-2-30B-A3B · Hugging Facehuggingface.co ↗

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Research & Papers

Primary reference paper for this model family, sourced from the HuggingFace model card.

arXiv2603.19220Mar 2026

Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation(2026)

Zhuolin Yang, Zihan Liu, Yang Chen et al.

We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. Despite its compact size, its mathematical and coding reasoning performance approaches that of frontier open models. It is…

Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 4d ago