Nvidia · 💬 Text Generation

NVIDIA Nemotron 3 Nano 30B

Function Calling · Web Search · fp8 · private
Quick reference
NVIDIA Nemotron 3 Nano 30B — TLDR
  • 🧠 Unified reasoning and non-reasoning model, trained from scratch by NVIDIA.
  • 🔧 Hybrid Mamba-2 plus Transformer backbone with sparse Mixture-of-Experts layers.
  • 📏 3.2B active parameters, 31.6B total; supports up to 1M-token context.
  • ⚡ Optimized for high-throughput inference on a single H200 GPU.
  • 🆕 Better accuracy than prior Nemotron 2 Nano while activating fewer parameters.
  • 🎯 Built for agentic AI, chatbots, RAG, and coding workflows.
  • 🌐 Trained on English plus 19 languages and 43 programming languages.
  • 🔒 Fully open weights, datasets, and training recipes released.
💰 Pricing: $0.075 input / $0.300 output per 1M tokens
📏 Context: 128K tokens
📅 On Venice since: Jan 27, 2026
Provider

Nvidia Corporation is an American technology company founded in 1993 by Jensen Huang, Chris Malachowsky, and Curtis Priem, headquartered in Santa Clara, California. Long recognized as the dominant force in graphics processing units, Nvidia has expanded into a…

4 models on Venice
2 text · 1 embedding · 1 asr
Since Oct 10, 2025

About this model

NVIDIA Nemotron 3 Nano 30B-A3B is a compact Mixture-of-Experts language model in the Nemotron 3 family, trained from scratch by NVIDIA and designed as a unified system for both reasoning and non-reasoning tasks. It first generates a reasoning trace and then concludes with a final response, targeting developers building AI agents, chatbots, and retrieval-augmented systems. Architecturally, it pairs a hybrid Mamba-2 and Transformer backbone with sparse MoE feed-forward layers, activating just 3.2B of its 31.6B total parameters per forward pass.
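To make the sparsity figure concrete, the short sketch below computes the share of parameters that are active per forward pass from the two numbers quoted above. The function name `active_fraction` is illustrative only and is not part of any NVIDIA or Venice API.

```python
# Figures quoted in the model description above.
ACTIVE_PARAMS_B = 3.2   # parameters activated per forward pass, in billions
TOTAL_PARAMS_B = 31.6   # total parameters, in billions


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the share of parameters activated per forward pass."""
    if total_b <= 0:
        raise ValueError("total parameter count must be positive")
    return active_b / total_b


fraction = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"Active share: {fraction:.1%}")  # prints "Active share: 10.1%"
```

Roughly one parameter in ten takes part in any single forward pass. This describes per-token compute only; in a typical MoE deployment the full set of weights still has to be loaded for serving.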

Against its same-family predecessor, NVIDIA states Nemotron 3 Nano achieves better accuracy than the previous-generation Nemotron 2 Nano while activating less than half the parameters per forward pass. NVIDIA positions the model for high inference throughput on a single H200 GPU at an 8K-input/16K-output setting.

The model supports context up to 1M tokens, though deployment defaults and VRAM constraints often limit it to 256K in practice; this catalog entry exposes a 128K window with FP8 quantization. Training data covers webpages, dialogue, and articles in English, 19 additional languages, and 43 programming languages.
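As a rough planning aid, the sketch below checks whether a request fits the window exposed by this catalog entry and estimates its cost from the listed prices. It rests on several assumptions: "128K" is treated as 128,000 tokens, the window is assumed to cover prompt plus output, and reasoning-trace tokens are assumed to be billed as output. The helper names are hypothetical, and the actual limits and billing rules should be confirmed against the Venice API.

```python
# Assumptions: values copied from this catalog entry. "128K" is treated as
# 128,000 tokens, which may differ from the limit the API actually enforces.
CONTEXT_WINDOW_TOKENS = 128_000
INPUT_USD_PER_MILLION = 0.075
OUTPUT_USD_PER_MILLION = 0.300


def fits_context(prompt_tokens: int, max_output_tokens: int,
                 window: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """True if prompt plus requested output stays within the window."""
    return prompt_tokens + max_output_tokens <= window


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost in USD from per-million-token prices."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    # Assumption: reasoning-trace tokens count toward output_tokens.
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


# Example: the 8K-input / 16K-output setting mentioned above.
print(fits_context(8_000, 16_000))                 # True
print(f"${estimate_cost_usd(8_000, 16_000):.4f}")  # $0.0054
```

Because the model emits a reasoning trace before its final answer, output token counts can run well above the length of the visible reply, so the output budget is worth sizing generously.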

Within Venice's NVIDIA lineup, Nemotron 3 Nano sits alongside the later Nemotron Cascade 2 30B A3B text model, the Nemotron Embed VL 1B v2 embedding model, and the Parakeet ASR speech model (nvidia/parakeet-tdt-0.6b-v3). NVIDIA released the weights, training recipe, and redistributable data openly.

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies, so verify critical details against the data sources listed below.

Data sources: Venice API · HuggingFace · Wikipedia — enrichment updated 1d ago