OpenAIOpenAI·💬 Text Generation

OpenAI GPT OSS 120B

ReasoningFunction CallingWeb Searchprivate
🧠 Try in Intelligence →Try on Venice.ai ↗
Quick reference
OpenAI GPT OSS 120B — TLDR
  • - 🆕 OpenAI's open-weight Mixture-of-Experts model under Apache 2.0 license.
  • - 📏 117B total parameters, 5.1B active per token.
  • - ⚡ Fits on a single 80GB H100 or AMD MI300X GPU.
  • - 🧠 Configurable reasoning effort: low, medium, or high.
  • - 🔧 Native function calling, browsing, Python execution, structured outputs.
  • - 👁️ Full chain-of-thought access for debugging and inspection.
  • - 📚 128K context window; fine-tunable on a single H100 node.
  • - 🔒 Native MXFP4 quantization on MoE weights for efficient deployment.
💰 Pricing
$0.070 / $0.300
per 1M · input / output
📏 Context
128K tokens
📅 On Venice since
Nov 6, 2025
209 days ago
Provider

OpenAI is an American artificial intelligence research organization headquartered in San Francisco, structured as both a for-profit public benefit corporation and a nonprofit foundation. The lab developed the GPT family of large language models, the DALL-E…

Read full profile →
24 models on Venice
13 text · 4 video · 2 image · 2 embedding · 2 inpaint · 1 asr
Since Jan 15, 2025

About this model

OpenAI GPT OSS 120B is the larger of OpenAI's two open-weight gpt-oss models, designed for production-grade reasoning, agentic workflows, and general-purpose use under the permissive Apache 2.0 license. Architecturally it is a Transformer Mixture-of-Experts model with 36 layers, 128 experts per layer (4 active per token), and roughly 117B total parameters of which about 5.1B are active per forward pass. Native MXFP4 quantization of the MoE weights lets it run on a single 80GB GPU such as an NVIDIA H100 or AMD MI300X.

Within the family, it sits above the smaller GPT OSS 20B, which carries roughly 21B total and 3.6B active parameters and is targeted at lower-latency or local deployments that fit within 16GB of memory. The 120B variant trades that footprint for higher reasoning capacity and can itself be fine-tuned on a single H100 node.

Developers can dial reasoning effort across three levels with a single line in the system prompt, and gain full access to the model's chain-of-thought, which OpenAI notes is intended for debugging rather than end-user display. Agentic features include native function calling, web browsing, Python code execution, and structured outputs, with the model able to chain together many sequential browsing calls.

OpenAI evaluated gpt-oss-120b against its own reasoning models including o3, o3-mini, and o4-mini across coding, competition math, health, and agentic tool-use benchmarks at the high reasoning setting.

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Research & Papers

Primary reference paper for this model family, sourced from the HuggingFace model card.

Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 1d ago