OpenAI·💬 Text Generation

OpenAI GPT OSS 120B

ReasoningFunction CallingWeb Searchprivate

🧠 Try in Intelligence →

Try on Venice.ai ↗

Quick reference

OpenAI GPT OSS 120B — TLDR

- 🆕 OpenAI's open-weight Mixture-of-Experts model under Apache 2.0 license.
- 📏 117B total parameters, 5.1B active per token.
- ⚡ Fits on a single 80GB H100 or AMD MI300X GPU.
- 🧠 Configurable reasoning effort: low, medium, or high.
- 🔧 Native function calling, browsing, Python execution, structured outputs.
- 👁️ Full chain-of-thought access for debugging and inspection.
- 📚 128K context window; fine-tunable on a single H100 node.
- 🔒 Native MXFP4 quantization on MoE weights for efficient deployment.

💰 Pricing

$0.070 / $0.300

per 1M · input / output

📏 Context

128K tokens

📅 On Venice since

Nov 6, 2025

255 days ago

Provider

OpenAI

OpenAI is an American artificial intelligence research organization headquartered in San Francisco, structured as both a for-profit public benefit corporation and a nonprofit foundation. The lab developed the GPT family of large language models, the DALL-E…

Read full profile →

30 models on Venice

19 text · 4 video · 2 image · 2 embedding · 2 inpaint · 1 asr

Since Jan 15, 2025

Wikipedia ↗Official site ↗

See 29 other models from OpenAI →

About this model

OpenAI GPT OSS 120B is the larger of OpenAI's two open-weight gpt-oss models, designed for production-grade reasoning, agentic workflows, and general-purpose use under the permissive Apache 2.0 license. Architecturally it is a Transformer Mixture-of-Experts model with 36 layers, 128 experts per layer (4 active per token), and roughly 117B total parameters of which about 5.1B are active per forward pass. Native MXFP4 quantization of the MoE weights lets it run on a single 80GB GPU such as an NVIDIA H100 or AMD MI300X.

Within the family, it sits above the smaller GPT OSS 20B, which carries roughly 21B total and 3.6B active parameters and is targeted at lower-latency or local deployments that fit within 16GB of memory. The 120B variant trades that footprint for higher reasoning capacity and can itself be fine-tuned on a single H100 node.

Developers can dial reasoning effort across three levels with a single line in the system prompt, and gain full access to the model's chain-of-thought, which OpenAI notes is intended for debugging rather than end-user display. Agentic features include native function calling, web browsing, Python code execution, and structured outputs, with the model able to chain together many sequential browsing calls.

OpenAI evaluated gpt-oss-120b against its own reasoning models including o3, o3-mini, and o4-mini across coding, competition math, health, and agentic tool-use benchmarks at the high reasoning setting.

🤗View model card on HuggingFace ↗View source on GitHub ↗

Sources

gpt-oss-120b Model | OpenAI APIdevelopers.openai.com ↗

Introducing gpt-oss | OpenAIopenai.com ↗

openai / gpt-oss-120bdocs.api.nvidia.com ↗

openai/gpt-oss-120b · Hugging Facehuggingface.co ↗

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Research & Papers

Primary reference paper for this model family, sourced from the HuggingFace model card.

arXiv2508.10925Aug 2025

gpt-oss-120b & gpt-oss-20b Model Card(2025)

Liu, Jiancheng Liu, Kevin Lu et al.

We present gpt-oss-120b and gpt-oss-20b, two open-weight reasoning models that push the frontier of accuracy and inference cost. The models use an efficient mixture-of-expert transformer architecture and are trained using large-scale distillation and reinforcement learning. We…

Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 4d ago