About this model
OpenAI GPT OSS 120B is the larger of OpenAI's two open-weight gpt-oss models, designed for production-grade reasoning, agentic workflows, and general-purpose use under the permissive Apache 2.0 license. Architecturally it is a Transformer Mixture-of-Experts model with 36 layers, 128 experts per layer (4 active per token), and roughly 117B total parameters of which about 5.1B are active per forward pass. Native MXFP4 quantization of the MoE weights lets it run on a single 80GB GPU such as an NVIDIA H100 or AMD MI300X.
Within the family, it sits above the smaller GPT OSS 20B, which carries roughly 21B total and 3.6B active parameters and is targeted at lower-latency or local deployments that fit within 16GB of memory. The 120B variant trades that footprint for higher reasoning capacity and can itself be fine-tuned on a single H100 node.
Developers can dial reasoning effort across three levels with a single line in the system prompt, and gain full access to the model's chain-of-thought, which OpenAI notes is intended for debugging rather than end-user display. Agentic features include native function calling, web browsing, Python code execution, and structured outputs, with the model able to chain together many sequential browsing calls.
OpenAI evaluated gpt-oss-120b against its own reasoning models including o3, o3-mini, and o4-mini across coding, competition math, health, and agentic tool-use benchmarks at the high reasoning setting.
This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.
Research & Papers
Primary reference paper for this model family, sourced from the HuggingFace model card.
Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 1d ago