OpenAI·💬 Text Generation

GPT OSS 20B🔒Private

ReasoningWeb SearchE2EEprivate

🧠 Try in Intelligence →

Try on Venice.ai ↗

Quick reference

GPT OSS 20B — TLDR

🆕 OpenAI's compact open-weight model, 21B total parameters, Apache 2.0
🧠 Mixture-of-Experts with only 3.6B active parameters per token
⚡ Optimized for low latency; runs within 16GB memory
📏 Supports roughly 128K token context
🔧 Native function calling, web browsing, Python, structured outputs
🔒 Deployed in a Trusted Execution Environment with hardware attestation (catalog deployment info)
🎯 Full chain-of-thought reasoning traces exposed for debugging
📚 MXFP4-quantized MoE weights, GPT-4o tokenizer (201,088 vocab)

💰 Pricing

$0.050 / $0.190

per 1M · input / output

📏 Context

128K tokens

📅 On Venice since

Mar 18, 2026

123 days ago

Provider

OpenAI

OpenAI is an American artificial intelligence research organization headquartered in San Francisco, structured as both a for-profit public benefit corporation and a nonprofit foundation. The lab developed the GPT family of large language models, the DALL-E…

Read full profile →

30 models on Venice

19 text · 4 video · 2 image · 2 embedding · 2 inpaint · 1 asr

Since Jan 15, 2025

Wikipedia ↗Official site ↗

See 29 other models from OpenAI →

About this model

GPT OSS 20B is the smaller of OpenAI's two open-weight gpt-oss models, designed for lower-latency, local, or specialized use cases. It is a Transformer using a Mixture-of-Experts architecture with about 21B total parameters but only 3.6B active per token, paired with Grouped Query Attention, rotary embeddings, and RMSNorm. Thanks to native MXFP4 quantization of the MoE layer, it runs within 16GB of memory, making it suitable for edge and consumer hardware. According to OpenAI, it delivers results on common benchmarks similar to its o3-mini reasoning model.

This particular listing is the Venice deployment running inside a Trusted Execution Environment, adding hardware attestation evidence so users can independently verify the execution environment — a confidentiality layer wrapped around the same open weights distributed under Apache 2.0.

Within this end-to-end-encrypted family, the 20B sits below its sibling GPT OSS 120B, which carries 117B total and 5.1B active parameters and targets higher-reasoning production workloads on a single 80GB GPU. The 20B trades that capacity for faster inference and a smaller memory footprint, while keeping the same Harmony response format, configurable reasoning levels, and agentic tooling.

Compared with the earlier non-confidential release, OpenAI GPT OSS 120B, the model itself is unchanged in weights; the distinction here is the verifiable TEE wrapper rather than any architectural revision. Both expose full chain-of-thought traces and remain fully fine-tunable, including on consumer hardware for this 20B variant.

🤗View model card on HuggingFace ↗View source on GitHub ↗

Sources

gpt-oss-20b Model | OpenAI APIdevelopers.openai.com ↗

gpt-oss-20b Model by OpenAIbuild.nvidia.com ↗

Introducing gpt-oss | OpenAIopenai.com ↗

openai/gpt-oss-20b · Hugging Facehuggingface.co ↗

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Research & Papers

Primary reference paper for this model family, sourced from the HuggingFace model card.

arXiv2508.10925Aug 2025

gpt-oss-120b & gpt-oss-20b Model Card(2025)

Liu, Jiancheng Liu, Kevin Lu et al.

We present gpt-oss-120b and gpt-oss-20b, two open-weight reasoning models that push the frontier of accuracy and inference cost. The models use an efficient mixture-of-expert transformer architecture and are trained using large-scale distillation and reinforcement learning. We…

Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 4d ago