Google Gemma 4 26B A4B Instruct
About this model
Gemma 4 26B A4B Instruct is the Mixture-of-Experts member of Google DeepMind's open-weight Gemma 4 family, released in April 2026. Its model card describes a sparse design with roughly 26B total parameters but only about 4B active per token, letting it run close to the speed of a small model while approaching the quality of the dense Gemma 4 31B Instruct. It targets high-throughput deployment, complementing the dense 31B variant aimed at server-grade and local use.
Against its same-family predecessor, the Gemma 3 27B Instruct, this generation broadens capabilities. Per the catalog, Gemma 4 adds configurable thinking modes for reasoning, native function calling, and structured output, and it extends the context window to 256K tokens for the medium models.
Multimodality also expands: the 26B A4B accepts text, image, and video input, and the model card notes wide multilingual coverage, all in an Apache 2.0 package. This positions it as a flexible open model for developers who want vision, reasoning, and tool use in a single deployable checkpoint.
Buyers should note that, despite the low active-parameter inference cost, all of the model's parameters must be loaded into memory for routing, so its baseline VRAM footprint resembles a dense 26B model rather than a 4B one.
This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.
Data sources: Venice API · HuggingFace · Wikipedia — enrichment updated 1d ago