Google · 💬 Text Generation

Google Gemma 4 26B A4B Instruct

Reasoning · Vision · Function Calling · Web Search · bf16 · private
Quick reference
Google Gemma 4 26B A4B Instruct — TLDR
  • 🧠 Mixture-of-Experts: ~26B total, only ~4B active per token
  • ⚡ Near-dense quality with roughly 4B-parameter inference speed
  • 📏 256K-token context window for medium Gemma 4 models
  • 👁️ Accepts text, image, and video input
  • 🔧 Native function calling and structured output
  • 🧠 Configurable thinking modes for reasoning across the family
  • 🌐 Multilingual support spanning many languages
  • 🔒 Open weights under Apache 2.0
💰 Pricing: $0.163 / $0.500 per 1M tokens (input / output)
📏 Context: 256K tokens
📅 On Venice since: Apr 2, 2026 (63 days ago)
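
As a rough illustration of how the per-million-token prices listed above translate into a per-request cost, here is a minimal sketch. The function name and the example token counts are hypothetical; only the two prices come from this page, and actual billing may round or meter differently.

```python
# Prices taken from the listing above, in USD per 1M tokens.
INPUT_USD_PER_MILLION = 0.163
OUTPUT_USD_PER_MILLION = 0.500


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts (hypothetical helper)."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


# Example: a long-context request with 200K input tokens and 4K output tokens.
cost = estimate_cost_usd(200_000, 4_000)
print(f"estimated cost: ${cost:.4f}")
```

Under these prices, input tokens dominate the cost of long-context requests, so trimming the prompt usually matters more than shortening the reply.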
Provider

Google is an American multinational technology corporation and one of the world's most valuable brands. A subsidiary of parent company Alphabet Inc., Google operates across search, cloud computing, consumer electronics, and artificial intelligence. Its…

25 models on Venice
10 text · 8 video · 2 image · 2 inpaint · 1 music · 1 embedding · 1 tts
Since Oct 15, 2024

About this model

Gemma 4 26B A4B Instruct is the Mixture-of-Experts member of Google DeepMind's open-weight Gemma 4 family, released in April 2026. Its model card describes a sparse design with roughly 26B total parameters but only about 4B active per token, letting it run close to the speed of a small model while approaching the quality of the dense Gemma 4 31B Instruct. It targets high-throughput deployment, complementing the dense 31B variant aimed at server-grade and local use.
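
To make the "total versus active parameters" distinction concrete, the following is a toy sketch of top-k expert routing, the general mechanism behind sparse Mixture-of-Experts layers. The expert count, the value of k, and the choice to apply softmax over only the selected experts are illustrative assumptions, not Gemma 4's actual configuration or routing rule.

```python
import math
import random


def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    # Softmax over the selected logits only, so the k weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical toy sizes, NOT Gemma 4's real configuration.
NUM_EXPERTS = 8
TOP_K = 2

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # one token's router scores
routes = top_k_route(logits, TOP_K)
active_fraction = TOP_K / NUM_EXPERTS

print(routes)
print(f"experts used for this token: {TOP_K}/{NUM_EXPERTS} ({active_fraction:.0%})")
```

Each token only runs through its selected experts, which is why compute per token tracks the active parameter count while the full set of experts still has to exist in the checkpoint.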

Compared with its predecessor in the same family, Gemma 3 27B Instruct, this generation broadens capabilities. Per the catalog, Gemma 4 adds configurable thinking modes for reasoning, native function calling, and structured output, and it extends the context window to 256K tokens for the medium models.

Multimodality also expands: the 26B A4B accepts text, image, and video input, and the model card notes wide multilingual coverage, all under the Apache 2.0 license. This positions it as a flexible open model for developers who want vision, reasoning, and tool use in a single deployable checkpoint.

Buyers should note that the low active-parameter count reduces compute per token, not memory. Because the router can send any token to any expert, all of the model's parameters must stay loaded, so its baseline VRAM footprint resembles that of a dense 26B model rather than a 4B one.
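
A back-of-the-envelope calculation shows the size of that gap. This sketch counts weight storage only and uses the approximate 26B and 4B figures from this page; it ignores the KV cache, activations, and runtime overhead, and the quantized dtypes are included purely for comparison.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, dtype: str = "bf16") -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * BYTES_PER_PARAM[dtype] / 1e9


total_gb = weight_memory_gb(26)   # every expert must be resident in memory
active_gb = weight_memory_gb(4)   # weights actually exercised per token

print(f"weights to load (bf16):   ~{total_gb:.0f} GB")
print(f"weights used per token:   ~{active_gb:.0f} GB")
print(f"weights to load (int4):   ~{weight_memory_gb(26, 'int4'):.0f} GB")
```

At bf16, the weights alone come to roughly 52 GB, so capacity planning should start from the total parameter count even though throughput behaves more like that of a 4B model.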

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Data sources: Venice API · HuggingFace · Wikipedia — enrichment updated 1d ago