Google · 💬 Text Generation

Google Gemma 4 26B A4B Instruct

Reasoning · Vision · Function Calling · Web Search · bf16 · private


Quick reference

Google Gemma 4 26B A4B Instruct — TLDR

🧠 Mixture-of-Experts: 26B total parameters, only ~4B active per token.
⚡ Google reports it runs almost as fast as a 4B dense model.
📏 256K-token context window across the larger Gemma 4 models.
👁️ Accepts text, image, and video input.
🔧 Native function calling is among its listed capabilities.
🧠 Configurable thinking modes for step-by-step reasoning.
🌐 Multilingual support across 140+ languages.
🔒 Open-weight, instruction-tuned release under Apache 2.0.

💰 Pricing

$0.130 input / $0.400 output per 1M tokens
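At the listed rates, the cost of a request is a linear function of its token counts. The sketch below assumes billing is exactly linear at $0.130 per 1M input tokens and $0.400 per 1M output tokens, with no rounding, caching discounts, or minimum charges; the function name `estimate_cost_usd` is hypothetical.

```python
# Minimal cost sketch using the listed Venice rates for this model.
# Assumption: billing is exactly linear in tokens at these per-million
# rates; real invoices may round or discount differently.

INPUT_USD_PER_M = 0.130   # listed input price per 1M tokens
OUTPUT_USD_PER_M = 0.400  # listed output price per 1M tokens


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


if __name__ == "__main__":
    # A long-context request: 200K tokens in, 2K tokens out.
    print(f"${estimate_cost_usd(200_000, 2_000):.4f}")
```

For a 200K-token prompt with a 2K-token reply it prints `$0.0268`; the input side dominates despite the higher output rate.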

📏 Context

256K tokens

📅 On Venice since

Apr 2, 2026


Provider

Google

Google is an American multinational technology corporation and one of the world's most valuable brands. A subsidiary of parent company Alphabet Inc., Google operates across search, cloud computing, consumer electronics, and artificial intelligence. Its…


30 models on Venice

11 video · 10 text · 3 image · 3 inpaint · 1 music · 1 embedding · 1 TTS

Since Oct 15, 2024

Wikipedia ↗ · Official site ↗


About this model

Gemma 4 26B A4B Instruct is an open-weight, instruction-tuned model from Google DeepMind, released in April 2026 as part of the Gemma 4 family. Unlike its dense siblings, it uses a Mixture-of-Experts (MoE) design: of its roughly 26 billion total parameters, only about 4 billion are activated per token. All weights must still be loaded into memory, but per-token compute stays close to that of a much smaller model; Google describes it as running almost as fast as a 4B dense model.
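The idea behind the "26B total, ~4B active" split can be illustrated with a toy top-k Mixture-of-Experts layer. This is a generic sketch, not Gemma 4's implementation: the expert count (8), top-2 routing, the scalar stand-in "experts", and the hard-coded router logits are all hypothetical. It uses one common formulation, a softmax over the top-k router scores, in which every expert's weights exist in memory but only the selected experts are evaluated for a given token.

```python
# Toy top-k Mixture-of-Experts routing (stdlib only).
# Hypothetical sizes: 8 experts, top-2 routing, scalar "experts".
# NOT Gemma 4's actual configuration; it only shows why all expert
# weights are stored while only k experts run per token.
import math
import random

NUM_EXPERTS = 8  # hypothetical
TOP_K = 2        # hypothetical

random.seed(0)
# Each "expert" is just a scale factor standing in for a feed-forward block.
expert_scales = [random.uniform(0.5, 1.5) for _ in range(NUM_EXPERTS)]


def route(router_logits: list[float], k: int = TOP_K) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalise their logits."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)  # subtract max for numerical stability
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x: float, router_logits: list[float]) -> tuple[float, list[int]]:
    """Weighted sum over the selected experts only; the rest are never evaluated."""
    chosen = route(router_logits)
    y = sum(w * expert_scales[i] * x for i, w in chosen)
    return y, [i for i, _ in chosen]


if __name__ == "__main__":
    logits = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.2]
    y, used = moe_forward(1.0, logits)
    print(f"experts used: {used} of {NUM_EXPERTS}")
```

Running it prints `experts used: [1, 4] of 8`: the two highest router scores (2.0 and 1.5) select experts 1 and 4, and the other six are skipped for this token.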

Compared with its predecessors in the family, this model advances on several fronts. Where the earlier Google Gemma 3 27B Instruct used a dense architecture, Gemma 4 adds MoE variants alongside dense ones, together with built-in configurable reasoning modes and video input. Its context window reaches 256K tokens, double Gemma 3's 128K, while multilingual coverage continues to span over 140 languages.

Within Gemma 4 itself, the 26B A4B is positioned as the throughput-optimized counterpart to the dense Google Gemma 4 31B Instruct. Sparse activation reduces compute per token relative to the dense 31B while aiming for comparable quality.
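The trade-off against the dense 31B can be made concrete with back-of-the-envelope arithmetic. The sketch assumes bf16 weights (2 bytes per parameter, matching the bf16 tag above), parameter counts rounded to the headline 26B/4B and 31B figures, and the common rule of thumb of about 2 FLOPs per active parameter per token for a forward pass. It ignores the KV cache, activations, and attention's dependence on context length; the model keys and helper names are hypothetical.

```python
# Rough memory vs. compute comparison: Gemma 4 26B A4B (MoE) vs. 31B (dense).
# Assumptions: bf16 weights, headline parameter counts, ~2 FLOPs per active
# parameter per token. Ignores KV cache, activations, and attention cost.

BYTES_PER_PARAM_BF16 = 2

MODELS = {
    # hypothetical key: (total params, active params per token)
    "gemma-4-26b-a4b": (26e9, 4e9),
    "gemma-4-31b": (31e9, 31e9),
}


def weight_memory_gb(total_params: float) -> float:
    """Memory needed to hold all weights in bf16, in GB (1e9 bytes)."""
    return total_params * BYTES_PER_PARAM_BF16 / 1e9


def forward_gflops_per_token(active_params: float) -> float:
    """Approximate forward-pass compute per token, in GFLOPs."""
    return 2 * active_params / 1e9


if __name__ == "__main__":
    for name, (total, active) in MODELS.items():
        print(f"{name}: ~{weight_memory_gb(total):.0f} GB weights, "
              f"~{forward_gflops_per_token(active):.0f} GFLOPs/token")
```

Under these assumptions it reports about 52 GB of bf16 weights and ~8 GFLOPs per token for the 26B A4B, versus about 62 GB and ~62 GFLOPs per token for the dense 31B: similar memory footprints, but roughly an eighth of the per-token compute.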

Both models target consumer GPUs and workstations. The 26B A4B accepts text, image, and video input natively; audio input is offered on the smaller family members rather than at this size. Its listed capabilities include reasoning, vision, function calling, and web search.

🤗 View model card on HuggingFace ↗ · View source on GitHub ↗

Sources

Gemma 4 26B A4B IT | Gemini Enterprise Agent Platform | Google Cloud Documentation (docs.cloud.google.com) ↗

google/gemma-4-26B-A4B-it · Hugging Face (huggingface.co) ↗

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Research & Papers

Primary reference paper for this model family, sourced from the HuggingFace model card.

arXiv 2607.02770 · Jul 2026

Gemma 4 Technical Report (2026)

Gemma Team, Sherif El Abd, Vaibhav Aggarwal et al.

We introduce Gemma 4, a new generation of open-weight, natively multimodal language models in the Gemma model family. Designed to advance compute efficiency and reasoning, the Gemma 4 model suite features dense and Mixture-of-Experts architectures, ranging from 2.3B to 31B…

Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 4d ago