Google·💬 Text Generation

Google Gemma 4 31B Instruct

ReasoningVisionFunction CallingWeb Searchbf16private

🧠 Try in Intelligence →

Try on Venice.ai ↗

Quick reference

Google Gemma 4 31B Instruct — TLDR

🧠 Dense 30.7B open model from Google DeepMind for reasoning
🆕 Configurable thinking modes toggled via a reasoning token
📏 256K-token context window for long documents and code
👁️ Handles text and image input; video processed as frames
🔧 Native function calling for agentic, tool-using workflows
🏢 Quantized checkpoints target consumer GPUs and workstations
🔒 Apache 2.0 license; open pre-trained and instruction-tuned weights
📚 Hybrid local/global attention with Proportional RoPE for long context

💰 Pricing

$0.120 / $0.360

per 1M · input / output

📏 Context

256K tokens

📅 On Venice since

Apr 3, 2026

107 days ago

Provider

Google

Google is an American multinational technology corporation and one of the world's most valuable brands. A subsidiary of parent company Alphabet Inc., Google operates across search, cloud computing, consumer electronics, and artificial intelligence. Its…

Read full profile →

30 models on Venice

11 video · 10 text · 3 image · 3 inpaint · 1 music · 1 embedding · 1 tts

Since Oct 15, 2024

Wikipedia ↗Official site ↗

See 29 other models from Google →

About this model

Gemma 4 31B Instruct is the dense flagship of Google DeepMind's Gemma 4 family, a 30.7B-parameter multimodal model that accepts text and image input (and can process video as sequences of frames) while generating text output. It offers a 256K-token context window, native function calling, and configurable thinking modes, aimed at running reasoning, coding, and multimodal tasks under an Apache 2.0 license.

Architecturally it is a dense transformer paired with a vision encoder, using a hybrid attention scheme that interleaves local sliding-window layers with full global attention and Proportional RoPE (p-RoPE) for efficient long-context handling; quantization-aware and w4a16 checkpoints are published for smaller-footprint deployment.

Relative to the sibling Gemma 4 26B A4B Instruct, a Mixture-of-Experts variant with fewer active parameters, this 31B is dense—trading that inference efficiency for the family's highest-quality tier. Against the previous generation Gemma 3 27B, Google DeepMind highlights Gemma 4's built-in reasoning with configurable thinking, native system-prompt and function-calling support, and coding improvements.

Google DeepMind publishes instruction-tuned results in the official Gemma 4 31B model card, spanning reasoning, coding, vision, long-context, and safety tasks, and states the models undergo the same safety evaluations as its proprietary Gemini models.

🤗View model card on HuggingFace ↗View source on GitHub ↗

Sources

google / gemma-4-31b-itdocs.api.nvidia.com ↗

gemma-4-31b-it Model by Googlebuild.nvidia.com ↗

google/gemma-4-31B · Hugging Facehuggingface.co ↗

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Research & Papers

Primary reference paper for this model family, sourced from the HuggingFace model card.

arXiv2607.02770Jul 2026

Gemma 4 Technical Report(2026)

Gemma Team, Sherif El Abd, Vaibhav Aggarwal et al.

We introduce Gemma 4, a new generation of open-weight, natively multimodal language models in the Gemma model family. Designed to advance compute efficiency and reasoning, the Gemma 4 model suite features dense and Mixture-of-Experts architectures, ranging from 2.3B to 31B…

Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 4d ago