GoogleGoogle·💬 Text Generation

Google Gemma 4 31B Instruct

ReasoningVisionFunction CallingWeb Searchbf16private
🧠 Try in Intelligence →Try on Venice.ai ↗
Quick reference
Google Gemma 4 31B Instruct — TLDR
  • 🆕 Dense 31B open-weights model from Google DeepMind, Apache 2.0.
  • 📏 256K-token context with hybrid local/global attention and p-RoPE.
  • 👁️ Multimodal: text, image, and video as frame sequences.
  • 🧠 Configurable thinking modes for step-by-step reasoning.
  • 🔧 Native function calling and structured output for agentic workflows.
  • 🌐 Pre-trained across 140+ languages.
  • 🏢 Targets consumer GPUs and workstations.
💰 Pricing
$0.155 / $0.440
per 1M · input / output
📏 Context
256K tokens
📅 On Venice since
Apr 3, 2026
62 days ago
Provider

Google is an American multinational technology corporation and one of the world's most valuable brands. A subsidiary of parent company Alphabet Inc., Google operates across search, cloud computing, consumer electronics, and artificial intelligence. Its…

Read full profile →
25 models on Venice
10 text · 8 video · 2 image · 2 inpaint · 1 music · 1 embedding · 1 tts
Since Oct 15, 2024

About this model

Gemma 4 31B Instruct is the dense, maximum-quality member of Google DeepMind's open Gemma 4 family, built for consumer GPUs and workstations rather than edge devices. It handles text and image inputs, processes video as sequences of frames, and generates text, with a 256K-token context window and support for over 140 languages under the Apache 2.0 license.

Against its same-family predecessor Google Gemma 3 27B Instruct, Gemma 4 introduces several documented changes: configurable thinking modes that emit internal reasoning before a final answer, and native function calling for agentic workflows.

Architecturally, Gemma 4 uses a hybrid attention mechanism interleaving local sliding-window and full global attention, with unified Keys and Values in global layers and Proportional RoPE to aid long-context performance. It sits alongside the latency-focused Google Gemma 4 26B A4B Instruct, a Mixture-of-Experts sibling that activates only a subset of its parameters per token for faster inference, whereas the 31B Dense model keeps all parameters active for quality.

Both pre-trained and instruction-tuned variants are released as open weights, and the model cards note that Gemma 4 underwent safety evaluations and sensitive-data filtering during training.

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Data sources: Venice API · HuggingFace · Wikipedia — enrichment updated 1d ago