Z.ai·💬 Text Generation·↑ Newer: GLM 5 Turbo

GLM 5V Turbo

ReasoningVisionCodeFunction CallingWeb Searchanonymized

🧠 Try in Intelligence →

Try on Venice.ai ↗

Quick reference

GLM 5V Turbo — TLDR

🆕 Z.ai's first native multimodal agent foundation model.
👁️ Natively handles image, video, and text inputs.
🔧 Built for vision-based coding and agent-driven tasks.
📏 Roughly 200K-token context window.
🌐 Includes multimodal tools like screenshots and webpage reading.
💬 Targeted at agent-driven engineering workflows.
🎯 Supports reasoning, function calling, and web search.
🏢 Released April 2026 by Chinese lab Z.ai (formerly Zhipu AI).

💰 Pricing

$1.50 / $5.00

per 1M · input / output

📏 Context

200K tokens

📅 On Venice since

Apr 1, 2026

109 days ago

Provider

Z.ai

Z.ai, formally Knowledge Atlas Technology Joint Stock Co., Ltd., is a Chinese technology company specializing in artificial intelligence. Previously known internationally as Zhipu AI, the company rebranded to Z.ai in 2025. Its core focus is the GLM family of…

Read full profile →

12 models on Venice

11 text · 1 image

Since Apr 1, 2024

Wikipedia ↗Official site ↗

See 11 other models from Z.ai →

About this model

GLM-5V-Turbo is Z.ai's first native multimodal agent foundation model, built specifically for vision-based coding and agent-driven tasks. Unlike the text-only models in the GLM line, it natively processes image, video, and text inputs together rather than relying on intermediate text descriptions, and is aimed at agentic engineering workflows. According to the catalog, it carries roughly a 200K-token context window.

The key generational distinction is vision. Its same-family predecessor, GLM 5 Turbo, is a text-only model tuned for agent execution such as tool calling and long action chains; GLM-5V-Turbo inherits that agentic positioning and adds native multimodal perception. This lets it work directly from visual inputs—such as screenshots, design mockups, and document layouts—within coding and agent loops.

On the tooling side, Z.ai's documentation describes an expanded multimodal toolchain that includes capabilities like taking screenshots and reading webpages, supporting a perceive-then-act style of operation. The model also supports reasoning, function calling, and web search per the catalog capabilities.

Within the broader family—which includes the text-focused GLM 5 and the later GLM 5.2—GLM-5V-Turbo is positioned as the agent-first, vision-capable branch rather than a general-purpose text upgrade.

Sources

GLM-5V-Turbo - Overview - Z.AI DEVELOPER DOCUMENTdocs.z.ai ↗

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Data sources: Venice API · HuggingFace · Wikipedia — enrichment updated 4d ago