Z.aiZ.ai·💬 Text Generation

GLM 5V Turbo

ReasoningVisionCodeFunction CallingWeb Searchanonymized
🧠 Try in Intelligence →Try on Venice.ai ↗
Quick reference
GLM 5V Turbo — TLDR
  • 🆕 Z.ai's first native multimodal agent foundation model.
  • 👁️ Processes image, video, and text inputs natively.
  • 🔧 Built for vision-based coding and agentic workflows.
  • 🏢 Integrates with Claude Code and OpenClaw agents.
  • 📏 200K-token context window per Z.ai docs.
  • 🧠 Uses a vision encoder Z.ai calls CogViT.
  • 🎯 Targets design-to-code, GUI agents, and screenshot debugging.
  • ⚡ Includes function calling and web search capabilities.
💰 Pricing
$1.50 / $5.00
per 1M · input / output
📏 Context
200K tokens
📅 On Venice since
Apr 1, 2026
63 days ago
Provider

Z.ai, formally Knowledge Atlas Technology Joint Stock Co., Ltd., is a Chinese technology company specializing in artificial intelligence. Previously known internationally as Zhipu AI, the company rebranded to Z.ai in 2025. Its core focus is the GLM family of…

Read full profile →
11 models on Venice
10 text · 1 image
Since Apr 1, 2024

About this model

GLM 5V Turbo, released April 2026, is Z.ai's first native multimodal agent foundation model, built specifically for vision-based coding and agent-driven tasks. Where many vision-language models bolt image understanding onto a text-first system, Z.ai's documentation states GLM 5V Turbo fuses visual and textual reasoning from pretraining through post-training, using a vision encoder the company calls CogViT. It natively ingests images, video, and text, then operates across a perceive–plan–execute loop, working with agent frameworks like Claude Code and OpenClaw.

The clearest comparison is to its same-family predecessor GLM 5 Turbo, which in this catalog is a text model. GLM 5V Turbo inherits that agentic foundation and adds the CogViT vision encoder plus multimodal training, making it a vision-language model rather than a pure LLM. Per Z.ai's documentation, it can take visual inputs such as screenshots and screen recordings and generate corresponding code or structured actions.

It carries a 200K-token context window per Z.ai's documentation, with a large maximum output, and supports an expanded multimodal toolchain including bounding-box drawing and webpage screenshot reading.

Within the broader GLM lineup, GLM 5V Turbo sits alongside text models like GLM 5, specialized for tasks that begin with visual input and end in code or action.

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Data sources: Venice API · HuggingFace · Wikipedia — enrichment updated 1d ago