
GLM 5

Reasoning · Code · Function Calling · Web Search · fp8 · private


Quick reference

GLM 5 — TLDR

🆕 Z.ai's February 2026 flagship for agentic engineering and reasoning.
📏 744B-parameter MoE, 40B active, scaled up from GLM-4.5.
🧠 Trained on 28.5T tokens with new asynchronous RL infrastructure.
🔧 Adds DeepSeek Sparse Attention to cut training and inference cost.
📚 Large context window (catalog: 198K tokens), FP8 weights available.
🎯 Capabilities: reasoning, code-optimization, function calling, web search.
🔒 Released open-weight under the permissive MIT license.
💬 Vendor reports gains over GLM-4.7 across reasoning, coding, agentic tasks.

💰 Pricing

$1.00 / $3.20

per 1M · input / output
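
As a rough illustration of what the listed rates mean per request, the sketch below applies $1.00 per 1M input tokens and $3.20 per 1M output tokens to a pair of token counts. The function name and the example token counts are hypothetical, and the sketch ignores any caching discounts or other billing rules, which this page does not describe.

```python
# Rates copied from the pricing card (USD per 1M tokens).
INPUT_USD_PER_M = 1.00
OUTPUT_USD_PER_M = 3.20


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the listed per-token rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Hypothetical request: 50,000 prompt tokens and 2,000 completion tokens.
print(f"${estimate_cost_usd(50_000, 2_000):.4f}")
```

Because output tokens cost 3.2 times as much as input tokens, long generations dominate the bill faster than long prompts do.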

📏 Context

198K tokens

📅 On Venice since

Feb 11, 2026

158 days ago

Provider

Z.ai

Z.ai, formally Knowledge Atlas Technology Joint Stock Co., Ltd., is a Chinese technology company specializing in artificial intelligence. Previously known internationally as Zhipu AI, the company rebranded to Z.ai in 2025. Its core focus is the GLM family of…


12 models on Venice

11 text · 1 image

Since Apr 1, 2024


About this model

GLM 5 is the February 2026 flagship from Z.ai (formerly Zhipu AI), positioned for complex systems engineering and long-horizon agentic work. Architecturally it is a Mixture-of-Experts model with 744 billion total parameters and roughly 40 billion active per token, scaled up from GLM-4.5's 355B (32B active), with pre-training data expanded to 28.5 trillion tokens. It is distributed open-weight under the MIT license in both full-precision and FP8 formats.
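
The parameter counts above can be turned into two quick back-of-the-envelope figures: the share of parameters active per token, and the approximate size of the weights alone. This is a minimal sketch with hypothetical helper names. It assumes 1 byte per parameter for FP8 and, as an assumption not stated on this page, 2 bytes per parameter for the full-precision release. It ignores activations, KV cache, and any tensors stored at a different precision.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of an MoE model's parameters that are used for each token."""
    return active_params_b / total_params_b


def weight_size_gb(total_params_b: float, bytes_per_param: float) -> float:
    """Rough size of the weights alone in GB (1 GB = 1e9 bytes).

    Billions of parameters times bytes per parameter gives gigabytes directly.
    """
    return total_params_b * bytes_per_param


print(f"GLM 5 active share:   {active_fraction(744, 40):.1%}")
print(f"GLM-4.5 active share: {active_fraction(355, 32):.1%}")
print(f"GLM 5 weights at 1 byte/param (FP8): ~{weight_size_gb(744, 1):.0f} GB")
# 2 bytes/param is an assumption about the full-precision format.
print(f"GLM 5 weights at 2 bytes/param:      ~{weight_size_gb(744, 2):.0f} GB")
```

On these figures GLM 5 activates about 5.4% of its parameters per token, compared with about 9.0% for GLM-4.5, so the model grew in total size while becoming sparser per token.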

The two headline changes over earlier generations are efficiency-focused. GLM 5 adopts DeepSeek Sparse Attention (DSA), which the technical report describes as dynamically allocating attention by token importance to lower compute without compromising long-context understanding. DSA changes the attention mechanism rather than the expert routing, so it sits on top of the MoE design carried over from GLM-4.5. Post-training uses a new asynchronous reinforcement-learning infrastructure built on the "slime" framework, which decouples generation from training to improve GPU utilization.
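
To make "allocating attention by token importance" concrete, here is a toy top-k sparse attention for a single query in plain Python. It is not the DSA algorithm from the technical report: the function name, the scoring rule, and the example vectors are all assumptions chosen for illustration. In particular, this toy ranks keys by the full attention score, whereas a practical design needs a cheaper scoring step, since otherwise nothing is saved.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topk_sparse_attention(query, keys, values, k):
    """Attend over only the k highest-scoring keys for one query (toy version)."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [dot(query, key) * scale for key in keys]
    # Keep the indices of the k largest scores; every other token is skipped.
    keep = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in keep])
    dim = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, keep)) for d in range(dim)]
    return out, sorted(keep)


# Hypothetical 2-dimensional keys and 1-dimensional values.
keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
out, kept = topk_sparse_attention([1.0, 0.0], keys, values, k=2)
print(kept)  # [0, 2]: the two keys most aligned with the query
```

With `k` equal to the number of keys the function reduces to ordinary softmax attention, so `k` is the knob that trades compute for coverage of the context.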

Relative to its same-family predecessor GLM-4.7, Z.ai reports significant improvements across academic benchmarks in reasoning, coding, and agentic tasks.

GLM 5 was followed by refreshed siblings GLM 5.1 and GLM 5.2, the latter extending to a roughly 1M-token context with the IndexShare architecture.


Sources

GLM-5 - Overview - Z.AI DEVELOPER DOCUMENT (docs.z.ai)

GLM-5: from Vibe Coding to Agentic Engineering (arxiv.org)

zai-org/GLM-5 · Hugging Face (huggingface.co)

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Research & Papers

Primary reference paper for this model family, sourced from the HuggingFace model card.

arXiv 2602.15763 · Feb 2026

GLM-5: from Vibe Coding to Agentic Engineering (2026)

GLM-5-Team: Aohan Zeng et al.

We present GLM-5, a next-generation foundation model designed to transition the paradigm of vibe coding to agentic engineering. Building upon the agentic, reasoning, and coding (ARC) capabilities of its predecessor, GLM-5 adopts DSA to significantly reduce training and inference…

Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 4d ago