Z.ai · 💬 Text Generation

GLM 5

Reasoning · Code · Function Calling · Web Search · fp8 · private
Quick reference
GLM 5 — TLDR
  • 🏢 Z.ai's fifth-generation flagship, released February 2026 under MIT license.
  • 🧠 744B-parameter Mixture-of-Experts (MoE) model with 40B parameters active per token.
  • 📏 Roughly 200K-token context for long documents and codebases.
  • 🆕 Scales up from GLM-4.5's 355B params; 28.5T training tokens.
  • 🔧 Adds DeepSeek Sparse Attention to cut inference cost.
  • ⚡ Post-trained with new asynchronous "slime" RL infrastructure.
  • 🎯 Optimized for reasoning, coding, and agentic engineering tasks.
  • 🔒 FP8 quantized weights available alongside full precision.
💰 Pricing
$1.00 / $3.20
per 1M · input / output
📏 Context
198K tokens
📅 On Venice since
Feb 11, 2026
112 days ago
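
As a rough illustration of the listed rates, the sketch below estimates the cost of a single request at $1.00 per 1M input tokens and $3.20 per 1M output tokens. The function name and the example token counts are hypothetical, and the calculation assumes simple linear per-token billing with no caching discounts or minimum charges, which the listing does not confirm.

```python
from decimal import Decimal

# Listed Venice prices in USD per 1M tokens (taken from the pricing panel).
INPUT_USD_PER_MILLION = Decimal("1.00")
OUTPUT_USD_PER_MILLION = Decimal("3.20")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request, assuming linear per-token billing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) * INPUT_USD_PER_MILLION / TOKENS_PER_MILLION
    output_cost = Decimal(output_tokens) * OUTPUT_USD_PER_MILLION / TOKENS_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical request: 150,000 prompt tokens and 4,000 completion tokens.
    print(estimate_cost_usd(150_000, 4_000))
```

Decimal arithmetic is used here so that small per-request amounts are not distorted by binary floating-point rounding when summed over many requests.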
Provider

Z.ai, formally Knowledge Atlas Technology Joint Stock Co., Ltd., is a Chinese technology company specializing in artificial intelligence. Previously known internationally as Zhipu AI, the company rebranded to Z.ai in 2025. Its core focus is the GLM family of…

11 models on Venice
10 text · 1 image
Since Apr 1, 2024

About this model

GLM 5 is the fifth-generation large language model from Z.ai (formerly Zhipu AI), released on February 11, 2026, and distributed under the MIT license. It is a Mixture-of-Experts model with roughly 744 billion total parameters, of which about 40 billion are active per token, designed for advanced reasoning, code generation, function calling, and long-horizon agentic workflows. The model card and primary docs describe a context window of roughly 200K tokens (Venice lists 198K), and Z.ai publishes weights in both BF16 and FP8 formats.
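
To put the parameter counts in perspective, the following back-of-the-envelope sketch computes the share of parameters that is active and a weights-only storage estimate for each format. It assumes a nominal 2 bytes per parameter for BF16 and 1 byte for FP8, and it ignores quantization scale factors, any layers kept at higher precision, the KV cache, and activations, so real checkpoint sizes and memory needs will differ. The helper names are illustrative only.

```python
TOTAL_PARAMS = 744e9   # total parameters, as stated in the text
ACTIVE_PARAMS = 40e9   # active parameters, as stated in the text

# Nominal storage width per weight in bytes (an assumption of this sketch).
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters that participate in one forward step."""
    return active / total


def weight_footprint_gb(total_params: float, fmt: str) -> float:
    """Approximate weights-only size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return total_params * BYTES_PER_PARAM[fmt] / 1e9


if __name__ == "__main__":
    print(f"active share: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")
    for fmt in BYTES_PER_PARAM:
        print(f"{fmt}: ~{weight_footprint_gb(TOTAL_PARAMS, fmt):.0f} GB of weights")
```

Under these assumptions only about 5.4% of the parameters are active at once, while the full set of weights still has to be stored: roughly 1,488 GB in BF16 versus 744 GB in FP8.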

Compared with its predecessors in the GLM family, such as GLM 4.7 and the earlier GLM 4.6, GLM 5 represents a substantial scale-up. Z.ai's model card notes that the architecture grew from GLM-4.5's 355B parameters (32B active) to 744B (40B active), while pre-training data expanded from 23T to 28.5T tokens. A key architectural change is the adoption of DeepSeek Sparse Attention (DSA), which dynamically allocates attention to reduce training and inference cost while preserving long-context fidelity.
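
The general idea behind sparse attention can be sketched with a generic top-k variant: each query attends only to the k keys with the highest scores instead of to every position. This is not the DSA implementation and makes no claim about how GLM 5 selects tokens; the function names are hypothetical, and this toy version still scores every key with a full dot product, so it shows the selection step rather than any actual cost saving.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topk_sparse_attention(query, keys, values, k):
    """Attend over only the k highest-scoring keys for a single query vector.

    Returns the output vector and the sorted indices of the keys that were kept.
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [dot(query, key) * scale for key in keys]
    # Keep the k largest scores; every other position is skipped entirely.
    kept = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in kept])
    dim = len(values[0])
    output = [
        sum(w * values[i][d] for w, i in zip(weights, kept)) for d in range(dim)
    ]
    return output, sorted(kept)


if __name__ == "__main__":
    query = [1.0, 0.0]
    keys = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [-1.0, 0.0]]
    values = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [5.0, 5.0]]
    print(topk_sparse_attention(query, keys, values, k=2))
```

Setting `k` to the number of keys recovers ordinary dense attention, which makes the sparse and dense outputs easy to compare on small inputs.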

On the training side, GLM 5 uses a new asynchronous reinforcement-learning framework called "slime" that decouples generation from training to improve post-training efficiency. According to the GLM-5 technical report, the model "significantly outperforms GLM-4.7 across frontend, backend, and long-horizon tasks," with mid-training progressively extending context from 4K to 200K tokens.
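
The decoupling of generation from training can be pictured as a producer and a consumer connected by a buffer. The sketch below is a toy illustration using Python threads and a bounded queue; it does not use or reproduce the slime framework's API, and the rollout contents, function names, and buffer size are all hypothetical.

```python
import queue
import threading


def rollout_worker(out_queue, num_rollouts):
    """Stand-in for generation: emits fake rollouts without waiting on training."""
    for step in range(num_rollouts):
        out_queue.put({"rollout_id": step, "reward": float(step % 3)})
    out_queue.put(None)  # sentinel: no more rollouts will arrive


def trainer(in_queue, batch_size):
    """Stand-in for training: groups rollouts into batches as they arrive."""
    batches, batch = [], []
    while True:
        item = in_queue.get()
        if item is None:
            break
        batch.append(item)
        if len(batch) == batch_size:
            batches.append(batch)  # a real trainer would take a gradient step here
            batch = []
    if batch:
        batches.append(batch)  # final partial batch
    return batches


def run(num_rollouts=10, batch_size=4):
    """Run generation in a background thread while training consumes its output."""
    buffer = queue.Queue(maxsize=8)  # bounded buffer between the two sides
    producer = threading.Thread(target=rollout_worker, args=(buffer, num_rollouts))
    producer.start()
    batches = trainer(buffer, batch_size)
    producer.join()
    return batches


if __name__ == "__main__":
    for index, batch in enumerate(run()):
        print(index, [item["rollout_id"] for item in batch])
```

The bounded queue lets generation run ahead of training by a limited amount, which is one simple way to keep both sides busy without letting the buffered rollouts grow without limit.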

GLM 5 anchors a broad family that also includes the faster GLM 5 Turbo variant and the later GLM 5.1 update, positioning it as Z.ai's open-weight foundation for agentic engineering.

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Research & Papers

Primary reference paper for this model family, sourced from the HuggingFace model card.

Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 1d ago