Z.aiZ.ai·💬 Text Generation

GLM 4.7

ReasoningFunction CallingWeb Searchfp4private
🧠 Try in Intelligence →Try on Venice.ai ↗
Quick reference
GLM 4.7 — TLDR
  • 🆕 Z.ai's December 2025 GLM family update, MIT-licensed and open-weight.
  • 🧠 358-billion-parameter Mixture-of-Experts model tuned for coding and reasoning.
  • 🔧 Adds Interleaved, Preserved, and Turn-level thinking for agentic workflows.
  • 🎯 Vendor reports 42.8% on Humanity's Last Exam, +12.4 points over GLM-4.6.
  • 📏 Large context window for long documents and extended codebases.
  • 🌐 Multilingual coding plus native tool calling and web browsing.
  • ⚡ Per-turn reasoning toggle trades depth for latency on simple tasks.
💰 Pricing
$0.550 / $2.65
per 1M · input / output
📏 Context
198K tokens
📅 On Venice since
Dec 24, 2025
162 days ago
Provider

Z.ai, formally Knowledge Atlas Technology Joint Stock Co., Ltd., is a Chinese technology company specializing in artificial intelligence. Previously known internationally as Zhipu AI, the company rebranded to Z.ai in 2025. Its core focus is the GLM family of…

Read full profile →
11 models on Venice
10 text · 1 image
Since Apr 1, 2024

About this model

GLM 4.7 is the late-2025 release in Z.ai's open-weight GLM line, a 358-billion-parameter Mixture-of-Experts model released under the MIT license and oriented toward real software-development workflows, multi-step reasoning, and agentic tool use. It arrived alongside a lighter sibling, GLM 4.7 Flash, and was later succeeded within the same family by GLM 5 and GLM 5.1.

Compared with its predecessor GLM 4.6, Z.ai positions GLM 4.7 as a clear step forward in engineering use: stronger multi-step tool calling, better terminal and multilingual coding, and more polished UI generation. On the provider's reported figures, it reaches 42.8% on Humanity's Last Exam, which the model card describes as a +12.4-point gain over GLM 4.6, and notes improved results on the τ²-Bench tool-invocation and BrowseComp web benchmarks.

The most distinctive change is a trio of reasoning controls. Interleaved Thinking applies reasoning before each response and tool call; Preserved Thinking retains that reasoning context across conversation turns; and Turn-level Thinking lets developers enable or disable deep reasoning per turn to balance accuracy against latency.

In practice, GLM 4.7 targets long-horizon, production-style tasks—end-to-end coding, terminal automation, and front-end generation—with an OpenAI-compatible API and downloadable weights on Hugging Face for local deployment via vLLM or SGLang.

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Research & Papers

Primary reference paper for this model family, sourced from the HuggingFace model card.

Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 1d ago