Z.ai·💬 Text Generation

GLM 4.7

ReasoningFunction CallingWeb Searchfp4private

🧠 Try in Intelligence →

Try on Venice.ai ↗

Quick reference

GLM 4.7 — TLDR

🆕 Z.ai's December 2025 GLM family update, MIT-licensed and open-weight.
🧠 358-billion-parameter Mixture-of-Experts model tuned for coding and reasoning.
🔧 Adds Interleaved, Preserved, and Turn-level thinking for agentic workflows.
🎯 Vendor reports 42.8% on Humanity's Last Exam, +12.4 points over GLM-4.6.
📏 Large context window for long documents and extended codebases.
🌐 Multilingual coding plus native tool calling and web browsing.
⚡ Per-turn reasoning toggle trades depth for latency on simple tasks.

💰 Pricing

$0.550 / $2.65

per 1M · input / output

📏 Context

198K tokens

📅 On Venice since

Dec 24, 2025

162 days ago

Provider

Z.ai

Z.ai, formally Knowledge Atlas Technology Joint Stock Co., Ltd., is a Chinese technology company specializing in artificial intelligence. Previously known internationally as Zhipu AI, the company rebranded to Z.ai in 2025. Its core focus is the GLM family of…

Read full profile →

11 models on Venice

10 text · 1 image

Since Apr 1, 2024

Wikipedia ↗Official site ↗

See 10 other models from Z.ai →

About this model

GLM 4.7 is the late-2025 release in Z.ai's open-weight GLM line, a 358-billion-parameter Mixture-of-Experts model released under the MIT license and oriented toward real software-development workflows, multi-step reasoning, and agentic tool use. It arrived alongside a lighter sibling, GLM 4.7 Flash, and was later succeeded within the same family by GLM 5 and GLM 5.1.

Compared with its predecessor GLM 4.6, Z.ai positions GLM 4.7 as a clear step forward in engineering use: stronger multi-step tool calling, better terminal and multilingual coding, and more polished UI generation. On the provider's reported figures, it reaches 42.8% on Humanity's Last Exam, which the model card describes as a +12.4-point gain over GLM 4.6, and notes improved results on the τ²-Bench tool-invocation and BrowseComp web benchmarks.

The most distinctive change is a trio of reasoning controls. Interleaved Thinking applies reasoning before each response and tool call; Preserved Thinking retains that reasoning context across conversation turns; and Turn-level Thinking lets developers enable or disable deep reasoning per turn to balance accuracy against latency.

In practice, GLM 4.7 targets long-horizon, production-style tasks—end-to-end coding, terminal automation, and front-end generation—with an OpenAI-compatible API and downloadable weights on Hugging Face for local deployment via vLLM or SGLang.

🤗View model card on HuggingFace ↗View source on GitHub ↗

Sources

GLM-4.7 - Overview - Z.AI DEVELOPER DOCUMENTdocs.z.ai ↗

z-ai / glm4.7docs.api.nvidia.com ↗

zai-org/GLM-4.7 · Hugging Facehuggingface.co ↗

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Research & Papers

Primary reference paper for this model family, sourced from the HuggingFace model card.

arXiv2508.06471Aug 2025

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models(2025)

GLM-4. 5 Team, :, Aohan Zeng et al.

We present GLM-4.5, an open-source Mixture-of-Experts (MoE) large language model with 355B total parameters and 32B activated parameters, featuring a hybrid reasoning method that supports both thinking and direct response modes. Through multi-stage training on 23T tokens and…

Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 1d ago