Z.aiZ.ai·💬 Text Generation

GLM 4.7 Flash

ReasoningCodeWeb SearchE2EEprivate
🧠 Try in Intelligence →Try on Venice.ai ↗
Quick reference
GLM 4.7 Flash — TLDR
  • 🧠 30B-A3B Mixture-of-Experts reasoning model, roughly 3B active parameters.
  • 🔧 Optimized for agentic coding, tool use, and long-horizon planning.
  • 📏 Context window near 200K tokens in this configuration.
  • 🔒 Runs inside a Trusted Execution Environment with hardware attestation.
  • 🆕 Uses interleaved/"preserved" thinking before each tool call.
  • 🌐 Includes web search and reasoning capabilities; MIT-licensed.
  • 🏢 Built by Z.ai (formerly Zhipu AI), released 2026.
💰 Pricing
$0.130 / $0.550
per 1M · input / output
📏 Context
198K tokens
📅 On Venice since
Mar 18, 2026
78 days ago
Provider

Z.ai, formally Knowledge Atlas Technology Joint Stock Co., Ltd., is a Chinese technology company specializing in artificial intelligence. Previously known internationally as Zhipu AI, the company rebranded to Z.ai in 2025. Its core focus is the GLM family of…

Read full profile →
11 models on Venice
10 text · 1 image
Since Apr 1, 2024

About this model

GLM 4.7 Flash is the efficiency-tier member of Z.ai's GLM 4.7 generation, built as a 30-billion-parameter Mixture-of-Experts model that activates only around 3 billion parameters per token, making it suited to lightweight and local deployment. This particular listing runs the model inside a Trusted Execution Environment, adding hardware attestation evidence so users can independently verify that inference happens in a sealed enclave — a privacy-and-verifiability feature layered on top of the standard weights. It supports a context window approaching 200K tokens and is tuned specifically for coding, tool collaboration, and long-horizon agentic workflows.

Compared with the larger flagship GLM 4.7, this Flash variant trades raw capacity for speed and deployability while staying within the same release family. Against earlier GLM generations such as GLM 4.6, Z.ai positions the 4.7 line around stronger agentic coding, repository-level understanding, and "interleaved thinking," where the model reasons sequentially before each action rather than planning everything upfront.

Within Z.ai's broader catalog, it sits alongside successors like GLM 5 and GLM 5.1, remaining the compact, agent-focused option for developers prioritizing efficiency and verifiable execution.

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Research & Papers

Primary reference paper for this model family, sourced from the HuggingFace model card.

Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 1d ago