Not currently listed in Venice's public API catalog — last listed Jun 15, 2026. Delisted models may still respond to direct API calls.

Z.ai·💬 Text Generation

GLM 4.7 Flash

🧠 Try in Intelligence →

Try on Venice.ai ↗

Quick reference

GLM 4.7 Flash — TLDR

🧠 30B-A3B Mixture-of-Experts reasoning model, roughly 3B active parameters.
🔧 Optimized for agentic coding, tool use, and long-horizon planning.
📏 Context window near 200K tokens in this configuration.
🔒 Runs inside a Trusted Execution Environment with hardware attestation.
🆕 Uses interleaved/"preserved" thinking before each tool call.
🌐 Includes web search and reasoning capabilities; MIT-licensed.
🏢 Built by Z.ai (formerly Zhipu AI), released 2026.

💰 Pricing

—

📅 On Venice since

Apr 13, 2026

96 days ago

Provider

Z.ai

Z.ai, formally Knowledge Atlas Technology Joint Stock Co., Ltd., is a Chinese technology company specializing in artificial intelligence. Previously known internationally as Zhipu AI, the company rebranded to Z.ai in 2025. Its core focus is the GLM family of…

Read full profile →

12 models on Venice

11 text · 1 image

Since Apr 1, 2024

Wikipedia ↗Official site ↗

See 11 other models from Z.ai →

About this model

GLM 4.7 Flash is the efficiency-tier member of Z.ai's GLM 4.7 generation, built as a 30-billion-parameter Mixture-of-Experts model that activates only around 3 billion parameters per token, making it suited to lightweight and local deployment. This particular listing runs the model inside a Trusted Execution Environment, adding hardware attestation evidence so users can independently verify that inference happens in a sealed enclave — a privacy-and-verifiability feature layered on top of the standard weights. It supports a context window approaching 200K tokens and is tuned specifically for coding, tool collaboration, and long-horizon agentic workflows.

Compared with the larger flagship GLM 4.7, this Flash variant trades raw capacity for speed and deployability while staying within the same release family. Against earlier GLM generations such as GLM 4.6, Z.ai positions the 4.7 line around stronger agentic coding, repository-level understanding, and "interleaved thinking," where the model reasons sequentially before each action rather than planning everything upfront.

Within Z.ai's broader catalog, it sits alongside successors like GLM 5 and GLM 5.1, remaining the compact, agent-focused option for developers prioritizing efficiency and verifiable execution.

🤗View model card on HuggingFace ↗View source on GitHub ↗

Sources

zai-org/GLM-4.7-Flash · Hugging Facehuggingface.co ↗

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Research & Papers

Primary reference paper for this model family, sourced from the HuggingFace model card.

arXiv2508.06471Aug 2025

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models(2025)

GLM-4. 5 Team, :, Aohan Zeng et al.

We present GLM-4.5, an open-source Mixture-of-Experts (MoE) large language model with 355B total parameters and 32B activated parameters, featuring a hybrid reasoning method that supports both thinking and direct response modes. Through multi-stage training on 23T tokens and…

Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 39d ago