Z.aiZ.ai·💬 Text Generation

GLM 4.7 Flash

ReasoningFunction CallingWeb Searchfp8private
🧠 Try in Intelligence →Try on Venice.ai ↗
Quick reference
GLM 4.7 Flash — TLDR
  • 🆕 Fast-inference variant of GLM-4.7, tuned for speed.
  • 🧠 30B-A3B Mixture-of-Experts with roughly 3B active parameters.
  • 📏 128K-token context window for long inputs.
  • 🔧 Reasoning, function-calling, and web search supported.
  • ⚡ Optimized for quick responses on latency-sensitive workloads.
  • 🔒 Open-weight under the MIT license, fp8 quantized.
  • 🏢 Built by Z.ai (formerly Zhipu AI), released January 2026.
💰 Pricing
$0.125 / $0.500
per 1M · input / output
📏 Context
128K tokens
📅 On Venice since
Jan 29, 2026
126 days ago
Provider

Z.ai, formally Knowledge Atlas Technology Joint Stock Co., Ltd., is a Chinese technology company specializing in artificial intelligence. Previously known internationally as Zhipu AI, the company rebranded to Z.ai in 2025. Its core focus is the GLM family of…

Read full profile →
11 models on Venice
10 text · 1 image
Since Apr 1, 2024

About this model

GLM 4.7 Flash is the speed-optimized member of Z.ai's GLM-4.7 generation, positioned as a lightweight companion to the full GLM 4.7. It is a 30B-A3B Mixture-of-Experts model, activating roughly 3B parameters per token. The model ships open-weight under the MIT license with a 128K-token context window and fp8 quantization.

Compared with its same-family parent, Flash is tuned for faster, cheaper inference while inheriting the 4.7 generation's coding, tool-calling, and multi-step reasoning behaviors. Z.ai exposes reasoning, function-calling, and web-search capabilities, making it suitable for agentic and tool-augmented workflows.

Because it is tuned for throughput, Flash is best suited to high-volume, latency-sensitive, or background tasks, whereas the full GLM 4.7 remains the heavier option for the most demanding prompts.

Within the broader family, GLM 4.7 Flash follows earlier releases such as GLM 4.6 and sits alongside the larger GLM 5 line. It is distributed via Hugging Face, where the open weights and model card are published for self-hosting and deployment.

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Research & Papers

Primary reference paper for this model family, sourced from the HuggingFace model card.

Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 1d ago