About this model
GLM 4.7 Flash is the speed-optimized member of Z.ai's GLM-4.7 generation, positioned as a lightweight companion to the full GLM 4.7. It is a 30B-A3B Mixture-of-Experts model: about 30B total parameters, of which roughly 3B are active per token. The model ships open-weight under the MIT license with a 128K-token context window and fp8 quantization.
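As a rough illustration of what the 30B-A3B and fp8 figures imply, the sketch below works out the share of parameters active per token and an approximate size for the stored weights. It assumes the rounded numbers quoted above and one byte per fp8 weight; the function names are illustrative, and the result ignores the KV cache, activations, and any tensors kept at higher precision.

```python
# Back-of-the-envelope figures from the rounded numbers quoted above.
# These are approximations, not official specifications.

TOTAL_PARAMS = 30e9       # "30B": total parameters (rounded)
ACTIVE_PARAMS = 3e9       # "A3B": roughly 3B parameters active per token
BYTES_PER_PARAM_FP8 = 1   # fp8 stores one byte per weight


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters used for a single token."""
    return active / total


def weight_footprint_gb(params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


fraction = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
weights_gb = weight_footprint_gb(TOTAL_PARAMS, BYTES_PER_PARAM_FP8)

print(f"Active per token: {fraction:.0%}")
print(f"Approx. fp8 weight storage: {weights_gb:.0f} GB")
```

Note that all of the weights still have to be stored even though only about a tenth of them are used for any given token; the sparse activation reduces compute per token, not the size of the weights.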
Compared with the full GLM 4.7, Flash is tuned for faster, cheaper inference while inheriting the 4.7 generation's coding, tool-calling, and multi-step reasoning behaviors. Z.ai exposes reasoning, function-calling, and web-search capabilities for the model, making it suitable for agentic and tool-augmented workflows.
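To make the function-calling workflow concrete, here is a minimal sketch of the application side of a tool call: the model returns a tool name plus JSON-encoded arguments, and the caller looks up and runs the matching function. The call shape (a dict with "name" and "arguments"), the get_weather tool, and the simulated response are all assumptions for illustration; they are not taken from Z.ai's documentation, and no network request is made.

```python
import json


# Hypothetical tool; the name and behavior are illustrative only.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"


# Registry mapping tool names to local Python callables.
TOOLS = {"get_weather": get_weather}


def dispatch(tool_call: dict) -> str:
    """Run one tool call of the assumed shape
    {"name": str, "arguments": JSON-encoded string}."""
    name = tool_call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    kwargs = json.loads(tool_call["arguments"])
    return TOOLS[name](**kwargs)


# Simulated model output; a real response would come from the provider's API.
simulated_call = {"name": "get_weather", "arguments": '{"city": "Paris"}'}
result = dispatch(simulated_call)
print(result)
```

In an agentic loop, the string returned by dispatch would be sent back to the model as the tool result so it can continue reasoning with that information.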
Because it is tuned for throughput, Flash is best suited to high-volume, latency-sensitive, or background tasks, whereas the full GLM 4.7 remains the heavier option for the most demanding prompts.
Within the broader family, GLM 4.7 Flash follows earlier releases such as GLM 4.6 and sits alongside the larger GLM 5 line. It is distributed via Hugging Face, where the open weights and model card are published for self-hosting and deployment.
This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.
Research & Papers
Primary reference paper for this model family, sourced from the Hugging Face model card.
Data sources: Venice API · Hugging Face · Wikipedia · arXiv