About this model
GLM 4.7 Flash Heretic is a community-modified version of GLM-4.7-Flash, a roughly 30-billion-parameter mixture-of-experts model that activates about 3 billion parameters per token for lightweight deployment. The "Heretic" suffix denotes abliteration — an automated decensoring technique from the open-source Heretic project — applied here by the contributor Olafangensan to strip refusal behavior while aiming to preserve the underlying model's capabilities.
The practical difference from the unmodified GLM-4.7-Flash is behavioral rather than architectural: the variant is designed to answer prompts without the typical "I cannot help with that" responses, targeting unfiltered dialogue and creative writing. It retains the base model's reasoning traces, function-calling, and tool-use support. Downstream quant makers have anecdotally observed that the decensoring process can shorten the model's reasoning blocks and "focus" outputs, though these are informal notes rather than measured results.
This catalog entry lists a 200K-token context window and FP8 quantization. It carries an MIT license and is distributed through Hugging Face for local use via tools such as vLLM, SGLang, or Ollama.
As a hobbyist-oriented release from the broader open community rather than a vendor flagship, there are no official benchmark figures specific to this Heretic variant; users should treat it as an experimental, uncensored derivative of the base GLM-4.7-Flash.
This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.
Research & Papers
Primary reference paper for this model family, sourced from the HuggingFace model card.
Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 1d ago