👥Community·💬 Text Generation

GLM 4.7 Flash Heretic

ReasoningFunction CallingWeb Searchfp8private

🧠 Try in Intelligence →

Try on Venice.ai ↗

Quick reference

GLM 4.7 Flash Heretic — TLDR

🆕 Uncensored "abliterated" variant of the GLM-4.7-Flash base model.
🧠 Built on a 30B-A3B mixture-of-experts base (~3B active).
🔧 Decensored via the Heretic method, credited to Olafangensan.
📏 Catalog context window of 200K tokens; FP8 quantized.
⚡ Tuned for fast inference and unfiltered creative writing.
🎯 Supports reasoning, function calling, and web search.
🔒 MIT-licensed community release, distributed via Hugging Face.
📚 Roughly 3,600 downloads on Hugging Face at writing.

💰 Pricing

$0.070 / $0.400

per 1M · input / output

📏 Context

200K tokens

📅 On Venice since

Feb 4, 2026

165 days ago

Provider

👥Community

Community represents the broader ecosystem of independent creators, fine-tuners, and open-source contributors who build and share models outside any single corporate lab. Rather than a formal organization, this category collects specialized models developed…

Read full profile →

6 models on Venice

4 image · 1 text · 1 video

Since Jan 11, 2025

See 5 other models from Community →

About this model

GLM 4.7 Flash Heretic is a community-modified version of GLM-4.7-Flash, a roughly 30-billion-parameter mixture-of-experts model that activates about 3 billion parameters per token for lightweight deployment. The "Heretic" suffix denotes abliteration — an automated decensoring technique from the open-source Heretic project — applied here by the contributor Olafangensan to strip refusal behavior while aiming to preserve the underlying model's capabilities.

The practical difference from the unmodified GLM-4.7-Flash is behavioral rather than architectural: the variant is designed to answer prompts without the typical "I cannot help with that" responses, targeting unfiltered dialogue and creative writing. It retains the base model's reasoning traces, function-calling, and tool-use support. Downstream quant makers have anecdotally observed that the decensoring process can shorten the model's reasoning blocks and "focus" outputs, though these are informal notes rather than measured results.

This catalog entry lists a 200K-token context window and FP8 quantization. It carries an MIT license and is distributed through Hugging Face for local use via tools such as vLLM, SGLang, or Ollama.

As a hobbyist-oriented release from the broader open community rather than a vendor flagship, there are no official benchmark figures specific to this Heretic variant; users should treat it as an experimental, uncensored derivative of the base GLM-4.7-Flash.

🤗View model card on HuggingFace ↗View source on GitHub ↗

Sources

DavidAU/GLM-4.7-Flash-Grande-Heretic-UNCENSORED-42B-A3B-GGUF · Hugging Facehuggingface.co ↗

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Research & Papers

Primary reference paper for this model family, sourced from the HuggingFace model card.

arXiv2508.06471Aug 2025

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models(2025)

GLM-4. 5 Team, :, Aohan Zeng et al.

We present GLM-4.5, an open-source Mixture-of-Experts (MoE) large language model with 355B total parameters and 32B activated parameters, featuring a hybrid reasoning method that supports both thinking and direct response modes. Through multi-stage training on 23T tokens and…

Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 4d ago