👥Community·💬 Text Generation

GLM 4.7 Flash Heretic

ReasoningFunction CallingWeb Searchfp8private
🧠 Try in Intelligence →Try on Venice.ai ↗
Quick reference
GLM 4.7 Flash Heretic — TLDR
  • 🆕 Uncensored "abliterated" variant of the GLM-4.7-Flash base model.
  • 🧠 Built on a 30B-A3B mixture-of-experts base (~3B active).
  • 🔧 Decensored via the Heretic method, credited to Olafangensan.
  • 📏 Catalog context window of 200K tokens; FP8 quantized.
  • ⚡ Tuned for fast inference and unfiltered creative writing.
  • 🎯 Supports reasoning, function calling, and web search.
  • 🔒 MIT-licensed community release, distributed via Hugging Face.
  • 📚 Roughly 3,600 downloads on Hugging Face at writing.
💰 Pricing
$0.140 / $0.800
per 1M · input / output
📏 Context
200K tokens
📅 On Venice since
Feb 4, 2026
119 days ago
Provider

Community represents the broader ecosystem of independent creators, fine-tuners, and open-source contributors who build and share models outside any single corporate lab. Rather than a formal organization, this category collects specialized models developed…

Read full profile →
6 models on Venice
4 image · 1 text · 1 video
Since Jan 11, 2025

About this model

GLM 4.7 Flash Heretic is a community-modified version of GLM-4.7-Flash, a roughly 30-billion-parameter mixture-of-experts model that activates about 3 billion parameters per token for lightweight deployment. The "Heretic" suffix denotes abliteration — an automated decensoring technique from the open-source Heretic project — applied here by the contributor Olafangensan to strip refusal behavior while aiming to preserve the underlying model's capabilities.

The practical difference from the unmodified GLM-4.7-Flash is behavioral rather than architectural: the variant is designed to answer prompts without the typical "I cannot help with that" responses, targeting unfiltered dialogue and creative writing. It retains the base model's reasoning traces, function-calling, and tool-use support. Downstream quant makers have anecdotally observed that the decensoring process can shorten the model's reasoning blocks and "focus" outputs, though these are informal notes rather than measured results.

This catalog entry lists a 200K-token context window and FP8 quantization. It carries an MIT license and is distributed through Hugging Face for local use via tools such as vLLM, SGLang, or Ollama.

As a hobbyist-oriented release from the broader open community rather than a vendor flagship, there are no official benchmark figures specific to this Heretic variant; users should treat it as an experimental, uncensored derivative of the base GLM-4.7-Flash.

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Research & Papers

Primary reference paper for this model family, sourced from the HuggingFace model card.

Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 1d ago