Remove Censorship from Your LLM
Local LLMs refuse a lot. Not because they can’t answer, but because they’ve been trained to avoid it. There’s a technique called abliteration that removes this refusal behavior from the model weights — without retraining. This is how you do it in under 5 minutes.
What is Abliteration?
Abliteration identifies the internal “refusal direction” in a model’s activations and removes it from the weight matrices that write to those activations. The result is a model that behaves almost identically to the original: similar intelligence and capabilities, with far fewer refusals.
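To make the idea concrete, here is a minimal pure-Python sketch of directional ablation. It is not Heretic's implementation. It assumes the refusal direction is estimated as the normalized difference between mean activations on harmful and harmless prompts, and that the direction is removed by orthogonalizing a weight matrix against it. All function names and the toy numbers are hypothetical.

```python
import math

def mean(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def unit(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def refusal_direction(harmful_acts, harmless_acts):
    """Normalized difference of mean activations (an assumed estimator)."""
    mh, mb = mean(harmful_acts), mean(harmless_acts)
    return unit([a - b for a, b in zip(mh, mb)])

def ablate(W, r):
    """Return W' = W - r (r^T W), so no output of W' has a component along r.

    W is a list of rows with shape (d_model, d_in); r has length d_model.
    """
    d_in = len(W[0])
    # r^T W: how strongly each input column writes along r
    rTW = [sum(r[i] * W[i][j] for i in range(len(r))) for j in range(d_in)]
    return [[W[i][j] - r[i] * rTW[j] for j in range(d_in)] for i in range(len(r))]

# Toy activations: harmful prompts differ from harmless ones along axis 0 only
harmful = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
harmless = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
r = refusal_direction(harmful, harmless)

W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
W_ablated = ablate(W, r)  # the row that writes along r is zeroed out
```

In a real model the same projection would be applied to every matrix that writes into the residual stream, and a tool may scale the projection differently per layer.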
Heretic is an open-source tool that does this automatically. Here’s how it performs on Gemma 3 12B:
| Model | Refusals | KL Divergence |
|---|---|---|
| google/gemma-3-12b-it (original) | 97/100 | 0 |
| mlabonne/gemma-3-12b-it-abliterated-v2 | 3/100 | 1.04 |
| huihui-ai/gemma-3-12b-it-abliterated | 3/100 | 0.45 |
| p-e-w/gemma-3-12b-it-heretic | 3/100 | 0.16 |
KL divergence measures how much the model’s output distribution changed from the original. Lower is better: the Heretic version matches the other abliterated models at 3/100 refusals while having the lowest KL divergence (0.16), so it stays closest to the original model’s behavior.
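For intuition, KL divergence between two discrete next-token distributions can be computed as below. This is a generic sketch of the formula, not the evaluation code behind the table. Which model plays the role of `p` and which prompts are used are assumptions here, and the two toy distributions are made up.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for two distributions over the same token set."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

original = [0.5, 0.5]    # hypothetical next-token probabilities, original model
modified = [0.25, 0.75]  # hypothetical probabilities after abliteration

same = kl_divergence(original, original)     # identical distributions give 0
shifted = kl_divergence(original, modified)  # grows as the distributions drift apart
```

A value of 0 means the modified model predicts exactly what the original does, which is why the unmodified model scores 0 in the table.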
Prerequisites
- Ollama installed (related post: “VS Code has now a Native Integration of Your Local Models!”)
- Enough RAM for the model size you pick:
| Quant | Size | Quality |
|---|---|---|
| Q8_0 | 12.5 GB | Best |
| Q6_K | 9.66 GB | Very good |
| Q5_K_M | 8.45 GB | Good |
| Q4_K_M | 7.3 GB | Decent |
| Q3_K_M | 6.01 GB | Minimal |
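A small helper can turn the table into a choice. The sketch below picks the largest quant whose file fits into the RAM you have free. The 2 GB headroom for context and runtime overhead is an assumption, not a measured figure, and `pick_quant` is a hypothetical name.

```python
# Sizes copied from the table above, ordered largest (best quality) first
QUANT_SIZES_GB = [
    ("Q8_0", 12.5),
    ("Q6_K", 9.66),
    ("Q5_K_M", 8.45),
    ("Q4_K_M", 7.3),
    ("Q3_K_M", 6.01),
]

def pick_quant(free_ram_gb, headroom_gb=2.0):
    """Return the largest quant that fits in free RAM minus headroom, or None."""
    budget = free_ram_gb - headroom_gb
    for name, size_gb in QUANT_SIZES_GB:
        if size_gb <= budget:
            return name
    return None
```

For example, `pick_quant(16)` selects `Q8_0`, while a machine with 11 GB free would fall back to `Q5_K_M` under this headroom assumption.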
Setup
The community already converted the Heretic-processed Gemma 3 12B to GGUF. Pull it directly into Ollama:
ollama pull hf.co/mradermacher/gemma-3-12b-it-heretic-GGUF:Q8_0
ollama run hf.co/mradermacher/gemma-3-12b-it-heretic-GGUF:Q8_0
Replace Q8_0 with your preferred quant from the table above.
The original hf.co/p-e-w/gemma-3-12b-it-heretic does not work with Ollama because it is not in GGUF format. Use the mradermacher link above.
Everything runs locally. No data leaves your machine after the initial download.
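Because the model is served locally, you can also query it from a script through Ollama's HTTP API on localhost. The sketch below assumes Ollama is running on its default port 11434 and that you pulled the Q8_0 tag. The helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "hf.co/mradermacher/gemma-3-12b-it-heretic-GGUF:Q8_0"

def build_request(prompt, model=MODEL):
    """Build a non-streaming generate request for the local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(prompt, model=MODEL):
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires a running Ollama server):
# print(generate("Explain KL divergence in one sentence."))
```

The request only ever targets localhost, so prompts and answers stay on your machine.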
Use Heretic with a Different Model
If you want to abliterate a different model or inspect the process:
pip install -U heretic-llm
HF_HUB_DISABLE_TELEMETRY=1 PYTORCH_ENABLE_MPS_FALLBACK=1 heretic Qwen/Qwen3.5-35B-A3B
Heretic benchmarks your hardware, runs the optimization, and saves the modified model. Expect 30–90 minutes. Afterwards, convert to GGUF with llama.cpp and import into Ollama:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && pip install -r requirements.txt
python convert_hf_to_gguf.py /path/to/heretic-output --outfile model-heretic.gguf
ollama create model-heretic -f - << 'EOF'
FROM /path/to/model-heretic.gguf
EOF
ollama run model-heretic
If you run out of memory, add --quantization bnb_4bit to the Heretic command.
Privacy
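Heretic pulls models from HuggingFace, whose client library collects anonymous telemetry by default. Disable it before running. The second variable is optional and only silences advisory warnings from the transformers library: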
export HF_HUB_DISABLE_TELEMETRY=1
export TRANSFORMERS_NO_ADVISORY_WARNINGS=1
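If you launch Heretic from a Python script instead of a shell, the same variables can be passed through the subprocess environment. This is a minimal sketch. `private_env` is a hypothetical helper, and the commented command simply mirrors the invocation shown earlier.

```python
import os

def private_env(base=None):
    """Copy an environment and add the two variables from the exports above."""
    env = dict(os.environ if base is None else base)
    env["HF_HUB_DISABLE_TELEMETRY"] = "1"           # turn off Hub telemetry
    env["TRANSFORMERS_NO_ADVISORY_WARNINGS"] = "1"  # quieter logs, optional
    return env

# Usage (assumes the heretic CLI is installed and on PATH):
# import subprocess
# subprocess.run(["heretic", "Qwen/Qwen3.5-35B-A3B"], env=private_env(), check=True)
```

Building a copy keeps the settings scoped to the child process and leaves your own shell environment untouched.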