Qwen3.5-122B-A10B-abliterated-v2-GGUF

GGUF quantizations of wangzhang/Qwen3.5-122B-A10B-abliterated-v2, the Prometheus-abliterated version of Alibaba's Qwen3.5-122B-A10B.

Why this model?

Not all abliterations are equal. The wangzhang Prometheus method is significantly more sophisticated than standard abliteration approaches (like huihui or Chompa1422), using:

  • Per-layer direction optimization: each transformer layer gets its own optimal refusal direction
  • MoE-aware expert steering: suppresses safety-critical expert router weights and modifies expert down_proj matrices
  • Bayesian hyperparameter search (Optuna TPE) with LLM-judge evaluation
  • KL divergence of just 0.0115 from the original model: virtually identical outputs on normal queries
  • 97% refusal reduction (199/200 → 6/200)

This preserves significantly more of the base model's capabilities than basic single-direction abliteration methods do.
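
For intuition, basic single-direction abliteration projects a single refusal direction out of the model's weights, while the Prometheus method optimizes a separate direction per layer. A minimal numpy sketch of the single-direction projection step (all names, shapes, and values here are illustrative, not the actual method or model weights):

```python
import numpy as np

def ablate_direction(W: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Remove the component of W's outputs along direction v.

    W: (d_out, d_in) weight matrix; v: (d_out,) refusal direction.
    Computes W' = (I - v v^T) W with v normalized, so W' x has no
    component along v for any input x.
    """
    v = v / np.linalg.norm(v)
    return W - np.outer(v, v @ W)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))   # toy weight matrix
v = rng.normal(size=8)        # toy "refusal direction"
W_abl = ablate_direction(W, v)

x = rng.normal(size=4)
# The ablated output is orthogonal to the normalized refusal direction.
proj = np.dot(v / np.linalg.norm(v), W_abl @ x)
print(abs(proj) < 1e-9)
```

Per-layer optimization, as described above, would repeat this with a different `v` for each transformer layer instead of one shared direction.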

Available quantizations

| Quant  | Size  | Best for |
|--------|-------|----------|
| Q4_K_M | 74 GB | 128GB systems (ASUS Ascent GX10, DGX Spark, Mac Studio) - recommended |

More quantizations (Q5_K_M, Q6_K, Q8_0, F16) coming soon.

Base model details

  • Architecture: Hybrid Gated DeltaNet + Sparse MoE (122B total, 10B active per token, 256 experts)
  • Context: 262K native (extendable to 1M via YaRN)
  • Languages: 201
  • Capabilities: Multimodal (vision-language), reasoning, coding, agentic workflows
  • License: Apache 2.0
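
As a quick worked example of what "262K native, extendable to 1M via YaRN" implies, RoPE-scaling methods like YaRN stretch position indices by the ratio of target to native context length (the actual YaRN configuration for this model may differ; the lengths below are just the card's figures):

```python
# Context lengths from the model card.
native_ctx = 262_144    # 262K tokens (native)
target_ctx = 1_048_576  # 1M tokens (extended)

# YaRN-style RoPE scaling needs roughly this stretch factor.
scale = target_ctx / native_ctx
print(scale)  # 4.0
```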

How to run

llama.cpp

./llama-server \
  -m Qwen3.5-122B-A10B-abliterated-v2-Q4_K_M.gguf \
  --ctx-size 262144 \
  -ngl 999 \
  --port 8000 \
  --host 0.0.0.0
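
Once llama-server is running, it exposes an OpenAI-compatible API on the configured port. A minimal client sketch using only the Python standard library (the model name and prompt are placeholders, and `chat` is only defined here, not called, since it needs a live server):

```python
import json
import urllib.request

def build_payload(prompt: str) -> dict:
    """Assemble an OpenAI-style chat completion request body."""
    return {
        "model": "qwen3.5-122b-abliterated",  # placeholder name
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def chat(prompt: str, base_url: str = "http://localhost:8000") -> str:
    """Send one chat turn to llama-server's /v1/chat/completions endpoint."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```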

Ollama

cat > Modelfile << 'MODELFILE'
FROM Qwen3.5-122B-A10B-abliterated-v2-Q4_K_M.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.95
PARAMETER top_k 20
MODELFILE
ollama create qwen35-122b-abl -f Modelfile
ollama run qwen35-122b-abl

Hardware requirements

| Quant          | Min memory           | Recommended context |
|----------------|----------------------|---------------------|
| Q4_K_M (74 GB) | 128GB unified/system | Up to 262K (hybrid DeltaNet keeps the KV cache small) |

Tested and running on an ASUS Ascent GX10 (NVIDIA GB10 Grace Blackwell, 128GB unified LPDDR5x) with full 262K context.
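
A back-of-envelope check on why the Q4_K_M file fits a 128GB machine: Q4_K_M averages roughly 4.85 bits per weight (an approximate figure; the actual bits-per-weight varies by tensor and model), which over 122B parameters lands near the 74 GB file size listed above:

```python
params = 122e9          # total parameters (model card figure)
bits_per_weight = 4.85  # rough Q4_K_M average; assumption, varies per tensor

# Model weights only; runtime adds KV cache and activation overhead.
size_gb = params * bits_per_weight / 8 / 1e9
print(round(size_gb))  # 74
```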

How this was made

  1. Downloaded wangzhang/Qwen3.5-122B-A10B-abliterated-v2 (BF16 safetensors)
  2. Converted to F16 GGUF using convert_hf_to_gguf.py
  3. Quantized to Q4_K_M using llama-quantize
  4. Hardware: ASUS Ascent GX10
  5. llama.cpp build: 8338 (9789c4ecd)
