Qwen3.5-122B-A10B-abliterated-v2-GGUF

GGUF quantizations of wangzhang/Qwen3.5-122B-A10B-abliterated-v2, the Prometheus-abliterated version of Alibaba's Qwen3.5-122B-A10B.

Why this model?

Not all abliterations are equal. The wangzhang Prometheus method is significantly more sophisticated than standard abliteration approaches (like huihui or Chompa1422), using:

  • Per-layer direction optimization: each transformer layer gets its own optimal refusal direction
  • MoE-aware expert steering: suppresses safety-critical expert router weights and modifies expert down_proj matrices
  • Bayesian hyperparameter search (Optuna TPE) with LLM-judge evaluation
  • KL divergence of just 0.0115 from the original model: virtually identical outputs on normal queries
  • 97% refusal reduction (199/200 → 6/200)

This preserves significantly more of the base model's capabilities than basic single-direction abliteration methods do.
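
For intuition, basic single-direction abliteration projects a single refusal direction out of the model's weights, while the Prometheus method optimizes a separate direction per layer. A minimal numpy sketch of the single-direction projection step (all names, shapes, and values here are illustrative, not the actual method or model weights):

```python
import numpy as np

def ablate_direction(W: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Remove the component of W's outputs along direction v.

    W: (d_out, d_in) weight matrix; v: (d_out,) refusal direction.
    Computes W' = (I - v v^T) W with v normalized, so W' x has no
    component along v for any input x.
    """
    v = v / np.linalg.norm(v)
    return W - np.outer(v, v @ W)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))   # toy weight matrix
v = rng.normal(size=8)        # toy "refusal direction"
W_abl = ablate_direction(W, v)

x = rng.normal(size=4)
# The ablated output is orthogonal to the normalized refusal direction.
proj = np.dot(v / np.linalg.norm(v), W_abl @ x)
print(abs(proj) < 1e-9)
```

Per-layer optimization, as described above, would repeat this with a different `v` for each transformer layer instead of one shared direction.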

Available quantizations

| Quant  | Size  | Best for |
|--------|-------|----------|
| Q4_K_M | 74 GB | 128GB systems (ASUS Ascent GX10, DGX Spark, Mac Studio) - recommended |

More quantizations (Q5_K_M, Q6_K, Q8_0, F16) coming soon.

Base model details

  • Architecture: Hybrid Gated DeltaNet + Sparse MoE (122B total, 10B active per token, 256 experts)
  • Context: 262K native (extendable to 1M via YaRN)
  • Languages: 201
  • Capabilities: Multimodal (vision-language), reasoning, coding, agentic workflows
  • License: Apache 2.0
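
As a quick worked example of what "262K native, extendable to 1M via YaRN" implies, RoPE-scaling methods like YaRN stretch position indices by the ratio of target to native context length (the actual YaRN configuration for this model may differ; the lengths below are just the card's figures):

```python
# Context lengths from the model card.
native_ctx = 262_144    # 262K tokens (native)
target_ctx = 1_048_576  # 1M tokens (extended)

# YaRN-style RoPE scaling needs roughly this stretch factor.
scale = target_ctx / native_ctx
print(scale)  # 4.0
```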

How to run

llama.cpp

./llama-server \
  -m Qwen3.5-122B-A10B-abliterated-v2-Q4_K_M.gguf \
  --ctx-size 262144 \
  -ngl 999 \
  --port 8000 \
  --host 0.0.0.0
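
Once llama-server is running, it exposes an OpenAI-compatible API on the configured port. A minimal client sketch using only the Python standard library (the model name and prompt are placeholders, and `chat` is only defined here, not called, since it needs a live server):

```python
import json
import urllib.request

def build_payload(prompt: str) -> dict:
    """Assemble an OpenAI-style chat completion request body."""
    return {
        "model": "qwen3.5-122b-abliterated",  # placeholder name
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def chat(prompt: str, base_url: str = "http://localhost:8000") -> str:
    """Send one chat turn to llama-server's /v1/chat/completions endpoint."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```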

Ollama

cat > Modelfile << 'MODELFILE'
FROM Qwen3.5-122B-A10B-abliterated-v2-Q4_K_M.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.95
PARAMETER top_k 20
MODELFILE
ollama create qwen35-122b-abl -f Modelfile
ollama run qwen35-122b-abl

Hardware requirements

| Quant          | Min memory           | Recommended context |
|----------------|----------------------|---------------------|
| Q4_K_M (74 GB) | 128GB unified/system | Up to 262K (hybrid DeltaNet keeps the KV cache small) |

Tested and running on an ASUS Ascent GX10 (NVIDIA GB10 Grace Blackwell, 128GB unified LPDDR5x) with full 262K context.
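
A back-of-envelope check on why the Q4_K_M file fits a 128GB machine: Q4_K_M averages roughly 4.85 bits per weight (an approximate figure; the actual bits-per-weight varies by tensor and model), which over 122B parameters lands near the 74 GB file size listed above:

```python
params = 122e9          # total parameters (model card figure)
bits_per_weight = 4.85  # rough Q4_K_M average; assumption, varies per tensor

# Model weights only; runtime adds KV cache and activation overhead.
size_gb = params * bits_per_weight / 8 / 1e9
print(round(size_gb))  # 74
```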

How this was made

  1. Downloaded wangzhang/Qwen3.5-122B-A10B-abliterated-v2 (BF16 safetensors)
  2. Converted to F16 GGUF using convert_hf_to_gguf.py
  3. Quantized to Q4_K_M using llama-quantize
  4. Hardware: ASUS Ascent GX10
  5. llama.cpp build: 8338 (9789c4ecd)
