# Qwen3.5-122B-A10B-abliterated-v2-GGUF
GGUF quantizations of wangzhang/Qwen3.5-122B-A10B-abliterated-v2, the Prometheus-abliterated version of Alibaba's Qwen3.5-122B-A10B.
## Why this model?
Not all abliterations are equal. The wangzhang Prometheus method is significantly more sophisticated than standard abliteration approaches (like huihui or Chompa1422), using:
- Per-layer direction optimization – each transformer layer gets its own optimal refusal direction
- MoE-aware expert steering – suppresses safety-critical expert router weights and modifies expert down_proj matrices
- Bayesian hyperparameter search (Optuna TPE) with LLM-judge evaluation
- KL divergence of just 0.0115 from the original model – virtually identical outputs on normal queries
- 97% refusal reduction (199/200 → 6/200)
This preserves significantly more of the base model's capabilities compared to basic single-direction abliteration methods.
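At its core, an abliteration pass removes a learned "refusal direction" from a layer's weights via an orthogonal projection; the per-layer variant simply fits a separate direction for each layer. A minimal NumPy sketch of that projection (illustrative only; the actual Prometheus pipeline adds direction search, expert steering, and tuning on top of this):

```python
import numpy as np

def ablate_direction(W: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Remove the component of each row of W along direction v.

    For a unit vector v this computes W' = W - (W v) v^T, so that
    W' v = 0: the layer can no longer write along the refusal direction.
    """
    v = v / np.linalg.norm(v)          # normalize the refusal direction
    return W - np.outer(W @ v, v)      # project out the rank-1 component

# Toy example: a random "layer" and a random "refusal direction"
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
v = rng.standard_normal(16)

W_abl = ablate_direction(W, v)
print(np.abs(W_abl @ (v / np.linalg.norm(v))).max())  # ~0: direction removed
```

The low KL divergence cited above reflects that this is a rank-1 change: everything orthogonal to the refusal direction passes through untouched.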
## Available quantizations
| Quant | Size | Best for |
|---|---|---|
| Q4_K_M | 74 GB | 128GB systems (ASUS Ascent GX10, DGX Spark, Mac Studio) – recommended |
More quantizations (Q5_K_M, Q6_K, Q8_0, F16) coming soon.
## Base model details
- Architecture: Hybrid Gated DeltaNet + Sparse MoE (122B total, 10B active per token, 256 experts)
- Context: 262K native (extendable to 1M via YaRN)
- Languages: 201
- Capabilities: Multimodal (vision-language), reasoning, coding, agentic workflows
- License: Apache 2.0
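To go beyond the native 262K window, llama.cpp can apply YaRN RoPE scaling at load time. A hedged sketch (the flag values are illustrative; 1,048,576 / 262,144 gives a scale factor of 4):

```shell
# Illustrative: extend context toward 1M tokens via YaRN (scale = 1048576 / 262144 = 4)
./llama-server \
  -m Qwen3.5-122B-A10B-abliterated-v2-Q4_K_M.gguf \
  --ctx-size 1048576 \
  --rope-scaling yarn \
  --rope-scale 4 \
  --yarn-orig-ctx 262144
```

Memory use grows with the context you actually allocate, so only raise `--ctx-size` as far as your workload needs.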
## How to run

### llama.cpp

```bash
./llama-server \
  -m Qwen3.5-122B-A10B-abliterated-v2-Q4_K_M.gguf \
  --ctx-size 262144 \
  -ngl 999 \
  --port 8000 \
  --host 0.0.0.0
```
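Once running, llama-server speaks the OpenAI-compatible chat API. A minimal client sketch using only the Python standard library (the endpoint and port assume the server flags above; the prompt is just an example):

```python
import json
import urllib.request

def build_chat_request(prompt: str, temperature: float = 0.7) -> dict:
    """Build an OpenAI-style chat completion payload for llama-server."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": 256,
    }

payload = build_chat_request("Explain mixture-of-experts routing in two sentences.")
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment once llama-server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```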
### Ollama

```bash
cat > Modelfile << 'MODELFILE'
FROM Qwen3.5-122B-A10B-abliterated-v2-Q4_K_M.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.95
PARAMETER top_k 20
MODELFILE

ollama create qwen35-122b-abl -f Modelfile
ollama run qwen35-122b-abl
```
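Ollama also exposes an HTTP API on its default port 11434, which is handy for a quick smoke test of the freshly created model (model name matches the `ollama create` step above; the prompt is just an example):

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "qwen35-122b-abl",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```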
## Hardware requirements
| Quant | Min memory | Recommended context |
|---|---|---|
| Q4_K_M (74GB) | 128GB unified/system | Up to 262K (hybrid DeltaNet keeps KV cache small) |
Tested and running on an ASUS Ascent GX10 (NVIDIA GB10 Grace Blackwell, 128GB unified LPDDR5x) with full 262K context.
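The 74 GB figure follows directly from the quant's bits-per-weight. A back-of-the-envelope check (the 4.85 bits/weight average for Q4_K_M is an assumption here, reflecting its mix of 4-bit and 6-bit tensors):

```python
TOTAL_PARAMS = 122e9       # total parameters (MoE: all experts stored, 10B active)
BITS_PER_WEIGHT = 4.85     # assumed Q4_K_M average (mix of Q4_K and Q6_K tensors)

weight_bytes = TOTAL_PARAMS * BITS_PER_WEIGHT / 8
print(f"{weight_bytes / 1e9:.0f} GB")  # 74 GB, matching the table
```

The remainder of a 128GB system covers the KV cache, activations, and the OS, which is why the full 262K context still fits.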
## How this was made

- Downloaded wangzhang/Qwen3.5-122B-A10B-abliterated-v2 (BF16 safetensors)
- Converted to F16 GGUF using `convert_hf_to_gguf.py`
- Quantized to Q4_K_M using `llama-quantize`
- Hardware: ASUS Ascent GX10
- llama.cpp build: 8338 (9789c4ecd)
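The two conversion steps above correspond to the standard llama.cpp workflow; a sketch with placeholder paths and filenames (yours will differ):

```shell
# 1. Convert the BF16 safetensors checkpoint to an F16 GGUF
python convert_hf_to_gguf.py /path/to/Qwen3.5-122B-A10B-abliterated-v2 \
  --outtype f16 --outfile model-f16.gguf

# 2. Quantize F16 down to Q4_K_M
./llama-quantize model-f16.gguf \
  Qwen3.5-122B-A10B-abliterated-v2-Q4_K_M.gguf Q4_K_M
```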
## Acknowledgments
- Abliteration: wangzhang using the Prometheus framework
- Base model: Qwen Team / Alibaba
- Quantization tools: llama.cpp / ggml-org