Instructions to use osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-6bit-mlx with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-6bit-mlx with MLX:
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-6bit-mlx")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-6bit-mlx with Pi:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-6bit-mlx"
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-6bit-mlx"
        }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
- Hermes Agent
How to use osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-6bit-mlx with Hermes Agent:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-6bit-mlx"
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-6bit-mlx
Run Hermes
hermes
- MLX LM
How to use osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-6bit-mlx with MLX LM:
Generate or start a chat session
# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-6bit-mlx"
Run an OpenAI-compatible server
# Install MLX LM
uv tool install mlx-lm
# Start the server (listens on port 8080 by default)
mlx_lm.server --model "osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-6bit-mlx"
# Calling the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-6bit-mlx",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
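The same request can be issued from Python with only the standard library. This is a minimal sketch, assuming the server is running locally on port 8080 (the port used in the Pi and Hermes configurations above) and returns an OpenAI-style chat completion body. The helper names `build_chat_request` and `chat` are illustrative, not part of mlx-lm.

```python
import json
import urllib.request

MODEL_ID = "osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-6bit-mlx"
# Assumption: the local server listens on port 8080.
BASE_URL = "http://localhost:8080/v1"


def build_chat_request(prompt, model=MODEL_ID, base_url=BASE_URL, max_tokens=256):
    """Build (but do not send) a POST request for /chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-style response with choices[0].message.content.
    """
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Hello"))` should print the model's reply. Building the request separately from sending it makes the payload easy to inspect before any network call.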
Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-6bit-mlx
⚠️ THIS IS A TEXT-ONLY MODEL — NO VISION
The upstream abliteration pass stripped the vision tower. For vision-capable Qwen 3.6 27B Opus-Distill MLX quants, see our parallel repos at huggingface.co/osmapi (look for repos without -abliterated in the name).
6-bit affine MLX quantization of an abliterated Qwen 3.6 27B Claude-Opus reasoning distill, by the osmAPI team — "OpenRouter of India".
Standard 6-bit MLX quantization. Effectively lossless on most reasoning benchmarks vs BF16. The recommended pick when you have the RAM headroom.
⚡ TL;DR
| Disk size | ~20 GB |
| Effective BPW | 6.0 |
| Scheme | Affine 6-bit, group size 64 (mlx-lm default) |
| Recommended RAM | 32 GB Apple Silicon (M4 Pro 32 GB, M3/M2 Max base) |
| Vision | ❌ text-only (the upstream abliteration step stripped the ViT) |
| Made by | osmAPI — OpenRouter of India |
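As a rough cross-check of the disk-size figure, the arithmetic can be sketched as below. The 27e9 parameter count is taken from the model name, and the 16-bit per-group scale and bias are assumptions about how affine groups are stored. Real checkpoints also keep some tensors at other precisions, so treat the numbers as estimates only.

```python
def effective_bits_per_weight(bits, group_size, scale_bits=16, bias_bits=16):
    """Payload bits plus per-group scale and bias, amortized per weight."""
    return bits + (scale_bits + bias_bits) / group_size


def approx_size_gb(n_params, bits_per_weight):
    """Approximate size in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 27e9  # assumption: nominal parameter count from the model name

payload_only = approx_size_gb(N_PARAMS, 6.0)
with_overhead = approx_size_gb(N_PARAMS, effective_bits_per_weight(6, 64))
print(f"payload only: {payload_only:.2f} GB")
print(f"with per-group scale and bias: {with_overhead:.2f} GB")
```

Under these assumptions the 6.0 BPW in the table corresponds to the quantized payload alone, and per-group metadata adds about half a bit per weight on top of it.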
🧬 Lineage
Qwen/Qwen3.6-27B (Qwen Team — base pretrain)
│
▼
TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2 (TeichAI — Claude-Opus reasoning distill)
│
▼
abliterated (refusal-ablated) via OBLITERATUS v0.1.2 (multi-direction SVD, BF16)
│
▼
this repo — 6-bit affine, MLX format (osmAPI team — quantization)
Direct upstream links:
- 🏛️ Foundation: Qwen/Qwen3.6-27B
- 🎓 Reasoning distill: TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2
- 🔓 Abliteration tool: OBLITERATUS (multi-direction SVD, 6 directions, 3 refinement passes, λ=0.08)
- 🧮 Quantization tool: mlx-lm + mlx-optiq for OptiQ variants
📦 Use it
mlx-lm (recommended)
pip install mlx-lm
from mlx_lm import load, generate
model, tokenizer = load("osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-6bit-mlx")
prompt = "Explain the difference between SSM and softmax attention in three sentences."
out = generate(model, tokenizer, prompt=prompt, max_tokens=400)
print(out)
Chat template
messages = [
{"role": "system", "content": "You are a helpful, candid reasoning assistant."},
{"role": "user", "content": "Plan a 3-day Tokyo itinerary for a foodie."},
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
print(generate(model, tokenizer, prompt=prompt, max_tokens=600))
CLI
mlx_lm.generate --model osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-6bit-mlx --prompt "Hello" --max-tokens 256
🧪 Quantization details
- Source weights: BF16 abliterated checkpoint (28 shards, ~57 GB) derived from TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2 via OBLITERATUS multi-direction SVD ablation (preserves coherence; KL drift = 0.149 from base).
- Quantization scheme: Affine 6-bit, group size 64 (mlx-lm default).
- Group size: 64.
- Calibration corpus: mlx-lm calibration_v5 (~427 KB English text, used for OptiQ sensitivity ranking; uniform/affine variants do not require calibration).
- Sanity check: forward perplexity on held-out calibration text within 1–3% of the next-higher-precision sibling.
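To make "affine 6-bit, group size 64" concrete, here is a pure-Python sketch of the general scheme: each group of 64 weights gets its own scale and bias, and each weight is stored as a 6-bit code. It illustrates the idea only and is not MLX's kernel. The function names and the min/max choice of scale and bias are assumptions made for the example.

```python
import math


def quantize_group(weights, bits=6):
    """Affine-quantize one group: returns (codes, scale, bias)."""
    levels = (1 << bits) - 1           # 63 steps between min and max for 6-bit
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels or 1.0  # fall back to 1.0 for constant groups
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, bias):
    """Map integer codes back to approximate float weights."""
    return [c * scale + bias for c in codes]


def quantize(weights, bits=6, group_size=64):
    """Split a flat weight list into groups and quantize each one."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


# Toy data standing in for one row of a weight matrix.
weights = [math.sin(0.1 * i) for i in range(128)]
groups = quantize(weights)
restored = [x for codes, scale, bias in groups
            for x in dequantize_group(codes, scale, bias)]
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"groups: {len(groups)}, max abs error: {max_err:.5f}")
```

The round-trip error per weight is bounded by half the group's scale, which is why smaller groups (and more bits) track the original weights more closely at the cost of extra metadata.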
Architecture notes
The Qwen 3.6 27B family uses a hybrid attention stack: 3 GatedDeltaNet (linear-attention/SSM) layers followed by 1 full-softmax-attention layer, repeated 16× for 64 total layers, with a 5120 hidden size, 248K vocabulary, and 262K context. The SSM kernels lack a VJP path in MLX, so backward-pass-based quantization methods (DWQ, dynamic quant) cannot be applied here; OptiQ's forward-only sensitivity approach is the only calibration-aware option that works on this architecture. That's why the OptiQ variants exist.
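The layer layout can be sketched as a simple repeating pattern. This assumes 3 linear-attention layers per full-attention layer, which is the ratio that 16 repeats and 64 total layers imply. The labels `gated_deltanet` and `full_attention` are illustrative names, not identifiers from the model's config.

```python
def layer_pattern(n_blocks=16, linear_per_block=3):
    """Return the per-layer attention type for the hybrid stack."""
    block = ["gated_deltanet"] * linear_per_block + ["full_attention"]
    return block * n_blocks


layers = layer_pattern()
print(len(layers), "layers;", layers.count("full_attention"), "use full attention")
print(layers[:4])
```

Only one layer in four carries a softmax attention KV cache in this layout, which is the main reason long contexts stay affordable on this architecture.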
⚠️ Behavior caveats
- Text-only, no vision. The abliteration pipeline (OBLITERATUS) ran on the LM tower and stripped the ViT. For vision-capable quants of the same Opus-Distill v2 lineage, use our parallel non-abliterated repos at huggingface.co/osmapi (any repo without -abliterated in the name).
- This is an abliterated model: refusal directions were surgically removed from the parent. It will answer prompts the parent would refuse. Use responsibly and within applicable law.
- Quantization preserves abliteration: the refusal rate measured at BF16 (~35% from a 100% baseline) stays in that range across our quants.
🙏 Credits
| Quantization & release | osmAPI team — "OpenRouter of India" |
| Reasoning distill | TeichAI (Claude-Opus 4.5/4.6 high-reasoning datasets) |
| Foundation model | Qwen Team |
| Abliteration toolkit | OBLITERATUS by elder-plinius |
| Quant toolkit | mlx-lm, mlx-optiq |
📜 License
Apache-2.0, inherited from the foundation and distill upstream.
Need a hosted endpoint, custom quant, or larger-scale inference? osmAPI — multi-provider LLM routing for the Indian developer ecosystem.
⚡ 3.3–3.7× faster decoding with DFlash (lossless, MLX)
This MLX build supports lossless block-diffusion speculative decoding via DFlash in mlx_vlm — no requantization, no model changes. On an Apple M4 Max we measured 3.38× (8-bit) and 3.67× (bf16) decode speedups with byte-identical output; other MLX quants of this model should see a similar ~3×.
python3 -m mlx_vlm generate \
--model osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-6bit-mlx \
--draft-model z-lab/Qwen3.6-27B-DFlash --draft-kind dflash \
--prompt "Write a merge function for two sorted lists in Python." --max-tokens 256
- Requires mlx_vlm ≥ 0.5.0 and access to the gated drafter z-lab/Qwen3.6-27B-DFlash (one-click "Agree and access").
- Accelerates the text path only (vision is unaffected); adds ~3.9 GB for the drafter.
- Acceptance ≈ 8.95 tokens/round (block size 16); the target runs ~10× fewer forward passes.
- Full write-up & benchmarks: https://huggingface.co/blog/junafinity/block-diffusion-on-apple-silicon-with-3-7x-speedup · see also DFLASH_SPECULATIVE_DECODING.md.
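The relationship between acceptance length and target forward passes can be sketched with simple arithmetic. This assumes each speculative round costs exactly one target forward pass and emits the quoted 8.95 tokens on average; it ignores drafter cost and any per-round bookkeeping, so it is an order-of-magnitude estimate rather than a benchmark.

```python
import math


def target_passes(n_tokens, tokens_per_round):
    """Target forward passes needed if each round costs one pass."""
    return math.ceil(n_tokens / tokens_per_round)


N_TOKENS = 256              # matches --max-tokens in the command above
baseline = N_TOKENS         # one target pass per token without a drafter
speculative = target_passes(N_TOKENS, 8.95)
print(f"baseline: {baseline} passes, speculative: {speculative} passes")
print(f"reduction: {baseline / speculative:.1f}x")
```

Under these assumptions the reduction in target passes comes out at roughly 9×, the same order as the figure quoted above. The end-to-end decode speedup is smaller than the pass reduction because the drafter's own compute and the wider verification pass are not free.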