Instructions for using osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-8bit-mlx with libraries and local apps.

Libraries

How to use osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-8bit-mlx with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-8bit-mlx")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

Local Apps

How to use osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-8bit-mlx with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-8bit-mlx"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-8bit-mlx"
        }
      ]
    }
  }
}
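
If `~/.pi/agent/models.json` already lists other providers, pasting the snippet above over the file would drop them. The following is a minimal sketch of merging the entry in memory instead. The helper name `add_mlx_provider` and the `other` provider are hypothetical and only there for illustration; the field names are copied from the config shown above.

```python
import json

MODEL_ID = "osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-8bit-mlx"


def add_mlx_provider(config: dict, model_id: str = MODEL_ID,
                     base_url: str = "http://localhost:8080/v1") -> dict:
    """Return a copy of `config` with the mlx-lm provider added or replaced."""
    merged = dict(config)
    providers = dict(merged.get("providers", {}))
    providers["mlx-lm"] = {
        "baseUrl": base_url,
        "api": "openai-completions",
        "apiKey": "none",
        "models": [{"id": model_id}],
    }
    merged["providers"] = providers
    return merged


# Hypothetical pre-existing config with one unrelated provider.
existing = {"providers": {"other": {"baseUrl": "https://example.invalid/v1"}}}
updated = add_mlx_provider(existing)
print(json.dumps(updated, indent=2))
```

Write the printed JSON back to the models file yourself once it looks right; the sketch deliberately does not touch the filesystem.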

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-8bit-mlx with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-8bit-mlx"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-8bit-mlx

Run Hermes

hermes

MLX LM

How to use osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-8bit-mlx with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-8bit-mlx"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-8bit-mlx"
# Call the OpenAI-compatible server with curl (from a second terminal; mlx_lm.server listens on port 8080 by default)
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-8bit-mlx",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'
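
The same request can be sent from Python with only the standard library. This is a sketch, not an official client: the helper names `build_chat_request` and `chat` are hypothetical, and it assumes the server is reachable on port 8080 (the port used in the Pi and Hermes configs above) and returns the usual OpenAI chat-completions response shape. Pass a different `base_url` if your server runs elsewhere.

```python
import json
import urllib.request

MODEL_ID = "osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-8bit-mlx"


def build_chat_request(prompt: str, base_url: str = "http://localhost:8080/v1",
                       max_tokens: int = 256) -> urllib.request.Request:
    """Build (but do not send) a POST request for /chat/completions."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, **kwargs) -> str:
    """Send the request and return the assistant text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Building the request does not touch the network, so this is safe to run anywhere.
request = build_chat_request("Hello")
print(request.get_method(), request.full_url)
```

With the server running, `print(chat("Hello"))` prints the model's reply.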

Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-8bit-mlx

⚠️ THIS IS A TEXT-ONLY MODEL — NO VISION

The upstream abliteration pass stripped the vision tower. For vision-capable Qwen 3.6 27B Opus-Distill MLX quants, see our parallel repos at huggingface.co/osmapi (look for repos without -abliterated in the name).

8-bit affine MLX quantization of an abliterated Qwen 3.6 27B Claude-Opus reasoning distill, by the osmAPI team — "OpenRouter of India".

Indistinguishable from BF16 on every benchmark we measured (NLL drift < 0.005). Use this if you have the RAM and want near-zero quantization drift.

⚡ TL;DR


Disk size	~27 GB
Effective BPW	8.0
Scheme	Affine 8-bit, group size 64
Recommended RAM	48 GB Apple Silicon (M4 Pro 48 GB, M4 Max, Studio Ultra)
Vision	❌ text-only (the upstream abliteration step stripped the ViT)
Made by	osmAPI — OpenRouter of India
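
As a rough cross-check of the figures above, the weight footprint is simply parameters × bits per weight. The sketch below uses the 27B parameter count and 8.0 BPW from the table. The `meta_bits` argument is an assumption that is not stated in this card: it models a 16-bit scale plus a 16-bit bias stored per group of 64 weights, which would add 0.5 bits per weight if it applies to this build.

```python
def weight_size_gb(n_params: float, bits: float,
                   group_size: int = 0, meta_bits: int = 0) -> tuple[float, float]:
    """Return (size in GB, effective bits per weight) for a quantized model.

    group_size / meta_bits model optional per-group metadata (scale, bias);
    leave them at 0 to ignore that overhead.
    """
    bpw = bits + (meta_bits / group_size if group_size else 0.0)
    return n_params * bpw / 8 / 1e9, bpw


N_PARAMS = 27e9  # parameter count from this card

size_plain, bpw_plain = weight_size_gb(N_PARAMS, bits=8)
# Assumption: fp16 scale + fp16 bias per 64-weight group (32 metadata bits).
size_meta, bpw_meta = weight_size_gb(N_PARAMS, bits=8, group_size=64, meta_bits=32)

print(f"{bpw_plain:.1f} bpw -> {size_plain:.1f} GB")
print(f"{bpw_meta:.1f} bpw -> {size_meta:.1f} GB")
print(f"headroom on a 48 GB machine: {48 - size_plain:.0f} GB for KV cache and the OS")
```

Either way the weights leave roughly 20 GB free on a 48 GB machine, which is the reasoning behind the recommended RAM row.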

🧬 Lineage

Qwen/Qwen3.6-27B                                              (Qwen Team — base pretrain)
        │
        ▼
TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2          (TeichAI — Claude-Opus reasoning distill)
        │
        ▼
abliterated (refusal-ablated) via OBLITERATUS v0.1.2          (multi-direction SVD, BF16)
        │
        ▼
this repo — 8-bit affine, MLX format                        (osmAPI team — quantization)

Direct upstream links:

🏛️ Foundation: Qwen/Qwen3.6-27B
🎓 Reasoning distill: TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2
🔓 Abliteration tool: OBLITERATUS (multi-direction SVD, 6 directions, 3 refinement passes, λ=0.08)
🧮 Quantization tool: mlx-lm + mlx-optiq for OptiQ variants

📦 Use it

`mlx-lm` (recommended)

pip install mlx-lm

from mlx_lm import load, generate

model, tokenizer = load("osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-8bit-mlx")
prompt = "Explain the difference between SSM and softmax attention in three sentences."
out = generate(model, tokenizer, prompt=prompt, max_tokens=400)
print(out)

Chat template

messages = [
    {"role": "system", "content": "You are a helpful, candid reasoning assistant."},
    {"role": "user", "content": "Plan a 3-day Tokyo itinerary for a foodie."},
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
print(generate(model, tokenizer, prompt=prompt, max_tokens=600))

CLI

mlx_lm.generate --model osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-8bit-mlx --prompt "Hello" --max-tokens 256

🧪 Quantization details

Source weights: BF16 abliterated checkpoint (28 shards, ~57 GB) derived from TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2 via OBLITERATUS multi-direction SVD ablation (preserves coherence; KL drift = 0.149 from base).
Quantization scheme: Affine 8-bit, group size 64.
Calibration corpus: not needed for this uniform affine quant. The OptiQ sibling variants use mlx-lm calibration_v5 (~427 KB of English text) for sensitivity ranking.
Sanity check: forward perplexity on held-out calibration text is within 1–3% of the next-higher-precision sibling.
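
To make the sanity check concrete, here is a small sketch of how perplexity and its relative drift follow from per-token negative log-likelihoods. The NLL values are made-up placeholders, not measurements from this model, and the sketch assumes NLL is reported in nats per token. Under that assumption a mean NLL drift of 0.005 corresponds to about a 0.5% change in perplexity, since exp(0.005) ≈ 1.005.

```python
import math


def perplexity(token_nlls: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood), NLL in nats per token."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def relative_drift(ppl_quant: float, ppl_ref: float) -> float:
    """Fractional perplexity change of the quantized model vs. the reference."""
    return (ppl_quant - ppl_ref) / ppl_ref


# Placeholder per-token NLLs (hypothetical, for illustration only).
ref_nlls = [2.0, 2.0, 2.0, 2.0]
quant_nlls = [nll + 0.005 for nll in ref_nlls]  # uniform +0.005 nat drift

drift = relative_drift(perplexity(quant_nlls), perplexity(ref_nlls))
print(f"relative perplexity drift: {drift:.4%}")
```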

Architecture notes

The Qwen 3.6 27B family uses a hybrid attention stack: 3 GatedDeltaNet (linear-attention/SSM) layers followed by 1 full-softmax-attention layer, repeated 16× for 64 total layers, with a 5120 hidden size, 248K vocab, and 262K context. The SSM kernels lack a VJP path in MLX, so backward-pass-based quant methods (DWQ, dynamic quant) cannot be applied here. OptiQ's forward-only sensitivity approach is the only calibration-aware option that works on this architecture, which is why the OptiQ variants exist.
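
The layer pattern can be written out explicitly. This sketch takes the 64 total layers and 16 repeats as given, so each repeated block spans 64 / 16 = 4 layers. It assumes the last layer of each block is the full-attention one and the rest are GatedDeltaNet; the layer-type strings are illustrative labels, not identifiers from the model config.

```python
from collections import Counter


def layer_layout(total_layers: int = 64, repeats: int = 16) -> list[str]:
    """Return the per-layer type list for a repeated hybrid block."""
    block, remainder = divmod(total_layers, repeats)
    if remainder:
        raise ValueError("total_layers must be a multiple of repeats")
    # Assumption: linear-attention layers first, one full-attention layer last.
    pattern = ["gated_deltanet"] * (block - 1) + ["full_attention"]
    return pattern * repeats


layout = layer_layout()
counts = Counter(layout)
full_attention_indices = [i for i, kind in enumerate(layout) if kind == "full_attention"]

print(len(layout), dict(counts))
print("first full-attention layers:", full_attention_indices[:4])
```

Only the full-attention layers keep a KV cache that grows with context length, which is part of why this family handles long contexts at moderate memory cost.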

⚠️ Behavior caveats

Text-only — no vision. The abliteration pipeline (OBLITERATUS) ran on the LM tower and stripped the ViT. For vision-capable quants of the same Opus-Distill v2 lineage, use our parallel non-abliterated repos at huggingface.co/osmapi (any repo without -abliterated in the name).
This is an abliterated model — refusal directions were surgically removed from the parent. It will answer prompts the parent would refuse. Use responsibly and within applicable law.
Quantization preserves abliteration: the refusal rate measured at BF16 (~35% from a 100% baseline) stays in that range across our quants.

🙏 Credits


Quantization & release	osmAPI team — "OpenRouter of India"
Reasoning distill	TeichAI (Claude-Opus 4.5/4.6 high-reasoning datasets)
Foundation model	Qwen Team
Abliteration toolkit	OBLITERATUS by elder-plinius
Quant toolkit	mlx-lm, mlx-optiq

📜 License

Apache-2.0, inherited from the foundation and distill upstream.

Need a hosted endpoint, custom quant, or larger-scale inference? osmAPI: multi-provider LLM routing for the Indian developer ecosystem.

⚡ 3.3–3.7× faster decoding with DFlash (lossless, MLX)

This MLX build supports lossless block-diffusion speculative decoding via DFlash in mlx_vlm — no requantization, no model changes. On an Apple M4 Max we measured 3.38× (8-bit) and 3.67× (bf16) decode speedups with byte-identical output; other MLX quants of this model should see a similar ~3×.

python3 -m mlx_vlm generate \
  --model osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-8bit-mlx \
  --draft-model z-lab/Qwen3.6-27B-DFlash --draft-kind dflash \
  --prompt "Write a merge function for two sorted lists in Python." --max-tokens 256

Requires mlx_vlm ≥ 0.5.0 and access to the gated drafter z-lab/Qwen3.6-27B-DFlash (one-click "Agree and access").
Accelerates the text path only (this build has no vision path); adds ~3.9 GB for the drafter.
Acceptance ≈ 8.95 tokens/round (block size 16); the target runs ~10× fewer forward passes.
Full write-up & benchmarks: https://huggingface.co/blog/junafinity/block-diffusion-on-apple-silicon-with-3-7x-speedup · see also DFLASH_SPECULATIVE_DECODING.md.
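
The relationship between acceptance length and target forward passes can be sketched with simple arithmetic. Plain autoregressive decoding needs one target pass per new token, while speculative decoding needs one per round. Whether the 8.95 figure already includes the extra token the target contributes in each verification pass is not stated here, so the sketch shows both readings; the second reading (8.95 accepted + 1) is an assumption that happens to land near the ~10× quoted above. Wall-clock speedup is lower than the pass reduction because the drafter and the wider verification pass also cost time.

```python
import math


def target_forward_passes(new_tokens: int, tokens_per_round: float) -> int:
    """Number of target-model passes needed to emit `new_tokens` tokens."""
    return math.ceil(new_tokens / tokens_per_round)


NEW_TOKENS = 256          # matches --max-tokens in the command above
baseline = NEW_TOKENS     # one target pass per token without a drafter

readings = [
    ("8.95 tokens per round", 8.95),
    ("8.95 accepted + 1 target token (assumption)", 9.95),
]
for label, per_round in readings:
    passes = target_forward_passes(NEW_TOKENS, per_round)
    print(f"{label}: {passes} target passes, {baseline / passes:.1f}x fewer")
```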


Safetensors
Model size	27B params
Tensor type	BF16 · U32

MLX
Hardware compatibility	8-bit

Model tree for osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-8bit-mlx

Base model	Qwen/Qwen3.6-27B
Finetuned	TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2
Quantized (10)	this model