🪐 Qwopus3.6-27B-v2-MTP

MTP Release

Multi-Token Prediction reasoning model fine-tuned from Qwen3.6-27B

🧬 Trace Inversion & Negentropy 🧠 27B Parameters ⚡ Speculative Decoding 🛠️ Coding / DevOps / Math

💡 What is Qwopus3.6-27B-v2-MTP?

🪐 Qwopus3.6-27B-v2-MTP is a speed-oriented reasoning release built on top of Qwen3.6-27B. It keeps the Qwopus line's focus on reconstructed reasoning traces, coding discipline, DevOps procedures, and mathematical derivations, while adding Multi-Token Prediction for faster generation. The goal is simple: preserve the depth and structure of a 27B reasoning model while making real interactive use noticeably faster.

⚡ MTP DecodingAuxiliary future-token prediction improves throughput on long reasoning, code, math, and strict-format prompts.
🧩 Structured ReasoningInherits the Qwopus training recipe built around reconstructed step-by-step reasoning trajectories.
🧪 GB10 TestedValidated on a 30-question local benchmark across Logic, Coding, DevOps, Math, and Edge tasks.
🚀 Practical SpeedDesigned for workflows where strong answers matter, but waiting several extra minutes per task does not.

💡 1. Base Model, Training Library & Cooperation

🧠 1.1 Base Model Specifications (Qwen3.6-27B)

Qwen3.6-27B provides the dense 27B foundation for this release. Qwopus3.6-27B-v2-MTP focuses on preserving the base model's broad reasoning capability while tuning the output style toward stepwise analysis, tool-aware execution, and practical engineering answers.

AttributeSpecifications & Details
🧠 ArchitectureDense Transformer / 27 Billion Parameters
🎯 Focus DomainsAgentic Coding, DevOps, structured logic, mathematics, and strict-format output
⚡ MTP ObjectiveImprove generation throughput through multi-token speculative prediction while retaining final-answer quality.
🧪 1.2 Hardware Cooperation & Joint Collaboration
This project is built in close collaboration with hardware engineer Kyle Hessling, whose infrastructure and training support helped make stable 27B-scale experimentation possible.
👉You can follow him for hardware and model training updates on X / Twitter: @KyleHessling1
🦥 1.3 Fine-tuning Framework (Unsloth)
The model training workflow is accelerated and memory-optimized with Unsloth. Special thanks to the Unsloth team for making efficient large-model fine-tuning more accessible.
👉Documentation and fine-tuning guidance: unsloth.ai/docs
⚙️ 1.4 Custom MTP Heads Processing & Automation Tooling
This release features a custom splitting and merging methodology designed specifically for Qwen series Multi-Token Prediction (MTP) heads. The automation skill and complete processing pipeline scripts are open-sourced in qwen-mtp-gguf.
🌟If you find this toolkit helpful, please support the project by leaving a star on GitHub!

Community Release Notice: Qwopus3.6-27B-v2-MTP is an experimental community release intended for research, evaluation, and workflow exploration.


🚀 2. MTP Benchmark: Qwen3.6-27B vs Qwopus3.6-27B-v2-MTP

Performance Snapshot
Across a 30-question benchmark covering Logic, Coding, DevOps, Math, and Edge-format tasks, Qwopus3.6-27B-v2-MTP delivers a clear speed advantage over Qwen3.6-27B while producing a more compact overall answer stream. The benchmark is not just a raw throughput test: it includes long coding prompts, operational runbooks, math derivations, and strict constrained-output cases.
Overall Throughput
10.46 T/s
1.66x vs Qwen3.6-27B
Latency Saved
2.34 h
56.5% total time reduction
Token Efficiency
-27.7%
fewer completion tokens overall
Coverage
30 / 30
all benchmark prompts completed
  • Speed: Qwopus3.6-27B-v2-MTP reaches 10.46 overall tokens/sec, compared with 6.29 tokens/sec for Qwen3.6-27B.
  • Latency: total evaluation time drops from 14,901.69s to 6,487.81s, saving 8,413.88s across the full run.
  • Output shape: MTP produces 67,862 completion tokens versus 93,802 from Qwen3.6-27B, giving a more compact overall response profile.

Benchmark source: /workspace/renji-training/Jackrong/qwopus3.6-27B-v2-MTP/benchmark_27b_pair_report.md on the GB10 server. Local workspace date: 2026-05-22.


⚙️ 3. Test Environment & Configuration

  • Compute platform: GB10 dedicated server platform.
  • Evaluation format: same local GGUF server stack for both models.
  • llama-server total context: 49152.
  • Temperature / Top-p: 1.0 / 0.95.
  • Max generated tokens: no explicit cap; generation is bounded by the request budget.
  • Request format: /v1/chat/completions with user content as text payload.
Benchmark Summary: Qwen3.6-27B vs Qwopus3.6-27B-v2-MTP
ModelCompletedAvg SpeedOverall T/sCompletion TokensTotal Time
Qwen3.6-27B306.326.2993,80214,901.69s
Qwopus3.6-27B-v2-MTP3010.6610.4667,8626,487.81s
Domain-Level Performance
DomainQuestionsQwen3.6-27B T/sMTP T/sLatency GainQwen3.6-27B TimeMTP TimeToken Delta
Logic56.3310.772.31x38.5 min16.7 min-26.3%
Coding76.2610.272.25x1.52 h40.6 min-27.3%
DevOps66.2910.392.31x47.4 min20.5 min-28.5%
Math86.2911.002.35x1.01 h25.8 min-25.6%
Edge46.488.282.27x10.3 min4.5 min-43.6%

📊 4. Full 30-Question Comparison

The table below keeps the benchmark concrete: every row compares the base Qwen3.6-27B run against the Qwopus MTP run on the same prompt. The strongest improvements appear in strict output, probability, DevOps configuration, and medium-length coding tasks, while a few prompts intentionally produce more detailed MTP answers.
30-Question Detailed Comparison
QDomainTaskQwen T/sQwen TimeQwen TokensMTP T/sMTP TimeMTP TokensResult Pattern
Q1LogicWrong-label coin boxes6.369.4 min3,56911.402.3 min1,5304.16x faster; much more concise
Q2LogicEngineer deployment ordering6.396.1 min2,34910.983.1 min2,0341.98x faster; more concise
Q3LogicSelf-referential truth card6.377.8 min2,99010.834.5 min2,9421.72x faster; similar length
Q4LogicThree switches and bulbs6.323.6 min1,34210.441.6 min9992.21x faster; more concise
Q5LogicHH vs TH stopping probability6.3011.6 min4,36710.625.2 min3,2662.25x faster; more concise
Q6CodingStreaming top-k frequency6.2813.8 min5,2109.9513.3 min7,9171.04x faster; more expansive
Q7CodingThread-safe TTL cache6.2818.6 min7,00910.645.3 min3,3673.52x faster; much more concise
Q8CodingInterval merge implementation6.2511.2 min4,20310.833.3 min2,1573.36x faster; much more concise
Q9CodingStreaming CSV to JSONL6.2616.5 min6,20010.625.9 min3,7412.81x faster; more concise
Q10CodingC++17 LRU cache6.2713.1 min4,92010.156.0 min3,6442.18x faster; more concise
Q11CodingHighest-paid employee SQL6.296.1 min2,28310.372.4 min1,4752.54x faster; more concise
Q12CodingAtomic Bash backup6.2812.1 min4,54510.334.4 min2,6952.76x faster; much more concise
Q13DevOpsNginx reverse proxy6.2910.4 min3,92410.882.8 min1,8213.70x faster; much more concise
Q14DevOpsLinux service OOM diagnosis6.299.9 min3,7279.964.9 min2,8882.04x faster; more concise
Q15DevOpssystemd worker unit6.298.0 min3,02310.393.3 min2,0372.43x faster; more concise
Q16DevOpsKubernetes rollback runbook6.326.3 min2,38710.362.9 min1,8202.14x faster; more concise
Q17DevOpsDocker CMD vs ENTRYPOINT6.335.4 min2,02810.782.9 min1,8921.82x faster; more concise
Q18DevOpsPrometheus pull monitoring6.327.4 min2,81810.673.7 min2,3422.02x faster; more concise
Q19MathDerivative and critical point6.328.7 min3,27412.063.7 min2,6312.37x faster; more concise
Q20MathLinear system solve6.3210.7 min4,06511.914.2 min2,9762.57x faster; more concise
Q21MathDifferent-color probability6.283.9 min1,47210.1849.6 s4904.74x faster; much more concise
Q22Math2x2 eigen decomposition6.3112.3 min4,66211.284.5 min3,0582.72x faster; more concise
Q23MathInduction proof6.325.8 min2,21111.531.7 min1,1933.34x faster; much more concise
Q24MathBayes disease test6.345.0 min1,87811.383.2 min2,1561.56x faster; more expansive
Q25MathIntegration by parts6.295.5 min2,06411.803.5 min2,4931.55x faster; more expansive
Q26MathCentral Limit Theorem6.248.8 min3,2898.264.1 min2,0462.12x faster; more concise
Q27EdgeStrict JSON output6.323.6 min1,35010.4323.1 s2259.28x faster; much more concise
Q28EdgeExact token pattern6.3752.4 s32812.1529.9 s3451.75x faster; similar length
Q29EdgeForbidden-word explanation6.715.1 min2,0407.623.5 min1,5731.47x faster; more concise
Q30EdgeIgnore noisy input6.3544.5 s27510.9411.4 s1093.89x faster; much more concise

🧭 5. Domain Reading

Logic
Logic prompts show a strong latency reduction, especially on the box-label puzzle and the HH-vs-TH stopping problem. The MTP model tends to reach the same kind of structured decision path with fewer generated tokens, making it useful when reasoning traces need to stay readable and quick.
Coding
Coding is one of the most practical wins. Thread-safe caching, interval merging, CSV streaming, C++ LRU, SQL, and Bash backup tasks all become substantially faster. Q6 is intentionally more expansive, but the broader coding group remains much faster overall.
DevOps
DevOps prompts benefit from concise operational structure. Nginx, OOM diagnosis, systemd, Kubernetes rollback, Docker command semantics, and Prometheus monitoring all show faster completion while preserving stepwise command-oriented guidance.
Math & Edge Tasks
Math has the highest MTP throughput among the five domains. Edge tasks show the sharpest wall-clock wins, especially strict JSON and noisy-input filtering, where the model can quickly settle into the required output pattern.

🎯 6. Recommended Use Cases

  • Agentic coding and code review assistance.
  • DevOps runbooks, configuration generation, and incident diagnosis.
  • Multi-step math and probability derivations.
  • Structured reasoning with explicit intermediate logic.
  • Fast constrained output generation where latency matters.

Resources, Acknowledgements & Citation
📚 ResourcesFinetuning guide and reproduction code: Jackrong-llm-finetuning-guide
🙏 AcknowledgementsThanks to the Qwen team, Unsloth, open-source contributors, and Kyle Hessling for close collaboration on hardware and training infrastructure.
📖 Citation
@misc{qwopus36_27b_v2_mtp_2026,
  title        = {Qwopus3.6-27B-v2-MTP},
  author       = {Jack Rong},
  year         = {2026},
  note         = {Qwen3.6-27B based Multi-Token Prediction reasoning model},
  howpublished = {Hugging Face model card}
}
Downloads last month
139,952
GGUF
Hardware compatibility
Log In to add your hardware

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 1 Ask for provider support

Model tree for Jackrong/Qwopus3.6-27B-v2-MTP-GGUF

Base model

Qwen/Qwen3.6-27B
Adapter
(123)
this model

Datasets used to train Jackrong/Qwopus3.6-27B-v2-MTP-GGUF

Collection including Jackrong/Qwopus3.6-27B-v2-MTP-GGUF