Instructions to use Jackrong/Qwopus3.6-27B-v2-MTP-GGUF with libraries, inference providers, notebooks, and local apps. The sections below show how to get started.

Libraries

How to use Jackrong/Qwopus3.6-27B-v2-MTP-GGUF with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Jackrong/Qwopus3.6-27B-v2-MTP-GGUF")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("Jackrong/Qwopus3.6-27B-v2-MTP-GGUF", dtype="auto")

Local Apps

vLLM

How to use Jackrong/Qwopus3.6-27B-v2-MTP-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Jackrong/Qwopus3.6-27B-v2-MTP-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Jackrong/Qwopus3.6-27B-v2-MTP-GGUF",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/Jackrong/Qwopus3.6-27B-v2-MTP-GGUF

SGLang

How to use Jackrong/Qwopus3.6-27B-v2-MTP-GGUF with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Jackrong/Qwopus3.6-27B-v2-MTP-GGUF" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Jackrong/Qwopus3.6-27B-v2-MTP-GGUF",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Jackrong/Qwopus3.6-27B-v2-MTP-GGUF" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Jackrong/Qwopus3.6-27B-v2-MTP-GGUF",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Unsloth Studio

How to use Jackrong/Qwopus3.6-27B-v2-MTP-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Jackrong/Qwopus3.6-27B-v2-MTP-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Jackrong/Qwopus3.6-27B-v2-MTP-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Jackrong/Qwopus3.6-27B-v2-MTP-GGUF to start chatting

Load model with FastModel

# Install Unsloth first (run in a shell, not in Python): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="Jackrong/Qwopus3.6-27B-v2-MTP-GGUF",
    max_seq_length=2048,
)

Docker Model Runner
How to use Jackrong/Qwopus3.6-27B-v2-MTP-GGUF with Docker Model Runner:
```
docker model run hf.co/Jackrong/Qwopus3.6-27B-v2-MTP-GGUF
```

🪐 Qwopus3.6-27B-v2-MTP

MTP Release

Multi-Token Prediction reasoning model fine-tuned from Qwen3.6-27B

🧬 Trace Inversion & Negentropy 🧠 27B Parameters ⚡ Speculative Decoding 🛠️ Coding / DevOps / Math

💡 What is Qwopus3.6-27B-v2-MTP?

🪐 Qwopus3.6-27B-v2-MTP is a speed-oriented reasoning release built on top of Qwen3.6-27B. It keeps the Qwopus line's focus on reconstructed reasoning traces, coding discipline, DevOps procedures, and mathematical derivations, while adding Multi-Token Prediction for faster generation. The goal is simple: preserve the depth and structure of a 27B reasoning model while making real interactive use noticeably faster.

⚡ MTP Decoding: Auxiliary future-token prediction improves throughput on long reasoning, code, math, and strict-format prompts.

🧩 Structured Reasoning: Inherits the Qwopus training recipe built around reconstructed step-by-step reasoning trajectories.

🧪 GB10 Tested: Validated on a 30-question local benchmark across Logic, Coding, DevOps, Math, and Edge tasks.

🚀 Practical Speed: Designed for workflows where strong answers matter but waiting several extra minutes per task is not acceptable.

💡 1. Base Model, Training Library & Cooperation

🧠 1.1 Base Model Specifications (Qwen3.6-27B)

Qwen3.6-27B provides the dense 27B foundation for this release. Qwopus3.6-27B-v2-MTP focuses on preserving the base model's broad reasoning capability while tuning the output style toward stepwise analysis, tool-aware execution, and practical engineering answers.

Attribute	Specifications & Details
🧠 Architecture	Dense Transformer / 27 Billion Parameters
🎯 Focus Domains	Agentic Coding, DevOps, structured logic, mathematics, and strict-format output
⚡ MTP Objective	Improve generation throughput through multi-token speculative prediction while retaining final-answer quality.
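
To make the draft-and-verify idea behind the MTP objective concrete, here is a toy sketch of one greedy speculative decoding step. It is not the implementation used by this release or by any inference engine: `speculative_step`, `verify_next`, `count_up`, and the integer "tokens" are hypothetical names chosen for illustration, and a real engine would check all drafted tokens in one batched forward pass rather than one call per token.

```python
from typing import Callable, List


def speculative_step(
    context: List[int],
    draft: List[int],
    verify_next: Callable[[List[int]], int],
) -> List[int]:
    """Return the tokens emitted by one greedy draft-and-verify step.

    draft: tokens proposed cheaply (for example by MTP heads); hypothetical input.
    verify_next: the main model's greedy next-token function; hypothetical.
    """
    emitted: List[int] = []
    for proposed in draft:
        target = verify_next(context + emitted)
        if proposed != target:
            # First mismatch: keep the main model's token and drop the rest of the draft.
            emitted.append(target)
            return emitted
        emitted.append(proposed)
    # Every drafted token matched, so the verifier contributes one extra token.
    emitted.append(verify_next(context + emitted))
    return emitted


def count_up(tokens: List[int]) -> int:
    """Toy stand-in for the main model: the next token is the last one plus 1."""
    return tokens[-1] + 1


full_accept = speculative_step([1, 2, 3], [4, 5, 6], count_up)
partial_accept = speculative_step([1, 2, 3], [4, 5, 9], count_up)
print(full_accept)     # [4, 5, 6, 7]
print(partial_accept)  # [4, 5, 6]
```

In both calls the emitted tokens are exactly what the verifier alone would have produced greedily; the draft only changes how many tokens come out of a single step.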

🧪 1.2 Hardware Collaboration

This project is built in close collaboration with hardware engineer Kyle Hessling, whose infrastructure and training support helped make stable 27B-scale experimentation possible.

👉 You can follow him for hardware and model training updates on X / Twitter: @KyleHessling1

🦥 1.3 Fine-tuning Framework (Unsloth)

The model training workflow is accelerated and memory-optimized with Unsloth. Special thanks to the Unsloth team for making efficient large-model fine-tuning more accessible.

👉 Documentation and fine-tuning guidance: unsloth.ai/docs

⚙️ 1.4 Custom MTP Heads Processing & Automation Tooling

This release features a custom splitting and merging methodology designed specifically for Qwen series Multi-Token Prediction (MTP) heads. The automation skill and complete processing pipeline scripts are open-sourced in qwen-mtp-gguf.

🌟 If you find this toolkit helpful, please support the project by leaving a star on GitHub!

Community Release Notice: Qwopus3.6-27B-v2-MTP is an experimental community release intended for research, evaluation, and workflow exploration.

🚀 2. MTP Benchmark: Qwen3.6-27B vs Qwopus3.6-27B-v2-MTP

Performance Snapshot

Across a 30-question benchmark covering Logic, Coding, DevOps, Math, and Edge-format tasks, Qwopus3.6-27B-v2-MTP delivers a clear speed advantage over Qwen3.6-27B while producing a more compact overall answer stream. The benchmark is not just a raw throughput test: it includes long coding prompts, operational runbooks, math derivations, and strict constrained-output cases.

Overall Throughput: 10.46 T/s (1.66x vs Qwen3.6-27B)
Latency Saved: 2.34 h (56.5% total time reduction)
Token Efficiency: -27.7% (fewer completion tokens overall)
Coverage: 30 / 30 (all benchmark prompts completed)

Speed: Qwopus3.6-27B-v2-MTP reaches 10.46 overall tokens/sec, compared with 6.29 tokens/sec for Qwen3.6-27B.
Latency: total evaluation time drops from 14,901.69s to 6,487.81s, saving 8,413.88s across the full run.
Output shape: MTP produces 67,862 completion tokens versus 93,802 from Qwen3.6-27B, giving a more compact overall response profile.
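
The headline figures can be re-derived from the raw totals. The sketch below assumes that overall T/s is total completion tokens divided by total wall-clock time; the card does not state this definition explicitly, but the results agree with the reported values. All variable names are illustrative.

```python
# Raw totals copied from the benchmark figures in this card.
QWEN_TOKENS, QWEN_SECONDS = 93_802, 14_901.69
MTP_TOKENS, MTP_SECONDS = 67_862, 6_487.81

# Assumed definition: overall throughput = completion tokens / total time.
qwen_tps = QWEN_TOKENS / QWEN_SECONDS
mtp_tps = MTP_TOKENS / MTP_SECONDS
throughput_ratio = mtp_tps / qwen_tps

seconds_saved = QWEN_SECONDS - MTP_SECONDS
hours_saved = seconds_saved / 3600
time_reduction_pct = 100 * seconds_saved / QWEN_SECONDS
token_delta_pct = 100 * (MTP_TOKENS - QWEN_TOKENS) / QWEN_TOKENS

print(f"Throughput: {qwen_tps:.2f} -> {mtp_tps:.2f} T/s ({throughput_ratio:.2f}x)")
print(f"Time saved: {seconds_saved:.2f} s = {hours_saved:.2f} h ({time_reduction_pct:.1f}%)")
print(f"Token delta: {token_delta_pct:.1f}%")
```

Note that the 1.66x figure is a throughput ratio. Because the MTP run also emits fewer tokens, the wall-clock ratio for the full run is larger (14,901.69s / 6,487.81s, roughly 2.3x).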

Benchmark source: /workspace/renji-training/Jackrong/qwopus3.6-27B-v2-MTP/benchmark_27b_pair_report.md on the GB10 server. Local workspace date: 2026-05-22.

⚙️ 3. Test Environment & Configuration

Compute platform: GB10 dedicated server platform.
Evaluation format: same local GGUF server stack for both models.
llama-server total context: 49152.
Temperature / Top-p: 1.0 / 0.95.
Max generated tokens: no explicit cap; generation is bounded by the request budget.
Request format: /v1/chat/completions with user content as text payload.
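
As a rough illustration of the request shape described above, the sketch below builds a text-only chat payload with the listed sampling settings and no `max_tokens` cap, and times one call. The server URL, the helper names `build_payload` and `timed_request`, and the presence of an OpenAI-style `usage` object in the response are assumptions; the actual benchmark harness is not shown in this card and may measure throughput differently.

```python
import json
import time
import urllib.request

# Assumption: the card does not state the host or port of the llama-server instance.
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_payload(prompt: str) -> dict:
    """Text-only chat request using the sampling settings listed above."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.0,
        "top_p": 0.95,
        # No "max_tokens" key: generation is left uncapped, as in the test setup.
    }


def timed_request(prompt: str, url: str = SERVER_URL) -> dict:
    """Send one prompt and report completion tokens, seconds, and tokens/sec."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    elapsed = time.perf_counter() - start
    # Assumption: the server reports token counts in an OpenAI-style "usage" object.
    tokens = reply["usage"]["completion_tokens"]
    return {"tokens": tokens, "seconds": elapsed, "tokens_per_sec": tokens / elapsed}


payload = build_payload("Explain Docker CMD vs ENTRYPOINT.")
print(json.dumps(payload, indent=2))
```

Calling `timed_request(...)` requires a running server, so the example only prints the payload.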

Benchmark Summary: Qwen3.6-27B vs Qwopus3.6-27B-v2-MTP
Model	Completed	Avg Speed (T/s)	Overall T/s	Completion Tokens	Total Time
Qwen3.6-27B	30	6.32	6.29	93,802	14,901.69s
Qwopus3.6-27B-v2-MTP	30	10.66	10.46	67,862	6,487.81s

Domain-Level Performance
Domain	Questions	Qwen3.6-27B T/s	MTP T/s	Latency Gain	Qwen3.6-27B Time	MTP Time	Token Delta
Logic	5	6.33	10.77	2.31x	38.5 min	16.7 min	-26.3%
Coding	7	6.26	10.27	2.25x	1.52 h	40.6 min	-27.3%
DevOps	6	6.29	10.39	2.31x	47.4 min	20.5 min	-28.5%
Math	8	6.29	11.00	2.35x	1.01 h	25.8 min	-25.6%
Edge	4	6.48	8.28	2.27x	10.3 min	4.5 min	-43.6%
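
The Latency Gain column appears to be the ratio of the two time columns. The sketch below recomputes it from the displayed values, assuming that definition; `to_minutes` and `DOMAIN_TIMES` are illustrative names. Because the displayed times are rounded, a recomputed value can differ from the table in the last digit (Edge comes out at 2.29x here versus 2.27x in the table).

```python
def to_minutes(duration: str) -> float:
    """Convert strings such as '38.5 min', '1.52 h', or '49.6 s' to minutes."""
    value, unit = duration.split()
    factor = {"s": 1 / 60, "min": 1.0, "h": 60.0}[unit]
    return float(value) * factor


# (Qwen3.6-27B time, MTP time) copied from the domain table above.
DOMAIN_TIMES = {
    "Logic": ("38.5 min", "16.7 min"),
    "Coding": ("1.52 h", "40.6 min"),
    "DevOps": ("47.4 min", "20.5 min"),
    "Math": ("1.01 h", "25.8 min"),
    "Edge": ("10.3 min", "4.5 min"),
}

# Assumed definition: latency gain = baseline time / MTP time.
latency_gain = {
    domain: round(to_minutes(base) / to_minutes(mtp), 2)
    for domain, (base, mtp) in DOMAIN_TIMES.items()
}

for domain, gain in latency_gain.items():
    print(f"{domain}: {gain:.2f}x")
```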

📊 4. Full 30-Question Comparison

The table below keeps the benchmark concrete: every row compares the base Qwen3.6-27B run against the Qwopus MTP run on the same prompt. The strongest improvements appear in strict output, probability, DevOps configuration, and medium-length coding tasks, while a few prompts intentionally produce more detailed MTP answers.

30-Question Detailed Comparison
Q	Domain	Task	Qwen T/s	Qwen Time	Qwen Tokens	MTP T/s	MTP Time	MTP Tokens	Result Pattern
Q1	Logic	Wrong-label coin boxes	6.36	9.4 min	3,569	11.40	2.3 min	1,530	4.16x faster; much more concise
Q2	Logic	Engineer deployment ordering	6.39	6.1 min	2,349	10.98	3.1 min	2,034	1.98x faster; more concise
Q3	Logic	Self-referential truth card	6.37	7.8 min	2,990	10.83	4.5 min	2,942	1.72x faster; similar length
Q4	Logic	Three switches and bulbs	6.32	3.6 min	1,342	10.44	1.6 min	999	2.21x faster; more concise
Q5	Logic	HH vs TH stopping probability	6.30	11.6 min	4,367	10.62	5.2 min	3,266	2.25x faster; more concise
Q6	Coding	Streaming top-k frequency	6.28	13.8 min	5,210	9.95	13.3 min	7,917	1.04x faster; more expansive
Q7	Coding	Thread-safe TTL cache	6.28	18.6 min	7,009	10.64	5.3 min	3,367	3.52x faster; much more concise
Q8	Coding	Interval merge implementation	6.25	11.2 min	4,203	10.83	3.3 min	2,157	3.36x faster; much more concise
Q9	Coding	Streaming CSV to JSONL	6.26	16.5 min	6,200	10.62	5.9 min	3,741	2.81x faster; more concise
Q10	Coding	C++17 LRU cache	6.27	13.1 min	4,920	10.15	6.0 min	3,644	2.18x faster; more concise
Q11	Coding	Highest-paid employee SQL	6.29	6.1 min	2,283	10.37	2.4 min	1,475	2.54x faster; more concise
Q12	Coding	Atomic Bash backup	6.28	12.1 min	4,545	10.33	4.4 min	2,695	2.76x faster; much more concise
Q13	DevOps	Nginx reverse proxy	6.29	10.4 min	3,924	10.88	2.8 min	1,821	3.70x faster; much more concise
Q14	DevOps	Linux service OOM diagnosis	6.29	9.9 min	3,727	9.96	4.9 min	2,888	2.04x faster; more concise
Q15	DevOps	systemd worker unit	6.29	8.0 min	3,023	10.39	3.3 min	2,037	2.43x faster; more concise
Q16	DevOps	Kubernetes rollback runbook	6.32	6.3 min	2,387	10.36	2.9 min	1,820	2.14x faster; more concise
Q17	DevOps	Docker CMD vs ENTRYPOINT	6.33	5.4 min	2,028	10.78	2.9 min	1,892	1.82x faster; more concise
Q18	DevOps	Prometheus pull monitoring	6.32	7.4 min	2,818	10.67	3.7 min	2,342	2.02x faster; more concise
Q19	Math	Derivative and critical point	6.32	8.7 min	3,274	12.06	3.7 min	2,631	2.37x faster; more concise
Q20	Math	Linear system solve	6.32	10.7 min	4,065	11.91	4.2 min	2,976	2.57x faster; more concise
Q21	Math	Different-color probability	6.28	3.9 min	1,472	10.18	49.6 s	490	4.74x faster; much more concise
Q22	Math	2x2 eigen decomposition	6.31	12.3 min	4,662	11.28	4.5 min	3,058	2.72x faster; more concise
Q23	Math	Induction proof	6.32	5.8 min	2,211	11.53	1.7 min	1,193	3.34x faster; much more concise
Q24	Math	Bayes disease test	6.34	5.0 min	1,878	11.38	3.2 min	2,156	1.56x faster; more expansive
Q25	Math	Integration by parts	6.29	5.5 min	2,064	11.80	3.5 min	2,493	1.55x faster; more expansive
Q26	Math	Central Limit Theorem	6.24	8.8 min	3,289	8.26	4.1 min	2,046	2.12x faster; more concise
Q27	Edge	Strict JSON output	6.32	3.6 min	1,350	10.43	23.1 s	225	9.28x faster; much more concise
Q28	Edge	Exact token pattern	6.37	52.4 s	328	12.15	29.9 s	345	1.75x faster; similar length
Q29	Edge	Forbidden-word explanation	6.71	5.1 min	2,040	7.62	3.5 min	1,573	1.47x faster; more concise
Q30	Edge	Ignore noisy input	6.35	44.5 s	275	10.94	11.4 s	109	3.89x faster; much more concise
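
The per-question rows can be rolled up to check the domain-level Token Delta figures. The sketch below does this for Logic and Edge, assuming Token Delta means the percent change in summed completion tokens (a definition inferred from the numbers, not stated in the card). `TOKENS` and `token_delta_pct` are illustrative names.

```python
from typing import List, Tuple

# (Qwen tokens, MTP tokens) per question, copied from the table above.
TOKENS = {
    "Logic": [(3569, 1530), (2349, 2034), (2990, 2942), (1342, 999), (4367, 3266)],
    "Edge": [(1350, 225), (328, 345), (2040, 1573), (275, 109)],
}


def token_delta_pct(rows: List[Tuple[int, int]]) -> float:
    """Percent change in total completion tokens, MTP relative to Qwen."""
    base_total = sum(qwen for qwen, _ in rows)
    mtp_total = sum(mtp for _, mtp in rows)
    return 100 * (mtp_total - base_total) / base_total


deltas = {domain: round(token_delta_pct(rows), 1) for domain, rows in TOKENS.items()}
print(deltas)  # {'Logic': -26.3, 'Edge': -43.6}
```

Both values match the Domain-Level Performance table, which suggests the domain figures are aggregates over total tokens rather than averages of per-question changes.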

🧭 5. Domain Reading

Logic

Logic prompts show a strong latency reduction, especially on the box-label puzzle and the HH-vs-TH stopping problem. The MTP model tends to reach the same kind of structured decision path with fewer generated tokens, making it useful when reasoning traces need to stay readable and quick.

Coding

Coding is one of the most practical wins. Thread-safe caching, interval merging, CSV streaming, C++ LRU, SQL, and Bash backup tasks all become substantially faster. Q6 is intentionally more expansive, but the broader coding group remains much faster overall.

DevOps

DevOps prompts benefit from concise operational structure. Nginx, OOM diagnosis, systemd, Kubernetes rollback, Docker command semantics, and Prometheus monitoring all show faster completion while preserving stepwise command-oriented guidance.

Math & Edge Tasks

Math has the highest MTP throughput among the five domains (11.00 T/s). Edge tasks include the sharpest single-prompt wall-clock win, 9.28x on strict JSON output (Q27), and noisy-input filtering (Q30) is also 3.89x faster; in both cases the model quickly settles into the required output pattern.

🎯 6. Recommended Use Cases

Agentic coding and code review assistance.
DevOps runbooks, configuration generation, and incident diagnosis.
Multi-step math and probability derivations.
Structured reasoning with explicit intermediate logic.
Fast constrained output generation where latency matters.

Resources, Acknowledgements & Citation

📚 Resources: Fine-tuning guide and reproduction code: Jackrong-llm-finetuning-guide

🙏 Acknowledgements: Thanks to the Qwen team, Unsloth, open-source contributors, and Kyle Hessling for close collaboration on hardware and training infrastructure.

📖 Citation

@misc{qwopus36_27b_v2_mtp_2026,
  title        = {Qwopus3.6-27B-v2-MTP},
  author       = {Jack Rong},
  year         = {2026},
  note         = {Qwen3.6-27B based Multi-Token Prediction reasoning model},
  howpublished = {Hugging Face model card}
}

