CRITICAL FIX (2026-03-19): Fixed eos_token_id. Previous versions caused infinite thinking loops. If you downloaded this model before 2026-03-19, you MUST re-download it.

Update (2026-03-18): Models updated to v2.1 with VLM support, proper tokenizer, and fixed configs. If you downloaded before this date, please re-download.

MLX Studio — the only app that natively supports JANG models


Early Adoption: LM Studio, Ollama, oMLX, and Inferencer do not support JANG yet. Use MLX Studio, or install the Python package with pip install "jang[mlx]".


Qwen3.5-27B — JANG_4S (4-bit, 6-bit attention) — VLM

JANG — Jang Adaptive N-bit Grading | Mixed-Precision Quantization for Apple Silicon

JANG is fully open-source. Quantization engine, research, and full commit history: github.com/jjang-ai/jangq. Created by Jinho Jang.

Results (200-question MMLU)

| Model | MMLU | Size | Speed |
|---|---|---|---|
| JANG_4S | 84.5% | 16 GB | 35 tok/s |
| MLX 4-bit | 84.5% | 14 GB | 20 tok/s |

On this 200-question MMLU subset, JANG_4S matches MLX 4-bit accuracy (84.5%) while generating 75% faster (35 vs 20 tok/s), at a 2 GB larger size (16 GB vs 14 GB).
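The 75% figure follows from the two throughput numbers reported above. A quick arithmetic check, using only those two values (the 200-token reply length is an illustrative assumption):

```python
# Throughput figures from the results above (tokens per second).
jang_tok_s = 35.0
mlx_tok_s = 20.0

# Relative speedup of JANG_4S over MLX 4-bit.
speedup = jang_tok_s / mlx_tok_s
percent_faster = (speedup - 1.0) * 100.0

# Same comparison as wall-clock time for a hypothetical 200-token reply.
reply_tokens = 200
seconds_jang = reply_tokens / jang_tok_s
seconds_mlx = reply_tokens / mlx_tok_s

print(f"{speedup:.2f}x throughput, {percent_faster:.0f}% faster")
print(f"{reply_tokens} tokens: {seconds_jang:.1f} s vs {seconds_mlx:.1f} s")
```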

Per-Subject Scores (JANG_4S vs MLX 4-bit)

| Subject | JANG_4S | MLX 4-bit |
|---|---|---|
| Abstract Algebra | 15/20 | 15/20 |
| Anatomy | 17/20 | 17/20 |
| Astronomy | 19/20 | 19/20 |
| College CS | 16/20 | 16/20 |
| College Physics | 17/20 | 17/20 |
| HS Biology | 19/20 | 19/20 |
| HS Chemistry | 17/20 | 17/20 |
| HS Mathematics | 13/20 | 13/20 |
| Logical Fallacies | 18/20 | 18/20 |
| World Religions | 18/20 | 18/20 |
| Total | 169/200 | 169/200 |
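The headline 84.5% can be recomputed from the per-subject rows. The sketch below sums the scores listed above (identical for both models) and also estimates the sampling uncertainty of a 200-question subset. The binomial standard error is a standard approximation added here for context, not a figure from the benchmark itself.

```python
import math

# Correct answers per subject, out of 20, as listed above (same for both models).
scores = {
    "Abstract Algebra": 15, "Anatomy": 17, "Astronomy": 19,
    "College CS": 16, "College Physics": 17, "HS Biology": 19,
    "HS Chemistry": 17, "HS Mathematics": 13,
    "Logical Fallacies": 18, "World Religions": 18,
}
QUESTIONS_PER_SUBJECT = 20

total_correct = sum(scores.values())
total_questions = QUESTIONS_PER_SUBJECT * len(scores)
accuracy = 100.0 * total_correct / total_questions

# Binomial standard error of the accuracy, in percentage points.
p = total_correct / total_questions
std_err = 100.0 * math.sqrt(p * (1.0 - p) / total_questions)

print(f"{total_correct}/{total_questions} = {accuracy:.1f}% (+/- {std_err:.1f} pts, 1 s.e.)")
```

With only 200 questions, differences of a point or two between quantization schemes would fall within this margin.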

Specs

| Metric | Value |
|---|---|
| Source | Qwen3.5-27B |
| Architecture | Dense hybrid (GatedDeltaNet SSM + full attention) |
| Profile | JANG_4S (CRITICAL=6, IMPORTANT=4, COMPRESS=4) |
| Average bits | ~4.15 |
| GPU Memory | 14.8 GB |
| Speed | 35 tok/s |
| VLM | Yes (vision encoder preserved) |
| Format | v2 (MLX-native, instant load) |
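As a rough sanity check on the profile and average-bits figures, the sketch below back-solves what share of weights would sit in the 6-bit tier and estimates the raw weight footprint. It rests on several assumptions not stated in the card: a nominal 27e9 parameter count taken from the model name, only two bit widths (6 and 4), decimal gigabytes, and no accounting for quantization scales, the vision encoder, or runtime buffers. The function names are illustrative.

```python
def critical_fraction(avg_bits: float, low_bits: float = 4.0, high_bits: float = 6.0) -> float:
    """Share of weights at high_bits needed to reach avg_bits in a two-tier mix."""
    return (avg_bits - low_bits) / (high_bits - low_bits)


def weight_gigabytes(n_params: float, avg_bits: float) -> float:
    """Raw weight storage in decimal GB, ignoring scales and other overhead."""
    return n_params * avg_bits / 8 / 1e9


N_PARAMS = 27e9   # assumption: nominal size implied by "27B"
AVG_BITS = 4.15   # from the specs above

frac = critical_fraction(AVG_BITS)
size_gb = weight_gigabytes(N_PARAMS, AVG_BITS)

print(f"~{frac:.1%} of weights at 6-bit, ~{size_gb:.1f} GB of raw weights")
```

Under these assumptions the estimate comes out somewhat below the reported 14.8 GB of GPU memory. The card does not break down the difference, so treat the gap as unexplained overhead rather than a measured quantity.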

Install

pip install "jang[mlx]"
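After installing, it can help to confirm that the modules imported by the examples in this card are visible to the current Python environment. This is a minimal sketch: the module names are taken from those imports, whether the `jang[mlx]` extra installs all of them is an assumption, and `missing_modules` is a hypothetical helper.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level modules imported by the Quick Start and VLM examples in this card.
REQUIRED = ["jang_tools", "mlx_lm", "mlx_vlm"]

missing = missing_modules(REQUIRED)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required modules found.")
```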

Quick Start

from jang_tools.loader import load_jang_model
from mlx_lm.sample_utils import make_sampler
from mlx_lm.generate import generate_step
import mlx.core as mx

# Load the quantized weights and tokenizer from the Hugging Face repo.
model, tokenizer = load_jang_model("JANGQ-AI/Qwen3.5-27B-JANG_4S")
sampler = make_sampler(temp=0.7)
tokens = tokenizer.encode("What is photosynthesis?")

# Stream up to 200 tokens, one at a time.
for tok, _ in generate_step(prompt=mx.array(tokens), model=model, max_tokens=200, sampler=sampler):
    # Accept either an array scalar or a plain int.
    t = tok.item() if hasattr(tok, 'item') else int(tok)
    # Stop before printing so the EOS marker does not appear in the output.
    if t == tokenizer.eos_token_id:
        break
    print(tokenizer.decode([t]), end="", flush=True)
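The loop above ends early only when a generated token equals eos_token_id, so a token cap serves as a second guard against runaway output. The sketch below isolates that stopping logic in a small generator that works with any token source. `stream_until_eos` is a hypothetical helper, and the toy vocabulary stands in for a real tokenizer so the snippet runs without a model.

```python
from typing import Callable, Iterable, Iterator, List


def stream_until_eos(
    token_ids: Iterable[int],
    decode: Callable[[List[int]], str],
    eos_token_id: int,
    max_tokens: int = 200,
) -> Iterator[str]:
    """Yield decoded text per token, stopping at EOS or after max_tokens tokens."""
    for count, tok in enumerate(token_ids):
        if count >= max_tokens or tok == eos_token_id:
            break
        yield decode([tok])


# Toy stand-in for a tokenizer: id 0 plays the role of EOS.
vocab = {0: "<eos>", 1: "Light", 2: " becomes", 3: " sugar"}


def fake_decode(ids: List[int]) -> str:
    return "".join(vocab[i] for i in ids)


# Tokens after the EOS id are never decoded.
text = "".join(stream_until_eos([1, 2, 3, 0, 1], fake_decode, eos_token_id=0))
print(text)
```

With a real model, `token_ids` would be the integer ids produced by the generation loop and `decode` would be `tokenizer.decode`.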

VLM Inference

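from jang_tools.loader import load_jang_vlm_model
from mlx_vlm import generate

image_path = "photo.jpg"

# Load the model together with its processor (tokenizer plus image preprocessing).
model, processor = load_jang_vlm_model("JANGQ-AI/Qwen3.5-27B-JANG_4S")

# Build a chat prompt with one image part and one text part; thinking mode is disabled.
prompt = processor.tokenizer.apply_chat_template(
    [{"role": "user", "content": [
        {"type": "image", "image": image_path},
        {"type": "text", "text": "Describe this image."}
    ]}], add_generation_prompt=True, tokenize=False, enable_thinking=False)

# Pass the same image path to generate so it matches the image part in the prompt.
result = generate(model, processor, prompt, [image_path], max_tokens=200)
print(result.text)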
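In the example above, the image path appears in the chat message and again in the list passed to generate, and the two need to stay in sync. One way to avoid a mismatch is to build the message once and derive the image list from it. Both helpers below are hypothetical and use only plain Python data structures in the same message shape as the example.

```python
from typing import Dict, List


def build_image_messages(image_path: str, question: str) -> List[Dict]:
    """Build a single-turn user message with one image part and one text part."""
    return [{"role": "user", "content": [
        {"type": "image", "image": image_path},
        {"type": "text", "text": question},
    ]}]


def image_paths(messages: List[Dict]) -> List[str]:
    """Collect every image path referenced in the messages, in order."""
    return [
        part["image"]
        for message in messages
        for part in message["content"]
        if part["type"] == "image"
    ]


messages = build_image_messages("photo.jpg", "Describe this image.")
images = image_paths(messages)
print(images)
```

The `messages` list could then be passed to apply_chat_template and `images` to generate, in place of the two hard-coded paths.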


Summary (translated from Korean)

Qwen3.5-27B — JANG_4S

JANG is a mixed-precision quantization format for Apple Silicon.

| Model | MMLU | Size | Speed |
|---|---|---|---|
| JANG_4S | 84.5% | 16 GB | 35 tok/s |
| MLX 4-bit | 84.5% | 14 GB | 20 tok/s |

pip install "jang[mlx]"


Created by Jinho Jang (장진호) · jangq.ai · @dealignai
