Voxtral 4B TTS, MLX 4-bit Quantized

4-bit quantized weights for Voxtral 4B TTS on Apple Silicon via MLX.

  • Backbone: 4-bit (group_size=64), ~2.6 GB (down from ~6.8 GB BF16)
  • Acoustic transformer: BF16 (unchanged)
  • Vocoder: BF16, pre-processed (weight-norm reconstructed, conv weights transposed, codebook precomputed)

Total file size: 3.4 GB (vs 7.5 GB BF16)
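As a rough sanity check on the sizes above: MLX-style affine 4-bit quantization stores, per group of 64 weights, the packed 4-bit values plus one scale and one bias (16 bits each, per my understanding of the scheme), for 4.5 effective bits per weight; any gap to the ~2.6 GB figure would come from tensors left in higher precision.

```python
# Back-of-the-envelope storage cost of affine 4-bit quantization with
# group_size=64 (assumes one fp16 scale and one fp16 bias per group).
group_size = 64
bits = 4
bits_per_group = group_size * bits + 16 + 16  # packed weights + scale + bias
effective_bits = bits_per_group / group_size
print(effective_bits)  # 4.5 bits per weight, vs 16 for BF16
```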

Usage

Requires the inference code from redseaplume/Voxtral-4B-TTS-2603-MLX; this repo holds only the weights, config, and tokenizer. Point model_path at this repo.

What's in here

  • consolidated.safetensors: all three components in one file
  • params.json: model config
  • tekken.json: tokenizer
  • voice_embedding/: 20 pre-computed voice embeddings (.pt and .npz)
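The .npz voice embeddings can be inspected with plain NumPy. A minimal sketch of the round trip, using a stand-in file; the filename, key name, and shape here are illustrative assumptions, not the repo's actual layout (check a real file with np.load(...).files):

```python
import numpy as np

# Write and read back a stand-in voice embedding to show the .npz round trip.
np.savez("demo_voice.npz", embedding=np.zeros((1, 512), dtype=np.float32))
emb = np.load("demo_voice.npz")["embedding"]
print(emb.shape, emb.dtype)  # (1, 512) float32
```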

Notes

  • Only the backbone is quantized. Acoustic transformer and vocoder stay BF16.
  • Generation output differs slightly from BF16 (quantization is lossy). Frame counts may vary.
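A toy illustration of why the output differs: affine quantization over one group of 64 values (a simplification of the actual MLX kernels) reconstructs weights only to within half a quantization step.

```python
import numpy as np

# Quantize one group of 64 weights to 4 bits (16 levels) and reconstruct.
rng = np.random.default_rng(0)
w = rng.standard_normal(64).astype(np.float32)
scale = (w.max() - w.min()) / 15      # 4 bits -> 16 levels
q = np.round((w - w.min()) / scale)   # integer codes in [0, 15]
w_hat = q * scale + w.min()           # dequantized weights
err = np.abs(w - w_hat).max()
print(err <= scale / 2 + 1e-6)        # True: error bounded by half a step
```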