Voxtral 4B TTS, MLX 4-bit Quantized
4-bit quantized weights for Voxtral 4B TTS on Apple Silicon via MLX.
- Backbone: 4-bit (group_size=64), ~2.6 GB (down from ~6.8 GB BF16)
- Acoustic transformer: BF16 (unchanged)
- Vocoder: BF16, pre-processed (weight-norm reconstructed, conv weights transposed, codebook precomputed)
Total file size: 3.4 GB (vs 7.5 GB BF16)
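As a rough sanity check on the sizes above: MLX's grouped affine quantization packs the 4-bit values and stores one FP16 scale and one FP16 bias per group of `group_size` weights, so the effective storage cost per weight can be sketched as below (the ~2.6 GB backbone figure also includes layers left unquantized, so it won't match this exactly):

```python
def bits_per_weight(bits=4, group_size=64, scale_bits=16, bias_bits=16):
    """Effective storage cost per weight for grouped affine quantization:
    the packed low-bit value plus one scale and one bias shared per group."""
    return bits + (scale_bits + bias_bits) / group_size

print(bits_per_weight())  # 4.5 bits per weight, vs 16 for BF16 (~3.6x smaller)
```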
Usage
Requires the code from redseaplume/Voxtral-4B-TTS-2603-MLX. Point model_path at this repo.
What's in here
- consolidated.safetensors: all three components in one file
- params.json: model config
- tekken.json: tokenizer
- voice_embedding/: 20 pre-computed voice embeddings (.pt and .npz)
Notes
- Only the backbone is quantized. Acoustic transformer and vocoder stay BF16.
- Generation output differs slightly from BF16 (quantization is lossy). Frame counts may vary.
Model tree for redseaplume/Voxtral-4B-TTS-2603-MLX-4bit
- Base model: mistralai/Ministral-3-3B-Base-2512
- Finetuned: mistralai/Voxtral-4B-TTS-2603