Voxtral-4B-TTS-2603 (MLX 6bit)

A 6-bit MLX quantization of mistralai/Voxtral-4B-TTS-2603, a 4B-parameter multilingual text-to-speech model with 20 voice presets across 9 languages.

Size: ~3.5GB

Use with mlx-audio

pip install -U mlx-audio

from mlx_audio.tts.utils import load

model = load("mlx-community/Voxtral-4B-TTS-2603-mlx-6bit")

for result in model.generate(
    text="Hello, this is a test of Voxtral text-to-speech!",
    voice="casual_male",
):
    # result.audio is an mx.array of 24kHz audio samples
    print(f"Generated {result.audio_duration} of audio")
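
To keep the generated audio, the samples can be written to a WAV file. The following is a minimal sketch using only the Python standard library. It assumes the samples are mono floats in the range [-1.0, 1.0] at the 24 kHz rate mentioned in the comment above; `save_wav` is a hypothetical helper name, not part of mlx-audio.

```python
import struct
import wave

SAMPLE_RATE = 24_000  # assumed from the "24kHz" comment in the example above


def save_wav(samples, path, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Hypothetical helper for illustration; returns the number of frames written.
    """
    pcm = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip out-of-range values
        pcm += struct.pack("<h", int(round(s * 32767)))  # little-endian int16
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(pcm))
    return len(pcm) // 2
```

Inside the generation loop this might be called as `save_wav(np.array(result.audio).tolist(), "out.wav")`, assuming `result.audio` converts to a NumPy array of floats; that conversion is an assumption and is not confirmed by this card.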

Available Voices

English: casual_male, casual_female, cheerful_female, neutral_male, neutral_female

French: fr_male, fr_female | Spanish: es_male, es_female | German: de_male, de_female

Italian: it_male, it_female | Portuguese: pt_male, pt_female | Dutch: nl_male, nl_female

Arabic: ar_male | Hindi: hi_male, hi_female
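
The preset names above can be kept in a small lookup table so a voice is chosen by language rather than typed by hand. This is an illustrative sketch only: `VOICES` and `pick_voice` are hypothetical names, and the mapping simply transcribes the list above.

```python
# Voice presets transcribed from the list above (20 presets, 9 languages).
VOICES = {
    "English": ["casual_male", "casual_female", "cheerful_female",
                "neutral_male", "neutral_female"],
    "French": ["fr_male", "fr_female"],
    "Spanish": ["es_male", "es_female"],
    "German": ["de_male", "de_female"],
    "Italian": ["it_male", "it_female"],
    "Portuguese": ["pt_male", "pt_female"],
    "Dutch": ["nl_male", "nl_female"],
    "Arabic": ["ar_male"],
    "Hindi": ["hi_male", "hi_female"],
}


def pick_voice(language, gender=None):
    """Return the first preset for a language, optionally filtered by gender.

    Hypothetical helper; raises KeyError for an unknown language and
    ValueError when no preset matches the requested gender.
    """
    presets = VOICES[language]
    if gender is None:
        return presets[0]
    for name in presets:
        if name.endswith("_" + gender):
            return name
    raise ValueError(f"no {gender} preset for {language}")
```

The returned name can then be passed as the `voice` argument, for example `voice=pick_voice("German", "female")`.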

Throughput (Apple Silicon)

Variant   Short RTF   Long RTF   Size
4-bit     0.97x       0.74x      ~2.5GB
6-bit     1.15x       1.07x      ~3.5GB
bf16      6.50x       6.32x      ~8GB

RTF = Real-Time Factor: generation time divided by the duration of the generated audio (lower is faster; below 1.0 is faster than real time).
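
As a rough illustration of how an RTF figure can be computed, the sketch below uses the conventional definition (processing time divided by audio duration). How the numbers in the table were actually measured is not stated here, so treat this as an assumption; `real_time_factor` and `timed` are hypothetical helper names.

```python
import time


def real_time_factor(processing_seconds, audio_seconds):
    """Conventional RTF: time spent generating / duration of audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Example: 2.3 s of compute for 2.0 s of audio is slower than real time.
rtf = real_time_factor(2.3, 2.0)
print(f"{rtf:.2f}")  # 1.15
```

Under this definition a 10-second clip generated at an RTF of 0.74 would take about 7.4 seconds of compute.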
