Voxtral-4B-TTS-2603 (MLX 6bit)

6-bit MLX quantization of mistralai/Voxtral-4B-TTS-2603, a 4B-parameter multilingual text-to-speech model with 20 voice presets across 9 languages.

Size: ~3.5GB

Use with mlx-audio

pip install -U mlx-audio

from mlx_audio.tts.utils import load

model = load("mlx-community/Voxtral-4B-TTS-2603-mlx-6bit")

for result in model.generate(
    text="Hello, this is a test of Voxtral text-to-speech!",
    voice="casual_male",
):
    # result.audio is an mx.array of 24kHz audio samples
    print(f"Generated {result.audio_duration} of audio")
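To write the generated samples to disk, you can use numpy and the standard-library wave module. The sketch below is one way to do it, not an mlx-audio API. It assumes that `result.audio` converts with `np.array(result.audio)` to a mono float array with values in [-1.0, 1.0] at the 24 kHz rate stated above. `save_wav` is a hypothetical helper name.

```python
import wave

import numpy as np

SAMPLE_RATE = 24_000  # output rate stated in this card (24kHz)


def save_wav(samples, path, sample_rate=SAMPLE_RATE):
    """Hypothetical helper: write mono float samples (assumed in [-1, 1]) as 16-bit PCM WAV."""
    audio = np.asarray(samples, dtype=np.float32).reshape(-1)  # assume mono; flatten (1, N) -> (N,)
    # Clip to the valid range, then scale to little-endian int16 as WAV expects.
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
    return len(pcm) / sample_rate  # duration of the written audio in seconds
```

For example, inside the loop above: `save_wav(np.array(result.audio), "voxtral.wav")`.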

Available Voices

English: casual_male, casual_female, cheerful_female, neutral_male, neutral_female

French: fr_male, fr_female | Spanish: es_male, es_female | German: de_male, de_female

Italian: it_male, it_female | Portuguese: pt_male, pt_female | Dutch: nl_male, nl_female

Arabic: ar_male | Hindi: hi_male, hi_female
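If you pick voices programmatically, a small lookup table can catch typos before `model.generate` is called. The preset names below are copied from the list above. `VOICES` and `language_of` are hypothetical helpers for illustration and are not part of mlx-audio.

```python
# Voice presets listed in this card, grouped by language.
VOICES = {
    "English": ["casual_male", "casual_female", "cheerful_female", "neutral_male", "neutral_female"],
    "French": ["fr_male", "fr_female"],
    "Spanish": ["es_male", "es_female"],
    "German": ["de_male", "de_female"],
    "Italian": ["it_male", "it_female"],
    "Portuguese": ["pt_male", "pt_female"],
    "Dutch": ["nl_male", "nl_female"],
    "Arabic": ["ar_male"],
    "Hindi": ["hi_male", "hi_female"],
}


def language_of(voice):
    """Hypothetical helper: return the language of a preset, or raise on an unknown name."""
    for language, names in VOICES.items():
        if voice in names:
            return language
    known = sorted(name for names in VOICES.values() for name in names)
    raise ValueError(f"unknown voice {voice!r}; expected one of {known}")
```

For example, `language_of("fr_female")` returns `"French"`, while a misspelled name such as `"fr_femal"` raises a `ValueError` listing the valid presets.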

Throughput (Apple Silicon)

Variant	Short RTF	Long RTF	Size
4-bit	0.97x	0.74x	~2.5GB
6-bit	1.15x	1.07x	~3.5GB
bf16	6.50x	6.32x	~8GB

RTF = Real-Time Factor: generation time divided by the duration of the generated audio. Lower is faster; values below 1.0 mean faster than real time.
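As a worked example of the definition, the sketch below computes RTF from a measured generation time and a sample count at 24 kHz. It also turns an RTF from the table into an expected generation time. It assumes RTF = generation time / audio duration, which matches "lower is faster". mlx-audio may report its own RTF value, and that value's exact definition is not confirmed here. Both function names are hypothetical.

```python
SAMPLE_RATE = 24_000  # samples per second, as stated in this card


def real_time_factor(processing_seconds, num_samples, sample_rate=SAMPLE_RATE):
    """Hypothetical helper: RTF = generation time / duration of generated audio."""
    audio_seconds = num_samples / sample_rate
    return processing_seconds / audio_seconds


def estimated_generation_time(audio_seconds, rtf):
    """Hypothetical helper: time needed to generate `audio_seconds` of audio at a given RTF."""
    return audio_seconds * rtf


# 240,000 samples at 24 kHz is 10 s of audio; generating it in 12 s gives RTF 1.2.
print(real_time_factor(12.0, 240_000))
# With the 6-bit long-text RTF of 1.07, one minute of audio takes about 64.2 s.
print(estimated_generation_time(60.0, 1.07))
```

To measure RTF yourself, time the `model.generate` loop with `time.perf_counter()` and pass in the total number of samples produced.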

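Weights: Safetensors (MLX format); tensor types BF16 and U32. The Hub reports a model size of 1B params because it counts the stored tensors, in which the 6-bit weights are packed into U32 values.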

Model tree for mlx-community/Voxtral-4B-TTS-2603-mlx-6bit

Base model: mistralai/Ministral-3-3B-Base-2512
Fine-tuned: mistralai/Voxtral-4B-TTS-2603
Quantized: this model (one of 4 quantized versions listed)