[Figure: Moshi architecture adapted for Hindi. User and Moshi audio pass through the frozen Mimi encoder and decoder as audio tokens. The Temporal Transformer (7B language model, self-attention layers) and the Depth Transformer (causal self-attention, conditioned on z_s) are retained. Their text embeddings and the text linear head are reinitialised for Hindi (✦). A new Hindi SentencePiece tokenizer (★) maps Hindi text to the Hindi vocabulary. Legend: frozen, retained, reinitialised, new (Hindi).]