NLLB-200 Distilled 600M Lezgi-Russian (v1)

This repository provides an NLLB-200 Distilled 600M model fine-tuned for Lezgi <-> Russian translation.

Model Description

Base model: facebook/nllb-200-distilled-600M
Architecture: M2M100ForConditionalGeneration
Languages: Lezgi (lez_Cyrl), Russian (ru_Cyrl)
Direction: bidirectional (Lezgi <-> Russian)
Tokenizer: NllbTokenizer with SentencePiece model

Intended Uses

Machine translation between Lezgi and Russian.
Bootstrapping parallel data or assisting human translation workflows.

Limitations and Bias

Translation quality may vary across domains and dialects.
The model may produce hallucinations or incorrect translations.
Biases present in the training data may be reflected in outputs.
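
Because outputs can include hallucinations, it may help to screen translations before reusing them, for example when bootstrapping parallel data. The sketch below shows one common heuristic: a character-length ratio check combined with a repeated-token check. The function name `looks_suspicious` and both thresholds are illustrative assumptions, not values tuned or used for this model.

```python
def looks_suspicious(source: str, translation: str,
                     max_ratio: float = 3.0, max_repeat: int = 4) -> bool:
    """Flag a translation that is empty, very different in length, or looping.

    Both thresholds are arbitrary placeholders; tune them on real data.
    """
    src, tgt = source.strip(), translation.strip()
    if not src or not tgt:
        return True

    # Character-length ratio: much longer or much shorter outputs are suspect.
    ratio = max(len(src), len(tgt)) / min(len(src), len(tgt))
    if ratio > max_ratio:
        return True

    # Degenerate looping: the same token repeated many times in a row.
    run = 1
    tokens = tgt.split()
    for prev, cur in zip(tokens, tokens[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_repeat:
            return True
    return False
```

A check like this only catches gross failures; fluent but wrong translations still need human review.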

How to Use

Install dependencies:

pip install transformers sentencepiece torch

Example (RU -> LEZ):

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "vadim-pashaev/nllb-200-distilled-600M-lez-rus-v1"
# src_lang selects the source-language tag the tokenizer adds to the input.
tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang="ru_Cyrl", tgt_lang="lez_Cyrl")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "Привет, как дела?"
# Encode as PyTorch tensors, generate, then decode the first (only) output.
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=200)  # upper bound on output length in tokens
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Example (LEZ -> RU):

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "vadim-pashaev/nllb-200-distilled-600M-lez-rus-v1"
# Same model as above; only the source and target language codes are swapped.
tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang="lez_Cyrl", tgt_lang="ru_Cyrl")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "Салам, гьикI я?"
# Encode as PyTorch tensors, generate, then decode the first (only) output.
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=200)  # upper bound on output length in tokens
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
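
Both directions can share one loaded tokenizer and model. The following is a minimal sketch of a batch helper, assuming the tokenizer's `src_lang` attribute can be reassigned between calls (as it can for `NllbTokenizer`). The names `translate` and `SRC_LANG`, the direction labels, and the default `batch_size` are hypothetical and not part of this repository.

```python
# Direction label -> source-language code, using the codes from this model card.
SRC_LANG = {
    "rus-lez": "ru_Cyrl",
    "lez-rus": "lez_Cyrl",
}


def translate(texts, direction, tokenizer, model, max_length=200, batch_size=8):
    """Translate a list of sentences in the given direction, batch by batch."""
    if direction not in SRC_LANG:
        raise ValueError(f"unknown direction: {direction!r}")

    # Set the source-language tag on the tokenizer before encoding.
    tokenizer.src_lang = SRC_LANG[direction]

    results = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        # Pad to the longest sentence so the batch forms a single tensor.
        inputs = tokenizer(batch, return_tensors="pt", padding=True)
        outputs = model.generate(**inputs, max_length=max_length)
        results.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return results
```

With `tokenizer` and `model` loaded as in the examples above, `translate(["Привет, как дела?"], "rus-lez", tokenizer, model)` would return a list containing one Lezgi string.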

Training Data

Training data was built from Lezgi-language articles taken from Lezgi Wikipedia and the Lezgi Gazet website. The Lezgi texts were machine-translated into Russian with the gpt-5.2-codex (medium) model, so the Russian side of the parallel corpus is synthetic. The resulting parallel data was used to train this model.
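
The Training Procedure section notes that both translation directions come from the same TSV rows. A minimal sketch of that expansion is shown here, assuming each row has exactly two tab-separated columns (Lezgi text, then Russian text). The column order, the function name `bidirectional_examples`, the dictionary keys, and the sample row are illustrative assumptions; the actual file layout is not documented in this card.

```python
import csv
import io


def bidirectional_examples(tsv_file):
    """Yield two training examples (one per direction) for each TSV row.

    Assumes two tab-separated columns per row: Lezgi text, then Russian text.
    """
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 2:
            continue  # skip malformed rows
        lez, rus = (cell.strip() for cell in row)
        if not lez or not rus:
            continue  # skip rows with an empty side
        yield {"src_lang": "lez_Cyrl", "tgt_lang": "ru_Cyrl", "src": lez, "tgt": rus}
        yield {"src_lang": "ru_Cyrl", "tgt_lang": "lez_Cyrl", "src": rus, "tgt": lez}


# Tiny in-memory stand-in for a real TSV file (made-up sample row).
sample = io.StringIO("Салам\tПривет\n")
examples = list(bidirectional_examples(sample))
print(len(examples))  # 2
```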

Training Procedure

Training settings:

Base model: facebook/nllb-200-distilled-600M (NLLB-200 Distilled 600M).
Data setup: bidirectional pairs (Lez->Rus and Rus->Lez) from the same TSV rows.
Max lengths: 192 (source) / 192 (target).
Batch size: 2 per device, gradient accumulation 16 (effective 32 per device).
Epochs: 6.
Optimizer: AdamW, LR 3e-5, cosine scheduler, warmup ratio 0.03, weight decay 0.01, label smoothing 0.0.
Precision: bf16 enabled, tf32 enabled, fp16 disabled.
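
For readers who want to reproduce a similar setup, the settings above can be written as keyword arguments named after the fields of Hugging Face `Seq2SeqTrainingArguments`. This is a sketch under assumptions: the card does not state which training framework was used, and the `optim` value is a guess at the AdamW variant. The constant names are hypothetical.

```python
# Keyword arguments named after Hugging Face `Seq2SeqTrainingArguments` fields.
TRAINING_KWARGS = {
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 16,
    "num_train_epochs": 6,
    "optim": "adamw_torch",  # assumption: the card only says "AdamW"
    "learning_rate": 3e-5,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.03,
    "weight_decay": 0.01,
    "label_smoothing_factor": 0.0,
    "bf16": True,
    "tf32": True,
    "fp16": False,
}

MAX_SOURCE_LENGTH = 192  # tokens
MAX_TARGET_LENGTH = 192  # tokens

# Sequences contributing to each optimizer step, per device.
effective_batch = (
    TRAINING_KWARGS["per_device_train_batch_size"]
    * TRAINING_KWARGS["gradient_accumulation_steps"]
)
print(effective_batch)  # 32
```

Assuming `transformers` is installed, these could be passed as `Seq2SeqTrainingArguments(output_dir="out", **TRAINING_KWARGS)`, where `output_dir` is a placeholder. The `bf16` and `tf32` flags require hardware that supports them.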

Model Versioning

This is version v1. Future updates will be released under new tags or versions.

Citation

If you use this model, please cite:

@misc{nllb_lez_rus_v1,
  title = {NLLB-200 Distilled 600M Lezgi-Russian (v1)},
  author = {Vadim Pashaev},
  year = {2026},
  howpublished = {Hugging Face Hub}
}

License

cc-by-4.0 (Creative Commons Attribution 4.0).

Weights format: Safetensors
Model size: 0.6B params
Tensor type: F32
