Model Card for gemma-3-27b-it-dassle-gec
“gemma-3-27b-it-dassle-gec” is an instruction-tuned 27B Gemma 3 model fine-tuned with LoRA and the Unsloth library for general grammatical error correction on the DASSLE 1.0 dataset (Dataset of Authentic and Synthetic Slovene Language Errors). The dataset was split into training/validation/test sets using stratified sampling, with 80% of each coarse error category (e.g., Orthography/Zapis) assigned to the training set, 10% to the validation set, and 10% to the test set. The best model was selected based on the validation loss.
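The stratified 80/10/10 split described above can be sketched as follows; the function and field names here are hypothetical and only illustrate the per-category sampling, not the actual DASSLE preprocessing code.

```python
import random
from collections import defaultdict

def stratified_split(examples, category_key, seed=0):
    """Split examples 80/10/10 so each coarse error category is
    represented proportionally in train/validation/test (illustrative sketch)."""
    by_category = defaultdict(list)
    for example in examples:
        by_category[example[category_key]].append(example)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for items in by_category.values():
        rng.shuffle(items)
        n_train = int(len(items) * 0.8)
        n_val = int(len(items) * 0.1)
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]
    return train, val, test
```

Shuffling within each category before slicing keeps the split random while preserving the category proportions across the three sets.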
On the test set the model achieves an exact match score of 0.462.
Note that this is a very strict metric; overall the model performs decently, although certain errors remain beyond its learned knowledge (likely because the relatively small dataset contains isolated error types).
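Exact match is a strict all-or-nothing metric: a prediction scores only if it is character-for-character identical to the reference. A minimal sketch (the whitespace normalization here is an assumption, not necessarily how the reported score was computed):

```python
def exact_match(predictions, references):
    """Fraction of predictions identical to their references,
    after stripping leading/trailing whitespace (assumed normalization)."""
    assert len(predictions) == len(references)
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(predictions)
```

Under this metric, a correction that fixes the targeted error but differs anywhere else in the sentence still counts as a miss, which is why 0.462 understates the model's practical usefulness.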
The model was trained in a conversational setting, where the user inputs a text and the assistant returns the corrected text. Given the user input “To besedilo usebuje slovnično napako.” and the system prompt "Si slovenski slovnični popravljalnik. Za uporabniško besedilo v slovenščini zgeneriraj slovnično pravilno verzijo besedila. Ne parafraziraj vhoda po nepotrebnem." (in English: "You are a Slovene grammar corrector. For the user's text in Slovene, generate a grammatically correct version of the text. Do not paraphrase the input unnecessarily."),
the assistant should generate the corrected text “To besedilo vsebuje slovnično napako.” ("This text contains a grammatical error.").
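The conversational usage above can be sketched with the Transformers chat interface; this is an untested illustration, and details such as generation parameters and the exact output format may differ.

```python
SYSTEM_PROMPT = (
    "Si slovenski slovnični popravljalnik. Za uporabniško besedilo v "
    "slovenščini zgeneriraj slovnično pravilno verzijo besedila. "
    "Ne parafraziraj vhoda po nepotrebnem."
)

def build_messages(text):
    """Wrap a user text in the system + user chat format the model was trained on."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": text},
    ]

def correct(text, model_id="cjvt/gemma-3-27b-it-dassle-gec"):
    """Load the model and return the corrected text (requires GPU resources)."""
    from transformers import pipeline  # lazy import: heavy dependency

    generator = pipeline("text-generation", model=model_id, device_map="auto")
    output = generator(build_messages(text), max_new_tokens=256)
    # The pipeline returns the conversation with the assistant reply appended.
    return output[0]["generated_text"][-1]["content"]
```

Calling `correct("To besedilo usebuje slovnično napako.")` should return the corrected sentence shown above.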
Acknowledgment
The work was primarily supported by the Slovene Research and Innovation Agency (ARIS) project GC-0002 and the core research programme P6-0411.
The work was also supported by the EU through ERA Chair grant no. 101186647 (AI4DH), and by the EC/EuroHPC JU and the Slovenian Ministry of Higher Education, Science and Innovation via the project SLAIF (grant number 101254461).
Model tree for cjvt/gemma-3-27b-it-dassle-gec
Base model: google/gemma-3-27b-pt