New model family: https://huggingface.co/m-polignano/ANITA-NEXT-24B-Magistral-2506-VISION-ITA
"Built with Meta Llama 3".
LLaMAntino-3-ANITA-8B-Inst-DPO-ITA is a model of the LLaMAntino - Large Language Models family. The model is an instruction-tuned version of Meta-Llama-3-8b-instruct (a fine-tuned LLaMA 3 model). This model version aims to be a multilingual model (EN + ITA) suitable for further fine-tuning on specific tasks in Italian.
The ANITA project *(Advanced Natural-based interaction for the ITAlian language)*
aims to provide Italian NLP researchers with an improved model for Italian-language use cases.
Live DEMO: https://chat.llamantino.it/
The demo works only from an Italian internet connection.
Model Details
Last Update: 10/05/2024
https://github.com/marcopoli/LLaMAntino-3-ANITA
Specifications
- Model developers: Ph.D. Marco Polignano - University of Bari Aldo Moro, Italy; SWAP Research Group
- Variations: The released model was trained with supervised fine-tuning (SFT) using 4-bit QLoRA on instruction-based datasets. DPO over the mlabonne/orpo-dpo-mix-40k dataset was then used to align the model with human preferences for helpfulness and safety.
- Input: The model accepts text input only.
- Language: Multilingual + Italian
- Output: The model generates text and code only.
- Model Architecture: Llama 3 architecture.
- Context length: 8K (8,192 tokens).
- Library Used: Unsloth
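For readers unfamiliar with the DPO step mentioned above, the per-pair objective can be sketched in plain Python. This is a minimal illustration of the standard DPO loss, not the training code used for this model; the function name `dpo_pair_loss`, the `beta` value, and the example log-probabilities are assumptions chosen for illustration.

```python
import math


def dpo_pair_loss(policy_chosen_logp, policy_rejected_logp,
                  ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Inputs are summed log-probabilities of each response under the policy
    being trained and under the frozen reference model. beta=0.1 is an
    illustrative value, not one reported for this model.
    """
    # Log-ratio of policy to reference for each response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical log-probabilities: the policy prefers the chosen answer
# more strongly than the reference does, so the loss falls below log(2).
print(dpo_pair_loss(-5.0, -20.0, -10.0, -10.0))
```

When the policy and the reference assign identical log-probabilities, the margin is zero and the loss equals log(2); training pushes the margin up and the loss down.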
Playground
There are several ways to use the model directly; choose one of the following.
Prompt Template
<|start_header_id|>system<|end_header_id|>
{ SYS Prompt }<|eot_id|><|start_header_id|>user<|end_header_id|>
{ USER Prompt }<|eot_id|><|start_header_id|>assistant<|end_header_id|>
{ ASSIST Prompt }<|eot_id|>
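The template above can be assembled by hand as in the sketch below. It mirrors the layout exactly as printed here, with a single newline after each header; the helper name `build_prompt` is hypothetical, and the tokenizer's own chat template remains the authoritative source (it may differ in whitespace or add a leading begin-of-text token).

```python
def build_prompt(system: str, user: str) -> str:
    """Format a single-turn prompt following the template shown above.

    The string ends with the assistant header so that the model
    continues with its own answer.
    """
    return (
        "<|start_header_id|>system<|end_header_id|>\n"
        f"{system}<|eot_id|><|start_header_id|>user<|end_header_id|>\n"
        f"{user}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n"
    )


print(build_prompt("Sei un assistente AI.", "Chi è Carlo Magno?"))
```

In practice, `tokenizer.apply_chat_template` is the safer choice, since it reads the template shipped with the model.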
Transformers
For direct use with transformers, you can easily get started with the following steps.
First, install transformers and the other required packages with pip:

    pip install -U transformers trl peft accelerate bitsandbytes

You can then start using the model directly.

    import torch
    import transformers
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base_model = "swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA"

    # Load the model in bfloat16 and place it on the available devices.
    model = AutoModelForCausalLM.from_pretrained(
        base_model,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    tokenizer = AutoTokenizer.from_pretrained(base_model)

    # System prompt (Italian): introduces the assistant and asks it to answer
    # clearly in the language of the question.
    sys_prompt = "Sei un an assistente AI per la lingua Italiana di nome LLaMAntino-3 ANITA " \
        "(Advanced Natural-based interaction for the ITAlian language)." \
        " Rispondi nella lingua usata per la domanda in modo chiaro, semplice ed esaustivo."

    messages = [
        {"role": "system", "content": sys_prompt},
        {"role": "user", "content": "Chi è Carlo Magno?"},  # "Who is Charlemagne?"
    ]

    # Method 1: build the prompt with the chat template and call generate().
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, top_p=0.9, temperature=0.6)
    results = tokenizer.batch_decode(outputs)[0]  # includes the prompt and special tokens
    print(results)

    # Method 2: use a text-generation pipeline.
    pipe = transformers.pipeline(
        model=model,
        tokenizer=tokenizer,
        return_full_text=False,  # return only the new text; LangChain expects the full text
        task="text-generation",
        max_new_tokens=512,  # maximum number of tokens to generate
        temperature=0.6,  # lower values give less creative answers
        do_sample=True,
        top_p=0.9,
    )

    sequences = pipe(messages)
    for seq in sequences:
        print(f"{seq['generated_text']}")

You can also load the model with 4-bit quantization to reduce the required resources. You can start with the code below.

    import torch
    import transformers
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    base_model = "swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA"

    # 4-bit NF4 quantization with bfloat16 compute.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=False,
    )

    model = AutoModelForCausalLM.from_pretrained(
        base_model,
        quantization_config=bnb_config,
        device_map="auto",
    )
    tokenizer = AutoTokenizer.from_pretrained(base_model)

    # System prompt (Italian): introduces the assistant and asks it to answer
    # clearly in the language of the question.
    sys_prompt = "Sei un an assistente AI per la lingua Italiana di nome LLaMAntino-3 ANITA " \
        "(Advanced Natural-based interaction for the ITAlian language)." \
        " Rispondi nella lingua usata per la domanda in modo chiaro, semplice ed esaustivo."

    messages = [
        {"role": "system", "content": sys_prompt},
        {"role": "user", "content": "Chi è Carlo Magno?"},  # "Who is Charlemagne?"
    ]

    # Method 1: build the prompt with the chat template and call generate().
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, top_p=0.9, temperature=0.6)
    results = tokenizer.batch_decode(outputs)[0]  # includes the prompt and special tokens
    print(results)

    # Method 2: use a text-generation pipeline.
    pipe = transformers.pipeline(
        model=model,
        tokenizer=tokenizer,
        return_full_text=False,  # return only the new text; LangChain expects the full text
        task="text-generation",
        max_new_tokens=512,  # maximum number of tokens to generate
        temperature=0.6,  # lower values give less creative answers
        do_sample=True,
        top_p=0.9,
    )

    sequences = pipe(messages)
    for seq in sequences:
        print(f"{seq['generated_text']}")

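To give a rough sense of what 4-bit loading saves, the sketch below estimates the memory taken by the weights alone. The parameter count of 8e9 is an approximation implied by the "8B" in the model name, and the helper `weight_memory_gib` is hypothetical; activations, the KV cache, and quantization metadata are not counted, so real usage will be higher.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights only, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


N_PARAMS = 8e9  # assumption: rough parameter count implied by "8B"

print(f"bfloat16: {weight_memory_gib(N_PARAMS, 16):.1f} GiB")
print(f"4-bit   : {weight_memory_gib(N_PARAMS, 4):.1f} GiB")
```

Under these assumptions the weights take roughly 15 GiB in bfloat16 and under 4 GiB in 4-bit, a fourfold reduction.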
Evaluation
Open LLM Leaderboard:
Evaluated with lm-evaluation-harness for the Open Italian LLMs Leaderboard:

    lm_eval --model hf --model_args pretrained=HUGGINGFACE_MODEL_ID --tasks hellaswag_it,arc_it --device cuda:0 --batch_size auto:2
    lm_eval --model hf --model_args pretrained=HUGGINGFACE_MODEL_ID --tasks m_mmlu_it --num_fewshot 5 --device cuda:0 --batch_size auto:2

| Metric | Value |
|---|---|
| Avg. | 0.6160 |
| Arc_IT | 0.5714 |
| Hellaswag_IT | 0.7093 |
| MMLU_IT | 0.5672 |
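
As a quick sanity check, the "Avg." row appears to be the unweighted mean of the three task scores. The snippet below recomputes it from the values in the table; that the leaderboard uses a plain arithmetic mean is an assumption, though the numbers are consistent with it.

```python
# Scores copied from the table above.
scores = {"Arc_IT": 0.5714, "Hellaswag_IT": 0.7093, "MMLU_IT": 0.5672}

# Unweighted mean across tasks (assumed averaging scheme).
average = sum(scores.values()) / len(scores)
print(f"{average:.4f}")
```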
Unsloth
Unsloth is a great tool that helps us develop products easily and at a lower cost than expected.
Citation instructions
@misc{polignano2024advanced,
title={Advanced Natural-based interaction for the ITAlian language: LLaMAntino-3-ANITA},
author={Marco Polignano and Pierpaolo Basile and Giovanni Semeraro},
year={2024},
eprint={2405.07101},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
@misc{basile2023llamantino,
title={LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language},
author={Pierpaolo Basile and Elio Musacchio and Marco Polignano and Lucia Siciliani and Giuseppe Fiameni and Giovanni Semeraro},
year={2023},
eprint={2312.09993},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
@article{llama3modelcard,
title={Llama 3 Model Card},
author={AI@Meta},
year={2024},
url = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}
Acknowledgments
We acknowledge the support of the PNRR project FAIR - Future AI Research (PE00000013), Spoke 6 - Symbiotic AI (CUP H97G22000210007) under the NRRP MUR program funded by the NextGenerationEU.
Models are built on the Leonardo supercomputer with the support of CINECA-Italian Super Computing Resource Allocation, class C project IscrC_Pro_MRS (HP10CQO70G).

Open LLM Leaderboard Evaluation Results
Detailed results can be found here
| Metric | Value |
|---|---|
| Avg. | 75.12 |
| AI2 Reasoning Challenge (25-Shot) | 74.57 |
| HellaSwag (10-Shot) | 92.75 |
| MMLU (5-Shot) | 66.85 |
| TruthfulQA (0-shot) | 75.93 |
| Winogrande (5-shot) | 82.00 |
| GSM8k (5-shot) | 58.61 |
Model tree for swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA
- Base model: meta-llama/Meta-Llama-3-8B-Instruct
- Datasets used to train swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA: gsarti/clean_mc4_it, Chat-Error/wizard_alpaca_dolly_orca