NeoLLM

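This model is a fine-tuned version of an unspecified base model on an unknown dataset. At the final checkpoint (step 46875, epoch 1.0) it reaches a validation loss of 3.4451 on the evaluation set; the full per-checkpoint metrics are tabulated further down.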

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0006
train_batch_size: 64
eval_batch_size: 64
seed: 42
optimizer: fused PyTorch AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 0.1
num_epochs: 1
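
To make the schedule concrete, here is a minimal pure-Python sketch of a linear schedule with warmup, using the values listed above. It rests on assumptions that the card does not confirm: the fractional value lr_scheduler_warmup_steps: 0.1 is read as a warmup ratio (10% of all steps), the total step count is taken from the final row of the results table (46875), the warmup length is truncated to an integer, and training ran on a single device without gradient accumulation. The names linear_lr and samples_seen are illustrative only.

```python
LEARNING_RATE = 6e-4     # learning_rate from the list above
TOTAL_STEPS = 46875      # final step reported in the results table
WARMUP_FRACTION = 0.1    # ASSUMPTION: lr_scheduler_warmup_steps 0.1 read as a ratio
TRAIN_BATCH_SIZE = 64    # train_batch_size from the list above


def linear_lr(step, base_lr=LEARNING_RATE, total_steps=TOTAL_STEPS,
              warmup_fraction=WARMUP_FRACTION):
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    # ASSUMPTION: warmup length is truncated to a whole number of steps.
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr down to 0 at total_steps.
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


def samples_seen(step, batch_size=TRAIN_BATCH_SIZE):
    """Training examples consumed after `step` optimizer steps.

    ASSUMPTION: one device, no gradient accumulation.
    """
    return step * batch_size


if __name__ == "__main__":
    for step in (0, 4687, 25000, 46875):
        print(step, linear_lr(step), samples_seen(step))
```

Under these assumptions the learning rate peaks at 0.0006 once warmup ends and returns to zero at the last step, and one epoch corresponds to 46875 × 64 = 3,000,000 training examples.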

Training results

Training Loss	Epoch	Step	Validation Loss	Loss	Model Loss	Update Due	Beta	Clip Fraction	Mean Gain	Control L2	Ema Distance L2
4.3293	0.1067	5000	4.1294	0.2787	3.8522	1.0	0.0141	0.2085	0.2833	1.4828	36.7415
3.9588	0.2133	10000	3.7924	0.1655	3.5070	1.0	0.0100	0.1817	0.2260	1.4158	56.8642
3.8245	0.3200	15000	3.6735	0.1547	3.3874	1.0	0.0082	0.1703	0.2057	1.1435	67.8131
3.7585	0.4267	20000	3.6112	0.1556	3.3272	1.0	0.0071	0.1649	0.1956	1.2229	74.0324
3.7154	0.5333	25000	3.5712	0.1580	3.2889	1.0	0.0063	0.1612	0.1906	1.0322	78.7246
3.6791	0.6400	30000	3.5380	0.1542	3.2499	1.0	0.0058	0.1585	0.1869	1.2708	81.2646
3.6606	0.7467	35000	3.5148	0.1540	3.2230	1.0	0.0053	0.1595	0.1875	0.9860	83.7974
3.5787	0.8533	40000	3.4789	0.1541	3.1880	1.0	0.0050	0.1410	0.1689	0.5123	53.5547
3.5391	0.9600	45000	3.4501	0.1549	3.1584	1.0	0.0047	0.0000	0.1122	0.1787	20.1584
3.5324	1.0000	46875	3.4451	0.1548	3.1531	1.0	0.0046	0.0000	0.0626	0.0937	9.3472
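
One way to read the loss columns is as perplexity. The sketch below assumes that Validation Loss is a mean per-token cross-entropy in nats, which the card does not state; if the loss includes auxiliary terms or uses another unit, the conversion does not apply. The function name perplexity and the choice of checkpoints are illustrative.

```python
import math

# (step, validation loss) pairs copied from the first and last table rows.
CHECKPOINTS = [(5000, 4.1294), (46875, 3.4451)]


def perplexity(mean_cross_entropy_nats):
    """Perplexity implied by a mean per-token cross-entropy given in nats.

    ASSUMPTION: the reported loss is a plain token-level cross-entropy.
    """
    return math.exp(mean_cross_entropy_nats)


if __name__ == "__main__":
    for step, loss in CHECKPOINTS:
        print(f"step {step}: loss {loss:.4f} -> perplexity {perplexity(loss):.2f}")
```

The same conversion could be applied to the Model Loss column, with the same caveat about what that column actually measures.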

Safetensors

Model size: 0.1B params
Tensor type: I64, F32, BF16
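
For a rough sense of storage cost, the following sketch estimates checkpoint size from the parameter count and element width. It assumes "0.1B params" means exactly 100 million parameters (the figure is rounded) and that all weights share one dtype; the card lists I64, F32 and BF16 tensors but not how parameters are split among them, so the two printed values are only bounds. The helper checkpoint_size_mb is hypothetical.

```python
# Bytes per element for the tensor types listed on the card.
BYTES_PER_ELEMENT = {"I64": 8, "F32": 4, "BF16": 2}

APPROX_PARAMS = 100_000_000  # ASSUMPTION: "0.1B params" taken as exactly 1e8


def checkpoint_size_mb(num_params, dtype):
    """Approximate size in megabytes (10^6 bytes) if every parameter used `dtype`."""
    return num_params * BYTES_PER_ELEMENT[dtype] / 1e6


if __name__ == "__main__":
    for dtype in ("BF16", "F32"):
        print(dtype, checkpoint_size_mb(APPROX_PARAMS, dtype), "MB")
```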