Training Log for Trelis/whisper-small-atc-5s
==================================================
Base Model: openai/whisper-small
Train Dataset(s): Trelis/atc-train-1gb-5s
==================================================
[17:26:31] Starting pipeline...
[17:26:31] [Modal] Submitting training job to Modal (H100 GPU)...
[17:26:31] [Modal] Base model: openai/whisper-small
[17:26:31] [Modal] Training on 1 dataset(s):
[17:26:31] [Modal] - Trelis/atc-train-1gb-5s (split=train, weight=1.0)
[17:27:01] [Modal] Function call started (ID: fc-01KHC0MER...)
[17:27:01] [Modal] Starting training pipeline on H100 GPU
[17:27:01] [Modal] Model type: whisper
[17:27:01] [Modal] Embedding training: disabled
[17:27:01] [Modal] Language: english (code: en)
[17:27:01] [Modal] CUDA available: True
[17:27:01] [Modal] GPU: NVIDIA H100 80GB HBM3
[17:27:01] [Modal] GPU memory: 79.2GB total
[17:27:01] [Modal] ============================================================
[17:27:01] [Modal] PHASE 1: Baseline Evaluation
[17:27:01] [Modal] ============================================================
[17:27:01] [Modal] Loading validation dataset: Trelis/atc-test-0.5s (split=validation)
[17:27:04] [Modal] Loaded 50 validation samples
[17:27:04] [Modal] Timestamps disabled - evaluating without timestamps
[17:27:04] [Modal] [Baseline] Loading model: openai/whisper-small
[17:27:13] [Modal] [Baseline] Evaluating on 50 samples (without timestamps)
[17:27:16] [Modal] [Baseline] Progress: 10/50 samples
[17:27:18] [Modal] [Baseline] Progress: 20/50 samples
[17:27:20] [Modal] [Baseline] Progress: 30/50 samples
[17:27:22] [Modal] [Baseline] Progress: 40/50 samples
[17:27:24] [Modal] [Baseline] Progress: 50/50 samples
[17:27:25] [Modal] [Baseline] WER: 55.54% (50 samples)
[17:27:25] [Modal] ============================================================
[17:27:25] [Modal] PHASE 2: Training with Unsloth (Whisper)
[17:27:25] [Modal] ============================================================
[17:27:25] [Modal] Loading model: openai/whisper-small
[17:27:44] [Modal] Applying LoRA (rank=32, alpha=16)
[17:27:46] [Modal] GPU memory after model+LoRA: 0.5GB / 1.5GB peak
[17:27:47] [Modal] Loading 1 training dataset(s)...
[17:27:47] [Modal] [1/1] Loading Trelis/atc-train-1gb-5s (split=train, weight=1.0)
[17:27:54] [Modal] → 773 rows loaded
[17:27:54] [Modal] Combined training set ready (1 source dataset(s))
[17:27:58] [Modal] Using 16 CPU workers for preprocessing...
[17:27:59] [Modal] Dataset columns: ['audio', 'end_time', 'preconditioning', 'source_file', 'speech_duration', 'start_time', 'text', 'text_ts', 'word_timestamps']
[17:28:05] [Modal] Preprocessing complete in 6.8s
[17:28:05] [Modal] Dataset: 773 train, 50 validation
[17:28:05] [Modal] Using standard training (no timestamps)
[17:28:06] [Modal] Total steps: 192, warmup: 19, eval every 38
[17:28:08] [Modal] WandB logging enabled: https://wandb.ai/trelis/trelis-whisper/runs/9t25aj0s
[17:28:08] [Modal] Using bf16=True, fp16=False
[17:28:08] [Modal] Starting training...
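The log reports "Applying LoRA (rank=32, alpha=16)" but does not include the adapter code itself. Below is a minimal numeric sketch of what that configuration means for a single 768x768 projection matrix (768 is the Whisper-small hidden size); the update rule and zero-init of B are standard LoRA conventions, not taken from the pipeline:

```python
import numpy as np

# LoRA update for one weight matrix, using the settings from the log
# (rank=32, alpha=16). A sketch only, not the actual training code.
d, r, alpha = 768, 32, 16

rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))          # frozen base weight
A = rng.standard_normal((r, d)) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))                     # trainable, initialised to zero

scaling = alpha / r                      # 16 / 32 = 0.5
W_merged = W + scaling * (B @ A)         # the "Merging LoRA weights..." step

# With B initialised to zero, merging is a no-op before any training:
assert np.allclose(W_merged, W)

# Trainable params per matrix vs. full fine-tuning:
lora_params = A.size + B.size            # 2 * d * r = 49152
print(lora_params / W.size)              # ~0.083, i.e. ~8% of the matrix
```

This is why the "GPU memory after model+LoRA" line stays so low: only the small A/B factors carry gradients and optimizer state, while W remains frozen.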
[17:28:10] [Modal] Starting: 194 steps, 2 epoch(s)
[17:28:26] [Modal] Step 1/194 (1%) | 0.7GB / 1.5GB peak | Elapsed: 0.3m | Remaining (est.): 54.6m
[17:28:27] [Modal] → loss: 2.3299 | lr: 0.00e+00
[17:28:28] [Modal] → loss: 2.1221 | lr: 1.49e-05
[17:28:29] [Modal] → loss: 2.0013 | lr: 2.98e-05
[17:28:29] [Modal] → loss: 2.0711 | lr: 4.47e-05
[17:28:30] [Modal] → loss: 2.1186 | lr: 5.95e-05
[17:28:31] [Modal] → loss: 1.6283 | lr: 7.44e-05
[17:28:32] [Modal] → loss: 1.3640 | lr: 8.93e-05
[17:28:33] [Modal] → loss: 1.8667 | lr: 1.04e-04
[17:28:34] [Modal] → loss: 1.6002 | lr: 1.19e-04
[17:28:35] [Modal] → loss: 1.3362 | lr: 1.34e-04
[17:28:36] [Modal] → loss: 1.3025 | lr: 1.49e-04
[17:28:37] [Modal] → loss: 1.2618 | lr: 1.64e-04
[17:28:38] [Modal] → loss: 1.1943 | lr: 1.79e-04
[17:28:38] [Modal] → loss: 1.0101 | lr: 1.94e-04
[17:28:39] [Modal] → loss: 0.7797 | lr: 2.08e-04
[17:28:40] [Modal] → loss: 0.8930 | lr: 2.23e-04
[17:28:41] [Modal] → loss: 0.7497 | lr: 2.38e-04
[17:28:42] [Modal] → loss: 0.6871 | lr: 2.53e-04
[17:28:42] [Modal] Step 19/194 (10%) | 0.7GB / 1.7GB peak | Elapsed: 0.6m | Remaining (est.): 5.1m
[17:28:43] [Modal] → loss: 0.9477 | lr: 2.68e-04
[17:28:44] [Modal] → loss: 0.8965 | lr: 2.83e-04
[17:28:45] [Modal] → loss: 0.8514 | lr: 2.83e-04
[17:28:46] [Modal] → loss: 0.4901 | lr: 2.83e-04
[17:28:47] [Modal] → loss: 0.7095 | lr: 2.83e-04
[17:28:47] [Modal] → loss: 0.4924 | lr: 2.83e-04
[17:28:48] [Modal] → loss: 0.6651 | lr: 2.83e-04
[17:28:49] [Modal] → loss: 0.3633 | lr: 2.83e-04
[17:28:50] [Modal] → loss: 0.4902 | lr: 2.83e-04
[17:28:51] [Modal] → loss: 0.4987 | lr: 2.83e-04
[17:28:52] [Modal] → loss: 0.4494 | lr: 2.83e-04
[17:28:53] [Modal] → loss: 0.4992 | lr: 2.83e-04
[17:28:54] [Modal] → loss: 0.4559 | lr: 2.83e-04
[17:28:54] [Modal] → loss: 0.4858 | lr: 2.83e-04
[17:28:55] [Modal] → loss: 0.4252 | lr: 2.83e-04
[17:28:56] [Modal] → loss: 0.4182 | lr: 2.83e-04
[17:28:57] [Modal] → loss: 0.3006 | lr: 2.83e-04
[17:28:58] [Modal] → loss: 0.3929 | lr: 2.83e-04
[17:28:59] [Modal] → loss: 0.6386 | lr: 2.83e-04
[17:28:59] [Modal] Step 38/194 (20%) | 0.7GB / 1.8GB peak | Elapsed: 0.8m | Remaining (est.): 3.4m
[17:29:00] [Modal] → loss: 0.4370 | lr: 2.83e-04
[17:29:08] [Modal] → eval WER: 32.00%
[17:29:08] [Modal] → eval loss: 2.3745
[17:29:09] [Modal] → loss: 0.5572 | lr: 2.83e-04
[17:29:10] [Modal] → loss: 0.5190 | lr: 2.83e-04
[17:29:11] [Modal] → loss: 0.4496 | lr: 2.83e-04
[17:29:12] [Modal] → loss: 0.3577 | lr: 2.83e-04
[17:29:13] [Modal] → loss: 0.3778 | lr: 2.83e-04
[17:29:14] [Modal] → loss: 0.4176 | lr: 2.83e-04
[17:29:15] [Modal] → loss: 0.3373 | lr: 2.83e-04
[17:29:15] [Modal] → loss: 0.3582 | lr: 2.83e-04
[17:29:16] [Modal] → loss: 0.3692 | lr: 2.83e-04
[17:29:17] [Modal] → loss: 0.2781 | lr: 2.83e-04
[17:29:18] [Modal] → loss: 0.4876 | lr: 2.83e-04
[17:29:19] [Modal] → loss: 0.3487 | lr: 2.83e-04
[17:29:20] [Modal] → loss: 0.3354 | lr: 2.83e-04
[17:29:21] [Modal] → loss: 0.3302 | lr: 2.83e-04
[17:29:21] [Modal] → loss: 0.3643 | lr: 2.83e-04
[17:29:22] [Modal] → loss: 0.2373 | lr: 2.83e-04
[17:29:23] [Modal] → loss: 0.2108 | lr: 2.83e-04
[17:29:24] [Modal] → loss: 0.3216 | lr: 2.83e-04
[17:29:24] [Modal] Step 57/194 (29%) | 0.7GB / 2.0GB peak | Elapsed: 1.3m | Remaining (est.): 3.0m
[17:29:25] [Modal] → loss: 0.5381 | lr: 2.83e-04
[17:29:26] [Modal] → loss: 0.2253 | lr: 2.83e-04
[17:29:27] [Modal] → loss: 0.4422 | lr: 2.83e-04
[17:29:27] [Modal] → loss: 0.2200 | lr: 2.83e-04
[17:29:28] [Modal] → loss: 0.3603 | lr: 2.83e-04
[17:29:29] [Modal] → loss: 0.4015 | lr: 2.83e-04
[17:29:30] [Modal] → loss: 0.5429 | lr: 2.83e-04
[17:29:31] [Modal] → loss: 0.1828 | lr: 2.83e-04
[17:29:32] [Modal] → loss: 0.2900 | lr: 2.83e-04
[17:29:33] [Modal] → loss: 0.2399 | lr: 2.83e-04
[17:29:34] [Modal] → loss: 0.3815 | lr: 2.83e-04
[17:29:34] [Modal] → loss: 0.2039 | lr: 2.83e-04
[17:29:35] [Modal] → loss: 0.2422 | lr: 2.83e-04
[17:29:36] [Modal] → loss: 0.2882 | lr: 2.83e-04
[17:29:37] [Modal] → loss: 0.4586 | lr: 2.83e-04
[17:29:38] [Modal] → loss: 0.2304 | lr: 2.83e-04
[17:29:39] [Modal] → loss: 0.2747 | lr: 2.83e-04
[17:29:39] [Modal] → loss: 0.3175 | lr: 2.83e-04
[17:29:40] [Modal] → loss: 0.3844 | lr: 2.83e-04
[17:29:41] [Modal] Step 76/194 (39%) | 0.7GB / 2.2GB peak | Elapsed: 1.5m | Remaining (est.): 2.4m
[17:29:41] [Modal] → loss: 0.2502 | lr: 2.83e-04
[17:29:46] [Modal] → eval WER: 36.46%
[17:29:46] [Modal] → eval loss: 2.1419
[17:29:46] [Modal] → loss: 0.3407 | lr: 2.83e-04
[17:29:47] [Modal] → loss: 0.3717 | lr: 2.83e-04
[17:29:48] [Modal] → loss: 0.2611 | lr: 2.83e-04
[17:29:49] [Modal] → loss: 0.2145 | lr: 2.83e-04
[17:29:50] [Modal] → loss: 0.3324 | lr: 2.83e-04
[17:29:51] [Modal] → loss: 0.3564 | lr: 2.83e-04
[17:29:51] [Modal] → loss: 0.2651 | lr: 2.83e-04
[17:29:52] [Modal] → loss: 0.2515 | lr: 2.83e-04
[17:29:53] [Modal] → loss: 0.2271 | lr: 2.83e-04
[17:29:54] [Modal] → loss: 0.3093 | lr: 2.83e-04
[17:29:55] [Modal] → loss: 0.3045 | lr: 2.83e-04
[17:29:56] [Modal] → loss: 0.3189 | lr: 2.83e-04
[17:29:56] [Modal] → loss: 0.3456 | lr: 2.83e-04
[17:29:57] [Modal] → loss: 0.4432 | lr: 2.83e-04
[17:29:58] [Modal] → loss: 0.1967 | lr: 2.83e-04
[17:29:59] [Modal] → loss: 0.1607 | lr: 2.83e-04
[17:30:00] [Modal] → loss: 0.3514 | lr: 2.83e-04
[17:30:01] [Modal] → loss: 0.3115 | lr: 2.83e-04
[17:30:01] [Modal] Step 95/194 (49%) | 0.7GB / 2.2GB peak | Elapsed: 1.9m | Remaining (est.): 1.9m
[17:30:01] [Modal] → loss: 0.4118 | lr: 2.83e-04
[17:30:02] [Modal] → loss: 0.3664 | lr: 2.83e-04
[17:30:07] [Modal] → loss: 0.2934 | lr: 2.83e-04
[17:30:09] [Modal] → loss: 0.1902 | lr: 2.83e-04
[17:30:10] [Modal] → loss: 0.1448 | lr: 2.83e-04
[17:30:11] [Modal] → loss: 0.1897 | lr: 2.83e-04
[17:30:11] [Modal] → loss: 0.1898 | lr: 2.83e-04
[17:30:12] [Modal] → loss: 0.1120 | lr: 2.83e-04
[17:30:13] [Modal] → loss: 0.0910 | lr: 2.83e-04
[17:30:14] [Modal] → loss: 0.1560 | lr: 2.83e-04
[17:30:15] [Modal] → loss: 0.1207 | lr: 2.83e-04
[17:30:16] [Modal] → loss: 0.1116 | lr: 2.83e-04
[17:30:16] [Modal] → loss: 0.1631 | lr: 2.83e-04
[17:30:17] [Modal] → loss: 0.1139 | lr: 2.83e-04
[17:30:18] [Modal] → loss: 0.1423 | lr: 2.83e-04
[17:30:19] [Modal] → loss: 0.1053 | lr: 2.83e-04
[17:30:20] [Modal] → loss: 0.1643 | lr: 2.83e-04
[17:30:21] [Modal] → loss: 0.1670 | lr: 2.83e-04
[17:30:21] [Modal] → loss: 0.1602 | lr: 2.83e-04
[17:30:22] [Modal] Step 114/194 (59%) | 0.7GB / 2.2GB peak | Elapsed: 2.2m | Remaining (est.): 1.6m
[17:30:22] [Modal] → loss: 0.1643 | lr: 2.83e-04
[17:30:27] [Modal] → eval WER: 38.00%
[17:30:27] [Modal] → eval loss: 2.1524
[17:30:28] [Modal] → loss: 0.2266 | lr: 2.83e-04
[17:30:28] [Modal] → loss: 0.2959 | lr: 2.83e-04
[17:30:29] [Modal] → loss: 0.1132 | lr: 2.83e-04
[17:30:30] [Modal] → loss: 0.1520 | lr: 2.83e-04
[17:30:31] [Modal] → loss: 0.1368 | lr: 2.83e-04
[17:30:32] [Modal] → loss: 0.1382 | lr: 2.83e-04
[17:30:33] [Modal] → loss: 0.1178 | lr: 2.83e-04
[17:30:34] [Modal] → loss: 0.1533 | lr: 2.83e-04
[17:30:34] [Modal] → loss: 0.1473 | lr: 2.83e-04
[17:30:35] [Modal] → loss: 0.1891 | lr: 2.83e-04
[17:30:36] [Modal] → loss: 0.2052 | lr: 2.83e-04
[17:30:37] [Modal] → loss: 0.1201 | lr: 2.83e-04
[17:30:38] [Modal] → loss: 0.0984 | lr: 2.83e-04
[17:30:39] [Modal] → loss: 0.1177 | lr: 2.83e-04
[17:30:39] [Modal] → loss: 0.1342 | lr: 2.83e-04
[17:30:40] [Modal] → loss: 0.1048 | lr: 2.83e-04
[17:30:41] [Modal] → loss: 0.1401 | lr: 2.83e-04
[17:30:42] [Modal] → loss: 0.1360 | lr: 2.83e-04
[17:30:42] [Modal] Step 133/194 (69%) | 0.7GB / 2.2GB peak | Elapsed: 2.6m | Remaining (est.): 1.2m
[17:30:43] [Modal] → loss: 0.1168 | lr: 2.83e-04
[17:30:44] [Modal] → loss: 0.1285 | lr: 2.83e-04
[17:30:44] [Modal] → loss: 0.1482 | lr: 2.83e-04
[17:30:45] [Modal] → loss: 0.1491 | lr: 2.83e-04
[17:30:46] [Modal] → loss: 0.1346 | lr: 2.83e-04
[17:30:47] [Modal] → loss: 0.1408 | lr: 2.83e-04
[17:30:48] [Modal] → loss: 0.1223 | lr: 2.83e-04
[17:30:49] [Modal] → loss: 0.1484 | lr: 2.83e-04
[17:30:49] [Modal] → loss: 0.0957 | lr: 2.83e-04
[17:30:50] [Modal] → loss: 0.1250 | lr: 2.83e-04
[17:30:51] [Modal] → loss: 0.1546 | lr: 2.83e-04
[17:30:52] [Modal] → loss: 0.1050 | lr: 2.83e-04
[17:30:53] [Modal] → loss: 0.1384 | lr: 2.83e-04
[17:30:54] [Modal] → loss: 0.1520 | lr: 2.83e-04
[17:30:55] [Modal] → loss: 0.0722 | lr: 2.83e-04
[17:30:55] [Modal] → loss: 0.1274 | lr: 2.83e-04
[17:30:56] [Modal] → loss: 0.1120 | lr: 2.83e-04
[17:30:57] [Modal] → loss: 0.0849 | lr: 2.83e-04
[17:30:58] [Modal] → loss: 0.1333 | lr: 2.83e-04
[17:30:58] [Modal] Step 152/194 (78%) | 0.7GB / 2.2GB peak | Elapsed: 2.8m | Remaining (est.): 0.8m
[17:30:59] [Modal] → loss: 0.1893 | lr: 2.83e-04
[17:31:03] [Modal] → eval WER: 38.00%
[17:31:03] [Modal] → eval loss: 2.0244
[17:31:04] [Modal] → loss: 0.1646 | lr: 2.83e-04
[17:31:05] [Modal] → loss: 0.1449 | lr: 2.83e-04
[17:31:06] [Modal] → loss: 0.1457 | lr: 2.83e-04
[17:31:07] [Modal] → loss: 0.1314 | lr: 2.83e-04
[17:31:08] [Modal] → loss: 0.0428 | lr: 2.83e-04
[17:31:08] [Modal] → loss: 0.2189 | lr: 2.83e-04
[17:31:09] [Modal] → loss: 0.1064 | lr: 2.83e-04
[17:31:10] [Modal] → loss: 0.0731 | lr: 2.83e-04
[17:31:11] [Modal] → loss: 0.1845 | lr: 2.83e-04
[17:31:12] [Modal] → loss: 0.0760 | lr: 2.83e-04
[17:31:13] [Modal] → loss: 0.0739 | lr: 2.83e-04
[17:31:14] [Modal] → loss: 0.1159 | lr: 2.83e-04
[17:31:15] [Modal] → loss: 0.2086 | lr: 2.83e-04
[17:31:15] [Modal] → loss: 0.0437 | lr: 2.83e-04
[17:31:16] [Modal] → loss: 0.0768 | lr: 2.83e-04
[17:31:17] [Modal] → loss: 0.1482 | lr: 2.83e-04
[17:31:18] [Modal] → loss: 0.1210 | lr: 2.83e-04
[17:31:19] [Modal] → loss: 0.0805 | lr: 2.83e-04
[17:31:19] [Modal] Step 171/194 (88%) | 0.7GB / 2.2GB peak | Elapsed: 3.2m | Remaining (est.): 0.4m
[17:31:20] [Modal] → loss: 0.1307 | lr: 2.83e-04
[17:31:21] [Modal] → loss: 0.1167 | lr: 2.83e-04
[17:31:21] [Modal] → loss: 0.1513 | lr: 2.83e-04
[17:31:22] [Modal] → loss: 0.0839 | lr: 2.83e-04
[17:31:23] [Modal] → loss: 0.1827 | lr: 2.83e-04
[17:31:24] [Modal] → loss: 0.2314 | lr: 2.83e-04
[17:31:25] [Modal] → loss: 0.1747 | lr: 2.83e-04
[17:31:26] [Modal] → loss: 0.1327 | lr: 2.83e-04
[17:31:26] [Modal] → loss: 0.1489 | lr: 2.83e-04
[17:31:27] [Modal] → loss: 0.0726 | lr: 2.83e-04
[17:31:28] [Modal] → loss: 0.0547 | lr: 2.83e-04
[17:31:29] [Modal] → loss: 0.1263 | lr: 2.83e-04
[17:31:30] [Modal] → loss: 0.1081 | lr: 2.83e-04
[17:31:31] [Modal] → loss: 0.1347 | lr: 2.83e-04
[17:31:31] [Modal] → loss: 0.0906 | lr: 2.83e-04
[17:31:32] [Modal] → loss: 0.1503 | lr: 2.83e-04
[17:31:33] [Modal] → loss: 0.0751 | lr: 2.83e-04
[17:31:34] [Modal] → loss: 0.0948 | lr: 2.83e-04
[17:31:35] [Modal] → loss: 0.1094 | lr: 2.83e-04
[17:31:35] [Modal] Step 190/194 (98%) | 0.7GB / 2.2GB peak | Elapsed: 3.4m | Remaining (est.): 0.1m
[17:31:36] [Modal] → loss: 0.1568 | lr: 2.83e-04
[17:31:40] [Modal] → eval WER: 39.54%
[17:31:40] [Modal] → eval loss: 2.0737
[17:31:41] [Modal] → loss: 0.1000 | lr: 2.83e-04
[17:31:42] [Modal] → loss: 0.1545 | lr: 2.83e-04
[17:31:42] [Modal] → loss: 0.0726 | lr: 2.83e-04
[17:31:42] [Modal] Step 194/194 (100%) | 0.7GB / 2.2GB peak | Elapsed: 3.6m | Remaining (est.): 0.0m
[17:31:42] [Modal] → loss: 0.1180 | lr: 2.83e-04
[17:31:43] [Modal] Training complete in 3.6 minutes
[17:31:43] [Modal] Training metrics: 3.6 min, loss=0.3593
[17:31:43] [Modal] Peak GPU memory during training: 0.7GB / 2.2GB peak
[17:31:43] [Modal] Merging LoRA weights...
[17:31:46] [Modal] Saved generation config from base model
[17:31:46] [Modal] Saved merged model to /tmp/merged_model
[17:31:46] [Modal] Pushing model to HuggingFace Hub: Trelis/whisper-small-atc-5s...
[17:32:32] [Modal] Model pushed: https://huggingface.co/Trelis/whisper-small-atc-5s
[17:32:32] [Modal] Converting to CTranslate2 format (bfloat16)...
[17:32:33] [Modal] CTranslate2 conversion complete
[17:33:02] [Modal] CTranslate2 pushed: https://huggingface.co/Trelis/whisper-small-atc-5s/tree/ctranslate2
[17:33:02] [Modal] Syncing WandB logs...
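The "Converting to CTranslate2 format (bfloat16)" step above can be reproduced with CTranslate2's Transformers converter CLI. The exact command used by the pipeline is not shown in the log; a plausible invocation matching the logged repo name and bfloat16 setting would be:

```shell
# Sketch only: converts the merged HF model to CTranslate2 format.
# The repo name comes from the log; the output directory is arbitrary.
ct2-transformers-converter \
  --model Trelis/whisper-small-atc-5s \
  --output_dir whisper-small-atc-5s-ct2 \
  --quantization bfloat16
```

The resulting directory can then be loaded by CTranslate2-based runtimes such as faster-whisper, which is the usual reason for publishing this format alongside the standard Transformers weights.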
[17:33:03] [Modal] WandB sync complete
[17:33:03] [Modal] ============================================================
[17:33:03] [Modal] PHASE 3: Final Evaluation
[17:33:03] [Modal] ============================================================
[17:33:04] [Modal] [Final] Loading model: /tmp/merged_model
[17:33:04] [Modal] [Final] Evaluating on 50 samples (without timestamps)
[17:33:18] [Modal] [Final] Progress: 10/50 samples
[17:33:24] [Modal] [Final] Progress: 20/50 samples
[17:33:33] [Modal] [Final] Progress: 30/50 samples
[17:33:36] [Modal] [Final] Progress: 40/50 samples
[17:33:45] [Modal] [Final] Progress: 50/50 samples
[17:33:46] [Modal] [Final] WER: 69.23% (50 samples)
[17:33:46] [Modal] ============================================================
[17:33:46] [Modal] TRAINING COMPLETE - SUMMARY
[17:33:46] [Modal] ============================================================
[17:33:46] [Modal] Baseline WER: 55.54%
[17:33:46] [Modal] Final WER: 69.23%
[17:33:46] [Modal] Regression: 13.69 percentage points
[17:33:46] [Modal] ============================================================
[17:33:46] [Modal] Cleaned up temporary training files
[17:33:46] [Modal] Training complete: loss=0.3593, runtime=3.6min
[17:33:46] [Modal] Baseline WER: 55.54%
[17:33:46] [Modal] Final WER: 69.23%
[17:33:46] [Upload] Model pushed by Modal, uploading additional files...
[17:33:46] [Upload] Uploading model card...
[17:33:47] [Upload] Uploading training logs...
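The WER figures in the summary are word error rates; the pipeline's own metric code is not shown in the log, but a standard word-level Levenshtein definition (substitutions + insertions + deletions over reference length) is the usual choice, and the "Regression" line is simply the difference of the two percentages. A minimal pure-Python sketch:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[j] holds the edit distance between ref[:i] and hyp[:j]
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, start=1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,        # deletion
                        dp[j - 1] + 1,    # insertion
                        prev + (r != h))  # substitution (or match)
            prev = cur
    return dp[-1] / len(ref)

# Hypothetical ATC-style example (not from the evaluation set):
print(wer("turn left heading two seven zero",
          "turn left heading to seven zero"))  # 1 error / 6 words

# The summary's regression line: 69.23 - 55.54 = 13.69 percentage points
baseline, final = 55.54, 69.23
print(round(final - baseline, 2))
```

Note that a lower training loss did not translate into a lower WER here: eval WER rose from 32.00% at step 38 to 39.54% at step 190, and the merged model scored worse than the baseline, which is what the "Regression" line records.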