view article Article Shipping a Trillion Parameters With a Hub Bucket: Delta Weight Sync in TRL +6 aminediroHF, qgallouedec, kashif, lewtun, edbeeching, albertvillanova, lvwerra, sergiopaniego • 6 days ago • 36
SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations Paper • 2512.14080 • Published Dec 16, 2025 • 9
Fanar Collection A powerful and versatile family of Arabic Large Language Models (LLMs) designed for a wide range of tasks. • 3 items • Updated Feb 6 • 11
view article Article You could have designed state of the art positional encoding FL33TW00D-HF • Nov 25, 2024 • 482
gpt-oss Collection Open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. • 2 items • Updated Aug 7, 2025 • 441
view article Article AtlasOCR: Building the First Open-Source Darija OCR Model with Vision Language Models imomayiz • Sep 16, 2025 • 19
view article Article SmolLM3: smol, multilingual, long-context reasoner +21 eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025 • 777
SmolLM3 evaluation datasets Collection Datasets to decontaminate the post-training mixtures against. Use the subset and column values described per entry • 13 items • Updated Jul 8, 2025 • 9
SmolLM3 pretraining datasets Collection datasets used in SmolLM3 pretraining • 15 items • Updated Aug 12, 2025 • 51
🧠 SmolLM3 Collection Smol, multilingual, long-context reasoner • 14 items • Updated Oct 9, 2025 • 104
view article Article Bringing Fusion Down to Earth: ML for Stellarator Optimization cgeorgiaw • Jul 2, 2025 • 80
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language Paper • 2506.20920 • Published Jun 26, 2025 • 78
view article Article Learn the Hugging Face Kernel Hub in 5 Minutes +5 drbh, danieldk, Narsil, pcuenq, pagezyhf, merve, reach-vb • Jun 12, 2025 • 164
view article Article SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data +7 danaaubakirova, andito, merve, ariG23498, fracapuano, loubnabnl, pcuenq, mshukor, cadene • Jun 3, 2025 • 349
view article Article Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models +1 loubnabnl, anton-l, davanstrien • Mar 20, 2024 • 114
view article Article Tiny Agents: an MCP-powered agent in 50 lines of code julien-c • Apr 25, 2025 • 308