frontier AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents Paper • 2602.06855 • Published Feb 6 • 82 OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration Paper • 2602.05400 • Published Feb 5 • 349
AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents Paper • 2602.06855 • Published Feb 6 • 82
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration Paper • 2602.05400 • Published Feb 5 • 349
coding SWE-Universe: Scale Real-World Verifiable Environments to Millions Paper • 2602.02361 • Published Feb 2 • 60 LongCodeZip: Compress Long Context for Code Language Models Paper • 2510.00446 • Published Oct 1, 2025 • 108 Code2World: A GUI World Model via Renderable Code Generation Paper • 2602.09856 • Published Feb 10 • 202 Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces Paper • 2601.11868 • Published Jan 17 • 35
SWE-Universe: Scale Real-World Verifiable Environments to Millions Paper • 2602.02361 • Published Feb 2 • 60
LongCodeZip: Compress Long Context for Code Language Models Paper • 2510.00446 • Published Oct 1, 2025 • 108
Code2World: A GUI World Model via Renderable Code Generation Paper • 2602.09856 • Published Feb 10 • 202
Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces Paper • 2601.11868 • Published Jan 17 • 35
frontier AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents Paper • 2602.06855 • Published Feb 6 • 82 OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration Paper • 2602.05400 • Published Feb 5 • 349
AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents Paper • 2602.06855 • Published Feb 6 • 82
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration Paper • 2602.05400 • Published Feb 5 • 349
coding SWE-Universe: Scale Real-World Verifiable Environments to Millions Paper • 2602.02361 • Published Feb 2 • 60 LongCodeZip: Compress Long Context for Code Language Models Paper • 2510.00446 • Published Oct 1, 2025 • 108 Code2World: A GUI World Model via Renderable Code Generation Paper • 2602.09856 • Published Feb 10 • 202 Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces Paper • 2601.11868 • Published Jan 17 • 35
SWE-Universe: Scale Real-World Verifiable Environments to Millions Paper • 2602.02361 • Published Feb 2 • 60
LongCodeZip: Compress Long Context for Code Language Models Paper • 2510.00446 • Published Oct 1, 2025 • 108
Code2World: A GUI World Model via Renderable Code Generation Paper • 2602.09856 • Published Feb 10 • 202
Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces Paper • 2601.11868 • Published Jan 17 • 35