RLCSD: Reinforcement Learning with Contrastive On-Policy Self-Distillation Paper • 2606.11709 • Published 18 days ago
Masking Stale Observations Helps Search Agents -- Until It Doesn't: A Regime Map and Its Mechanism Paper • 2606.00408 • Published 30 days ago