RUT-Bench: Benchmark data from "Beyond Ideal Instruction: A Comprehensive Framework for Evaluating LLMs in Realistic Interactions".
Dataset: Miaow-Lab/RUT-Bench
STT-Arena: Benchmark data, training data, and STT-Agent from our paper "STT-Arena: A More Realistic Environment for Tool-Using with Spatio-Temporal Dynamics" (Paper 2605.18548).
Datasets: Miaow-Lab/STT-Agent-SFT, Miaow-Lab/STT-Agent-RL, Miaow-Lab/STT-Arena