- DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization
  Paper • 2508.14460 • Published • 86
- MEML-GRPO: Heterogeneous Multi-Expert Mutual Learning for RLVR Advancement
  Paper • 2508.09670 • Published
- URPO: A Unified Reward & Policy Optimization Framework for Large Language Models
  Paper • 2507.17515 • Published • 2
Emmanuel Sugutt
Sugutt
AI & ML interests
Reinforcement learning
Transformer models
Recent Activity
upvoted an article 2 days ago
Build Small Hackathon With Cohere Models
updated a Space 3 days ago
Sugutt/kln_whisper_v3_turbo
published a Space 3 days ago
Sugutt/kln_whisper_v3_turbo