Shuo Xing
shuoxing
AI & ML interests
MLLMs, LLMs
Recent Activity
updated a model 7 days ago
shuoxing/llama3-8b-full-pretrain-wash-c4-4-5m-bs4
published a model 7 days ago
shuoxing/llama3-8b-full-pretrain-wash-c4-4-5m-bs4
updated a model 7 days ago
shuoxing/llama3-8b-full-pretrain-wash-c4-4-2m-bs4
Organizations
MLLM Reasoning, Rewarding, and Understanding
Papers on the reasoning, rewarding, and understanding of the MLLMs and LLMs
- Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
  Paper • 2505.24726 • Published • 282
- SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis
  Paper • 2506.02096 • Published • 52
- OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation
  Paper • 2506.02397 • Published • 36
- ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
  Paper • 2505.24864 • Published • 146
LLM4Math
- BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs
  Paper • 2510.04721 • Published
- FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models
  Paper • 2505.02735 • Published • 33
- PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts
  Paper • 2504.18428 • Published
- MathConstruct: Challenging LLM Reasoning with Constructive Proofs
  Paper • 2502.10197 • Published
models 227
shuoxing/llama3-8b-full-pretrain-wash-c4-4-5m-bs4
8B • Updated • 20
shuoxing/llama3-8b-full-pretrain-wash-c4-4-2m-bs4
Text Generation • 8B • Updated • 142
shuoxing/llama3-8b-full-pretrain-wash-c4-3-9m-bs4
Text Generation • 8B • Updated • 152
shuoxing/llama3-8b-full-pretrain-wash-c4-3-6m-bs4
Text Generation • 8B • Updated • 288
shuoxing/llama3-8b-full-pretrain-wash-c4-3-3m-bs4
Text Generation • 8B • Updated • 176
shuoxing/llama3-8b-full-pretrain-wash-c4-3-0m-bs4
Text Generation • 8B • Updated • 273
shuoxing/llama3-8b-full-pretrain-wash-c4-2-7m-bs4
Text Generation • 8B • Updated • 179
shuoxing/llama3-8b-full-pretrain-wash-c4-2-4m-sft-bs64
Text Generation • 8B • Updated • 190
shuoxing/llama3-8b-full-pretrain-wash-c4-2-1m-sft-bs64
Text Generation • 8B • Updated • 205
shuoxing/llama3-8b-full-pretrain-wash-c4-1-8m-sft-bs64
Text Generation • 8B • Updated • 220
datasets 7
shuoxing/yt_ugc_public
Updated • 1.39k
shuoxing/AutoTrust
Updated • 4
shuoxing/KoNViD_1k_videos
Viewer • Updated • 1.2k • 73
shuoxing/Tweet_demo
Viewer • Updated • 100 • 9
shuoxing/MapBench_VQA
Viewer • Updated • 96 • 11 • 1
shuoxing/MapBench
Viewer • Updated • 97 • 7
shuoxing/tweet-scholar
Viewer • Updated • 95 • 5