Déjà View: Looping Transformers for Multi-View 3D Reconstruction Paper • 2605.30215 • Published 6 days ago • 4
GGT-100K: Generative Ground Truth for Generalizable Real-World Image Restoration Paper • 2605.31039 • Published 5 days ago • 35
Mellum: Production-Grade in-IDE Contextual Code Completion with Multi-File Project Understanding Paper • 2510.05788 • Published Oct 7, 2025 • 4
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data Paper • 2401.10891 • Published Jan 19, 2024 • 64
Why Far Looks Up: Probing Spatial Representation in Vision-Language Models Paper • 2605.30161 • Published 6 days ago • 57
Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments Paper • 2605.30280 • Published 6 days ago • 128
Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players Paper • 2605.28816 • Published 7 days ago • 416
Agent Explorative Policy Optimization for Multimodal Agentic Reasoning Paper • 2605.28774 • Published 7 days ago • 85
DINOv2: Learning Robust Visual Features without Supervision Paper • 2304.07193 • Published Apr 14, 2023 • 10
DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding Paper • 2411.14347 • Published Nov 21, 2024 • 17
DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models Paper • 2505.24025 • Published May 29, 2025 • 28
MoCapAnything: Unified 3D Motion Capture for Arbitrary Skeletons from Monocular Videos Paper • 2512.10881 • Published Dec 11, 2025 • 32
MapAnything: Universal Feed-Forward Metric 3D Reconstruction Paper • 2509.13414 • Published Sep 16, 2025 • 3
LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence Paper • 2605.25979 • Published 9 days ago • 27