arxiv:2505.04410
Junjie Wang
xiaomoguhzz
AI & ML interests
computer vision, Vision-Language Models, Multimodal Large Language Models
Recent Activity
updated a dataset 6 days ago: xiaomoguhzz/codex-ppt-temp-visual-encoder-assets
published a dataset 6 days ago: xiaomoguhzz/codex-ppt-temp-visual-encoder-assets
upvoted a paper 10 days ago: UnityShots: Memory-Driven Multi-Shot Audio-Video Generation with Boundary-Aware Gating