Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling
Abstract
Janus-Pro, an enhanced version of Janus, improves multimodal understanding and text-to-image capabilities with an optimized training strategy, expanded data, and increased model size.
In this work, we introduce Janus-Pro, an advanced version of the previous work, Janus. Specifically, Janus-Pro incorporates (1) an optimized training strategy, (2) expanded training data, and (3) scaling to a larger model size. With these improvements, Janus-Pro achieves significant advancements in both multimodal understanding and text-to-image instruction-following capabilities, while also enhancing the stability of text-to-image generation. We hope this work will inspire further exploration in the field. Code and models are publicly available.