ChatBench ChatBench Datasets and Simulators (same prompt + fine-tuning set-up) from the ChatBench paper. microsoft/ChatBench Preview • Updated Apr 28, 2025 • 278 • 12 microsoft/chatbench-distilgpt2 Text Generation • 81.9M • Updated Aug 23, 2025 • 48 • 4 microsoft/chatbench-llama3-8b Updated Aug 23, 2025 • 12 • 6 microsoft/chatbench-mistral-7b Updated Aug 23, 2025 • 10 • 5
MediPhi A collection of SLMs based on Phi3.5-mini-instruct adapted to clinical natural language processing tasks: https://arxiv.org/abs/2505.10717 A Modular Approach for Clinical SLMs Driven by Synthetic Data with Pre-Instruction Tuning, Model Merging, and Clinical-Tasks Alignment Paper • 2505.10717 • Published May 15, 2025 • 5 microsoft/MediPhi-Instruct Text Generation • 4B • Updated Dec 15, 2025 • 4.47k • 64 microsoft/MediPhi Text Generation • 4B • Updated Dec 15, 2025 • 4.2k • 20 microsoft/MediPhi-PubMed Text Generation • 4B • Updated Dec 15, 2025 • 144 • 10
NatureLM microsoft/NatureLM-8x7B 47B • Updated Jun 20, 2025 • 69 • 20 microsoft/NatureLM-8x7B-Inst 47B • Updated Jun 20, 2025 • 206 • 25
NextCoder NextCoder family of code-editing LMs developed with Selective Knowledge Transfer and its training data. microsoft/NextCoder-7B Text Generation • 8B • Updated Jun 12, 2025 • 672 • 32 microsoft/NextCoder-14B Text Generation • 15B • Updated Jun 12, 2025 • 708 • 18 microsoft/NextCoder-32B Text Generation • 33B • Updated Jun 12, 2025 • 628 • 67 microsoft/NextCoderDataset Viewer • Updated Jul 8, 2025 • 381k • 433 • 55
Phi-3 Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. microsoft/Phi-3.5-mini-instruct Text Generation • 4B • Updated Dec 10, 2025 • 739k • 969 microsoft/Phi-3.5-MoE-instruct Text Generation • Updated Dec 10, 2025 • 97.5k • 571 microsoft/Phi-3.5-vision-instruct Image-Text-to-Text • Updated Dec 10, 2025 • 1.35M • 729 microsoft/Phi-3-mini-4k-instruct Text Generation • 4B • Updated Dec 10, 2025 • 678k • 1.41k
Controllable Safety Alignment Artifacts for the paper "Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements" (https://arxiv.org/abs/2410.08968) Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements Paper • 2410.08968 • Published Oct 11, 2024 • 14 microsoft/CoSApien Viewer • Updated Aug 1, 2025 • 200 • 69 • 3 microsoft/CoSAlign-Test Viewer • Updated May 5, 2025 • 3.2k • 165 • 3 microsoft/CoSAlign-Train Viewer • Updated Aug 1, 2025 • 125k • 52 • 4
MAI-DS-R1 MAI-DS-R1 is a DeepSeek-R1 reasoning model that has been post-trained by the Microsoft AI team. microsoft/MAI-DS-R1 Text Generation • Updated Dec 15, 2025 • 89 • 294 microsoft/MAI-DS-R1-FP8 Text Generation • 671B • Updated Dec 15, 2025 • 520 • 26
SpeechT5 The SpeechT5 framework consists of a shared seq2seq model and six modal-specific (speech/text) pre/post-nets that can address a few audio-related tasks. SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing Paper • 2110.07205 • Published Oct 14, 2021 • 6 microsoft/speecht5_tts Text-to-Speech • Updated Nov 8, 2023 • 237k • 827 SpeechT5 Speech Synthesis Demo Space • 220 microsoft/speecht5_vc Audio-to-Audio • Updated Mar 22, 2023 • 1.98k • 111
Table Transformer The Table Transformer (TATR) is a series of object detection models useful for table extraction from PDF images. microsoft/table-transformer-detection Object Detection • 28.8M • Updated Sep 6, 2023 • 3.12M • 412 microsoft/table-transformer-structure-recognition Object Detection • 28.8M • Updated Sep 6, 2023 • 1.17M • 213 microsoft/table-transformer-structure-recognition-v1.1-all Object Detection • 28.8M • Updated Nov 18, 2023 • 228k • 82 microsoft/table-transformer-structure-recognition-v1.1-fin Object Detection • 28.8M • Updated Nov 27, 2023 • 705 • 2
Biomedical Models for biomedical research applications, such as radiology report generation and biomedical language understanding. microsoft/maira-2 Text Generation • 7B • Updated Aug 14, 2025 • 2.74k • 71 microsoft/rad-dino-maira-2 Image Feature Extraction • 86.6M • Updated Aug 22, 2024 • 7k • 23 microsoft/rad-dino Image Feature Extraction • 86.6M • Updated Oct 9, 2025 • 17k • 72 microsoft/radedit Updated Dec 8, 2025 • 30
UDOP UDOP is a general multimodal model for document AI Unifying Vision, Text, and Layout for Universal Document Processing Paper • 2212.02623 • Published Dec 5, 2022 • 12 microsoft/udop-large Image-Text-to-Text • 0.7B • Updated Dec 2, 2025 • 98.6k • 123 microsoft/udop-large-512 Image-Text-to-Text • 0.7B • Updated Dec 2, 2025 • 27 • 6 microsoft/udop-large-512-300k Image-Text-to-Text • 0.7B • Updated Dec 2, 2025 • 212 • 34
Florence Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks Paper • 2311.06242 • Published Nov 10, 2023 • 95 microsoft/Florence-2-large Image-Text-to-Text • 0.8B • Updated Aug 4, 2025 • 1.22M • 1.79k microsoft/Florence-2-base Image-Text-to-Text • 0.2B • Updated Aug 4, 2025 • 800k • 358 microsoft/Florence-2-large-ft Image-Text-to-Text • 0.8B • Updated Aug 4, 2025 • 37.1k • 384
MoCapAct Locomotion policies for hundreds of simulated humanoid locomotion clips and demonstration data for training them. microsoft/mocapact-models Updated Aug 17, 2024 • 10 microsoft/mocapact-data Updated Aug 17, 2024 • 127 • 5 MoCapAct: A Multi-Task Dataset for Simulated Humanoid Control Paper • 2208.07363 • Published Aug 15, 2022 • 2
VibeVoice Frontier Text-to-Speech Models https://microsoft.github.io/VibeVoice/ microsoft/VibeVoice-1.5B Text-to-Speech • 3B • Updated Jan 22 • 74k • 2.3k microsoft/VibeVoice-Realtime-0.5B Text-to-Speech • 1B • Updated Dec 12, 2025 • 512k • 1.17k VibeVoice Technical Report Paper • 2508.19205 • Published Aug 26, 2025 • 153 microsoft/VibeVoice-ASR Automatic Speech Recognition • 9B • Updated Jan 27 • 557k • 995
Dayhoff Atlas The models and datasets that comprise the Dayhoff Atlas microsoft/Dayhoff Viewer • Updated 3 days ago • 59M • 1.19k • 11 microsoft/Dayhoff-170m-UR50 Text Generation • 0.2B • Updated Jan 16 • 68 • 5 microsoft/Dayhoff-170m-UR90 Text Generation • 0.2B • Updated Jan 26 • 1.22k • 1 microsoft/Dayhoff-170m-GR Text Generation • 0.2B • Updated Jan 26 • 759 • 2
Paza Paza is a collection of speech models & benchmarks for low-resource languages by the Microsoft Research Africa - Nairobi Lab PazaBench Space • ASR Leaderboard for low-resource languages • 12 microsoft/paza-Phi-4-multimodal-instruct Automatic Speech Recognition • 6B • Updated Feb 4 • 61 • 3 microsoft/paza-whisper-large-v3-turbo Automatic Speech Recognition • 0.8B • Updated Feb 4 • 303 • 6
Phi-4 Phi-4 family of small language, multi-modal and reasoning models. microsoft/Phi-4-mini-flash-reasoning Text Generation • Updated Dec 10, 2025 • 959 • 274 microsoft/Phi-4-mini-reasoning Text Generation • Updated Dec 10, 2025 • 23k • 220 microsoft/Phi-4-reasoning Text Generation • Updated Nov 24, 2025 • 15.6k • 220 microsoft/Phi-4-reasoning-plus Text Generation • Updated Nov 24, 2025 • 19.5k • 337
Phi-1 Phi-1 family of small language models. microsoft/phi-1 Text Generation • 1B • Updated Nov 24, 2025 • 5.89k • 218 microsoft/phi-1_5 Text Generation • 1B • Updated Nov 24, 2025 • 76k • 1.36k Textbooks Are All You Need Paper • 2306.11644 • Published Jun 20, 2023 • 154 Textbooks Are All You Need II: phi-1.5 technical report Paper • 2309.05463 • Published Sep 11, 2023 • 90
BitNet 🔥BitNet family of large language models (1-bit LLMs). microsoft/bitnet-b1.58-2B-4T Text Generation • 0.8B • Updated Dec 17, 2025 • 17.5k • 1.42k microsoft/bitnet-b1.58-2B-4T-bf16 Text Generation • 2B • Updated Dec 17, 2025 • 6.69k • 38 microsoft/bitnet-b1.58-2B-4T-gguf Text Generation • 2B • Updated Dec 17, 2025 • 35.8k • 259 BitNet b1.58 2B4T Technical Report Paper • 2504.12285 • Published Apr 16, 2025 • 84
LLM2CLIP LLM2CLIP makes SOTA pretrained CLIP models even stronger. microsoft/LLM2CLIP-EVA02-L-14-336 Zero-Shot Image Classification • Updated Nov 22, 2024 • 66 • 61 microsoft/LLM2CLIP-Openai-L-14-336 Zero-Shot Classification • 0.6B • Updated Nov 24, 2024 • 5.82k • 44 microsoft/LLM2CLIP-EVA02-B-16 Updated Feb 8, 2025 • 70 • 11 microsoft/LLM2CLIP-Openai-B-16 Zero-Shot Classification • 0.4B • Updated Nov 24, 2024 • 1.5k • 19
TAPEX TAPEX is a family of state-of-the-art table pre-training models that can be used for table-based question answering and table-based fact verification. TAPEX: Table Pre-training via Learning a Neural SQL Executor Paper • 2107.07653 • Published Jul 16, 2021 • 3 microsoft/tapex-large-finetuned-wtq Table Question Answering • 0.4B • Updated Jan 12, 2024 • 693 • 78 microsoft/tapex-base-finetuned-wikisql Table Question Answering • Updated Jan 24, 2023 • 896k • 24 microsoft/tapex-large-sql-execution Table Question Answering • 0.4B • Updated Sep 15, 2023 • 32 • 18
LayoutLM The LayoutLM series are Transformer encoders useful for document AI tasks such as invoice parsing, document image classification and DocVQA. microsoft/layoutlmv3-base 0.1B • Updated Apr 10, 2024 • 597k • 479 microsoft/layoutlmv2-base-uncased Updated Sep 16, 2022 • 601k • 67 microsoft/layoutlm-base-uncased 0.1B • Updated Apr 16, 2024 • 133k • 62 microsoft/layoutxlm-base Updated Sep 16, 2022 • 7.35k • 74
Orca The Orca family of LMs developed by Microsoft. microsoft/Orca-2-7b Text Generation • Updated Nov 22, 2023 • 1.36k • 224 microsoft/Orca-2-13b Text Generation • Updated Nov 22, 2023 • 3.28k • 667
GIT GIT (Generative Image-to-text Transformer) is a model useful for vision-language tasks such as image/video captioning and question answering. GIT: A Generative Image-to-text Transformer for Vision and Language Paper • 2205.14100 • Published May 27, 2022 • 2 microsoft/git-base Image-to-Text • 0.2B • Updated Apr 24, 2023 • 12k • 107 microsoft/git-large Image-to-Text • Updated Feb 8, 2023 • 490 • 18 microsoft/git-base-vqav2 Visual Question Answering • 0.2B • Updated Mar 9, 2024 • 143 • 21
IFMs Industrial Foundation Models microsoft/LLaMA-2-7b-GTL-Delta Text Generation • 7B • Updated Aug 12, 2024 • 32 • 10 microsoft/LLaMA-2-13b-GTL-Delta Text Generation • 13B • Updated Aug 12, 2024 • 36 • 6