S/o to my HF bros @tomaarsen @BramVanroy @woojun-jung @christopher @mrm8488 @prithivMLmods for this 4-month project
Loïck BOURDOIS
AI & ML interests
Recent Activity
updated a collection about 5 hours ago: French think and toolcalling datasets
updated a collection about 5 hours ago: French DPO and conversation datasets
updated a dataset about 5 hours ago: lbourdois/smolLM3_french_data
replied to their post 1 day ago
posted an update 1 day ago
Post
205
New blog post!
An introduction to a little-known but highly effective model reduction method: Trimming ✂️
We show how to reduce a model's size (by up to 87.24% in our experiments) while preserving its performance.
We applied this technique to 16 different model families across several modalities to illustrate that it works on any architecture (as long as the embedding layer is the last one of the model) and on any modality involving text.
From these 16 families, we generated over 5,500 monolingual models in 124 different languages.
Key takeaways from our experiments:
1️⃣ Trimming does not require a GPU. Our models were obtained on a CPU.
2️⃣ This method scales up to at least 4B parameters (we did not test beyond that).
3️⃣ A trimmed model is smaller than the original while preserving its performance. If you observe a slight performance drop, just fine-tune the trimmed model to recover or even surpass the original performance.
4️⃣ For an equivalent compute budget, it is better to trim and then fine-tune than to fine-tune the original model. Since the trimmed model is smaller, you can run more epochs or show it more data, and ultimately get a better model than the original.
5️⃣ Trimming is a competitive alternative to distillation and quantization. For example, we obtained our alternative to DistilBERT in 9 minutes on a CPU, versus 90 hours of GPU time for DistilBERT itself.
6️⃣ Trimming could make it possible to generate reasoning traces directly in the language of the trimmed model. This could be an alternative to generating traces in English and then translating them into the desired language.
Many other things (such as how much data is needed, the impact of the database used, the order in which the steps should be done, etc.) are covered in the blog post!
Blogpost: https://huggingface.co/blog/lbourdois/introduction-to-trimming
Models: alphaedge-ai/Trimming_models_search
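To make the idea more concrete, here is a minimal sketch of what trimming can look like, assuming it means dropping the embedding rows of tokens that never occur in a target-language corpus. The function names `trim_embeddings` and `remap`, the toy embedding matrix and the token ids are all hypothetical and not taken from the blog post; a real model would also need its tokenizer updated to the new ids.

```python
def trim_embeddings(embeddings, corpus_token_ids, always_keep=()):
    """Return (trimmed_rows, old_to_new), keeping only token ids that are used.

    embeddings: list of rows, one per token id of the original vocabulary.
    corpus_token_ids: iterable of token-id sequences from the target language.
    always_keep: ids kept unconditionally (e.g. special tokens).
    """
    used = set(always_keep)
    for sequence in corpus_token_ids:
        used.update(sequence)
    kept = sorted(used)
    # Map each surviving original id to its new, compact id.
    old_to_new = {old: new for new, old in enumerate(kept)}
    trimmed = [embeddings[old] for old in kept]
    return trimmed, old_to_new


def remap(sequence, old_to_new):
    """Translate a sequence of original token ids into trimmed ids."""
    return [old_to_new[token_id] for token_id in sequence]


# Toy example: a 10-token vocabulary with 4-dimensional embeddings.
embeddings = [[float(token_id)] * 4 for token_id in range(10)]
corpus = [[0, 3, 5], [3, 7, 0]]  # hypothetical tokenized target-language text

trimmed, old_to_new = trim_embeddings(embeddings, corpus, always_keep=[0, 1])
reduction = 1 - len(trimmed) / len(embeddings)
print(f"kept {len(trimmed)} of {len(embeddings)} rows ({reduction:.0%} smaller)")
```

Because only a lookup table is sliced, nothing here needs a GPU, which is consistent with the first takeaway above.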
posted an update 8 months ago
Post
1774
New blog post analyzing the top 50 entities with the most downloaded models on @huggingface 🤗!
https://huggingface.co/blog/lbourdois/huggingface-models-stats
The purpose here is to get an idea of the profile of the models with the greatest impact in open source (we are not interested in closed models here!).
32 figures + data
Enjoy 🤗
reacted to tomaarsen's post with ❤️ 10 months ago
Post
4559
I just published Sentence Transformers v5.1.0, and it's a big one. 2x-3x speedups of SparseEncoder models via ONNX and/or OpenVINO backends, easier distillation data preparation with hard negatives mining, and more:
1️⃣ Faster ONNX and OpenVINO backends for SparseEncoder models
Usage is as simple as backend="onnx" or backend="openvino" when initializing a SparseEncoder to get started, but I also included utility functions for optimization, dynamic quantization, and static quantization, plus benchmarks.
2️⃣ New n-tuple-scores output format from mine_hard_negatives
This new output format is immediately compatible with the MarginMSELoss and SparseMarginMSELoss for training SentenceTransformer, CrossEncoder, and SparseEncoder losses.
3️⃣ Gathering across devices
When doing multi-GPU training using a loss that has in-batch negatives (e.g. MultipleNegativesRankingLoss), you can now use gather_across_devices=True to load in-batch negatives from the other devices too! Essentially a free lunch, pretty big impact potential in my evals.
4️⃣ Trackio support
If you also upgrade transformers, and you install trackio with pip install trackio, then your experiments will also automatically be tracked locally with trackio. Just open up localhost and have a look at your losses/evals, no logins, no metric uploading.
5️⃣ MTEB Documentation
We've added some documentation on evaluating SentenceTransformer models properly with MTEB. It's rudimentary as the documentation on the MTEB side is already great, but it should get you started.
Plus many more smaller features & fixes (crash fixes, compatibility with datasets v4, FIPS compatibility, etc.).
See the full release notes here: https://github.com/UKPLab/sentence-transformers/releases/tag/v5.1.0
Big thanks to all of the contributors for helping with the release, many of the features from this release were proposed by others. I have a big list of future potential features that I'd love to add, but I'm
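For readers unfamiliar with the margin-based distillation mentioned in point 2, the sketch below shows the general Margin MSE idea in plain Python: the student is pushed to reproduce the teacher's positive-minus-negative score margin. This is an illustration only, not the Sentence Transformers implementation; the function `margin_mse` and the row layout (one positive score followed by the negative scores) are assumptions made for the example.

```python
def margin_mse(student_scores, teacher_scores):
    """Mean squared error between student and teacher score margins.

    Each argument is a list of rows [pos_score, neg_score_1, ..., neg_score_n],
    one row per query (an assumed layout for this sketch).
    """
    squared_errors = []
    for student_row, teacher_row in zip(student_scores, teacher_scores):
        s_pos, *s_negs = student_row
        t_pos, *t_negs = teacher_row
        for s_neg, t_neg in zip(s_negs, t_negs):
            student_margin = s_pos - s_neg
            teacher_margin = t_pos - t_neg
            squared_errors.append((student_margin - teacher_margin) ** 2)
    return sum(squared_errors) / len(squared_errors)


# One query with a positive and two mined hard negatives (made-up scores).
teacher = [[4.0, 1.0, 0.0]]
student = [[3.0, 1.0, 1.0]]
loss = margin_mse(student, teacher)
print(loss)
```

Only the margins matter, so a student whose scores live on a different scale than the teacher's is not penalized as long as the gaps agree.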
reacted to Wauplin's post about 1 year ago
Post
2370
‼️ huggingface_hub's v0.30.0 is out with our biggest update of the past two years!
Full release notes: https://github.com/huggingface/huggingface_hub/releases/tag/v0.30.0.
Ready. Xet. Go!
Xet is a groundbreaking new protocol for storing large objects in Git repositories, designed to replace Git LFS. Unlike LFS, which deduplicates at the file level, Xet operates at the chunk level, making it a game-changer for AI builders collaborating on massive models and datasets. Our Python integration is powered by [xet-core](https://github.com/huggingface/xet-core), a Rust-based package that handles all the low-level details.
You can start using Xet today by installing the optional dependency:
pip install -U huggingface_hub[hf_xet]
With that, you can seamlessly download files from Xet-enabled repositories! And don't worry: everything remains fully backward-compatible if you're not ready to upgrade yet.
Blog post: https://huggingface.co/blog/xet-on-the-hub
Docs: https://huggingface.co/docs/hub/en/storage-backends#xet
⚡ Inference Providers
- We're thrilled to introduce Cerebras and Cohere as official inference providers! This expansion strengthens the Hub as the go-to entry point for running inference on open-weight models.
- Novita is now our 3rd provider to support the text-to-video task, after Fal.ai and Replicate.
- Centralized billing: manage your budget and set team-wide spending limits for Inference Providers! Available to all Enterprise Hub organizations.
from huggingface_hub import InferenceClient
client = InferenceClient(provider="fal-ai", bill_to="my-cool-company")
image = client.text_to_image(
    "A majestic lion in a fantasy forest",
    model="black-forest-labs/FLUX.1-schnell",
)
image.save("lion.png")
- No more timeouts when generating videos, thanks to async calls. Available right now for Fal.ai, and we expect more providers to leverage the same structure very soon!
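To illustrate why chunk-level deduplication matters, here is a toy comparison in plain Python. It is only a sketch: it splits data into fixed-size chunks and hashes them with SHA-256, both of which are simplifying assumptions for the example and not a description of how xet-core actually chunks or hashes data. All helper names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; tiny on purpose so the toy data fits on one line


def digest(data: bytes) -> str:
    """Content hash used as the deduplication key."""
    return hashlib.sha256(data).hexdigest()


def split_chunks(data: bytes, size: int = CHUNK_SIZE):
    """Split data into consecutive fixed-size chunks (assumption of this sketch)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def stored_bytes_file_level(files):
    """Bytes stored when each distinct file is kept once."""
    unique = {digest(content): len(content) for content in files}
    return sum(unique.values())


def stored_bytes_chunk_level(files):
    """Bytes stored when each distinct chunk is kept once."""
    unique = {}
    for content in files:
        for chunk in split_chunks(content):
            unique[digest(chunk)] = len(chunk)
    return sum(unique.values())


# Two versions of a "model file" that differ only in their last 4 bytes.
v1 = b"AAAABBBBCCCCDDDD"
v2 = b"AAAABBBBCCCCEEEE"
files = [v1, v2]

file_level = stored_bytes_file_level(files)
chunk_level = stored_bytes_chunk_level(files)
print(file_level, chunk_level)
```

The two versions are different files, so file-level deduplication stores both in full, while the chunk-level variant only stores the one chunk that changed in addition to the first version.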
posted an update about 1 year ago
Post
3710
We introduce FAT5 (Flash Attention T5) ⚡
An implementation of T5 in PyTorch with the UL2 objective, optimized for GPGPU for both training and inference thanks to 13 different optimizations.
The main one is a CUDA kernel we designed to extend Flash Attention by @tridao with RPE biases; it also supports other positional encodings such as RoPE, ALiBi or FIRE.
The resulting kernel is 2 times faster than an SDPA implementation.
We also use Triton kernels to optimize certain parts of the architecture, such as the cross-entropy and the RMSNorm layer.
The various kernels have been carefully built to be compatible with BF16 and torch.compile to go even faster and achieve efficient pretraining.
All other optimizations are described in a blog post available on @huggingface 🤗: CATIE-AQ/FAT5-report.
This methodology enabled us to efficiently pretrain, as a proof of concept, a FAT5 with 147M parameters in French in a reasonable time (1,461 h for 419B tokens), with limited resources (1 A100, i.e. a computational budget of ~€1,900) and a low carbon footprint (13.5 kg CO2 eq).
The model's weights are also available on Hugging Face: CATIE-AQ/FAT5-small.
It is not very useful in practice: it's a PoC and not an instruction-tuned model (that is planned for later).
All the code is available on GitHub if you want to pretrain your own model in your own language or for a specific domain: https://github.com/catie-aq/flashT5 ⭐
Finally, this was a joint project with @BorisAlbar at hf.co/CATIE-AQ.
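As a quick sanity check on the pretraining figures above, the reported budget can be turned into per-hour and per-second rates. The snippet only rearranges numbers quoted in the post (1,461 hours, 419B tokens, about €1,900, 13.5 kg CO2 eq); the variable names are illustrative, and the results are approximations derived from those rounded figures.

```python
# Figures quoted in the post.
hours = 1_461
tokens = 419e9
cost_eur = 1_900
co2_kg = 13.5

# Derived rates (approximate, since the inputs are rounded).
tokens_per_second = tokens / (hours * 3600)
cost_per_hour = cost_eur / hours
cost_per_billion_tokens = cost_eur / (tokens / 1e9)
co2_g_per_hour = co2_kg * 1000 / hours

print(f"throughput: {tokens_per_second:,.0f} tokens/s")
print(f"cost: {cost_per_hour:.2f} EUR/h, {cost_per_billion_tokens:.2f} EUR per billion tokens")
print(f"emissions: {co2_g_per_hour:.2f} g CO2 eq per hour")
```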
reacted to Wauplin's post over 1 year ago
Post
3236
What a great milestone to celebrate! The huggingface_hub library is slowly becoming a cornerstone of the Python ML ecosystem when it comes to interacting with the @huggingface Hub. It wouldn't be there without the hundreds of community contributions and feedback! No matter if you are loading a model, sharing a dataset, running remote inference or starting jobs on our infra, you are for sure using it! And this is only the beginning so give a star if you wanna follow the project: https://github.com/huggingface/huggingface_hub
reacted to davanstrien's post over 1 year ago
Post
3267
ColPali is revolutionizing multimodal retrieval, but could it be even more effective with domain-specific fine-tuning?
Check out my latest blog post, where I guide you through creating a ColPali fine-tuning dataset using Qwen/Qwen2-VL-7B-Instruct to generate queries for a collection of UFO documents sourced from the Internet Archive.
The post covers:
- Introduction to data for ColPali models
- Using Qwen2-VL for retrieval query generation
- Tips for better query generation
Check out the post here:
https://danielvanstrien.xyz/posts/post-with-code/colpali/2024-09-23-generate_colpali_dataset.html
The resulting Hugging Face dataset: davanstrien/ufo-ColPali
reacted to tomaarsen's post over 1 year ago
Post
2213
SetFit v1.1.0 is out! Training efficient classifiers on CPU or GPU now uses the Sentence Transformers Trainer, and we resolved a lot of issues caused by updates of third-party libraries (like Transformers). Details:
Training a SetFit classifier model consists of 2 phases:
1. Finetuning a Sentence Transformer embedding model
2. Training a Classifier to map embeddings -> classes
The first phase now uses the SentenceTransformerTrainer that was introduced in the Sentence Transformers v3 update. This brings some immediate upsides like MultiGPU support, without any (intended) breaking changes.
Beyond that, we softly deprecated the "evaluation_strategy" argument in favor of "eval_strategy" (following a Transformers deprecation), and deprecated Python 3.7. In return, we add official support for Python 3.11 and 3.12.
✨ There are some more minor changes too, like max_steps and eval_max_steps now being a hard limit instead of an approximate one, training/validation losses now logging nicely in Notebooks, and the "device" parameter no longer being ignored in some situations.
Check out the full release notes here: https://github.com/huggingface/setfit/releases/tag/v1.1.0
Or read the documentation: https://huggingface.co/docs/setfit
Or check out the public SetFit models for inspiration: https://huggingface.co/models?library=setfit&sort=created
P.s. the model in the code snippet trained in 1 minute and it can classify ~6000 sentences per second on my GPU.
reacted to merve's post almost 2 years ago
Post
2599
🥹 @lbourdois has made an app to browse all of my vision paper summaries for everyone's convenience merve/vision_papers
reacted to severo's post with ❤️ almost 2 years ago
Post
3646
[New tool] Follow interesting ML people 👩‍🎨 👨‍🎤 👩‍🏫 with Followgraph
severo/followgraph
Please try it and tell me if it helped you discover high-quality content
I repurposed "Followgraph for Mastodon" (https://followgraph.vercel.app/).
My new follows: @TheBloke @mlabonne @teknium @KnutJaegersberg @SkalskiP @AmelieSchreiber @lbourdois @ceyda @andrewyng @Pclanglais @karpathy
And you?