Hugging Face
Stas Bekman
stas
135 followers · 4 following
https://stasosphere.com/machine-learning/
StasBekman
stas00
stasbekman
AI & ML interests
Toolmaker. Software creator, optimizer and harmonizer. Makes things work and fly at Snowflake AI Research. Training LLM/RAG/Generative AI/Machine Learning/Scalability
Recent Activity
Posted an update about 5 hours ago
PSA for DeepSpeed users: a long-outstanding, critical precision-related bug has been identified and fixed in https://github.com/deepspeedai/DeepSpeed/pull/8066, and a new release has been made. The issue was that mixed precision mode downcast buffers that had to stay in fp32, massively impacting correctness with large static buffers, e.g. RoPE in Qwen3 models when using long sequence lengths (32K+). Hopefully this fix brings DeepSpeed close to parity with FSDP2, which has been an issue for a long time. You can still get the old behavior, but you now need to configure it manually; by default the model's buffers will remain in their original precision. Please install deepspeed==0.19.2, which will do the right thing. Thanks to Tunji Ruwase and Claude Opus 4.8 via Cursor for identifying and fixing the problem.
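To get a feel for why downcasting a RoPE buffer hurts at long sequence lengths, here is a rough, standard-library-only sketch. It is not DeepSpeed's or Qwen3's code. It assumes the affected buffer is the usual RoPE inverse-frequency table (the post does not say which buffer), and the helper names `to_fp32`, `to_bf16`, `rope_inv_freq` and `max_angle_error`, as well as `dim=128` and `base=10000.0`, are illustrative choices. bfloat16 rounding is emulated by keeping the top 16 bits of the float32 bit pattern.

```python
import struct


def to_fp32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x):
    """Emulate a float32 -> bfloat16 cast (round-to-nearest-even).

    Only meant for finite, non-negative inputs, which is all this sketch needs.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the upper 16 bits of the float32 pattern; add a rounding
    # bias first so ties go to the value with an even last kept bit.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def rope_inv_freq(dim=128, base=10000.0):
    """Textbook RoPE inverse frequencies: base ** (-2i / dim), i in [0, dim/2)."""
    return [base ** (-2.0 * i / dim) for i in range(dim // 2)]


def max_angle_error(position, dim=128, base=10000.0):
    """Largest rotation-angle error (radians) at `position` when the
    inverse-frequency buffer is stored in bf16 instead of fp32."""
    worst = 0.0
    for f in rope_inv_freq(dim, base):
        angle_fp32 = position * to_fp32(f)
        angle_bf16 = position * to_bf16(f)
        worst = max(worst, abs(angle_fp32 - angle_bf16))
    return worst


if __name__ == "__main__":
    for pos in (1024, 8192, 32768):
        print(pos, max_angle_error(pos))
```

The rounding error in each frequency is fixed, but the angle is `position * frequency`, so the error in radians grows linearly with the position. In this sketch, by 32K tokens the worst-case angle error is many full turns' worth of radians, which is consistent with the post's point that long sequences are where the downcast becomes a correctness problem.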
Updated a model 3 months ago: stas/ml-engineering-book
Posted an update 3 months ago
Good news! Ulysses Sequence Parallelism from the Snowflake AI Research and DeepSpeed teams has been integrated into HuggingFace Trainer, Accelerate and TRL. For extensive details, please see this writeup: https://huggingface.co/blog/ulysses-sp Thanks a lot to Kashif Rasul for helping make it happen, and to the others on the HF team who helped with the integration.
stas's models (9)
stas/ml-engineering-book • Updated Mar 11 • 27
stas/tiny-random-llama-2 • Text Generation • 104k • Updated Nov 14, 2023 • 2.28k • 42
stas/tiny-m2m_100 • Updated Apr 29, 2022 • 6.19k
stas/tr8b-104B-debug3 • Updated Nov 29, 2021
stas/pegasus-cnn_dailymail-tiny-random • Updated Jul 1, 2021 • 3
stas/mt5-tiny-random • Updated Jun 23, 2021 • 174 • 2
stas/tiny-wmt19-en-de • Updated May 3, 2021 • 78.6k • 1
stas/tiny-wmt19-en-ru • Updated May 3, 2021 • 105
stas/t5-very-small-random • Updated Apr 21, 2021 • 5 • 1