The hidden gem of open-source embedding models: LCO-Embedding for text, image AND audio!
I found this model after reading the recent Massive Audio Embedding Benchmark (MAEB) paper, as it blew the other models out of the water on day zero. I've been using it personally for about a week, and searching my files by describing music, sound effects or images is both practical and entertaining. Really underrated model, would highly recommend checking it out: LCO-Embedding/LCO-Embedding-Omni-7B
Translating benchmarks is a painful process, requiring a lot of manual inspection and adjustments. You start from setting up the whole pipeline and adapting to every format type, including task specifics. There already exist some massive benchmarks, but they still have some simple (and sometimes silly) bugs, which can hurt the evaluations :( We present a novel automated translation framework to help with that!
Eastern and Southern European languages introduce richer linguistic structures compared to English and for benchmarks which heavily rely on grammatical coherence machine translation presents a risk of harming evaluations. We discover potential answer leakage or misleading through grammatical structure of the questions. Some benchmarks are also just outdated and need to be retranslated with newer and better models.
We present a framework with novel test-time scaling methods which allow to control time and cost investments, while at the same time mitigate the need for human-in-the-loop verification. While working on Ukrainian-focused MamayLM models, we had to translate 10+ benchmarks in a short span of time. Finding human evaluators is costly and time-consuming, same goes for using professional translators. With our pipeline we were able to do it in 3 days🏎️
We hope our findings will help enable stronger multilingual evaluations and developments. We release all produced benchmarks on Hugging Face together with the source code and Arxiv paper 🤗
🤔 Many cultures penalize or look down upon self-celebratory behavior. One such example is liking your own post. So why do i do it? Two reasons: 1. I disagree that self-celebratory behavior is inherently bad. 2. On the Huggingface hub, if your post has 0 reactions, it takes TWO whole clicks to react instead of one. So it is actually a UI hack that lowers the bar to engage.
So if you see me reacting to to my own post and thing 'Ugh, this guy is so full of himself' you are only half correct 😆
Now behold as I perform this magic trick called "Exhausting all reaction options for increased visual engagement" so you don't have to click twice to react. You're welcome! Follow this aspiring 🤗 HF Hub influencer for more half-serious bloat in your feed 😜
Did you know that Qwen3 TTS actually utilizes voice embedding? Your voice is turned into a vector of 1024 (or 2048) dimensions, and based on this vector alone you can get your custom voice.
But the coolest part is that this means that you can use math to modify voices, average voices. You can swap gender, pitch, mix and match vocies, and even create an emotion space! This also enables semantic voice search!
The voice embedding model is actually just a tiny encoder with just a few million parameters. I've ripped it out of the voice embeding model so you can use the embedding model standalone. Check out my collection! :D