Gradio-Blocks-Party

company

Activity Feed

akhaliq

submitted a paper to Daily Papers 3 days ago

DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation

Paper • 2605.30350 • Published 5 days ago • 8

akhaliq

submitted a paper to Daily Papers 4 days ago

Contrastive Distribution Matching for Amortized Sequential Monte Carlo in Discrete Diffusion

Paper • 2605.23346 • Published 11 days ago

akhaliq

submitted a paper to Daily Papers 12 days ago

optimize_anything: A Universal API for Optimizing any Text Parameter

Paper • 2605.19633 • Published 14 days ago • 6

akhaliq

submitted a paper to Daily Papers about 1 month ago

Image Generators are Generalist Vision Learners

Paper • 2604.20329 • Published Apr 22 • 21

akhaliq

submitted a paper to Daily Papers about 2 months ago

MultiGen: Level-Design for Editable Multiplayer Worlds in Diffusion Game Engines

Paper • 2603.06679 • Published Mar 30 • 6

1aurent

authored a paper 2 months ago

Voxtral TTS

Paper • 2603.25551 • Published Mar 26 • 63

akhaliq

submitted a paper to Daily Papers 2 months ago

AVO: Agentic Variation Operators for Autonomous Evolutionary Search

Paper • 2603.24517 • Published Mar 25 • 11

akhaliq

submitted 2 papers to Daily Papers 3 months ago

V-Co: A Closer Look at Visual Representation Alignment via Co-Denoising

Paper • 2603.16792 • Published Mar 17 • 3

Multimodal OCR: Parse Anything from Documents

Paper • 2603.13032 • Published Mar 13 • 44

marksverdhei

posted an update 3 months ago

Post

509

The hidden gem of open-source embedding models: LCO-Embedding
for text, image AND audio!

I found this model after reading the recent Massive Audio Embedding Benchmark (MAEB) paper, as it blew the other models out of the water on day zero. I've been using it personally for about a week, and searching my files by describing music, sound effects or images is both practical and entertaining. Really underrated model, would highly recommend checking it out: LCO-Embedding/LCO-Embedding-Omni-7B

PS: If you're looking to run this model on llama.cpp, I've gone ahead and quantized it for you here 👉 https://huggingface.co/collections/marksverdhei/lco-embedding-omni-gguf
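For anyone curious what "searching files by describing them" looks like under the hood, here is a minimal sketch of the ranking step only. It assumes every file and the text query have already been embedded into the same vector space. The 3-dimensional vectors, the file names, and the `search` helper are hypothetical placeholders, not output from or part of LCO-Embedding.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vec, index, top_k=2):
    """Return the names of the top_k entries most similar to the query."""
    scored = [(cosine(query_vec, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]


# Hypothetical precomputed embeddings (a real model would give far more dimensions).
index = {
    "rain.wav": [0.9, 0.1, 0.0],
    "guitar.mp3": [0.1, 0.9, 0.1],
    "sunset.png": [0.0, 0.2, 0.9],
}

# Placeholder for the embedding of a text query such as "sound of rain".
query = [0.8, 0.2, 0.1]

results = search(query, index)
print(results)
```

Because audio, images, and text share one space in this setup, the same ranking code works regardless of which modality each file came from.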

7 replies

hannayukhymenko

submitted a paper to Daily Papers 3 months ago

Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets

Paper • 2602.22207 • Published Feb 25 • 44

hannayukhymenko

posted an update 3 months ago

Post

2091

Do you translate your benchmarks from English correctly? 🤔
Turns out, for many languages it is much harder than you can imagine!

Introducing Recovered in Translation 🌍 together with @aalexandrov
https://ritranslation.insait.ai

Translating benchmarks is a painful process that requires a lot of manual inspection and adjustment. You start by setting up the whole pipeline and adapting it to every format type, including task specifics. Some massive benchmarks already exist, but they still have simple (and sometimes silly) bugs, which can hurt evaluations :( We present a novel automated translation framework to help with that!

Eastern and Southern European languages have richer linguistic structures than English, and for benchmarks that rely heavily on grammatical coherence, machine translation risks harming evaluations. We found that the grammatical structure of a translated question can leak the answer or mislead the model. Some benchmarks are also simply outdated and need to be retranslated with newer and better models.

We present a framework with novel test-time scaling methods that let you control time and cost while reducing the need for human-in-the-loop verification. While working on the Ukrainian-focused MamayLM models, we had to translate 10+ benchmarks in a short span of time. Finding human evaluators is costly and time-consuming, and the same goes for professional translators. With our pipeline we were able to do it in 3 days 🏎️

We hope our findings will help enable stronger multilingual evaluations and developments. We release all produced benchmarks on Hugging Face together with the source code and arXiv paper 🤗

Paper: Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets (2602.22207)
Code: https://github.com/insait-institute/ritranslation
Benchmarks: https://huggingface.co/collections/INSAIT-Institute/multilingual-benchmarks

1 reply

hannayukhymenko

authored a paper 3 months ago

Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets

Paper • 2602.22207 • Published Feb 25 • 44

marksverdhei

posted an update 3 months ago

Post

1532

🤔 Many cultures penalize or look down upon self-celebratory behavior. One such example is liking your own post. So why do I do it? Two reasons:
1. I disagree that self-celebratory behavior is inherently bad.
2. On the Hugging Face Hub, if your post has 0 reactions, it takes TWO whole clicks to react instead of one. So it is actually a UI hack that lowers the bar to engage.

So if you see me reacting to my own post and think 'Ugh, this guy is so full of himself', you are only half correct 😆

Now behold as I perform this magic trick called "Exhausting all reaction options for increased visual engagement" so you don't have to click twice to react. You're welcome!
Follow this aspiring 🤗 HF Hub influencer for more half-serious bloat in your feed 😜

1 reply

marksverdhei

posted an update 3 months ago

Post

1762

# The most underrated feature of Qwen3-TTS: Voice embeddings! 🧑‍🦰💬
https://huggingface.co/collections/marksverdhei/qwen3-voice-embedding

Did you know that Qwen3-TTS actually uses voice embeddings?
Your voice is turned into a vector of 1024 (or 2048) dimensions,
and based on this vector alone you can get your custom voice.

But the coolest part is that this means you can use math to modify and average voices. You can swap gender, change pitch, mix and match voices, and even create an emotion space! This also enables semantic voice search!

The voice embedding model is actually just a tiny encoder with only a few million parameters. I've ripped it out of the voice embedding model so you can use the embedding model standalone. Check out my collection! :D
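To make the "math on voices" idea concrete, here is a rough sketch of the vector arithmetic involved. It is not the Qwen3-TTS API: the 4-dimensional vectors stand in for real 1024- or 2048-dimensional voice embeddings, and `average`, `blend`, and `shift` are hypothetical helper names chosen for illustration.

```python
def average(voices):
    """Element-wise mean of several equal-length voice vectors."""
    n = len(voices)
    return [sum(dims) / n for dims in zip(*voices)]


def blend(a, b, t):
    """Linear interpolation between two voices: t=0 gives a, t=1 gives b."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


def shift(voice, direction, strength):
    """Move a voice along an attribute direction (e.g. pitch or emotion)."""
    return [v + strength * d for v, d in zip(voice, direction)]


# Toy stand-ins for real voice embeddings (assumption: real ones are much longer).
alice = [1.0, 0.0, 2.0, 0.0]
bob = [0.0, 1.0, 0.0, 2.0]

# Averaging two voices and blending them halfway give the same vector.
mixed = average([alice, bob])
halfway = blend(alice, bob, 0.5)

# An attribute direction can be estimated as a difference between voices
# (or between the means of two groups of voices).
direction = [b - a for a, b in zip(alice, bob)]
shifted = shift(alice, direction, 1.0)

print(mixed, halfway, shifted)
```

In practice the direction would probably come from the difference between the mean embeddings of two groups of speakers, and `strength` would be tuned by ear.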