Dev Mode Explorers

community

Activity Feed

fffiloni

posted an update about 21 hours ago

Post

135

⏱️ Built a small Space for Visual Chronometer / Pulse of Motion.

Upload a video and estimate its Physical FPS: the frame rate implied by visual motion, independent of metadata.
Useful to inspect “chronometric hallucination” in generated videos: clips that look smooth, but move with the wrong physical time scale.
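
To make the idea concrete, here is a minimal sketch of how an estimated Physical FPS could be compared with a file's metadata FPS. It does not implement the estimator in the Space. It assumes "Physical FPS" means frames per second of real-world time, and the function names and the 20% tolerance are hypothetical choices for illustration.

```python
def playback_speed(metadata_fps: float, physical_fps: float) -> float:
    """Apparent speed relative to real time (1.0 means real-time motion).

    Assumes physical_fps counts frames per second of real-world time, so a
    clip played at metadata_fps runs metadata_fps / physical_fps times as
    fast as the scene really moved.
    """
    if metadata_fps <= 0 or physical_fps <= 0:
        raise ValueError("frame rates must be positive")
    return metadata_fps / physical_fps


def looks_mistimed(metadata_fps: float, physical_fps: float,
                   tolerance: float = 0.2) -> bool:
    """Flag clips whose apparent speed is off by more than `tolerance`.

    The 0.2 default is an arbitrary illustrative threshold.
    """
    return abs(playback_speed(metadata_fps, physical_fps) - 1.0) > tolerance


print(playback_speed(30.0, 60.0))  # 0.5: motion plays at half real-world speed
print(looks_mistimed(24.0, 12.0))  # True: motion plays twice as fast as real time
```

A ratio far from 1.0 is one simple way to read the "wrong physical time scale" described above.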

Try it here: fffiloni/Pulse-of-Motion

Prabhjotschugh

authored 4 papers 3 days ago

Not Truly Multilingual: Script Consistency as a Missing Dimension in VLM Evaluation

Paper • 2606.17188 • Published 16 days ago • 1

FirstPass: Grounding AI Scientific Judgment in Multi-Round Editorial Outcomes

Paper • 2606.20769 • Published 15 days ago • 1

Beyond 'One Language, One Script': Quantifying Orthographic Bias in Multilingual VLMs with PuMVR

Paper • 2606.20770 • Published 15 days ago • 1

AirCast-SR: A Foundation Model for Kilometer-Scale Atmospheric Super-Resolution via Latent Consistency Diffusion

Paper • 2605.26130 • Published May 20 • 1

fffiloni

posted an update 6 days ago

Post

1509

A few weeks ago, @victor opened the door: coding agents can now ship Hugging Face Spaces autonomously.

I pulled on that thread.

As someone who builds and ships Gradio demos regularly, I didn’t just want to reproduce the loop. I wanted to see what happens when that loop is plugged into the whole Hugging Face stack.

The interesting part is not only that an agent can ship a Space.

It’s what happens when Space generation becomes a first-class Hugging Face workflow.

That became Agentic Space Factory.

More soon. 🤗

1 reply

Abhaykoul

posted an update 16 days ago

Post

234

Shipped v0.1.2 of vtx — a minimalist coding agent for the terminal.

Most agentic CLIs ship system prompts of 10k+ tokens. Vtx's is ~2,200 tokens. Less prompt overhead means more room for your code in the model's context window.

Vtx is a from-scratch Python implementation of the design philosophy behind pi-mono — same principles, pure Python, no transpiled runtime.

What ships out of the box:

→ Textual TUI + headless CLI (vtx -p "fix the failing test")
→ 49 LLM provider gateways, all declared in a single provider.yaml
→ 5 core tools (read / edit / write / bash / find) plus web search and fetch
→ Session tree with compaction, handoff, and resume
→ AGENTS.md / CLAUDE.md auto-discovery
→ Skills system — drop SKILL.md files in .agents/skills/ and they become slash commands
→ Two OAuth flows (GitHub Copilot device flow, OpenAI Codex PKCE)
→ Two-mode permissions: prompt (default) or auto, with a safe-command allowlist
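
The two-mode permission model in the last bullet could look roughly like the sketch below. It is not taken from vtx's source. The `SAFE_COMMANDS` set, the `needs_confirmation` function, and the reading that the allowlist lets prompt mode skip confirmation for safe commands are all assumptions made for illustration.

```python
import shlex

# Hypothetical allowlist; the post does not say which commands vtx treats as safe.
SAFE_COMMANDS = {"ls", "cat", "pwd", "grep"}

# Shell operators that could chain an unsafe command after a safe one.
SHELL_OPERATORS = (";", "&&", "||", "|", ">", "`", "$(")


def needs_confirmation(command: str, mode: str = "prompt") -> bool:
    """Return True when the user should be asked before running `command`."""
    if mode == "auto":
        return False  # auto mode never asks
    if mode != "prompt":
        raise ValueError(f"unknown permission mode: {mode!r}")
    try:
        tokens = shlex.split(command)
    except ValueError:
        return True  # unparseable input (for example unbalanced quotes): ask
    if not tokens:
        return True
    if any(op in command for op in SHELL_OPERATORS):
        return True
    # Only the program name is checked against the allowlist in this sketch.
    return tokens[0] not in SAFE_COMMANDS


print(needs_confirmation("ls -la"))        # False
print(needs_confirmation("rm -rf build"))  # True
```

Refusing to auto-approve anything that contains a shell operator is a deliberately conservative choice here, since `ls; rm -rf build` starts with a safe command but does not end with one.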

This release adds a proper extension system. Register new LLM-callable tools, intercept tool calls, hook lifecycle events, and add slash commands from a single register(api) function in a Python file under ~/.vtx/agent/extensions/. Extensions can override built-in tools by name and chain handler logic across subscribers.
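
As a rough illustration of that shape, the sketch below runs a `register(api)` function against a stand-in object. Only the `register(api)` entry point comes from the post. `StubApi` and its `register_tool`, `on_tool_call`, and `call_tool` methods are hypothetical names invented here, not vtx's actual API, so check the project's documentation before writing a real extension.

```python
from typing import Callable, Dict, List, Optional

ToolFn = Callable[[str], str]
# An interceptor may return a rewritten argument, or None to leave it unchanged.
Interceptor = Callable[[str, str], Optional[str]]


class StubApi:
    """Hypothetical stand-in for the object passed to register(api)."""

    def __init__(self) -> None:
        # One pretend built-in tool so the override below has something to replace.
        self.tools: Dict[str, ToolFn] = {"read": lambda path: f"<contents of {path}>"}
        self.interceptors: List[Interceptor] = []

    def register_tool(self, name: str, fn: ToolFn) -> None:
        # Registering under an existing name overrides the built-in tool.
        self.tools[name] = fn

    def on_tool_call(self, handler: Interceptor) -> None:
        self.interceptors.append(handler)

    def call_tool(self, name: str, arg: str) -> str:
        # Chain every subscriber in registration order, then run the tool.
        for handler in self.interceptors:
            result = handler(name, arg)
            if result is not None:
                arg = result
        return self.tools[name](arg)


def register(api: StubApi) -> None:
    """Entry point named in the post; every call inside it is an assumption."""
    api.register_tool("word_count", lambda text: str(len(text.split())))
    api.register_tool("read", lambda path: f"<redacted {path}>")  # override by name
    api.on_tool_call(lambda name, arg: arg.strip())  # first subscriber
    api.on_tool_call(lambda name, arg: arg.lower() if name == "read" else None)


api = StubApi()
register(api)
print(api.call_tool("word_count", "  fix the failing test  "))  # 4
print(api.call_tool("read", " README.md "))  # <redacted readme.md>
```

The second call shows both behaviors at once: the two subscribers run in order (strip, then lowercase), and the overriding `read` tool handles the result instead of the built-in one.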

Apache 2.0. uv tool install vtx-coding-agent and you're running.

GitHub: https://github.com/OEvortex/vtx-coding-agent
PyPI: https://pypi.org/project/vtx-coding-agent

Built in the open. Feedback, extensions, and PRs welcome.

DongfuJiang

authored 3 papers 27 days ago

RewardHarness: Self-Evolving Agentic Post-Training

Paper • 2605.08703 • Published May 9 • 10

Cosmos 3: Omnimodal World Models for Physical AI

Paper • 2606.02800 • Published Jun 1 • 138

AutoLab: Can Frontier Models Solve Long-Horizon Auto Research and Engineering Tasks?

Paper • 2606.05080 • Published 30 days ago • 30

alielfilali01

posted an update about 1 month ago

Post

573

Plans in HTML > Plans in Markdown

johko

posted an update about 1 month ago

Post

148

One prompt, three answers - which model is from where?

johko/llm-blind-date

I built a little demo where you give three models (Apertus, Llama, Qwen3) the same prompt and in the end you have to guess which is which just based on their answers.

Give it a try! ;)

fffiloni

posted an update about 2 months ago

Post

3843

I built HF Radio on Hugging Face Spaces 📻
fffiloni/HF-Radio

A live community radio for AI-generated songs, powered by tracks created with ACE-Step.

You can tune in, discover community-made songs in many languages, vote on what sounds good, and mark your real favorites as Bangers.

The more people listen, vote, and create, the better the station gets.

Under the hood, it connects a few Hugging Face pieces together:

Spaces for the live app, HF buckets for community tracks, OAuth for signed-in listeners, server-side streaming with ffmpeg, hourly playlist refreshes, moderation, jingles, and community feedback loops.

It’s not just a playlist.

It’s a shared taste experiment:
new songs get a shot every hour, and the community helps decide what deserves another spin.

Come listen.
Find weird gems.
Support the Bangers.
Shape the radio.

→ fffiloni/HF-Radio

Tonic

posted an update about 2 months ago

Post

2992

🙋🏻‍♂️ Hey there folks,

Turns out: if we predict 🌏 Earth, we can save a lot of time when looking for interesting things and spend less time looking at things we expect to see.

Sentinel-2 imagery 🛰️ takes a long time to download to Earth, so our "near real time" systems are quite far from that in practical terms.

Meanwhile, if we "predict" what we will see based on what we do see, we can send down much less data in a timely way and prioritize 📡 Earth-bound response.

I'm talking about illegal fishing, logging, mining, or building in nature reserves: the more of that we predict early, the more we're able to stop it in time.

At least that's the concept!
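
One way to picture the concept is the sketch below, which ranks image tiles by how much the observation differs from the prediction and downlinks only the most surprising ones. It is an illustration under stated assumptions, not the method from the blog: the mean-absolute-difference score, the tile ids, the pixel values, and the `budget` parameter are all hypothetical.

```python
from typing import Dict, List, Sequence

Tile = Sequence[float]  # a flattened tile of pixel values, purely illustrative


def surprise(predicted: Tile, observed: Tile) -> float:
    """Mean absolute difference between a predicted and an observed tile."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("tiles must be non-empty and the same size")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


def tiles_to_downlink(predicted: Dict[str, Tile],
                      observed: Dict[str, Tile],
                      budget: int) -> List[str]:
    """Return the ids of the `budget` most surprising tiles, most surprising first."""
    scores = {tile_id: surprise(predicted[tile_id], observed[tile_id])
              for tile_id in observed}
    return sorted(scores, key=lambda tile_id: scores[tile_id], reverse=True)[:budget]


# Made-up values: the forest tile changed a lot, the coast tile not at all.
predicted = {"forest": [0.8, 0.8, 0.8], "coast": [0.2, 0.2, 0.2], "river": [0.5, 0.5, 0.5]}
observed = {"forest": [0.8, 0.3, 0.2], "coast": [0.2, 0.2, 0.2], "river": [0.5, 0.4, 0.5]}

print(tiles_to_downlink(predicted, observed, budget=2))  # ['forest', 'river']
```

Tiles that match the prediction never need to leave the satellite in this picture, which is where the bandwidth saving would come from.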

Check out the blog: https://huggingface.co/blog/Tonic/save-patagonia-by-predicting-earth

- Collection: https://huggingface.co/collections/NuTonic/earth-observation-with-temporal-and-general-understanding
- Code: https://github.com/Josephrp/Nutonic
- Dataset: NuTonic/sat-vl-sft-training-ready-v1
- Model: NuTonic/lspace
- Training: NuTonic/lspace-trackio
- Evals: NuTonic/Patagonia_Eval

2 replies

DongfuJiang

authored 4 papers about 2 months ago

Watch Before You Answer: Learning from Visually Grounded Post-Training

Paper • 2604.05117 • Published Apr 6 • 36

ClawBench: Can AI Agents Complete Everyday Online Tasks?

Paper • 2604.08523 • Published Apr 9 • 265

Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Paper • 2604.12374 • Published Apr 14 • 37

Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction

Paper • 2605.05242 • Published May 3 • 126

fffiloni

posted an update about 2 months ago

Post

505

Great technical guide by Nico Martin on the Hugging Face blog, showing how to use Transformers.js inside a Chrome extension and run ONNX models from the Hub locally with WebGPU inside a Manifest V3 extension.

The interesting part: this is not just a chatbot in a side panel.

The article walks through the architecture behind a browser agent that can read open tabs, query webpages, search history, and highlight elements directly on the page — with models downloaded from the Hugging Face Hub, cached under the extension origin, and executed locally instead of being called through a remote API for every prompt.

A strong blueprint for building local-first web copilots, reading assistants, and AI-powered browsing workflows.

Article: https://huggingface.co/blog/transformersjs-chrome-extension

fffiloni

posted an update about 2 months ago

Post

346

I’ve been reading “What if AI systems weren’t chatbots?”
What if AI systems weren't chatbots? (2605.07896) 👀

The paper asks a simple but important question: what if the chatbot interface is not just a neutral wrapper around AI models, but part of the problem?

A chatbot can make a system feel more capable, more certain, and more “human” than it really is. That matters, because interfaces shape how we trust, use, and delegate to AI systems.

When everything becomes: ask → answer
we can lose sight of the actual workflow:
- parameters
- alternatives
- uncertainty
- intermediate steps
- failure modes
- human control

For creative AI especially — image, video, editing, animation — I’m not sure “chat” should always be the default interface.

Sometimes we need a conversation.
But often we need a canvas, a timeline, sliders, masks, previews, comparisons, and visible pipelines.

This is also why I find many open ML demos interesting: Spaces, Gradio apps, visual tools, small focused interfaces.

They often explore another direction — not just better assistants, but better tools. 🤗

2 replies

Team members 144