Collection of GGUFs for inference with vla.cpp, a unified C++ inference engine for Vision-Language-Action models.
VinRobotics - Edge AI & Model Optimization
We optimize and deploy LLM, ASR, VLM, and VLA (Vision-Language-Action) models on real-world systems.
Featured Projects
- vla.cpp: Native C++ inference runtime for Vision-Language-Action models, built for low-latency robotic deployment.
- Model Quantization Recipes: Practical recipes for quantizing and deploying LLM, ASR, VLM, and VLA models on real-world systems.
What we do
- Optimization: quantization (INT8/INT4/FP8/NVFP4), pruning, distillation, ...
- Deployment: vLLM, TensorRT, ONNX Runtime, edge runtimes
- Systems: real-time pipelines (vision, audio, language, action)
Focus
- Edge devices (Jetson, SoCs)
- Robotics & VLA systems
- Latency, stability, deployability
Philosophy
Optimization = model + runtime + system
models 27
vrfai/vla-adapter-libero-gguf
Robotics • 1B • Updated
vrfai/openvla-oft-libero-gguf
Robotics • 8B • Updated
vrfai/pi05-libero-gguf
Robotics • 3B • Updated
vrfai/gr00tn1d5-libero-object-gguf
Robotics • 2B • Updated • 8
vrfai/gr00tn1d6-libero-gguf
Robotics • 3B • Updated • 15
vrfai/Qwen3-ASR-1.7B-int8
Automatic Speech Recognition • 2B • Updated • 3
vrfai/Qwen3-ASR-1.7B-int4
Automatic Speech Recognition • 2B • Updated • 3
vrfai/Qwen3-ASR-1.7B-fp8
Automatic Speech Recognition • 2B • Updated • 2.48k • 5
vrfai/Qwen3-ASR-1.7B-nvfp4
Automatic Speech Recognition • 1B • Updated • 156 • 5
vrfai/gemma-4-E4B-it-fp8
Text Generation • 8B • Updated • 1.15k • 4