Optimize speculative decoding performance by increasing max_num_batched_tokens to 4096 a0613f9 encryptd committed on 1 day ago
Fix HF space build error by using official vllm-openai:v0.21.0 base image 0a0b8ea encryptd committed on 1 day ago
Remove torchvision pin to allow vllm 0.12.0 to resolve its own torchvision dependency a35ce3f encryptd committed on 1 day ago
Upgrade to vllm 0.12.0 and restore original performance arguments for Qwen 3.5 support 1cc1c9e encryptd committed on 1 day ago
Pin transformers to 4.48.2 to satisfy vllm's transformers>=4.48.2 dependency and resolve TokenizersBackend AttributeError ee926f5 encryptd committed on 1 day ago
Pin transformers to 4.48.0 to resolve TokenizersBackend AttributeError in vLLM 0.8.0 ab83d82 encryptd committed on 1 day ago
Remove limit-mm-per-prompt argument to bypass vLLM 0.8.0 multimodal registration check c2fa736 encryptd committed on 1 day ago
Update torch to 2.6.0+cu124 and torchvision to 0.21.0+cu124 to satisfy vllm 0.8.0 dependency requirements 860dc84 encryptd committed on 1 day ago
Downgrade torch to 2.5.1+cu124 and vllm to 0.8.0 to match available CUDA 12.4 pre-compiled wheels bdb058c encryptd committed on 1 day ago
Fix NVIDIA driver mismatch on HF Space by forcing +cu124 torch wheels ba7160f encryptd committed on 1 day ago
Fix: Upgrade to vLLM 0.22.0 and PyTorch 2.11.0 on CUDA 12.4 for native Qwen 3.5 support and host compatibility 359492c encryptd committed on 1 day ago
Fix: Remove speculative-config argument for vLLM 0.7.2 CLI compliance e776eb6 encryptd committed on 1 day ago
Fix: Update --limit-mm-per-prompt format to KEY=VALUE format image=99 for vLLM 0.7.2 a5a317d encryptd committed on 1 day ago
Fix: Align vLLM arguments with model card recommendations (chat format, MTP speculative config, limit-mm-per-prompt) 244de46 encryptd committed on 1 day ago
Fix: Change wheel index to cu124 and pin torch to 2.5.1+cu124 for native CUDA 12.4 host driver compatibility c6032e7 encryptd committed on 1 day ago
Fix: Remove manual transformers source install to resolve tokenizer AttributeError 4222b36 encryptd committed on 1 day ago
Fix: Remove limit-mm-per-prompt and mm-processor-kwargs for vLLM 0.7.2 compatibility 9700b91 encryptd committed on 1 day ago
Fix: Update --limit-mm-per-prompt format to KEY=VALUE for vLLM 0.7.2 compatibility b7c3dd4 encryptd committed on 1 day ago
Fix: Pin torch to 2.5.1+cu121 and vllm to 0.7.2 to guarantee CUDA 12.4 host driver compatibility d1fc8a9 encryptd committed on 1 day ago
Fix: Add LD_LIBRARY_PATH forward compatibility for older host GPU drivers d16bf2a encryptd committed on 1 day ago
Fix: Switch base to CUDA devel image to provide nvcc for flashinfer JIT, and add .gitignore 2ea1c55 encryptd committed on 1 day ago
Fix: Remove show_copy_button parameter from gr.Textbox for Gradio 6 compatibility 5aa37ab encryptd committed on 2 days ago
Fix: Remove audioop-lts as Python 3.10 natively provides audioop eeebc77 encryptd committed on 2 days ago
Migration: Convert Hugging Face Space to custom Docker Space using CUDA 12.4 f1ba762 encryptd committed on 2 days ago
Fix: Install transformers from source to support Qwen 3.5 architecture 7eb1ffe encryptd committed on 2 days ago
Initial commit: NuExtract3 Gradio space setup powered by vLLM on A100 GPU 88dbb61 encryptd committed on 3 days ago