Commit History

Optimize speculative decoding performance by increasing max_num_batched_tokens to 4096
a0613f9

encryptd committed on

Fix entrypoint override in Dockerfile
8c76968

encryptd committed on

Fix HF space build error by using official vllm-openai:v0.21.0 base image
0a0b8ea

encryptd committed on

Remove torchvision pin to allow vllm 0.12.0 to resolve its own torchvision dependency
a35ce3f

encryptd committed on

Upgrade to vllm 0.12.0 and restore original performance arguments for Qwen 3.5 support
1cc1c9e

encryptd committed on

Pin transformers to 4.48.2 to satisfy vllm's transformers>=4.48.2 dependency and resolve TokenizersBackend AttributeError
ee926f5

encryptd committed on

Pin transformers to 4.48.0 to resolve TokenizersBackend AttributeError in vLLM 0.8.0
ab83d82

encryptd committed on

Remove limit-mm-per-prompt argument to bypass vLLM 0.8.0 multimodal registration check
c2fa736

encryptd committed on

Remove speculative-config argument for vLLM 0.8.0 CLI parser
20df6b8

encryptd committed on

Fix limit-mm-per-prompt syntax for vLLM 0.8.0 CLI parser
7f9201a

encryptd committed on

Update torch to 2.6.0+cu124 and torchvision to 0.21.0+cu124 to satisfy vllm 0.8.0 dependency requirements
860dc84

encryptd committed on

Downgrade torch to 2.5.1+cu124 and vllm to 0.8.0 to match available CUDA 12.4 pre-compiled wheels
bdb058c

encryptd committed on

Fix NVIDIA driver mismatch on HF Space by forcing +cu124 torch wheels
ba7160f

encryptd committed on

Fix: Upgrade to vLLM 0.22.0 and PyTorch 2.11.0 on CUDA 12.4 for native Qwen 3.5 support and host compatibility
359492c

encryptd committed on

Fix: Remove speculative-config argument for vLLM 0.7.2 CLI compliance
e776eb6

encryptd committed on

Fix: Update --limit-mm-per-prompt to KEY=VALUE format (image=99) for vLLM 0.7.2
a5a317d

encryptd committed on

Fix: Align vLLM arguments with model card recommendations (chat format, MTP speculative config, limit-mm-per-prompt)
244de46

encryptd committed on

Fix: Change wheel index to cu124 and pin torch to 2.5.1+cu124 for native CUDA 12.4 host driver compatibility
c6032e7

encryptd committed on

Fix: Remove manual transformers source install to resolve tokenizer AttributeError
4222b36

encryptd committed on

Fix: Remove limit-mm-per-prompt and mm-processor-kwargs for vLLM 0.7.2 compatibility
9700b91

encryptd committed on

Fix: Update --limit-mm-per-prompt format to KEY=VALUE for vLLM 0.7.2 compatibility
b7c3dd4

encryptd committed on

Fix: Pin torch to 2.5.1+cu121 and vllm to 0.7.2 to guarantee CUDA 12.4 host driver compatibility
d1fc8a9

encryptd committed on

Fix: Add LD_LIBRARY_PATH forward compatibility for older host GPU drivers
d16bf2a

encryptd committed on

Fix: Switch base to CUDA devel image to provide nvcc for flashinfer JIT, and add .gitignore
2ea1c55

encryptd committed on

Fix: Remove show_copy_button parameter from gr.Textbox for Gradio 6 compatibility
5aa37ab

encryptd committed on

Fix: Remove audioop-lts as Python 3.10 natively provides audioop
eeebc77

encryptd committed on

Migration: Convert Hugging Face Space to custom Docker Space using CUDA 12.4
f1ba762

encryptd committed on

Fix: Install transformers from source to support Qwen 3.5 architecture
7eb1ffe

encryptd committed on

Initial commit: NuExtract3 Gradio space setup powered by vLLM on A100 GPU
88dbb61

encryptd committed on