Instructions for using louhless/Ycoder-medium with libraries and local apps.

Libraries

How to use louhless/Ycoder-medium with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="louhless/Ycoder-medium")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

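# Load model directly (causal LM head is needed for text generation)
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("louhless/Ycoder-medium")
model = AutoModelForCausalLM.from_pretrained("louhless/Ycoder-medium", dtype="auto")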

llama-cpp-python

How to use louhless/Ycoder-medium with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="louhless/Ycoder-medium",
	filename="Ycoder-medium-f16.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
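`create_chat_completion` returns an OpenAI-style response dictionary rather than a plain string. Below is a minimal sketch of pulling the reply text out of such a dictionary. `extract_reply` is a hypothetical helper name, and the sample dictionary is a hand-written stand-in, not real model output.

```python
def extract_reply(response: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hand-written stand-in with the same shape as a chat completion response.
sample_response = {
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "Paris."}}
    ]
}
reply = extract_reply(sample_response)
print(reply)
```

Passing the return value of the `llm.create_chat_completion(...)` call to `extract_reply` should give the generated text in the same way.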

Local Apps

llama.cpp

How to use louhless/Ycoder-medium with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf louhless/Ycoder-medium:F16
# Run inference directly in the terminal:
llama-cli -hf louhless/Ycoder-medium:F16

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf louhless/Ycoder-medium:F16
# Run inference directly in the terminal:
llama-cli -hf louhless/Ycoder-medium:F16

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf louhless/Ycoder-medium:F16
# Run inference directly in the terminal:
./llama-cli -hf louhless/Ycoder-medium:F16

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf louhless/Ycoder-medium:F16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf louhless/Ycoder-medium:F16

Use Docker

docker model run hf.co/louhless/Ycoder-medium:F16

vLLM

How to use louhless/Ycoder-medium with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "louhless/Ycoder-medium"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "louhless/Ycoder-medium",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
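The same request can be made from Python without extra dependencies. The sketch below uses only the standard library. `build_chat_request` and `send_chat_request` are hypothetical helper names, and the 60-second timeout is an arbitrary assumption. It also assumes the server is reachable at the host and port used in the curl example.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Building the request needs no running server; sending it does.
request = build_chat_request(
    "http://localhost:8000", "louhless/Ycoder-medium", "What is the capital of France?"
)
print(request.full_url)
```

With the server running, `send_chat_request(request)` should return the reply text. For a server listening on a different port, only the base URL changes.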

Use Docker

docker model run hf.co/louhless/Ycoder-medium:F16

SGLang

How to use louhless/Ycoder-medium with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "louhless/Ycoder-medium" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "louhless/Ycoder-medium",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "louhless/Ycoder-medium" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "louhless/Ycoder-medium",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama

How to use louhless/Ycoder-medium with Ollama:
```
ollama run hf.co/louhless/Ycoder-medium:F16
```

Unsloth Studio

How to use louhless/Ycoder-medium with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for louhless/Ycoder-medium to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for louhless/Ycoder-medium to start chatting

Use Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for louhless/Ycoder-medium to start chatting

Docker Model Runner

How to use louhless/Ycoder-medium with Docker Model Runner:
```
docker model run hf.co/louhless/Ycoder-medium:F16
```

Lemonade

How to use louhless/Ycoder-medium with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull louhless/Ycoder-medium:F16

Run and chat with the model

lemonade run user.Ycoder-medium-F16

List all available models

lemonade list

Ycoder-medium

Ycoder-medium is an experimental local fine-tune of Qwen/Qwen2.5-Coder-0.5B-Instruct created by louhless.

It is targeted at:

OpenGL / GLSL
Python
German replies
cautious 2025-2026 news and public-health summaries

Important Note

This model is not trained from scratch.

It is a small LoRA fine-tune on top of Qwen/Qwen2.5-Coder-0.5B-Instruct.

The goal is to improve behavior in a narrow target set. Any “15% improvement” claim should be treated as a target, not a verified benchmark result, unless evaluated on a fixed benchmark before and after training.
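To make the before/after comparison concrete, the sketch below computes a relative improvement from two scores measured on the same fixed benchmark. The function name and the scores are made up for illustration and are not measurements of this model. The two scores are chosen so that the result corresponds to a 15% relative gain.

```python
def relative_improvement(before: float, after: float) -> float:
    """Relative change of a score, as a fraction of the pre-training score."""
    if before <= 0:
        raise ValueError("baseline score must be positive")
    return (after - before) / before


# Made-up scores for illustration only: fraction of benchmark tasks passed.
baseline_score = 0.40   # base model, before fine-tuning
finetuned_score = 0.46  # fine-tuned model, same benchmark and settings
gain = relative_improvement(baseline_score, finetuned_score)
print(f"relative improvement: {gain:.1%}")
```

Note that a relative gain differs from an absolute one: in this example the score rises by 6 percentage points, which is 15% of the baseline.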

Model Details

Model name: Ycoder-medium
Creator: louhless
Base model: Qwen/Qwen2.5-Coder-0.5B-Instruct
Architecture: Qwen2 causal language model
Context length: 32768
Language: English and German
Export: GGUF available
Status: experimental

Training Focus

The model was tuned for:

Python utility code
Python code explanations
GLSL fragment shaders
GLSL vertex shaders
OpenGL concepts such as VAO/VBO
German short-form answers
simple math
cautious dated summaries for 2025-2026 public-health/news topics

News / Health Safety

For topics such as Hantavirus, the project uses both small fine-tuning examples and local dated context snippets.

This is intentional: recent news and public-health information should not be trusted from model weights alone.

The model should:

answer cautiously
mention dates when relevant
avoid medical diagnosis
avoid treatment promises
recommend official sources such as WHO, CDC, ECDC, or local health authorities

It should not be used for diagnosis or medical decision-making.
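The card does not describe how the local dated context snippets are supplied to the model. One plausible approach, sketched below under that assumption, is to prepend them to the chat as a system message. The helper name `build_messages`, the snippet layout, and the system prompt wording are all hypothetical, and the snippet text is a placeholder rather than a real report.

```python
from datetime import date


def build_messages(question: str, snippets: list[tuple[date, str, str]]) -> list[dict]:
    """Build chat messages that put dated context snippets ahead of the question.

    Each snippet is (publication date, source name, text). This layout is an
    assumption for illustration, not the project's actual format.
    """
    # Newest first, so the most recent information leads the context.
    ordered = sorted(snippets, key=lambda s: s[0], reverse=True)
    context = "\n".join(f"[{d.isoformat()}] {source}: {text}" for d, source, text in ordered)
    system = (
        "Answer cautiously using only the dated context below. "
        "Mention dates when relevant, do not diagnose or promise treatment, "
        "and point to official health authorities.\n\n" + context
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]


# Placeholder snippets; the text is deliberately not a real report.
example_snippets = [
    (date(2025, 3, 1), "Source A", "placeholder summary text"),
    (date(2026, 1, 15), "Source B", "placeholder summary text"),
]
messages = build_messages("What is known about Hantavirus?", example_snippets)
print(messages[0]["content"])
```

The resulting list has the same shape as the `messages` argument used in the chat examples elsewhere on this page.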

Training Data

The initial custom dataset includes examples for:

Python utility functions and explanations
GLSL shaders and OpenGL concepts
German short answers
simple math
dated 2025-2026 Hantavirus summaries based on WHO, CDC, and ECDC public information

Example Prompts

Python

Prompt:

Write Python code to read a JSON file safely.
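For orientation, one reasonable answer to this prompt might look like the sketch below. It is hand-written, not output from Ycoder-medium. "Safely" is interpreted here as returning a fallback value when the file is missing, unreadable, or not valid JSON. The function name `read_json_safely` and its `default` parameter are hypothetical.

```python
import json
from pathlib import Path
from typing import Any


def read_json_safely(path: str, default: Any = None) -> Any:
    """Parse a JSON file, returning `default` if it is missing or malformed."""
    try:
        # Explicit encoding avoids platform-dependent defaults.
        with Path(path).open("r", encoding="utf-8") as handle:
            return json.load(handle)
    except (OSError, UnicodeDecodeError, json.JSONDecodeError):
        # OSError covers missing files and permission problems;
        # the other two cover undecodable bytes and invalid JSON.
        return default


config = read_json_safely("does-not-exist.json", default={})
print(config)
```

Returning a default keeps the caller simple. Code that needs to distinguish a missing file from a malformed one would catch the exceptions separately instead.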

Downloads last month: 102

GGUF

Model size: 0.5B params
Architecture: qwen2
Hardware compatibility: 16-bit

Model tree for louhless/Ycoder-medium

Base model: Qwen/Qwen2.5-0.5B
Finetuned: Qwen/Qwen2.5-Coder-0.5B
Finetuned: Qwen/Qwen2.5-Coder-0.5B-Instruct
Quantized (65): this model