
Libraries

How to use MultivexAI/Plyx-15M with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="MultivexAI/Plyx-15M")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("MultivexAI/Plyx-15M")
# Plyx-15M is a text-only Llama-architecture model, so the causal-LM auto class applies
model = AutoModelForCausalLM.from_pretrained("MultivexAI/Plyx-15M")
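
The snippets above load the model but do not yet generate text. A minimal sketch of a completion call through the pipeline follows. The helper name `complete` and the sampling values are illustrative assumptions, not settings recommended by the model authors; the repetition controls are included only because very small base models tend to loop.

```python
# Sampling settings are assumptions for illustration, not tuned values.
GENERATION_KWARGS = {
    "max_new_tokens": 64,        # keep continuations short
    "do_sample": True,           # sample instead of greedy decoding
    "temperature": 0.5,          # same temperature as the curl examples in this card
    "repetition_penalty": 1.2,   # assumed value; discourages repeated tokens
    "no_repeat_ngram_size": 3,   # assumed value; forbids repeating any 3-gram
}


def complete(prompt: str) -> str:
    """Return the prompt followed by the model's continuation."""
    # Imported lazily so the settings above can be inspected
    # even where transformers is not installed.
    from transformers import pipeline

    pipe = pipeline("text-generation", model="MultivexAI/Plyx-15M")
    outputs = pipe(prompt, **GENERATION_KWARGS)
    # The text-generation pipeline returns a list of dicts,
    # each with a "generated_text" field.
    return outputs[0]["generated_text"]
```

Calling `complete("Once upon a time,")` downloads the weights on first use. Because this is a base model, the prompt should be phrased as the beginning of a text to continue, not as a question.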

Local Apps

vLLM

How to use MultivexAI/Plyx-15M with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "MultivexAI/Plyx-15M"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MultivexAI/Plyx-15M",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
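
For callers who prefer Python to curl, the same request can be sketched with the standard library alone. The helper names `build_completion_request` and `send` are hypothetical, and the default base URL assumes the vLLM server started above is listening on localhost port 8000.

```python
import json
import urllib.request


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request mirroring the curl call."""
    payload = {
        "model": "MultivexAI/Plyx-15M",
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(build_completion_request("Once upon a time,"))` returns the decoded response as a dictionary. Keeping construction and sending separate lets the payload be inspected without a live server.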

Use Docker

docker model run hf.co/MultivexAI/Plyx-15M

SGLang

How to use MultivexAI/Plyx-15M with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "MultivexAI/Plyx-15M" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MultivexAI/Plyx-15M",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
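
The server replies in the OpenAI-compatible completions format, where the generated text sits under `choices[0].text`. The sketch below pulls it out of a response body. The sample JSON is hand-written for illustration and is not real output from Plyx-15M; the helper name `extract_completion` is hypothetical, and real responses may carry additional fields.

```python
import json
from typing import Optional, Tuple

# Hand-written example body in the OpenAI-compatible completions shape.
# This is NOT actual model output.
SAMPLE_RESPONSE = json.loads("""
{
  "object": "text_completion",
  "model": "MultivexAI/Plyx-15M",
  "choices": [
    {"index": 0, "text": " there was a small model.", "finish_reason": "length"}
  ]
}
""")


def extract_completion(response: dict) -> Tuple[str, Optional[str]]:
    """Return (generated text, finish reason) for the first choice."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    first = choices[0]
    return first["text"], first.get("finish_reason")


text, reason = extract_completion(SAMPLE_RESPONSE)
print(repr(text), reason)
```

A finish reason of `length` means generation stopped at the `max_tokens` limit, which is the common case for a base model that has no natural stopping point in open-ended completion.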

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "MultivexAI/Plyx-15M" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MultivexAI/Plyx-15M",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use MultivexAI/Plyx-15M with Docker Model Runner:
```
docker model run hf.co/MultivexAI/Plyx-15M
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

MultivexAI/Plyx-15M

MultivexAI/Plyx-15M is a 15-million-parameter, 8-layer language model, trained from scratch using the Llama architecture.

We built this model to be a small, useful foundation for various tasks. It's a great starting point for quick tests, research projects, or fine-tuning on specialized jobs where a small model footprint is important.

Model Series Note: This is the first model in our Plyx series. We're continuing this work and plan to release future models in various sizes. We'll be adding some initial performance benchmarks here soon.

Pre-training Data

The model was trained on approximately 600M tokens from a carefully curated mix of data:

fineweb-pro: A heavily filtered and refined version of the FineWeb dataset. This provides a strong base in general-purpose language by removing significant noise and low-quality content.
fineweb-edu: A subset of FineWeb containing educational and instructional content, used to ground the model in well-structured, factual information.
finepdfs: A large collection of documents from PDFs, including professional reports and technical papers. This component introduces the model to more formal language, complex sentence structures, and data-rich formats.

A Note on Size and Performance

To set the right expectations: Plyx-15M is a 15-million-parameter model, which is quite small. Its performance won't be comparable to models with billions of parameters. It's best used for research, highly specific tasks, or as a base for fine-tuning - not as a drop-in replacement for a large, general-purpose model.

Limitations

Due to its small parameter scale, training volume, and base architecture, Plyx-15M exhibits several significant limitations that users must consider before deployment or fine-tuning:

1. Capacity and Knowledge Retention

Limited Knowledge Storage: At 15 million parameters, the model's capacity to store factual world knowledge is extremely constrained. It cannot reliably recall specific historical facts, niche technical details, or trivia.
High Propensity for Hallucination: The model will frequently generate plausible-sounding but completely incorrect information, dates, names, and code structures.
Weak Reasoning and Logic: Complex multi-step reasoning, mathematical calculations, logic puzzles, and symbolic manipulation are outside the capabilities of this model.

2. Base Model Behavior and Lack of Alignment

No Instruction Following: This is a raw base model, not an instruct-tuned or chat-aligned model. It is designed for text completion. It will likely continue a prompt rather than answering a question, unless specifically fine-tuned (SFT/RLHF) first.
Lack of Safety Filters and Refusals: The model has not undergone safety alignment. It does not have built-in refusal mechanisms for harmful, unethical, or dangerous queries, and it may generate biased or toxic content if prompted to do so.

3. Training Volume and Convergence

Training Volume and Saturation: While 600 million tokens exceeds the classic compute-optimal token budget (around 300 million tokens for a 15M-parameter model, at roughly 20 tokens per parameter), it is still a small dataset in absolute terms by modern standards. As a result, the model may not have developed the robust linguistic representations seen in models trained on hundreds of billions of tokens.
Repetition and Loops: The model may easily fall into repetitive generation loops or produce degenerate text, especially when generating longer sequences.
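
The token-budget comparison in the Training Volume item above can be checked with a few lines of arithmetic. This sketch assumes the common rule of thumb of about 20 training tokens per parameter, which is what the 300-million figure for a 15M-parameter model implies; the function names are illustrative.

```python
# Assumption: the "compute-optimal" heuristic of ~20 tokens per parameter,
# inferred from the card's 300M-token figure for a 15M-parameter model.
TOKENS_PER_PARAM_RULE_OF_THUMB = 20


def compute_optimal_tokens(n_params, tokens_per_param=TOKENS_PER_PARAM_RULE_OF_THUMB):
    """Token budget suggested by the tokens-per-parameter heuristic."""
    return n_params * tokens_per_param


def tokens_per_parameter(n_tokens, n_params):
    """Actual training tokens seen per model parameter."""
    return n_tokens / n_params


n_params = 15_000_000    # nominal size stated in the card
n_tokens = 600_000_000   # approximate training volume stated in the card

optimal = compute_optimal_tokens(n_params)
print(optimal)                                   # 300000000
print(tokens_per_parameter(n_tokens, n_params))  # 40.0
print(n_tokens / optimal)                        # 2.0
```

By this heuristic the model saw about twice the compute-optimal token count, or roughly 40 tokens per parameter, while the absolute volume remains small.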

4. Domain and Language Constraints

English-Centricity: The training datasets (FineWeb and FinePDFs variants) are predominantly English. The model's performance on non-English languages, translation tasks, or multilingual prompts is expected to be poor.
PDF Extraction Artifacts: Because a portion of the dataset relies on finepdfs, the model may occasionally generate formatting artifacts, broken sentence structures, OCR errors, or unusual character spacings derived from PDF extraction patterns.

License

The data used for pre-training (fineweb-pro, fineweb-edu, and finepdfs) is derived from sources made available under the ODC-By 1.0 license. Users must also abide by the CommonCrawl Terms of Use. We do not alter the license of any of the underlying data.


Safetensors

Model size: 16.1M params
Tensor type: F32
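
The Safetensors figures above (16.1M parameters stored as F32) give a quick estimate of the weight footprint. The sketch below covers weights only, not activations, KV cache, or framework overhead. The F16 and BF16 entries are hypothetical conversions included for comparison; the card lists only an F32 checkpoint.

```python
# Bytes per element for common tensor types.
BYTES_PER_ELEMENT = {"F32": 4, "F16": 2, "BF16": 2}


def weight_bytes(n_params, tensor_type="F32"):
    """Approximate size of the raw weights in bytes."""
    return n_params * BYTES_PER_ELEMENT[tensor_type]


n_params = 16_100_000  # "16.1M params" from the Safetensors metadata

f32_bytes = weight_bytes(n_params, "F32")
print(f"{f32_bytes / 1e6:.1f} MB")                       # 64.4 MB
print(f"{weight_bytes(n_params, 'F16') / 1e6:.1f} MB")   # 32.2 MB
```

At roughly 64 MB in full precision, the weights fit comfortably in CPU memory, which is consistent with the card's positioning of the model for quick tests and small-footprint fine-tuning.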

Model tree for MultivexAI/Plyx-15M

Quantizations: 1 model

Collection including MultivexAI/Plyx-15M

Plyx Model Series: Tiny language models, pretrained entirely from scratch on a diverse and high-quality mix of datasets. 1 item. Updated Oct 4, 2025.