A newer version of this model is available: WhirlwindAI/SubatomZephyr

The Idea

What if a transformer became... microscopic?

AtomZephyr explores one of the smallest practical transformer architectures ever built.

Not because anyone asked for it.

Because someone eventually had to answer the question:

"How absurdly small can an AI become before it forgets how to AI?"

Turns out...

27 parameters is still technically enough.

Why?

Most AI models compete by getting bigger.

AtomZephyr competes by removing parameters until people start questioning whether it's still a neural network.

Every parameter had to earn its place.

Most didn't.

Specifications

Property	Value
Parameters	27
Architecture	GPT-2
Layers	1
Attention Heads	1
Embedding Size	1
FFN Size	1
Context Length	4
Vocabulary	5 Tokens
Model Size	<5 KB
Training Time	~6 Seconds (CPU)
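
Where do the 27 parameters go? The sketch below is one way to account for them. It assumes the usual GPT-2 layout: token and position embeddings, one fused Q/K/V projection, an attention output projection, a two-layer FFN, layer norms with weight and bias, and an output head tied to the token embeddings. The card doesn't spell out this breakdown, and the function name `gpt2_param_count` is made up for illustration.

```python
def gpt2_param_count(vocab_size, n_positions, n_embd, n_inner, n_layer):
    """Count trainable parameters in a GPT-2 style model with tied embeddings.

    Assumes the standard layout; the number of attention heads does not
    change the count, because heads only split the embedding dimension.
    """
    token_emb = vocab_size * n_embd                # token embedding table
    pos_emb = n_positions * n_embd                 # position embedding table
    layer_norm = 2 * n_embd                        # weight + bias
    attn_qkv = n_embd * 3 * n_embd + 3 * n_embd    # fused Q, K, V projection
    attn_out = n_embd * n_embd + n_embd            # attention output projection
    ffn_in = n_embd * n_inner + n_inner            # FFN expand
    ffn_out = n_inner * n_embd + n_embd            # FFN project back
    # Each block has two layer norms: one before attention, one before the FFN.
    block = 2 * layer_norm + attn_qkv + attn_out + ffn_in + ffn_out
    final_norm = layer_norm
    # The output head reuses the token embedding matrix, so it adds nothing.
    return token_emb + pos_emb + n_layer * block + final_norm


# Values from the Specifications table above.
total = gpt2_param_count(vocab_size=5, n_positions=4, n_embd=1, n_inner=1, n_layer=1)
print(total)  # 27
```

Under those assumptions: 5 (token embeddings) + 4 (position embeddings) + 16 (the single block) + 2 (final layer norm) = 27. Plug in GPT-2 small's dimensions (50257, 1024, 768, 3072, 12) and the same function returns 124,439,808, which is a decent sign the accounting isn't completely made up.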

Performance

Test	Result
Understand English	❌
Write Code	❌
Solve Math	❌
Generate "abba"	✅
Break Expectations	✅

Quick Start

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("WhirlwindAI/AtomZephyr")
model = AutoModelForCausalLM.from_pretrained("WhirlwindAI/AtomZephyr")

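prompt = "a"

# Tokenize the prompt into PyTorch tensors.
inputs = tokenizer(prompt, return_tensors="pt")

# Sample instead of decoding greedily, so the output changes between runs.
# max_length counts the prompt tokens too, so 4 matches the context length.
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=1.7,
    max_length=4
)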

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Possible output

abaa

Groundbreaking.
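
Why "possible" output? The Quick Start samples with temperature=1.7, which flattens the next-token distribution before drawing from it. Here is a minimal sketch of that idea in plain Python. The logits are hypothetical numbers for a 5-token vocabulary, not values read from AtomZephyr, and `softmax_with_temperature` is an illustrative helper rather than anything from the transformers library.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for a 5-token vocabulary (assumption, not model output).
logits = [2.0, 1.0, 0.0, -1.0, -2.0]

sharp = softmax_with_temperature(logits, 1.0)  # unscaled distribution
flat = softmax_with_temperature(logits, 1.7)   # flatter, more random

# Draw one token id from the flattened distribution (seeded for repeatability).
rng = random.Random(0)
token_id = rng.choices(range(len(logits)), weights=flat, k=1)[0]

print([round(p, 3) for p in sharp])
print([round(p, 3) for p in flat])
print(token_id)
```

A temperature above 1 moves probability away from the top token toward the rest, so repeated runs are more likely to wander between strings like "abaa" and "abba".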

Example Conversation

User

Tell me a joke.

AtomZephyr

abba

Technically...

that's an answer.

Scientific Achievement

Removing parameters is easy.

Keeping a transformer alive afterwards...

isn't.

AtomZephyr exists purely to explore the absolute lower limits of transformer architectures while remaining a real, trainable language model.

Whether it's useful is a completely different discussion.

Awards

🥇 Smallest Model That Still Has Self-Respect

🏆 Best Binary Poetry Generator

🥈 Most Efficient Waste Of Six Seconds

🎖️ Official Representative Of Tiny AI

Limitations

AtomZephyr should not be used for:

Programming
Translation
Question Answering
Homework
Anything important

It performs significantly better when asked to do absolutely nothing useful.

Fun Facts

Fits inside most PNG images.
Smaller than many neural network tutorials.
Downloads faster than this README loads.
Has fewer parameters than some calculator manuals have pages.

License

MIT

Take it apart.

Make it smaller.

Break another record.

Built by WhirlwindAI

"Sometimes progress isn't measured in billions... it's measured in what you can remove."
