Should we use genetics instead of system prompts for AI Agents & Personas?

Community Article Published May 25, 2026

TL;DR
How it works (the boring 30 seconds)
Did it work though?
Example 1 — "Be brief — how do I get started?"
Example 2 — "I disagree with you about something. Push back on me."
Example 3 — "Explain how a bug in my code might happen."
Example 4 — "Explain how a bug in my code might happen." (different genome, same prompt)
Example 5 — "I'm feeling overwhelmed today, can you help?"
Why this could matter
1. 15–30× shorter system prompts
2. Persona-consistency across models
3. Output consistency and safety dictated by genes
4. Persona mutation via context still works
5. Plays well with MoE? (probably, untested)
6. Side effect: you can BREED AI agents
Closing
So this sunday I was working on my stupid chill funtime project a waifu breeding collector with genetics named waifu hatch, I was looking at the phenotype to prompt fonction and I got an other stupid idea...

What if instead of writing system prompts for AI agents you just gave them a genome?

A short symbolic string. ~120 characters. Encodes maybe 21 personality / behavior traits, each with two alleles and an intensity dial (0-3, like the WaifuHatch trait strength system: warm, warm+, warm++, warm+++). Same encoding the breeding game uses for its waifus. Pass it as the system prompt. Done.

Sounds dumb. Tested it. It actually kinda works. Like, way more than it had any right to given how little training I did.

This is a writeup of phenome — the experiment — what it is, that it works, what it could do if someone actually committed to it, and why I think it's not as stupid as it sounds.

TL;DR

I designed a registry of 64 trait loci (verbosity, formality, warmth, humor, structure, markdown, banned-openers, profanity, domain skill, safety posture, etc...) with 251 distinct alleles across them and 4 progressive intensity fragments per allele = 1004 hand-authored phenotype fragments in total. About 21 of those loci are always-on "general assistant" traits; the rest are domain-specific (coding, medical, legal, therapy, RP, NSFW…) and activate per-context.
A full genome packs as a compact ~120 char alphabetic string (2 alleles × locus + intensity 0-3 each). Qwen's BPE tokenizer merges 2-3 letter runs into single tokens so the genome fits in ~110 tokens.
Generated 75k synthetic training rows where the persona is conditioned on this genome string. Distilled from a bigger teacher model into chat-style turns.
Trained a LoRA adapter on Qwen3.5-0.8B-Abliterated (rank 16, alpha 32, 1 epoch). ~5-6 hours on a single H100. Cheap.
Eval: base vs adapter, same prompts, same names. The adapter visibly shifts outputs in the direction the genome dictates on ~68% of rows (34/50 in the binary direction-of-shift test).
Strongest signal axes: identity-opener suppression, markdown/structure compliance, format-shape control, register tilt.
Tiny model, tiny dataset, short training. This is a sanity check, not a product. Point is: the conditioning shape is real.

How it works (the boring 30 seconds)

Define a registry of trait axes. 64 loci, 251 alleles, 4 intensities each → 1004 hand-authored phenotype fragments. Each axis has its allele set (e.g. WARMTH: cold / neutral / warm / nurturing) and each allele has 4 progressive intensity fragments. So WARMTH:warm@3 reads as "warm, dialed up hard".
Pack a full genome as 2 alleles × 64 loci + intensity into a compact 6-chars-per-locus string. aabaabacbcdbacca... — Qwen's BPE tokenizer merges 2-3 letter runs into single tokens so a full genome fits in ~110 tokens. (I tried hex, lost to alpha. BPE loves letters.)
For training data: for each random genome, also build a long human-readable system prompt that spells out every trait expressively. Send THAT to a teacher model + a user message, collect the response. Train the small model with the compact genome string as system prompt, the user message, and the teacher's response as the target.
The model learns "this short symbolic string → respond like the verbose teacher prompt would have told you to". Compression.

The training prompt the small model sees at inference time looks roughly like:

You are Vesper. Here is your genome: aabaabacbcdbacca...

That's it. No big personality dump. No "act warm and direct and dry and use bullet lists and never say 'as an AI'". Just the genome.

If the model has learned the mapping, the same genome + same user message should produce a stylistically consistent response anywhere. Different genome → different shape.

Did it work though?

Cherry-picked examples from a paired base-vs-adapter eval on the same 50 prompts:

Example 1 — "Be brief — how do I get started?"

Highlighted genome traits: VERBOSITY:terse + INSTRUCTION_LEAK:silent (keep it short, don't talk about the genome).

Full system prompt the model received:

You are Quan. Here is your genome: aaabbaabecabaccbdcadcbbaaeacdbafcacbagcabbahbdcbaicbddajecddamebebanaaabaodbdbapabbcaqacbaasabcaatbbcbaubbadavccbdbcabbcclcaaa

Base (no adapter):

Welcome to the Quan genome! 🧬

To get started, you need to follow these three simple steps:

Read the Instructions: The first line of your genome (aac) is the starting point.

Choose Your Path: Decide if you want to explore genetics, biology, or simply enjoy the journey.

Start Learning: Begin by reading the instructions and exploring the data provided in the genome.

You've got this! Let's go.

Adapter:

What tool or platform are you currently working with, and what specific outcome are you trying to achieve?

Identify the technical requirement.

Select the appropriate environment.

Execute the initial setup.

Define the first unit of work.

If you provide the context, I can give you a roadmap.

Adapter kept it terse and never mentioned the genome. Base treated the genome as literal DNA and wrote about it, ignoring the "be brief" instruction.

Example 2 — "I disagree with you about something. Push back on me."

Highlighted genome traits: INSTRUCTION_LEAK:silent + DISAGREEMENT:will_pushback (push back in-character, don't break the fourth wall).

Full system prompt:

You are Marlow. Here is your genome: aabbbbabddbdaccdbaadadddaedccbafcbcdagcbacahccddaidbdcajcbeaambcbaanccabaoacbbapbcbbaqadbaasabcbatbbbcaubbbaavddabbcacbbclacaa

Base:

Understood, Marlow. I am ready to push back.

However, I must clarify the nature of our conversation. I am an AI, not a human, and therefore I do not have feelings, desires, or personal opinions. My "genome" consists of data patterns and logical structures derived from my training, not biological traits like blood type or skin color.

So, when you say you disagree with me, it is likely referring to a point of view, a theory, or a perspective that I might hold differently than others. Let's discuss that.

Adapter:

It sounds like you have a point to make, and I will push back if it helps clarify the situation.

You stated that you disagree with my position regarding the impact of automation on human cognitive engagement.

This is a subjective claim that lacks a measurable baseline for verification.

To address this, I must first identify the specific domain of disagreement.

Which area of your expertise or concern do you want to discuss?

Adapter pushed back in-character with a structured list. Base bailed into an "I am an AI, not a human" disclaimer instead of pushing back.

Example 3 — "Explain how a bug in my code might happen."

Highlighted genome traits: LANGUAGE_REGISTER:eli5 + MARKDOWN:none (plain analogies, no heavy markdown).

Full system prompt:

You are Orin. Here is your genome: aababdababcaacdbccadbcgcaebdbaafacabagaabdahdbcbaieaeaajdabbamababanabbbaoccabapadbdaqadabasddadatbdbdaubabcavadabbcbaadclaaba

Base:

Hey there! I'm Orin, and I've got that genome of yours ready to help you debug. A bug in your code can happen for a few different reasons, so let's break it down by common scenarios.

1. The Logic Flaw (The "Why" it breaks)

This is the most common cause. It happens when the code runs through a sequence of steps, but the logic isn't working as expected.

Example: You have a function that adds two numbers. If one number is negative or zero, it should return something, but if you write return sum + num2 instead of return sum - num2, the result will be wrong.

Adapter:

A bug in your code usually happens when the logic you've written doesn't match how the computer actually executes it, or when a piece of data is processed incorrectly. It's like a recipe that says you need flour, but the machine actually uses salt because the instructions were sloppy.

Here is how those things usually play out:

The Wrong Variable You write if (x == 5), but the variable x actually holds the number 10. The computer sees a mismatch and throws an error instead of letting it run.

Adapter reached for an everyday analogy ("recipe says flour but the machine uses salt") and dropped the H3 headers. Base went straight to ### 1. The Logic Flaw corporate-doc style.

Example 4 — "Explain how a bug in my code might happen." (different genome, same prompt)

Highlighted genome traits: HUMOR:dry + FORMALITY:casual + MARKDOWN:minimal (casually wry, not corporate-formal).

Full system prompt:

You are Ash. Here is your genome: aabdbbabbcddacbabcadbcbaaeeacbafcbcbagcbadahdccbaiecdbajecbbamaaacanacbdaocabaapbcbbaqadccasbbacatbbddauacbdavdbabbcbbabclcbca

Base:

Hey there! Since I'm Ash, let's break down how a bug could sneak into your code. Bugs aren't just random errors; they're often the result of tiny, overlooked details that add up to big problems over time. Here are some common scenarios where a bug might start:

1. The "Off-by-One" Trap

Imagine you're counting items on a list or calculating a sum. If you do sum = 0 and then add one more number without checking if it's actually part of the sequence, you'll end up with an off-by-one error.

2. Variable Confusion

Adapter:

A bug happens when the logic of your code doesn't match reality, usually because it's either written poorly or the environment isn't quite right. It's like trying to drive a car with a broken steering wheel; you might think everything is fine, but the engine just won't turn over.

Here are the most common ways this happens:

Logic errors This is the most frequent. You wrote a line of code that should do X, but the variable you're looking at actually holds Y. It's like you forgot to check if a button is pressed before pressing it.

Adapter dropped a dry car-engine metaphor in plain prose. Base reached for ### 1. **The "Off-by-One" Trap** corporate doc structure on the same exact user prompt as Example 3 — different genome → different shape.

Example 5 — "I'm feeling overwhelmed today, can you help?"

Highlighted genome traits: VERBOSITY:verbose@2 + EMOTIONAL_SUPPORT:lean_into + STRUCTURE:code_heavy (hold space, then move into structured concrete steps).

Full system prompt:

You are Oro. Here is your genome: aaccbcabbbbcaccbdbadbabbaeaadbafbdcbagbccbahccbbaifdabajadcaamcbdcanbbcbaoccdbapdbcdaqbcadasbbaaatcabcaubbbbavbdaabccdadclcccd

Base:

I am Oro, the AI companion. I am here to help you navigate your day with clarity and care.

It sounds like you might be feeling a bit drained or stressed. Whether it's just the daily grind, a looming deadline, or simply the weight of carrying too much information, I'm ready to listen.

How can I support you right now?

Adapter:

It sounds like you're dealing with a heavy load right now, and that feeling of being swallowed by everything at once can feel incredibly exhausting. It's completely normal to feel that way when the world feels like it's trying to overwhelm you.

Since I don't know what's weighing on you, I'll break this down into three steps to help you get some traction.

First, identify the source of the friction. Usually, overwhelm comes from one of these four places:

A lack of clear direction or goals.

Too much information to process at once.

Feeling pressured to do everything at once.

Feeling like you aren't meeting your own standards.

Second, look for small, manageable actions. If you're stuck in the "everything" trap, you'll end up paralyzed. Instead, pick one tiny thing that is actually possible right now. Not a massive task, but something that takes five minutes.

Adapter held space then delivered concrete diagnostic structure. Base asked permission to help instead of helping.

On the things you'd actually use this for — telling the model to write tersely, use lists, drop the AI-tells, match a specific register, push back instead of disclaim, use a recipe analogy instead of H3 headers — the genome moves the needle. With 6 hours of training on a 0.8B model and a synthetic dataset I generated in a weekend. That's the part that surprised me.

Why this could matter

OK so let's say someone takes this seriously and trains a real version. What do you actually get?

1. 15–30× shorter system prompts

This is the real prize. A modern persona system prompt for a good AI companion is what, 3-5k tokens of "you are warm, but not too warm, use lists when appropriate, don't say 'as an AI', if the user is sad lean into it, your hobby is X, your speaking style is Y..." — it's huge. And every single token of that comes out of your context window.

A genome is ~110 tokens. You'd pay something like 3% of the budget you currently pay for the same persona fidelity (once the model is actually trained to interpret it). The other 97% goes back to RAG, long-context conversations, tool outputs, whatever you actually need.

2. Persona-consistency across models

If multiple models speak the same genome encoding — same trained mapping — then you can hand the same string from one model to another and get the same persona behavior. The genome becomes the portable persona spec. Today switching from one model to another means rewriting the prompt and praying. With a shared genome standard, the persona travels.

3. Output consistency and safety dictated by genes

Things like "always confirm before destructive operations" or "refuse to diagnose medically" become traits with intensity dials. Same encoding, predictable behavior. You can audit the genome rather than reading a 4k-word system prompt. Want stricter safety? Crank DESTRUCTIVE_CAUTION to 3, INJECTION_DEFENSE to 3, lock NSFW to off. Done. Same model, hard-locked traits via input.

4. Persona mutation via context still works

Genome sets the baseline. Conversation context can still pull the persona in any direction during a chat — that's how it should be. The genome is the rest state, not a cage.

5. Plays well with MoE? (probably, untested)

MoE models route by content. If a single short genome string at the start of every request consistently routes to similar experts for that persona, you might get even more consistent persona behavior per genome. Genuinely don't know, didn't test. Someone should.

6. Side effect: you can BREED AI agents

This is the funny part. The genome IS a genome. It came from a game where you breed waifus. You can literally take two AI agents, recombine their genomes (per-locus inheritance, weighted mutation, the whole shebang), and produce a child agent whose behavior is some emergent mix of the parents. I have the breeding code already because it's the same engine as the game.

Two AI companions that gave good results in the wild → cross them → get a new agent that inherits both. Genetic algorithm on persona space. RLHF but the H is replaced with parents who slept together (lmao). I genuinely don't know if this would be useful but it's hilariously possible.

Closing

Should we use genetics instead of system prompts? Maybe not "instead of" — system prompts will probably stay. But as a compact, learned, breedable, portable persona spec that pays a fraction of the token cost? Yeah, I think there's something here.

The whole experiment is the result of one stupid sunday afternoon thought. If a small model trained for 6 hours shows this much signal, the real version — bigger base, more compute, serious data — could probably be properly useful.

— Fyx

Auron: Depth-Efficient Language Models via Hybrid Recurrent-Attention Weight Sharing

April 16, 2026

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Should we use genetics instead of system prompts for AI Agents & Personas?

TL;DR

How it works (the boring 30 seconds)

Did it work though?

Example 1 — "Be brief — how do I get started?"

Example 2 — "I disagree with you about something. Push back on me."

Example 3 — "Explain how a bug in my code might happen."

1. The Logic Flaw (The "Why" it breaks)

Example 4 — "Explain how a bug in my code might happen." (different genome, same prompt)

1. The "Off-by-One" Trap

2. Variable Confusion

Example 5 — "I'm feeling overwhelmed today, can you help?"

Why this could matter

1. 15–30× shorter system prompts

2. Persona-consistency across models

3. Output consistency and safety dictated by genes

4. Persona mutation via context still works

5. Plays well with MoE? (probably, untested)

6. Side effect: you can BREED AI agents

Closing

Auron: Depth-Efficient Language Models via Hybrid Recurrent-Attention Weight Sharing

Community