potion-mxbai-256d-v2

A compact static embedding model that achieves near-SOTA quality in a fraction of the size: only 0.68 points behind our 512D baseline on the full MTEB English suite while being 8.2x smaller (7.5MB int8 vs ~62MB).

Highlights

  • 71.45 avg on full MTEB English (STS + Classification + PairClassification, 25 tasks)
  • 7.5MB with int8 quantization (8.2x smaller than 512D baseline)
  • 28.8MB at full precision (float32)
  • 80-88x faster than all-MiniLM-L6-v2 on CPU (~15K vs ~200 sentences/sec)
  • Pure numpy inference (no GPU needed)
  • Native int8 support via model2vec v0.7 with zero quality loss
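The int8 size comes from quantizing the embedding table. model2vec handles this internally; the mechanics can be sketched in plain numpy (the symmetric per-tensor scaling scheme below is illustrative, not necessarily the exact scheme model2vec uses, and the table is random stand-in data):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a float32 embedding table (vocab_size x dims)
table = rng.normal(size=(29524, 256)).astype(np.float32)

# Symmetric quantization: map [-max_abs, max_abs] onto [-127, 127]
scale = np.abs(table).max() / 127.0
table_i8 = np.round(table / scale).astype(np.int8)  # 4x smaller than float32

# Dequantize on load; per-component error is bounded by scale / 2,
# which barely moves cosine similarities between 256-dim vectors
table_deq = table_i8.astype(np.float32) * scale
max_err = np.abs(table - table_deq).max()
print(f"max absolute error: {max_err:.4f} (scale = {scale:.4f})")
```

Because cosine similarity is what downstream tasks use, tiny per-component rounding noise on long vectors is effectively invisible, which is consistent with the "zero quality loss" observation above.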

How It Was Made

  1. Teacher: mixedbread-ai/mxbai-embed-large-v1 (335M params, BERT-large architecture)
  2. Distillation: model2vec distillation with 256-dim PCA and corpus-informed vocabulary
  3. Tokenlearn pre-training: contrastive-loss training on ~217K C4 English sentences
  4. Born-again self-distillation: A second round of contrastive training using the model's own sentence embeddings as targets instead of the teacher's. This closes the "representation gap" between what a static model can learn and what the transformer teacher produces, yielding +0.49 avg improvement.
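Step 4 can be sketched as follows: a frozen copy of the trained static model produces target sentence embeddings, and a freshly initialized embedding table is trained to reproduce them. This toy numpy version uses SGD on an MSE objective over mean-pooled lookups with made-up token ids; the real run uses tokenlearn's contrastive objective at batch size 256:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dims = 50, 8

# Frozen copy of the previously trained static model.
frozen = rng.normal(size=(vocab, dims))

# Toy corpus: each "sentence" is an array of token ids.
sentences = [rng.integers(0, vocab, size=int(rng.integers(3, 10))) for _ in range(200)]

def embed(table, ids):
    return table[ids].mean(axis=0)  # lookup + mean pool

# Targets are the model's OWN outputs, not the transformer teacher's.
targets = np.stack([embed(frozen, s) for s in sentences])

# Fresh student table, trained with plain per-sentence gradient descent on MSE.
student = rng.normal(size=(vocab, dims)) * 0.1
lr = 0.5
losses = []
for epoch in range(200):
    loss = 0.0
    for ids, t in zip(sentences, targets):
        err = embed(student, ids) - t          # d(MSE)/d(prediction)
        loss += float(err @ err)
        # Each token in the sentence receives an equal share of the gradient.
        np.add.at(student, ids, np.tile(-lr * err / len(ids), (len(ids), 1)))
    losses.append(loss)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.5f}")
```

Because the targets were themselves produced by a lookup-and-pool model, they are exactly representable by the student, which is the point of the technique.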

Key insight

The original teacher's representations are too complex for a static lookup table to fully capture. By self-distilling โ€” training a fresh model to match our own model's outputs โ€” we create "easier" targets that a static model can actually learn well. This simple technique improved all three benchmark categories.
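For intuition about what "a static lookup table" means here: the entire forward pass is a table lookup followed by mean pooling, nothing else. A toy sketch with a made-up six-word vocabulary and random 4-dim vectors (the real model uses the mxbai teacher tokenizer and a 29,524 x 256 table):

```python
import numpy as np

# Toy vocabulary and embedding table.
vocab = {"hello": 0, "world": 1, "static": 2, "embeddings": 3, "are": 4, "fast": 5}
rng = np.random.default_rng(0)
table = rng.normal(size=(len(vocab), 4)).astype(np.float32)

def encode(sentence: str) -> np.ndarray:
    """Lookup + mean pool: the whole forward pass of a static model."""
    ids = [vocab[w] for w in sentence.lower().split() if w in vocab]
    return table[ids].mean(axis=0)

e1 = encode("Hello world")
e2 = encode("Static embeddings are fast")
print(e1.shape)  # (4,)
```

There is no attention and no context mixing, which is why complex teacher representations can exceed what this function class can express, and why self-distilled targets are easier to fit.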

Benchmark Results (Full MTEB English Suite)

Model                        STS    Classification  PairClassification  Avg    Size (int8)
potion-mxbai-2m-512d         74.15  65.44           76.80               72.13  ~125MB
potion-mxbai-256d-v2 (this)  73.79  63.23           77.33               71.45  7.5MB
potion-mxbai-128d-v2         71.75  60.45           74.40               68.87  3.9MB

Evaluated on 25 tasks (10 STS, 12 Classification, 3 PairClassification), English subsets only, identical eval code across all models.
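The STS column is the Spearman correlation between cosine similarities of sentence pairs and human judgments, averaged over tasks. MTEB handles the datasets and aggregation; the core computation, shown here with random stand-ins for the encoded pairs and gold scores, is just this:

```python
import numpy as np

def cosine(a, b):
    return np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))

def spearman(x, y):
    # Spearman = Pearson correlation of the ranks (no-ties case).
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))

rng = np.random.default_rng(0)
emb_a = rng.normal(size=(20, 8))    # stand-in for model.encode(sentences_a)
emb_b = rng.normal(size=(20, 8))    # stand-in for model.encode(sentences_b)
gold = rng.uniform(0, 5, size=20)   # stand-in human similarity scores

score = spearman(cosine(emb_a, emb_b), gold)
```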

Usage

from model2vec import StaticModel

# Full precision (28.8MB)
model = StaticModel.from_pretrained("blobbybob/potion-mxbai-256d-v2")

# INT8 quantized (7.5MB, same quality)
model = StaticModel.from_pretrained("blobbybob/potion-mxbai-256d-v2", quantize_to="int8")

embeddings = model.encode(["Hello world", "Static embeddings are fast"])
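The returned embeddings are plain numpy arrays, so nearest-neighbor search is a normalized matrix product. A sketch with random vectors standing in for model.encode output on a corpus:

```python
import numpy as np

rng = np.random.default_rng(0)
docs = rng.normal(size=(1000, 256)).astype(np.float32)  # stand-in for model.encode(corpus)
query = docs[42] + 0.01 * rng.normal(size=256).astype(np.float32)

# Normalize once, then cosine similarity is a single matrix-vector product.
docs_n = docs / np.linalg.norm(docs, axis=1, keepdims=True)
query_n = query / np.linalg.norm(query)
top5 = np.argsort(docs_n @ query_n)[::-1][:5]
print(top5[0])  # 42: the lightly perturbed source document ranks first
```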

With Sentence Transformers:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("blobbybob/potion-mxbai-256d-v2")
embeddings = model.encode(["Hello world", "Static embeddings are fast"])

When to use this model

  • You need fast, lightweight embeddings for search, clustering, or classification
  • You're deploying on resource-constrained environments (mobile, edge, serverless)
  • You want sub-millisecond inference without GPU
  • You need embeddings in a small binary (7.5MB vs 100MB+ for transformer models)

Model Family

Model                  Avg    Size (int8)  Best for
potion-mxbai-2m-512d   72.13  ~125MB       Maximum quality
potion-mxbai-256d-v2   71.45  7.5MB        Best quality/size balance
potion-mxbai-128d-v2   69.83  3.9MB        Compact deployments
potion-mxbai-micro     68.12  0.7MB        Ultra-tiny / embedded

Training Details

  • Featurization: ~217K C4 sentences encoded by mxbai-embed-large-v1
  • Training: Tokenlearn contrastive loss + born-again self-distillation, batch size 256
  • Vocabulary: 29,524 tokens (corpus-informed vocabulary from mxbai teacher tokenizer)
  • Dimensions: 256 (via PCA reduction from teacher's 1024-dim output)
  • Compute: Local RTX 2070
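The 1024 -> 256 reduction noted above is a standard PCA of the teacher's token embeddings. A minimal sketch via SVD, with random stand-ins for the teacher vectors (model2vec does this internally during distillation):

```python
import numpy as np

rng = np.random.default_rng(0)
teacher = rng.normal(size=(500, 1024))  # stand-in: tokens x teacher dims

# PCA via SVD: project centered vectors onto the top 256 right singular vectors.
mean = teacher.mean(axis=0)
_, _, vt = np.linalg.svd(teacher - mean, full_matrices=False)
reduced = (teacher - mean) @ vt[:256].T

print(reduced.shape)  # (500, 256)
```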

Citation

@article{minishlab2024model2vec,
  author = {Tulkens, Stephan and {van Dongen}, Thomas},
  title = {Model2Vec: Fast State-of-the-Art Static Embeddings},
  year = {2024},
  url = {https://github.com/MinishLab/model2vec}
}
Evaluation results

  • Spearman cosine on MTEB STS (English, 10 tasks): 73.79 (self-reported)
  • Accuracy on MTEB Classification (English, 12 tasks): 63.23 (self-reported)
  • AP on MTEB PairClassification (English, 3 tasks): 77.33 (self-reported)