🪙 BitBudget Benchmark

How much retrieval quality survives per byte?

An open benchmark for embedding compression and indexing, organised by the projection–quantisation–organisation lens. The headline finding: bits beat dimensions.

`pip install bitbudget` | GitHub | Paper

v0.3.1 is live on PyPI (`pip install bitbudget`). The compression board spans popular embedders (OpenAI text-embedding-3 small and large, BGE, E5, GTE) with float16, int4, and a binary-mean method that thresholds at the per-coordinate mean. The binary-mean method makes a point visible: a zero-threshold sign code assumes mean-centred data, and for an off-centre embedding like E5, re-centring the threshold lifts 1-bit retention from 53% to 86%. Where you put the threshold can matter as much as how many bits you spend. See the release notes.
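To make the threshold point concrete, here is a minimal NumPy sketch on synthetic data. It is not the package's implementation: the helper name `one_bit_codes` and the Gaussian "off-centre" corpus are assumptions chosen for illustration, and it does not reproduce the 53% and 86% figures.

```python
import numpy as np

def one_bit_codes(vectors, threshold):
    """Set a bit wherever a coordinate exceeds its threshold, then pack 8 bits per byte."""
    bits = vectors > threshold          # threshold broadcasts: scalar or shape (dim,)
    return bits, np.packbits(bits, axis=1)

rng = np.random.default_rng(0)
# Synthetic stand-in for an off-centre embedder: every coordinate has mean 3, not 0.
corpus = rng.normal(loc=3.0, scale=1.0, size=(1000, 64)).astype(np.float32)

sign_bits, sign_packed = one_bit_codes(corpus, 0.0)                  # zero-threshold sign code
mean_bits, mean_packed = one_bit_codes(corpus, corpus.mean(axis=0))  # per-coordinate mean

sign_ones = sign_bits.mean()   # close to 1: almost every bit is set, so the code says little
mean_ones = mean_bits.mean()   # close to 0.5: each bit splits the corpus roughly in half
```

With the zero threshold nearly all bits agree across vectors, so Hamming distances carry little signal. Thresholding at the mean restores a roughly balanced split per coordinate at the same one bit per dimension.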

Start here: two leaderboards

BitBudget measures two different things. Pick the one that matches what you built.

Leaderboard	The question it answers	Metric	You submit	You run
Compression (primary)	How much retrieval quality do you keep per byte?	nDCG@10 vs bytes/vec	a compressor (`@method`)	`bitbudget run`
Indexing	What query throughput at what recall and footprint?	recall@10 · QPS · bytes/vec	an index (`@index`)	`bitbudget bench-index`
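The bytes/vec axis shared by both boards can be sanity-checked with simple arithmetic. The sketch below assumes a float32 baseline and counts only the bits of the coordinates themselves; any per-vector overhead (scales, offsets, codebooks, graph links) is left out, so real footprints may be larger. The function names are illustrative, not part of the package.

```python
# Bits spent per coordinate for the storage formats named on this page.
BITS_PER_COORD = {"float32": 32, "float16": 16, "int4": 4, "binary": 1}

def bytes_per_vector(dim, fmt):
    """Payload size of one vector, rounded up to whole bytes."""
    bits = dim * BITS_PER_COORD[fmt]
    return (bits + 7) // 8

def compression_ratio(dim, fmt):
    """How many times smaller the format is than float32 at the same dimension."""
    return bytes_per_vector(dim, "float32") / bytes_per_vector(dim, fmt)

# A 1024-dimensional vector: 4096 bytes as float32, 128 bytes as a one-bit code.
sizes = {fmt: bytes_per_vector(1024, fmt) for fmt in BITS_PER_COORD}
```

The 32× figure quoted for one-bit codes is exactly this ratio: 32 bits per coordinate down to 1.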

Both are organised by the projection–quantisation–organisation lens. Compression lives on the quantisation and projection axes (binary, PQ, RaBitQ, Matryoshka, PCA). Its headline is that bits beat dimensions, and a one-bit code with re-ranking is lossless at 32× on both embedders and at RAG scale. Indexing lives on the organisation axis (HNSW, IVF-PQ, the bit-trie): a graph buys throughput but adds bytes, while a compact-code index trades the other way. In every chart, lower-left to upper-right is the frontier.

New here? Start with the primary compression leaderboard. Working with labels? See supervised hashing, where the gains come from the projection axis.
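The "one-bit code with re-ranking" pattern mentioned above can be sketched in a few lines of NumPy. This is a generic illustration on random data, not the benchmark's code: the function names, the shortlist size of 100, and the use of a plain sign threshold are all assumptions. The sketch also keeps the float vectors around for the re-rank step; where they live and how the benchmark accounts for them is outside its scope.

```python
import numpy as np

def pack_sign_bits(vectors):
    """One bit per coordinate (1 where positive), packed 8 bits per byte."""
    return np.packbits(vectors > 0, axis=1)

def hamming_distances(query_code, codes):
    """Count differing bits between one packed query code and every packed corpus code."""
    xor = np.bitwise_xor(codes, query_code)
    return np.unpackbits(xor, axis=1).sum(axis=1)

def search_with_rerank(query, vectors, codes, k=10, shortlist=100):
    """Shortlist by Hamming distance on the codes, then re-rank by float dot product."""
    shortlist = min(shortlist, len(codes))
    dist = hamming_distances(pack_sign_bits(query[None, :]), codes)
    candidates = np.argpartition(dist, shortlist - 1)[:shortlist]
    scores = vectors[candidates] @ query
    return candidates[np.argsort(-scores)[:k]]

rng = np.random.default_rng(0)
vectors = rng.normal(size=(2000, 128)).astype(np.float32)
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)   # unit length, so dot = cosine
codes = pack_sign_bits(vectors)
ratio = vectors.nbytes / codes.nbytes   # 512 bytes of float32 vs 16 bytes of bits per vector

# A query that is a slightly perturbed copy of vector 7.
query = vectors[7] + 0.01 * rng.normal(size=128).astype(np.float32)
top10 = search_with_rerank(query, vectors, codes)
```

The coarse Hamming pass only has to get the true neighbours into the shortlist; the float re-rank then restores their exact order, which is why the combination can match uncompressed quality.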

Compression leaderboard (primary)

Run: `bitbudget run --embedder mxbai --corpus scifact nfcorpus arguana fiqa`
Submit: a compressor (`@method`)

Click a column to sort. Footprint is bytes per vector; lower is better at equal quality.

Method	Axis	Bytes/vec ▲	nDCG@10	% of float
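For readers unfamiliar with the two quality columns, the sketch below computes nDCG@10 for one query and expresses a compressed score as a percentage of the float score. It uses the linear-gain form of DCG. Whether the benchmark uses exactly this variant, and whether "% of float" is this simple ratio, are assumptions here; the function names are illustrative.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain: relevance at rank r (0-based) is divided by log2(r + 2)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_ids, qrels, k=10):
    """DCG of the returned ranking divided by the DCG of the ideal ranking."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_ids]
    ideal = sorted(qrels.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

def percent_of_float(compressed_ndcg, float_ndcg):
    """Share of the uncompressed score that the compressed method retains."""
    return 100.0 * compressed_ndcg / float_ndcg

# Toy judgments: documents "a" and "b" are relevant to the query.
qrels = {"a": 1, "b": 1}
perfect = ndcg_at_k(["a", "b", "x"], qrels)   # both relevant documents ranked first
degraded = ndcg_at_k(["x", "a"], qrels)       # one relevant document missed, one demoted
```

In a full evaluation the per-query values are averaged over all queries of a corpus before the ratio is taken.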

Indexing leaderboard

Run: `bitbudget bench-index --synthetic 100000 128`
Submit: an index (`@index`)

Method	Axis	Bytes/vec	Recall@10	QPS
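The Recall@10 and QPS columns can be understood through a small sketch. It assumes recall is measured against exact (brute-force) neighbour lists and that QPS is timed single-threaded, one query at a time; the benchmark's actual protocol may differ, and the function names are illustrative.

```python
import time

def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the exact top-k neighbours that the index also returned, over all queries."""
    hits = sum(len(set(a[:k]) & set(e[:k])) for a, e in zip(approx_ids, exact_ids))
    return hits / (k * len(exact_ids))

def queries_per_second(search_fn, queries):
    """Wall-clock throughput of search_fn when queries are issued sequentially."""
    start = time.perf_counter()
    for q in queries:
        search_fn(q)
    elapsed = time.perf_counter() - start
    return len(queries) / elapsed

# Toy check: the index found 2 of the 3 true neighbours for a single query.
toy_recall = recall_at_k([[1, 2, 3]], [[1, 2, 4]], k=3)
```

Recall and QPS only mean something together: an index can always buy throughput by returning worse neighbours, which is why the board reports both alongside bytes/vec.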

Validation at scale

Not a separate leaderboard. This is evidence that the headline results hold at a scale two orders of magnitude larger, measured once on a single cloud machine. Nothing to submit here.

Method	Axis	Bytes/vec	Recall@10	QPS

Method	Axis	Bytes/vec	nDCG@10	% of float

Supervised hashing

Not a separate leaderboard. The compression and indexing boards above are unsupervised; this is the other regime, where labels are available and a better projection can be learned. Same lens, different axis. Nothing to submit here.

Method	Axis	Bytes/vec	class-mAP

Methods & indexes

Every entry on either leaderboard is a single design choice on one axis of the lens.

Submit your result

Both leaderboards are reproducible and open. Pick the path that matches what you built:

A compressor → register a `@method(...)`, run `bitbudget run --embedder mxbai --corpus scifact nfcorpus arguana fiqa`, and open a PR adding your row to the compression board in `site/data.json`.

An index → register an `@index(...)`, run `bitbudget bench-index --synthetic 100000 128` (or on your own vectors with `--npz`), and open a PR adding your row to the indexing board in `site/data.json`.

See CONTRIBUTING.md for the decorators and the results-card format.

Every number on this page is regenerated from site/data.json and deployed automatically on merge. No servers, no tracking.