o11ykit

Indexing

Bloom Filters

A bloom filter is a compact bit array that answers "Is this item in the set?" in O(1) time. It can say "definitely not" or "probably yes" — but never gives a false negative. In a traces database, each chunk carries a bloom filter for its trace IDs. Looking up a trace? Check each chunk's filter in microseconds and skip the ones that definitely don't contain it.

① The Bit Array

128 bits, all starting at zero. Each cell shows its bit position index.

② Add Trace IDs

Type a trace ID (or use the presets) to hash it with 3 hash functions and set the corresponding bits.

Quick presets:

③ Lookup Trace ID

Check whether a trace ID might be in the set. All hashed bits must be 1 for a "probably yes."

④ Statistics

Key insight: Bloom filters tell us which chunks to skip. Instead of scanning all chunks for a trace ID, we check each chunk's bloom filter in microseconds. A 128-bit filter with 3 hash functions can hold ~20 items before the false positive rate exceeds 5%. Real implementations use larger filters (e.g., 1 KB per chunk) for millions of IDs.