Indexing
Bloom Filters
A bloom filter is a compact bit array that answers "Is this item in the set?" in O(1) time. It can say "definitely not" or "probably yes" — but never gives a false negative. In a traces database, each chunk carries a bloom filter for its trace IDs. Looking up a trace? Check each chunk's filter in microseconds and skip the ones that definitely don't contain it.
① The Bit Array
128 bits, all starting at zero. Each cell shows its bit position index.
② Add Trace IDs
Type a trace ID (or use the presets) to hash it with 3 hash functions and set the corresponding bits.
Quick presets:
③ Lookup Trace ID
Check whether a trace ID might be in the set. All hashed bits must be 1 for a "probably yes."
④ Statistics
Key insight: Bloom filters tell us which chunks to skip. Instead of scanning
all chunks for a trace ID, we check each chunk's bloom filter in microseconds. A 128-bit
filter with 3 hash functions can hold ~20 items before the false positive rate exceeds 5%.
Real implementations use larger filters (e.g., 1 KB per chunk) for millions of IDs.