Label Storage

String Interning

String interning means storing each distinct string once in a byte buffer and replacing every other occurrence with a small 4-byte integer ID that points to the stored copy. In metric storage, labels like region=us-west-2 appear in thousands of series; without interning, every series stores its own copy of the same bytes. Generate Kubernetes-style metrics and watch memory usage drop as strings are reused.
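
As a rough illustration of the idea, here is a minimal sketch of an interner. The class name `Interner` and its methods are hypothetical, and a Python dict stands in for the lookup structure; it is not the page's actual implementation.

```python
class Interner:
    """Stores each unique string once in a packed byte buffer (illustrative sketch)."""

    def __init__(self):
        self.buffer = bytearray()  # unique strings, packed back to back
        self.spans = []            # ID -> (offset, length) into the buffer
        self.ids = {}              # string -> ID (a dict stands in for the hash table)

    def intern(self, s: str) -> int:
        existing = self.ids.get(s)
        if existing is not None:
            return existing        # already stored: reuse the existing ID
        data = s.encode("utf-8")
        new_id = len(self.spans)
        self.spans.append((len(self.buffer), len(data)))
        self.buffer += data
        self.ids[s] = new_id
        return new_id

    def lookup(self, string_id: int) -> str:
        offset, length = self.spans[string_id]
        return self.buffer[offset:offset + length].decode("utf-8")


interner = Interner()
first = interner.intern("region=us-west-2")
second = interner.intern("region=us-west-2")  # same ID, nothing new is stored
```

Interning the same label twice returns the same ID and leaves the buffer unchanged, which is where the savings come from.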

A series is one stream of metric data uniquely identified by its labels. Labels are key=value pairs like job=api or region=us-west. A monitoring system tracking 10,000 pods might have 50,000+ series — but most share the same small set of label values, making interning highly effective.
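
One way to picture "uniquely identified by its labels" is to treat a series' identity as its set of label pairs. The helper `series_key` below is hypothetical, and it assumes that label order does not matter, which the text does not state explicitly.

```python
def series_key(labels: dict) -> tuple:
    """Canonical identity for a series: its label pairs in sorted order.

    Assumption: two label sets with the same pairs in a different order
    describe the same series.
    """
    return tuple(sorted(labels.items()))


a = series_key({"job": "api", "region": "us-west"})
b = series_key({"region": "us-west", "job": "api"})  # same pairs, different order
c = series_key({"job": "api", "region": "eu-west"})  # one value differs
```

Under that assumption `a` and `b` name the same series, while `c` is a different one even though it shares the string "api" with the others.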

① Generate Series

Create realistic Kubernetes metric series with shared labels. Adjust the count and hit Generate.
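
A generator for this kind of data could look roughly like the sketch below. The metric names, label names, and value pools are illustrative assumptions, not the ones the page uses; the point is only that a few values are shared widely while the pod name is unique per series.

```python
import random

# Hypothetical value pools; a real cluster would have its own names.
METRICS = ["container_cpu_usage_seconds_total", "container_memory_working_set_bytes"]
REGIONS = ["us-west-2", "us-east-1", "eu-west-1"]
NAMESPACES = ["default", "kube-system", "monitoring", "payments"]


def generate_series(count: int, seed: int = 0) -> list:
    """Return `count` label sets with shared values and a unique pod name each."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    return [
        {
            "__name__": rng.choice(METRICS),
            "region": rng.choice(REGIONS),
            "namespace": rng.choice(NAMESPACES),
            "pod": f"api-{i:05d}",  # unique per series
        }
        for i in range(count)
    ]


series = generate_series(100)
all_strings = [s for labels in series for pair in labels.items() for s in pair]
unique_strings = set(all_strings)
```

With 100 series there are 800 label strings in total, but only a small fraction of them are distinct apart from the pod names.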

② At a Glance

③ Naive vs Interned

Side-by-side: every series stores full strings (left) vs. integer IDs into a shared buffer (right).
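
To make the two layouts concrete, the sketch below builds the three interned structures (string buffer, ID table, series references) from a naive list of label dicts. The function name `intern_series` is hypothetical, and interning label names and values as separate strings is an assumption.

```python
def intern_series(series):
    """Convert naive label dicts into (buffer, id_table, refs)."""
    buffer = bytearray()  # unique strings, packed
    id_table = []         # ID -> (offset, length) into the buffer
    seen = {}             # string -> ID
    refs = []             # per series: list of (name_id, value_id)

    def intern(s):
        if s not in seen:
            data = s.encode("utf-8")
            seen[s] = len(id_table)
            id_table.append((len(buffer), len(data)))
            buffer.extend(data)
        return seen[s]

    for labels in series:
        refs.append([(intern(k), intern(v)) for k, v in sorted(labels.items())])
    return buffer, id_table, refs


naive = [
    {"job": "api", "region": "us-west-2"},
    {"job": "api", "region": "eu-west-1"},
]
buffer, id_table, refs = intern_series(naive)
# buffer   -> b"jobapiregionus-west-2eu-west-1"
# refs     -> [[(0, 1), (2, 3)], [(0, 1), (2, 4)]]
```

The second series adds only one new string to the buffer; everything else is expressed as IDs that already exist.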

Naive Storage

Interned Storage

String Buffer (unique strings, packed)
ID Table
Series References

④ Memory Comparison

How much space does interning actually save?
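
A back-of-the-envelope comparison can be sketched as follows. The cost model is an assumption: naive storage is counted as raw string bytes only, and interned storage as unique string bytes, plus an 8-byte ID-table entry per unique string, plus a 4-byte ID per reference. The function name `memory_comparison` is hypothetical.

```python
ID_BYTES = 4           # one integer ID per reference, as described above
TABLE_ENTRY_BYTES = 8  # assumed: 4-byte offset + 4-byte length per unique string


def memory_comparison(series):
    """Return (naive_bytes, interned_bytes) under the assumed cost model."""
    naive_bytes = 0
    reference_count = 0
    unique = set()
    for labels in series:
        for name, value in labels.items():
            for s in (name, value):
                naive_bytes += len(s.encode("utf-8"))  # every copy stored in full
                reference_count += 1
                unique.add(s)
    unique_bytes = sum(len(s.encode("utf-8")) for s in unique)
    interned_bytes = (
        unique_bytes
        + len(unique) * TABLE_ENTRY_BYTES
        + reference_count * ID_BYTES
    )
    return naive_bytes, interned_bytes
```

Because every reference still costs 4 bytes, the saving depends on string length: long, frequently repeated strings benefit most, while strings of about 4 bytes or fewer gain little or nothing. The naive figure here also ignores any per-string overhead such as pointers or length fields.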

⑤ Intern a String

Step through the FNV-1a hash → open-addressing → store-or-reuse pipeline.

To check whether a string is already stored, we use a hash table, a data structure that finds an entry in roughly constant time on average:

  1. Hash: run the string through a hash function (FNV-1a) and reduce the result to the table size (for example, modulo the slot count) to get a slot number
  2. Probe: check that slot; if taken by a different string, check the next slot (linear probing)
  3. Insert or reuse: if the string is new, store it and record its position; if it already exists, return the existing position
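
The three steps above can be sketched as follows, using the standard 32-bit FNV-1a constants. The class name `ProbingInterner`, the fixed capacity, and the use of a Python list in place of the packed buffer are simplifying assumptions; a real table would also resize as it fills.

```python
FNV_OFFSET_BASIS = 0x811C9DC5  # 32-bit FNV offset basis
FNV_PRIME = 0x01000193         # 32-bit FNV prime


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR in each byte, then multiply by the prime."""
    h = FNV_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME) & 0xFFFFFFFF
    return h


class ProbingInterner:
    """Open-addressing table with linear probing (fixed size, no resizing)."""

    def __init__(self, capacity: int = 16):
        self.slots = [None] * capacity  # each slot holds a string ID or None
        self.strings = []               # ID -> bytes (stand-in for the buffer)

    def intern(self, s: str) -> int:
        data = s.encode("utf-8")
        capacity = len(self.slots)
        slot = fnv1a_32(data) % capacity        # 1. hash to a slot number
        for _ in range(capacity):
            string_id = self.slots[slot]
            if string_id is None:               # 3a. empty slot: store new string
                new_id = len(self.strings)
                self.strings.append(data)
                self.slots[slot] = new_id
                return new_id
            if self.strings[string_id] == data:  # 3b. same string: reuse its ID
                return string_id
            slot = (slot + 1) % capacity        # 2. different string: next slot
        raise RuntimeError("table full; a real implementation would resize")


table = ProbingInterner()
region_id = table.intern("region")
same_id = table.intern("region")  # found by probing, no new entry
```

Linear probing keeps collisions in neighboring slots, so a lookup usually touches only a few adjacent entries before it finds the string or an empty slot.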

Hash Table (open addressing)

⑥ Cardinality Impact

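Memory grows differently for naive vs interned as cardinality (the number of unique strings) changes. Naive storage pays for every copy of every string, so its size depends on the total number of label strings regardless of how many are unique. Interned storage pays for each unique string once plus 4 bytes per reference, so its size rises with cardinality.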
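
A simple model shows the shape of that relationship. It holds the total number of string references fixed and varies only the number of unique strings; the function name `estimate_bytes`, the 8-byte ID-table entry, and the average string length of 12 bytes are assumptions chosen for illustration.

```python
def estimate_bytes(total_refs, unique_strings, avg_len, id_bytes=4, entry_bytes=8):
    """Return (naive, interned) byte estimates under the assumed cost model."""
    naive = total_refs * avg_len  # every reference stores a full copy
    interned = unique_strings * (avg_len + entry_bytes) + total_refs * id_bytes
    return naive, interned


# Same 100,000 references, increasing cardinality.
for unique in (10, 1_000, 100_000):
    naive, interned = estimate_bytes(100_000, unique, avg_len=12)
    print(f"unique={unique:>7}  naive={naive:>9}  interned={interned:>9}")
```

In this model the naive figure stays flat while the interned figure climbs with cardinality. Once nearly every string is unique, interning costs more than naive storage, because the IDs and table entries are pure overhead.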