Design and Architecture of CockroachDB

Range Metadata

The default approximate size of a range is 64M (2^26 B). In order to support 1P (2^50 B) of logical data, metadata is needed for roughly 2^(50 - 26) = 2^24 ranges. A reasonable upper bound on range metadata size is roughly 256 bytes (3 * 12 bytes for the triplicated node locations and 220 bytes for the range key itself). 2^24 ranges * 2^8 B would require roughly 4G (2^32 B) to store--too much to duplicate between machines. Our conclusion is that range metadata must be distributed for large installations.
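The sizing argument above can be checked with a few lines of arithmetic. This is only a sketch that restates the figures from the paragraph; the constant names are chosen for illustration and are not taken from the CockroachDB code base.

```python
# All sizes are in bytes and come from the paragraph above.
RANGE_SIZE = 2**26               # 64M default approximate range size
LOGICAL_DATA = 2**50             # 1P of logical data to support
META_ENTRY_SIZE = 3 * 12 + 220   # three node locations plus the range key

# Number of ranges needed to hold the logical data.
num_ranges = LOGICAL_DATA // RANGE_SIZE

# Total space needed to describe every range.
total_metadata = num_ranges * META_ENTRY_SIZE

print(f"ranges: 2^{num_ranges.bit_length() - 1}")
print(f"metadata bytes: 2^{total_metadata.bit_length() - 1}")
```

Both results are exact powers of two, so the exponents printed by the sketch can be compared directly with the 2^24 and 2^32 figures in the text.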

To distribute the range metadata and keep key lookups relatively fast, we use two levels of indirection. All of the range metadata sorts first in our key-value map. We accomplish this by prefixing range metadata with two null characters (\0\0). The meta1 or meta2 suffixes are additionally appended to distinguish between the first level and second level of range metadata. In order to do a lookup for <key1>, we first locate the range information for the lower bound of \0\0meta1<key1>, and then use that range to locate the lower bound of \0\0meta2<key1>. The range specified there will indicate the range location of <key1> (refer to examples below). Using two levels of indirection, our map can address approximately 2^62 B of data, or roughly 4E (each metadata range addresses 2^(26-8) = 2^18 ranges; with two levels of indirection, we can address 2^(18 + 18) = 2^36 ranges; each range addresses 2^26 B; total is 2^(36+26) B = 2^62 B = 4E).
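The two-step lookup can be sketched with a toy in-memory map. Several things here are assumptions made for illustration only: the whole keyspace lives in one Python dict rather than in distributed ranges, each metadata entry is keyed by the end key of the range it describes, MAX_KEY is a hypothetical sentinel for the end of the keyspace, and the range names are invented.

```python
import bisect

META1 = "\0\0meta1"
META2 = "\0\0meta2"
MAX_KEY = "\xff"  # hypothetical sentinel standing in for "end of keyspace"


def lower_bound(store, key):
    """Return the first (key, value) pair whose key is >= the given key."""
    keys = sorted(store)
    i = bisect.bisect_left(keys, key)
    if i == len(keys):
        raise KeyError(key)
    return keys[i], store[keys[i]]


def lookup(store, key):
    """Resolve a data key to a range name via the meta1 and meta2 levels."""
    # Level 1: find which range holds the meta2 entry for this key.
    _, meta2_range = lower_bound(store, META1 + key)
    # Level 2: a real cluster would send this read to meta2_range;
    # in this toy, the single dict holds every key.
    _, data_range = lower_bound(store, META2 + key)
    return meta2_range, data_range


# Toy directory: one meta2 range and three data ranges (names invented).
store = {
    META1 + MAX_KEY: "range-meta",
    META2 + "g": "range-1",
    META2 + "p": "range-2",
    META2 + MAX_KEY: "range-3",
}

for k in ("apple", "kiwi", "zebra"):
    print(k, lookup(store, k))
```

Because the \0\0 prefix sorts before any ordinary key, the metadata entries always come first in the sorted map, which is what lets a plain lower-bound search serve both levels.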

  • The following example shows the directory structure for a map with three ranges' worth of data. The key/values in red show range metadata. The key/values in black show actual data. Ellipses indicate additional key/value pairs that fill out the entire range of data. Except for the fact that splitting ranges requires updates to the range metadata with knowledge of the metadata layout, the range metadata itself requires no special treatment or bootstrapping.

Insert range table here...