Design and Architecture of CockroachDB

Node Storage

Each node maintains a separate RocksDB instance for each of its disks, and each RocksDB instance hosts any number of ranges. RPCs arriving at a RoachNode are multiplexed by disk name to the appropriate RocksDB instance. A single instance per disk is used to avoid contention: if every range maintained its own RocksDB instance, global management of the available cache memory would be impossible, and the writers for each range would compete for non-contiguous writes to multiple RocksDB logs.
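The routing by disk name described above can be sketched roughly as follows. This is only an illustration, not CockroachDB's actual code: the Store and Node types, their methods, and the disk names are hypothetical, and an in-memory map stands in for a real RocksDB instance.

```go
package main

import (
	"fmt"
	"sync"
)

// Store stands in for the single RocksDB instance bound to one disk.
// Hypothetical type: an in-memory map replaces the real storage engine.
type Store struct {
	mu   sync.Mutex
	data map[string][]byte
}

// Node owns exactly one Store per disk, keyed by disk name.
type Node struct {
	stores map[string]*Store
}

// NewNode creates a node with one store for each named disk.
func NewNode(disks ...string) *Node {
	n := &Node{stores: make(map[string]*Store)}
	for _, d := range disks {
		n.stores[d] = &Store{data: make(map[string][]byte)}
	}
	return n
}

// storeFor picks the store that serves the given disk name.
func (n *Node) storeFor(disk string) (*Store, error) {
	s, ok := n.stores[disk]
	if !ok {
		return nil, fmt.Errorf("unknown disk %q", disk)
	}
	return s, nil
}

// Put routes a write to the store for the named disk.
func (n *Node) Put(disk, key string, value []byte) error {
	s, err := n.storeFor(disk)
	if err != nil {
		return err
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = value
	return nil
}

// Get routes a read to the store for the named disk. The boolean
// reports whether the key was present in that store.
func (n *Node) Get(disk, key string) ([]byte, bool, error) {
	s, err := n.storeFor(disk)
	if err != nil {
		return nil, false, err
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	v, found := s.data[key]
	return v, found, nil
}
```

In this sketch every range placed on a disk shares that disk's one store, so a request only needs the disk name to find its engine, and a key written through one disk's store is not visible through another's.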

In addition to the key/value pairs of the range itself, various range metadata is maintained:

  • range-spanning tree node links
  • participating replicas
  • consensus metadata
  • split/merge activity
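One way to picture the four kinds of metadata listed above is as a single record kept per range. The sketch below is an assumption made for illustration: none of the type names, field names, or field contents come from CockroachDB itself, and the consensus fields in particular are placeholders for whatever the consensus protocol needs to persist.

```go
package main

// RangeActivity records whether a split or merge is in progress.
// Hypothetical enumeration for illustration only.
type RangeActivity int

const (
	ActivityNone RangeActivity = iota
	ActivitySplitting
	ActivityMerging
)

// TreeLinks holds the range-spanning tree node links. Identifying
// neighbors by start key is an assumption; "" means no link.
type TreeLinks struct {
	Parent string
	Left   string
	Right  string
}

// Replica names one participating replica by node and disk.
type Replica struct {
	NodeID string
	Disk   string
}

// ConsensusState is a placeholder for consensus metadata.
// The two fields are assumed, not taken from the design.
type ConsensusState struct {
	Term        uint64
	CommitIndex uint64
}

// RangeMetadata groups the metadata kept alongside a range's
// key/value pairs.
type RangeMetadata struct {
	Links     TreeLinks
	Replicas  []Replica
	Consensus ConsensusState
	Activity  RangeActivity
}

// HasReplica reports whether the range has a replica on the given
// node and disk.
func (m RangeMetadata) HasReplica(nodeID, disk string) bool {
	for _, r := range m.Replicas {
		if r.NodeID == nodeID && r.Disk == disk {
			return true
		}
	}
	return false
}
```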

A really good reference on tuning Linux installations with RocksDB is here.