BlueStore Internals¶
Small write strategies¶
U: Uncompressed write of a complete, new blob.
write to new blob
kv commit
P: Uncompressed partial write to unused region of an existing blob.
write to unused chunk(s) of existing blob
kv commit
W: WAL overwrite: commit intent to overwrite, then overwrite async. Must be chunk_size = MAX(block_size, csum_block_size) aligned.
kv commit
wal overwrite (chunk-aligned) of existing blob
N: Uncompressed partial write to a new blob. Initially sparsely utilized. Future writes will either be P or W.
write into a new (sparse) blob
kv commit
R+W: Read partial chunk, then to WAL overwrite.
read (out to chunk boundaries)
kv commit
wal overwrite (chunk-aligned) of existing blob
C: Compress data, write to new blob.
compress and write to new blob
kv commit
Possible future modes¶
F: Fragment lextent space by writing small piece of data into a piecemeal blob (that collects random, noncontiguous bits of data we need to write).
write to a piecemeal blob (min_alloc_size or larger, but we use just one block of it)
kv commit
X: WAL read/modify/write on a single block (like legacy bluestore). No checksum.
kv commit
wal read/modify/write
Mapping¶
This very roughly maps the type of write onto what we do when we encounter a given blob. In practice it’s a bit more complicated since there might be several blobs to consider (e.g., we might be able to W into one or P into another), but it should communicate a rough idea of strategy.
raw |
raw (cached) |
csum (4 KB) |
csum (16 KB) |
comp (128 KB) |
|
128+ KB (over)write |
U |
U |
U |
U |
C |
64 KB (over)write |
U |
U |
U |
U |
U or C |
4 KB overwrite |
W |
P | W |
P | W |
P | R+W |
P | N (F?) |
100 byte overwrite |
R+W |
P | W |
P | R+W |
P | R+W |
P | N (F?) |
100 byte append |
R+W |
P | W |
P | R+W |
P | R+W |
P | N (F?) |
4 KB clone overwrite |
P | N |
P | N |
P | N |
P | N |
N (F?) |
100 byte clone overwrite |
P | N |
P | N |
P | N |
P | N |
N (F?) |