6. Column-Oriented Databases | 112
Reduction of Commit Logs Each tablet server keeps only two
10
append-only commit log files for all
tablets it serves. This is due to the fact that number of commit logs would grow very if such a
file would be created and maintained on a per-tablet base, resulting in many concurrent writes to
GFS, potentially large numbers of disk seeks, and less efficiency of the group commit optimization.
A downside of this approach appears if tablets get reassigned (e.g. caused by tablet server crashes
or tablet redistribution due to load balancing): as a tablet server typically only servers part of the
tablets that were served by another server before, it has to read through the commit logs the former
server produced to establish the tablet representation shown in figure 6.3 on page 110; hence, if the
tablets once served by one tablet server get reassigned to n tablet servers, the commit logs have to
be read n times. To tackle this issue, the commits are stored and sorted by the key-triple (table,
row, log sequence number). This results in just one disk seek with a subsequent contiguous read
for each tablet server that has to evaluate commit logs in order to load tablets formerly served by
another tablet server. The sorting of a commit log is optimized in two ways. Firstly, a commit log is
sorted lazily: when a tablet server has to serve additional tablets, it beckons the master server that
it has to evaluate a commit log so that the master server can initiate the sorting of this commit log.
A second optimization is how the commit logs are sorted: the master server splits them up into 64
MB chunks and initiates a parallel sort of these chunks on multiple tablet servers.
Reducing GFS Latency Impacts The distributed Google File Systems is not robust against latency spikes,
e. g. due to server crashes or network congestion. To reduce the impact of such latencies, each tablet
server uses two writer threads for commit logs, each writing to its own file. Only one of these threads
is actively writing to GFS at a given time. If this active thread suffers from GFS “performance
hiccups”, the commit logging is switched to the second thread. As any operation in a commit log
has a unique sequence number, duplicate entries in the two commit logs can be eliminated when a
tablet server loads a tablet.
Improving Tablet Recovery Tablet recovery is the process of tablet-loading done by a tablet server that a
particular tablet has been assigned to. As discussed in this section and in section 6.1.4, a tablet server
has to evaluate the commit logs attached that contain operations for the tablet to load. Besides the
aforementioned Bloom Filter optimization, Bigtable tries to avoid that a tablet server has to read a
commit log at all when recovering a tablet. This is achieved by employing two minor compactations
when a tablet server stops serving a tablet. The first compactation is employed to reduce “the amount
of uncompacted state in the tablet server’s commit log”. The second compactation processes the
update operations that have been processed since the first compactation was started; before this
second compactation is executed, the tablet server stops serving any requests. These optimizations
to reduce tablet recovery time can only happen and take effect if a tablet server stops serving tablets
in a controlled fashion (i. e. it has not due to a crash).
Exploiting Immutability Bigtable leverages in manifold ways from the fact that SSTables are immutable.
Firstly, read operations to SSTables on the filesystem do not have to synchronized. Secondly, the
removal of data is delegated to background processes compacting SSTables and garbage-collecting
obsolete ones; hence, the removal of data can occur asynchronously and the time required for it does
not have to be consumed while a request is served. Finally, when a tablet gets split the resulting
child tablets inherit the SSTables from their parent tablet.
To provide efficient read and write access to the mutable memtable, a partial and temporary im-
mutability is introduced by making memtable rows “copy-on-write and allow reads and writes to
proceed in parallel”.
10
Chang et al. first speak of only one commit log per tablet server in their section “Commit-log implementation“ (cf. [CDG
+
06,
p. 7f]). Later in this section, they introduce the optimization of two commit-log writer threads per tablet server, each
writing to its own file (see below).