Data Read in Cassandra

Category: Cassandra   Tags: Cassandra, Learning, Beginners, Basics, NoSQL Database

During a read operation, Cassandra consults the following components in order:


Check the memtable
If the memtable holds data for the requested partition, that data is read and then merged with the data from the SSTables.
Check row cache, if enabled
The row cache improves performance for read-intensive workloads. When it is enabled, it keeps in memory a subset of the partition data that is stored on disk in the SSTables. If the requested partition data is in the row cache, it is read from memory, which makes the read faster.

You can configure how many rows are stored in the row cache. This makes queries of the "last 100 records" type fast.

When the row cache is full, it reclaims memory using an LRU (least-recently-used) policy.
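
The LRU behaviour described above can be illustrated with a minimal sketch. The `RowCache` class, its `capacity` parameter, and the `user:N` keys are hypothetical names made up for this example; this is a toy model of the eviction policy, not Cassandra's actual row cache.

```python
from collections import OrderedDict


class RowCache:
    """Toy LRU cache keyed by partition key (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, partition_key):
        if partition_key not in self._entries:
            return None  # cache miss: the read would continue to the SSTables
        self._entries.move_to_end(partition_key)  # mark as most recently used
        return self._entries[partition_key]

    def put(self, partition_key, rows):
        self._entries[partition_key] = rows
        self._entries.move_to_end(partition_key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry


cache = RowCache(capacity=2)
cache.put("user:1", ["row-a"])
cache.put("user:2", ["row-b"])
cache.get("user:1")             # touch user:1, so user:2 becomes the oldest entry
cache.put("user:3", ["row-c"])  # exceeds capacity, so user:2 is evicted
```

After these calls, `user:1` and `user:3` are still cached, while a lookup for `user:2` misses.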

Check the Bloom filter
Each SSTable has an associated Bloom filter, which Cassandra uses to determine which SSTables are likely to contain the requested partition data. This speeds up partition key lookup, because SSTables that the filter rules out are not searched.
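
To show why a Bloom filter can rule an SSTable out but never confirm a hit with certainty, here is a small sketch. The `BloomFilter` class, its bit-array size, and the SHA-256 based hashing are assumptions chosen for readability; Cassandra's real filter uses different hashing and sizing.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: answers 'definitely absent' or 'possibly present'."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means the key was never added; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(key))


sstable_filter = BloomFilter()
sstable_filter.add("user:42")
```

A key that was added always reports `True`, so an SSTable holding the partition is never skipped. A `True` for a key that was never added is a false positive, which is why further lookups are still needed.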
Check the partition key cache, if enabled
A Bloom filter can return false positives, so an SSTable it identifies may not actually contain the data. If the Bloom filter does not rule out an SSTable, Cassandra checks the partition key cache.

The partition key cache stores a cache of the partition index. If the partition key is found in the partition key cache, the read goes directly to the compression offset map; otherwise, Cassandra checks the partition summary.

Partition Summary
A partition index contains all partition keys, whereas a partition summary samples every Xth key and maps each sampled key to its location in the index file. For example, if the partition summary is set to sample every 100 keys, it stores the location of the first key as the beginning of the SSTable file, then the 100th key and its location in the file, and so on.
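
The sampling idea can be sketched as follows. The `partition_index` list, the `SAMPLE_INTERVAL` constant, and the `lookup` helper are hypothetical, and keys are ordered as plain strings for simplicity, which is an assumption of this example rather than how Cassandra orders partitions.

```python
from bisect import bisect_right

# Hypothetical partition index: sorted (partition_key, data_offset) entries.
partition_index = [(f"key{i:04d}", i * 64) for i in range(1000)]

SAMPLE_INTERVAL = 100  # the summary keeps every 100th index entry

# Summary: (sampled key, position of that key in the partition index).
summary = [
    (key, pos)
    for pos, (key, _) in enumerate(partition_index)
    if pos % SAMPLE_INTERVAL == 0
]
summary_keys = [key for key, _ in summary]


def lookup(partition_key):
    """Use the summary to narrow the index scan to a single sampled range."""
    slot = bisect_right(summary_keys, partition_key) - 1
    if slot < 0:
        return None  # key sorts before the first sampled key
    start = summary[slot][1]
    for key, data_offset in partition_index[start:start + SAMPLE_INTERVAL]:
        if key == partition_key:
            return data_offset
    return None
```

With this layout the summary holds 10 entries instead of 1000, and a lookup scans at most 100 index entries rather than the whole index.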
Partition Index
The partition index resides on disk and stores an index of all partition keys mapped to their offset.
Compression offset map
The compression offset map holds pointers to the exact location on disk where the requested partition data is stored.
Fetch the data from the SSTable on disk
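
The whole sequence can be summarised in a simplified sketch. The function `read_partition` and the dict-based memtable, row cache, and SSTable structures are invented for this illustration. The Bloom filter is modelled as a plain set, the partition summary and partition index are collapsed into one `index` dict, the compression offset map is omitted, and merging is reduced to collecting fragments in a list.

```python
def read_partition(key, memtable, row_cache, sstables):
    """Toy walk-through of the read steps using plain dicts and sets."""
    fragments = []
    if key in memtable:
        fragments.append(memtable[key])    # memtable data is merged with SSTable data
    if row_cache is not None and key in row_cache:
        fragments.append(row_cache[key])   # row cache hit: no SSTable lookup needed
        return fragments
    for sstable in sstables:
        if key not in sstable["bloom"]:    # Bloom filter rules this SSTable out
            continue
        offset = sstable["key_cache"].get(key)
        if offset is None:                 # key cache miss: fall back to the index
            offset = sstable["index"].get(key)
        if offset is None:                 # Bloom filter false positive
            continue
        fragments.append(sstable["data"][offset])
    return fragments


sstable = {
    "bloom": {"k1", "k2"},                 # k2 simulates a false positive
    "key_cache": {},
    "index": {"k1": 0},
    "data": {0: "k1-from-sstable"},
}
result = read_partition("k1", {"k1": "k1-from-memtable"}, None, [sstable])
```

Here `result` contains both the memtable fragment and the SSTable fragment for `k1`, while a read for `k2` passes the Bloom filter check but finds nothing in the index.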