Academic Paper

The LSM Design Space and its Read Optimizations
Document Type
Conference
Source
2023 IEEE 39th International Conference on Data Engineering (ICDE), pp. 3578-3584, Apr. 2023
Subject
Computing and Processing
Navigation
Layout
Tutorials
Data structures
Throughput
Data engineering
Indexes
LSM-trees
Key-value stores
Storage engine
Language
ISSN
2375-026X
Abstract
Log-structured merge (LSM) trees have emerged as one of the most commonly used storage-based data structures in modern data systems, as they offer high write throughput and good utilization of storage space. However, LSM-trees were not originally designed to facilitate efficient reads. Thus, state-of-the-art LSM engines employ numerous optimization techniques to make reads efficient. The goal of this tutorial is to present the fundamental principles of the LSM paradigm along with the various optimization techniques and hybrid designs adopted by LSM engines to accelerate reads. Toward this end, we first discuss the basic LSM operations and their access patterns. We then discuss techniques and designs that optimize point and range lookups in LSM-trees: (i) index and (ii) filter data structures, (iii) caching, and (iv) read-friendly data layouts. Next, we present the performance tradeoff between writes and reads, outlining the rich design space of the LSM paradigm and how one can navigate it to improve query performance. We conclude by discussing practical problems and open research challenges. This will be a 1.5-hour tutorial.