Part 1. Foundations of data systems. Reliable, scalable, and maintainable applications
Data models and query languages
Part 2. Distributed data. Replication
The trouble with distributed systems
Consistency and consensus
Part 3. Derived data. Batch processing
The future of data systems.
Copyright; Table of Contents; Preface; Who Should Read This Book?; Scope of This Book; Outline of This Book; References and Further Reading; O'Reilly Safari; How to Contact Us; Acknowledgments; Part I. Foundations of Data Systems; Chapter 1. Reliable, Scalable, and Maintainable Applications; Thinking About Data Systems; Reliability; Hardware Faults; Software Errors; Human Errors; How Important Is Reliability?; Scalability; Describing Load; Describing Performance; Approaches for Coping with Load; Maintainability; Operability: Making Life Easy for Operations; Simplicity: Managing Complexity.
Evolvability: Making Change Easy; Summary; Chapter 2. Data Models and Query Languages; Relational Model Versus Document Model; The Birth of NoSQL; The Object-Relational Mismatch; Many-to-One and Many-to-Many Relationships; Are Document Databases Repeating History?; Relational Versus Document Databases Today; Query Languages for Data; Declarative Queries on the Web; MapReduce Querying; Graph-Like Data Models; Property Graphs; The Cypher Query Language; Graph Queries in SQL; Triple-Stores and SPARQL; The Foundation: Datalog; Summary; Chapter 3. Storage and Retrieval.
Data Structures That Power Your Database; Hash Indexes; SSTables and LSM-Trees; B-Trees; Comparing B-Trees and LSM-Trees; Other Indexing Structures; Transaction Processing or Analytics?; Data Warehousing; Stars and Snowflakes: Schemas for Analytics; Column-Oriented Storage; Column Compression; Sort Order in Column Storage; Writing to Column-Oriented Storage; Aggregation: Data Cubes and Materialized Views; Summary; Chapter 4. Encoding and Evolution; Formats for Encoding Data; Language-Specific Formats; JSON, XML, and Binary Variants; Thrift and Protocol Buffers; Avro; The Merits of Schemas.
Modes of Dataflow; Dataflow Through Databases; Dataflow Through Services: REST and RPC; Message-Passing Dataflow; Summary; Part II. Distributed Data; Chapter 5. Replication; Leaders and Followers; Synchronous Versus Asynchronous Replication; Setting Up New Followers; Handling Node Outages; Implementation of Replication Logs; Problems with Replication Lag; Reading Your Own Writes; Monotonic Reads; Consistent Prefix Reads; Solutions for Replication Lag; Multi-Leader Replication; Use Cases for Multi-Leader Replication; Handling Write Conflicts; Multi-Leader Replication Topologies.
Leaderless Replication; Writing to the Database When a Node Is Down; Limitations of Quorum Consistency; Sloppy Quorums and Hinted Handoff; Detecting Concurrent Writes; Summary; Chapter 6. Partitioning; Partitioning and Replication; Partitioning of Key-Value Data; Partitioning by Key Range; Partitioning by Hash of Key; Skewed Workloads and Relieving Hot Spots; Partitioning and Secondary Indexes; Partitioning Secondary Indexes by Document; Partitioning Secondary Indexes by Term; Rebalancing Partitions; Strategies for Rebalancing; Operations: Automatic or Manual Rebalancing; Request Routing.