Designing Data-Intensive Applications

Martin Kleppmann ★★★★★
The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

I rarely write about technical books because I rarely read them cover to cover.

This was not the case with this book, Designing Data-Intensive Applications (DDIA for short) by Martin Kleppmann. It offers an amazing return on the time invested.

The book is composed of three main parts: Foundations of Data Systems, Distributed Data and Derived Data.

The first part, Foundations of Data Systems, is about database internals. It describes the basic building blocks, behavior and semantics of different types of databases (not just relational ones!), covering data models, encoding, storage, retrieval and schemas.

The second part, Distributed Data, presents the difficulties one faces when dealing with distributed systems. It covers replication, partitioning, transactions, consistency and consensus, as well as the inherently faulty nature of distributed systems.

The third part, Derived Data, details batch processing and stream processing scenarios, and looks into the future of data systems, including some of the author's personal opinions on where the field is headed.
Importantly, this part has a section about doing the right thing as engineers. It goes into some detail about the problems that arise when dealing with large amounts of personal data: surveillance, privacy, bias and discrimination are all discussed.

The book has a great style: a perfect balance between theory and practice, with a friendly tone. It also has an extensive list of references at the end of each chapter. Reading this book has a snowball effect: seeing interesting topics mentioned alongside references (articles, books, blog posts), you just want to read more and more.

An amazing read, very, very much worth the time invested.