SQL Performance Explained: Everything developers need to know about SQL performance

Book cover image

It’s really strange that there are so few SQL and relational data management books written for a general developer audience. RDBMS systems are integrated in all kinds of applications that deal with user data, and a big majority of web aplications interface to them. The job of creating, running and optimizing SQL, however, has been delegated to specialized subsystems such ORM’s, and DB administrators. The results are the various horrendously performing web applications that I have had the pleasure to work with, and the weird ORM generated schemas that worked perfectly well in local development or testing, but turned out to be ticking timebombs on production systems.

This book fits perfectly in this space of SQL books for all developers. Its aim is to concisely explain principles of database performance and optimization, focusing on the most important tools that all databases provide, especially the B-tree index. Winand starts with a description of how databases organize data, and how B-tree indexes are used to find the relevant data. The discussion of indexes is coupled with many concrete examples and tips on finding out how a database is processing a query. The examples are for Oracle databases, and in some places, there are lengthy discussions of Oracle-specific topics (such as non-standard handling of NULL values), but they are mostly applicable to other databases, especially PostgreSQL. There is also an appendix on using the query explainers of various RDBMS systems. Winand goes into detail on various ways of creating indexes (e.g. covering and partial indexes), and for which scenarios these different methods are suitable, without getting lost in text. The discussion of index-related topics takes approximately half of the book. The rest is dedicated to other RDBMS functionalities such as joins, insert and update statements, but always with a focus on how the index can be used to make these processes faster and more efficient, and how various indexes themselves are affected.

The most obvious shortcoming of SQL Performance Explained is that there is no sample data set for experimenting. I had the chance of appying many techniques described here to our production database, but directly experimenting with a sample database with various complexities would lead to a better learning experience in general. Clarity also sometimes suffers from the conciseness. For example, some terms are not defined, such as access rate on p. 86 or what to materialize means, on p. 140. Generally useful would have been a glossary that defines the most important terms. That said, if such minor ommissions that can be completed by a quick web search are the price of the conciseness of this book, this is a price I’m happy to pay.

If you are going to read only one book about database performance, there are much worse choices than SQL Performance Explained. But do read more books about databases in general, so that I don’t have to take over another agonizingly slow legacy system.