Best practices for multi-column indexes in relational databases

Best practices for multi-column indexes in relational databases

Multi-column indexes are also known as composite, combined, or concatenated indexes. They are used to improve the speed of data retrieval operations in relational databases. Multi-column indexes are more complex compared to single-column indexes and therefore require extra attention when defining them. Here are the best practices for creating and working with such indexes.

The right column order

Choose a column order that allows an index to be used by as many queries as possible. A query uses a multicolumn index whenever leading columns are included into the search.

As an example, let’s look at the table that contains a telephone directory and is defined as follows:

create table Phonebook
(
  Person_Id int primary key,
  First_Name varchar(100),
  Last_Name varchar(100),
  Phone_Number varchar(20),
  Street_Address varchar(100),
  City varchar(100),
  Country varchar(100)
)

This table has a multicolumn index on Last_Name and First_Name columns:

create index Phonebook_Subscriber_Index on Phonebook(Last_Name, First_Name)

Now, let’s look at three queries that use columns on which the index is based:

-- Query # 1
select Phone_Number
from Sale
where Last_Name = 'Smith'
  and First_Name = 'John'

-- Query # 2
select count(*)
from Sale
where Last_Name = 'Smith'

-- Query # 3
select count(*)
from Sale
where First_Name = 'John'

The first two queries benefit from the index since both of them use the first column that is included in the index.

The last query is not able to use the index because it filters data by the last column.

Just like using a real phone book that has subscribers listed in the alphabetical order first by last name and then by first name. It is easy to find subscribers by last name. It is also easy to find them by last name and first name together. But when it is necessary to find all subscribers with a specific first name, it will be required to scan through all phone book. It will not make any difference whether the first name that is used for a search is unique or not.

The most selective columns first myth

There is a popular belief that the most selective columns must always be the leading columns in an index. In other words, columns with a higher cardinality must appear before columns with lower cardinality in the definition of an index.

This is not true. As discussed earlier, the most important consideration when choosing the order of columns for an index is how this index will be used by queries.

Performance and storage tradeoffs

Database indexes bring improved speed of querying the data at the cost of additional writes and storage space that is used to maintain the data structure of these indexes. Each new index slows down insert, update and delete operations because the system has to keep indexes consistent with data modifications that are made to tables.

This is why it is recommended to create the minimum possible number of indexes with the smallest number of columns.