Optimizing MySQL performance through indexing
This is the 1st post in a series on MySQL performance.
An index that makes the query fast is the most basic explanation of an index you have heard of.
As an engineer whose expertise is in the application and not MySQL performance, this can be overwhelming. You should start by analyzing query response time because it is meaningful and actionable.
Meaningful — The only metric we see is query response time. We become impatient when a query takes n seconds to execute. In the same query, a million rows might be examined, but we don’t see a million rows examined. Our time is precious.
Actionable — It is in your hands to change the query response time since you own the code. You can still indirectly optimize query response time even if you don’t have access to it.
The most effective way to improve performance in your database is to make proper use of indexes. In a database, an index is a data structure that correlates values to rows where these values occur in a given column. A database can find values more quickly with an index than by searching the whole table from top to bottom using brute force.
Most software developers do not know how or when to create an index. When to use an index is rarely or never explained in books and documentation about databases. Developers can only guess how to use them effectively.
How SQL query works ?
SELECT queries are used to retrieve data, and usually, a WHERE clause is used to filter the data. In most cases, an index is created on the columns specified in the WHERE clause of the query. In the absence of an index, the database scans all the rows, filters out the matching rows, and returns the result. If there are millions of records, this scan operation may take a long time, causing API and application response times to be slow.
Let’s create a database to better understand it. InnoDB is MySQL’s default database engine.
CREATE TABLE sample_table (
first_name VARCHAR(32) NOT NULL,
last_name VARCHAR(32) NOT NULL,
Insert 1000 random records into this table. Using a procedure, I will be able to generate these records.
DROP PROCEDURE generate_data;
CREATE PROCEDURE generate_data()
DECLARE i INT DEFAULT 0;
WHILE i < 1000 DO
INSERT INTO `sample_table` (`first_name`,`last_name`,`age`,`mobile_no`) VALUES (
CONV(FLOOR(RAND() * 99999999999999), 10, 36),
CONV(FLOOR(RAND() * 99999999999999), 10, 36),
LPAD(FLOOR(RAND() * 10000000000), 10, '0')
SET i = i + 1;
Run the following procedure
The command SHOW INDEX can be used to verify that no index has been created on this table.
show index from sample_table;
As of right now, if we run a simple SELECT query, the query will scan the entire table since there is no index defined on mobile_no.
EXPLAIN SELECT * from sample_table where mobile_no='5554688121';
EXPLAIN indicates how the query engine intends to execute the query. As you can see in the above snapshot, the rows column returns 1000, and possible_keys returns null. The possible_keys column represents all the indices that can be used in this query. Out of all possible indices, the key column indicates which index will be used.
What is Index?
The CREATE INDEX statement creates an index, which is a unique structure in the database. It requires its own disk space and stores a copy of the indexed table data. Therefore, an index is pure redundancy. A table index does not change the data in the table; it just creates a new data structure that refers to the table. Database indexes are after all very similar to book indices; they occupy their own space, are highly redundant, and refer to information stored in another location. In order to create a primary index, ALTER TABLE must be used, not CREATE INDEX.
It’s not necessary to create your own primary key. Since InnoDB by design must have a primary key in every table, it automatically creates one for you if you don’t define one. When you create a primary key later on for that table, InnoDB deletes the previously auto-generated primary key.
Now that we don’t have a primary key defined, let’s see what InnoDB creates for us by default:
SHOW EXTENDED INDEX FROM sample_table;
EXTENDED shows all the indices that are not accessible by the user, but are managed entirely by MySQL. MySQL has defined a composite index on DB_ROW_ID, DB_TRX_ID, DB_ROLL_PTR, as well as all columns in the table. In the absence of a primary key defined by the user, this index is used to find records uniquely.
ALTER TABLE sample_table ADD PRIMARY KEY (mobile_no);
Since I generated the table rows randomly, one of the criteria for the primary key is that the column value should be unique within the table. In my case, I have duplicate generated values, so I can’t use them as primary keys.
However, I can create an index on the mobile_no column, which I would show you here.
CREATE INDEX mobile_no_idx ON sample_table(mobile_no);
Let’s verify the index with the SHOW INDEX command.
SHOW INDEX FROM sample_table;
Now let’s see if this index reduces the number of rows that need to be searched for a given mobile_no in the WHERE clause of a query.
EXPLAIN SELECT * from sample_table where mobile_no='5554688121';
The rows only return 1, the possible_keys and key both return mobile_no_idx. With an index, the query optimizer goes straight to the record and retrieves it. This is very efficient. The purpose of an index is to reduce the search scope at the expense of additional space.
Indexes in MySQL can be categorized into a following types:
- B-Tree (Balanced Tree) indexes — The most commonly used and default type of index. This index can be used together with search queries that use the =, >, >=, <, <= and BETWEEN & LIKE keywords.
- R-Tree (Spatial Tree) indexes — With MySQL’s geometric data types, it can be used to index geographical objects.
- Hash indexes — Usually used in conjunction with = or <=> search operators. Very fast but can be used only when the memory storage engine is in use.
- Covering indexes — An index that covers all the columns required for a query to be completed.
- Clustered indexes — Row data is stored in such indexes. A PRIMARY KEY or, if none exist, a UNIQUE index is usually used
- Composite indexes —Multiple columns are used to create an index.
- Prefix indexes — An index of this type allows you to index a prefix of a column. These indexes do not index the full value of a column, so they are frequently used to conserve space.
The B-tree index is the most commonly used index in MySQL. As long as such an index is in place, MySQL can take advantage of search queries that use the keywords =, >, >=, <, <=, and BETWEEN & LIKE. It is one of the primary reasons such an index may be added to a column to speed up search queries. When a B-tree index is in use, the storage engine does not have to scan the entire table to find the relevant rows.
To add such an index to a table.
CREATE INDEX index_name ON table_name(column_name)
CREATE INDEX index_name ON table_name(column_name) USING BTREE;
Examples of queries that might make use of such an index
SELECT * from table_name where column_name <= 'value';
SELECT * from table_name where column_name BETWEEN 1 AND 100;
SELECT * from table_name where column_name LIKE '%value%';
In order to make use of R-Tree or Spatial indexes, you must use the Geographic Information System (GIS) functions provided by MySQL. MBRContains, MBRCovers, and MBREquals are some of these functions. They can indicate whether the minimum bounding rectangles of one parameter contain, cover, or equal the minimum bounding rectangles of another parameter.
To add such an index to a table, use the SPATIAL keyword.
CREATE SPATIAL INDEX index_name ON table_name(column_name);
The main advantage of hash indexes in MySQL is that they are very fast, but their main disadvantage is that they can only be used for equality comparisons and they only work on the MEMORY storage engine. Range searches are also not supported.
To create such an index on a table, make use of the USING HASH option at the end.
CREATE INDEX index_name ON table_name(column_name) USING HASH;
Hash indexes should only be used with = or <=> operators.
It’s possible to run into some corner cases that may require an index to be used in a different way when working with MySQL. Covering indexes are those that cover all fields necessary for a query to succeed. When a covering index is present, a query can retrieve results directly from the index rather than having to access the disk. If you use an index that already contains the values of the columns you want to search, MySQL will not access the disk, rather, it will retrieve the results directly from the index.
SELECT col1, col2 FROM table_name WHERE col3 = 'value';
Clustered indexes aren’t a separate MySQL index type, they are a different approach to storing data. If your table has a PRIMARY KEY, the clustered index is the PRIMARY KEY. It is the first UNIQUE INDEX with all of its key columns defined as NOT NULL if your table does not have a PRIMARY KEY.
Using such an index, all rows in the table are stored and sorted based on its key values. You cannot have more than one clustered index in your tables because each row can only have one sort order.
Composite indexes span multiple columns. If we have an index on multiple columns, we have a composite index.
CREATE INDEX index_name ON demo_table(col1,col2,col3);
In queries that use indexed fields for selection, joining, or filtering, composite indexes are beneficial.
SELECT * FROM table_name WHERE col1 = 'value1' AND col2 = 'val2';
In MySQL, composite indexes can consist of up to 16 columns, though bear in mind that MySQL uses indexes from left to right and not vice versa.
In some situations, you might need to index lengthy columns, making your indexes very large. As an alternative to indexing the entire value, prefix indexes can also index the first few characters of a column.
To create a prefix index on a table in MySQL
CREATE INDEX index_name ON table_name (column_name(length));
Using prefix indexes, it is important to index columns in a way that provides decent selectivity while saving space at the same time.
Antipattern: Using Indexes Without a Plan
Choosing our indexes by guessing inevitably leads to some mistakes.
- Defining no indexes or not enough indexes
- Defining too many indexes or indexes that don’t help
- Running queries that no index can help
It is common to read that a database incurs overhead when it maintains an index. Whenever we INSERT, UPDATE, or DELETE a table, the database must update its index data structures to make sure subsequent searches can reliably find the right set of rows. Overhead is often associated with waste. Therefore, when we read that maintaining an index incurs overhead, we want to eliminate it. In some cases, developers conclude that removing the indexes is the best solution. It is common advice, but it ignores the fact that indexes have benefits that justify their cost.
Too Many Indexes
An index is only useful if you run queries that use it. Creating indexes you don’t use is pointless. Because they don’t know which indexes will benefit their queries, some people create indexes on every column and every combination of columns. A database table covered with indexes incurs a lot of overhead with no guarantee of return.
When No Index Can Help
The third type of mistake is to run a query that can’t use an index. In order to make queries run faster, developers build more and more indexes, trying to find some magical combination of columns or index options.
CREATE INDEX telephone_idx on sample_table(last_name, first_name);
Here are some examples of queries that won’t benefit from this index:
SELECT * from sample_table ORDER BY first_name, last_name;
SELECT * FROM sample_table WHERE last_name = 'Charles' OR first_name = 'Charles' ;
Let’s see how the database engine plans to execute these queries.
EXPLAIN SELECT * from sample_table ORDER BY first_name, last_name;
Both possible_keys and key are null here, and all rows were scanned.
EXPLAIN SELECT * FROM sample_table WHERE last_name = 'Charles' OR first_name = 'Charles' ;
Here, possible_key is telephone_idx, but key is still null since this index was not used.
One of the most important rules of indexing is that you shouldn’t guess blindly.
Know your data, know your queries and mentor your indexes.