MySQL query optimization is the art and science of improving the execution efficiency of your SQL statements. In today’s data-driven world, where datasets can contain millions or billions of rows, an unoptimized query can turn a milliseconds-fast operation into a minutes-long bottleneck, severely degrading application performance and user experience.
This article provides a detailed breakdown of common MySQL query pitfalls and pairs each one with a corrected, optimized solution and code examples.
The Foundation: Understanding EXPLAIN
Before diving into specific optimization techniques, you must first understand how MySQL executes your query. The EXPLAIN statement is your primary tool for this, providing the execution plan.
| Field | Interpretation | Good Value | Bad Value |
| --- | --- | --- | --- |
| type | The join type—how MySQL looks for rows. | const, eq_ref, ref, range | ALL (Full Table Scan) |
| key | The index MySQL decided to use. | An actual index name. | NULL |
| rows | The number of rows MySQL estimates it must examine. | Low (e.g., 1 or 10) | High (e.g., millions) |
| Extra | Additional information about the execution plan. | Using index (Covering Index) | Using filesort, Using temporary |
Example: Using EXPLAIN
EXPLAIN SELECT * FROM orders WHERE customer_id = 100;
Analyze the output. If the type is ALL and key is NULL, you have a performance problem, likely needing an index.
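As an illustrative fix (a sketch; it assumes `orders` has no existing index on `customer_id`, and the index name is arbitrary), adding an index changes the plan:

```sql
-- Hypothetical: add a secondary index on the filtered column
CREATE INDEX idx_customer_id ON orders (customer_id);

-- Re-check the execution plan
EXPLAIN SELECT * FROM orders WHERE customer_id = 100;
-- type should now show ref, and key should show idx_customer_id
```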
Optimization Technique 1: Indexing Strategy
Indexing is the single most important factor in query performance. Indexes allow MySQL to quickly locate data without scanning every row in the table (a “Full Table Scan”).
❌ Wrong (Missing Index)
Attempting to filter a large table by a non-indexed column:
SELECT * FROM products
WHERE category_id = 5
AND status = 'in_stock'; -- SLOW on large tables without an index
Reasoning for the error: Without an index on category_id or status, MySQL must scan the entire products table row-by-row to find matches, which is extremely slow on large datasets. EXPLAIN would show type: ALL.
✅ Correct (Composite Index)
Creating and using a composite index that covers both columns used in the WHERE clause.
-- 1. Create the composite index
CREATE INDEX idx_category_status ON products (category_id, status);
-- 2. The query is now optimized
SELECT * FROM products
WHERE category_id = 5
AND status = 'in_stock'; -- FAST, index can be used.
Key Indexing Best Practice:
Composite Indexes: When filtering on multiple columns, use a composite index (col1, col2, ...) and put the most selective column (the one that filters out the most rows) first.
Covering Indexes: If an index includes all the columns needed by the query (in the SELECT, WHERE, JOIN, ORDER BY), MySQL can fetch the data directly from the index without accessing the table row, which is indicated by Extra: Using index in EXPLAIN.
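To illustrate the covering-index case on the `products` table above (a sketch; it reuses the `idx_category_status` index already created):

```sql
-- The composite index already contains category_id and status.
-- If the query selects only columns stored in the index,
-- MySQL never has to touch the table rows:
SELECT category_id, status
FROM products
WHERE category_id = 5
AND status = 'in_stock'; -- EXPLAIN should show Extra: Using index
```

Note that InnoDB secondary indexes implicitly include the primary key columns, so also selecting the table's primary key (for example, a hypothetical `product_id`) would typically still be covered.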
Optimization Technique 2: Selecting Data
Avoid fetching columns you don’t need, especially in high-volume queries.
❌ Wrong (SELECT *)
Using the wildcard to select all columns:
SELECT * FROM user_profiles
WHERE last_login < DATE_SUB(NOW(), INTERVAL 1 YEAR);
Reasoning for the error: This forces the server to retrieve all column data (including potentially large text/BLOB columns) for matching rows, wasting network bandwidth, memory, and I/O resources.
✅ Correct (Specify Columns)
Explicitly listing only the necessary columns:
SELECT user_id, username, email FROM user_profiles
WHERE last_login < DATE_SUB(NOW(), INTERVAL 1 YEAR);
Key Benefit: If user_id, username, email, and last_login are all part of a composite index, this query may become a covering index query, indicated by Extra: Using index, giving a massive speed boost.
Optimization Technique 3: Avoiding Functions on Indexed Columns (Non-Sargable Predicates)
Applying a function to an indexed column in the WHERE clause prevents the index from being used, forcing a full scan. This is called creating a non-sargable predicate.
❌ Wrong (Function on Column)
Applying the YEAR() function to an indexed order_date column:
SELECT order_id, order_date FROM orders
WHERE YEAR(order_date) = 2024; -- SLOW, even if order_date is indexed
Reasoning for the error: MySQL must evaluate YEAR() for every single row before it can compare the result to 2024, so the index on order_date cannot be used. EXPLAIN would typically show type: ALL (a full table scan) with key: NULL.
✅ Correct (Search Argument on Value)
Applying the function to the search value or using an index-friendly range search:
SELECT order_id, order_date FROM orders
WHERE order_date >= '2024-01-01'
AND order_date < '2025-01-01'; -- FAST, allows the index on order_date to be used.
Key Insight: The expression you’re comparing against must be a search argument (sargable) that MySQL’s optimizer can use directly with the index structure.
Optimization Technique 4: Subqueries vs. JOINs
While subqueries are readable, they can often be less efficient than a well-optimized JOIN, especially in older MySQL versions (pre-8.0) whose optimizer handled IN subqueries less effectively.
❌ Wrong (Inefficient Subquery)
Using IN with a subquery that returns a large result set:
SELECT name FROM customers
WHERE customer_id IN (
SELECT customer_id FROM orders
WHERE order_total > 1000
); -- SLOW: MySQL may materialize the subquery or re-evaluate it per outer row.
Reasoning for the error: Depending on the version and plan, MySQL may build a large temporary table from the subquery result or perform a slow, row-by-row nested loop, especially if the inner query's columns are not well indexed.
✅ Correct (INNER JOIN)
Rewriting the logic using an INNER JOIN:
SELECT c.name FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_total > 1000
GROUP BY c.customer_id, c.name; -- FAST: uses indexed join columns; GROUP BY deduplicates customers with multiple qualifying orders
Key Benefit: JOIN operations are generally better optimized by the MySQL engine than IN (subquery). Ensure the join columns (customer_id in this case) are indexed in both tables for maximum performance.
Optimization Technique 5: Pagination and Limiting Results
For large result sets, always limit the rows returned, especially for user-facing lists or reports.
❌ Wrong (Inefficient Large Offset)
Paginating deeply into a large table:
SELECT * FROM log_entries
ORDER BY log_date DESC
LIMIT 100000, 100; -- SLOW: MySQL must scan and sort 100,100 rows just to discard 100,000.
Reasoning for the error: MySQL still has to find, sort, and process the first 100,000 rows before it can discard them and return the next 100. This is resource-intensive and gets worse as the offset grows.
✅ Correct (Index-Based Pagination)
Using a “keyset” or “seek” pagination based on the primary key or a unique indexed column from the previous page:
-- Assume the last log_id from the previous page was 123456
SELECT * FROM log_entries
WHERE log_id < 123456 -- or > for ascending order
ORDER BY log_id DESC
LIMIT 100; -- FAST: uses the index on log_id directly
Key Benefit: This method jumps directly to the starting point in the index, avoiding the scan and sort of the large offset.
Optimization Technique 6: Using LIKE Efficiently
LIKE with a leading wildcard (%) cannot use a standard B-tree index.
❌ Wrong (Leading Wildcard)
Searching for a string where the pattern starts with a wildcard:
SELECT username FROM users
WHERE username LIKE '%smith'; -- SLOW: cannot use an index on username
Reasoning for the error: B-tree indexes are sorted by the column’s starting values. When the search string starts with a wildcard (%), the optimizer cannot use the index’s ordered nature and resorts to a full index scan or a full table scan.
✅ Correct (Trailing Wildcard or Full-Text Search)
Searching with only a trailing wildcard (which can use the index), or using a specialized index for complex searches:
-- 1. Optimized for Trailing Wildcard
SELECT username FROM users
WHERE username LIKE 'smi%'; -- FAST: can use index on username
-- 2. For the leading wildcard case, use FULL-TEXT Search
-- Requires a FULLTEXT index: CREATE FULLTEXT INDEX idx_ft_username ON users(username);
SELECT username FROM users
WHERE MATCH(username) AGAINST('smith'); -- Use for keyword searching
Key Takeaway: Reserve standard indexes for operations where the search pattern is fixed at the start. Use FULLTEXT indexes for complex keyword searches where performance is critical.
Optimization Technique 7: Handling UNION vs. UNION ALL
The difference between UNION and UNION ALL can dramatically affect performance.
❌ Wrong (Unnecessary UNION)
Using UNION when you know your data sets have no duplicates:
SELECT first_name, last_name FROM employees WHERE department = 'Sales'
UNION
SELECT first_name, last_name FROM employees WHERE department = 'Marketing'; -- SLOW: requires sorting and duplicate elimination
Reasoning for the error: UNION implicitly performs a DISTINCT operation, requiring MySQL to execute a sort and comparison of the combined result sets to remove duplicate rows. This often results in a temporary table and Extra: Using temporary; Using filesort in EXPLAIN.
✅ Correct (UNION ALL)
Using UNION ALL when duplicate elimination is not necessary:
SELECT first_name, last_name FROM employees WHERE department = 'Sales'
UNION ALL
SELECT first_name, last_name FROM employees WHERE department = 'Marketing'; -- FAST: simply combines the result sets
Key Principle: Use UNION ALL unless you are absolutely sure you need to eliminate duplicate rows.
Optimization Technique 8: Appropriate Data Types
Choosing the correct, smallest possible data type for your columns saves space and speeds up I/O and memory usage.
❌ Wrong (Overly Large Data Type)
Using a large data type when a smaller one would suffice:
CREATE TABLE settings (
setting_key VARCHAR(255) PRIMARY KEY,
is_enabled INT -- 4 bytes for a boolean (0 or 1)
);
Reasoning for the error: INT uses 4 bytes of storage. If the column only ever stores 0 or 1, 3 bytes per row are wasted, which adds up quickly on large tables, making the table and its indexes unnecessarily large and slow to read.
✅ Correct (Minimal Data Type)
Using the most appropriate minimal data type:
CREATE TABLE settings (
setting_key VARCHAR(64) PRIMARY KEY, -- sized to realistic key lengths, not a blanket 255
is_enabled TINYINT(1) -- 1 byte for a boolean (0 or 1)
);
Key Insight: Smaller data types mean smaller tables, smaller indexes, and faster I/O because MySQL can fit more rows into its memory buffers. Use TINYINT for small integers/booleans, SMALLINT for medium-sized ones, and VARCHAR with a reasonable length instead of a large blanket value like VARCHAR(255).
Continuous Monitoring with the Slow Query Log
Optimization is an ongoing process. You must identify which queries are actually causing problems in a production environment.
Enable the Slow Query Log in your MySQL configuration. It records all queries that take longer than the specified long_query_time (e.g., 1 or 2 seconds) to execute. Regularly review and optimize the top offenders logged here.
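A minimal configuration sketch (the variable names are standard MySQL system variables; the log file path is an assumption for a typical Linux install, and changing these globals requires the appropriate admin privilege):

```sql
-- Enable the slow query log at runtime, without a server restart
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1; -- log queries that run longer than 1 second
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';
```

The same settings can be made permanent under the [mysqld] section of my.cnf, and a tool such as mysqldumpslow can then aggregate the log by query pattern to surface the top offenders.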