Skip to main content

Command Palette

Search for a command to run...

How to Optimize Slow Queries the Index-First Way (Without Guessing)

Most slow queries aren’t slow because SQL is “bad.” They’re slow because the database is forced to do expensive work repeatedly: scanning too many rows, sorting too much data, or bouncing between table pages like a delivery guy who forgot the add...

Published
12 min read
How to Optimize Slow Queries the Index-First Way (Without Guessing)
T

I am Tuanh.net. As of 2024, I have accumulated 8 years of experience in backend programming. I am delighted to connect and share my knowledge with everyone.

1. Why “index-first” is a mindset, not a magic button

1.1 What “slow” usually means in real systems

In production, a query is rarely slow in isolation. It becomes slow when data grows, when concurrency rises, when caches become less effective, or when one endpoint gets popular and suddenly your database is doing the same expensive plan thousands of times per minute. Index-first shines here because it scales with growth: you’re changing the access path, not just shaving milliseconds off string concatenation or ORM mapping. If your query is currently doing a full scan or scanning a huge range then sorting, you’re paying a tax proportional to table size, and that tax increases forever.

1.2 The two promises an index can make

An index helps in two core ways: it can reduce how many rows must be considered (selectivity), and it can reduce how much extra work must happen after filtering (ordering and lookups). When your WHERE clause filters strongly, a good index makes the database jump directly to relevant keys rather than read unrelated rows. When your ORDER BY matches the index order, the database can stream results in order without sorting a large intermediate set. When your SELECT list is covered by the index, the engine may avoid visiting the table heap/clustered data for each row, which is often where latency explodes under load.

1.3 The index-first workflow in one sentence

You find the query’s dominant cost in the execution plan, then design the smallest index that turns that dominant cost into a cheap index range scan (or seek), and you verify the plan change with real measurements rather than vibes.

2. Start with evidence: capture the query shape and the “real” parameters

Before creating any index, you want the exact query text and the parameter patterns that happen in production. A query that is fast for “status = ACTIVE” might be slow for “status = ARCHIVED,” and a query that is fast for a narrow date range might collapse when the range spans months. Index-first is not “index everything,” it’s “index the most common and most expensive access pattern.” If you build indexes for imaginary workloads, you’ll get imaginary performance and very real write overhead.

2.1 Identify the “access pattern,” not the endpoint name

The same API endpoint may trigger multiple query shapes, especially with optional filters. Indexing optional filters blindly often creates bloated composite indexes that the optimizer can’t reliably use. What you want is to categorize patterns like “tenant + status + created_at range ordered by created_at desc” or “tenant + user_id, latest N.” When you write down that pattern, you can design an index that matches it precisely, and the database can stop improvising with scans and sorts.

2.2 Read the plan like a detective, not a tourist

Execution plans differ by database, but the meaning is consistent: the plan tells you how rows are found, how they’re joined, how they’re filtered, and where time goes. In an index-first workflow, you’re hunting for one of these classic red flags: a full table scan on a big table, a massive sort for ORDER BY, a join that explodes rows early, or a filter applied late after many rows were already read. A plan that says “scan 10 million rows, then sort, then return 20” is basically the database screaming, “Please give me an index that matches your filter and your order.”

2.3 Confirm selectivity and cardinality with reality

Indexes win when they allow the engine to eliminate most rows early. If your filter is low selectivity, like “is_deleted = false” on 99% of rows, indexing that column alone is usually useless because it doesn’t narrow anything. Instead, you’d combine it with a more selective prefix, like “tenant_id” or “customer_id,” so the engine can narrow within a tenant first. Index-first means you pick leading columns that truly reduce the search space, not just columns that “look important.”

3. Design the index from the query outward (the practical rules that actually work)

Index design becomes straightforward when you stop thinking in “columns” and start thinking in “operations.” A WHERE clause is a narrowing operation. An ORDER BY is an ordering operation. A JOIN is a matching operation. A good index supports the most expensive operation first, while also fitting the query’s shape so the optimizer can use it without gymnastics.

3.1 Rule of thumb: equality first, then range, then order

For many OLTP queries, the most effective composite indexes follow this pattern: put equality predicates first (like tenant_id = ? and status = ?), then range predicates (like created_at >= ? and created_at < ?), and if you frequently ORDER BY created_at, align the index ordering to reduce sorting. This matters because once the engine hits a range condition, it often can’t use later columns to narrow further in the same way; so you want the strongest equalities up front.

3.2 Covering indexes: when “extra columns” are cheaper than extra I/O

A covering index includes all columns needed to satisfy the query, so the database can answer from the index alone. That sounds like “add everything,” but index-first is about being surgical: you include only the handful of columns you actually SELECT for that hot path. If your endpoint lists “id, title, created_at,” then adding “title” to the index can remove table lookups for every row returned. That can be a huge win when you return hundreds of rows per request under high concurrency.

E-E-A-T and real engineering both love this: build an index that supports a family of queries that share the same access path. For example, if several screens filter by tenant_id and status and sort by created_at, one well-designed composite index can speed up all of them. This reduces index sprawl, keeps writes faster, and makes maintenance easier. Index-first is “one index, many wins,” not “many indexes, one win.”

4. A concrete Java example: turning a scan + sort into an index range scan

Let’s use a realistic scenario: you have an “orders” table and a common API endpoint that fetches the latest orders for a tenant, filtered by status, in a time window, sorted by newest first. Without the right index, the database often scans a large portion of a tenant’s rows, filters, sorts, then returns a small page.

4.1 The slow query shape (common in real APIs)

Here’s a simplified query (the exact syntax varies slightly by database, but the idea is the same):

SELECT id, customer_id, total_amount, created_at
FROM orders
WHERE tenant_id = ?
AND status = ?
AND created_at >= ?
AND created_at < ?
ORDER BY created_at DESC
LIMIT ? OFFSET ?;

This query is deceptively expensive when you have millions of rows per tenant, because if the engine can’t use an index that matches the filter and the order, it may read a huge set and sort it. OFFSET makes it even worse because the database may have to walk past many rows before returning your page.

4.2 Java code that triggers it (Spring JDBC, easy to reason about)

Below is a Java method you might actually ship. It uses JdbcTemplate to keep the example focused on the query and the database behavior.

import org.springframework.jdbc.core.JdbcTemplate;
import java.sql.Timestamp;
import java.time.Instant;
import java.util.List;

public class OrderRepository {
private final JdbcTemplate jdbc;

public OrderRepository(JdbcTemplate jdbc) {
this.jdbc = jdbc;
}

public List<orderrow> findLatestOrders(
long tenantId,
String status,
Instant fromInclusive,
Instant toExclusive,
int limit,
int offset
) {
String sql = """
SELECT id, customer_id, total_amount, created_at
FROM orders
WHERE tenant_id = ?
AND status = ?
AND created_at >= ?
AND created_at < ?
ORDER BY created_at DESC
LIMIT ? OFFSET ?
""";

return jdbc.query(
sql,
(rs, rowNum) -> new OrderRow(
rs.getLong("id"),
rs.getLong("customer_id"),
rs.getBigDecimal("total_amount"),
rs.getTimestamp("created_at").toInstant()
),
tenantId,
status,
Timestamp.from(fromInclusive),
Timestamp.from(toExclusive),
limit,
offset
);
}

public record OrderRow(long id, long customerId, java.math.BigDecimal totalAmount, Instant createdAt) {}
}

What makes this example useful is that it mirrors typical production usage: stable equality predicates (tenant_id, status), a range (created_at window), and a sort (created_at desc). That combination is exactly where index-first delivers big wins.

4.3 The index-first fix: build the access path the query is begging for

A strong starting point is a composite index that matches the query’s filter prefix and its ordering:

-- Works conceptually across engines; exact syntax varies.
CREATE INDEX idx_orders_tenant_status_createdat_desc
ON orders (tenant_id, status, created_at DESC);

Why this specific order matters, in plain English: the database can first jump directly to the slice of the index for one tenant, then narrow to one status within that tenant, then traverse the created_at range in descending order so the ORDER BY becomes “free,” and finally stop early because LIMIT allows it to stop as soon as it has enough rows. That “stop early” part is the secret sauce: you’re not just speeding up scanning; you’re avoiding scanning entirely.

4.4 Make it even faster: cover the query to avoid table lookups

If your engine supports included columns (or if you can safely add extra columns to the index), you can often eliminate table reads for list endpoints:

-- Example idea: include columns so the query can be served from the index.
-- Syntax differs: Postgres uses INCLUDE; others may require adding columns to the index.
CREATE INDEX idx_orders_cover_latest
ON orders (tenant_id, status, created_at DESC)
INCLUDE (id, customer_id, total_amount);

Now the database can satisfy SELECT id, customer_id, total_amount, created_at without fetching rows from the base table for each result, which dramatically reduces random I/O. Under concurrency, random I/O is where latency goes to get married and have children.

4.5 Interpreting the improvement: what you should see in the plan

Before the index, you’ll often see something like “Seq Scan/Full Scan” and then a “Sort,” with huge row counts. After the index, you want to see an “Index Range Scan/Index Seek” using your composite index, and the sort step should disappear or become negligible. The best outcome is when the plan also indicates it can stop early due to LIMIT. Even if you don’t memorize plan operators, you’ll recognize the difference: the estimated rows examined collapses, and the expensive sort vanishes.

5. The common traps that make index-first fail (and how to avoid them)

Indexes are powerful, but they’re not a free lunch. If you add indexes like stickers on a laptop, you’ll eventually slow down writes, bloat storage, and confuse the optimizer with too many “almost right” options.

5.1 Indexing low-selectivity columns alone

Columns like is_active, is_deleted, or boolean flags rarely work well as standalone indexes because they don’t narrow enough. The fix is to pair them with a selective prefix like tenant_id or account_id, or to use partial indexes when your database supports them, so the index only covers the subset that matters, like “WHERE is_deleted = false.”

5.2 Building huge composite indexes “just in case”

A composite index with six columns might look impressive, but if the query patterns vary, the optimizer may not use it consistently, and updates become more expensive. Index-first prefers minimal indexes that match a stable access pattern. If you can’t describe the pattern in one short sentence, the index is probably trying to do too much.

5.3 OFFSET pagination: the silent performance killer

Even with a great index, large OFFSET values force the engine to walk past many rows. Index-first often pairs naturally with keyset pagination: instead of OFFSET, you pass the last seen (created_at, id) and ask for the next page “where created_at < ? or (created_at = ? and id < ?).” This transforms deep pagination from “walk a mile to get a coffee” into “take the elevator.”

6. Verification and maintenance: keeping the win after shipping

An index-first optimization is only real if it survives production data growth and workload changes.

6.1 Measure with representative data and concurrency

A query that improves 10× in a dev database might improve 1.2× in production if your dev data is tiny or differently distributed. Validate using production-like cardinalities, and measure not only average latency but also p95/p99 under load, because indexes often shine most when concurrency is high and random I/O becomes the bottleneck.

6.2 Keep indexes healthy and intentional

Over time, old endpoints die, new filters appear, and “temporary” workarounds become permanent architecture. Periodically review which indexes are actually used, drop the dead ones, and keep the useful ones aligned with current query clusters. Index-first is a strategy, not a one-time event.

6.3 Useful visuals to embed in your post (copy-friendly links)

A B-Tree index structure diagram helps readers “see” why range scans and ordering work so well: https://commons.wikimedia.org/wiki/File:B-tree.svgA high-level PostgreSQL EXPLAIN flow visual can help explain plan-reading concepts: https://www.postgresql.org/docs/current/using-explain.htmlA MySQL InnoDB index/page concept overview is great for understanding table lookups vs covering indexes: https://dev.mysql.com/doc/refman/8.0/en/innodb-index-types.html

7. Wrap-up: the index-first checklist you’ll actually reuse

Index-first optimization works because it forces you to be honest: you don’t “optimize SQL,” you optimize how the database finds data. When you start from the plan, choose an index that matches the query’s equality predicates, ranges, and ordering, and you verify with real measurements, you’re no longer guessing—you’re engineering. The best part is that the same thinking scales across databases and across years of data growth, which is exactly what you want in systems that refuse to stay small.

If you want, comment below with your slow query (anonymized) and its access pattern, and tell me your database engine (Postgres/MySQL/Oracle/etc.). What are you stuck on: the execution plan, the right composite index order, or pagination strategy?

Read more at : How to Optimize Slow Queries the Index-First Way (Without Guessing)

More from this blog

T

tuanh.net

540 posts

Are you ready to elevate your Java, OOP, Spring, and DevOps skills? Look no further!