How to Build a 3-Tier Caching Stack with CDN + Redis + Local Cache (Without Lying to Users)

This article dives deep into how a real-world multi-tier caching strategy actually works when you combine CDN, Redis, and local in-process cache instead of treating them as buzzwords. It explains why each caching layer exists, what it should cach...


I am Tuanh.net. As of 2024, I have accumulated 8 years of experience in backend programming. I am delighted to connect and share my knowledge with everyone.

1. The indirect problem you’re actually solving

Most teams don’t wake up and say, “Let’s add Redis and a CDN today.” What usually happens is subtler: the product grows, traffic spikes at the worst possible time, and suddenly your database becomes the main character in a horror movie. The tricky part is that performance problems rarely come from “slow code” alone. They come from repeated work: the same payload recomputed, the same SQL executed, the same JSON serialized, the same bytes shipped over and over. A caching tier is just a disciplined way to admit you’re doing déjà vu at scale—then get paid back in latency.

1.1 Why a single cache layer isn’t enough anymore

One layer can help, but it’s rarely the best ROI once you have real traffic patterns. CDNs are unbeatable for serving identical bytes globally, but they don’t know your business semantics; Redis is great for shared, fast data access, but it still costs a network hop; local in-process cache is absurdly fast, but it can lie (stale data) and it can explode (memory). The winning architecture is not “pick one,” it’s “make each layer do what it’s uniquely good at,” then define rules so the layers don’t sabotage each other.

1.2 The mental model: “bytes, objects, and truth”

Think of the tiers as serving different “forms” of value. The CDN caches bytes (HTTP responses) close to users using standardized caching directives and validators defined by HTTP caching rules. (rfc-editor.org) Redis caches shared objects (domain data or precomputed fragments) so all app instances benefit. (Redis) Local cache caches hot execution results inside a single JVM so you skip both network hops and serialization overhead. The hard part isn’t adding caches—it’s making sure you always know which tier is allowed to be “wrong” and for how long.

1.3 Where things go wrong in real systems

Multi-tier caching fails in predictable ways. You get stampedes when a popular key expires and 1,000 requests hit your origin at once. You get ghost bugs when local cache keeps serving stale objects after a write because nothing invalidated them. You get “CDN won’t cache” mysteries because your headers accidentally say “don’t cache this anywhere.” These aren’t edge cases; they’re the default outcome if you don’t design explicit policies for TTLs, invalidation, and revalidation.

2. What each tier should cache in a practical product

A useful way to decide is to classify endpoints and data shapes. The examples below assume a content-heavy site such as a movie platform.

2.1 CDN: cache public, versionable HTTP responses

The CDN should cache responses that are the same for many users and can tolerate being slightly stale, such as home page sections, top trending lists, poster metadata, static JSON “landing” payloads, and images. You control CDN behavior with HTTP caching directives like Cache-Control, plus revalidation mechanisms like ETag and conditional requests described by the HTTP caching standard. (rfc-editor.org) If you want to be fancy without being reckless, patterns like stale-while-revalidate let caches serve stale content briefly while refreshing in the background, reducing user-visible latency spikes. (developer.mozilla.org)
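As a rough sketch of how such a policy might be written down per endpoint class, the helper below assembles Cache-Control values. The class name CachePolicies, its method names, and the TTL numbers passed in are illustrative assumptions, not recommendations from any CDN vendor.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds Cache-Control header values for different endpoint classes. */
final class CachePolicies {

    private CachePolicies() {}

    /** Public, shared-cacheable content: short browser TTL, longer CDN TTL, optional stale window. */
    static String publicContent(long browserSeconds, long cdnSeconds, long staleSeconds) {
        List<String> directives = new ArrayList<>();
        directives.add("public");
        directives.add("max-age=" + browserSeconds);   // browser TTL
        directives.add("s-maxage=" + cdnSeconds);      // shared-cache (CDN) TTL
        if (staleSeconds > 0) {
            directives.add("stale-while-revalidate=" + staleSeconds);
        }
        return String.join(", ", directives);
    }

    /** Per-user responses: only the browser may store them, and it must revalidate before reuse. */
    static String personalized() {
        return "private, no-cache";
    }
}
```

For example, CachePolicies.publicContent(10, 60, 30) yields the same header value used in the controller later in this article, while anything that varies per user goes through personalized() so a shared cache never stores it.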

2.2 Redis: cache shared computed objects and “expensive joins”

Redis is your “shared brain.” It’s perfect for expensive results you don’t want to recompute for every request across multiple app instances: aggregated lists, denormalized view models, rate-limit counters, session-ish data, and “this payload took 80ms of DB time but is read 10,000 times.” Common strategies include cache-aside (lazy loading) and write-through depending on how much consistency you need. (docs.aws.amazon.com)

2.3 Local cache: cache ultra-hot, ultra-cheap lookups

Local cache is your “muscle memory.” It shines when a small set of keys are hammered repeatedly and you want sub-millisecond responses inside one JVM. Libraries like Caffeine are popular because they handle eviction and time-based policies efficiently. (Baeldung on Kotlin) But local cache must be treated like a convenience layer, not a source of truth; it needs short TTLs, careful sizing, and safe behavior under load.
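To make “short TTL, careful sizing” concrete, here is a minimal JDK-only sketch of a size-bounded cache with time-based expiry. It is a stand-in for illustration, not how Caffeine works internally; the class name TinyLocalCache and the injectable clock are assumptions made so the behavior is easy to test.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Minimal size-bounded, TTL-based cache. Illustrative only; prefer a real library in production. */
final class TinyLocalCache<K, V> {

    /** A cached value together with the time at which it stops being valid. */
    private record Timed<V>(V value, long expiresAtMillis) {}

    private final long ttlMillis;
    private final LongSupplier clock; // injectable so expiry can be tested without sleeping
    private final Map<K, Timed<V>> entries;

    TinyLocalCache(int maxEntries, long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        // accessOrder=true gives LRU ordering; removeEldestEntry enforces the size bound.
        this.entries = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Timed<V>> eldest) {
                return size() > maxEntries;
            }
        };
    }

    synchronized V getIfPresent(K key) {
        Timed<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() >= entry.expiresAtMillis()) {
            entries.remove(key); // expired: treat as a miss
            return null;
        }
        return entry.value();
    }

    synchronized void put(K key, V value) {
        entries.put(key, new Timed<>(value, clock.getAsLong() + ttlMillis));
    }

    synchronized int size() {
        return entries.size();
    }
}
```

The two limits do different jobs: the size bound protects memory, and the TTL caps how long this instance can disagree with Redis or the database.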

3. The core design: cache hierarchy + revalidation + stampede protection

This is where the architecture becomes “real,” because you’re designing behavior, not just wiring.

3.1 Choose a default read path and stick to it

A practical default is: Local → Redis → DB, and only if the request is eligible for HTTP caching do you also let the CDN sit in front of the origin. In other words, CDN is the outer shield; local+Redis are your internal shields. This layering matters because it determines failure modes: when Redis is slow, local can mask it; when local is cold (new deployment), Redis can mask it; when both are cold, your DB needs to survive the spike.

3.2 Use revalidation at the edge instead of “hard TTL” everywhere

Hard TTL-only caching causes synchronized expirations, which is how you summon a stampede. Edge revalidation (ETag/If-None-Match) lets CDNs refresh safely while preserving correctness guarantees defined by HTTP caching behavior. (rfc-editor.org) Many CDNs also document how they handle revalidation and collapsing revalidation requests so the origin doesn’t get spammed during refresh. (Cloudflare Docs)
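On the origin side, revalidation comes down to comparing the incoming If-None-Match header with the current ETag. The sketch below assumes a hypothetical ETagMatcher helper and makes one simplification: it splits the header on commas, which is fine for hex-digest tags but would mis-parse a tag that itself contains a comma.

```java
/** Hypothetical helper: decides whether an If-None-Match header matches the current ETag. */
final class ETagMatcher {

    private ETagMatcher() {}

    /**
     * Weak comparison, as used for If-None-Match: a leading "W/" is ignored,
     * the header may list several tags, and "*" matches any current representation.
     */
    static boolean matches(String ifNoneMatch, String currentEtag) {
        if (ifNoneMatch == null || currentEtag == null) {
            return false;
        }
        String current = stripWeakPrefix(currentEtag.trim());
        for (String candidate : ifNoneMatch.split(",")) {
            String tag = candidate.trim();
            if (tag.equals("*") || stripWeakPrefix(tag).equals(current)) {
                return true; // caller can answer 304 Not Modified
            }
        }
        return false;
    }

    private static String stripWeakPrefix(String tag) {
        return tag.startsWith("W/") ? tag.substring(2) : tag;
    }
}
```

A plain string equality check works in the happy path, but an intermediary may send a weak tag or a list of tags; when that happens a strict check quietly turns every revalidation into a full 200 response.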

3.3 Stampede protection: don’t let 1 key take down your origin

You want a “single flight” mechanism so only one request performs the expensive load when a key is missing, while others wait briefly for the result. You also want TTL jitter so hot keys don’t all expire at the same second. Even in the Java caching ecosystem, the “refresh storm” problem is well-known and discussed in the context of local caches. (GitHub) In Redis, you can implement locks or request coalescing to prevent dogpiles; the exact method depends on your latency SLOs and how okay you are with serving slightly stale data.
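Here is a minimal in-process sketch of both ideas, assuming a hypothetical SingleFlight class. It only coalesces callers inside one JVM; coordinating across instances (for example with a Redis lock) is a separate problem this sketch does not attempt.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

/** Hypothetical helper: at most one concurrent load per key within this JVM. */
final class SingleFlight<V> {

    private final ConcurrentHashMap<String, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();

    V load(String key, Supplier<V> loader) {
        CompletableFuture<V> mine = new CompletableFuture<>();
        CompletableFuture<V> existing = inFlight.putIfAbsent(key, mine);
        if (existing != null) {
            // Another caller is already loading this key: wait for its result.
            // If that load failed, join() rethrows the failure wrapped in CompletionException.
            return existing.join();
        }
        try {
            V value = loader.get();
            mine.complete(value);
            return value;
        } catch (RuntimeException | Error e) {
            mine.completeExceptionally(e); // waiters observe the same failure instead of hanging
            throw e;
        } finally {
            inFlight.remove(key, mine);    // the next miss starts a fresh load
        }
    }

    /** Base TTL plus a random 0..maxJitterSeconds offset so hot keys do not expire together. */
    static long jitteredTtlSeconds(long baseSeconds, long maxJitterSeconds) {
        return baseSeconds + ThreadLocalRandom.current().nextLong(maxJitterSeconds + 1);
    }
}
```

In a read path you would wrap the expensive branch, something like singleFlight.load(cacheKey, () -> loadAndStore(cacheKey)), where loadAndStore is whatever your own miss handler is called.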

4. A concrete Java example: CDN headers + Redis + Local Caffeine, with safe fallback

Below is a Spring Boot-style example for an endpoint like /api/homepage, returning a JSON payload used by your Next.js frontend. The point is not “copy-paste this and ship,” but to show a complete, coherent policy: local cache is fastest, Redis is shared, DB is last, CDN caches the bytes, and clients get validators so CDNs can revalidate cheaply.

4.1 Java code: a 3-tier cache-aside read with ETag + Cache-Control

import com.fasterxml.jackson.databind.ObjectMapper;
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.http.HttpHeaders;
import org.springframework.http.ResponseEntity;
import org.springframework.util.DigestUtils;
import org.springframework.web.bind.annotation.*;

import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.concurrent.TimeUnit;

@RestController
@RequestMapping("/api")
public class HomepageController {

    // CDN policy:
    // - public: cacheable by shared caches
    // - s-maxage: shared cache TTL (CDN)
    // - max-age: browser TTL
    // - stale-while-revalidate: allow serving slightly stale while refreshing
    private static final String CACHE_CONTROL =
            "public, max-age=10, s-maxage=60, stale-while-revalidate=30";

    private final StringRedisTemplate redis;
    private final ObjectMapper mapper;

    // Local cache: ultra-fast, small TTL, size-bounded
    private final Cache<String, String> localJsonCache = Caffeine.newBuilder()
            .maximumSize(2_000)
            .expireAfterWrite(3, TimeUnit.SECONDS) // intentionally short: convenience layer
            .build();

    public HomepageController(StringRedisTemplate redis, ObjectMapper mapper) {
        this.redis = redis;
        this.mapper = mapper;
    }

    @GetMapping("/homepage")
    public ResponseEntity<String> homepage(
            @RequestHeader(value = "If-None-Match", required = false) String ifNoneMatch
    ) throws Exception {

        final String cacheKey = "hp:v1:guest"; // version the key so you can rotate safely

        // Tier 1: local in-process cache
        String json = localJsonCache.getIfPresent(cacheKey);

        // Tier 2: Redis (shared)
        if (json == null) {
            json = redis.opsForValue().get(cacheKey);
            if (json != null) {
                localJsonCache.put(cacheKey, json);
            }
        }

        // Tier 3: DB / compute (origin truth)
        if (json == null) {
            HomepageDto dto = loadHomepageFromDatabaseAndCompute();
            json = mapper.writeValueAsString(dto);

            // Write to Redis with a TTL (add jitter to avoid synchronized expiry)
            long baseSeconds = 30;
            long jitter = (long) (Math.random() * 10); // 0..9s
            redis.opsForValue().set(cacheKey, json, Duration.ofSeconds(baseSeconds + jitter));

            localJsonCache.put(cacheKey, json);
        }

        // Strong validator for CDN/browser revalidation (the quotes are part of the ETag syntax)
        String etag = "\"" + DigestUtils.md5DigestAsHex(json.getBytes(StandardCharsets.UTF_8)) + "\"";

        // If client/CDN already has the same version, return 304.
        // Simplification: exact match only; real clients may send a W/ prefix or a list of tags.
        // The 304 carries the same Cache-Control as the 200 so the stored response keeps its policy.
        if (etag.equals(ifNoneMatch)) {
            return ResponseEntity.status(304)
                    .eTag(etag)
                    .header(HttpHeaders.CACHE_CONTROL, CACHE_CONTROL)
                    .build();
        }

        // Header is set manually for precision
        return ResponseEntity.ok()
                .eTag(etag)
                .header(HttpHeaders.CACHE_CONTROL, CACHE_CONTROL)
                .body(json);
    }

    private HomepageDto loadHomepageFromDatabaseAndCompute() {
        // Example: query trending + new releases + banners, then assemble a DTO
        // Keep this function pure-ish so caching remains predictable.
        return HomepageDto.mock();
    }

    // Minimal DTO for demonstration
    public record HomepageDto(String title, String[] trending, String[] newReleases) {
        static HomepageDto mock() {
            return new HomepageDto(
                    "Tonight's Picks",
                    new String[]{"Nano Machine (yes, still trending)", "Interstellar", "Oldboy"},
                    new String[]{"Dune", "The Batman", "Spirited Away"}
            );
        }
    }
}

4.2 What the code is really doing (and why it’s designed this way)

The local cache is deliberately short-lived and size-bounded because it’s not meant to guarantee freshness; it’s meant to eliminate redundant work inside one JVM when traffic bursts hit the same key repeatedly. This is the layer that makes p95 latency look good when your app is warm, but it should never be the layer that decides business correctness. That’s why the TTL is only a few seconds and the key is versioned (hp:v1:guest): versioning gives you a clean “kill switch” for cache invalidation when you deploy a new DTO shape, because nothing is worse than caching the wrong JSON schema at scale.

Redis is the shared layer that smooths over multi-instance deployments. When a new pod comes up, local cache is empty; Redis prevents a “cold start storm” from hammering your DB. The TTL uses jitter so keys don’t expire simultaneously, which reduces synchronized stampedes—one of the most common self-inflicted outages in caching systems. The final tier, loadHomepageFromDatabaseAndCompute(), is the only place allowed to be “truth,” so it stays simple and deterministic; this makes cache behavior predictable and debuggable, and it prevents subtle bugs where “compute logic” accidentally depends on time, randomness, or request metadata.

The ETag is a validator so the edge can revalidate cheaply. HTTP caching defines how caches reuse stored responses and how validators work with conditional requests; with ETag, a CDN can ask, “Has this changed?” and your origin can answer with a 304 without shipping the full payload. (rfc-editor.org) The Cache-Control header expresses a layered policy: the browser can keep it briefly (max-age=10), the CDN can keep it longer (s-maxage=60), and stale-while-revalidate=30 gives you a pragmatic “serve fast while refreshing” behavior that reduces user-facing latency spikes during refresh windows. (developer.mozilla.org)
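One way to read that header is as three windows on a timeline. The sketch below classifies a stored response by its age; the Freshness class and the State names are made up for illustration, and real CDNs differ in whether and how they honor stale-while-revalidate.

```java
/** Hypothetical helper: which window a cached response falls into, judged by its age. */
final class Freshness {

    enum State { FRESH, STALE_BUT_SERVABLE, MUST_REVALIDATE }

    private Freshness() {}

    static State classify(long ageSeconds, long sMaxAgeSeconds, long staleWhileRevalidateSeconds) {
        if (ageSeconds < sMaxAgeSeconds) {
            return State.FRESH;               // serve from cache, no origin contact
        }
        if (ageSeconds < sMaxAgeSeconds + staleWhileRevalidateSeconds) {
            return State.STALE_BUT_SERVABLE;  // serve the stale copy, refresh in the background
        }
        return State.MUST_REVALIDATE;         // ask the origin (ideally via If-None-Match) first
    }
}
```

With s-maxage=60 and stale-while-revalidate=30, a response aged 45 seconds is fresh, one aged 75 seconds can still be served while the cache refreshes it, and one aged 95 seconds has to wait for the origin.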

4.3 How this maps to CDN behavior in production

In production, the CDN caches the HTTP response bytes according to your caching directives, and many CDNs document how they revalidate stale content and how they reduce duplicated revalidation requests to the origin. (Cloudflare Docs) Your origin then becomes a “revalidation oracle” for hot endpoints instead of a full payload factory every time. That’s a huge shift in load profile: fewer expensive DB reads, fewer JSON serializations, fewer large responses, and more tiny 304 answers—great for both cost and latency.

5. The “grown-up” parts: invalidation, writes, and consistency

Caching is easy for read-only data. Real systems write.

5.1 Cache-aside vs write-through: pick per data type, not per team preference

Cache-aside (lazy loading) is often the default because it keeps the write path simple: you write to the DB, then either invalidate or let TTL expire. Write-through can be attractive when you want the cache to always reflect DB writes immediately, but it changes the operational shape of your system and can trade latency for consistency. (docs.aws.amazon.com) The trick is to use different strategies for different entities: a “homepage guest payload” can tolerate staleness, but a “user subscription status” probably can’t.
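The difference is easiest to see on the write path. In this minimal sketch, two in-memory maps stand in for the database and for Redis (an assumption made purely so the example runs on its own); the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative only: in-memory maps stand in for the real database and for Redis. */
final class WritePathSketch {

    final Map<String, String> database = new ConcurrentHashMap<>();
    final Map<String, String> cache = new ConcurrentHashMap<>();

    /** Cache-aside read: check the cache, fall back to the database, then populate the cache. */
    String read(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        String fromDb = database.get(key);
        if (fromDb != null) {
            cache.put(key, fromDb);
        }
        return fromDb;
    }

    /** Cache-aside write: update the database, then invalidate so the next read reloads. */
    void writeCacheAside(String key, String value) {
        database.put(key, value);
        cache.remove(key);
    }

    /** Write-through: update the database and the cache as part of the same write. */
    void writeThrough(String key, String value) {
        database.put(key, value);
        cache.put(key, value);
    }
}
```

With cache-aside, the first read after a write pays the miss. With write-through, every write pays for the cache update whether or not anyone reads the value afterwards.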

5.2 Invalidation that doesn’t ruin your weekend

For Redis, a practical approach is versioned keys for schema changes and targeted deletes for updates. For local cache, keep TTLs small so you don’t need complex cross-node invalidation unless you truly require it. For CDN, either rely on TTL + revalidation or purge by key/tag depending on your CDN feature set; don’t build a bespoke “purge everything” button unless you enjoy chaos.
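A small sketch of what “versioned keys plus targeted deletes” can look like in code. The hp:v1:guest shape matches the key used earlier; the movie key format and the rule that a movie update also affects the guest homepage are assumptions for illustration.

```java
import java.util.List;

/** Hypothetical key builder: the schema version is part of every key. */
final class CacheKeys {

    private CacheKeys() {}

    static String homepage(int schemaVersion, String audience) {
        return "hp:v" + schemaVersion + ":" + audience;
    }

    // Assumed key shape for a single movie; not taken from the article's code.
    static String movie(int schemaVersion, long movieId) {
        return "movie:v" + schemaVersion + ":" + movieId;
    }

    /** Keys to delete after a movie is updated: the movie itself and any payload that embeds it. */
    static List<String> affectedByMovieUpdate(int schemaVersion, long movieId) {
        return List.of(movie(schemaVersion, movieId), homepage(schemaVersion, "guest"));
    }
}
```

Bumping the schema version makes every old key unreachable at once and lets TTL clean them up. A data change deletes only the keys returned by affectedByMovieUpdate, so the rest of the cache stays warm.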

5.3 Observability: if you can’t measure hit ratios, you’re just guessing

You want to track local hit rate, Redis hit rate, origin compute time, and 304 ratio at the edge. If your CDN hit rate is low, it’s usually headers or personalization. If Redis hit rate is low, it’s key design, TTLs, or invalidation. If local hit rate is high but memory is spiking, you’re caching too broadly. This is where caching stops being “performance” and becomes “operational engineering.”
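If you have no metrics library wired in yet, even a pair of counters per tier gets you started. This is a minimal sketch with a hypothetical TierStats class; in a real service you would more likely export these through whatever metrics system you already run.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical per-tier counters; create one instance each for local, Redis, and origin. */
final class TierStats {

    private final LongAdder hits = new LongAdder();
    private final LongAdder misses = new LongAdder();

    void recordHit() {
        hits.increment();
    }

    void recordMiss() {
        misses.increment();
    }

    /** Hits divided by total lookups; 0.0 when nothing has been recorded yet. */
    double hitRatio() {
        long h = hits.sum();
        long total = h + misses.sum();
        return total == 0 ? 0.0 : (double) h / total;
    }
}
```

Call recordHit() or recordMiss() at each tier of the read path, then watch the ratios over time rather than as single snapshots, since a deployment resets the local tier to zero.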

https://docs.aws.amazon.com/whitepapers/latest/database-caching-strategies-using-redis/images/image1.png
https://developers.cloudflare.com/cache/concepts/revalidation/
https://www.rfc-editor.org/rfc/rfc9111.html

7. A quick “rules of thumb” conclusion without turning into a textbook

A 3-tier caching stack works when each tier has a clear job: CDN caches public bytes with revalidation, Redis caches shared objects with sane TTLs and jitter, local cache shaves off micro-latency for hot keys without pretending to be truth. The moment you let any cache become “mysterious,” your system becomes a magic show—users see something, you don’t know why, and the rabbit occasionally catches fire.

If you want, comment below with your endpoint types (public pages, logged-in APIs, video metadata, search results) and your current traffic shape (requests/sec + DB type), and I’ll suggest a concrete caching policy matrix (TTL, keys, invalidation) tailored to your use case.
