Methods to Prevent Duplicate Cron Execution Across Multiple Spring Boot Instances

Source: Methods to Prevent Duplicate Cron Execution Across Multiple Spring Boot Instances

Modern cloud deployments scale application pods horizontally. A cron defined with Spring's @Scheduled suddenly runs N times — once per pod — unless you design for single execution. This article walks through battle-tested approaches that prevent duplicate cron execution across multiple Spring Boot instances, and explains performance, failure modes, and operational trade-offs you must know before choosing a solution.

1. Why duplicates happen and the properties you must choose

When you annotate a method with @Scheduled in Spring Boot, the scheduling occurs within the local JVM. In a single instance that’s fine; in a multi-instance (pod cluster, multiple VMs) environment, every instance maintains its own scheduler and will invoke the task independently. Solving this requires coordination across instances — a single source of truth for "which instance runs the job now."

1.1 Key properties of coordination solutions

When evaluating a strategy, ask:

Strong vs eventual guarantees: Do you need strict single-execution (exactly-once), or is at-most-once acceptable?
Failover behavior: If the leader dies mid-job, does another instance start immediately, wait, or skip?
Lock lifetime and renewal: Will long-running jobs need lock renewal/heartbeats?
Performance & latency: Does lock acquisition add latency? How many jobs are there, and how often do they run?
Operational complexity: Is an external system (Redis, DB, Quartz server) acceptable to manage?
Idempotency: Can your task be safely retried or executed more than once?

1.2 Two major patterns

Broadly you can choose:

Leader election / singleton execution — elect one instance to run scheduled jobs (e.g., Kubernetes leader election, ZooKeeper, database lock).
Distributed locking per job — every scheduled trigger tries to acquire a short-lived lock before running (Redis, DB advisory locks, etc.).

Each pattern has trade-offs discussed below.

2. Using Redis distributed locks (fast, common)

Redis is a common choice when low latency and simple operations matter. There are two common approaches: use Redisson, or implement the SET NX PX pattern yourself (assuming a single Redis master). Prefer a library (Redisson, or Lettuce-based recipes) that handles lock renewal and edge cases for you.
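Hand-rolled, the SET NX PX recipe comes down to two rules. Acquire with SET key token NX PX ttl, which succeeds only if the key is absent. Release only if the stored value still equals your own token, which in real Redis needs an atomic compare-and-delete (commonly a short Lua script). The sketch below is a hypothetical in-memory stand-in (InMemorySetNxPx, with an injected clock) that models those two rules so they can be run without a Redis server. It is not a Redis client, and the names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/**
 * Hypothetical in-memory model of the Redis SET NX PX lock recipe.
 * Real Redis equivalents:
 *   acquire: SET key token NX PX ttlMillis
 *   release: EVAL "if redis.call('get', KEYS[1]) == ARGV[1] then
 *                    return redis.call('del', KEYS[1]) else return 0 end" 1 key token
 */
final class InMemorySetNxPx {
    private static final class Entry {
        final String token;
        final long expiresAtMillis;

        Entry(String token, long expiresAtMillis) {
            this.token = token;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> data = new HashMap<>();

    /** SET key token NX PX ttl: succeeds only if the key is absent or its TTL has elapsed. */
    synchronized boolean setNxPx(String key, String token, long ttlMillis, long nowMillis) {
        Entry current = data.get(key);
        if (current != null && current.expiresAtMillis > nowMillis) {
            return false; // another instance still holds the lock
        }
        data.put(key, new Entry(token, nowMillis + ttlMillis));
        return true;
    }

    /** Compare-and-delete: only the holder of the matching, unexpired token may release. */
    synchronized boolean releaseIfOwner(String key, String token, long nowMillis) {
        Entry current = data.get(key);
        if (current == null || current.expiresAtMillis <= nowMillis || !current.token.equals(token)) {
            return false; // expired, or now owned by another instance: do not delete
        }
        data.remove(key);
        return true;
    }

    /** Each acquisition attempt uses a fresh random token identifying the holder. */
    static String newToken() {
        return UUID.randomUUID().toString();
    }
}
```

The token check is what stops an instance whose lease already expired from deleting a lock that another instance acquired in the meantime.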

2.1 Example — Redisson lock around @Scheduled

```java
package com.example.scheduled;

import java.util.concurrent.TimeUnit;

import org.redisson.api.RLock;
import org.redisson.api.RedissonClient;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class RedisScheduledJob {

    private final RedissonClient redissonClient;

    public RedisScheduledJob(RedissonClient redissonClient) {
        this.redissonClient = redissonClient;
    }

    @Scheduled(cron = "0 */5 * * * *") // every 5 minutes (sec min hour day month weekday)
    public void run() {
        RLock lock = redissonClient.getLock("my:job:redislock");
        boolean acquired = false;
        try {
            // Wait at most 1 second to acquire; lease (TTL) of 10 minutes = 600 seconds.
            // An explicit lease time disables Redisson's watchdog, so it must exceed the worst-case runtime.
            acquired = lock.tryLock(1, 600, TimeUnit.SECONDS);
            if (!acquired) {
                // another instance is running the job
                return;
            }
            performJob();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            // If the lease already expired, the lock is no longer ours and must not be unlocked.
            if (acquired && lock.isHeldByCurrentThread()) {
                lock.unlock();
            }
        }
    }

    private void performJob() {
        // business logic
    }
}
```

Explanation:

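tryLock(waitTime, leaseTime, unit): waitTime bounds how long the call blocks while trying to acquire the lock; leaseTime is a TTL, so a lock whose holder dies expires on its own.
Lease time selection: Choose a lease longer than the job's worst-case duration. If a run outlasts its lease, the lock expires and another instance can acquire it while the first is still running. Either renew the lock or leave a generous safety margin.
Renewal: Redisson's lock watchdog extends the lock automatically while the holder is alive (lockWatchdogTimeout, 30 seconds by default), but only when you acquire without an explicit lease time, for example tryLock(waitTime, unit). Passing a leaseTime, as in the example, disables the watchdog, so the lease must cover the worst-case runtime.
Failover: Redis replication is asynchronous, so with Sentinel or Cluster a lock written to the master can be lost if the master fails before replicating it, and a second instance can then acquire the same lock on the promoted replica. A highly available topology keeps the lock service up, but it does not make the lock infallible; keep jobs idempotent.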
Performance: Lock acquisition is a single Redis roundtrip usually; scales well for many jobs and instances.
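Lease renewal, which Redisson's watchdog provides, is a small idea: acquire with a short TTL, extend it from a background thread while the job runs, and stop extending when the job finishes or the process dies, so a crashed holder's lock still expires quickly. The sketch below illustrates that idea with a hypothetical in-memory LeaseStore and a ScheduledExecutorService. The class names and the choice to renew at a third of the TTL are assumptions for illustration, not Redisson internals.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical lease store; in production this would be Redis, a DB row, etc. */
final class LeaseStore {
    private final Map<String, String> owners = new HashMap<>();
    private final Map<String, Long> expiries = new HashMap<>();

    synchronized boolean tryAcquire(String key, String owner, long ttlMillis) {
        long now = System.currentTimeMillis();
        Long expiry = expiries.get(key);
        if (expiry != null && expiry > now) {
            return false; // held by someone and not yet expired
        }
        owners.put(key, owner);
        expiries.put(key, now + ttlMillis);
        return true;
    }

    /** Extends the lease only if this owner still holds an unexpired lease. */
    synchronized boolean renew(String key, String owner, long ttlMillis) {
        long now = System.currentTimeMillis();
        Long expiry = expiries.get(key);
        if (expiry == null || expiry <= now || !owner.equals(owners.get(key))) {
            return false;
        }
        expiries.put(key, now + ttlMillis);
        return true;
    }

    synchronized void release(String key, String owner) {
        if (owner.equals(owners.get(key))) {
            owners.remove(key);
            expiries.remove(key);
        }
    }
}

final class WatchdogRunner {
    /** Runs the job while renewing its lease every ttl/3; returns false if the lock was busy. */
    static boolean runWithWatchdog(LeaseStore store, String key, String owner,
                                   long ttlMillis, Runnable job) {
        if (!store.tryAcquire(key, owner, ttlMillis)) {
            return false;
        }
        // One executor per run keeps the sketch short; a real service would share one.
        ScheduledExecutorService watchdog = Executors.newSingleThreadScheduledExecutor();
        long period = Math.max(1, ttlMillis / 3);
        ScheduledFuture<?> renewal = watchdog.scheduleAtFixedRate(
                () -> store.renew(key, owner, ttlMillis), period, period, TimeUnit.MILLISECONDS);
        try {
            job.run();
            return true;
        } finally {
            renewal.cancel(false); // stop extending the lease
            watchdog.shutdownNow();
            store.release(key, owner);
        }
    }
}
```

Renewal only helps while the holder is alive and can reach the store: if renewals stop (a long GC pause, a network loss) the lease still expires, which is why idempotency remains the last line of defense.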

2.2 Edge cases and trade-offs

Clock skew is less relevant for Redis locks because TTL is server-side, but client-side timeouts still matter.
Reentrancy: Use a reentrant lock only if the same thread may legitimately reacquire it; otherwise an accidental reacquisition for the same job can mask design bugs. Note that Redisson's RLock is reentrant.
Lock expiry during long operations: To avoid double execution mid-run, either renew the lock automatically or choose a TTL with a safe margin; auto-renewal adds background heartbeats and complexity.
Redis durability: If Redis restarts without persistence, all locks vanish, and another instance can acquire a lock whose holder is still running the job. Prefer a persistent, highly available Redis setup.

3. Database advisory locks (Postgres example)

Using the application database avoids introducing a new dependency. PostgreSQL supports advisory locks: lightweight, application-defined locks that come in session-level and transaction-level variants. This method is attractive when you already rely on the DB and want it to be the single source of truth.

3.1 Example — pg_try_advisory_lock via JdbcTemplate

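```java
package com.example.scheduled;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import javax.sql.DataSource;

import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class DbLockScheduledJob {

    private static final long LOCK_ID = 42L; // unique per job

    private final DataSource dataSource;

    public DbLockScheduledJob(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    @Scheduled(cron = "0 0/10 * * * *") // every 10 minutes
    public void run() throws SQLException {
        // Session-level advisory locks belong to one connection, so lock and unlock must use
        // the same connection; keep it borrowed from the pool for the whole run.
        try (Connection conn = dataSource.getConnection()) {
            if (!tryLock(conn)) {
                return; // someone else holds the lock
            }
            try {
                performJob();
            } finally {
                try (PreparedStatement ps = conn.prepareStatement("SELECT pg_advisory_unlock(?)")) {
                    ps.setLong(1, LOCK_ID);
                    ps.execute();
                }
            }
        }
    }

    private boolean tryLock(Connection conn) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement("SELECT pg_try_advisory_lock(?)")) {
            ps.setLong(1, LOCK_ID);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() && rs.getBoolean(1);
            }
        }
    }

    private void performJob() {
        // business logic
    }
}
```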

Explanation:

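pg_try_advisory_lock acquires a session-level lock without waiting: it returns true if it got the lock and false immediately if another session holds it. If the session (DB connection) closes, PostgreSQL releases the lock automatically.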
Lock identifier (42L here) should be unique per job; you can hash job names to numeric values.
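Transactional nuances: Session-level advisory locks are independent of transactions (a rollback does not release them); you control their lifetime by unlocking or closing the connection. With a connection pool such as HikariCP, make sure the connection is not returned to the pool while it still holds the lock: pooled connections stay open, so the lock would remain held by whatever code borrows that connection next.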
Connection management: Prefer acquiring a dedicated connection or ensure locking and unlocking occur on the same connection. JdbcTemplate normally handles connections per operation; for session-level locks you may need manual Connection handling via DataSource#getConnection.
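One way to hash job names to numeric keys, sketched below as a hypothetical helper (AdvisoryLockKeys is not part of any library), is to take the first 8 bytes of a SHA-256 digest as the signed 64-bit value that pg_try_advisory_lock(bigint) expects. String.hashCode() is also stable across JVMs, but its 32-bit range makes collisions between job names more likely, and a collision would make two unrelated jobs compete for the same lock.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: derives a stable 64-bit advisory-lock key from a job name. */
final class AdvisoryLockKeys {
    private AdvisoryLockKeys() {}

    static long keyFor(String jobName) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(jobName.getBytes(StandardCharsets.UTF_8));
            // The first 8 digest bytes, read big-endian, become the signed bigint key.
            return ByteBuffer.wrap(digest).getLong();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, DbLockScheduledJob could bind AdvisoryLockKeys.keyFor("nightly-report") instead of the literal 42L, so each job's lock ID follows from its name.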

3.2 Trade-offs

Pros: No extra infrastructure; robust; the lock lives exactly as long as the holding session, and PostgreSQL releases it when a crashed JVM's connection is closed.
Cons: The DB can become a bottleneck with many jobs and frequent locking; all advisory locks share PostgreSQL's lock table, whose size is bounded by max_locks_per_transaction and max_connections; and holding a connection per lock ties up pool connections for the duration of long-running jobs.
Scaling: With many scheduled jobs, consider a lock table (a dedicated table updated in short, row-level transactions) so that no connection is held for the whole job.

4. Leveraging ShedLock (library tailored for scheduled jobs)

ShedLock is a popular library that coordinates schedule execution via a shared store (DB, Mongo, Redis, etc.) and integrates with Spring @Scheduled with minimal code. It focuses on simplicity and supports lock expiration.

4.1 Example — ShedLock with JDBC

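```java
import net.javacrumbs.shedlock.spring.annotation.SchedulerLock;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class ShedLockJob {

    @Scheduled(cron = "0 0/15 * * * *") // every 15 minutes
    @SchedulerLock(name = "ShedLockJob_run", lockAtMostFor = "10m", lockAtLeastFor = "1m")
    public void run() {
        // business logic
    }
}
```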

Explanation:

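@SchedulerLock tells ShedLock to guard the method with a named lock record. lockAtMostFor is the TTL that frees the lock if the holding node dies; lockAtLeastFor keeps the lock held for at least that long even if the task finishes sooner, so a node whose clock runs slightly behind cannot run the same trigger again.
ShedLock supports different stores. The JDBC store uses a table (shedlock by default) holding the lock name, lock_until, locked_at and locked_by; acquisition is an insert or a conditional update that succeeds only if lock_until has passed. Enabling it also requires @EnableSchedulerLock on a configuration class and a LockProvider bean such as JdbcTemplateLockProvider.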
ShedLock does not attempt to create a leader for all jobs; each scheduled job gets its own lock key.
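ShedLock's JDBC table keeps one row per lock name with lock_until, locked_at and locked_by. The hypothetical in-memory model below, with an injected clock, illustrates how the two annotation parameters interact: a node may acquire only once lock_until has passed, acquisition sets lock_until to now + lockAtMostFor, and release moves lock_until to the later of now and locked_at + lockAtLeastFor. It sketches the semantics only; it is not ShedLock's code or API.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of a ShedLock-style lock table (one row per lock name). */
final class LockTableModel {
    private static final class Row {
        Instant lockUntil;
        Instant lockedAt;
        String lockedBy;
    }

    private final Map<String, Row> rows = new HashMap<>();

    /** Mirrors "UPDATE ... WHERE name = ? AND lock_until <= now", inserting the row if missing. */
    synchronized boolean tryLock(String name, String node, Instant now, Duration lockAtMostFor) {
        Row row = rows.get(name);
        if (row != null && row.lockUntil.isAfter(now)) {
            return false; // still locked
        }
        if (row == null) {
            row = new Row();
            rows.put(name, row);
        }
        row.lockUntil = now.plus(lockAtMostFor); // upper bound in case the holder dies
        row.lockedAt = now;
        row.lockedBy = node;
        return true;
    }

    /** On completion, keep the lock until at least lockedAt + lockAtLeastFor. */
    synchronized void unlock(String name, Instant now, Duration lockAtLeastFor) {
        Row row = rows.get(name);
        if (row == null) {
            return;
        }
        Instant atLeastUntil = row.lockedAt.plus(lockAtLeastFor);
        row.lockUntil = atLeastUntil.isAfter(now) ? atLeastUntil : now;
    }

    synchronized String lockedBy(String name) {
        Row row = rows.get(name);
        return row == null ? null : row.lockedBy;
    }
}
```

The release rule is what protects short jobs: a pod that finishes in five seconds still blocks a pod whose clock runs a few seconds behind from firing the same trigger again.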

4.2 Operational considerations

Visibility: The lock table also acts as an audit trail for who ran what and when.
Failover: If a node dies, lockAtMostFor ensures another node can take over after TTL.
Long-running tasks: For jobs longer than TTL, prefer auto-renewal or larger lockAtMostFor. ShedLock can be extended with heartbeats, but then you’re implementing a renewal mechanism yourself.

5. Quartz clustering (heavyweight scheduler)

If your application needs advanced scheduling features (persistence, clustering, job stores, failure recovery), Quartz can run in clustered mode with a shared JDBC job store. In this model, Quartz coordinates triggers so a job is executed by only one scheduler node in the cluster.

5.1 Pros and cons

Pros: Rich scheduling, misfire handling, durable jobs, and clustering out-of-the-box.
Cons: Heavy to configure and maintain; introduces a separate API and lifecycle compared to Spring's @Scheduled; DB contention for job store under heavy load.

6. Kubernetes leader election (platform-native)

In Kubernetes, you can use leader election libraries (client-go leader election or Spring Cloud Kubernetes leader election) to elect one pod as the leader that runs all scheduled tasks. This integrates well with pod lifecycle and avoids adding an external lock store.

6.1 Behavior and trade-offs

Good for cloud-native setups where you want an instance-level singleton.
Not ideal for multi-cluster or non-Kubernetes environments.
Failover: leader lease durations and renew deadlines must be tuned to avoid frequent flips during transient node pressure.
Operational: Requires RBAC permissions to get and update the lock resource: Lease objects (coordination.k8s.io) in current client-go, or ConfigMaps or Endpoints in older implementations.
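Lease-based leader election, as implemented by client-go (which stores the lease in a coordination.k8s.io Lease object with fields such as holderIdentity, renewTime and leaseDurationSeconds), follows a simple protocol: the leader keeps renewing the record, and another candidate may take over only after a full lease duration passes without a renewal. The sketch below is a hypothetical in-memory model of that protocol with an injected clock; it does not call the Kubernetes API, and the names are illustrative.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical in-memory model of a lease-based leader election record. */
final class LeaseElection {
    private final Duration leaseDuration;
    private String holderIdentity;
    private Instant renewTime;

    LeaseElection(Duration leaseDuration) {
        this.leaseDuration = leaseDuration;
    }

    /** Returns true if the candidate is now the leader; the leader calls this periodically to renew. */
    synchronized boolean tryAcquireOrRenew(String candidate, Instant now) {
        boolean vacant = holderIdentity == null;
        boolean expired = !vacant && !now.isBefore(renewTime.plus(leaseDuration));
        if (vacant || expired || candidate.equals(holderIdentity)) {
            holderIdentity = candidate;
            renewTime = now;
            return true;
        }
        return false; // someone else holds a live lease
    }

    synchronized String leader() {
        return holderIdentity;
    }
}
```

The window between the leader's last renewal and the takeover is the failover delay. A shorter lease fails over faster but flips leadership more often under transient pauses, and a paused old leader may briefly still believe it leads, so the jobs it runs should stay idempotent.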

7. Idempotency and defense in depth

Even with locks, network partitions, bugs, or mis-configuration can cause duplicate executions. Building idempotency into your tasks is essential. Idempotency techniques:

Use unique job-run IDs and store last processed sequence IDs in a durable store.
Make operations idempotent at DB level (INSERT ... ON CONFLICT DO NOTHING) or via deduplication keys in downstream systems.
Implement semantically idempotent steps: check-before-write, compare-and-swap.
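A job-run ID only deduplicates if every instance computes the same ID for the same trigger; a random UUID per run would not. One option, sketched below as an assumption rather than a prescribed scheme, is to derive the ID from the job name and the schedule slot (the fire time truncated to the job's fixed period), so two pods firing a few seconds apart produce the same key for an ON CONFLICT insert. This assumes a fixed period aligned to UTC epoch boundaries; a clock skewed across a slot boundary can still split one trigger into two IDs, so it complements the lock rather than replacing it.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper: derives a deterministic job-run ID from the schedule slot. */
final class JobRunIds {
    private JobRunIds() {}

    /**
     * Truncates the fire time down to the start of its period; for a job that runs every
     * 5 minutes, 12:05:03 and 12:05:41 both map to the 12:05:00 slot.
     */
    static String forSlot(String jobName, Instant fireTime, Duration period) {
        long periodMillis = period.toMillis();
        long slotStart = Math.floorDiv(fireTime.toEpochMilli(), periodMillis) * periodMillis;
        return jobName + "@" + Instant.ofEpochMilli(slotStart);
    }
}
```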

7.1 Example — idempotent write

```java
// Pseudo-JDBC idempotent write
int rows = jdbc.update(
    "INSERT INTO job_results(job_run_id, processed_at) VALUES (?, now()) "
        + "ON CONFLICT (job_run_id) DO NOTHING",
    jobRunId);
if (rows == 0) {
    // a duplicate run was attempted; safely skip side-effectful operations
    return;
}
// safe to perform the rest
```

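Explanation: Persisting a unique job_run_id makes a repeated run skip its side effects. This is weaker than preventing duplicates entirely, but it is robust against the rare cases that slip past the lock. For the guard to hold, every instance must compute the same job_run_id for the same trigger, and the marker insert should share a transaction with the job's database writes, so that a crash mid-run rolls the marker back too.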

8. Performance considerations and scaling

Performance impact depends on approach:

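Redis locks: Minimal latency per lock (a single roundtrip). Good for many short schedules. Watch out for memory eviction: under any maxmemory-policy other than noeviction, Redis can evict lock keys (they carry a TTL), silently releasing locks.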
DB advisory locks: Lightweight but can consume DB connections if locks are session-bound for long-running jobs. For many concurrent different jobs, scale DB or use a lock table with short transactions instead.
ShedLock: One conditional update (or insert) per node per trigger, plus one update on release; acceptable for low-frequency jobs, but consider batching or other optimizations at high frequency.
Quartz: Designed for many complex jobs; job-store contention grows with trigger frequency.

8.1 Throughput examples

If you have thousands of jobs scheduled every minute:

Prefer Redis or dedicated distributed lock systems with high throughput.
Avoid per-job heavyweight DB transactions holding connections during job runtime.
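Consider externalizing scheduling to a dedicated scheduler service that distributes jobs onto workers through a work queue with competing consumers, so each job run is delivered to one worker (broadcast pub/sub would hand every run to every subscriber).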
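The dedicated-scheduler pattern relies on queue semantics where each message is removed by one consumer (competing consumers); a broadcast pub/sub topic would deliver every job run to every subscriber and recreate the duplicate problem. The sketch below models this in-process with a hypothetical BlockingQueue setup: one scheduler publishes job-run IDs and several worker threads take from the same queue. A production version would use a broker queue with several consumers, such as a RabbitMQ queue or a Kafka topic read by a single consumer group.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class CompetingConsumersDemo {
    private static final String POISON_PILL = "__stop__";

    /** Publishes the job-run IDs, drains them with `workers` threads, returns the processed IDs. */
    static List<String> run(List<String> jobRunIds, int workers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> processed = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        String runId = queue.take(); // each message is removed by exactly one worker
                        if (runId.equals(POISON_PILL)) {
                            return;
                        }
                        processed.add(runId); // the job body would run here
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "worker-" + i);
            threads.add(t);
            t.start();
        }
        // The single scheduler: the only component that evaluates the schedule.
        queue.addAll(jobRunIds);
        for (int i = 0; i < workers; i++) {
            queue.add(POISON_PILL); // one stop signal per worker, queued after all jobs
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return processed;
    }
}
```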

9. Failure modes, monitoring, and testing

Plan for:

Stuck locks: Have metrics for lock acquisition failures and age of locks. Implement alerting for lock TTL expiration anomalies.
Job duration anomalies: Track actual run times and compare to lock TTLs to detect misconfiguration.
Chaos testing: Simulate instance crashes and network partitions to validate that duplicate executions do not cause unacceptable side effects.
Observability: Log lock acquisition attempts, successes, failures, and lock owner identity. Correlate with job execution logs.
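Comparing measured run times against the lock TTL can be as simple as the hypothetical check below: record each run's duration and flag it once it uses more than a chosen share of the TTL, well before the lock could expire mid-run. The 80% threshold in the usage is an assumption to tune, and in a real service the result would feed a metric or alert (for example through Micrometer) rather than a return value.

```java
import java.time.Duration;

/** Hypothetical guard: flags runs whose duration approaches or exceeds the lock TTL. */
final class LockTtlGuard {
    private final Duration lockTtl;
    private final double warnRatio;

    LockTtlGuard(Duration lockTtl, double warnRatio) {
        if (warnRatio <= 0 || warnRatio > 1) {
            throw new IllegalArgumentException("warnRatio must be in (0, 1]");
        }
        this.lockTtl = lockTtl;
        this.warnRatio = warnRatio;
    }

    /** True if the run consumed more than warnRatio of the TTL: a misconfiguration signal. */
    boolean isNearExpiry(Duration runTime) {
        return runTime.toMillis() > lockTtl.toMillis() * warnRatio;
    }

    /** True if the run outlived the TTL, so another instance could have started concurrently. */
    boolean exceededTtl(Duration runTime) {
        return runTime.compareTo(lockTtl) > 0;
    }
}
```

To feed it, measure each run with long start = System.nanoTime() before the job and Duration.ofNanos(System.nanoTime() - start) after it.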

9.1 Testing approaches

Unit-test locking logic by mocking lock clients. For integration tests:

Bring up two instances in a test harness (Docker Compose) using the shared lock store and assert only one performs work.
Inject latency and simulate store failures to validate fallback behavior.
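The core assertion of that integration test, that only one of several competing instances does the work for a given trigger, can be prototyped in-process first. The sketch below starts several threads standing in for instances, releases them together with a latch so they race for the same trigger, and counts how many ran the job. The lock here is a hypothetical stand-in (ConcurrentHashMap.putIfAbsent, never released, as if held for the whole trigger window); in the Docker Compose version you would call the real lock store from separate JVMs instead.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class SingleExecutionHarness {
    /** Races `instances` threads for the same trigger; returns how many executed the job. */
    static int race(int instances, String triggerKey) {
        ConcurrentMap<String, String> lockStore = new ConcurrentHashMap<>(); // stand-in lock store
        AtomicInteger executions = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(instances);
        for (int i = 0; i < instances; i++) {
            String instanceId = "instance-" + i;
            pool.submit(() -> {
                try {
                    start.await(); // all "instances" fire the trigger at the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                // putIfAbsent is atomic: exactly one caller sees null and wins the lock.
                if (lockStore.putIfAbsent(triggerKey, instanceId) == null) {
                    executions.incrementAndGet(); // the job body would run here
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("instances did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return executions.get();
    }
}
```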

10. Practical decision guide

Small app, few jobs, existing DB: Use DB advisory locks or ShedLock with JDBC. Keep designs simple and monitor DB connections.
Many jobs, high frequency: Use Redis-based locks (Redisson) or an external scheduler system that hands out work to workers via queues.
Complex scheduling requirements: Use Quartz clustering.
Kubernetes-native: Consider leader election if you want platform-managed singleton behavior.
Always: Build idempotency and observability as second-line defenses.

10.1 Minimal checklist before deploying

Before rolling a strategy to production, verify:

Lock TTLs are >= expected max runtime (or renewal exists).
Lock store HA and persistence are configured (Redis cluster, DB replicas).
Connection/handle management avoids leakage (especially with DB session locks).
Monitoring and alerts for lock acquisition failures and job duration anomalies are in place.

11. Final recommendations

Most teams benefit from starting with ShedLock or a Redis lock because they integrate quickly with Spring and scale well. If you already rely on PostgreSQL and job volume is modest, advisory locks are simple and robust. For enterprise-grade scheduling with many jobs and complex triggers, Quartz is appropriate despite its complexity. Regardless of choice, assume occasional duplicate runs will happen and harden your job logic with idempotency and observability.

If you have specific constraints (cloud provider, job frequency, expected runtime, and failure SLAs), comment with those details and I can recommend a tailored pattern and configuration.
