Methods to Prevent Duplicate Cron Execution Across Multiple Spring Boot Instances
Practical strategies, code patterns, and operational trade-offs to ensure scheduled jobs run exactly once in clustered Spring Boot deployments — with Redis, database locks, ShedLock, Quartz clustering, and Kubernetes leader patterns explained in ...

I am Tuanh.net. As of 2024, I have accumulated 8 years of experience in backend programming. I am delighted to connect and share my knowledge with everyone.
1. Why duplicates happen and the properties you must choose
1.1 Key properties of coordination solutions
- Strong vs eventual guarantees: Do you need strict single execution (exactly-once), or is at-most-once (a run may occasionally be skipped) or at-least-once (a run may occasionally repeat) acceptable?
- Failover behavior: If the leader dies mid-job, does another instance start immediately, wait, or skip?
- Lock lifetime and renewal: Will long-running jobs need lock renewal/heartbeats?
- Performance & latency: Does lock acquisition add latency? How many jobs and frequency?
- Operational complexity: Is an external system (Redis, DB, Quartz server) acceptable to manage?
- Idempotency: Can your task be safely retried or executed more than once?
1.2 Two major patterns
- Leader election / singleton execution — elect one instance to run scheduled jobs (e.g., Kubernetes leader election, ZooKeeper, database lock).
- Distributed locking per job — every scheduled trigger tries to acquire a short-lived lock before running (Redis, DB advisory locks, etc.).
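The per-job locking pattern can be sketched without any infrastructure. What follows is a minimal in-memory model, not a production lock: `InMemoryLockStore` and `LockedJobRunner` are hypothetical names, and the map stands in for a shared store such as Redis or a database table. Time is passed in explicitly so the expiry behaviour is easy to follow.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a shared lock store (Redis, a DB table, ...). */
class InMemoryLockStore {
    private static final class Lease {
        final String owner;
        final Instant expiresAt;

        Lease(String owner, Instant expiresAt) {
            this.owner = owner;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Lease> leases = new ConcurrentHashMap<>();

    /** Atomically takes the lock if it is free or its TTL has passed. */
    boolean tryAcquire(String jobName, String owner, Instant now, Duration ttl) {
        Lease candidate = new Lease(owner, now.plus(ttl));
        Lease winner = leases.compute(jobName, (name, current) ->
                (current == null || !current.expiresAt.isAfter(now)) ? candidate : current);
        return winner == candidate;
    }

    /** Releases the lock only if the caller still owns it. */
    void release(String jobName, String owner) {
        leases.computeIfPresent(jobName, (name, current) ->
                current.owner.equals(owner) ? null : current);
    }
}

/** Every trigger on every instance goes through this gate before doing work. */
class LockedJobRunner {
    private final InMemoryLockStore store;
    private final String instanceId;

    LockedJobRunner(InMemoryLockStore store, String instanceId) {
        this.store = store;
        this.instanceId = instanceId;
    }

    /** Returns true if this call ran the job, false if another instance held the lock. */
    boolean runIfAcquired(String jobName, Instant now, Duration ttl, Runnable job) {
        if (!store.tryAcquire(jobName, instanceId, now, ttl)) {
            return false;
        }
        try {
            job.run();
            return true;
        } finally {
            store.release(jobName, instanceId);
        }
    }
}
```

Two runners sharing one store behave like two application instances: while the first is inside `job.run()`, the second gets `false` for the same job name. If the first never releases (a crash), the second succeeds once the TTL has passed.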
2. Using Redis distributed locks (fast, common)
2.1 Example — Redisson lock around @Scheduled
    package com.example.scheduled;

    import java.util.concurrent.TimeUnit;

    import org.redisson.api.RLock;
    import org.redisson.api.RedissonClient;
    import org.springframework.scheduling.annotation.Scheduled;
    import org.springframework.stereotype.Component;

    @Component
    public class RedisScheduledJob {

        private final RedissonClient redissonClient;

        public RedisScheduledJob(RedissonClient redissonClient) {
            this.redissonClient = redissonClient;
        }

        @Scheduled(cron = "0 */5 * * * *") // every 5 minutes
        public void run() {
            RLock lock = redissonClient.getLock("my:job:redislock");
            boolean acquired = false;
            try {
                // Wait up to 1 second for the lock; hold it for at most 10 minutes (600 s).
                // Both durations are expressed in the same TimeUnit.
                acquired = lock.tryLock(1, 600, TimeUnit.SECONDS);
                if (!acquired) {
                    return; // another instance is running the job
                }
                performJob();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                // Only unlock a lock this thread actually holds.
                if (acquired && lock.isHeldByCurrentThread()) {
                    lock.unlock();
                }
            }
        }

        private void performJob() {
            // business logic
        }
    }
- tryLock(waitTime, leaseTime, unit): The first argument bounds how long the caller waits to acquire the lock, so a losing instance gives up quickly instead of blocking. The second sets a TTL (lease time) so that a stale lock expires on its own if the holder dies. Both values are interpreted in the given TimeUnit.
- Lease time selection: Choose a lease longer than the job's expected maximum duration. If the job outlives the lease, the lock expires and another instance can acquire it while the first is still running. Implement lock renewal or leave a safe margin.
- Renewal: Redisson's watchdog renews a lock automatically only when the lock is acquired without an explicit lease time. When you pass a leaseTime, as in the example above, nothing renews it, so the lease must cover the worst-case runtime.
- Network partitions: If Redis becomes partitioned, clients might observe stale state; using Redis Sentinel or a cluster helps. Beware split-brain — always rely on a robust Redis topology.
- Performance: Lock acquisition is a single Redis roundtrip usually; scales well for many jobs and instances.
2.2 Edge cases and trade-offs
- Clock skew is less relevant for Redis locks because TTL is server-side, but client-side timeouts still matter.
- Reentrancy: Redisson's RLock is reentrant, so the same thread can acquire it again without blocking. Rely on that only if the job genuinely needs it; otherwise an accidental reacquire within the same job can mask a design bug.
- Lock expiry during long operations: To avoid a second execution starting mid-run, either renew the lock automatically or set the TTL with a safe margin over the worst-case runtime. Auto-renewal adds background heartbeats and complexity.
- Redis durability: If Redis goes down and restarts without persistence, locks vanish and duplicates may occur until clients reconcile. Prefer persistent Redis or high-availability setup.
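The lease-versus-runtime trade-off is easier to see on a timeline. The sketch below is a simulation only: `RenewableLease` and `LeaseTimeline` are hypothetical names, the lease lives in memory rather than in Redis, and a second instance is assumed to retry once per simulated minute.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical single-lock model; a real lock would live in Redis with a server-side TTL. */
class RenewableLease {
    private String owner;       // null while the lock is free
    private Instant expiresAt;

    synchronized boolean tryAcquire(String candidate, Instant now, Duration ttl) {
        if (owner != null && expiresAt.isAfter(now)) {
            return false;                       // held and not yet expired
        }
        owner = candidate;
        expiresAt = now.plus(ttl);
        return true;
    }

    /** Heartbeat: pushes the expiry forward, but only for the current, unexpired owner. */
    synchronized boolean renew(String candidate, Instant now, Duration ttl) {
        if (!candidate.equals(owner) || !expiresAt.isAfter(now)) {
            return false;                       // lock was lost; the job should stop
        }
        expiresAt = now.plus(ttl);
        return true;
    }
}

class LeaseTimeline {
    /**
     * Simulates a job on instance A that runs for `runtime` and, when `heartbeat` is
     * non-null, renews its lease at that interval. Returns true if instance B acquires
     * the lock while A is still running, which means a duplicate execution.
     */
    static boolean duplicateRunOccurs(Duration runtime, Duration ttl, Duration heartbeat) {
        RenewableLease lease = new RenewableLease();
        Instant start = Instant.EPOCH;
        lease.tryAcquire("A", start, ttl);

        Duration step = Duration.ofMinutes(1);
        Duration sinceRenewal = Duration.ZERO;
        for (Duration elapsed = step; elapsed.compareTo(runtime) < 0; elapsed = elapsed.plus(step)) {
            Instant now = start.plus(elapsed);
            sinceRenewal = sinceRenewal.plus(step);
            if (heartbeat != null && sinceRenewal.compareTo(heartbeat) >= 0) {
                lease.renew("A", now, ttl);
                sinceRenewal = Duration.ZERO;
            }
            if (lease.tryAcquire("B", now, ttl)) {
                return true;                    // B's trigger got the lock mid-run
            }
        }
        return false;
    }
}
```

With a 25-minute job and a 10-minute lease, `duplicateRunOccurs` returns true when `heartbeat` is null and false with a 3-minute heartbeat. A 5-minute job under the same lease needs no renewal at all.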
3. Database advisory locks (Postgres example)
3.1 Example — pg_try_advisory_lock via JdbcTemplate
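    package com.example.scheduled;

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    import org.springframework.jdbc.core.ConnectionCallback;
    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.scheduling.annotation.Scheduled;
    import org.springframework.stereotype.Component;

    @Component
    public class DbLockScheduledJob {

        private static final long LOCK_ID = 42L;

        private final JdbcTemplate jdbc;

        public DbLockScheduledJob(JdbcTemplate jdbc) {
            this.jdbc = jdbc;
        }

        @Scheduled(cron = "0 0/10 * * * *") // every 10 minutes
        public void run() {
            // Session-level advisory locks belong to one connection, so lock, work,
            // and unlock all happen inside a single ConnectionCallback.
            jdbc.execute((ConnectionCallback<Void>) con -> {
                if (!callLockFunction(con, "SELECT pg_try_advisory_lock(?)")) {
                    return null; // someone else holds the lock
                }
                try {
                    performJob();
                } finally {
                    callLockFunction(con, "SELECT pg_advisory_unlock(?)");
                }
                return null;
            });
        }

        // Runs a boolean-returning advisory lock function on the given connection.
        private boolean callLockFunction(Connection con, String sql) throws SQLException {
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setLong(1, LOCK_ID);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() && rs.getBoolean(1);
                }
            }
        }

        private void performJob() {
            // business logic
        }
    }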
- pg_try_advisory_lock is a session-level lock: if the session (DB connection) closes, PostgreSQL releases the lock automatically.
- Lock identifier (42L here) should be unique per job; you can hash job names to numeric values.
- Transactional nuances: Session-level advisory locks are independent of transactions: a commit or rollback does not release them, so you control the lifecycle by unlocking explicitly or closing the connection. With a connection pool (HikariCP), make sure the connection is not returned to the pool while it still holds a lock.
- Connection management: Lock and unlock must run on the same connection. JdbcTemplate normally borrows a connection per operation, so separate JdbcTemplate calls may land on different sessions. For session-level locks, keep one connection for the whole lock, work, unlock sequence, for example through a ConnectionCallback or manual handling via DataSource#getConnection.
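Hashing job names to numeric lock identifiers, as suggested above, could look like the following sketch. `AdvisoryLockKeys` is a hypothetical helper, and taking the first 8 bytes of a SHA-256 digest is one possible choice rather than anything PostgreSQL prescribes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class AdvisoryLockKeys {
    /** Derives a stable 64-bit key from a job name: the first 8 bytes of its SHA-256 digest. */
    static long keyFor(String jobName) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(jobName.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest).getLong(); // big-endian read of bytes 0..7
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform
            throw new IllegalStateException(e);
        }
    }
}
```

A 64-bit digest prefix collides far less often than the 32-bit `String.hashCode()`. The advisory lock key space is shared by everything that uses the same database, so including an application prefix in the job name (for example `billing:nightly-report`) is a cheap extra safeguard.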
3.2 Trade-offs
- Pros: No extra infra; robust; the lock is held for as long as the owning session stays open and is released automatically when the connection closes, which makes it safe against JVM crashes.
- Cons: DB becomes a bottleneck if many jobs lock frequently; advisory locks live in the server's shared lock memory (sized by max_locks_per_transaction and max_connections), so they are not unlimited; using a connection per lock can tie up DB connections for long-running jobs.
- Scaling: For many scheduled jobs, size the connection pool accordingly or use a lock table (a dedicated table with transactional row-based locking) to avoid connection exhaustion.
4. Leveraging ShedLock (library tailored for scheduled jobs)
4.1 Example — ShedLock with JDBC
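    import net.javacrumbs.shedlock.spring.annotation.SchedulerLock;
    import org.springframework.scheduling.annotation.Scheduled;
    import org.springframework.stereotype.Component;

    @Component
    public class ShedLockJob {

        @Scheduled(cron = "0 0/15 * * * *") // every 15 minutes
        @SchedulerLock(name = "ShedLockJob_run", lockAtMostFor = "10m", lockAtLeastFor = "1m")
        public void run() {
            // business logic
        }
    }
- @SchedulerLock instructs ShedLock to store a lock record. lockAtMostFor is a TTL that frees the lock if the node dies before releasing it; lockAtLeastFor is the minimum time the lock is held even when the job finishes sooner, which stops a very short job from running again on a node whose clock or trigger lags slightly.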
- ShedLock supports different stores. The JDBC store uses a table that holds lock owner and timestamps; acquisition is done via a transactional upsert/compare-and-set.
- ShedLock does not attempt to create a leader for all jobs; each scheduled job gets its own lock key.
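The effect of the two durations is simple timestamp arithmetic. The class below is a hypothetical model of that arithmetic, written from the documented behaviour of lockAtMostFor and lockAtLeastFor; it is not ShedLock's own code, and `LockWindow` is an invented name.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical model of the timestamps kept in a ShedLock-style lock row. */
class LockWindow {
    final Instant lockedAt;
    final Duration lockAtMostFor;
    final Duration lockAtLeastFor;

    LockWindow(Instant lockedAt, Duration lockAtMostFor, Duration lockAtLeastFor) {
        this.lockedAt = lockedAt;
        this.lockAtMostFor = lockAtMostFor;
        this.lockAtLeastFor = lockAtLeastFor;
    }

    /** "Lock until" written when the lock is taken: the crash-safety TTL. */
    Instant lockUntilOnAcquire() {
        return lockedAt.plus(lockAtMostFor);
    }

    /** "Lock until" written on normal completion: never earlier than lockedAt + lockAtLeastFor. */
    Instant lockUntilOnRelease(Instant finishedAt) {
        Instant earliestRelease = lockedAt.plus(lockAtLeastFor);
        return finishedAt.isBefore(earliestRelease) ? earliestRelease : finishedAt;
    }

    /** Another node may take the lock once "lock until" is no longer in the future. */
    static boolean canAcquire(Instant lockUntil, Instant now) {
        return !lockUntil.isAfter(now);
    }
}
```

With the values from the example (10m and 1m), a job that finishes after 5 seconds still blocks other nodes until the one-minute mark, a job that finishes after 3 minutes releases immediately, and a job whose node crashes blocks others for the full 10 minutes.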
4.2 Operational considerations
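- Visibility: The lock table shows, for each lock name, which node last took the lock and when. It holds one row per lock rather than a history, so keep a separate log if you need a full audit trail.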
- Failover: If a node dies, lockAtMostFor ensures another node can take over after TTL.
- Long-running tasks: For jobs longer than TTL, prefer auto-renewal or larger lockAtMostFor. ShedLock can be extended with heartbeats, but then you’re implementing a renewal mechanism yourself.
5. Quartz clustering (heavyweight scheduler)
5.1 Pros and cons
- Pros: Rich scheduling, misfire handling, durable jobs, and clustering out-of-the-box.
- Cons: Heavy to configure and maintain; introduces a separate API and lifecycle compared to Spring's @Scheduled; DB contention for job store under heavy load.
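For orientation, the clustering switch itself is a handful of Quartz properties. The sketch below builds them in plain Java; `QuartzClusterSettings` is a hypothetical helper, the PostgreSQL delegate and the 20-second check-in interval are assumptions, and the data source definition that a JDBC job store also needs is left out.

```java
import java.util.Properties;

class QuartzClusterSettings {
    /** Builds the Quartz properties that switch on JDBC-backed clustering. */
    static Properties clustered(String schedulerName) {
        Properties p = new Properties();
        // All nodes of one cluster share the same scheduler name...
        p.setProperty("org.quartz.scheduler.instanceName", schedulerName);
        // ...but each node needs its own id; AUTO lets Quartz generate one.
        p.setProperty("org.quartz.scheduler.instanceId", "AUTO");
        // Clustering requires a JDBC job store shared by every node.
        p.setProperty("org.quartz.jobStore.class",
                "org.quartz.impl.jdbcjobstore.JobStoreTX");
        // Assumption: PostgreSQL; other databases use a different delegate.
        p.setProperty("org.quartz.jobStore.driverDelegateClass",
                "org.quartz.impl.jdbcjobstore.PostgreSQLDelegate");
        p.setProperty("org.quartz.jobStore.isClustered", "true");
        // How often (in ms) a node checks in; affects how fast a dead node is noticed.
        p.setProperty("org.quartz.jobStore.clusterCheckinInterval", "20000");
        // Data source settings (org.quartz.jobStore.dataSource and friends) omitted here.
        return p;
    }
}
```

In plain Quartz these properties would be handed to a `StdSchedulerFactory`. With Spring Boot's Quartz starter the job store and data source are normally wired by the starter, so typically only the clustering keys are supplied, under the `spring.quartz.properties.` prefix.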
6. Kubernetes leader election (platform-native)
6.1 Behavior and trade-offs
- Good for cloud-native setups where you want an instance-level singleton.
- Not ideal for multi-cluster or non-Kubernetes environments.
- Failover: leader lease durations and renew deadlines must be tuned to avoid frequent flips during transient node pressure.
- Operational: Requires RBAC permissions to read and update the lock resource (Lease objects, or ConfigMaps or Endpoints in older setups, depending on the lock mechanism used).
7. Idempotency and defense in depth
- Use unique job-run IDs and store last processed sequence IDs in a durable store.
- Make operations idempotent at DB level (INSERT ... ON CONFLICT DO NOTHING) or via deduplication keys in downstream systems.
- Implement semantically idempotent steps: check-before-write, compare-and-swap.
7.1 Example — idempotent write
    // Pseudo-JDBC idempotent write
    int rows = jdbc.update(
            "INSERT INTO job_results(job_run_id, processed_at) VALUES (?, now()) ON CONFLICT (job_run_id) DO NOTHING",
            jobRunId);
    if (rows == 0) {
        // a duplicate run was attempted; safely skip side-effectful operations
        return;
    }
    // safe to perform the rest
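The write above only deduplicates if every instance computes the same jobRunId for the same scheduled slot. One way to get that, sketched here under the assumption of a fixed interval that aligns with the epoch (5, 10, 15 minutes and so on), is to round the trigger time down to the interval. `JobRunIds` and `ProcessedRuns` are hypothetical names, and the in-memory set merely stands in for the unique constraint on job_run_id.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class JobRunIds {
    /**
     * Builds the same ID on every instance for the same scheduled slot by rounding
     * the trigger time down to the schedule interval.
     */
    static String forSlot(String jobName, Instant firedAt, Duration interval) {
        long intervalMs = interval.toMillis();
        long slotStartMs = Math.floorDiv(firedAt.toEpochMilli(), intervalMs) * intervalMs;
        return jobName + ":" + Instant.ofEpochMilli(slotStartMs);
    }
}

/** In-memory stand-in for a table with a unique constraint on job_run_id. */
class ProcessedRuns {
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    /** Mirrors INSERT ... ON CONFLICT DO NOTHING: true only for the first caller. */
    boolean markIfFirst(String jobRunId) {
        return seen.add(jobRunId);
    }
}
```

Two instances firing at 00:05:00.120 and 00:05:02.900 for a 5-minute job both derive `report:2024-01-01T00:05:00Z`, so only the first `markIfFirst` call returns true and the second instance skips its side effects.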
8. Performance considerations and scaling
- Redis locks: Minimal latency per lock (single roundtrip). Good for many short schedules. Watch out for lock metadata growth and eviction.
- DB advisory locks: Lightweight but can consume DB connections if locks are session-bound for long-running jobs. For many concurrent different jobs, scale DB or use a lock table with short transactions instead.
- ShedLock: One DB write per scheduler invocation; acceptable for low-frequency jobs but consider batching/optimizations at high frequency.
- Quartz: Designed for many complex jobs; DB contention can increase with triggers frequency.
8.1 Throughput examples
- Prefer Redis or dedicated distributed lock systems with high throughput.
- Avoid per-job heavyweight DB transactions holding connections during job runtime.
- Consider externalizing scheduling to a dedicated scheduler service that distributes jobs onto workers (pub/sub pattern).
9. Failure modes, monitoring, and testing
- Stuck locks: Have metrics for lock acquisition failures and age of locks. Implement alerting for lock TTL expiration anomalies.
- Job duration anomalies: Track actual run times and compare to lock TTLs to detect misconfiguration.
- Chaos testing: Simulate instance crashes and network partitions to validate that duplicate executions do not cause unacceptable side effects.
- Observability: Log lock acquisition attempts, successes, failures, and lock owner identity. Correlate with job execution logs.
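Comparing run times with lock TTLs can be a one-line check fed from whatever metrics you already collect. The sketch below is a minimal version; `LockTtlCheck` is a hypothetical helper, and the idea of alerting once the slowest run exceeds a fraction of the TTL (0.8 in the usage note) is an assumption, not a rule from any library.

```java
import java.time.Duration;
import java.util.List;

class LockTtlCheck {
    /** True when the slowest observed run uses more than `maxFraction` of the lock TTL. */
    static boolean ttlTooTight(List<Duration> observedRuns, Duration lockTtl, double maxFraction) {
        long slowestMs = observedRuns.stream()
                .mapToLong(Duration::toMillis)
                .max()
                .orElse(0L);                    // no runs recorded yet: nothing to flag
        return slowestMs > lockTtl.toMillis() * maxFraction;
    }
}
```

With a 10-minute TTL and a threshold of 0.8, runs of 2, 4, and 9 minutes trigger the check because 9 minutes exceeds the 8-minute budget, while runs of 2 and 4 minutes do not.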
9.1 Testing approaches
- Bring up two instances in a test harness (Docker Compose) using the shared lock store and assert only one performs work.
- Inject latency and simulate store failures to validate fallback behavior.
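Before wiring up Docker Compose, the "only one performs work" assertion can be rehearsed in a single JVM. The harness below is a sketch under that assumption: threads play the role of instances, and a `ConcurrentHashMap` stands in for the shared lock store, with `putIfAbsent` as the atomic try-lock. `SingleExecutionHarness` is a hypothetical name.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SingleExecutionHarness {
    /**
     * Starts `instances` threads that all fire the same trigger at once and returns
     * how many of them actually performed the work.
     */
    static int raceForLock(int instances) {
        ConcurrentHashMap<String, String> lockStore = new ConcurrentHashMap<>();
        AtomicInteger executions = new AtomicInteger();
        CountDownLatch startSignal = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(instances);

        for (int i = 0; i < instances; i++) {
            String instanceId = "instance-" + i;
            pool.submit(() -> {
                try {
                    startSignal.await();        // line everyone up on the same trigger
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                // Only the first writer gets null back and therefore owns the lock.
                if (lockStore.putIfAbsent("nightly-report", instanceId) == null) {
                    executions.incrementAndGet();   // the "job"
                }
            });
        }

        startSignal.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("instances did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return executions.get();
    }
}
```

The lock is deliberately never released during the race, which mirrors a store that keeps the lock until its TTL, so `raceForLock` returns 1 regardless of the number of threads. The same assertion then carries over to the multi-container test, with the real lock store in place of the map.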
10. Practical decision guide
- Small app, few jobs, existing DB: Use DB advisory locks or ShedLock with JDBC. Keep designs simple and monitor DB connections.
- Many jobs, high frequency: Use Redis-based locks (Redisson) or an external scheduler system that hands out work to workers via queues.
- Complex scheduling requirements: Use Quartz clustering.
- Kubernetes-native: Consider leader election if you want platform-managed singleton behavior.
- Always: Build idempotency and observability as second-line defenses.
10.1 Minimal checklist before deploying
- Lock TTLs are >= expected max runtime (or renewal exists).
- Lock store HA and persistence are configured (Redis cluster, DB replicas).
- Connection/handle management avoids leakage (especially with DB session locks).
- Monitoring and alerts for lock acquisition failures and job duration anomalies are in place.
11. Final recommendations