Part 2: Cloud-Native Circuit Breakers: How It Works Under the Hood


In Part 1, I shared the motivation behind creating a distributed circuit breaker for cloud-native systems — something memoryless, stateless, and resilient enough to survive autoscaling chaos.

Now let’s look under the hood.

🧠 Design Goals Recap

I needed something that works seamlessly in distributed environments like AWS Lambda, ECS, or Kubernetes, where each instance's in-memory state is private to that instance and is lost whenever the instance is recycled or scaled away.

So the solution had to be:

🔁 Shared — one circuit state for all instances
☁️ Stateless-compatible — doesn’t care where it runs
⚙️ Configurable — env vars or YAML, your call
🧩 Pluggable — default is DynamoDB, others possible later

💾 Why DynamoDB?

DynamoDB is serverless, fast, and battle-tested at scale. But most importantly:

It supports TTL (Time To Live) out of the box
It supports atomic counters and conditional writes on individual items
It's a good fit for key-value state tracking, with low read and write latency

We store one item per circuit breaker, keyed by the function name (e.g., getUserData). Each item holds:

| Attribute | Description |
|---|---|
| Key (PK) | Circuit name, e.g. `getUserData` |
| `failureCount` | Number of failures observed |
| `lastFailureTime` | Timestamp of the last failure |
| `state` | `CLOSED`, `OPEN`, or `HALF_OPEN` |
| `ttl` | Expiry time in epoch seconds; once it passes, DynamoDB deletes the item, which resets the breaker |

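DynamoDB's TTL feature deletes expired items in the background, so breaker state that sits idle is cleaned up without a separate cleanup job. Deletion is not immediate, though: AWS documents that expired items are typically removed within a few days, and until then they still show up in reads. Code that reads breaker state should therefore compare `ttl` with the current time rather than assume an expired item is already gone.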
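To make the item shape concrete, here is a hedged sketch of the same attributes as a Java record. `BreakerItem` and its helpers are hypothetical, not the library's types. `lastFailureTime` is assumed to be epoch milliseconds, while `ttl` uses epoch seconds because that is the format DynamoDB TTL requires. Because TTL deletion can lag, the helper treats an expired item as `CLOSED` instead of trusting its stored state.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical in-code mirror of the DynamoDB item described above.
// Attribute names follow the table; the type itself is illustrative only.
record BreakerItem(String pk, int failureCount, long lastFailureTime, String state, long ttl) {

    // DynamoDB TTL expects a Number attribute holding Unix epoch *seconds*.
    static long ttlFrom(Instant now, Duration idleLifetime) {
        return now.plus(idleLifetime).getEpochSecond();
    }

    // TTL deletion runs in the background and can lag behind the expiry time,
    // so an item whose ttl has passed is treated as expired even if it is still readable.
    boolean isExpired(Instant now) {
        return now.getEpochSecond() >= ttl;
    }

    // Effective state: an expired item counts as a reset (CLOSED) breaker.
    String effectiveState(Instant now) {
        return isExpired(now) ? "CLOSED" : state;
    }
}
```

For example, an item written at time `now` with a ten-minute idle lifetime reports its stored state until `now + 600s`, and `CLOSED` from then on, whether or not DynamoDB has physically deleted it yet.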

🔄 Failure Tracking Logic

Here's what happens, step by step, when a protected method fails:

The method is annotated with @CloudCircuitBreaker.
On an exception, the handler atomically increments failureCount in DynamoDB.
If the count exceeds the configured threshold (say, 5), the breaker state flips to OPEN.
While the breaker is OPEN, every call to that method short-circuits: the method body is skipped and the fallback is called instead.
After the timeout (e.g., 60 seconds) elapses, the breaker moves to HALF_OPEN and lets a single trial request through.
If the trial request succeeds, the breaker resets to CLOSED.
If it fails, the breaker goes back to OPEN.
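The transitions above amount to a small state machine. The sketch below keeps everything in fields of a single object so the logic is easy to follow; in the library the same values live in the DynamoDB item, where concurrent updates would need atomic or conditional writes rather than `synchronized`. `BreakerStateMachine` and its method names are hypothetical, and, following the steps literally, it trips only once the count *exceeds* the threshold.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical single-process model of CLOSED -> OPEN -> HALF_OPEN -> CLOSED/OPEN.
class BreakerStateMachine {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final Duration openTimeout;
    private State state = State.CLOSED;
    private int failureCount = 0;
    private Instant openedAt = Instant.EPOCH;
    private boolean trialInFlight = false;

    BreakerStateMachine(int failureThreshold, Duration openTimeout) {
        this.failureThreshold = failureThreshold;
        this.openTimeout = openTimeout;
    }

    // true: run the real method; false: short-circuit to the fallback.
    synchronized boolean allowRequest(Instant now) {
        if (state == State.OPEN && !now.isBefore(openedAt.plus(openTimeout))) {
            state = State.HALF_OPEN; // timeout elapsed: permit one trial request
            trialInFlight = false;
        }
        if (state == State.HALF_OPEN) {
            if (trialInFlight) {
                return false; // a trial is already running; everyone else falls back
            }
            trialInFlight = true;
            return true;
        }
        return state == State.CLOSED;
    }

    synchronized void onSuccess() {
        state = State.CLOSED; // a passing trial (or normal success) resets the breaker
        failureCount = 0;
        trialInFlight = false;
    }

    synchronized void onFailure(Instant now) {
        if (state == State.HALF_OPEN) {
            trip(now); // trial failed: straight back to OPEN
            return;
        }
        failureCount++;
        if (failureCount > failureThreshold) {
            trip(now);
        }
    }

    private void trip(Instant now) {
        state = State.OPEN;
        openedAt = now;
        trialInFlight = false;
    }

    synchronized State state() {
        return state;
    }
}
```

With a threshold of 2 and a 60-second timeout, the third failure opens the circuit, calls are refused for the next 60 seconds, and then exactly one trial call is let through to decide between `CLOSED` and `OPEN`.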

🧬 Configuration

Each circuit is declared per method with the @CloudCircuitBreaker annotation, which names the circuit and its fallback method:

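```java
// "function" names the circuit, i.e. the key of its item in the store;
// "fallback" names the method invoked when the call is short-circuited.
@CloudCircuitBreaker(
    function = "getUserData",
    fallback = "fallbackFunction"
)
public Response getUserData(String userId) {
    return userService.fetchFromUpstream(userId);
}

// Fallback with the same parameter list, serving cached data instead.
public Response fallbackFunction(String userId) {
    return userService.fetchFromCache(userId);
}
```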

The failure threshold and the OPEN-state timeout are set in YAML:

```yaml
cloudcb:
    failureThreshold: 5
    timeoutSeconds: 60
```

Because these values live in configuration rather than code, you can tune them per environment: for example, a longer timeout in staging and a lower failure threshold in production so the breaker trips sooner.
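The design goals mention both environment variables and YAML. One plausible way to combine them, shown here purely as an assumption, is to let environment variables override the YAML values, which makes per-environment tuning a deployment setting. The variable names `CLOUDCB_FAILURE_THRESHOLD` and `CLOUDCB_TIMEOUT_SECONDS` and the `CloudCbSettings` type are hypothetical, not documented settings of the library.

```java
import java.util.Map;

// Hypothetical resolution order: environment variable, then YAML value, then built-in default.
record CloudCbSettings(int failureThreshold, int timeoutSeconds) {

    static CloudCbSettings resolve(Map<String, Integer> yaml, Map<String, String> env) {
        int threshold = pick(env, "CLOUDCB_FAILURE_THRESHOLD", yaml.getOrDefault("failureThreshold", 5));
        int timeout = pick(env, "CLOUDCB_TIMEOUT_SECONDS", yaml.getOrDefault("timeoutSeconds", 60));
        if (threshold < 1 || timeout < 1) {
            throw new IllegalArgumentException("failureThreshold and timeoutSeconds must be positive");
        }
        return new CloudCbSettings(threshold, timeout);
    }

    // Uses the environment value when present and non-blank, otherwise the fallback.
    private static int pick(Map<String, String> env, String key, int fallback) {
        String raw = env.get(key);
        return (raw == null || raw.isBlank()) ? fallback : Integer.parseInt(raw.trim());
    }
}
```

In a real process the second argument would be `System.getenv()`; passing the maps in explicitly keeps the resolution logic easy to test.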

🗺️ Architecture Diagram

Here’s how it all fits together at runtime:

```
        ┌────────────────────────────┐
        │      Java Method Call      │
        └──────────────┬─────────────┘
                       │  checks @CloudCircuitBreaker annotation
                       ▼
        ┌────────────────────────────┐
        │ Circuit State Lookup (DDB) │
        └──────────────┬─────────────┘
                       ▼
        ┌────────────────────────────┐
        │ Circuit OPEN, not expired? │──▶ Yes → Skip execution → Fallback
        └──────────────┬─────────────┘
                       │ No
                       ▼
        ┌────────────────────────────┐
        │  Execute Original Method   │
        └──────────────┬─────────────┘
                       ▼
        ┌────────────────────────────┐
        │          Success?          │──▶ No → Increment failureCount atomically
        └──────────────┬─────────────┘         (trips to OPEN past the threshold)
                       │ Yes
                       ▼
        ┌────────────────────────────┐
        │    Reset failure count     │
        └────────────────────────────┘
```

A runtime proxy wraps every annotated method and runs these checks transparently, so developers don't write any boilerplate.
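The post doesn't show how the proxy is built. As a rough sketch, the JDK's `java.lang.reflect.Proxy` can route calls the same way: the handler reads the annotation, asks whether the circuit is open, and invokes either the original method or the named fallback. `@Breaker`, `UserApi`, `BreakerProxies`, and the `isOpen` predicate are hypothetical stand-ins (the library itself may use Spring AOP or another mechanism), JDK proxies only work through interfaces, and failure recording is left out to keep the routing visible.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.function.Predicate;

// Hypothetical stand-in for @CloudCircuitBreaker, declared locally so the sketch compiles.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Breaker {
    String function();
    String fallback();
}

interface UserApi {
    @Breaker(function = "getUserData", fallback = "fallbackFunction")
    String getUserData(String userId);

    String fallbackFunction(String userId);
}

final class BreakerProxies {
    // Wraps target so annotated interface methods consult isOpen(circuitName) first;
    // when the circuit is open, the fallback with the same parameter types runs instead.
    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> iface, T target, Predicate<String> isOpen) {
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class<?>[] {iface},
                (proxy, method, args) -> {
                    Breaker cb = method.getAnnotation(Breaker.class);
                    Method toCall = method;
                    if (cb != null && isOpen.test(cb.function())) {
                        toCall = iface.getMethod(cb.fallback(), method.getParameterTypes());
                    }
                    try {
                        return toCall.invoke(target, args);
                    } catch (InvocationTargetException e) {
                        throw e.getCause(); // rethrow the method's own exception, not the reflection wrapper
                    }
                });
    }
}
```

A caller holding the proxy just calls `getUserData(...)`; whether it reaches the upstream or the cache is decided entirely inside the invocation handler.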

🧪 Safe for Stateless Systems

What makes this durable:

TTL-managed keys mean no cleanup jobs
Atomic counters prevent lost updates when many instances record failures at the same moment
Breaker state is centralized and shared across all containers and functions

It doesn’t matter if one instance dies, restarts, or scales out — the breaker state lives in DynamoDB and survives independently of your infrastructure.
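The atomic-counter point is easiest to see in miniature. In DynamoDB, an UpdateItem call with an `ADD` action increments a number attribute atomically on the server. The in-process analogue below uses `ConcurrentHashMap.merge`, which likewise applies each increment atomically per key, so no failure reported by a concurrent "instance" is lost. `SharedFailureCounter` and its thread-pool simulation are illustrative only and are not part of the library.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative analogue of a shared, atomically incremented failure counter.
final class SharedFailureCounter {
    private final ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();

    // merge() runs as one atomic step per key, unlike a separate get-then-put.
    int recordFailure(String circuit) {
        return counts.merge(circuit, 1, Integer::sum);
    }

    int failures(String circuit) {
        return counts.getOrDefault(circuit, 0);
    }

    // Simulates several "instances" reporting failures for the same circuit at once.
    static int simulate(int instances, int failuresEach) {
        SharedFailureCounter counter = new SharedFailureCounter();
        ExecutorService pool = Executors.newFixedThreadPool(instances);
        for (int i = 0; i < instances; i++) {
            pool.submit(() -> {
                for (int j = 0; j < failuresEach; j++) {
                    counter.recordFailure("getUserData");
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return counter.failures("getUserData");
    }
}
```

Eight simulated instances reporting 1,000 failures each end with a count of exactly 8,000; a read-then-write increment without atomicity could end lower.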

🧰 Pluggable Design (Coming Soon)

While DynamoDB is the default, the design allows swapping in Redis, S3, or even custom backends.

The circuit state store is defined behind an interface like:

```java
/**
 * Interface for persisting and retrieving the state of a circuit breaker.
 * <p>
 * Implementations of this interface are responsible for providing a durable store
 * (e.g., DynamoDB, Redis, in-memory) for circuit breaker states identified by a unique key.
 * </p>
 * <p>
 * This allows for distributed or clustered systems to share circuit breaker status across instances.
 * </p>
 *
 * @author Clinton Fernandes
 */
public interface CircuitBreakerStore {

    /**
     * Retrieves the current state of the circuit breaker for the given key.
     *
     * @param key A unique identifier representing a specific circuit breaker (e.g., service.method).
     * @return The current {@link CircuitBreakerState}, or {@code null} if no state exists.
     */
    CircuitBreakerState getState(String key);

    /**
     * Persists the circuit breaker state for the given key.
     *
     * @param key   A unique identifier representing a specific circuit breaker.
     * @param state The {@link CircuitBreakerState} to persist.
     */
    void saveState(String key, CircuitBreakerState state);

    /**
     * Resets (removes or reinitializes) the circuit breaker state for the given key.
     * This typically moves the circuit breaker back to the initial (closed) state.
     *
     * @param key A unique identifier representing a specific circuit breaker.
     */
    void reset(String key);
}
```
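For local development or unit tests, a store that lives in a single JVM is often enough. The sketch below implements the interface with a `ConcurrentHashMap`. The post doesn't show `CircuitBreakerState`, so the record here is a hypothetical minimal version, and the interface is repeated without its Javadoc so the example compiles on its own. Because the map is not shared across instances, this is no substitute for DynamoDB in a scaled-out deployment.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical minimal state type; the library's real CircuitBreakerState is not shown in the post.
record CircuitBreakerState(String status, int failureCount, long lastFailureTime) {}

// Same contract as the interface above, repeated so this sketch is self-contained.
interface CircuitBreakerStore {
    CircuitBreakerState getState(String key);
    void saveState(String key, CircuitBreakerState state);
    void reset(String key);
}

// Single-JVM implementation: fine for tests, but state is NOT shared between instances.
final class InMemoryCircuitBreakerStore implements CircuitBreakerStore {
    private final Map<String, CircuitBreakerState> states = new ConcurrentHashMap<>();

    @Override
    public CircuitBreakerState getState(String key) {
        return states.get(key); // null when no state exists, as the contract specifies
    }

    @Override
    public void saveState(String key, CircuitBreakerState state) {
        states.put(key, state);
    }

    @Override
    public void reset(String key) {
        states.remove(key); // "no state" is treated as the initial CLOSED breaker
    }
}
```

Code written against `CircuitBreakerStore` can take this class in tests and the DynamoDB-backed store in production without any other change.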

Swapping implementations is just a Spring bean or Lambda module away.

🔗 Try It Out

Ready to use it? Clone or check it out here:
👉 clinton1719/cloud-circuitbreaker

👋 Up Next: Spring Boot Autoconfig & Lambda Support

In Part 3, I’ll walk through:

Autowiring the annotation into Spring Boot
Supporting AWS Lambda and native Java functions
Packaging best practices for Maven-distributed libraries

Follow along or subscribe to the blog feed. Feedback, ideas, PRs — always welcome.

Stay resilient 💪


Clinton Fernandes
