Protect services with a circuit breaker at your API gateway

You can now better protect your upstream services from attacks, abuse attempts, and cascading error conditions with our new circuit-breaker Traffic Policy action. A circuit breaker helps improve the reliability of your systems, beyond the DDoS protection included with every ngrok account, by rejecting requests when your services respond with 500-level error codes, then re-evaluating the health of your upstream service before resuming normal traffic flows.

You can now add and customize a circuit breaker in front of your APIs and apps with a single Traffic Policy rule:

---
on_http_request:
 - actions:
     - type: circuit-breaker
       config:
         error_threshold: 0.25


If your upstream service starts responding with error codes to 25% of requests, ngrok steps in to pause traffic and potentially prevent much bigger problems.

What is circuit breaking?

Unsurprising to anyone who’s linked too many strings of Christmas lights on the same outlet, the theory behind circuit breaking in software comes directly from the switches behind your breaker box.

When either system detects an unhealthy or unsustainable state (due to overcurrent or error response codes), it pauses flow to prevent even worse conditions and allow components to return to their normal state. From a software engineering and resilience perspective, that’s incredibly useful in a few ways:

  • ngrok’s network blocks malicious attacks before they have enough of an impact on your upstream services to degrade the experience for others.
  • Your systems save CPU cycles they would otherwise spend on tasks that will inevitably fail.
  • You prevent cascading failures across services, particularly in a microservices environment, by stopping requests before they even ingress into your system.
  • Users receive informative error messages instead of blank screens or hanging curl requests, which is a better user experience.
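
To make the pause-and-recover cycle concrete, here is a minimal sketch of a breaker in Python. It is not ngrok's implementation: the class name, the injected clock, and the choice to trip when the error rate reaches (rather than exceeds) the threshold are all assumptions made for illustration. The parameter names simply mirror the config keys used in this post.

```python
import time
from collections import deque


class CircuitBreaker:
    """Illustrative breaker (hypothetical, not ngrok's code)."""

    def __init__(self, error_threshold=0.25, window_duration=10.0,
                 tripped_duration=10.0, volume_threshold=1,
                 clock=time.monotonic):
        self.error_threshold = error_threshold
        self.window_duration = window_duration
        self.tripped_duration = tripped_duration
        self.volume_threshold = volume_threshold
        self.clock = clock
        self.results = deque()  # (timestamp, was_error) pairs in the window
        self.tripped_at = None  # None means closed: traffic flows

    def allow(self):
        """Return True if a request may be forwarded upstream right now."""
        if self.tripped_at is None:
            return True
        if self.clock() - self.tripped_at >= self.tripped_duration:
            # The pause is over: forget old results and resume traffic.
            self.tripped_at = None
            self.results.clear()
            return True
        return False

    def record(self, status_code):
        """Record an upstream response and trip the breaker if needed."""
        now = self.clock()
        self.results.append((now, status_code >= 500))
        # Drop results that have aged out of the window.
        while self.results and now - self.results[0][0] > self.window_duration:
            self.results.popleft()
        total = len(self.results)
        errors = sum(1 for _, is_error in self.results if is_error)
        if total >= self.volume_threshold and errors / total >= self.error_threshold:
            self.tripped_at = now


# A fake clock keeps the demo deterministic.
now = [0.0]
breaker = CircuitBreaker(error_threshold=0.5, volume_threshold=4,
                         clock=lambda: now[0])
for status in (200, 500, 200, 500):
    breaker.record(status)
print(breaker.allow())  # False: 2 of 4 responses were 5xx
now[0] = 10.0
print(breaker.allow())  # True: the tripped duration has elapsed
```

Only 500-level responses count as errors here, matching the behavior described at the top of this post.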

Where does circuit breaking happen—the service or the API gateway?

Does every device you plug into your outlets have a separate circuit breaker? Or do you have one place—a breaker box—to funnel current to many circuits from one convenient place?

Software circuit breakers work best the same way, but unlike with your laptops and refrigerators, you do get some choice in the matter.

With a service-level circuit breaker, you get fine-grained control and all the custom failure logic you can imagine. No matter the state of the rest of your infrastructure, you know this in-service circuit-breaking function will respond as expected. That control comes with some downsides—notably, it’s a lot of work to build this custom code, maintain state against the rest of your system, and observe what’s going on inside the system to make sure it's running the way you expect.

When you implement circuit breaking at your API gateway, you get consistent failure handling across your API services without having to build the same function over and over again. With a configuration language like Traffic Policy, you can pair circuit breaking with rate limiting, global load balancing, and DDoS protection to quickly implement multiple layers of protection for your services and systems.

Either way, you must first instrument your API service with proper error handling so that it sends relevant success and error messages.
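
As a rough illustration of that instrumentation, the sketch below maps outcomes inside a service to status codes. The `handle` wrapper and the exception-to-status mapping are hypothetical choices, not part of any ngrok SDK. The point is only that caller mistakes should surface as 400-level codes, while genuine upstream failures should surface as the 500-level codes a circuit breaker watches for.

```python
def handle(request_fn):
    """Run a handler and translate its outcome into a (status, body) pair."""
    try:
        return 200, request_fn()
    except ValueError as exc:
        # The caller sent something invalid: a 4xx, not an upstream failure.
        return 400, f"bad request: {exc}"
    except TimeoutError as exc:
        # A dependency did not answer in time: a 5xx the breaker can count.
        return 504, f"upstream timeout: {exc}"
    except Exception as exc:
        # Anything unexpected is reported as a generic server error.
        return 500, f"internal error: {exc}"


def lookup_order():
    # Stand-in for real work that depends on a slow database.
    raise TimeoutError("orders database did not respond")


status, body = handle(lookup_order)
print(status, body)  # 504 upstream timeout: orders database did not respond
```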

Does every API need a circuit breaker?

You get the most value from a circuit breaker when you have:

  • API services with critical external dependencies.
  • High-throughput systems where failure in one place can cascade into others.
  • Systems with limited resources, like IoT devices or network appliances, which need extra protection from entering out-of-memory states.
  • Microservice architectures with many possible failure points.

With proper observability data, you can also figure out how badly you need a circuit breaker by looking for increased latency during peak loads or checking whether your systems run out of resources during partial outages. Obviously, cascading failures across services are a surefire sign that stopping the flow of traffic, and allowing your services time to recover, will help you from an operational perspective.

That said, you benefit every time you add circuit breaking to your API services.

What holds most folks back from implementing circuit breakers everywhere isn’t misplaced confidence that their services are “too small to fail” or have some built-in resilience on an architectural level. The big problem is that circuit breakers are difficult to implement in many situations.

That’s often because you get stuck trying to custom-wire these complex patterns at the service level. In other cases, you realize that you need to upgrade to a more expensive edition or install third-party plugins to enable the feature with your current API gateway or reverse proxy. It's unfortunate, because the use case for and benefit of circuit breakers is pretty definitive.

But, instead of pondering over observability data and trying to decide when it’s the “right” time to finally implement circuit breaking for your API services, ngrok makes it so easy you have no excuse not to.

Get started with ngrok’s circuit breaker

Let’s walk through a quick example, using a combination of cloud and internal endpoints, to help you test circuit breaking at the API gateway level.

We’ll assume you have an internal endpoint running at https://api.internal that points to an upstream API service.

In the ngrok dashboard, reserve a domain, then head over to the Endpoints section and create a new cloud endpoint. Leave the binding as Public and enter your reserved domain as the URL.

You’ll see an editable IDE with some example YAML, which you should replace with the following:

on_http_request:
 - actions:
     - type: circuit-breaker
       config:
         error_threshold: 0.5
     - type: forward-internal
       config:
         url: https://api.internal


Click Save to apply this policy, which checks on every HTTP request whether to trip the circuit breaker. If the circuit breaker remains closed, meaning traffic is not paused, ngrok forwards requests to your https://api.internal endpoint.

With that configuration, an error rate of 50% within 10 seconds will trip the circuit breaker for an additional 10 seconds before traffic flow returns to normal.
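
One way to watch this happen is to send a burst of requests at your cloud endpoint and tally the status codes that come back. The script below is a sketch using only Python's standard library. The function names are made up for this example, and the URL in the usage comment is a placeholder for your own reserved domain.

```python
import urllib.error
import urllib.request
from collections import Counter


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for a GET request, including error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # urllib raises on 4xx/5xx responses; the code is still what we want.
        return exc.code


def tally_statuses(url, count, fetch=fetch_status):
    """Send `count` requests and count how often each status code came back."""
    return Counter(fetch(url) for _ in range(count))


# Example usage (placeholder URL, replace with your reserved domain):
#   print(tally_statuses("https://your-reserved-domain.example", 50))
```

If your upstream fails at least half the time, you should see the mix of status codes change once the breaker trips, then return to normal after the tripped duration passes.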

What can you do with ngrok’s circuit breaker?

As with all of ngrok’s Traffic Policy actions, you have plenty of opportunities to flexibly control exactly how circuit breaking works and at which points in your routing topology.

Migrate circuit breaker modules to Traffic Policy actions

We strongly recommend migrating any existing uses of the circuit breaker module to our new Traffic Policy action. Let’s say you have the following module configured on an edge.

You can migrate to the newer, more flexible, and far more powerful Traffic Policy action by disabling the module on your edge and applying the following rule via the agent CLI, agent config, SDKs, or Kubernetes Operator. The number of buckets is fixed at 10, so you don’t need to migrate that specific rule.

---
on_http_request:
 - actions:
     - type: circuit-breaker
       config:
         error_threshold: 0.25
         tripped_duration: 60s
         window_duration: 30s
         volume_threshold: 200

Tweak thresholds based on observability data

When you use ngrok as your API gateway, you can also use Traffic Inspector to view every request and response sent to your upstream services. That includes seeing the entire lifecycle of your circuit breaker action, including which requests triggered error codes from your upstream service, which requests were blocked by the tripped breaker (with the associated ngrok error 3202), and when they started to flow normally again.

This built-in observability helps you quickly configure the exact conditions to open your circuit breaker and debug common errors by replaying specific requests against a development environment.

Chain rate limiting and circuit breaking

Aside from the built-in DDoS protection ngrok offers for every endpoint, you can also protect your services with rate limiting as the first line of defense, then apply a circuit breaker “in case of emergency.”

For example, the following policy will first rate-limit specific clients by their IP address to 50 requests/minute. If more than 1000 requests that haven’t been denied by rate limiting arrive in a 10-second window, and your API service responds with error codes 15% of the time, then ngrok trips the circuit breaker.

---
on_http_request:
 - actions:
     - type: rate-limit
       config:
         name: Only allow 50 requests per minute
         algorithm: sliding_window
         capacity: 50
         rate: 60s
         bucket_key:
           - conn.client_ip
     - type: circuit-breaker
       config:
         error_threshold: 0.15
         volume_threshold: 1000


You aren’t limited to a single bucket of rate limiting either—check out our templates gallery for more examples.
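
The rate-limiting half of that policy can be sketched in a few lines of Python. This is a simple sliding-log limiter written for illustration. ngrok's own `sliding_window` algorithm may work differently, and the class name and example IP address are assumptions.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `capacity` requests per `window` seconds for each key."""

    def __init__(self, capacity=50, window=60.0):
        self.capacity = capacity
        self.window = window
        self.hits = defaultdict(deque)  # key -> timestamps of allowed requests

    def allow(self, key, now):
        timestamps = self.hits[key]
        # Discard timestamps that have slid out of the window.
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        if len(timestamps) >= self.capacity:
            return False
        timestamps.append(now)
        return True


limiter = SlidingWindowLimiter(capacity=50, window=60.0)
# 60 requests from one client within a second: only the first 50 pass.
passed = sum(limiter.allow("203.0.113.7", now=i / 60) for i in range(60))
print(passed)  # 50
```

Because the rate limit runs first, only the requests it lets through are forwarded upstream and can count toward the circuit breaker's volume threshold.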

Compose global and service-specific circuit breaker configurations

One of the most powerful functions of Traffic Policy is that it’s inherently composable. That means you can have one circuit breaker policy attached to a cloud endpoint for all your services, then also configure different thresholds for specific upstream APIs that are more “fragile.”

For example, on your cloud endpoint, you might implement a broad generic policy that prevents any upstream service from failing to respond to more than 20% of requests, then forward to internal endpoints based on the path of the request (e.g. https://api.example.com/foo -> https://foo.internal and https://api.example.com/bar -> https://bar.internal).

---
on_http_request:
 - actions:
     - type: circuit-breaker
       config:
         error_threshold: 0.20
 - actions:
     - type: forward-internal
       config:
         url: https://${req.url.path}.internal

You can then attach a separate policy, with fine-tuned thresholds, to foo.internal:

---
on_http_request:
 - actions:
     - type: circuit-breaker
       config:
         error_threshold: 0.05
         tripped_duration: 2m
         window_duration: 1m
         volume_threshold: 25000

The global circuit breaker then triggers on more widespread error states, while the specific breaker for https://foo.internal steps in only during specific situations you want to prepare for in advance.
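
A little arithmetic shows how the two layers divide the work. The traffic numbers below are invented, and the sketch simplifies by treating each pair of counts as the totals inside that breaker's own window and by tripping when the error share reaches the threshold. ngrok's exact evaluation may differ.

```python
def tripped(errors, total, error_threshold, volume_threshold=0):
    """True if the window has enough traffic and the error share reaches the threshold."""
    if total == 0 or total < volume_threshold:
        return False
    return errors / total >= error_threshold


# Hypothetical per-window counts for two upstream services.
foo = {"errors": 2400, "total": 30000}  # 8% errors
bar = {"errors": 300, "total": 30000}   # 1% errors

# Service-specific breaker on foo.internal: 5% threshold, 25000-request volume.
foo_specific = tripped(foo["errors"], foo["total"],
                       error_threshold=0.05, volume_threshold=25000)

# Global breaker on the cloud endpoint sees combined traffic: 20% threshold.
global_breaker = tripped(foo["errors"] + bar["errors"],
                         foo["total"] + bar["total"],
                         error_threshold=0.20)

print(foo_specific, global_breaker)  # True False
```

In this scenario the fine-tuned breaker pauses traffic to the fragile service while the global breaker stays closed, so requests to the healthy service keep flowing.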

A tripped circuit breaker is better than a broken system

The circuit breaker action is available with every ngrok account, so just sign up for free to get started. From there, we highly recommend our developer documentation on traffic management, from circuit breakers to mTLS and beyond.

Have questions or want to request a new feature in Traffic Policy? We’d love to hear from you. Hit us up at support@ngrok.com or create an issue or discussion on the ngrok community repo. If you prefer a more in-person experience, be sure to check out Office Hours, our monthly livestream for answering your questions with demos straight from our DevRel and Product folks.

Joel Hans
Joel Hans is a Senior Developer Educator. Away from blog posts and demo apps, you might find him mountain biking, writing fiction, or digging holes in his yard.