
Fast and slow burn


Burn rate measures how fast your service uses up its error budget. A burn rate of 1 exhausts the budget exactly at the end of the SLO time window. When your burn rate goes above 1 and the current error rate continues, the budget runs out earlier and your service won't meet its SLO.
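As a rough illustration of the definition above, burn rate can be thought of as the observed error rate divided by the error rate the SLO allows. The sketch below is not Nobl9's implementation; the function names, the 99.9% target, and the 28-day window are assumptions chosen for the example.

```python
def burn_rate(error_rate: float, slo_target: float) -> float:
    """Observed error rate divided by the error rate the SLO allows."""
    allowed = 1.0 - slo_target
    if allowed <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_rate / allowed


def hours_to_exhaustion(rate: float, window_hours: float) -> float:
    """Hours until a full error budget is gone if `rate` stays constant."""
    if rate <= 0:
        return float("inf")
    return window_hours / rate


# Hypothetical service: 0.5% of requests fail against a 99.9% target.
rate = burn_rate(error_rate=0.005, slo_target=0.999)
hours = hours_to_exhaustion(rate, window_hours=28 * 24)

print(round(rate, 2))   # about 5: the budget burns five times too fast
print(round(hours, 1))  # a full 28-day budget would last about 134.4 hours
```

At a constant burn rate of 1, the same budget would last the full 672 hours of the window.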

However, the burn rate can change drastically several times over a given period, which makes it hard to tell whether you're going to exhaust your budget. That's why it helps to look at the overall exhaustion characteristics and distinguish between fast and slow burn.

Nobl9 lets you configure alert policies based on burn rate characteristics, using the Entire / Remaining error budget would be exhausted and The average error budget burn rate conditions.

The following guide provides an overview of slow and fast burn rate conditions.

Fast burn

Fast burn alerting conditions can detect short but significant spikes in burn rate over a brief timeframe (usually 30m or less). Use this condition to react quickly to momentary outages or issues with your services that require immediate attention.

Here's an example of a fast burn configuration for the Average Burn Rate is condition:

apiVersion: n9/v1alpha
kind: AlertPolicy
metadata:
  name: fast-burn
  project: default
spec:
  alertMethods: []
  conditions:
    - lastsFor: 5m
      measurement: averageBurnRate
      value: 5
      op: gte
  coolDown: 5m
  description: "Fast Burn Policy that triggers when the average burn rate is greater than or equal to 5x for at least 5 minutes"
  severity: Medium
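To make the "for at least 5 minutes" part of this policy concrete, here is a minimal sketch of a sustained-threshold check. It assumes one burn-rate sample per minute and that the condition must hold continuously; Nobl9's actual evaluation may differ. The function name and the sample data are hypothetical.

```python
def sustained_breach(samples, threshold, lasts_for):
    """Return the index of the sample at which the value has stayed at or
    above `threshold` for `lasts_for` consecutive samples, or None."""
    run = 0
    for i, value in enumerate(samples):
        # Extend the current run on a breach, reset it otherwise.
        run = run + 1 if value >= threshold else 0
        if run >= lasts_for:
            return i
    return None


# One burn-rate sample per minute (made-up data).
brief_spike = [1, 1, 8, 9, 1, 1, 1, 1, 1, 1]  # 2 minutes above 5x
outage = [1, 1, 6, 7, 8, 7, 6, 1, 1, 1]       # 5 minutes above 5x

print(sustained_breach(brief_spike, threshold=5, lasts_for=5))  # None
print(sustained_breach(outage, threshold=5, lasts_for=5))       # 6
```

The two-minute spike is ignored because it ends before the five-minute requirement is met, while the longer outage fires at the fifth consecutive breaching sample.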

Fast Burn chart

The first line in the chart below shows a budget with several burn rate spikes (one medium, one small, one big). A fast burn policy can alert on some of those spikes. In this example, the alert threshold is set to ~5, which is high enough that only the big spike triggers an alert; the medium and small spikes do not.

Image 1: Fast burn chart

Slow burn

Slow burn conditions can be used to detect a gradual budget burn that occurs over a prolonged timeframe, usually exceeding 30 minutes.

This approach is handy in detecting problems that do not require immediate attention but must be addressed in due time.

tip

As a rule, the threshold for slow burn should be smaller than that for fast burn conditions.

Here's an example of a slow burn configuration for the Average Burn Rate is condition:

apiVersion: n9/v1alpha
kind: AlertPolicy
metadata:
  name: slow-burn
  project: default
spec:
  alertMethods: []
  conditions:
    - lastsFor: 30m
      measurement: averageBurnRate
      value: 2
      op: gte
  coolDown: 5m
  description: "Slow Burn Policy that triggers when the average burn rate over last 30m is greater than or equal to 2x"
  severity: Medium
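The policy description speaks of the average burn rate over the last 30 minutes. A minimal sketch of that idea, assuming one sample per minute and a simple trailing mean (an illustration only, not Nobl9's actual computation), shows how a moderate but persistent burn crosses the 2x threshold without ever reaching a fast burn threshold of 5x. The function name and the data are hypothetical.

```python
def rolling_average(samples, window):
    """Trailing mean for every position that has `window` samples behind it."""
    averages = []
    total = 0.0
    for i, value in enumerate(samples):
        total += value
        if i >= window:
            total -= samples[i - window]  # drop the sample leaving the window
        if i >= window - 1:
            averages.append(total / window)
    return averages


# Made-up data: 30 quiet minutes at 1x, then a steady 2.5x burn.
samples = [1.0] * 30 + [2.5] * 30

averages = rolling_average(samples, window=30)
first = next(i for i, avg in enumerate(averages) if avg >= 2)

print(max(samples) >= 5)  # False: a 5x fast burn threshold is never reached
print(first + 30 - 1)     # 49: minute at which the 30-minute average hits 2x
```

In this made-up series the 30-minute average reaches 2x after 20 minutes of elevated burn, which is the kind of gradual problem a slow burn condition is meant to surface.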

Slow Burn chart

The first line in the chart below represents the budget that has several burn rate spikes (one medium, one small, one big). A slow burn policy can be used here to alert when the burn over a more extended period (including all those spikes) is significant enough for Nobl9 to trigger an alert.

Image 2: Slow burn chart

tip

The main difference between slow and fast burn is that a single event can trigger a fast burn alert, whereas a slow burn alert requires the burn to be sustained over a longer period before it triggers. We recommend setting a higher lastsFor value for a slow burn condition to prevent it from being triggered too easily.
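One way to compare the two example policies is to estimate how much of the error budget is already gone by the time each condition is met. The sketch below assumes a 28-day SLO time window (not stated in this guide) and a burn rate held exactly at the threshold for the whole lastsFor period; the function name is hypothetical.

```python
def budget_consumed(rate, duration_minutes, window_minutes):
    """Fraction of the whole error budget used by burning at `rate`
    for `duration_minutes` within a window of `window_minutes`."""
    return rate * duration_minutes / window_minutes


WINDOW = 28 * 24 * 60  # assumed 28-day time window, in minutes

fast = budget_consumed(rate=5, duration_minutes=5, window_minutes=WINDOW)
slow = budget_consumed(rate=2, duration_minutes=30, window_minutes=WINDOW)

print(f"{fast:.4%}")  # 0.0620%
print(f"{slow:.4%}")  # 0.1488%
```

Under these assumptions, the slow burn condition fires after more budget has been spent than the fast burn condition, even though its threshold is lower, because the burn must last longer before the alert triggers.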