Fast and slow burn

The burn rate measures how fast your service uses up its error budget. A burn rate of 1 means the budget would be used up exactly at the end of the SLO time window. If the burn rate stays above 1, the budget runs out before the window ends and your service won't meet its SLO.
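As a rough illustration of the definition above, the sketch below uses the common formulation in which the burn rate is the observed error rate divided by the error rate the SLO allows. The function names, the 99.9% target, and the 28-day time window are assumptions made for this example, not values taken from Nobl9.

```python
def burn_rate(error_rate: float, slo_target: float) -> float:
    """Ratio of the observed error rate to the error rate the SLO allows."""
    allowed_error_rate = 1.0 - slo_target
    if allowed_error_rate <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_rate / allowed_error_rate


def days_to_exhaustion(rate: float, time_window_days: float) -> float:
    """Days until a full budget is gone if the burn rate stays constant."""
    if rate <= 0:
        return float("inf")
    return time_window_days / rate


# Hypothetical SLO: 99.9% target over a 28-day time window.
rate = burn_rate(error_rate=0.002, slo_target=0.999)
print(round(rate, 2))                           # 2.0
print(round(days_to_exhaustion(rate, 28), 1))   # 14.0
```

With twice the allowed error rate, a full budget would last half the time window, which is why a sustained burn rate above 1 means the SLO will be missed.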

However, the burn rate can change drastically several times over a given period, which makes it hard to tell whether you are going to exhaust your budget. That's why it helps to look at the overall exhaustion characteristics and to distinguish between fast and slow burn.

Nobl9 lets you configure alert policies based on these burn rate characteristics with two kinds of conditions: Entire / Remaining Error budget would be exhausted, and The average error budget burn rate.

The following guide provides an overview of slow and fast burn rate conditions.

Fast burn

Fast burn alert conditions can detect short but significant spikes in burn rate over a brief timeframe (usually 30m or less). Use this condition to react quickly to momentary outages or issues with your services that require immediate attention.

Here's an example of a fast burn configuration for the Average Burn Rate is condition:

apiVersion: n9/v1alpha
kind: AlertPolicy
metadata:
  name: fast-burn
  project: default
spec:
  alertMethods: []
  conditions:
  - alertingWindow: 5m
    measurement: averageBurnRate
    value: 5
    op: gte
  coolDown: 5m
  description: "Fast Burn Policy that triggers when the average burn rate over last 5m is greater than or equal to 5x"
  severity: Medium

Fast burn chart

The first line in the chart below shows a budget with several burn rate spikes (one medium, one small, one big). A fast burn policy can alert on some of those spikes. In this example, the alert threshold is set to ~5, which is high enough that only the big spike triggers an alert. The medium and small spikes do not.

Image 1: Fast burn chart
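The behavior described for the chart can be approximated with a trailing-window average. The following is a simplified model rather than Nobl9's actual evaluation logic: the `firing_minutes` helper and the per-minute sample series are invented for illustration.

```python
from collections import deque


def firing_minutes(samples, window, threshold):
    """Return indices where the trailing-window average is >= threshold.

    samples: per-minute burn rate values (synthetic here).
    window: number of trailing samples to average (5 for a 5m window).
    """
    recent = deque(maxlen=window)
    fired = []
    for minute, value in enumerate(samples):
        recent.append(value)
        # Only evaluate once a full window of data is available.
        if len(recent) == window and sum(recent) / window >= threshold:
            fired.append(minute)
    return fired


# Synthetic series: baseline 0.5 with a medium, a small, and a big spike.
series = [0.5] * 30
series[5:8] = [4.0] * 3      # medium spike
series[14:16] = [2.0] * 2    # small spike
series[22:27] = [8.0] * 5    # big spike

print(firing_minutes(series, window=5, threshold=5))  # [24, 25, 26, 27, 28]
```

Only the minutes around the big spike reach an average of 5 over the 5-minute window. The medium spike peaks at a window average of 2.6 and the small one at 1.1, so neither fires.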

Slow burn

Slow burn conditions can be used to detect a gradual budget burn that occurs over a prolonged timeframe, usually exceeding 30 minutes.

This approach is handy in detecting problems that do not require immediate attention but must be addressed in due time.

tip

As a rule, the threshold for slow burn should be smaller than that for fast burn conditions.

Here's an example of a slow burn configuration for the Average Burn Rate is condition:

apiVersion: n9/v1alpha
kind: AlertPolicy
metadata:
  name: slow-burn
  project: default
spec:
  alertMethods: []
  conditions:
  - alertingWindow: 30m
    measurement: averageBurnRate
    value: 2
    op: gte
  coolDown: 5m
  description: "Slow Burn Policy that triggers when the average burn rate over last 30m is greater than or equal to 2x"
  severity: Medium
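To see why a lower threshold still matters over a longer window, the sketch below estimates the share of the total error budget consumed by the time each example policy would fire, using the approximation that the consumed share equals the average burn rate multiplied by the window length and divided by the SLO time window. The 28-day time window and the `budget_consumed` helper are assumptions for this example.

```python
def budget_consumed(avg_burn_rate: float, window_hours: float,
                    slo_period_hours: float) -> float:
    """Fraction of the total error budget burned during the window."""
    return avg_burn_rate * window_hours / slo_period_hours


PERIOD_HOURS = 28 * 24  # hypothetical 28-day SLO time window

# Thresholds and windows taken from the fast-burn and slow-burn examples.
fast = budget_consumed(avg_burn_rate=5, window_hours=5 / 60,
                       slo_period_hours=PERIOD_HOURS)
slow = budget_consumed(avg_burn_rate=2, window_hours=0.5,
                       slo_period_hours=PERIOD_HOURS)

print(f"fast burn (5x over 5m):  {fast:.3%}")   # 0.062%
print(f"slow burn (2x over 30m): {slow:.3%}")   # 0.149%
```

Under these assumptions, the slow burn condition fires after more budget has been spent than the fast burn condition, even though its threshold is lower.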

Slow burn chart

The first line in the chart below shows a budget with several burn rate spikes (one medium, one small, one big). A slow burn policy can alert here when the burn over a longer period, including all those spikes, is significant enough to reach the threshold.

tip

The main difference between slow and fast burn is that a single event can trigger a fast burn alert, whereas a slow burn alert requires the burn to be sustained over a longer period before it kicks in. We recommend setting a longer duration (a higher lastsFor value) for a slow burn condition to prevent it from being triggered too easily.

Multi-window multi-burn

The combination of the two conditions above gives us multi-window, multi-burn.

note

Keep in mind that you can only define such a condition using the alerting window parameter.

Use this configuration when you want to be alerted only if the steady burn over a long period is significant for your SLO and the budget is still burning right now (that is, the fast burn part of the policy detects a current spike).

An example use case for multi-window, multi-burn is an ongoing outage that requires attention on a service that has already been burning budget for a longer period.

Multi-window, multi-burn conditions avoid alerting you when the slow burn over a long period is significant, but your SLO is already recovering the budget.

- apiVersion: n9/v1alpha
  kind: AlertPolicy
  metadata:
    name: fast-burn
    project: default
  spec:
    alertMethods: []
    conditions:
    - alertingWindow: 15m
      measurement: averageBurnRate
      op: gte
      value: 5
    - alertingWindow: 6h
      measurement: averageBurnRate
      op: gte
      value: 2
    coolDown: 5m
    description: "Multi-window, multi-burn policy that triggers when your service requires attention and does not alert when you're currently recovering budget"
    severity: Medium
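The effect of combining both conditions can be sketched as a logical AND over two trailing-window averages. This is a simplified model that assumes both conditions must hold at the same time, as the text above implies. The `multi_window_alert` helper and the per-minute sample series are hypothetical.

```python
def average(values):
    return sum(values) / len(values)


def multi_window_alert(samples, short_window, short_threshold,
                       long_window, long_threshold):
    """True only when both trailing-window averages meet their thresholds."""
    if len(samples) < long_window:
        return False  # not enough data to fill the long window
    short_avg = average(samples[-short_window:])
    long_avg = average(samples[-long_window:])
    return short_avg >= short_threshold and long_avg >= long_threshold


# Per-minute synthetic burn rates covering 6 hours (360 samples).
ongoing = [2.0] * 345 + [6.0] * 15      # steady burn, still spiking now
recovering = [3.0] * 345 + [0.2] * 15   # heavy burn earlier, quiet now

# Windows and thresholds mirror the example policy: 15m at 5x, 6h at 2x.
args = dict(short_window=15, short_threshold=5,
            long_window=360, long_threshold=2)
print(multi_window_alert(ongoing, **args))      # True
print(multi_window_alert(recovering, **args))   # False
```

In the second series the 6-hour average is still above 2, so a slow burn condition alone would fire, but the quiet last 15 minutes keep the combined policy silent.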
