Remaining budget

Each SLO has an error budget, which defines the amount of acceptable errors that can occur in a given time window:

  • In the occurrences method, those errors correlate to actual events that make the SLO burn through the error budget.
  • In the time slices method, those errors correlate to the number of bad minutes that make the SLO burn through the error budget.

Although the methods use different units (i.e., number of errors for occurrences, minutes for time slices) to describe the error budget allocation, the remaining error budget can be represented as a percentage of the overall allocation.

Time slices vs. occurrences

Suppose you have an SLO with a 99.9% availability target that uses the occurrences method, and the total number of requests in a month is 10000. You can have up to 10 bad requests in that month and still meet the target.

This means the error budget is 10 bad requests. If, at any point in time, you've already had 5 bad requests within the time window, the remaining error budget is 50% (5 of 10 requests left).

Similarly, if the SLO uses the time slices method with the same target and the month has 43200 minutes, the error budget is approximately 43 minutes (0.1% of 43200 minutes is 43.2).

This means that up to 43 bad minutes in such a month would still meet the target. If, at any point in time, you've already had 21 bad minutes in the time window, the remaining error budget is approximately 50% (22 of 43 minutes left).
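The arithmetic in both examples can be reproduced with a short Python sketch. The function names are illustrative and not part of any Nobl9 API; the sketch only assumes that the error budget is (1 - target) multiplied by the total number of units (requests or minutes) in the time window.

```python
def error_budget(target: float, total_units: int) -> float:
    """Allowed bad units (requests or minutes) for a target such as 0.999."""
    return (1.0 - target) * total_units


def remaining_budget_pct(budget: float, bad_units: float) -> float:
    """Share of the error budget still unspent, as a percentage."""
    return (budget - bad_units) / budget * 100.0


# Occurrences: 10000 requests in the month, 5 of them bad.
occ_budget = error_budget(0.999, 10_000)
occ_remaining = remaining_budget_pct(occ_budget, 5)

# Time slices: 43200 minutes in the month, 21 of them bad.
ts_budget = error_budget(0.999, 43_200)
ts_remaining = remaining_budget_pct(ts_budget, 21)

print(f"occurrences: budget={occ_budget:.1f} requests, remaining={occ_remaining:.1f}%")
print(f"time slices: budget={ts_budget:.1f} minutes, remaining={ts_remaining:.1f}%")
```

Without rounding, the time slices budget is 43.2 minutes, so 21 bad minutes leave roughly 51.4% of the budget.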

tip

Using the remaining budget conditions in your alert policies is a simple yet effective way to monitor the health of your SLOs.

Consider this type of alerting more reactive than proactive, as it triggers when the error budget has already been consumed, no matter how slow or fast it has happened.

Basic YAML configuration

The following YAML defines an alert policy with the Remaining budget condition:

apiVersion: n9/v1alpha
kind: AlertPolicy
metadata:
  name: budget-below-20
  project: default
spec:
  alertMethods: []
  conditions:
    - measurement: burnedBudget
      value: 0.8
      op: gte
  coolDown: 5m
  description: "Error Budget is nearly exhausted (20%)"
  severity: Medium
note

The remaining budget calculation doesn't depend on the evaluation window configured with the alertingWindow parameter. This means you can't use alertingWindow with the burnedBudget measurement.

The value for this condition's lastsFor parameter defaults to 0. This configuration will alert you when you reach a specific budget level.

We don't recommend changing this value to anything greater than 0, as such a configuration might unnecessarily delay alerts when your SLO has already reached a specific budget level.
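To illustrate why a non-zero lastsFor delays the alert, here is a minimal Python sketch. It assumes that a condition fires once the burned budget has stayed at or above the threshold continuously for at least lastsFor; this is a simplification for illustration, not a description of how Nobl9 evaluates alerts internally. The function name and the sample data are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple


def first_alert_time(
    samples: Iterable[Tuple[datetime, float]],
    threshold: float,
    lasts_for: timedelta,
) -> Optional[datetime]:
    """Return when an alert would fire, or None if it never does.

    Assumed semantics: the alert fires once burned budget has stayed at or
    above `threshold` continuously for at least `lasts_for`.
    """
    held_since: Optional[datetime] = None
    for ts, burned in samples:
        if burned >= threshold:
            if held_since is None:
                held_since = ts
            if ts - held_since >= lasts_for:
                return ts
        else:
            held_since = None  # condition broken, restart the clock
    return None


start = datetime(2024, 1, 1, 12, 0)
# Hypothetical data: one sample per minute; burned budget reaches 0.8 at minute 3.
burned_series = [0.78, 0.79, 0.795, 0.80, 0.81, 0.81, 0.82, 0.82, 0.83, 0.83]
samples = [(start + timedelta(minutes=i), b) for i, b in enumerate(burned_series)]

immediate = first_alert_time(samples, 0.8, timedelta(0))
delayed = first_alert_time(samples, 0.8, timedelta(minutes=5))
print("lastsFor=0: ", immediate)
print("lastsFor=5m:", delayed)
```

With lastsFor set to 0, the alert fires on the first sample that reaches the threshold. With a 5-minute lastsFor, the same data produces the alert 5 minutes later, even though the budget level had already been reached.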

Custom mathematical operators

You can use all available mathematical operators to define the remaining budget condition:

  • lte - less than or equal to (≤)
  • gte - greater than or equal to (≥)
  • lt - less than (<)
  • gt - greater than (>)
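As a rough illustration of how these operator names relate a measured value to the configured threshold, the following Python sketch maps each name to a standard comparison. The helper names are hypothetical. The sketch also assumes that burnedBudget is expressed as a fraction of the total budget, so a remaining budget of 20% corresponds to a burned budget of 0.8, as in the basic configuration above.

```python
import operator

# Operator names taken from the list above, mapped to Python comparisons.
OPERATORS = {
    "lte": operator.le,
    "gte": operator.ge,
    "lt": operator.lt,
    "gt": operator.gt,
}


def condition_met(measured: float, op: str, threshold: float) -> bool:
    """Return True when `measured <op> threshold` holds."""
    return OPERATORS[op](measured, threshold)


def burned_from_remaining(remaining_pct: float) -> float:
    """Convert remaining budget in percent to a burned fraction (0.0 to 1.0)."""
    return 1.0 - remaining_pct / 100.0


# A policy with `value: 0.8` and `op: gte` matches once 85% of the budget is burned...
print(condition_met(0.85, "gte", 0.8))
# ...but not while only half of the budget is burned.
print(condition_met(0.5, "gte", 0.8))
```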

You may find it useful to combine a remaining budget condition that uses a custom operator with other measurements, such as timeToBurnBudget or averageBurnRate. This way, you can, for example, have Nobl9 alert you only when your SLO has already used up its entire error budget. Check the YAML samples below for more details.

The following alert policy is triggered when there is no error budget left, the entire budget is predicted to be exhausted within 8h, and this condition lasts for 15m.

apiVersion: n9/v1alpha
kind: AlertPolicy
metadata:
  name: entire-exhaustion-prediction-8h
  displayName: Entire budget exhaustion in 8h
  labels:
    type:
      - time-exhaustion
spec:
  description: Entire error budget allocation prediction for 99%, 30 Day Rolling.
  severity: Low
  coolDown: "15m"
  conditions:
    - measurement: timeToBurnEntireBudget
      value: "8h"
      lastsFor: "15m"
    - measurement: burnedBudget
      value: 1
      op: gte
tip

Check the YAML guide for the default operators used with all alerting conditions.