Remaining budget
Each SLO has an error budget, which defines the number of acceptable errors that can occur in a given time window:
- In the occurrences method, those errors correlate to actual events that make the SLO burn through the error budget.
- In the time slices method, those errors correlate to the number of bad minutes that make the SLO burn through the error budget.
Although the methods use different units (i.e., number of errors for occurrences, minutes for time slices) to describe the error budget allocation, the remaining error budget can be represented as a percentage of the overall allocation.
Time slices vs occurrences
Suppose you have an SLO with a 99.9% availability target that uses the occurrences method, and the total number of requests in a month is 10000. You can have up to 10 bad requests in such a month and still meet the target, which means that the error budget is 10 bad requests. If, at any point in time, you've already experienced 5 bad requests within a specific time window, then the remaining error budget is 50% (5/10).
Similarly, if the SLO calculation method is time slices for the same target and a month with 43200 minutes, the error budget is 43 minutes (0.1% of 43200 is 43.2). It means that 43 bad minutes in such a month would still meet the target. If, at any point in time, you've already had 21 bad minutes in a time window, then the remaining error budget is about 50% (21 of 43 minutes burned).
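The arithmetic in both examples can be sketched in a few lines of Python. This is only an illustration of the calculation described above, not Nobl9's implementation; the function names `error_budget` and `remaining_budget_pct` are hypothetical.

```python
def error_budget(target: float, total_units: int) -> float:
    """Allowed bad units (requests or minutes) for a target such as 0.999."""
    return (1.0 - target) * total_units


def remaining_budget_pct(budget: float, burned: float) -> float:
    """Remaining error budget as a percentage of the overall allocation."""
    return 100.0 * (budget - burned) / budget


# Occurrences: 99.9% target, 10000 requests in the month -> about 10 bad requests.
occurrences_budget = error_budget(0.999, 10_000)

# Time slices: 99.9% target, 43200 minutes in the month -> about 43.2 bad minutes.
time_slices_budget = error_budget(0.999, 43_200)

# 5 bad requests burned out of 10 leaves half of the budget.
occurrences_left = remaining_budget_pct(10, 5)

# 21 bad minutes burned out of 43.2 leaves slightly more than half.
time_slices_left = remaining_budget_pct(time_slices_budget, 21)
```

Expressing both results as a percentage is what lets the two methods be compared, even though one counts requests and the other counts minutes.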
Using remaining budget conditions in your alert policies is a simple yet effective way to monitor the health of your SLOs.
Consider this type of alerting more reactive than proactive: it triggers once the error budget has already been consumed, regardless of how slowly or quickly that happened.
Basic YAML configuration
The following YAML defines an alert policy with the Remaining budget condition:
apiVersion: n9/v1alpha
kind: AlertPolicy
metadata:
  name: budget-below-20
  project: default
spec:
  alertMethods: []
  conditions:
    - measurement: burnedBudget
      value: 0.8
      op: gte
  coolDown: 5m
  description: "Error Budget is nearly exhausted (20%)"
  severity: Medium
The remaining budget calculation doesn't rely on the evaluation window, which can be configured using the alertingWindow parameter. This means you can't use alertingWindow with the burnedBudget measurement.
The value for this condition's lastsFor parameter defaults to 0. With this configuration, you are alerted as soon as the SLO reaches the specified budget level.
We don't recommend changing this value to anything greater than 0, as such a configuration might unnecessarily delay alerts when your SLO has already reached the specified budget level.
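The sample policy above pairs the name budget-below-20 with a burnedBudget value of 0.8, which suggests that the value is the burned fraction of the budget (1 meaning fully burned) rather than the remaining percentage. Under that assumption, which is inferred from the sample and not from a formal specification, the conversion can be sketched as follows. The helper names are hypothetical.

```python
def burned_threshold(remaining_pct: float) -> float:
    """Convert a remaining-budget percentage into a burned-budget fraction.

    Assumes burnedBudget is a fraction where 1.0 means the budget is fully burned.
    """
    if not 0.0 <= remaining_pct <= 100.0:
        raise ValueError("remaining_pct must be between 0 and 100")
    return 1.0 - remaining_pct / 100.0


def remaining_pct_from_burned(burned_fraction: float) -> float:
    """Inverse conversion: burned fraction back to remaining percentage."""
    return 100.0 * (1.0 - burned_fraction)


# "Alert when 20% of the budget is left" corresponds to a burned fraction of about 0.8.
threshold = burned_threshold(20.0)
```

Keeping the two directions as separate helpers makes it harder to confuse "20% remaining" with "20% burned" when writing the value field.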
Custom mathematical operators
You can use all available mathematical operators to define the remaining budget condition:
- lte: less than or equal to (≤)
- gte: greater than or equal to (≥)
- lt: less than (<)
- gt: greater than (>)
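To make the operator semantics concrete, here is a minimal Python sketch that maps the four operator names to comparisons and checks a burned-budget reading against a condition. It only illustrates the comparison logic; `condition_met` is a hypothetical helper, not part of Nobl9, and it assumes the burned budget is compared on the left-hand side of the operator.

```python
import operator

# Operator names as listed above, mapped to Python's standard comparisons.
OPS = {
    "lte": operator.le,
    "gte": operator.ge,
    "lt": operator.lt,
    "gt": operator.gt,
}


def condition_met(burned: float, value: float, op: str) -> bool:
    """Return True when `burned <op> value` holds (hypothetical helper)."""
    if op not in OPS:
        raise ValueError(f"unsupported operator: {op}")
    return OPS[op](burned, value)


# Condition from the basic configuration: burnedBudget gte 0.8.
nearly_exhausted = condition_met(0.85, 0.8, "gte")

# Condition used to require that some budget is still left: burnedBudget lt 1.
budget_left = condition_met(0.99, 1, "lt")
```

With gte and a value of 1 the condition holds only once the budget is fully burned, while lt with the same value holds only while some budget remains.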
You may find it useful to combine a remaining budget condition that uses a custom operator with other measurements, such as timeToBurnBudget or averageBurnRate. This way, Nobl9 alerts you only when your SLO is in a particular budget state, for example when it has used up its entire error budget or when it still has some budget left. Check the YAML samples below for more details.
The following alert policy is triggered when there is no budget left, the entire budget would be exhausted in 8h, and this condition lasts for 15m.
apiVersion: n9/v1alpha
kind: AlertPolicy
metadata:
  name: entire-exhaustion-prediction-8h
  displayName: Entire budget exhaustion in 8h
  labels:
    type:
      - time-exhaustion
spec:
  description: Entire error budget allocation prediction for 99%, 30 Day Rolling.
  severity: Low
  coolDown: "15m"
  conditions:
    - measurement: timeToBurnEntireBudget
      value: "8h"
      lastsFor: "15m"
    - measurement: burnedBudget
      value: 1
      op: gte
The following alert policy is triggered when an SLO still has some budget left to burn (the burnedBudget condition), the remaining budget would be exhausted in 3d, and this condition lasts for 15m (the timeToBurnBudget condition).
apiVersion: n9/v1alpha
kind: AlertPolicy
metadata:
  name: remaining-exhaustion-prediction-3d
  displayName: Remaining budget exhaustion in 3d
  labels:
    type:
      - time-exhaustion
spec:
  description: Remaining error budget Allocation prediction
  severity: Medium
  coolDown: "15m"
  conditions:
    - measurement: timeToBurnBudget
      value: "72h"
      lastsFor: "15m"
    - measurement: burnedBudget
      value: 1
      op: lt
Check the YAML guide for the default operators used with each alerting condition.