Nobl9 alerting conditions
Nobl9 offers multiple ways to set up alerts. These methods are all centered around the error budget, which determines the maximum duration a system can malfunction without consequences.
This approach helps you protect your error budget and ensures that it doesn't get burned or used up entirely. Nobl9 alerting logic is flexible, allowing you to create a model that suits your needs.
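As a rough illustration of the error budget idea, the sketch below computes a time-based budget from an SLO target and a time window. The function name `allowed_downtime` and the 99.9% / 28-day figures are assumptions chosen for this example, not Nobl9 defaults, and Nobl9's own budgeting methods may calculate the budget differently.

```python
from datetime import timedelta


def allowed_downtime(target: float, window: timedelta) -> timedelta:
    """Return a time-based error budget for an SLO target in (0, 1).

    Hypothetical helper for illustration only.
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must be between 0 and 1, exclusive")
    # The budget is the share of the window in which the objective may be missed.
    return window * (1.0 - target)


if __name__ == "__main__":
    budget = allowed_downtime(target=0.999, window=timedelta(days=28))
    print(f"Error budget: {budget.total_seconds() / 60:.2f} minutes")
```

Under these assumptions, a 99.9% target over 28 days leaves roughly 40 minutes of budget. Alerting conditions exist to warn you before that allowance is gone.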
Nobl9 alerting logic rests on two key features:
- Feature 1:
Nobl9 fires alerts as soon as an SLO is in a specific state, so you can act on it here and now.
- Feature 2:
Nobl9 fires alerts if an SLO could enter a particular state at a given point in time, so you can prevent it.
In Nobl9, both of these features are served by alerting conditions, which are the building blocks of alert policies. Each alerting condition evaluates a specific measurement and must be defined to detect (or prevent) a specific error budget burn state.
This way, Nobl9 allows you to configure your alerts according to the following:
- Measurement that’s used for alerting logic. Each measurement is defined as a function of time (exhaustion conditions), burn rate (error budget burn rate conditions), or error budget (remaining error budget conditions). Each measurement also corresponds to a specific condition type.
- The error budget’s burnout characteristic. The most popular types of alerting conditions are:
Overview of available alerting conditions
Depending on the alert policy configuration, Nobl9 can notify you when:
Remaining error budget would be exhausted in the near or distant future. In this condition, exhaustion time prediction becomes more sensitive as your remaining budget decreases. Once your SLO has no error budget left, even the slightest amount of burn will trigger an alert.
Entire error budget would be exhausted in the near or distant future. This prediction is based on the allocation of your entire error budget and depends only on the current burn rate. Use it to define alerts based on time rather than the burn rate function and avoid the remaining budget value impacting the prediction.
Note:
The Entire/Remaining error budget would be exhausted conditions are triggered when, based on the current burn rate, Nobl9 predicts the burn of an entire/remaining budget allocation in the configured period.
Remaining error budget would be exhausted uses the timeToBurnBudget measurement when verifying alerting conditions, while Entire error budget would be exhausted uses the
The average error budget burn rate is greater than or equal to the threshold and lasts for some period. This alerting condition helps catch burn rate spikes independently of the burned budget.
The remaining error budget is below the threshold. It allows for the most straightforward configurations that will alert you when you reach a specific level of error budget, regardless of how quickly or slowly you reach it.
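The difference between these conditions can be sketched with simple burn rate arithmetic. This is a simplified model, assuming that a burn rate of 1 consumes the entire budget in exactly one time window. All function names are hypothetical, the strict "less than" comparison is an assumption, and none of this is Nobl9's actual implementation.

```python
from datetime import timedelta
from typing import Optional


def time_to_burn_entire(window: timedelta, burn_rate: float) -> Optional[timedelta]:
    """Predicted time to burn a full budget allocation at the current burn rate."""
    if burn_rate <= 0:
        return None  # nothing is burning, so no exhaustion is predicted
    return window / burn_rate


def time_to_burn_remaining(
    window: timedelta, burn_rate: float, remaining_fraction: float
) -> Optional[timedelta]:
    """Predicted time to burn only what is left of the budget."""
    if burn_rate <= 0:
        return None
    # An overspent budget (negative fraction) is treated as already empty.
    remaining = max(remaining_fraction, 0.0)
    return window * remaining / burn_rate


def would_be_exhausted(prediction: Optional[timedelta], threshold: timedelta) -> bool:
    """Exhaustion condition: the predicted time is shorter than the threshold."""
    return prediction is not None and prediction < threshold


def burn_rate_exceeded(average_burn_rate: float, threshold: float) -> bool:
    """Burn rate condition: average burn rate is at or above the threshold."""
    return average_burn_rate >= threshold


def budget_below(remaining_fraction: float, threshold: float) -> bool:
    """Remaining budget condition: the budget left is below the threshold."""
    return remaining_fraction < threshold


if __name__ == "__main__":
    window = timedelta(days=28)
    three_days = timedelta(days=3)
    entire = time_to_burn_entire(window, burn_rate=4.0)
    remaining = time_to_burn_remaining(window, burn_rate=4.0, remaining_fraction=0.25)
    print("entire budget alert:", would_be_exhausted(entire, three_days))
    print("remaining budget alert:", would_be_exhausted(remaining, three_days))
```

In this model, the same burn rate leaves the entire-budget prediction above a 3-day threshold while the remaining-budget prediction falls below it, because only a quarter of the budget is left. With nothing left, any positive burn rate yields a prediction of zero, which matches the sensitivity described above.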
Slow burn and fast burn conditions
To measure slow burn and fast burn scenarios, you can use the Entire/Remaining error budget would be exhausted or the Average error budget burn rate conditions. For example:
- Error budget would be exhausted in 3 days and lasts for 10m (fast burn):
Detects short but significant spikes in burn rate over a brief timeframe.
- Error budget would be exhausted in 3 days and lasts for 1h (slow burn):
Detects a gradual budget burn over a prolonged timeframe.
Check the Fast and slow burn guide to learn more.
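The role of the "lasts for" duration can be illustrated with a small sketch. It assumes that a condition is evaluated at discrete timestamps and must stay true for a continuous stretch at least as long as the configured duration. The `held_for` function and the sample data are hypothetical and are not part of Nobl9.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def held_for(samples: Iterable[Tuple[datetime, bool]], lasts_for: timedelta) -> bool:
    """Return True if the condition stayed true for at least `lasts_for`.

    `samples` is a time-ordered sequence of (timestamp, condition_met) pairs.
    """
    streak_start = None
    for timestamp, condition_met in samples:
        if not condition_met:
            streak_start = None  # any gap resets the streak
            continue
        if streak_start is None:
            streak_start = timestamp
        if timestamp - streak_start >= lasts_for:
            return True
    return False


if __name__ == "__main__":
    start = datetime(2024, 1, 1)
    # A 15-minute spike during which the exhaustion prediction stays under 3 days.
    spike = [(start + timedelta(minutes=i), True) for i in range(16)]
    print("lasts for 10m:", held_for(spike, timedelta(minutes=10)))
    print("lasts for 1h:", held_for(spike, timedelta(hours=1)))
```

With the same 3-day exhaustion threshold, the shorter duration reacts to the brief spike, while the longer one ignores it and fires only when the burn persists.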
This section of the Nobl9 documentation covers Nobl9 error budget calculations in more depth and explains how they are tied to alert policies and alert methods. Check the specific guides to learn more about the inner workings of Nobl9 alerting: