
Alerting—general use case


This guide explores the practical implementation of Nobl9 alerting mechanisms based on service level objectives. We'll review a specific configuration of an Alert Policy, see its lifecycle, and focus on how firing alerts are tied to the cooldown period.


Let's assume you've attached an alert policy to an SLO with the following condition:

The average error budget burn rate is greater than or equal to 3, and this condition lasts for 10 minutes, with the cooldown period set to 15 minutes.

Here's the YAML configuration for this use case:

apiVersion: n9/v1alpha
kind: AlertPolicy
metadata:
  name: trigger-alert-immediately
  displayName: # string, optional
  project: # string, optional - if not defined, Nobl9 returns a default value for this field
spec:
  description: # string, optional
  severity: Medium
  coolDown: 15m
  conditions:
    - measurement: averageBurnRate
      value: 3
      op: gte
      lastsFor: 10m
  alertMethods:
    - metadata:
        name: discord-notification
        project: # string, optional - if not defined, Nobl9 returns a default value for this field
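
To make the condition concrete, here is a minimal Python sketch of how a "burn rate >= 3 lasting 10 minutes" check could be evaluated. It assumes one burn rate sample per minute and treats each sample as the already-averaged burn rate. The function name `condition_holds` and its parameters are hypothetical. This only illustrates the rule and is not Nobl9's implementation.

```python
from typing import Sequence


def condition_holds(
    burn_rates: Sequence[float],
    threshold: float = 3.0,  # mirrors `value: 3` with `op: gte`
    lasts_for: int = 10,     # mirrors `lastsFor: 10m`, assuming one sample per minute
) -> bool:
    """Return True if the most recent `lasts_for` samples are all >= threshold."""
    if len(burn_rates) < lasts_for:
        return False
    return all(rate >= threshold for rate in burn_rates[-lasts_for:])


# A brief 5x spike does not satisfy the condition...
short_spike = [0.0] * 9 + [5.0]
# ...but ten consecutive minutes at 3.7x do.
sustained_burn = [3.7] * 10
```

Under these assumptions, `condition_holds(short_spike)` is false and `condition_holds(sustained_burn)` is true, which matches the first two steps of the lifecycle described in this guide.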

Overview of the alert policy's lifecycle

A single HTTP request with higher latency causes the SLO to burn its error budget. The burn rate graph shows a spike to 5x. Fortunately, it is resolved almost immediately: at 10:23, the burn rate is back to 0x. No alert is triggered because the alert policy requires the burn rate condition to hold for at least 10 minutes.

Our SLO's error budget starts to burn again. Because of higher traffic, more HTTP requests affect the error budget, which now burns at a rate of 3.7x. At 11:25, an alert is triggered because the alert condition is satisfied.

Our SLO's error budget stops burning, and the burn rate is 0x. Once the alert condition is no longer satisfied, the cooldown period starts to be measured. During the cooldown period, no new alerts are triggered. If the burn rate stays at 0x for the full 15 minutes, the alert is resolved, and a new alert can be triggered.

The burn rate peaks again (5x). It lasts until 11:35, and then the burn rate drops to 0x again. While the burn rate is high, the alert policy condition is satisfied, so the cooldown counter is stopped and reset. No new alert is triggered because the previous alert has not been resolved yet. At 11:35, the cooldown period starts to be measured again.

The burn rate is still 0x, and the cooldown period has elapsed. The alert is resolved.

The burn rate is 5x, lasting for the next 5 minutes. A new alert is triggered.

The diagram below illustrates the lifecycle of the alert policy described above:
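
The same lifecycle can also be sketched as a small state machine. The Python example below is a simplified model, not Nobl9's implementation. It assumes one evaluation per minute, and each input value says whether the alert condition (including its `lastsFor` requirement) is satisfied at that minute. The function name `simulate` and the timeline are hypothetical, and minutes are relative indexes rather than the clock times used above.

```python
from typing import List, Sequence, Tuple


def simulate(condition_met: Sequence[bool], cooldown: int = 15) -> List[Tuple[int, str]]:
    """Replay per-minute condition results and return (minute, event) pairs."""
    events: List[Tuple[int, str]] = []
    alert_active = False
    quiet_minutes = 0  # minutes since the condition was last satisfied

    for minute, met in enumerate(condition_met):
        if met:
            quiet_minutes = 0  # any satisfied minute resets the cooldown counter
            if not alert_active:
                alert_active = True
                events.append((minute, "triggered"))
        elif alert_active:
            quiet_minutes += 1
            if quiet_minutes >= cooldown:
                alert_active = False
                quiet_minutes = 0
                events.append((minute, "resolved"))
    return events


timeline = (
    [True] * 5      # condition satisfied: alert triggered at minute 0
    + [False] * 5   # cooldown starts, but only 5 quiet minutes pass
    + [True] * 2    # condition satisfied again: cooldown reset, no new alert
    + [False] * 15  # full 15-minute cooldown: alert resolved
    + [True]        # condition satisfied after resolution: new alert
)
events = simulate(timeline)
```

With this timeline, `events` contains one trigger at minute 0, one resolution at minute 26, and a second trigger at minute 27. The two satisfied minutes in the middle produce no event; they only push the resolution later.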

Key takeaways

When the cooldown condition is satisfied (i.e., the alert condition is not met at any point during the defined cooldown duration), the alert event is resolved.

New alerts are not triggered during the cooldown period. The cooldown period is reset whenever the alert condition is satisfied, even briefly.

While the cooldown period has not yet elapsed, the alert stays unresolved and no new alerts are triggered.

However, if all alert conditions become satisfied again in the meantime, the cooldown period is reset. It is measured again from the moment any of the conditions stops being satisfied.

For an alert policy with multiple conditions, all conditions must be met to trigger an alert. However, the cooldown period starts to be measured as soon as at least one condition is no longer fulfilled.

New alerts won't be triggered until the previously triggered alert is resolved.

Avoid adding alert policies whose conditions are always satisfied, for example, Average burn rate >= 0. This condition is satisfied both when the budget is burning and when it is not, so the cooldown period can never start and the alert is never resolved.
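
The multi-condition rule and the always-satisfied pitfall can be illustrated with a short Python sketch. It is a simplified model under stated assumptions: the operator name `gte` comes from the YAML in this guide, while the other operator names, the `otherMeasurement` name, and both helper functions are hypothetical and used only for illustration.

```python
import operator
from typing import Dict, List, Tuple

# `gte` appears in this guide's YAML; the remaining operator names are assumed.
OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}

Condition = Tuple[str, str, float]  # (measurement, op, value)


def policy_satisfied(conditions: List[Condition], measurements: Dict[str, float]) -> bool:
    """An alert can trigger only when every condition holds."""
    return all(OPS[op](measurements[name], value) for name, op, value in conditions)


def cooldown_can_run(conditions: List[Condition], measurements: Dict[str, float]) -> bool:
    """The cooldown is measured once at least one condition no longer holds."""
    return not policy_satisfied(conditions, measurements)


# A condition that holds whether or not the budget is burning.
always_true = [("averageBurnRate", "gte", 0.0)]

# Two conditions; `otherMeasurement` is a hypothetical second measurement.
two_conditions = [("averageBurnRate", "gte", 3.0), ("otherMeasurement", "gte", 0.5)]
```

In this model, `cooldown_can_run(always_true, {"averageBurnRate": 0.0})` is false, so an alert from such a policy would never resolve. For `two_conditions`, a single unmet condition is enough for the cooldown to start being measured.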
