Alerting—general use case

This guide explores a practical implementation of Nobl9 alerting based on service level objectives (SLOs). We'll review a specific Alert Policy configuration, walk through its lifecycle, and focus on how firing alerts relate to the cooldown period.

Assumptions

Let's assume you've attached an alert policy to an SLO with the following condition:

The average error budget burn rate is greater than or equal to 3, this condition lasts 10 minutes, and the cooldown period is set to 15 minutes.

Here's the YAML configuration for this use case:

apiVersion: n9/v1alpha
kind: AlertPolicy
metadata:
  name: trigger-alert-immediately
  displayName: # string, optional
  project: # string, optional - if not defined, Nobl9 returns a default value for this field
spec:
  description: # string, optional
  severity: Medium
  cooldown: 15m
  conditions:
    - measurement: averageBurnRate
      value: 3
      op: gte
      lastsFor: 10m
  alertMethod:
    - name: discord-notification
      project: # string, optional - if not defined, Nobl9 returns a default value for this field
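As a rough illustration of how the `conditions` entry above can be read, here is a minimal Python sketch. It is not Nobl9's implementation: the function names, the one-sample-per-minute input, and the `OPS` mapping are assumptions made for this example only.

```python
import operator

# Hypothetical mapping from the policy's `op` field to a comparison.
# Only the operator used in this guide (gte) and gt are sketched here.
OPS = {"gte": operator.ge, "gt": operator.gt}


def parse_minutes(duration: str) -> int:
    """Parse durations like '10m'; other units are out of scope for this sketch."""
    if not duration.endswith("m"):
        raise ValueError(f"unsupported duration: {duration!r}")
    return int(duration[:-1])


def condition_holds(samples, value, op, lasts_for):
    """Return True if the newest samples satisfy the comparison for `lasts_for`.

    `samples` is a list of burn rates, one per minute, oldest first (assumed).
    Spanning N minutes takes N + 1 consecutive per-minute samples.
    """
    needed = parse_minutes(lasts_for) + 1
    if len(samples) < needed:
        return False
    compare = OPS[op]
    return all(compare(rate, value) for rate in samples[-needed:])


# Eleven samples at 3.7x span 10 minutes, so the condition holds.
print(condition_holds([3.7] * 11, value=3, op="gte", lasts_for="10m"))  # True
# A short 5x spike does not last long enough.
print(condition_holds([5.0, 5.0, 0.0], value=3, op="gte", lasts_for="10m"))  # False
```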

Overview of the alert policy's lifecycle

10:21AM
A single HTTP request with higher latency causes the SLO to burn its error budget. The burn rate graph shows a spike of 5x. Fortunately, it resolves almost immediately: at 10:23, the burn rate is back to 0x. No alert is triggered because the alert policy requires the burn rate condition to last at least 10 minutes.

11:15AM
Our SLO's error budget starts to burn again. Because of higher traffic, more HTTP requests affect the error budget, which burns at a rate of 3.7x. At 11:25, an alert is triggered because the alerting condition has been satisfied for 10 minutes.

11:29AM
Our SLO's error budget stops burning, and the burn rate is 0x. Once the alerting condition is no longer satisfied, the cooldown period starts counting. During the cooldown period, no new alerts are triggered. If the burn rate stays at 0x for the next 15 minutes, the alert is resolved, and a new alert can be triggered.

11:33AM
The burn rate peaks again at 5x. It stays there until 11:35 and then drops back to 0x. Because the alert policy condition is satisfied again, the cooldown counter stops and is reset. No new alert is triggered because the previous alert is not yet resolved. At 11:35, the cooldown period starts counting again from zero.

11:50AM
The burn rate has stayed at 0x for 15 minutes, so the cooldown period is satisfied. The alert is resolved.

11:58AM
The burn rate is 5x, lasting for the next 5 minutes. A new alert is triggered.
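The timeline from 10:21 to 11:50 can be replayed with a small state machine. The sketch below is an illustration only, not Nobl9's evaluation logic. It assumes one evaluation per minute, burn intervals that end at the minute the rate returns to 0x, and that any minute at or above the threshold resets the cooldown counter (as in the 11:33 entry). The names `BURNS`, `burn_rate`, and `replay` are hypothetical.

```python
THRESHOLD = 3.0  # value: 3, op: gte
LASTS_FOR = 10   # lastsFor: 10m, in minutes
COOLDOWN = 15    # cooldown: 15m, in minutes


def minute(hhmm: str) -> int:
    """Convert 'HH:MM' to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def clock(t: int) -> str:
    """Convert minutes since midnight back to 'HH:MM'."""
    return f"{t // 60:02d}:{t % 60:02d}"


# Burn intervals from the timeline: (start inclusive, end exclusive, rate).
BURNS = [("10:21", "10:23", 5.0), ("11:15", "11:29", 3.7), ("11:33", "11:35", 5.0)]


def burn_rate(t: int) -> float:
    for start, end, rate in BURNS:
        if minute(start) <= t < minute(end):
            return rate
    return 0.0


def replay(start: str, end: str):
    """Step through the timeline minute by minute and collect alert events."""
    events = []
    alert_active = False
    breach_since = None  # first minute of the current streak above the threshold
    calm_since = None    # minute at which the cooldown started counting
    for t in range(minute(start), minute(end) + 1):
        if burn_rate(t) >= THRESHOLD:
            if breach_since is None:
                breach_since = t
            calm_since = None  # any breach resets the cooldown counter
            if not alert_active and t - breach_since >= LASTS_FOR:
                alert_active = True
                events.append((clock(t), "triggered"))
        else:
            breach_since = None
            if alert_active:
                if calm_since is None:
                    calm_since = t
                if t - calm_since >= COOLDOWN:
                    alert_active = False
                    calm_since = None
                    events.append((clock(t), "resolved"))
    return events


print(replay("10:00", "11:55"))
# [('11:25', 'triggered'), ('11:50', 'resolved')]
```

Under these assumptions, the 10:21 spike produces no event, the alert triggers at 11:25, the 11:33 spike only resets the cooldown, and the alert resolves at 11:50.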

The diagram below illustrates the lifecycle of the alert policy described above:

Key takeaways

When the cooldown period is satisfied (that is, the alert conditions are not met at any point during its defined duration), the alert event is resolved.

New alerts are not triggered during the cooldown period. The cooldown period is reset whenever an alert condition is satisfied, even briefly.

While the cooldown period has not yet elapsed, no new alerts are triggered.

However, if all alert conditions are satisfied again during that time, the cooldown period is reset. It starts counting again as soon as any of the conditions stops being satisfied.

For an alert policy with multiple conditions, all conditions must be met to trigger an alert. For the cooldown period to start counting, however, at least one condition must no longer be fulfilled.

New alerts won't be triggered unless the previously triggered alert is resolved.
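The rule for policies with multiple conditions can be summarized in a few lines of Python. This is a sketch of the logic stated in the takeaways, not Nobl9 code, and the function name and returned keys are hypothetical.

```python
def policy_state(condition_results: list[bool]) -> dict[str, bool]:
    """Summarize a multi-condition policy at one evaluation step.

    `condition_results` holds one boolean per condition in the policy
    (True means that condition is currently met).
    """
    if not condition_results:
        raise ValueError("an alert policy needs at least one condition")
    all_met = all(condition_results)
    return {
        # All conditions must be met before an alert can be triggered.
        "conditions_satisfied": all_met,
        # The cooldown can only count while at least one condition is unmet.
        "cooldown_may_count": not all_met,
    }


print(policy_state([True, True]))
# {'conditions_satisfied': True, 'cooldown_may_count': False}
print(policy_state([True, False]))
# {'conditions_satisfied': False, 'cooldown_may_count': True}
```

Even when `conditions_satisfied` is true, a new alert is triggered only if the previously triggered alert has already been resolved.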

Tip

Avoid alert policy conditions that are always satisfied, for example, average burn rate >= 0. This condition is satisfied both when the budget is burning and when it is not.
