
Alerting - Use Case

See how to configure an Alert based on a real example.

Assumptions

You've attached an Alert Policy to an SLO with the following condition:

The average error budget burn rate is greater than or equal to 3, AND this condition lasts for 10 minutes, with the cooldown period set to 15 minutes.

The following configuration implements this use case:

YAML configuration

apiVersion: n9/v1alpha
kind: AlertPolicy
metadata:
  name: trigger-alert-immediately
  displayName: # string, optional
  project: # string, optional - if not defined, Nobl9 returns a default value for this field
spec:
  description: # string, optional
  severity: Medium
  cooldown: 15m
  conditions:
    - measurement: averageBurnRate
      value: 3
      op: gte
      lastsFor: 10m
  alertMethod:
    - name: discord-notification
      project: # string, optional - if not defined, Nobl9 returns a default value for this field


Overview

  • At 10:21, a single HTTP request with higher latency causes the SLO to burn budget. A spike to 5x is observed on the burn rate graph. Fortunately, it resolves almost immediately: at 10:23, the burn rate is back to 0x. No alert is triggered because the alert policy requires the burn rate condition to last at least 10 minutes.

  • At 11:15, the budget starts to burn again. Because of higher traffic, more HTTP requests affect the error budget, and the budget burns at a rate of 3.7x. At 11:25, an alert is triggered because the alert condition has been satisfied for 10 minutes.

  • At 11:29, the error budget stops burning and the burn rate is 0x. The cooldown is measured from the moment the alert condition stops being satisfied. During the cooldown period, no new alert is triggered. If the burn rate stays at 0x for another 15 minutes, the alert will be resolved and a new alert can be triggered.

  • At 11:33, the burn rate rises again (5x). It lasts until 11:35, when the burn rate returns to 0x. While the alert policy condition is satisfied, the cooldown counter is stopped and reset. No new alert is triggered because the previous alert has not been resolved yet. At 11:35, the cooldown period starts to be measured again.

  • At 11:50, the burn rate is still 0x, so the cooldown period is satisfied and the alert is resolved.

  • At 11:58, the burn rate rises to 5x. Because the previous alert has been resolved, a new alert is triggered once this condition has lasted for the 10 minutes that the policy requires.

The diagram below shows a lifecycle of the above-mentioned Alert Policy:

Figure: Alerting use case
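The lifecycle described above can be approximated with a small simulation. This is only a sketch of the behavior as this page describes it, not Nobl9's actual evaluation logic: it assumes one burn rate sample per minute, and the names `BURSTS`, `burn_rate_at`, and `simulate` are hypothetical. The last burst is assumed to last long enough (until 12:10) to meet the 10-minute `lastsFor` window.

```python
THRESHOLD = 3.0  # value: 3 with op: gte
LASTS_FOR = 10   # lastsFor: 10m, in minutes
COOLDOWN = 15    # cooldown: 15m, in minutes


def hm(text):
    """Convert 'HH:MM' to minutes since midnight."""
    hours, minutes = text.split(":")
    return int(hours) * 60 + int(minutes)


def fmt(minute):
    """Convert minutes since midnight back to 'HH:MM'."""
    return f"{minute // 60:02d}:{minute % 60:02d}"


# Half-open intervals [start, end) with their burn rate, taken from the overview.
BURSTS = [
    ("10:21", "10:23", 5.0),
    ("11:15", "11:29", 3.7),
    ("11:33", "11:35", 5.0),
    ("11:58", "12:10", 5.0),  # assumption: long enough to satisfy lastsFor
]


def burn_rate_at(minute):
    """Return the burn rate for a given minute (0x outside any burst)."""
    for start, end, rate in BURSTS:
        if hm(start) <= minute < hm(end):
            return rate
    return 0.0


def simulate(first, last):
    """Walk the timeline minute by minute and record alert events."""
    events = []
    breach_start = None    # when the current threshold breach began
    cooldown_start = None  # when the condition last stopped being satisfied
    alert_active = False
    for minute in range(hm(first), hm(last) + 1):
        if burn_rate_at(minute) >= THRESHOLD:
            if breach_start is None:
                breach_start = minute
            cooldown_start = None  # any breach resets the cooldown
            if not alert_active and minute - breach_start >= LASTS_FOR:
                alert_active = True
                events.append((fmt(minute), "triggered"))
        else:
            breach_start = None
            if alert_active:
                if cooldown_start is None:
                    cooldown_start = minute
                if minute - cooldown_start >= COOLDOWN:
                    alert_active = False
                    cooldown_start = None
                    events.append((fmt(minute), "resolved"))
    return events


events = simulate("10:00", "12:10")
for when, what in events:
    print(when, what)
```

Under these assumptions the sketch reports an alert triggered at 11:25, resolved at 11:50, and a new alert at 12:08. The short spikes at 10:21 and 11:33 trigger nothing, but the second one restarts the cooldown.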

Conclusions

  • When the cooldown period is satisfied (i.e., the alert condition is not met at any point during its defined duration), the Alert is resolved.

    • New alerts are not triggered during the cooldown period. The cooldown period is reset whenever the alert condition is satisfied, even briefly.
  • While the cooldown period is not yet satisfied, no new alerts are triggered. If the alert conditions are satisfied again during that time, the cooldown period is reset and is measured from the moment any of the conditions stops being satisfied.

    • For an alert policy with multiple conditions, all conditions must be satisfied to trigger an alert. To start measuring the cooldown period, just one condition needs to stop being satisfied.
  • A new alert is not triggered until the previously triggered alert is resolved.
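The multi-condition rule can be illustrated with a short sketch. It assumes conditions are combined with a logical AND, as stated above. The helper names `condition_met` and `policy_met`, the second condition, and the operator names other than `gte` are illustrative assumptions rather than anything taken from this page.

```python
import operator

# Comparison functions keyed by `op` name; only `gte` appears in this page's YAML.
OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}


def condition_met(condition, measurements):
    """Check one condition against the current measurement values."""
    current = measurements[condition["measurement"]]
    return OPS[condition["op"]](current, condition["value"])


def policy_met(conditions, measurements):
    """The policy counts as satisfied only if every condition holds at once."""
    return all(condition_met(c, measurements) for c in conditions)


# Hypothetical two-condition policy; the second condition is illustrative only.
conditions = [
    {"measurement": "averageBurnRate", "op": "gte", "value": 3},
    {"measurement": "burnedBudget", "op": "gte", "value": 0.5},
]

both_hold = policy_met(conditions, {"averageBurnRate": 3.7, "burnedBudget": 0.6})
one_stopped = policy_met(conditions, {"averageBurnRate": 0.0, "burnedBudget": 0.6})
print(both_hold, one_stopped)
```

As soon as one condition stops holding, the policy as a whole is no longer satisfied, which is the point at which the cooldown would start to be measured.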

Tip

Avoid adding alert policies whose conditions are always satisfied, for example, Average burn rate >= 0. This condition is satisfied both when the budget is burning and when it is not.


Good job! You can now leverage Nobl9 Alerts like a pro!