
Alerting

Alerting is one of the fundamental building blocks of reliability. However, it can get complex due to many factors, such as having many SLOs with different thresholds and targets, determining the proper severity and notification channels, and managing alert fatigue.

Nobl9 aims to manage these complexities by providing building blocks for configuring simple and effective alerting logic.

tip

With the Nobl9 system annotations feature, an annotation is added to the SLO objective charts by default each time an alert is triggered or resolved. System annotations are displayed regardless of whether an alert policy is silenced.

See SLO annotations for more details.

Image 1: System annotation example

Alert policies​

To configure the alerting logic for Nobl9, you can use alert policies that contain the necessary parameters. These parameters include:

  • Alert conditions: rules that define when an alert will be triggered. If an alert policy has multiple conditions, all of them must be satisfied for Nobl9 to trigger an alert. You can think of it as a logical AND between the conditions.

  • Cooldown period: a defined amount of time that must pass after the conditions in a policy are no longer met. Cooldown helps to prevent alert fatigue when rules are set to evaluate over smaller time windows and the SLI is particularly volatile. Once the cooldown period has passed, the current alert is considered resolved and a new one can be triggered.

    tip

    You can use the AlertSilence feature to mitigate ongoing alert fatigue. Read more in Silencing alerts.

  • Alert methods: notification channels that will be used to send the alert. Nobl9 sends the notification to all alert methods configured in the alert policy.

    tip

    You can create alert methods independently of the alert policy (in the Alerts tab in the Nobl9 UI, via sloctl, or with the Nobl9 Terraform provider), and you can reuse them across multiple alert policies.

    You can link alert methods of your choice to an alert policy when you create it or link them later. Read more in Alert methods.

  • Additional settings, such as the severity of the alert, labels, and metadata (name, description, and project)

You can create alert policies in the Alerts view or using sloctl. Follow the YAML guide to see how to set them up through YAML.
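For orientation, here's a minimal sketch of what an alert policy could look like in YAML, assuming the schema described in the YAML guide. The policy name and project are taken from the Alert example later on this page; the condition values, the cooldown, and the fictional-pagerduty alert method are placeholders.

apiVersion: n9/v1alpha
kind: AlertPolicy
metadata:
  name: burn-rate-is-4x-immediately
  project: alerting-test
spec:
  description: Alert as soon as the average burn rate exceeds 4x
  severity: Medium
  coolDown: 5m                      # cooldown period
  conditions:                       # all conditions must be met (logical AND)
    - measurement: averageBurnRate
      value: 4
      alertingWindow: 5m            # evaluate the burn rate over the last 5 minutes
  alertMethods:                     # notifications go to every linked alert method
    - metadata:
        name: fictional-pagerduty   # hypothetical, previously created alert method
        project: alerting-test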

Lifecycle of an alert​

Each alert has a lifecycle associated with the configured alert policy and the objective against which it is evaluated. This lifecycle corresponds to specific statuses.

Alert statuses​

There are three possible statuses for an alert:

  • Triggered
  • Resolved
  • Canceled

An alert policy triggers alerts independently of other policies for each objective of any SLO it is linked to. A single alert policy can generate multiple alerts for an SLO, but only one can be in a Triggered state for each objective in this SLO. This means no new alerts from a specific alert policy can be triggered while there is a Triggered alert for this objective.

If an SLO has multiple alert policies, each has its own lifecycle and can trigger alerts independently.

After an alert policy generates an alert, it's in a Triggered state. It remains in this state until the alert conditions are no longer satisfied and the cooldown period has passed; once that happens, the alert moves to a Resolved state.

If you change the configuration of an alert policy, or a new calendar window starts for calendar-aligned time window SLOs, the alert moves to a Canceled state.

note

When an alert is resolved during an active AlertSilence period, Nobl9 sends you an all-clear notification. Read more in Silencing alerts.
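For context, silencing is configured with a separate AlertSilence resource. The sketch below is only an assumption of its shape (see Silencing alerts for the authoritative schema); the name and one-hour period are hypothetical, while the SLO and alert policy names come from the Alert example later on this page.

apiVersion: n9/v1alpha
kind: AlertSilence
metadata:
  name: silence-planned-maintenance   # hypothetical name
  project: alerting-test
spec:
  description: Silence alerts during planned maintenance
  slo: prometheus-rolling-timeslices-threshold
  alertPolicy:
    name: burn-rate-is-4x-immediately
    project: alerting-test
  period:
    duration: 1h                       # silence for one hour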

Cooldown period​

The cooldown period is designed to prevent sending too many alerts. It is an interval measured from the last timestamp when all alert policy conditions were satisfied.

Each SLO objective keeps its cooldown period separately from other objectives. If multiple alert policies are configured, each cooldown is evaluated independently and does not impact the lifecycle of other alert policies.

Assumptions for cooldown

  • An alert is resolved when the cooldown conditions are satisfied, i.e., the alert policy evaluation states that its conditions have not been met for at least the cooldown duration.
  • No new alerts from an alert policy can be triggered for an objective unless the cooldown period is satisfied.
  • Cooldown is reset each time the conditions are satisfied. If you set a long cooldown (for example, 1 hour), even a single spike during the cooldown period restarts the timer to 1 hour, effectively extending the timespan of the alert.
  • Cooldown can be re-evaluated only when a new data point arrives. If you set a 5-minute cooldown but your points come in every 10 minutes, the cooldown is re-evaluated every 10 minutes. In that case, the alert is resolved 10 minutes after the incident is resolved, even though the cooldown period is set to 5 minutes.
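As a hedged illustration of these rules (field names assumed to match the alert policy sketch above), the fragment below sets a long cooldown: the condition must hold for 15 minutes before an alert triggers, and the alert resolves only after the condition has stayed unmet for a full hour.

spec:
  coolDown: 1h                     # a single spike during this hour restarts the timer
  conditions:
    - measurement: averageBurnRate
      value: 2
      lastsFor: 15m                # condition must hold continuously for 15 minutes to trigger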

The diagram below shows a simplified lifecycle of an alert policy with a defined cooldown period and the lastsFor parameter.

note

This diagram represents a scenario with the lastsFor parameter set.

As an alternative to lastsFor, you can use alerting_window to alert immediately on an incident based on the specified evaluation window.

Learn more about these parameters in the Nobl9 observation model for alerting.
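To make the difference concrete, here are two hedged condition sketches; the YAML field names lastsFor and alertingWindow are assumed to correspond to the parameters above (alerting_window is the Terraform spelling), and the burn-rate values are placeholders.

# Variant 1: trigger only after the condition has held continuously for 10 minutes
conditions:
  - measurement: averageBurnRate
    value: 2
    lastsFor: 10m

# Variant 2: evaluate the average burn rate over a trailing 30-minute window and alert immediately
conditions:
  - measurement: averageBurnRate
    value: 2
    alertingWindow: 30m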

Image 2: Alerting lifecycle
tip

Check out the Alerting use case in SLOcademy for more complex examples.

Displaying triggered alerts in the UI​

If any of your SLOs triggers alerts, Nobl9 will display this information on the main pane of the Service Health dashboard, next to the SLO name in the SLO Grid view and the SLO Grid view tree:

Image 3: Triggered alerts notification on the Service Health dashboard
Image 4: Triggered alerts notification on the SLO Grid view
Image 5: Triggered alerts notification on the SLO Grid view (tree)

Retrieving triggered alerts in sloctl​

Using sloctl, you can retrieve triggered alerts and see when an alert stopped being valid. To do so, run the following command:

sloctl get alerts
info

For more details on the sloctl get alerts command, see sloctl User guide.

Here's an example of a triggered alert that hasn't been resolved yet:

apiVersion: n9/v1alpha
kind: Alert
metadata:
  name: 6fbc76bc-ff8a-40a2-8ac6-65d7d7a2686e
  project: alerting-test
spec:
  alertPolicy:
    name: burn-rate-is-4x-immediately
    project: alerting-test
  service:
    name: triggering-alerts-service
    project: alerting-test
  severity: Medium
  slo:
    name: prometheus-rolling-timeslices-threshold
    project: alerting-test
  status: Triggered
  objective:
    displayName: Acceptable
    name: objective-1
    value: 950
  triggeredClockTime: "2022-01-16T00:28:05Z"

If you describe your infrastructure as code, you might also consider defining alert methods with the same convention. You can find more details in our Terraform documentation.

Adding labels to alert policies​

You can add one or more labels to an alert policy. They'll be sent along with the alert notification when the policy’s conditions are met.
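As a hedged sketch, labels are attached under the alert policy's metadata, following the same label format as other Nobl9 resources; the keys and values below are made up for illustration:

metadata:
  name: burn-rate-is-4x-immediately
  project: alerting-test
  labels:
    team:                 # hypothetical label key
      - platform
    env:
      - prod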

Alerting resources​

  • Alerting quickstart: Get started with alerting
  • Observation model: Learn about the Nobl9 observation model
  • Alert methods: Learn about Nobl9 alert methods
  • Alert conditions: Learn about Nobl9 alert conditions
  • Alert presets: Check how to use alert presets
  • Alert details: Learn how Nobl9 visualizes fired alerts
  • Alert use case: Check a use case of Nobl9 alerting
  • Alert silence: See how to silence noisy alerts
  • Metric health notifier: See how to get notified when your SLO receives no data
  • Error budget calculations: Learn more about error budget calculations