
Alerting on SLOs: overview


When an incident is triggered, Nobl9 can send an alert to a notification engine or tool (for example, PagerDuty). Nobl9 also supports integration with web endpoints through webhooks, where you define the endpoint and the parameters to pass.
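For illustration, below is a minimal sketch of a webhook alert method defined in YAML. The endpoint URL, project name, and template placeholders are assumptions for this example; refer to the alert methods documentation for the exact webhook fields and supported template variables.

apiVersion: n9/v1alpha
kind: AlertMethod
metadata:
  name: example-webhook
  project: alerting-test
spec:
  description: Example webhook notification endpoint
  webhook:
    # Hypothetical receiver URL; replace with your own endpoint.
    url: https://example.com/nobl9-alerts
    # Placeholders such as $slo_name are illustrative; check the webhook
    # documentation for the variables available to templates.
    template: '{"slo": "$slo_name", "severity": "$severity"}'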

Alerting on SLOs allows you to react immediately to incidents that matter from the perspective of your service's user experience (e.g., in terms of latency, errors, correctness, and other SLO-related concepts). Alerts improve your visibility into what's going on in your system and allow you to do better contributing-factor analysis when something goes wrong.

Here are important things to keep in mind while setting up your alerts:

  • Both our attention and energy are limited resources. SLO alerts must correspond to real and urgent issues of your system.

  • Keep in mind that to improve your monitoring, these alerts have to be intentional (i.e., well-defined) and need to evolve together with your system.

tip

With the Nobl9 system annotations feature, an annotation is added to the SLO objective charts by default each time an alert is triggered or resolved (these annotations are displayed regardless of whether an alert policy is silenced). For more detailed information, refer to the SLO annotations documentation.

Image 1: System annotation example

Alert policy & alert method lifecycle

Cooldown period

You can configure a cooldown period for your alert policies. Follow the YAML guide to see how to set up the cooldown period through YAML.
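As a rough sketch only, the cooldown period is set with the coolDown field on an alert policy. The policy name below reuses the example from later on this page, while the condition values and alert method reference are assumptions; see the YAML guide for the authoritative spec.

apiVersion: n9/v1alpha
kind: AlertPolicy
metadata:
  name: burn-rate-is-4x-immediately
  project: alerting-test
spec:
  description: Average burn rate is 4x
  severity: Medium
  # Cooldown interval measured from the last time all conditions were satisfied.
  coolDown: 5m
  conditions:
    # Example condition; tune the measurement and value to your SLO.
    - measurement: averageBurnRate
      value: 4
      lastsFor: 0m
  alertMethods:
    # Hypothetical alert method reference for this example.
    - metadata:
        name: example-webhook
        project: alerting-test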

What is a cooldown period?

The cooldown period is designed to prevent sending too many alerts. It is an interval measured from the last timestamp when all alert policy conditions were satisfied. Each SLO objective triggers alerts and maintains its own cooldown period.

Assumptions:

  • When the cooldown period is satisfied (i.e., no alert events are triggered during its defined duration), the alert event is resolved.

  • A new alert isn't triggered unless the cooldown period is satisfied.

  • The cooldown period may not be satisfied at a given time, in which case no alerts are triggered. However, if all the alerting conditions are satisfied again over time, the cooldown period is reset. It is then calculated from the time when any of the conditions stopped being satisfied.

The diagram below shows a simplified lifecycle of an alert policy with a defined cooldown period:

Image 2: Alerting lifecycle
tip

Check out the Alerting use case in SLOcademy for more complex examples.

Configuring cooldown period in the UI

Refer to Getting started for details.

Alert policy statuses

When an alert policy is in the Triggered state, no new alert can be triggered until the alert is resolved or canceled.

Alert policy statuses adhere to the following criteria:

  • An alert is resolved when any of the conditions stops being true AND the cooldown period has elapsed since that time.

  • An alert is canceled when the alert policy configuration has changed OR a new calendar window has started for calendar-aligned time window SLOs.

note

When an alert event assumes the resolved status within the duration of an AlertSilence, Nobl9 sends you an all-clear notification. Read more in Silencing alerts.
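For reference, silencing is configured with an AlertSilence object. The sketch below is an assumption-based example (the SLO and alert policy names reuse the triggered alert example later on this page); check Silencing alerts for the authoritative spec.

apiVersion: n9/v1alpha
kind: AlertSilence
metadata:
  name: planned-maintenance
  project: alerting-test
spec:
  description: Mute notifications during planned maintenance
  # SLO and alert policy to silence (names are examples).
  slo: prometheus-rolling-timeslices-threshold
  alertPolicy:
    name: burn-rate-is-4x-immediately
    project: alerting-test
  period:
    # Example silence window of one hour.
    duration: 1h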

Displaying triggered alerts in the UI

If any of your SLOs fires an alert, Nobl9 displays this information on the main pane of the Service Health Dashboard and next to the SLO name in the SLO Grid view and the SLO Grid view tree:

Image 3: Triggered alerts notification on the Service Health Dashboard
Image 4: Triggered alerts notification on the SLO Grid view
Image 5: Triggered alerts notification on the SLO Grid view (tree) dashboard

Retrieving triggered alerts in sloctl

Using sloctl, you can retrieve information on when an alert stopped being valid. To do so, run the following command in sloctl:

sloctl get alerts
info

For more details on the sloctl get alerts command, check sloctl User guide.

Here's an example of a triggered alert that hasn't been resolved yet:

apiVersion: n9/v1alpha
kind: Alert
metadata:
  name: 6fbc76bc-ff8a-40a2-8ac6-65d7d7a2686e
  project: alerting-test
spec:
  alertPolicy:
    name: burn-rate-is-4x-immediately
    project: alerting-test
  service:
    name: triggering-alerts-service
    project: alerting-test
  severity: Medium
  slo:
    name: prometheus-rolling-timeslices-threshold
    project: alerting-test
  status: Triggered
  objective:
    displayName: Acceptable
    name: objective-1
    value: 950
  triggeredClockTime: "2022-01-16T00:28:05Z"

Alert list

The Alert list on the SLO Grid view allows you to view alert events related to your SLO. To access the alerts, open the SLO Details view and click the Alerts tab at the top:

Image 6: Accessing alert list

You can click on each tile to check all details of every alert (see the video below):

Video 1: Browsing alert details

Nobl9 displays up to the 1000 most recent alert events. You can use filters to narrow down the results. You can filter by:

  • Alert statuses: triggered, resolved
  • SLO objective names
  • Alert policy name
  • Date (for example, last hour/week/month, current time window, or by a custom date range)
tip

Keep in mind that the filters are linked by the AND logical operator.

Image 7: Alert list filters
caution

In rare situations, Nobl9 won't return some alerts. This can happen because:

  • Nobl9 returns alerts for existing objects only (alert policies, SLOs, services, and objectives).
  • If you delete any objects, Nobl9 won't return alerts for them in the alert list.
  • If you delete an SLO, alert policy, or service and recreate it with the same name, Nobl9 won't return results for it.
  • If you unlink an alert policy from an SLO, Nobl9 won't return alerts for it.

Alert list and RBAC

Nobl9 returns alerts triggered in a given project only (the alert's project is the same as the SLO's project). If you don't have permission to view SLOs in a given project, you won't see their alerts.

tip

You can also use the sloctl get alerts command to get up to the 1000 most recent alert events and filter the results using flags, as shown below.
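For example, to scope the results to a single project, pass the project flag (shown here with the example project used elsewhere on this page; see the sloctl user guide for the full list of filtering flags):

sloctl get alerts -p alerting-test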

Labels and alert methods

Adding labels to alert methods

You can add one or more labels to an alert policy; they are sent along with the alert notification when the policy's conditions are met.
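For illustration, labels are defined under metadata on the alert policy. In the sketch below, the team and region keys and their values are arbitrary examples, and the spec is abbreviated to a single example condition; see the labels documentation for the exact format.

apiVersion: n9/v1alpha
kind: AlertPolicy
metadata:
  name: burn-rate-is-4x-immediately
  project: alerting-test
  labels:
    # Example labels; each key maps to a list of values.
    team:
      - sre
    region:
      - eu-central-1
spec:
  severity: Medium
  coolDown: 5m
  conditions:
    - measurement: averageBurnRate
      value: 4
      lastsFor: 0m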

Other relevant resources

For useful tips on how to get started with your first alert, check Your first alert policy. Also, see our Tips and tricks.

If you describe your infrastructure as code, you might also consider defining your alert methods with the same convention. You can find more details in our Terraform documentation.