
Alerting center
Beta


The Alerting Center is a beta feature designed to help you manage site reliability and respond to production issues efficiently. It provides a centralized dashboard with a heat-mapped timeline of all past and active alerts related to SLOs.

With the Alerting Center, you get:

Centralized alert management
Access all alerts in a single view without navigating through individual SLOs to identify alerts.
Real-time active alert updates
The Live updates section displays the number of unresolved alerts triggered within your selected time window. Live updates are suspended when the time window is paused.
Heatmap visualization
The heatmap visualizes alert intensity, aggregating alerts into 60 equally sized time buckets. Each bucket's duration is determined by dividing the selected time window into 60 equal parts. The darker the bucket's color, the more alerts are in it.
From overview to detail
Clicking a heatmap bucket opens the list of affected SLOs. Access individual SLO details (Alerts tab) to identify critical areas and prioritize responses.
Flexible filtering and grouping
Filter alerts by status and severity, and group them by resource, to focus on specific subsets of alerts based on their priority and relevance.

Overview

To access the Alerting Center, select Alerts in the main navigation panel.

Image 1: Main view of alerting center

Alerting Center and RBAC

The Alerting Center displays all alerts for SLOs in projects where you have at least view permission. See RBAC for more details.

Heatmap

The heatmap displays alert intensity by resource: the darker a bucket's color, the more alerts it contains.

The heatmap defaults to displaying:

  • Alerts fired during the last 24 hours
  • Triggered and resolved alerts
  • Alerts grouped by project
  • Resources sorted alphabetically
  • Alerts of all severities

The heatmap aggregates alerts into 60 buckets, each representing a specific time interval. The duration of each bucket interval depends on the selected time window: it is calculated by dividing the time window into 60 equal parts.

For reference, the following table shows how the bucket interval changes based on the selected time window:

Time window duration    Bucket interval
1 hour                  60 seconds
24 hours                24 minutes
1 week                  168 minutes
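
The arithmetic behind the table can be reproduced with a short Python sketch. It is illustrative only and not part of the Alerting Center: the function names `bucket_interval` and `bucket_index` are hypothetical, and the handling of window boundaries (start inclusive, end exclusive) is an assumption, since the documentation does not specify it.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

BUCKET_COUNT = 60  # the heatmap divides the time window into 60 equal parts


def bucket_interval(window: timedelta) -> timedelta:
    """Return the duration of one heatmap bucket for the given time window."""
    return window / BUCKET_COUNT


def bucket_index(alert_time: datetime, window_start: datetime,
                 window: timedelta) -> Optional[int]:
    """Return the 0-based bucket an alert falls into, or None if it lies
    outside the window. Boundary handling here is an assumption."""
    offset = alert_time - window_start
    if offset < timedelta(0) or offset >= window:
        return None
    # timedelta // timedelta yields an int: the number of whole buckets elapsed
    return offset // bucket_interval(window)


if __name__ == "__main__":
    # Reproduce the rows of the reference table
    for label, window in [("1 hour", timedelta(hours=1)),
                          ("24 hours", timedelta(hours=24)),
                          ("1 week", timedelta(weeks=1))]:
        print(label, "->", bucket_interval(window))

    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    fired = start + timedelta(minutes=25)
    print(bucket_index(fired, start, timedelta(hours=24)))
```

With a 24-hour window, an alert fired 25 minutes after the window start lands in the second bucket (index 1), because each bucket covers 24 minutes.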

Alerts can be filtered by status (triggered, resolved) and severity (low, medium, high), and grouped by resource (project, service, alert policy, or SLO).
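
To make the filter and grouping semantics concrete, here is a minimal Python sketch that applies them to in-memory records. It does not call any Alerting Center API: the `Alert` record, its field names, and the helpers `filter_alerts` and `count_by` are hypothetical, chosen only to mirror the status, severity, and resource options listed above.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

# Hypothetical field names mirroring the grouping options in the UI
GROUPABLE = ("project", "service", "alert_policy", "slo")


@dataclass(frozen=True)
class Alert:
    status: str        # "triggered" or "resolved"
    severity: str      # "low", "medium", or "high"
    project: str
    service: str
    alert_policy: str
    slo: str


def filter_alerts(alerts: Iterable[Alert],
                  statuses: Optional[Set[str]] = None,
                  severities: Optional[Set[str]] = None) -> List[Alert]:
    """Keep alerts matching the given statuses and severities.
    A filter left as None is not applied."""
    return [
        a for a in alerts
        if (statuses is None or a.status in statuses)
        and (severities is None or a.severity in severities)
    ]


def count_by(alerts: Iterable[Alert], resource: str) -> Counter:
    """Count alerts per resource value, e.g. per service."""
    if resource not in GROUPABLE:
        raise ValueError(f"cannot group by {resource!r}")
    return Counter(getattr(a, resource) for a in alerts)
```

For example, `count_by(filter_alerts(alerts, {"triggered"}, {"high"}), "service")` would give the number of unresolved high-severity alerts per service, which is the kind of view a heatmap row grouped by service represents.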

Clicking a colored bucket on the heatmap displays the list of affected SLOs. Click an SLO's link to open the Alerts tab in that SLO's details.

Limitations

During its beta phase, the Alerting Center has the following limitations:

  • The heatmap displays the most recent 20,000 alerts within your organization.
  • If you apply severity filters on the heatmap and then, from the Alerting SLOs list, open any SLO details > Alerts tab, the alerts won't be filtered by severity under the Alerts tab.

Example use cases

During a flash sale on Flower Market's eCommerce site, an unexpected surge in traffic triggered numerous alerts. The SRE team used the Alerting Center to quickly identify the root cause of the performance issues impacting the customer experience and sales.

They selected a 4-hour time window to focus on alerts directly related to the recent traffic spike.

A quick check of the Live updates section showed multiple active alerts across different site areas. To isolate the problem, the team grouped the alerts by service, suspecting backend scaling issues.

To prioritize their investigation, they applied the following filters:

  • Severity: High
  • Status: Triggered

This immediately highlighted which services were struggling under the increased load, so the team could pinpoint them and begin troubleshooting the most critical problems. Further investigation would then be needed to find the root cause of these high-severity alerts.