Alert details view
The alert details view allows for an in-depth analysis of a particular triggered alert. It offers a comprehensive overview of the time frame in which the alert was triggered, displaying the alert's duration on the SLI chart and charts for every condition defined in the alert policy that was met. On the condition-specific charts, you can also see the exact moment when the conditions were met, and the reason behind it.
Overview
You can access the alert details from the Alerts tab on the SLO details view. Choose a tile with the specific alert that you want to investigate and click it.
The yellow Alert condition onset interval shows the period that starts when Nobl9 receives the first data point significant for the alert, that is, the first data point whose value exceeds the threshold within the alert evaluation period. This interval is shown for both the alerting window and lasts for parameter types.
The alert onset period can only begin once the previous alert within the same alert policy for a specific objective has resolved.
This might lead to the following situation: condition A has been true for a long period of time (in the extreme case, it is always TRUE), while the onset will only be triggered when the previous alert has at least one condition that is FALSE for at least the duration of the cooldown period.
The green Cooldown interval indicates the alert's cooldown timer. This timer starts after the conditions triggering the alert are no longer met. The alert resolves if none of those conditions are met again during the cooldown period.
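The interplay between onset and cooldown described above can be sketched as a small state machine. This is a simplified illustration, not Nobl9's implementation: it assumes a single condition sampled at fixed steps, ignores the alerting window and lasts for parameters, and the names `Alert` and `simulate_alerts` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Alert:
    triggered_at: int                  # step index at which the alert fired
    resolved_at: Optional[int] = None  # None while the alert is still active


def simulate_alerts(condition_met: List[bool], cooldown: int) -> List[Alert]:
    """Replay a series of condition results and return the resulting alerts.

    Assumption: one sample per step, and `cooldown` is expressed in steps.
    """
    alerts: List[Alert] = []
    active: Optional[Alert] = None
    quiet_steps = 0  # consecutive steps with the condition not met

    for step, met in enumerate(condition_met):
        if active is None:
            # A new alert can start only when no previous alert is active.
            if met:
                active = Alert(triggered_at=step)
                alerts.append(active)
                quiet_steps = 0
        elif met:
            # The condition is met again: the cooldown timer restarts.
            quiet_steps = 0
        else:
            quiet_steps += 1
            if quiet_steps >= cooldown:
                # The condition stayed unmet for the whole cooldown period.
                active.resolved_at = step
                active = None
    return alerts


series = [True] * 3 + [False] + [True] * 2 + [False] * 3 + [True] * 2
result = simulate_alerts(series, cooldown=3)
always_true = simulate_alerts([True] * 10, cooldown=3)
```

In this sketch, the short dip at step 3 does not resolve the first alert because the condition is met again before the cooldown elapses. With an always-true condition, only one alert is ever produced and it never resolves.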
The table below shows an overview of charts displayed in the alert details view per condition and parameter type:
Alerting Condition Type | Alerting window | Lasts for |
---|---|---|
Average burn rate is | ||
Entire budget would be exhausted | ||
Remaining budget would be exhausted | ||
Remaining budget is | ||
Metadata
In the metadata section at the top of the alert details view, you can quickly review the essential information related to an alert. This includes the alert's unique ID, SLO, and any other associated Nobl9 resources, the time the alert was triggered, its duration (if it has been resolved), and all data relevant to the alert policy.
Charts - limitations and assumptions
- Nobl9 displays only 1 alert per alert details card.
  Nobl9 follows this convention to facilitate the investigation into the cause of the triggered alert.
- The dotted line that defines the alert is an approximation for UX purposes. It may not be 100% accurate.
- When someone in your organization edits the threshold value, the alert details display the "historical" value, i.e., the value before the change.
- Currently, there's no auto-refresh feature on the alert details view.
  This means that if you keep the tab open for an alert while it's being resolved, you won't see the most up-to-date status of the alert.
Breaks in the continuity of charts
You might sometimes notice gaps in alerting charts. This occurs because Nobl9 requires processing time for evaluations equal to or greater than the duration of the alerting window.
For example, if you set the alerting window to 1 hour, Nobl9 needs at least 1 hour of data points to process the evaluation.
It is crucial to remember this whenever a new SLO calendar-aligned time window starts for an alert policy with a long alerting window. Nobl9 uses this convention to avoid artificially extending data points from the "edges" of the chart that demarcate the end of calendar-aligned time windows. Such an extension would distort data visualization on the charts (as there are no data points between the edge of the past time window and the beginning of a new one).
SLOs with rolling time windows won't be affected because they never reset once the time window has been filled for the first time.
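A minimal sketch of why such gaps appear, assuming the chart value is a plain average over the trailing alerting window and that data points arrive at a fixed interval. The function name `windowed_average` and the sample values are hypothetical; Nobl9's actual evaluation may differ.

```python
from typing import List, Optional


def windowed_average(values: List[float], window: int) -> List[Optional[float]]:
    """Average the trailing `window` points; None until a full window exists."""
    result: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            # Not enough data points yet: the chart shows a gap here.
            result.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            result.append(sum(chunk) / window)
    return result


# One time window: the first (window - 1) points cannot be evaluated.
single = windowed_average([1.0, 2.0, 3.0, 4.0, 5.0], window=3)

# Two consecutive calendar-aligned time windows, each evaluated separately:
# data is not carried over the edge, so a gap appears in the middle.
previous_window = windowed_average([2.0, 2.0, 2.0, 2.0], window=3)
new_window = windowed_average([4.0, 4.0, 4.0, 4.0], window=3)
chart = previous_window + new_window
```

Here `single` starts with two empty slots, and `chart` has two empty slots right after the boundary between the two time windows, mirroring the break between calendar-aligned windows.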
In the first scenario, you'll see missing data on the sides of the fired alert in the Error budget burn rate chart:
In the second case, you might see some gaps in the middle of the chart, which shows the break between two time windows:
Error budget exhaustion charts
The "Never" value on the y-axis of the Error budget exhaustion chart means that, based on the lasts for or alerting window parameter set in your alert policy, the SLO won't exhaust its budget within its time window.
If your SLO has exhausted its entire budget, the chart will display 0 values (they won't drop below 0).
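The two edge cases above can be illustrated with a simplified prediction. This sketch assumes that a burn rate of 1 consumes the whole budget exactly over the time window, so the remaining budget lasts `remaining_fraction * window_hours / burn_rate` hours. The function `time_to_exhaustion` and its parameters are hypothetical and are not taken from Nobl9.

```python
from typing import Union


def time_to_exhaustion(
    remaining_fraction: float,    # remaining budget as a fraction of the total
    burn_rate: float,             # measured over the alerting window or lasts for period
    window_hours: float,          # total length of the SLO time window
    hours_left_in_window: float,  # time left until the time window ends
) -> Union[float, str]:
    """Predict hours until the budget is exhausted, or "Never"."""
    if remaining_fraction <= 0:
        # Budget already exhausted: the value is clamped at 0.
        return 0.0
    if burn_rate <= 0:
        # Nothing is being burned, so the budget cannot run out.
        return "Never"
    hours = remaining_fraction * window_hours / burn_rate
    # Exhaustion predicted after the time window ends counts as "Never".
    return hours if hours <= hours_left_in_window else "Never"


fast_burn = time_to_exhaustion(0.5, 2.0, window_hours=720, hours_left_in_window=360)
slow_burn = time_to_exhaustion(0.5, 0.5, window_hours=720, hours_left_in_window=360)
exhausted = time_to_exhaustion(-0.1, 3.0, window_hours=720, hours_left_in_window=360)
```

With these assumed inputs, the fast burn exhausts the budget inside the time window, the slow burn does not, and the already exhausted budget is reported as 0 rather than a negative value.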
Impact of Replay
When the Replay process is started, Nobl9 pauses calculations related to alerts. So, for example, if someone starts the Replay process at 12:00, the Replay lasts 20 minutes, and you enter alert details at 12:15, you'll see the chart ending at 12:00. In such a case, check the Job status widget in the UI and wait until the replay process is finished to see updated data on charts.
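The timing in this example can be expressed as a short sketch. It assumes that alert calculations stop exactly at the Replay start and that the charts are fully up to date once the Replay finishes. The function `chart_end` is hypothetical and the date is arbitrary.

```python
from datetime import datetime, timedelta
from typing import Optional


def chart_end(
    now: datetime,
    replay_started: Optional[datetime],
    replay_duration: timedelta,
) -> datetime:
    """Return the last point in time the alert charts are expected to cover."""
    if replay_started is None:
        return now  # no Replay: the charts run up to the present
    replay_finished = replay_started + replay_duration
    if replay_started <= now < replay_finished:
        # Alert calculations are paused while the Replay is running.
        return replay_started
    return now


start = datetime(2024, 1, 1, 12, 0)
during = chart_end(datetime(2024, 1, 1, 12, 15), start, timedelta(minutes=20))
after = chart_end(datetime(2024, 1, 1, 12, 25), start, timedelta(minutes=20))
```

Opening the alert details at 12:15 gives a chart that ends at 12:00, while opening them at 12:25, after the Replay has finished, gives a chart that runs up to the current time.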
Whenever a Replay process is started in any SLO in your Organization, you'll be able to see the following message on the alert details view:
When the Replay process has completed, Nobl9 will display the following message on the alert details view: