Nobl9 application (1.78.0)
- Additional alert details via sloctl
- Custom mathematical operators for the remaining budget condition
- Replay message and time range in the alert URL
Release details
new Error budget adjustments
Error budget adjustments improve the accuracy of your SLO calculations by allowing you to exclude planned events, such as periods of low traffic, scheduled downtime, or routine maintenance, from affecting the error budget.
With this feature, you can designate specific times during which incidents or errors won't count against your SLOs. Once applied, the selected period is ignored in the error budget calculations, treating it as if it had never occurred, while SLI data remains intact to maintain a complete record of events impacting your SLOs. System annotations make it easy to identify excluded periods, ensuring transparency and clarity in the correlation between SLI data and Error Budget charts.
This method not only accurately reflects system performance during normal operating conditions but also prevents teams from being penalized for non-impactful events, thereby optimizing the on-call process and reducing alert fatigue.
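As a rough illustration of the idea, the sketch below computes a remaining error budget from per-minute good/bad samples and then recomputes it with a maintenance window excluded. It is a simplified, ratio-based model written for this example, not Nobl9's actual calculation; the function name `remaining_budget`, the sample data, and the window are all hypothetical.

```python
from datetime import datetime, timedelta


def remaining_budget(points, target, excluded=()):
    """Return the remaining error budget as a fraction (1.0 = untouched).

    points   -- iterable of (timestamp, is_good) samples
    target   -- SLO target, e.g. 0.99
    excluded -- iterable of (start, end) windows to ignore entirely
    """
    # Samples inside an excluded window are dropped from both the bad count
    # and the total, as if that period had never occurred.
    kept = [
        is_good
        for ts, is_good in points
        if not any(start <= ts < end for start, end in excluded)
    ]
    if not kept:
        return 1.0  # nothing left to evaluate, so no budget was burned
    allowed_bad = (1 - target) * len(kept)
    bad = kept.count(False)
    return 1 - bad / allowed_bad


# Hypothetical data: 1000 one-minute samples, 15 of them bad.
t0 = datetime(2024, 4, 26)
bad_minutes = set(range(100, 112)) | {500, 600, 700}
points = [(t0 + timedelta(minutes=i), i not in bad_minutes) for i in range(1000)]

# Hypothetical maintenance window covering minutes 100-199.
maintenance = [(t0 + timedelta(minutes=100), t0 + timedelta(minutes=200))]

raw = remaining_budget(points, target=0.99)
adjusted = remaining_budget(points, target=0.99, excluded=maintenance)
print(f"without adjustment: {raw:.2f}")
print(f"with adjustment:    {adjusted:.2f}")
```

In this made-up data set, 12 of the 15 bad minutes fall inside the maintenance window, so excluding it moves the budget from overspent to roughly two thirds remaining, while the raw samples in `points` stay untouched.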
improved Additional alert details via sloctl
Starting from the 1.74 release on Apr 4, 2023, the sloctl get alerts command will provide more detailed information about alerts triggered or resolved on the spec level. When using the sloctl get alerts command or the SLO status API, you'll be able to see data for each condition, including when it was first met, how long it lasted, and when it was last met. Additionally, if the status is 'Resolved,' you'll be able to see when the cooldown period started and the reason for the resolution. If an alert was canceled, the reason for cancellation will also be included.
The following YAML shows a sample alert that includes the additional details (lines 14–17, 29–31):
apiVersion: n9/v1alpha
kind: Alert
metadata:
  ...
spec:
  alertPolicy:
    displayName: Remaining budget is <2%
    name: remaining-error-budget-is-2
    project: default
  conditions:
    - lastsFor: 1m0s
      measurement: burnedBudget
      op: gte
      status:
        firstMetMetricTime: "2024-04-26T12:10:15Z"
        lastMetMetricTime: "2024-04-26T12:11:15Z"
        lastsForMetMetricTime: "2024-04-26T12:11:15Z"
      value: 0.98
  coolDown: 5m0s
  objective:
    displayName: so-so
    name: objective-2
    value: 600
  service:
    ...
  severity: High
  slo:
    ...
  status: Triggered
  triggeredClockTime: "2024-04-26T12:13:05Z"
  triggeredMetricTime: "2024-04-26T12:11:15Z"
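To show how these fields might be consumed, here is a minimal Python sketch that derives two durations from the sample above: how long the condition was observed as met, and the gap between the metric time and the clock time of the trigger. The dictionary simply mirrors the sample's non-elided fields; the helper names (`parse_ts`, `condition_span`, `trigger_delay`) are hypothetical and not part of sloctl or any Nobl9 API.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse timestamps in the format used by the sample, e.g. 2024-04-26T12:10:15Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# Fields copied from the sample alert above; elided parts ("...") are left out.
alert_spec = {
    "conditions": [
        {
            "measurement": "burnedBudget",
            "op": "gte",
            "value": 0.98,
            "lastsFor": "1m0s",
            "status": {
                "firstMetMetricTime": "2024-04-26T12:10:15Z",
                "lastMetMetricTime": "2024-04-26T12:11:15Z",
                "lastsForMetMetricTime": "2024-04-26T12:11:15Z",
            },
        }
    ],
    "status": "Triggered",
    "triggeredClockTime": "2024-04-26T12:13:05Z",
    "triggeredMetricTime": "2024-04-26T12:11:15Z",
}


def condition_span(condition):
    """Time between the first and the last moment the condition was met."""
    status = condition["status"]
    return parse_ts(status["lastMetMetricTime"]) - parse_ts(status["firstMetMetricTime"])


def trigger_delay(spec):
    """Difference between the clock time and the metric time of the trigger."""
    return parse_ts(spec["triggeredClockTime"]) - parse_ts(spec["triggeredMetricTime"])


for cond in alert_spec["conditions"]:
    print(cond["measurement"], "met for", condition_span(cond))
print("clock time minus metric time:", trigger_delay(alert_spec))
```

Reading the clock-minus-metric difference as a processing delay is an interpretation made for this example; the release notes only state that both timestamps are reported.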
improved Custom mathematical operators for the remaining budget condition
You can now use all available mathematical operators to define the remaining budget condition:
- lte: less than or equal to (≤)
- gte: greater than or equal to (≥)
- lt: less than (<)
- gt: greater than (>)
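The four operator names map directly onto ordinary comparisons. The sketch below expresses that mapping in Python so the semantics are easy to check; `OPERATORS` and `condition_met` are hypothetical names for this illustration, not Nobl9 code.

```python
import operator

# Operator names from the list above, mapped to Python comparison functions.
OPERATORS = {
    "lte": operator.le,  # less than or equal to
    "gte": operator.ge,  # greater than or equal to
    "lt": operator.lt,   # less than
    "gt": operator.gt,   # greater than
}


def condition_met(measured, op, threshold):
    """Return True when `measured <op> threshold` holds."""
    return OPERATORS[op](measured, threshold)


# burnedBudget of 1 means the whole budget is gone, so "gte 1" matches.
print(condition_met(1.0, "gte", 1))
# With 70% of the budget burned, "lt 1" (some budget left) matches.
print(condition_met(0.7, "lt", 1))
```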
You may find it useful to combine a remaining budget condition that uses a custom operator with other measurements, such as timeToBurnBudget or averageBurnRate. This way, Nobl9 will only alert you in the budget state you choose: when your SLO has used up its entire error budget, or when some budget is still left. Check the YAML samples below for more details.
- No budget left
- Some budget left
The following alert policy will be triggered when there is no budget left, the entire budget would be exhausted in 8h, and this condition lasts for 15m.
apiVersion: n9/v1alpha
kind: AlertPolicy
metadata:
  name: entire-exhaustion-prediction-8h
  displayName: Entire budget exhaustion in 8h
  labels:
    type:
      - time-exhaustion
spec:
  description: Entire error budget allocation prediction for 99%, 30 Day Rolling.
  severity: Low
  coolDown: "15m"
  conditions:
    - measurement: timeToBurnEntireBudget
      value: "8h"
      lastsFor: "15m"
    - measurement: burnedBudget
      value: 1
      op: gte
The following alert policy will be triggered when an SLO still has some budget left to burn (lines 17–19), the remaining budget would be exhausted in 3d, and this condition lasts for 15m (lines 14–16).
apiVersion: n9/v1alpha
kind: AlertPolicy
metadata:
  name: remaining-exhaustion-prediction-3d
  displayName: Remaining budget exhaustion in 3d
  labels:
    type:
      - time-exhaustion
spec:
  description: Remaining error budget Allocation prediction
  severity: Medium
  coolDown: "15m"
  conditions:
    - measurement: timeToBurnBudget
      value: "72h"
      lastsFor: "15m"
    - measurement: burnedBudget
      value: 1
      op: lt
Check the YAML guide for the default operators used with each alerting condition.
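To make the combined logic concrete, the following Python sketch evaluates the conditions of the "some budget left" policy against a set of current measurements, requiring every condition to hold. It is an illustration only: the default operators in `ASSUMED_DEFAULT_OP` are assumptions (the YAML guide is the authority), the `lastsFor` window is deliberately not modeled, and all function names are hypothetical.

```python
import operator
from datetime import timedelta

OPS = {"lte": operator.le, "gte": operator.ge, "lt": operator.lt, "gt": operator.gt}

# Assumption for this sketch only: operators used when a condition omits "op".
ASSUMED_DEFAULT_OP = {
    "timeToBurnBudget": "lt",
    "timeToBurnEntireBudget": "lt",
    "burnedBudget": "gte",
}


def parse_duration(text):
    """Parse simple durations such as "15m" or "72h" (minutes and hours only)."""
    units = {"m": "minutes", "h": "hours"}
    return timedelta(**{units[text[-1]]: int(text[:-1])})


def policy_matches(conditions, measurements):
    """Return True when every condition holds for the given measurements.

    The lastsFor requirement is ignored here; a real evaluation would also
    check that the conditions stayed true for that long.
    """
    for cond in conditions:
        name = cond["measurement"]
        compare = OPS[cond.get("op", ASSUMED_DEFAULT_OP[name])]
        threshold = cond["value"]
        if isinstance(threshold, str):
            threshold = parse_duration(threshold)
        if not compare(measurements[name], threshold):
            return False
    return True


# Conditions taken from the "some budget left" sample policy.
some_budget_left = [
    {"measurement": "timeToBurnBudget", "value": "72h", "lastsFor": "15m"},
    {"measurement": "burnedBudget", "value": 1, "op": "lt"},
]

# Hypothetical current state: 70% burned, remaining budget gone in 40 hours.
state = {"timeToBurnBudget": timedelta(hours=40), "burnedBudget": 0.7}
print(policy_matches(some_budget_left, state))
```

With the hypothetical state shown, both conditions hold. Once `burnedBudget` reaches 1, the `lt` condition fails and this policy stops matching, which is where the "no budget left" policy with `op: gte` takes over.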
improved Replay message and time range in the alert URL
We've added a new message box for Replay status in the alert details view. This box provides a hint that the SLI data triggering the alert has been reimported and you may not be viewing the most recent data. Additionally, the alert details URL now includes a time range, making it easier to share your manually adjusted alert view with others.