Nobl9 application (1.78.0)

We've just released Nobl9 1.78.0! Release highlights:
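  • new: Error budget adjustments
  • improved: Additional alert details via sloctl
  • improved: Custom mathematical operators for the remaining budget condition
  • improved: Replay message and time range in the alert URL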

Release details

new
Error budget adjustments

Error budget adjustments improve the accuracy of your SLO calculations by allowing you to exclude planned events, such as periods of low traffic, scheduled downtime, or routine maintenance, from affecting the error budget.

With this feature, you can designate specific times during which incidents or errors won't count against your SLOs. Once applied, the selected period is ignored in the error budget calculations, treating it as if it had never occurred, while SLI data remains intact to maintain a complete record of events impacting your SLOs. System annotations make it easy to identify excluded periods, ensuring transparency and clarity in the correlation between SLI data and Error Budget charts.

This method not only accurately reflects system performance during normal operating conditions but also prevents teams from being penalized for non-impactful events, thereby optimizing the on-call process and reducing alert fatigue.
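
The exclusion idea can be illustrated with a small Python sketch. This is a simplified model under stated assumptions, not Nobl9's actual calculation: time is split into one-minute slices, each slice is either good or bad, and slices inside an adjustment window are dropped from both the bad count and the total. The `Window` class, the `burned_budget` function, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Half-open interval [start, end), in minutes since the SLO window began."""
    start: int
    end: int

    def contains(self, minute: int) -> bool:
        return self.start <= minute < self.end


def burned_budget(bad_minutes, total_minutes, target, adjustments=()):
    """Return the fraction of the error budget burned (1.0 means fully burned).

    Minutes inside any adjustment window are removed from both the bad-minute
    count and the total, as if they had never occurred (an assumed model).
    """
    excluded = {
        m for m in range(total_minutes)
        if any(w.contains(m) for w in adjustments)
    }
    counted_total = total_minutes - len(excluded)
    counted_bad = sum(1 for m in bad_minutes if m not in excluded)
    allowed_bad = (1 - target) * counted_total
    return counted_bad / allowed_bad if allowed_bad else float("inf")


# Hypothetical data: a 1000-minute window, a 99% target, and 8 bad minutes,
# 5 of which fall inside planned maintenance (minutes 100-199).
bad = [120, 121, 122, 123, 124, 500, 501, 502]
maintenance = Window(start=100, end=200)

without_adjustment = burned_budget(bad, 1000, 0.99)
with_adjustment = burned_budget(bad, 1000, 0.99, [maintenance])
```

With these made-up numbers, roughly 80% of the budget is burned without the adjustment and roughly 33% with it, because the five bad minutes during maintenance no longer count. The raw list of bad minutes is left untouched, which mirrors how SLI data remains intact.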

Image 1: An SLO with past (a, b), ongoing (c), and future (d) adjustment events

improved
Additional alert details via sloctl

Starting from the 1.74 release on Apr 4, 2023, the sloctl get alerts command provides more detailed information about alerts triggered or resolved at the spec level. When you use the sloctl get alerts command or the SLO status API, you can see data for each condition, including when it was first met, how long it lasted, and when it was last met. Additionally, if the status is Resolved, you can see when the cooldown period started and the reason for the resolution. If an alert was canceled, the reason for the cancellation is also included.

The following YAML shows a sample alert that includes the additional details (lines 14-17 and 29-31):

apiVersion: n9/v1alpha
kind: Alert
metadata:
  ...
spec:
  alertPolicy:
    displayName: Remaining budget is <2%
    name: remaining-error-budget-is-2
    project: default
  conditions:
    - lastsFor: 1m0s
      measurement: burnedBudget
      op: gte
      status:
        firstMetMetricTime: "2024-04-26T12:10:15Z"
        lastMetMetricTime: "2024-04-26T12:11:15Z"
        lastsForMetMetricTime: "2024-04-26T12:11:15Z"
      value: 0.98
  coolDown: 5m0s
  objective:
    displayName: so-so
    name: objective-2
    value: 600
  service:
    ...
  severity: High
  slo:
    ...
  status: Triggered
  triggeredClockTime: "2024-04-26T12:13:05Z"
  triggeredMetricTime: "2024-04-26T12:11:15Z"
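
As a rough illustration of how these timestamps can be used, the Python sketch below derives two durations from the sample alert above. The field names and values are copied from the sample; the `parse_utc` helper and the interpretation of the differences are assumptions made for this example, not part of sloctl.

```python
from datetime import datetime, timezone


def parse_utc(ts: str) -> datetime:
    """Parse a UTC timestamp such as "2024-04-26T12:10:15Z"."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# Values copied from the sample alert above.
condition_status = {
    "firstMetMetricTime": "2024-04-26T12:10:15Z",
    "lastMetMetricTime": "2024-04-26T12:11:15Z",
    "lastsForMetMetricTime": "2024-04-26T12:11:15Z",
}
triggered_metric_time = "2024-04-26T12:11:15Z"
triggered_clock_time = "2024-04-26T12:13:05Z"

# Time between the first and the last data point that met the condition.
met_for = (
    parse_utc(condition_status["lastMetMetricTime"])
    - parse_utc(condition_status["firstMetMetricTime"])
)

# Gap between the data point that triggered the alert and the wall-clock
# moment at which the alert was actually triggered.
trigger_delay = parse_utc(triggered_clock_time) - parse_utc(triggered_metric_time)

print(met_for.total_seconds())        # 60.0
print(trigger_delay.total_seconds())  # 110.0
```

In this sample, the 60-second span matches the condition's lastsFor value of 1m0s, and lastsForMetMetricTime falls exactly one minute after firstMetMetricTime.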

improved
Custom mathematical operators for the remaining budget condition

You can now use all available mathematical operators to define the remaining budget condition:

  • lte - less than or equal to (≤)
  • gte - greater than or equal to (≥)
  • lt - less than (<)
  • gt - greater than (>)
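
To make the operator semantics concrete, here is a minimal Python sketch that maps the four operator names to comparisons. The `OPERATORS` table, the `condition_met` function, and the sample readings are hypothetical and exist only for this illustration; they do not describe how Nobl9 evaluates conditions internally.

```python
import operator

# Map the operator names from the list above to Python comparison functions.
OPERATORS = {
    "lte": operator.le,
    "gte": operator.ge,
    "lt": operator.lt,
    "gt": operator.gt,
}


def condition_met(measured: float, op: str, threshold: float) -> bool:
    """Return True when `measured <op> threshold` holds."""
    try:
        compare = OPERATORS[op]
    except KeyError:
        raise ValueError(f"unsupported operator: {op!r}") from None
    return compare(measured, threshold)


# Hypothetical burned-budget readings (1.0 means the budget is gone).
print(condition_met(1.02, "gte", 1))  # True
print(condition_met(0.98, "gte", 1))  # False
print(condition_met(0.98, "lt", 1))   # True
```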

You may find it useful to combine a remaining budget condition that uses a custom operator with other measurements, such as timeToBurnBudget or averageBurnRate. This way, Nobl9 alerts you only when your SLO has used up its entire error budget, that is, when no error budget is left. Check the YAML sample below for more details.

The following alert policy is triggered when there is no budget left, the entire budget would be exhausted in 8h, and this condition lasts for 15m.

apiVersion: n9/v1alpha
kind: AlertPolicy
metadata:
  name: entire-exhaustion-prediction-8h
  displayName: Entire budget exhaustion in 8h
  labels:
    type:
      - time-exhaustion
spec:
  description: Entire error budget allocation prediction for 99%, 30 Day Rolling.
  severity: Low
  coolDown: "15m"
  conditions:
    - measurement: timeToBurnEntireBudget
      value: "8h"
      lastsFor: "15m"
    - measurement: burnedBudget
      value: 1
      op: gte

Tip: Check the YAML guide for the default operators used with all alerting conditions.

improved
Replay message and time range in the alert URL

We've added a new message box for Replay status in the alert details view. This box provides a hint that the SLI data triggering the alert has been reimported and you may not be viewing the most recent data. Additionally, the alert details URL now includes a time range, making it easier to share your manually adjusted alert view with others.

Image 2: Active Replay process on the alert details view

Documentation updates