Budget adjustments (Beta)
Error budgets are invaluable for visualizing trends and understanding your services' performance. However, real-world data isn't always clean or fully representative. Outliers and specific events, such as holidays, planned maintenance, or one-time occurrences, can distort error budget calculations and obscure meaningful insights.
This is where budget adjustments come in. With this feature, you can define exclusions to shield your error budget from events like scheduled downtime or deployments. By applying ad-hoc adjustments or scheduling cyclical events with flexible recurrence rules, you can ensure that these periods do not throw your SLOs off track.
Budget adjustments refine your error budget calculations to account for practical realities, helping you maintain accurate insights, focus on long-term trends, and make informed decisions without being misled by temporary fluctuations.
Budget adjustments are a beta feature.
Currently, you can apply them only using sloctl, the adjustments API, or the Nobl9 Terraform provider.
Overview
Budget adjustments and RBAC
Only Organization Admins can apply, update and delete budget adjustments.
Limits for budget adjustments
The following limits apply to budget adjustments:
- You can add up to 30 SLOs to a single budget adjustment definition.
- You can modify or update up to 30 unique events per SLO in one action.
How budget adjustments work
Budget adjustments let you take control of your error budget by excluding specific events, which gives you more reliable metrics about the performance of your services.
Terms of the game
There are two basic building blocks for handling budget adjustments in Nobl9: the budget adjustment definition and the budget adjustment event.
Check the sections below to understand the difference between them and how they are related.
Budget adjustment definition
A budget adjustment definition is an object of kind: BudgetAdjustment, specified in YAML format and managed through sloctl or the Nobl9 Terraform provider. This definition establishes the parameters for how budget adjustments are applied to SLOs. Specifically, it outlines:
- Time period: Specifies when the adjustment is active, including start and end times.
- Recurrence: Determines whether the adjustment is a one-time event or repeats over time.
- Target SLOs: Lists the specific SLOs to which the adjustment applies (the adjustment applies to all objectives of the specified SLO).
A definition of a budget adjustment serves as the source for generating adjustment events. In other words, the budget adjustment definition passed in YAML has a one-to-many relationship with its associated events, meaning a single definition can result in multiple adjustment events.
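The Python sketch below makes the one-to-many relationship concrete by expanding a single definition into its events. It is not Nobl9's implementation. The Definition and Event classes and the expand function are hypothetical. Only fixed-interval recurrences are modeled (for example, FREQ=DAILY;INTERVAL=3;COUNT=10), and BYDAY and other iCalendar parts are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple

SloRef = Tuple[str, str]  # (project, name)


@dataclass
class Definition:
    """Hypothetical, simplified stand-in for a BudgetAdjustment definition."""
    first_event_start: datetime
    duration: timedelta
    step: Optional[timedelta] = None  # None models a one-time adjustment (no rrule)
    count: int = 1                    # number of occurrences when step is set
    slos: Tuple[SloRef, ...] = ()


@dataclass
class Event:
    start: datetime
    end: datetime
    slos: Tuple[SloRef, ...]


def expand(defn: Definition) -> List[Event]:
    """Derive every adjustment event from a single definition (one-to-many)."""
    if defn.step is None:
        starts = [defn.first_event_start]
    else:
        starts = [defn.first_event_start + defn.step * i for i in range(defn.count)]
    # Every event inherits the duration and target SLOs of its definition.
    return [Event(s, s + defn.duration, defn.slos) for s in starts]


# Roughly FREQ=DAILY;INTERVAL=3;COUNT=10 with a 1h duration, applied to one SLO.
recurring = Definition(
    first_event_start=datetime(2024, 1, 1, 12, tzinfo=timezone.utc),
    duration=timedelta(hours=1),
    step=timedelta(days=3),
    count=10,
    slos=(("project-alpha", "latency-slo"),),
)
events = expand(recurring)
```

Editing the single definition changes every derived event. That is why, as described later on this page, past events are handled individually rather than through the definition.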
Budget adjustment event
An adjustment event is a singular occurrence of an adjustment triggered for a particular SLO based on the parameters defined in a budget adjustment definition.
Adjustment events are not self-standing and can't exist independently of the definition. They are always derived from a budget adjustment definition, and the defined adjustment events are applied to all objectives within one SLO during the specified periods.
Adding budget adjustments
You can add budget adjustment events by applying a YAML definition with sloctl.
General YAML:
apiVersion: n9/v1alpha
kind: BudgetAdjustment
metadata:
  name: string # Mandatory
  displayName: string # Optional
spec:
  description: string # Optional
  firstEventStart: YYYY-MM-DDThh:mm:ssZ # Mandatory, start date and time of the first event
  duration: 1h # Mandatory
  rrule: FREQ=DAILY;INTERVAL=1 # Optional
  filters:
    slos:
      - name: string # Mandatory
        project: string # Mandatory
Working YAML:
apiVersion: n9/v1alpha
kind: BudgetAdjustment
metadata:
  name: working-budget-adjustment
  displayName: Working budget adjustment
spec:
  description: Budget adjustment event happening monthly on the first Tuesday of each month for 1 hour.
  firstEventStart: 2024-01-01T12:00:00Z
  duration: 1h
  rrule: FREQ=MONTHLY;INTERVAL=1;BYDAY=1TU
  filters:
    slos:
      - name: latency-slo
        project: project-alpha
      - name: uptime-slo
        project: project-beta
      - name: throughput-slo
        project: project-gamma
Field | Type | Description |
---|---|---|
spec.firstEventStart (mandatory) | string | Scheduled start time of the first adjustment event, as a date-time string in RFC 3339 format. firstEventStart is equivalent to the DTSTART property in iCalendar. Constraints: • firstEventStart can be at most 30 days in the past. |
spec.duration (mandatory) | string | Duration of each budget adjustment event. Constraints: • The value must be a string formatted as a time duration • The duration must be defined with a precision of 1 minute • Example: 1h10m |
spec.rrule (optional) | string | iCalendar recurrence rule for the budget adjustment events. Constraints: • The value must be a string in the iCalendar RRULE format • Example: FREQ=MONTHLY;BYMONTHDAY=1 • Use an RRULE calculator to create the desired recurrence rule • rrule can't be applied to past events. |
spec.filters.slos[] (mandatory) | list | List of SLOs to which the budget adjustment events apply. spec.filters.slos[n].name and spec.filters.slos[n].project are mandatory for each list item. Constraints: • Up to 30 SLOs per budget adjustment definition. |
If you apply a BudgetAdjustment with firstEventStart in the past and a defined rrule, sloctl returns an error:
Error: Validation failed: Cannot apply BudgetAdjustment. firstEventStart is in the past, and RRULE cannot be applied to past events.
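You can check the constraints in the table and the error above locally before you apply a definition. The Python sketch below is a hypothetical pre-flight check and not part of sloctl. validate and parse_duration are illustrative names. The duration parser accepts only hours and minutes (for example, 1h10m). Treating an empty SLO list as invalid is an assumption based on the field being mandatory. sloctl and the Nobl9 API remain the authoritative validators.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import List

MAX_SLOS = 30                  # limit per budget adjustment definition
MAX_PAST = timedelta(days=30)  # firstEventStart can be at most 30 days in the past
_DURATION = re.compile(r"(?:(\d+)h)?(?:(\d+)m)?")


def parse_duration(text: str) -> timedelta:
    """Parse an hours/minutes duration such as '1h', '45m' or '1h10m'."""
    match = _DURATION.fullmatch(text)
    if not text or match is None:
        raise ValueError(f"invalid duration {text!r}: use minute precision, e.g. 1h10m")
    hours, minutes = (int(group) if group else 0 for group in match.groups())
    if not (hours or minutes):
        raise ValueError("duration must be positive")
    return timedelta(hours=hours, minutes=minutes)


def validate(spec: dict, now: datetime) -> List[str]:
    """Return the problems found in a BudgetAdjustment spec (empty list if none)."""
    errors = []
    # RFC 3339 with a trailing 'Z'; replace it so older Python versions parse it too.
    start = datetime.fromisoformat(spec["firstEventStart"].replace("Z", "+00:00"))
    if start < now - MAX_PAST:
        errors.append("firstEventStart can be at most 30 days in the past")
    if spec.get("rrule") and start < now:
        errors.append("firstEventStart is in the past, and RRULE cannot be applied to past events")
    try:
        parse_duration(spec.get("duration", ""))
    except ValueError as exc:
        errors.append(str(exc))
    slos = spec.get("filters", {}).get("slos", [])
    if not slos or len(slos) > MAX_SLOS:
        errors.append(f"filters.slos must list between 1 and {MAX_SLOS} SLOs")
    for i, slo in enumerate(slos):
        if not slo.get("name") or not slo.get("project"):
            errors.append(f"filters.slos[{i}] needs both name and project")
    return errors
```

Passing now explicitly keeps the check deterministic, so you can reproduce a validation result for a given moment.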
Types of budget adjustment events
Budget adjustment events fall into three types, depending on the timestamp of the last data point received by an SLO.
Let's assume that the last data point was received around Apr 24, 09:25:
- Past events ended before Apr 24, 09:25 (a and b in the image below). Any changes to these events must be handled individually through the adjustments API.
- Ongoing events started before Apr 24, 09:25, but haven't finished yet (c in the image below). Changes made to their definition may affect their duration, either by shortening or extending it. However, they can't be shortened to less than the time already processed for the budget adjustment event. Their start time remains unaffected by any changes made to the definition.
- Future events start after Apr 24, 09:25 (d in the image below). All changes made to the budget adjustment definition affect them.
Actions applicable to budget adjustments
When working with non-recurring past adjustment definitions, keep in mind the following:
Non-recurring adjustments trigger historical events with defined start and end dates. If an adjustment eventβs start and end dates are fully in the past, editing the associated adjustment definition is restricted to avoid unintentional modifications to historical data.
To modify or delete such events, use these sloctl commands:
- Update the event:
sloctl budgetadjustments events update
- Delete the event:
sloctl budgetadjustments events delete
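Which of these actions applies depends on where an event falls relative to the last data point, as described under the types of budget adjustment events. The Python sketch below illustrates that classification and the rule for ongoing events. The function names are hypothetical, and the year 2024 is assumed for the Apr 24, 09:25 example. This page doesn't say how Nobl9 treats an event that ends exactly at the last data point, so the boundary handling is an assumption.

```python
from datetime import datetime
from enum import Enum
from typing import Tuple


class EventType(Enum):
    PAST = "past"        # ended before the last data point (a and b)
    ONGOING = "ongoing"  # started before it but hasn't finished yet (c)
    FUTURE = "future"    # starts after it (d)


def classify(start: datetime, end: datetime, last_data_point: datetime) -> EventType:
    """Classify an adjustment event relative to the SLO's last data point."""
    if end <= last_data_point:  # assumption: ending exactly at it counts as past
        return EventType.PAST
    if start <= last_data_point:
        return EventType.ONGOING
    return EventType.FUTURE


def reschedule_ongoing(start: datetime, new_end: datetime,
                       last_data_point: datetime) -> Tuple[datetime, datetime]:
    """Apply a definition change to an ongoing event: the start stays fixed and
    the end can't move before the time already processed."""
    return start, max(new_end, last_data_point)


last = datetime(2024, 4, 24, 9, 25)  # hypothetical year for the example above
```

Only PAST events need the event-level sloctl commands or the adjustments API. For ONGOING and FUTURE events, you edit the definition itself.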
Recurrence rule format
Use spec.rrule to define a rule for predictable events that happen regularly, or omit it to create a one-time adjustment for ad hoc needs. The spec.rrule field follows the iCalendar specification (RFC 5545).
The rrule value consists of key-value pairs separated by semicolons (;). Each key-value pair specifies a parameter of the recurrence rule. Nobl9 supports all iCalendar rules outlined in the iCalendar documentation.
Example of an event that recurs every three days, ten times in total:
apiVersion: n9/v1alpha
kind: BudgetAdjustment
...
spec:
  rrule: FREQ=DAILY;INTERVAL=3;COUNT=10
Use an rrule generator to create a recurrence rule suited to your needs.
The FREQ value in the rrule definition specifies the frequency of the adjustment event. The value can be one of the following:
- HOURLY
- DAILY
- WEEKLY
- MONTHLY
- YEARLY
The INTERVAL value specifies the interval between each recurrence. The value is an integer representing the number of units of the frequency type. For example, if FREQ=DAILY and INTERVAL=2, the event occurs every two days.
You can also include additional parameters, such as BYDAY, BYMONTH, or BYSETPOS, for more complex recurrence patterns.
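Because rrule is a plain string of KEY=VALUE pairs, a simple parser is enough to sanity-check a rule before you apply it. The Python sketch below is hypothetical and doesn't reflect how Nobl9 validates rules. It splits the string on semicolons, requires FREQ to be one of the values listed above, and checks that INTERVAL (which defaults to 1 in iCalendar) is a positive integer. Other parts, such as BYDAY or COUNT, pass through unchecked.

```python
from typing import Dict

ALLOWED_FREQ = {"HOURLY", "DAILY", "WEEKLY", "MONTHLY", "YEARLY"}


def parse_rrule(rule: str) -> Dict[str, str]:
    """Split an RRULE string such as 'FREQ=DAILY;INTERVAL=3' into a dict."""
    parts: Dict[str, str] = {}
    for pair in rule.split(";"):
        key, sep, value = pair.partition("=")
        key = key.strip().upper()
        if not sep or not key or not value:
            raise ValueError(f"malformed rule part {pair!r}")
        if key in parts:
            raise ValueError(f"duplicate rule part {key}")
        parts[key] = value.strip()
    if parts.get("FREQ") not in ALLOWED_FREQ:
        raise ValueError(f"FREQ must be one of {sorted(ALLOWED_FREQ)}")
    interval = parts.get("INTERVAL", "1")  # iCalendar's default interval is 1
    if not interval.isdigit() or int(interval) < 1:
        raise ValueError("INTERVAL must be a positive integer")
    return parts
```

For example, parse_rrule("FREQ=MONTHLY;INTERVAL=1;BYDAY=1TU") keeps BYDAY as "1TU". The sketch doesn't interpret that value; use an rrule generator or a full iCalendar library to preview actual occurrences.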
User experience
Impact on SLI data
Budget adjustment events don't affect SLI data. When the budget adjustment is active, Nobl9 collects data points and displays them on the SLI chart.
You can see the budget adjustment event on the SLI chart, marked by an annotation icon. When you hover over the budget adjustment area, you can see the collected data points.
When you hover over the Reliability burn down and Error budget burn rate charts, you can see data gaps in the budget adjustment event's area.
Adjustments and replay playlists
When working with adjustments and replay processes, note that only one calculation can be performed per SLO at a time. Any new requests involving the same SLO will be queued and executed sequentially:
- Replay and adjustment conflict: If a replay is running for SLO X and the user creates an adjustment for the same SLO X, the adjustment will be queued and will only begin once the replay is complete.
- Adjustment and replay conflict: If an adjustment is running for SLO X and the user initiates a replay for the same SLO X, the replay will be queued and will only start after the adjustment is complete.
- Multiple adjustments conflict: If an adjustment is already running for SLO X and another adjustment is created for the same SLO X, the second adjustment will be queued and will only begin after the first adjustment is finished.
This ensures calculations are processed in the correct order without conflicts or data inconsistencies.
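The queuing behavior above amounts to one first-in, first-out queue per SLO. The Python sketch below models it in memory. SloCalculationQueue and its methods are hypothetical names used for illustration and say nothing about how Nobl9 schedules calculations internally.

```python
from collections import deque
from typing import Deque, Dict, Optional


class SloCalculationQueue:
    """At most one running calculation per SLO; later requests wait in FIFO order."""

    def __init__(self) -> None:
        self._running: Dict[str, str] = {}          # SLO -> job currently running
        self._waiting: Dict[str, Deque[str]] = {}   # SLO -> jobs queued behind it

    def submit(self, slo: str, job: str) -> bool:
        """Start the job if the SLO is idle, otherwise queue it. Returns True if started."""
        if slo not in self._running:
            self._running[slo] = job
            return True
        self._waiting.setdefault(slo, deque()).append(job)
        return False

    def complete(self, slo: str) -> Optional[str]:
        """Finish the running job for the SLO and start the next queued one, if any."""
        self._running.pop(slo)
        queue = self._waiting.get(slo)
        if queue:
            self._running[slo] = queue.popleft()
            return self._running[slo]
        return None

    def running(self, slo: str) -> Optional[str]:
        return self._running.get(slo)


queue = SloCalculationQueue()
queue.submit("slo-x", "replay")          # starts immediately
queue.submit("slo-x", "adjustment-1")    # waits for the replay
queue.submit("slo-y", "adjustment-2")    # different SLO, starts immediately
```

Because queues are keyed by SLO, work on one SLO never blocks calculations for another.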
Managing adjustments for SLOs
To maintain accurate SLO tracking, you may need to exclude certain events or recurring time windows from error budget calculations. The following short use cases show how to set up recurring adjustment definitions, manage past adjustment events, and create adjustments for historical events.
Setting up recurring adjustments for known downtime periods
In cases where downtime is predictable, such as routine maintenance or regular inactive hours, you can define a recurring adjustment to automatically exclude these periods from error budget calculations. This feature allows users to set up a schedule that repeats weekly, monthly, or at custom intervals, preventing the need to create new adjustments manually each time.
Let's say that the service undergoes routine maintenance every Saturday from midnight to 2 a.m. UTC, during which it is temporarily taken offline. The adjustment definition for SLOs monitoring this service could look like this:
apiVersion: n9/v1alpha
kind: BudgetAdjustment
metadata:
  name: maintenance-budget-adjustment
  displayName: Maintenance budget adjustment
spec:
  description: Budget adjustment event happening every Saturday for 2 hours.
  firstEventStart: 2024-01-01T00:00:00Z
  duration: 2h
  rrule: FREQ=WEEKLY;INTERVAL=1;BYDAY=SA
  filters:
    slos:
      - name: latency-slo
        project: project-alpha
      - name: uptime-slo
        project: project-alpha
      - name: throughput-slo
        project: project-alpha
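To preview which events this definition produces, here is a minimal Python sketch for the FREQ=WEEKLY;INTERVAL=1;BYDAY=SA case only. The weekly_events function is hypothetical. It assumes, as the events returned by the adjustments API later in this section suggest, that occurrences keep the time of day of firstEventStart and that the first occurrence is the first Saturday on or after it. For more complex rules, rely on a full RRULE implementation.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Tuple

SATURDAY = 5  # datetime.weekday(): Monday is 0, Sunday is 6


def weekly_events(first_event_start: datetime, weekday: int, duration: timedelta,
                  until: datetime) -> Iterator[Tuple[datetime, datetime]]:
    """Yield (start, end) for every `weekday` on or after first_event_start, before `until`."""
    days_ahead = (weekday - first_event_start.weekday()) % 7
    start = first_event_start + timedelta(days=days_ahead)
    while start < until:
        yield start, start + duration
        start += timedelta(weeks=1)


january = list(weekly_events(
    first_event_start=datetime(2024, 1, 1, tzinfo=timezone.utc),  # a Monday
    weekday=SATURDAY,
    duration=timedelta(hours=2),
    until=datetime(2024, 2, 1, tzinfo=timezone.utc),
))
```

For January 2024, this yields events on January 6, 13, 20, and 27 from 00:00 to 02:00 UTC. These match the events the adjustments API returns for this definition.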
Reviewing and modifying past adjustments for accuracy
Sometimes, historical adjustment events may need to be modified because of an error in the original exclusion setup or a change in the actual downtime that should have been recorded. This scenario includes two common actions:
- Updating a specific past adjustment event
- Deleting an incorrect adjustment event
Reviewing and updating past adjustments using the adjustments API
During a routine maintenance window on Saturday, a regional outage extended the downtime beyond the scheduled period. Although the initial adjustment covered the planned maintenance time, the unexpected outage led to additional unplanned downtime. After reviewing the historical adjustment event for that date, the team realized they needed to adjust the exclusion to capture the entire downtime period.
Access adjustment events history
The team sends the following GET request to the adjustments API:
curl -X GET \
  -H 'Organization: <organization_name>' \
  -H 'Authorization: Bearer <token>' \
  'https://app.nobl9.com/api/budgetadjustments/v1/maintenance-budget-adjustment/events?from=2024-01-01T00:00:00Z&to=2024-01-31T23:59:59Z'
The API returns the following response:
[
  {
    "eventStart": "2024-01-06T00:00:00Z",
    "eventEnd": "2024-01-06T02:00:00Z",
    "slos": [
      {
        "project": "project-alpha",
        "name": "latency-slo"
      },
      {
        "project": "project-alpha",
        "name": "uptime-slo"
      }
    ]
  },
  {
    "eventStart": "2024-01-13T00:00:00Z",
    "eventEnd": "2024-01-13T02:00:00Z",
    "slos": [
      {
        "project": "project-alpha",
        "name": "latency-slo"
      },
      {
        "project": "project-alpha",
        "name": "uptime-slo"
      }
    ]
  },
  {
    "eventStart": "2024-01-20T00:00:00Z",
    "eventEnd": "2024-01-20T02:00:00Z",
    "slos": [
      {
        "project": "project-alpha",
        "name": "latency-slo"
      },
      {
        "project": "project-alpha",
        "name": "uptime-slo"
      }
    ]
  },
  {
    "eventStart": "2024-01-27T00:00:00Z",
    "eventEnd": "2024-01-27T02:00:00Z",
    "slos": [
      {
        "project": "project-alpha",
        "name": "latency-slo"
      },
      {
        "project": "project-alpha",
        "name": "uptime-slo"
      }
    ]
  }
]
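You can also issue the same request from a script. The Python sketch below uses only the standard library. events_request, fetch_events, and events_on are hypothetical helper names. The HTTPS endpoint and the headers follow the curl examples in this section. The request is built but not sent, so you can inspect it first.

```python
import json
import urllib.parse
import urllib.request
from datetime import date
from typing import List

BASE_URL = "https://app.nobl9.com/api/budgetadjustments/v1"


def events_request(org: str, token: str, adjustment: str,
                   start: str, end: str) -> urllib.request.Request:
    """Build (but don't send) a GET request listing an adjustment's events."""
    query = urllib.parse.urlencode({"from": start, "to": end})
    url = f"{BASE_URL}/{urllib.parse.quote(adjustment)}/events?{query}"
    headers = {"Organization": org, "Authorization": f"Bearer {token}"}
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_events(req: urllib.request.Request) -> List[dict]:
    """Send the request and decode the JSON list of events."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def events_on(events: List[dict], day: date) -> List[dict]:
    """Pick the events whose eventStart falls on the given UTC date."""
    return [e for e in events if e["eventStart"].startswith(day.isoformat())]


req = events_request("<organization_name>", "<token>", "maintenance-budget-adjustment",
                     "2024-01-01T00:00:00Z", "2024-01-31T23:59:59Z")
```

After calling fetch_events(req), events_on(events, date(2024, 1, 20)) returns the event that the outage extended.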
Identify the adjustment and update it
Having identified the event that needs to be updated, the team sends the following PUT request to the adjustments API:
curl -X PUT \
  -H 'Organization: <organization_name>' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '[
    {
      "eventStart": "2024-01-20T00:00:00Z",
      "eventEnd": "2024-01-20T02:00:00Z",
      "slos": [
        {
          "project": "project-alpha",
          "name": "latency-slo"
        },
        {
          "project": "project-alpha",
          "name": "uptime-slo"
        }
      ],
      "update": {
        "eventStart": "2024-01-20T00:00:00Z",
        "eventEnd": "2024-01-20T03:00:00Z"
      }
    }
  ]' \
  'https://app.nobl9.com/api/budgetadjustments/v1/maintenance-budget-adjustment/events/update'
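Building the update payload by hand is error-prone, because each item must repeat the original eventStart, eventEnd, and slos exactly. The Python sketch below derives the payload from an event returned by the GET request. update_item and update_request are hypothetical helpers. The endpoint, method, and headers are taken from the curl example above.

```python
import json
import urllib.request
from typing import List

BASE_URL = "https://app.nobl9.com/api/budgetadjustments/v1"


def update_item(event: dict, new_start: str, new_end: str) -> dict:
    """Pair an existing event (matched as-is) with its new start and end times."""
    return {
        "eventStart": event["eventStart"],
        "eventEnd": event["eventEnd"],
        "slos": event["slos"],
        "update": {"eventStart": new_start, "eventEnd": new_end},
    }


def update_request(org: str, token: str, adjustment: str,
                   items: List[dict]) -> urllib.request.Request:
    """Build (but don't send) the PUT request for /events/update."""
    return urllib.request.Request(
        f"{BASE_URL}/{adjustment}/events/update",
        data=json.dumps(items).encode("utf-8"),
        headers={"Organization": org, "Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="PUT",
    )


outage_day = {
    "eventStart": "2024-01-20T00:00:00Z",
    "eventEnd": "2024-01-20T02:00:00Z",
    "slos": [{"project": "project-alpha", "name": "latency-slo"},
             {"project": "project-alpha", "name": "uptime-slo"}],
}
# Extend the January 20 event by one hour to cover the unplanned outage.
payload = [update_item(outage_day, "2024-01-20T00:00:00Z", "2024-01-20T03:00:00Z")]
req = update_request("<organization_name>", "<token>", "maintenance-budget-adjustment", payload)
```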
You can also update past adjustment events using the sloctl budgetadjustments events command. To do so:
- Run sloctl budgetadjustments events get --adjustment-name=maintenance-budget-adjustment to retrieve a list of events for the specified adjustment and related SLOs.
- Identify the event that needs to be updated.
- Run sloctl budgetadjustments events update, providing the updated values in a YAML file.
See the Adjustments use case for real-life examples managed through sloctl.
Delete an incorrect adjustment event using sloctl
On another Saturday, the maintenance was canceled because the service needed to remain fully operational due to high demand. Despite the cancellation, the adjustment was still applied as usual, which prevented the service degradation that occurred during this period from impacting the error budget. Reviewing the historical adjustment event for that date, the team realized they needed to remove the adjustment event.
Access adjustment events history
The team accesses the adjustment events history by running the following sloctl command:
sloctl budgetadjustments events get --adjustment-name=maintenance-budget-adjustment --from=2024-01-01T00:00:00Z --to=2024-01-31T23:59:59Z
sloctl returns the following response:
- eventStart: 2024-01-06T00:00:00Z
  eventEnd: 2024-01-06T02:00:00Z
  slos:
    - project: project-alpha
      name: latency-slo
    - project: project-alpha
      name: uptime-slo
- eventStart: 2024-01-13T00:00:00Z
  eventEnd: 2024-01-13T02:00:00Z
  slos:
    - project: project-alpha
      name: latency-slo
    - project: project-alpha
      name: uptime-slo
- eventStart: 2024-01-20T00:00:00Z
  eventEnd: 2024-01-20T02:00:00Z
  slos:
    - project: project-alpha
      name: latency-slo
    - project: project-alpha
      name: uptime-slo
- eventStart: 2024-01-27T00:00:00Z
  eventEnd: 2024-01-27T02:00:00Z
  slos:
    - project: project-alpha
      name: latency-slo
    - project: project-alpha
      name: uptime-slo
Identify the event and delete it
The team identifies that the following adjustment event must be deleted:
- eventStart: 2024-01-27T00:00:00Z
  eventEnd: 2024-01-27T02:00:00Z
  slos:
    - project: project-alpha
      name: latency-slo
    - project: project-alpha
      name: uptime-slo
The team saves this event to the maintenance-event-to-delete.yaml file and runs the following sloctl command to delete it:
sloctl budgetadjustments events delete --adjustment-name=maintenance-budget-adjustment -f ./maintenance-event-to-delete.yaml
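If you select events programmatically, you can generate the file passed with -f yourself. The Python sketch below is hypothetical. render_events_yaml emits the same YAML shape that sloctl prints for events, without a YAML library, and it assumes the values are simple strings that need no quoting. delete_command only builds the sloctl arguments. Writing the file and running the command (for example, with subprocess.run) is left to you.

```python
from pathlib import Path
from typing import List


def render_events_yaml(events: List[dict]) -> str:
    """Render events as a YAML list in the shape sloctl prints them."""
    lines = []
    for event in events:
        lines.append(f"- eventStart: {event['eventStart']}")
        lines.append(f"  eventEnd: {event['eventEnd']}")
        lines.append("  slos:")
        for slo in event["slos"]:
            lines.append(f"    - project: {slo['project']}")
            lines.append(f"      name: {slo['name']}")
    return "\n".join(lines) + "\n"


def delete_command(adjustment: str, path: Path) -> List[str]:
    """Arguments for the sloctl delete command shown above."""
    return ["sloctl", "budgetadjustments", "events", "delete",
            f"--adjustment-name={adjustment}", "-f", str(path)]


cancelled = {
    "eventStart": "2024-01-27T00:00:00Z",
    "eventEnd": "2024-01-27T02:00:00Z",
    "slos": [{"project": "project-alpha", "name": "latency-slo"},
             {"project": "project-alpha", "name": "uptime-slo"}],
}
path = Path("maintenance-event-to-delete.yaml")
yaml_text = render_events_yaml([cancelled])
# path.write_text(yaml_text), then subprocess.run(delete_command(...), check=True)
```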
You can also delete a past budget adjustment event using the adjustments API.
Retrospectively excluding a historical event
There may be cases where an event in the past should be excluded, but no adjustment definition was initially created. For example, an unanticipated maintenance period or a non-service-related incident (e.g., a regional outage) impacted the SLO. In such cases, you can create an adjustment definition after the event has occurred, as long as its firstEventStart is no more than 30 days in the past.
The team discovered that a misconfigured monitoring metric caused the SLO to appear broken, even though the service was functioning correctly. Recognizing that the issue was due to inaccurate data beyond their control, they decided to retroactively exclude this event from the error budget with the following definition:
apiVersion: n9/v1alpha
kind: BudgetAdjustment
metadata:
  name: outage-budget-adjustment
  displayName: Outage budget adjustment
spec:
  description: Budget exclusion due to external outage incident.
  firstEventStart: 2024-01-04T10:00:00Z
  duration: 4h
  filters:
    slos:
      - name: latency-slo
        project: project-alpha
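If you know when an incident started and ended, you can derive firstEventStart and duration from those timestamps. The Python sketch below is a hypothetical helper, not a Nobl9 tool. It formats the duration as hours and minutes (for example, 4h or 1h10m) and expects timezone-aware datetimes. It also rejects incidents that started more than 30 days before now, mirroring the firstEventStart constraint. It leaves out rrule, because recurrence rules can't be applied to past events.

```python
from datetime import datetime, timedelta, timezone


def format_duration(delta: timedelta) -> str:
    """Format a positive, whole-minute timedelta as e.g. '4h', '45m' or '1h10m'."""
    total_minutes, seconds = divmod(int(delta.total_seconds()), 60)
    if seconds or total_minutes <= 0:
        raise ValueError("duration must be a positive whole number of minutes")
    hours, minutes = divmod(total_minutes, 60)
    return (f"{hours}h" if hours else "") + (f"{minutes}m" if minutes else "")


def retroactive_spec(incident_start: datetime, incident_end: datetime,
                     now: datetime) -> dict:
    """Build the spec fields for a one-time adjustment covering a past incident."""
    if now - incident_start > timedelta(days=30):
        raise ValueError("firstEventStart can be at most 30 days in the past")
    start_utc = incident_start.astimezone(timezone.utc)
    return {
        "firstEventStart": start_utc.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "duration": format_duration(incident_end - incident_start),
    }


spec = retroactive_spec(
    incident_start=datetime(2024, 1, 4, 10, tzinfo=timezone.utc),
    incident_end=datetime(2024, 1, 4, 14, tzinfo=timezone.utc),
    now=datetime(2024, 1, 20, tzinfo=timezone.utc),  # hypothetical "today"
)
```

For the incident above, this produces the firstEventStart and duration values used in the outage-budget-adjustment definition.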