Use cases for past adjustments

External factors like third-party outages, maintenance windows, or one-off incidents can distort your error budgets. Left unaddressed, these anomalies may inflate failure rates, creating a misleading picture of service reliability.

Scenario: A one-hour incident affecting daily budgets

Imagine managing a daily error budget. A single one-hour incident causes an abnormal failure spike. Without adjustments, this short-lived event impacts the entire day's budget, making the SLO appear overly pessimistic. To avoid this and ensure the remaining data accurately reflects the day's performance, you can exclude that hour.

This is how applying past budget adjustments works: it refines historical data to better align with the realities of your system, providing more accurate insights, fairer evaluations, and improved decision-making.

Error budget, SLI and burn rate after a one-hour incident

Using budget adjustments, we can exclude the anomalous data from that one-hour period. This ensures the chart reflects a more accurate trend for the remaining time, providing a clearer view of system performance.
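To make the effect concrete, the sketch below compares a ratio-based SLI and the remaining error budget for one day, with and without the incident hour. The per-minute counts, the 99% target, and the helper names (`build_day`, `exclude`, `summarize`) are illustrative assumptions, not Nobl9's actual calculation.

```python
# Illustrative only: all numbers are made up, and Nobl9's real budget
# calculation may differ from this simple good/total ratio.
TARGET = 0.99  # assumed SLO target


def build_day(incident_start=600, incident_minutes=60):
    """One (minute, good, total) point per minute of a 24-hour day."""
    day = []
    for minute in range(24 * 60):
        in_incident = incident_start <= minute < incident_start + incident_minutes
        good = 500 if in_incident else 999  # half of all requests fail during the incident
        day.append((minute, good, 1000))
    return day


def exclude(points, start, end):
    """Drop points whose minute falls in [start, end), as an adjustment would."""
    return [p for p in points if not (start <= p[0] < end)]


def summarize(points, target=TARGET):
    """Return (SLI, fraction of error budget remaining) for the given points."""
    good = sum(g for _, g, _ in points)
    total = sum(t for _, _, t in points)
    allowed_bad = (1 - target) * total
    return good / total, 1 - (total - good) / allowed_bad


day = build_day()
raw_sli, raw_budget = summarize(day)
adj_sli, adj_budget = summarize(exclude(day, 600, 660))
print(f"unadjusted: SLI={raw_sli:.4f}, budget remaining={raw_budget:.0%}")
print(f"adjusted:   SLI={adj_sli:.4f}, budget remaining={adj_budget:.0%}")
```

With these assumed numbers, the unadjusted day misses the target and overspends its budget, while the adjusted day stays well within it.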

Applying a single past adjustment event

apiVersion: n9/v1alpha
kind: BudgetAdjustment
metadata:
  displayName: Example Display Name
  name: my-adjustment-1-a9c4c0db-2c81-4379-8bcf-40098fdf5b3d
spec:
  description: Example description
  filters:
    slos:
      - name: sample-slo-1-8f385fc7-2b0d-4290-8870-758edbe00295
        project: test-project
  firstEventStart: 2024-12-01T16:15:00Z
  duration: 77m

This configuration creates one budget adjustment event for the specified time period, ensuring the error budget excludes the anomalous data for more accurate reporting.
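As a quick check of which period such a configuration covers, the sketch below derives the end of the adjustment event from `firstEventStart` and `duration`. The duration parser is a simplified assumption that handles only `h`, `m`, and `s` units (for example `77m` or `1h17m`); the helper names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumed subset of duration units; the real format may allow more.
_UNITS = {"h": "hours", "m": "minutes", "s": "seconds"}


def parse_duration(text):
    """Parse strings such as '77m' or '1h17m' into a timedelta."""
    parts = re.findall(r"(\d+)([hms])", text)
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    total = timedelta()
    for number, unit in parts:
        total += timedelta(**{_UNITS[unit]: int(number)})
    return total


def event_window(first_event_start, duration):
    """Return (start, end) of the adjustment event as timezone-aware datetimes."""
    start = datetime.fromisoformat(first_event_start.replace("Z", "+00:00"))
    return start, start + parse_duration(duration)


start, end = event_window("2024-12-01T16:15:00Z", "77m")
print(end.strftime("%Y-%m-%dT%H:%M:%SZ"))  # 2024-12-01T17:32:00Z
```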

You can tell that a budget adjustment has been applied because the event is marked in the SLO charts in the Nobl9 Web application:

[Figure: SLO charts with the budget adjustment event marked]

Updating budget adjustments

Updating a budget adjustment means modifying the adjusted period, that is, changing its start and end times. To do this, retrieve the relevant adjustment events for review, then modify them as needed. You can retrieve events with sloctl budgetadjustments events get or with the GET method of the adjustments API.

Once the adjustment events are modified as required, reapply changes using sloctl budgetadjustments events update or the PUT method in the adjustments API.

After applying the changes, Nobl9 automatically recalculates the error budget and updates the reliability burn-down charts to reflect the new adjustment period, ensuring the accuracy of your SLO metrics.

Workflow for updating adjustments
  1. Locate the adjustment: identify the adjustment to be updated.
  2. Retrieve adjustment events: export the adjustment events into a file.
  3. Edit details: update the YAML file to reflect the new requirements, such as revised dates.
  4. Apply changes: use sloctl or the adjustments API to reapply the modified events.
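If you script this workflow, the retrieve and apply steps can be wrapped as in the sketch below. It only assembles the sloctl command lines with the subcommands and flags shown on this page (`--adjustment-name`, `--from`, `--to`, `-f`); the helper names are hypothetical, and running the commands assumes sloctl is installed and configured.

```python
import shlex
from datetime import datetime, timezone


def utc_now_stamp():
    """Current UTC time in the same format as `date -u +%Y-%m-%dT%H:%M:%SZ`."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def get_events_cmd(adjustment_name, start, end=None):
    """Command that retrieves adjustment events between start and end (default: now)."""
    return [
        "sloctl", "budgetadjustments", "events", "get",
        "--adjustment-name", adjustment_name,
        f"--from={start}",
        f"--to={end or utc_now_stamp()}",
    ]


def update_events_cmd(adjustment_name, events_file):
    """Command that reapplies the modified events from a YAML file."""
    return [
        "sloctl", "budgetadjustments", "events", "update",
        "--adjustment-name", adjustment_name,
        "-f", events_file,
    ]


# Placeholder adjustment name for illustration only.
cmd = get_events_cmd("my-adjustment", "2024-12-04T00:00:00Z", "2024-12-05T00:00:00Z")
print(shlex.join(cmd))
```

To execute a command instead of printing it, pass the list to `subprocess.run(cmd, check=True)`. Building the arguments as a list avoids shell quoting problems with names and timestamps.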

Example 1: Errors in initial adjustment event

An SLO experienced a 20-minute error period, but the adjustment was incorrectly created for only 10 minutes. This discrepancy would result in an inaccurate representation of the error budget.

Incorrectly applied adjustment

To modify the adjustment event, we first need to retrieve it using sloctl. The command below fetches the event associated with a specific adjustment within the specified time range:

Fetch adjustment events
sloctl budgetadjustments events get --adjustment-name my-adjustment-1-578b974d-8e27-43cf-85a3-7751a774f13d --from=2024-12-04T00:00:00Z --to=$(date -u +%Y-%m-%dT%H:%M:%SZ)

The result is a single event in the YAML format:

Fetched events
- eventStart: 2024-12-04T06:37:00Z
  eventEnd: 2024-12-04T06:47:00Z
  slos:
    - project: test-project
      name: sample-slo-1-578b974d-8e27-43cf-85a3-7751a774f13d

Next, save the event to a YAML file and correct its duration by adding an update block with the new timestamps. Here, we move eventEnd forward by 12 minutes, from 06:47 to 06:59:

file_with_modified_events.yaml
- eventStart: 2024-12-04T06:37:00Z
  eventEnd: 2024-12-04T06:47:00Z
  slos:
    - project: test-project
      name: sample-slo-1-578b974d-8e27-43cf-85a3-7751a774f13d
  update:
    eventStart: 2024-12-04T06:37:00Z
    eventEnd: 2024-12-04T06:59:00Z
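When many events need the same correction, editing the file by hand gets tedious. The sketch below shows one way to generate the update block programmatically. The helper names (`with_extended_end`, `to_yaml`) are hypothetical, and the hand-written YAML renderer assumes the event layout shown on this page rather than any official schema.

```python
from datetime import datetime, timedelta

STAMP = "%Y-%m-%dT%H:%M:%SZ"  # timestamp format used by the events on this page


def with_extended_end(event, extra_minutes):
    """Copy a fetched event and add an update block with a later eventEnd."""
    new_end = datetime.strptime(event["eventEnd"], STAMP) + timedelta(minutes=extra_minutes)
    update = {"eventStart": event["eventStart"], "eventEnd": new_end.strftime(STAMP)}
    return {**event, "update": update}


def to_yaml(events):
    """Render events in the layout assumed for the events file."""
    lines = []
    for event in events:
        lines.append(f"- eventStart: {event['eventStart']}")
        lines.append(f"  eventEnd: {event['eventEnd']}")
        lines.append("  slos:")
        for slo in event["slos"]:
            lines.append(f"    - project: {slo['project']}")
            lines.append(f"      name: {slo['name']}")
        if "update" in event:
            lines.append("  update:")
            lines.append(f"    eventStart: {event['update']['eventStart']}")
            lines.append(f"    eventEnd: {event['update']['eventEnd']}")
    return "\n".join(lines) + "\n"


fetched = {
    "eventStart": "2024-12-04T06:37:00Z",
    "eventEnd": "2024-12-04T06:47:00Z",
    "slos": [{
        "project": "test-project",
        "name": "sample-slo-1-578b974d-8e27-43cf-85a3-7751a774f13d",
    }],
}
updated = with_extended_end(fetched, 12)
print(to_yaml([updated]), end="")
```

The original eventStart and eventEnd stay untouched in the output, and only the update block carries the new range.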

To apply the changes, use the following sloctl command to update the adjustment events:

Update events
sloctl budgetadjustments events update --adjustment-name my-adjustment-1-578b974d-8e27-43cf-85a3-7751a774f13d -f file_with_modified_events.yaml

After applying the update, Nobl9 will automatically recalculate the budget, and the adjustment range will be updated on the chart to reflect the extended duration.

Updated adjustment event

Deleting adjustment events

Deleting budget adjustment events removes adjustments that are no longer valid or necessary. To delete events, use sloctl budgetadjustments events delete or the DELETE method of the adjustments API.

After the events are deleted, Nobl9 automatically recalculates the error budget and updates the reliability burn-down charts, ensuring that your SLO metrics no longer reflect the removed adjustments.

Workflow for deleting adjustments
  1. Identify the adjustment: locate the adjustment to be removed.
  2. Delete adjustment events: use sloctl or the adjustments API to delete the events.
  3. Recalculate metrics: Nobl9 automatically updates the error budget and reliability burn-down charts to reflect the removal.

Example 1: An adjustment event is no longer valid

An adjustment was scheduled for planned maintenance, but the maintenance did not occur.

As a result, valid data was excluded from the calculations, and the budget burned below the allowed threshold, which could trigger alerts.

Incorrect adjustment event

To delete an adjustment event, we first need to retrieve it using sloctl. The command below fetches the event associated with a specific adjustment within the specified time range:

Fetch adjustment events
sloctl budgetadjustments events get --adjustment-name my-adjustment-1-654d2659-71d5-42a2-a2ed-b703d817de3f --from=2024-12-04T00:00:00Z --to=$(date -u +%Y-%m-%dT%H:%M:%SZ)

The result is a single event in YAML format:

Fetched adjustment event
- eventStart: 2024-12-04T07:55:00Z
  eventEnd: 2024-12-04T08:55:00Z
  slos:
    - project: test-project
      name: sample-slo-1-654d2659-71d5-42a2-a2ed-b703d817de3f

Next, save the event to a YAML file. Because the event is being deleted rather than corrected, the file keeps the event exactly as fetched, with no update block:

file_with_events.yaml
- eventStart: 2024-12-04T07:55:00Z
  eventEnd: 2024-12-04T08:55:00Z
  slos:
    - project: test-project
      name: sample-slo-1-654d2659-71d5-42a2-a2ed-b703d817de3f

To delete the adjustment events, use the following sloctl command:

Delete defined adjustment events
sloctl budgetadjustments events delete --adjustment-name my-adjustment-1-654d2659-71d5-42a2-a2ed-b703d817de3f -f file_with_events.yaml

Once the events are deleted, Nobl9 automatically recalculates the budget, and the deleted adjustment event no longer appears on the SLO chart. The recalculated error budget should be above the target value.

Incorrect adjustment event deleted

Example 2: Data correction (through Replay) eliminates the need for adjustment

An error in the query or metric source data caused a 1-hour gap in the good count metric. This resulted in significant budget depletion:

Depleted budget as a result of an incident

An adjustment was applied to exclude this period from the calculations:

Applied budget adjustment event

Later, the metric source data was corrected, and we ran Replay for this SLO. Nobl9 filled in the gap with meaningful data:

SLO after running Replay

Now, we can remove the adjustment to incorporate the new data into our budget calculations, ensuring the picture is complete with all relevant data included:

SLO after removing the redundant adjustment event