Replay restrictions
Limitations and RBAC
- Two slots are allocated for every organization. This means you can run two Replays at a time.
  Track job progress in the Job Status widget.
- To replay SLOs, queue Replays, and remove them from the queue, your role must be Organization admin, Project owner, or Project editor.
  Learn more about RBAC.
Metric gathering systems usually downsample older data using aggregate functions like mean or sum, or simply by dropping data points. This saves space but can affect the result of a query made against a time range in the past. Refer to the documentation of your data source for more details.
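To illustrate why downsampling matters for a replayed query, here is a minimal sketch. The values, the bucket size of three points, and the threshold of 250 are all illustrative assumptions. Real data sources apply their own downsampling rules.

```python
from statistics import mean

def downsample(points, bucket_size, aggregate):
    """Collapse each consecutive group of `bucket_size` points into one value."""
    return [
        aggregate(points[i:i + bucket_size])
        for i in range(0, len(points), bucket_size)
    ]

# Illustrative one-minute values; not taken from any real data source.
raw = [100, 230, 270, 220, 130, 280]

by_mean = downsample(raw, 3, mean)  # averages each group of three points
by_sum = downsample(raw, 3, sum)    # sums each group of three points
by_drop = raw[::3]                  # keeps only every third point

# Hypothetical threshold: count the points that exceed it.
threshold = 250
breaches_raw = sum(v > threshold for v in raw)
breaches_mean = sum(v > threshold for v in by_mean)

print(by_mean, by_sum, by_drop)
print(breaches_raw, breaches_mean)
```

In this sketch, two raw points exceed the threshold, while none of the mean-downsampled points do. The same query over the same period can therefore yield a different result once the data source has downsampled it.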
Replaying a single SLO may take up to an hour depending on:
- The length of the replayed period
- The number of objectives in your SLO
- The number of unique queries used in your SLO
Replay impact on connected resources
Running Replay for an existing SLO has important consequences for SLI data, alerts, and composite SLOs.
Impact on SLI data
- Live data is gathered while Replay is in progress but isn't considered in calculating the SLO's error budget until the process is complete.
- Replay queries your data source once again for the entire selected historical period.
  It can completely replace SLI data already gathered for the same period.
- Data resolution can be lowered due to the downsampling of historical data. It depends on the data source.
  As a result, the SLI chart can look different upon replaying with the same query.
- Replay won't always fill in missing data points. If there are gaps in data, Replay instead marks these gaps as shown in the examples below.
  This happens when the data source doesn't keep data for as long as you're trying to retrieve, for example, according to the data retention policy.
  To avoid this, always set the Maximum Period for Historical Data Retrieval to a value less than or equal to the data source's retention period.
An example of missing SLI data before and after replaying:
2023-01-01 01:20:00 = 100
2023-01-01 01:21:00 = 230
2023-01-01 01:22:00 = 270
2023-01-01 01:24:00 = 220
2023-01-01 01:25:00 = 130
2023-01-01 01:26:00 = 280
2023-01-01 01:27:00 = 200
2023-01-01 01:20:00 = 100
2023-01-01 01:21:00 = 230
[...] # Gap in the data stream
2023-01-01 01:28:00 = 90
2023-01-01 01:29:00 = 220
2023-01-01 01:30:00 = 270
2023-01-01 01:31:00 = 190
2023-01-01 01:20:00 = 100
2023-01-01 01:21:00 = 230
[...] # Gap in the data stream
2023-01-01 01:28:00 = 90
2023-01-01 01:29:00 = 220
2023-01-01 01:30:00 = 270
2023-01-01 01:31:00 = 190
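If you export SLI points in the `timestamp = value` form shown above, a short script can locate such gaps. This is a minimal sketch: the one-minute expected interval and the `find_gaps` helper are assumptions for illustration, not part of Replay itself.

```python
from datetime import datetime, timedelta

def find_gaps(samples, expected_step=timedelta(minutes=1)):
    """Return (last_point_before_gap, first_point_after_gap) timestamp pairs."""
    gaps = []
    for (prev_ts, _), (next_ts, _) in zip(samples, samples[1:]):
        # A gap exists when two neighbors are further apart than expected.
        if next_ts - prev_ts > expected_step:
            gaps.append((prev_ts, next_ts))
    return gaps

# A shortened copy of the series above, without the "[...]" marker line.
raw = """\
2023-01-01 01:20:00 = 100
2023-01-01 01:21:00 = 230
2023-01-01 01:28:00 = 90
2023-01-01 01:29:00 = 220
"""

samples = []
for line in raw.splitlines():
    ts, value = line.split(" = ")
    samples.append((datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), int(value)))

gaps = find_gaps(samples)
for start, end in gaps:
    print(f"gap between {start} and {end}")
```

For this series, the sketch reports a single gap between 01:21:00 and 01:28:00.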
Impact on alerts
- Alerting is suspended for the entire Replay duration and resumed once Replay is complete.
- Once Replay is complete:
  - You won't receive alerts again that you already received for the recalculated historical period.
  - You receive missed alerts, that is, the alerts triggered while Replay was running.
    These alerts are triggered based on the recalculated data.
Impact on composite SLOs
Currently, you can't replay a composite SLO itself, only its components.
Replaying components of a composite causes no retroactive changes to the composite data.
The replayed component stops reporting data until the process is complete.
It is treated according to your maxDelay and, if the delay lasts longer, whenDelayed settings.
The overall composite error budget calculations depend on the duration of the Replay process, the component's maxDelay settings, and the existence of components without a delay in the composite.
Non-delayed components? | Replay vs. maxDelay | Result |
---|---|---|
Yes | Replay < maxDelay | The composite pauses for the duration of Replay. The component's data collected after replaying is considered in calculations as usual. |
Yes | Replay > maxDelay | The component's data is considered in calculations according to whenDelayed. Data delayed for the time surplus (once maxDelay ends) is calculated as usual. |
No | Any ratio | The composite pauses for the duration of Replay. Upon replaying, the component's data fills the no-data gap. |
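The table can be read as a small decision function. The sketch below is only a restatement of the three rows. The function name `composite_behavior` and the returned labels are hypothetical, and the case where Replay lasts exactly as long as maxDelay is not covered by the table, so it is reported as unspecified.

```python
from datetime import timedelta

def composite_behavior(has_non_delayed, replay_duration, max_delay):
    """Map the table rows to a short descriptive label (labels are hypothetical)."""
    if not has_non_delayed:
        # Row 3: no non-delayed components, any Replay-to-maxDelay ratio.
        return "pause, then fill the no-data gap"
    if replay_duration < max_delay:
        # Row 1: Replay finishes before maxDelay elapses.
        return "pause, then calculate as usual"
    if replay_duration > max_delay:
        # Row 2: Replay outlasts maxDelay.
        return "apply whenDelayed, then calculate as usual"
    # Replay == maxDelay is not described in the table.
    return "not specified"

print(composite_behavior(True, timedelta(minutes=10), timedelta(minutes=30)))
print(composite_behavior(True, timedelta(minutes=45), timedelta(minutes=30)))
print(composite_behavior(False, timedelta(minutes=45), timedelta(minutes=30)))
```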
Turning an existing SLO into a composite 1.0, while this SLO is being replayed, results in the following:
- Replay continues for the original objectives of this SLO.
- Historical data won't be considered in calculating the composite SLO 1.0 error budget. It's calculated without Replay, from the moment of creating the composite 1.0 objective.