Replay restrictions
Limitations and RBAC
- Two slots are allocated for every organization. This means you can run two Replays at a time. Track job progress in the Job Status widget.
- To replay SLOs, queue Replays, remove them from the queue, and cancel data import, your role must be Organization admin, Project owner, or Project editor. Learn more about RBAC.
Metric gathering systems usually downsample older data, either with aggregate functions like mean or sum, or simply by dropping data points. This saves space but can affect the result of a query made against a time range in the past. Refer to the documentation of your data source for more details.
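To illustrate why downsampling can change a query result, here is a minimal sketch. It assumes a hypothetical latency SLI with a 250 ms threshold and a data source that averages every two points; the function names, sample values, and threshold are illustrative only and don't come from any particular data source.

```python
from statistics import mean


def downsample(values, bucket_size, aggregate=mean):
    """Collapse consecutive groups of `bucket_size` points into one value."""
    return [
        aggregate(values[i:i + bucket_size])
        for i in range(0, len(values), bucket_size)
    ]


def good_ratio(values, threshold):
    """Share of points at or below the threshold."""
    return sum(v <= threshold for v in values) / len(values)


# Hypothetical per-minute latency samples in milliseconds.
raw = [100, 230, 270, 220, 130, 280, 200, 90]
threshold = 250  # assumed SLI threshold

averaged = downsample(raw, bucket_size=2)

print(good_ratio(raw, threshold))       # two raw points exceed the threshold
print(good_ratio(averaged, threshold))  # averaging hides both spikes
```

In this sketch, the same threshold query reports fewer bad points on the averaged series than on the raw one, which is the kind of difference you might see when querying an older time range.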
Replaying a single SLO may take up to an hour depending on:
- The length of the replayed period
- The number of objectives in your SLO
- The number of unique queries used in your SLO
Replay impact on connected resources
Running Replay for an existing SLO has important consequences for SLI data and reports, alerts, and composite SLOs.
Impact on SLI data and reports
- Live data is gathered while Replay is in progress but isn't considered in calculating the SLO's error budget until the process is complete.
- Replay queries your data source once again for the entire selected historical period, even if your SLO has already collected data for part of this period. It can completely replace existing SLI data.
- Data resolution can be lowered due to the downsampling of historical data, depending on the data source. As a result, the SLI chart can look different after replaying, even when the query remains the same.
- Replay won't always fill in missing data points. If there are gaps in data, Replay instead marks these gaps as shown in the examples below. This happens when the data source doesn't keep data for as long as you're trying to retrieve, for example, because of its data retention policy. Maximum Period for Historical Data Retrieval prevents exceeding the data source's retention period.
Examples of missing SLI data before and after replaying:
2023-01-01 01:20:00 = 100
2023-01-01 01:21:00 = 230
2023-01-01 01:22:00 = 270
2023-01-01 01:24:00 = 220
2023-01-01 01:25:00 = 130
2023-01-01 01:26:00 = 280
2023-01-01 01:27:00 = 200
2023-01-01 01:20:00 = 100
2023-01-01 01:21:00 = 230
[...] # Gap in the data stream
2023-01-01 01:28:00 = 90
2023-01-01 01:29:00 = 220
2023-01-01 01:30:00 = 270
2023-01-01 01:31:00 = 190
- Reports depend on accurate SLI data to provide insights on service performance or assess overall system health. Because Replay overrides previously collected data, which can alter data resolution or introduce gaps, the calculations and insights in reports might differ from earlier ones, affecting trend observation.
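If you export SLI points and want to locate such gaps yourself, the following sketch shows one way to do it. It assumes points arrive at a one-minute interval and uses the timestamps from the example above; `find_gaps` is a hypothetical helper, not part of any product API.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected_step=timedelta(minutes=1)):
    """Return (last_seen, next_seen) pairs whose spacing exceeds expected_step."""
    ordered = sorted(timestamps)
    return [
        (prev, curr)
        for prev, curr in zip(ordered, ordered[1:])
        if curr - prev > expected_step
    ]


# Timestamps taken from the example with a gap in the data stream.
points = [
    "2023-01-01 01:20:00",
    "2023-01-01 01:21:00",
    "2023-01-01 01:28:00",
    "2023-01-01 01:29:00",
    "2023-01-01 01:30:00",
    "2023-01-01 01:31:00",
]
parsed = [datetime.strptime(p, "%Y-%m-%d %H:%M:%S") for p in points]

for start, end in find_gaps(parsed):
    print(f"gap between {start} and {end}")
```

Comparing the gaps found before and after a Replay can help explain why a report changed.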
Impact on alerts
- Alerting is suspended for the entire Replay duration and resumed once Replay is complete.
- Once Replay is complete:
  - You won't receive again the alerts you already received for the recalculated historical period.
  - You receive missed alerts: the alerts triggered while Replay was running. These alerts are triggered based on the recalculated data.
Impact on composite SLOs
Currently, you can't replay a composite SLO, only its components.
Replaying components of a composite causes no retroactive changes to the composite data.
The replayed component stops reporting data until the process is complete.
It is treated according to your maxDelay settings and, if Replay lasts longer than maxDelay, your whenDelayed settings.
The overall composite error budget calculations depend on the duration of the Replay process, the component's maxDelay settings, and the existence of components without a delay in the composite.
Non-delayed components? | Replay vs. maxDelay | Result |
---|---|---|
Yes | Replay < maxDelay | The composite pauses for the duration of Replay. The component's data collected after replaying is considered in calculations as usual. |
Yes | Replay > maxDelay | The component's data is considered in calculations according to whenDelayed. Data delayed for the time surplus (once maxDelay ends) is calculated as usual. |
No | Any ratio | The composite pauses for the duration of Replay. Upon replaying, the component's data fills the no-data gap. |
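The table can be read as a small decision function. The sketch below only restates its three rows; `composite_behavior` and its parameters are hypothetical names used for illustration, and the returned strings are summaries, not product output. The table doesn't say what happens when Replay lasts exactly as long as maxDelay, so the sketch leaves that case unspecified.

```python
def composite_behavior(has_non_delayed_components, replay_duration, max_delay):
    """Summarize the table row that applies to a replayed component.

    Durations can be any comparable values in the same unit, e.g. minutes.
    """
    if not has_non_delayed_components:
        return "composite pauses; replayed data fills the no-data gap"
    if replay_duration < max_delay:
        return "composite pauses; data after Replay is calculated as usual"
    if replay_duration > max_delay:
        return "component handled per whenDelayed; surplus data calculated as usual"
    # Replay == maxDelay isn't covered by the table, so this sketch doesn't guess.
    return "not specified in the table"


print(composite_behavior(True, replay_duration=20, max_delay=30))
print(composite_behavior(True, replay_duration=45, max_delay=30))
print(composite_behavior(False, replay_duration=45, max_delay=30))
```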
Turning an existing SLO into a composite 1.0 while this SLO is being replayed results in the following:
- Replay continues for the original objectives of this SLO.
- Historical data won't be considered in calculating the composite SLO 1.0 error budget. It's calculated without Replay, from the moment of creating the composite 1.0 objective.