Replay
Replay lets you retrieve historical SLI data from your monitoring source to backfill or correct an SLO's error budget calculations. Use it to keep SLO data accurate and to make new SLOs useful sooner.
Common use cases for Replay include:
- Backfilling new SLOs: Immediately calculate an error budget for a newly created SLO using existing historical data, without waiting for it to accumulate naturally over time.
- Recovering data: Correct an SLO's data after recovering from a monitoring source outage or fixing a corrupted SLI.
- Aligning after configuration changes: Re-fetch SLI data to align an SLO with a new metric or query definition.
Replay in a nutshell
Here's a quick overview of Replay's core functions, behaviors, and how you can manage them:
Core functionality
- Access historical data: Replay fetches historical data for both new and existing SLOs.
- SLO-specific behavior: Replay retrieves data from the data source the SLO is based on.
- Continuous operation: SLOs continue collecting data in real time during a Replay. The historical and current data are then merged to produce a complete error budget calculation.
Key behavior
- Up-to-the-moment data: A Replay always retrieves or calculates data up to the moment it is triggered.
- Data retrieval limits: The retrieval period is limited by your data source, though you can set a more restrictive period in Nobl9.
- Chart display: During an ongoing Replay, SLO charts for the replayed period appear empty until the process is complete. Queued Replays do not affect the chart display.
Managing Replays
- Run up to two Replays simultaneously; additional Replays are queued.
- Track the progress of all jobs in the Job Status widget.
- Cancel an active Replay or remove a queued one using the Job Status widget or sloctl.
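The queueing rules above can be pictured with a small toy model. This is only an illustrative sketch, not Nobl9's implementation: the `ReplayQueue` class and its method names are hypothetical, and the first-in, first-out promotion of queued Replays is an assumption that the text does not state.

```python
from collections import deque

MAX_RUNNING = 2  # limit stated above: two Replays run simultaneously


class ReplayQueue:
    """Toy model of Replay scheduling (hypothetical, for illustration only)."""

    def __init__(self):
        self.running = []      # SLO names with an active Replay
        self.queued = deque()  # SLO names waiting for a free slot

    def submit(self, slo):
        """Start the Replay if a slot is free, otherwise queue it."""
        if len(self.running) < MAX_RUNNING:
            self.running.append(slo)
            return "running"
        self.queued.append(slo)
        return "queued"

    def cancel(self, slo):
        """Cancel an active Replay or remove a queued one."""
        if slo in self.queued:
            self.queued.remove(slo)
        elif slo in self.running:
            self.running.remove(slo)
            self._promote()
        else:
            raise KeyError(slo)

    def _promote(self):
        # Assumption: queued Replays start in FIFO order when a slot frees up.
        while self.queued and len(self.running) < MAX_RUNNING:
            self.running.append(self.queued.popleft())


q = ReplayQueue()
states = [q.submit(name) for name in ("slo-a", "slo-b", "slo-c")]
q.cancel("slo-a")  # frees a slot, so "slo-c" leaves the queue
```

In this model the third submission is queued, and canceling an active Replay lets the queued one start.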
How it works
Replay performs two primary tasks: retrieving historical data and processing the retrieved data. A maximum period for historical data retrieval is enforced. This period is initially constrained by your data source's retention policy; the limits are listed in the supported data sources table under Scope of support below.
You can further restrict the retrieval period in your data source settings.
Setting the value to 0 blocks Replay for SLOs based on that data source.
When adding a data source, you can also define a default period for historical data retrieval. This value is applied to all SLOs using that data source for automatic Replay on creation. You can override this default for individual SLOs as needed.
Replaying SLOs may take considerable time, depending on the selected retrieval period, your data source, and other environmental factors.
Additionally, historical data retrieved during Replay may already be downsampled by your data source. Read more about Replay restrictions and impact.
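The interaction between the data source limit, the configured maximum, the default period, and a per-SLO override can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and parameters are hypothetical, all periods are expressed in days, and clamping an oversized per-SLO request to the configured maximum (rather than rejecting it) is an assumption, not documented behavior.

```python
from typing import Optional


def effective_replay_days(source_limit_days: int,
                          configured_max_days: int,
                          default_days: int,
                          slo_override_days: Optional[int] = None) -> int:
    """Return how many days a Replay would cover under the rules above."""
    if min(source_limit_days, configured_max_days, default_days) < 0:
        raise ValueError("periods must be non-negative")
    # The configured maximum can only narrow the data source's own limit.
    if configured_max_days > source_limit_days:
        raise ValueError("configured maximum exceeds the data source limit")
    # A maximum of 0 blocks Replay for SLOs based on this data source.
    if configured_max_days == 0:
        return 0
    # The default applies unless the SLO overrides it.
    requested = default_days if slo_override_days is None else slo_override_days
    if requested < 0:
        raise ValueError("periods must be non-negative")
    # Assumption: a request above the maximum is clamped, not rejected.
    return min(requested, configured_max_days)


default_case = effective_replay_days(15, 10, 7)                         # 7
override_case = effective_replay_days(15, 10, 7, slo_override_days=14)  # 10
blocked_case = effective_replay_days(30, 0, 0, slo_override_days=5)     # 0
```

For example, with a 15-day data source limit and a configured maximum of 10 days, a 14-day per-SLO request resolves to 10 days in this model.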
Scope of support
The following table lists the data sources that support Replay, their minimum required agent versions, and the maximum period for historical data retrieval:
| Data source name | Nobl9 agent minimum version | Maximum period for historical data retrieval |
|---|---|---|
| Amazon CloudWatch | 0.65.0 | 15 days |
| Amazon Prometheus | 0.65.0 | 30 days |
| AppDynamics | 0.68.0 | 30 days |
| Azure Monitor | 0.69.0-beta01 | 30 days |
| Azure Monitor managed service for Prometheus | 0.78.0-beta | 30 days |
| Coralogix | 0.65.0 | 30 days |
| Datadog | 0.65.0 | 30 days |
| Dynatrace | 0.66.0 | 28 days |
| Elasticsearch | 0.88.0 | 30 days |
| Google Cloud Monitoring | 0.80.0 | 30 days |
| Graphite | 0.65.0 | 30 days |
| LogicMonitor | 0.81.0-beta | 30 days |
| New Relic | 0.65.0 | 30 days |
| Prometheus | 0.65.0 | 30 days |
| ServiceNow Cloud Observability | 0.65.0 | 30 days |
| Splunk | 0.82.2 | 30 days |
| Sumo Logic | 0.102.0-beta | 30 days |
| ThousandEyes | 0.97.0, 0.97.0-beta | 30 days |
Replay configuration
Before replaying SLOs, configure Replay at the data source level using one of the following methods:
- Nobl9 Web
- sloctl
In Nobl9 Web, open the Data source wizard and define Advanced settings > Historical data retrieval:
- The Maximum period for historical data retrieval value sets the per-integration limit for how far back in time Nobl9 can query data from your data source.
- The Default period for historical data retrieval value defines the period applied when replaying SLOs based on your data source. It is suggested by default as the Period for historical data retrieval.
Requirements:
- Each value must be a non-negative whole number.
- Default period for historical data retrieval must not exceed the maximum period set for this data source. You can override this value when creating SLOs.
- Maximum period for historical data retrieval must not exceed the maximum period initially allowed by your data source.
- To enable Replay for a data source, set its Maximum period for historical data retrieval to a non-zero value.
With sloctl, define the historicalDataRetrieval values in your data source's YAML definition (agent or direct):
- maxDuration sets the limit for how far back in time Nobl9 can query data from your data source. The maximum period must be less than or equal to the period initially allowed by your data source.
- defaultDuration defines the period applied when replaying SLOs based on your data source.
apiVersion: n9/v1alpha
# One of: Agent, Direct
kind: Direct
metadata:
  displayName: My data source with replay
  name: my-data-source
  project: my-project
spec:
  # Your data source-specific fields
  # ...
  historicalDataRetrieval:
    defaultDuration:
      unit: Day
      value: 0
    maxDuration:
      unit: Day
      value: 30
  # Query parameters configuration: query interval, query delay, jitter, and timeout
  interval:
    unit: Minute
    value: 1
  queryDelay:
    unit: Minute
    value: 5
  jitter:
    unit: Second
    value: 45
  timeout:
    unit: Second
    value: 50
  # Event logs available for direct connections only
  logCollectionEnabled: true
  # One of: beta, stable. Use beta for Replay
  releaseChannel: beta
Run sloctl apply to proceed.
Learn more about YAML configuration for data sources connected using the agent or direct methods.
Requirements:
- Each value must be a non-negative whole number.
- The defaultDuration value must not exceed the maximum period set for this data source.
- The maxDuration value must not exceed the maximum period originally allowed by your data source.
- To enable Replay for a data source, set its maxDuration to a non-zero value.
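As a rough pre-flight check before running sloctl apply, the requirements above could be verified locally. The sketch below is illustrative only: the helper names are hypothetical, the historicalDataRetrieval block is assumed to be already parsed into a Python dict, and both durations are assumed to use the Day unit. Nobl9 performs its own validation, which may differ.

```python
def validate_historical_data_retrieval(hdr: dict, source_limit_days: int) -> list:
    """Return a list of rule violations; an empty list means the block passes."""
    errors = []
    values = {}
    for key in ("defaultDuration", "maxDuration"):
        value = hdr.get(key, {}).get("value")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or value < 0:
            errors.append(f"{key}.value must be a non-negative whole number")
        else:
            values[key] = value
    # Cross-field rules only make sense when both values are well-formed.
    if len(values) == 2:
        if values["defaultDuration"] > values["maxDuration"]:
            errors.append("defaultDuration must not exceed maxDuration")
        if values["maxDuration"] > source_limit_days:
            errors.append("maxDuration must not exceed the data source limit")
    return errors


def replay_enabled(hdr: dict) -> bool:
    """Replay is enabled only when maxDuration is non-zero."""
    return hdr.get("maxDuration", {}).get("value", 0) != 0


# Values taken from the YAML example above.
example = {
    "defaultDuration": {"unit": "Day", "value": 0},
    "maxDuration": {"unit": "Day", "value": 30},
}
ok = validate_historical_data_retrieval(example, source_limit_days=30)
too_long = validate_historical_data_retrieval(example, source_limit_days=15)
```

With a 30-day data source limit the example passes, while the same values checked against a 15-day limit (such as the one listed for Amazon CloudWatch) report that maxDuration is too large.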
You can also replay SLOs immediately on creation with the Nobl9 Terraform provider.
To do so, specify the retrieve_historical_data_from value in your SLO definition.