# Data anomaly troubleshooting
Any of the data anomalies Nobl9 detects can be caused by factors outside of Nobl9, including issues with the data source or with the data stream itself (such as slight changes in metric behavior).
However, before concluding that an SLO data anomaly points to an external issue, we recommend verifying that your SLO and SLO objective settings are correct.
This guide walks you through the issues you can fix in your SLO and data source configurations.
## No data
The reasons for your SLO not receiving any data, receiving partial data, or not passing the query check verification can be grouped into two broader categories:
- Issues with the data source your SLO is connected to, or with network connectivity
- Issues with the query configured in your SLO
| Reason | How to address |
| --- | --- |
| Connection issues for data sources that use the agent method | • Confirm the connection status on your data source's details page. • Review your agent metrics. |
| Connection issues for data sources that use the direct method | Examine event logs. |
| Inappropriate query parameters for the data source | The query interval and timeout must fit the data density: • If the timeout is short and data is sparse, requests can fail before the data is emitted. • Check the query interval: requests must be sent frequently enough to capture new data points. |
| Incorrect source-specific settings | Check the source-specific fields: • Ensure the authentication credentials, URL, and other provided values are correct. • Verify the validity of any tokens or API keys. |
| Rate limits hit | Nobl9 stops collecting data when the data source's API rate limit is reached. Collection resumes once the rate limit resets. |
| Incorrect query | Ensure the query syntax is valid for your data source. |
| Network issues | Check for network errors between Nobl9 and your data source. |
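The interval/timeout mismatch can be illustrated with a short sketch. This is not Nobl9 code; it is a hypothetical model of how a query interval that is shorter than the metric's emission interval leaves most collection windows empty:

```python
# Hypothetical sketch (not Nobl9's implementation): count how many query
# windows actually contain a data point when data is emitted sparsely.
def windows_with_data(emission_interval_s, query_interval_s, horizon_s):
    """Return (windows with at least one point, total windows)."""
    points = set(range(0, horizon_s, emission_interval_s))
    hits = 0
    total = 0
    for start in range(0, horizon_s, query_interval_s):
        total += 1
        # A window is "useful" only if a point falls inside it.
        if any(start <= t < start + query_interval_s for t in points):
            hits += 1
    return hits, total

# Metric emitted every 60 s, queried every 15 s, over a 5-minute horizon:
hits, total = windows_with_data(60, 15, 300)
# Only 5 of 20 windows contain a point; the other 15 return no data.
```

The same reasoning applies to timeouts: if a request times out before the next point is emitted, the window is lost even though the data eventually arrives at the source.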
Nobl9 integrations with data sources (regardless of the connection method) are resilient to temporary network failures. When your data source becomes available again, Nobl9 catches up on the data lost during the brief outage.
If the data source remains unavailable for an extended period and doesn't recover, Nobl9 cannot collect data from it and resume calculations. In such cases, we recommend checking your data source's status page.
Tools you can use:
- Checking the data source connection
  - For data sources connected using the agent method
  - For data sources connected using the direct method
- Checking a query and targets
  - Query checker for the Datadog, Dynatrace, and New Relic SLOs
  - SLI Analyzer
## Specific Prometheus queries can impact SLO calculations
Nobl9 queries to Prometheus cannot contain functions that extrapolate missing data: missing timestamps can lead to inconsistent data received by Nobl9.
Any function using a range vector (like `rate`, `increase`, or `irate`) can introduce another issue: Nobl9 requests data at a specific granularity (e.g., 15 seconds for Prometheus), while range vector queries operate over their own interval (in PromQL, it's represented by `[x]`, where `x` is a duration such as `5m`). It's hard to match the data intake interval with the aggregation function's interval: the two might not overlap, so attempts to align them can produce unpredictable data.
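For example, in a query like the following (a hypothetical query; `http_requests_total` and its labels are placeholders, not metrics from your environment), the `[5m]` range is fixed by the query while the intake granularity is fixed by Nobl9, and the two windows drift against each other:

```promql
# Hypothetical query; http_requests_total is a placeholder metric.
# The [5m] range vector is evaluated independently of Nobl9's ~15 s
# intake granularity, so consecutive samples may overlap or skip
# raw data points, yielding hard-to-predict values.
sum(rate(http_requests_total{status=~"5.."}[5m]))
```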
## Constant and no burn
These data anomaly types are caused by SLO objective settings that are either too strict or too lenient. The following settings have an impact:
- Target
- Numerator (`good` or `bad`) query
- Denominator (`total`) query
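For a ratio SLO, the numerator and denominator might look like the following pair (hypothetical Prometheus queries; the metric name and labels are placeholders, and the exact query shape depends on your setup):

```promql
# good (numerator): count only successful requests
sum(http_requests_total{code=~"2.."})
# total (denominator): count all requests
sum(http_requests_total)
```

If the numerator's filter is too narrow (or the denominator matches data unrelated to the objective), the resulting ratio no longer tracks the behavior you intend to measure.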
| Burn type | Threshold SLOs | Ratio SLOs |
| --- | --- | --- |
| Constant burn | The threshold target is too high | • The ratio target is too high • The numerator is too restrictive • The denominator is too broad, or queries irrelevant data |
| No burn | The threshold target is too low | • The ratio target is too low • The numerator and denominator queries are nearly identical |
In either case, the SLO fails to reflect the actual state of your system.
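The ratio cases in the table can be sketched numerically. This is a hypothetical simplification, not Nobl9's burn-rate calculation, but it shows how the target and the good/total counts interact:

```python
# Hypothetical sketch (not Nobl9's implementation): classify one SLI
# window against a ratio target.
def burn_state(good, total, target):
    """Return whether this window consumes error budget."""
    sli = good / total
    if sli >= target:
        return "no burn"   # error budget untouched
    return "burning"       # error budget being consumed

# Numerator and denominator queries nearly identical -> SLI is ~1.0,
# so even a strict target never burns:
assert burn_state(1000, 1000, 0.999) == "no burn"

# Target set unrealistically high for the service -> every window burns:
assert burn_state(990, 1000, 0.9999) == "burning"
```

In the first case the SLO reports perfect reliability regardless of user experience; in the second it burns budget constantly even when the service behaves normally.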
Tools you can use:
- SLI Analyzer to experiment with different settings and determine a better target
- Query checker for the Datadog, Dynatrace, and New Relic SLOs
## Reimport your historical SLI data
Once the issue is resolved, we recommend replaying your SLO to backfill it with historical SLI data for the period when the data anomaly was detected.