No data troubleshooting
The reasons for your SLO not receiving any data, receiving only partial data, or failing the query check verification fall into two broad categories:
- Issues with the data source your SLO is connected to
- Issues with the query configured in your SLO
Issues with data sources
The table below lists the most common causes of data source issues and how to address them:
Reason | How to address
---|---
Agent is not connected | Check the connection status of your data source in the Integrations section.
Incorrect source-specific settings | Check your data source configuration: ensure the authentication credentials, URL, and other source-specific values are correct. Check the validity of any tokens or API keys.
Other reasons | Check whether your data source returns any errors.
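For example, when verifying source-specific settings, it can help to compare them against the data source definition itself. Here's a minimal sketch of an Agent definition for Prometheus; the name, project, and URL are placeholders, and the available fields vary by data source and Nobl9 version:

```yaml
apiVersion: n9/v1alpha
kind: Agent
metadata:
  name: prometheus          # placeholder name
  project: default
spec:
  sourceOf:
    - Metrics
  prometheus:
    # Verify this endpoint is correct and reachable from where the agent runs
    url: http://prometheus.example.com:9090
```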
If none of the above helps, refer to the sections below.
Network issues
Nobl9 SLO calculations are prone to errors when the Nobl9 agent can't gather all the necessary data from data sources. This might occur, for example, when there are network issues between Nobl9 and the respective data source. Refer to Agent troubleshooting for more details.
Nobl9 integrations with data sources (regardless of the connection method) are resilient to temporary network failures. When the data source becomes available again, Nobl9 catches up on the data missed during the brief outage.
If the data source stays unavailable for an extended period, Nobl9 cannot collect the data it needs to resume calculations. In such cases, we recommend checking your data source's status page (see below).
Status pages are available for the following services:
- Amazon CloudWatch
- Amazon Prometheus
- Amazon Redshift
- AppDynamics
- BigQuery
- Datadog
- Dynatrace
- Elasticsearch
- Google Cloud Monitoring
- Grafana Loki
- Graphite
- InfluxDB
- Instana
- ServiceNow Cloud Observability
- New Relic
- OpenTSDB
- Pingdom
- Splunk
- Splunk Observability
- Sumo Logic
- ThousandEyes
Rate limiting
When integrating with data sources, the Nobl9 agent must comply with the rate limits set by their APIs. Strict rate limits can cause the Nobl9 agent to stop collecting data.
For more details on the API rate limits for your data source, refer to the API rate limits section of the relevant data source article.
Issues with queries
Incorrect queries are a likely cause of errors in your SLO's burn rate calculations. In Nobl9, you can validate your metric's input to ensure you provide all the values required to process your SLI data. However, Nobl9's query checker doesn't handle more complex queries.
In general, burn rate calculations can be incorrect if:
- Queries return unexplainable or unpredictable data, or no data at all.
- In ratio SLOs, the `good` (or `bad`) and `total` queries are misplaced, so that `good` is greater than `total`. See the sketch after this list.
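As an illustration, here's a minimal `countMetrics` sketch for a ratio SLO; the Prometheus metric names are hypothetical. If the two queries below were swapped, `good` would exceed `total` and the burn rate would be wrong:

```yaml
countMetrics:
  incremental: true        # both queries return cumulative counters
  good:
    prometheus:
      # hypothetical metric: successful requests only
      promql: http_requests_total{code="200"}
  total:
    prometheus:
      # hypothetical metric: all requests; must always be >= the good count
      promql: http_requests_total
```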
Incorrect SLO configurationโ
One of the scenarios
when queries return incorrect data is a mismatch between the data collection method of a ratio SLO and SLI nature.
For example, this can happen when SLO data collection method is set to incremental
while its SLI data is non-incremental
. Check the SLO calculations guide for more details about incremental metrics.
If your SLI data is non-incremental,
remember to set the Data count method to non-incremental in the Nobl9 Web application
or set the value to incremental: false
in your YAML definition in sloctl
.
Here's a YAML definition for a non-incremental metric:
```yaml
spec:
  description: Example Generic SLO
  indicator:
    metricSource:
      name: generic
      project: default
      kind: Agent
  budgetingMethod: Occurrences
  objectives:
    - displayName: Good response (200)
      value: 1
      name: ok
      target: 0.95
      countMetrics:
        incremental: false
        good:
          generic:
            query: >-
              SINCE N9FROM UNTIL N9TO FROM a1: entities(aws:postgresql:123)
              FETCH a1.metrics("infra:database.requests.good",
              "aws-cloudwatch"){timestamp, value} LIMITS
              metrics.granularityDuration(PT1M)
        total:
          generic:
            query: >-
              SINCE N9FROM UNTIL N9TO FROM a1: entities(aws:postgresql:123)
              FETCH a1.metrics("infra:database.requests.total",
              "aws-cloudwatch"){timestamp, value} LIMITS
              metrics.granularityDuration(PT1M)
      primary: true
  service: api-server
```
The `incremental` parameter impacts how SLO calculations are processed. Set it to `true` for SLOs whose queries provide Nobl9 with incremental data. By incremental data, we mean a value v that, for each point in time t, is greater than or equal to the previous value (that is, a monotonically non-decreasing series):

v(t) ≤ v(t+1)
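As a quick illustration with hypothetical sample values, the same SLI can arrive in either shape, and the correct setting depends on which one your query returns:

```yaml
# Hypothetical good-count samples for the same time window:
#
# incremental: true  -- running totals; each value >= the previous one
#   12:00 -> 100, 12:01 -> 105, 12:02 -> 105, 12:03 -> 112
#
# incremental: false -- per-interval counts; values can go up and down
#   12:00 -> 100, 12:01 -> 5,   12:02 -> 0,   12:03 -> 7
countMetrics:
  incremental: true   # switch to false for the per-interval series above
```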
Specific Prometheus queries can impact SLO calculations
Nobl9 queries to Prometheus can't contain functions that extrapolate missing data. Missing timestamps can lead to inconsistent data received by Nobl9.
Any function that operates on a range vector (like `rate`, `increase`, or `irate`) can introduce another issue: Nobl9 requests data at a specific granularity (e.g., 15 seconds for Prometheus), while range vector queries operate over their own interval (in PromQL, represented by `[x]`, where `x` is a duration such as `5m`). It's hard to match the data intake interval with the aggregation function's interval; the two might not overlap, so attempts to align them can produce unpredictable data.
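One way to sidestep the interval mismatch, sketched below with a hypothetical metric, is to query the raw counter and let Nobl9 count occurrences from incremental data instead of applying a range-vector function:

```yaml
good:
  prometheus:
    # Instead of a range-vector expression such as:
    #   rate(http_requests_total{code="200"}[5m])
    # fetch the raw counter and set incremental: true on the SLO's countMetrics:
    promql: http_requests_total{code="200"}
```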
Learn more about other data sources.
Reimport your historical SLI data
Once the issue is resolved, we recommend replaying your SLO to refill it with historical SLI data for the period when your SLO wasn't collecting any data or was collecting data only partially.