No data troubleshooting

The reasons for your SLO not receiving any data, receiving partial data, or not passing the query check verification can be grouped into two broader categories:

  • Issues with the data source your SLO is connected to
  • Issues with the query configured in your SLO

Issues with data sources

The most common reasons for data source issues, and ways to address them:

  • Agent is not connected: check the connection status of your data source in the Integrations section.
  • Incorrect source-specific settings: check your data source configuration. Ensure the authentication credentials, URL, and other source-specific values are correct (you can also review them with sloctl, as shown below), and verify that any tokens or API keys are still valid.
  • Other reasons: check whether your data source returns any errors:
    • For data sources connected with the direct method, verify event logs (if activated).
    • For data sources connected with the agent method, check your agent's health.
    • If none of the above helps, refer to the sections below.
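
If you manage your Nobl9 configuration with sloctl, you can also review data source definitions from the command line to confirm the configured URL and other source-specific values. This is a minimal sketch; my-project and my-agent are placeholder names:

# List agents in a project, then print a single definition for review
sloctl get agents -p my-project
sloctl get agents my-agent -p my-project

# Data sources connected with the direct method
sloctl get directs -p my-project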

Network issues

Nobl9 SLO calculations are prone to errors when the Nobl9 agent can't gather all the necessary data from data sources. This can happen, for example, when there are network issues between Nobl9 and the respective data source. Refer to Agent troubleshooting for more details.

Nobl9 integrations with data sources (regardless of the connection method) are resilient to temporary network failures. When the data source becomes available again, Nobl9 catches up on the data missed during the brief outage.

If the data source stays unavailable for an extended period and doesn't recover, Nobl9 cannot collect data from it to resume calculations. In such cases, we recommend checking your data source's status page.

Rate limiting

When integrating with data sources, the Nobl9 agent must comply with the rate limits set by their APIs. Strict rate limits can cause the Nobl9 agent to stop collecting data.

    API rate limits details

For more details on the API rate limits for your data source, refer to the API rate limits section of the relevant data source article.

Issues with queries

Incorrect queries are a likely cause of errors in your SLO's burn rate calculations. In Nobl9, you can validate your metric's input to ensure you provide all the values required to process your SLI data. However, Nobl9's query checker doesn't handle more complex queries.

In general, burn rate calculations can be incorrect if:

• Queries return unexplainable or unpredictable data, or no data at all.
• In ratio SLOs, the good (or bad) and total queries are swapped, so the good count is greater than the total.

Incorrect SLO configuration

One scenario in which queries return incorrect data is a mismatch between a ratio SLO's data collection method and the nature of its SLI data. For example, this can happen when the SLO's data count method is set to incremental while its SLI data is non-incremental. Check the SLO calculations guide for more details about incremental metrics.

If your SLI data is non-incremental, remember to set the Data count method to non-incremental in the Nobl9 Web application, or set incremental: false in the YAML definition you apply with sloctl. Here's a YAML definition for a non-incremental metric:

Example SLO fragment with non-incremental ratio metrics

spec:
  description: Example Generic SLO
  indicator:
    metricSource:
      name: generic
      project: default
      kind: Agent
  budgetingMethod: Occurrences
  objectives:
    - displayName: Good response (200)
      value: 1
      name: ok
      target: 0.95
      countMetrics:
        incremental: false
        good:
          generic:
            query: >-
              SINCE N9FROM UNTIL N9TO FROM a1: entities(aws:postgresql:123)
              FETCH a1.metrics("infra:database.requests.good",
              "aws-cloudwatch"){timestamp, value} LIMITS
              metrics.granularityDuration(PT1M)
        total:
          generic:
            query: >-
              SINCE N9FROM UNTIL N9TO FROM a1: entities(aws:postgresql:123)
              FETCH a1.metrics("infra:database.requests.total",
              "aws-cloudwatch"){timestamp, value} LIMITS
              metrics.granularityDuration(PT1M)
      primary: true
  service: api-server

The incremental parameter impacts how SLO calculations are processed. Set it to true for SLOs whose queries provide Nobl9 with incremental data. By incremental data, we mean a value v that, for each point in time t, is greater than or equal to the previous value (a monotonically non-decreasing series):

v(t) ≤ v(t+1)
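
For contrast, here's a sketch of a countMetrics fragment with incremental: true, for queries that return cumulative counters. The metric names below are illustrative placeholders, not values taken from the example above:

countMetrics:
  # Cumulative counters only grow over time, so the data is incremental
  incremental: true
  good:
    generic:
      query: >-
        SINCE N9FROM UNTIL N9TO FROM a1: entities(aws:postgresql:123)
        FETCH a1.metrics("infra:database.requests_total.good",
        "aws-cloudwatch"){timestamp, value} LIMITS
        metrics.granularityDuration(PT1M)
  total:
    generic:
      query: >-
        SINCE N9FROM UNTIL N9TO FROM a1: entities(aws:postgresql:123)
        FETCH a1.metrics("infra:database.requests_total.all",
        "aws-cloudwatch"){timestamp, value} LIMITS
        metrics.granularityDuration(PT1M)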

Specific Prometheus queries can impact SLO calculations

Nobl9 queries to Prometheus shouldn't contain functions that extrapolate missing data: missing timestamps can lead to inconsistent data received by Nobl9.

Any function operating on a range vector (such as rate, increase, or irate) can introduce another issue. Nobl9 requests data at a specific granularity (for example, 15 seconds for Prometheus), while a range vector query aggregates over its own interval (in PromQL, [x], where x is a duration such as 5m). The data intake interval and the aggregation interval might not align, so the resulting data can be unpredictable.
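
One possible way to avoid both issues for counter-based SLIs is to query the raw counter and let Nobl9 do the counting with incremental: true. The sketch below assumes a hypothetical prometheus_http_requests_total counter with a code label; adjust it to your own metrics:

countMetrics:
  # Raw counters only grow, so mark the data as incremental
  incremental: true
  good:
    prometheus:
      # Raw counter query instead of a rate(...[5m])-style function
      promql: prometheus_http_requests_total{code="200"}
  total:
    prometheus:
      promql: prometheus_http_requests_total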

    Learn more about other data sources.

Reimport your historical SLI data

Once the issue is resolved, we recommend replaying your SLO to refill it with historical SLI data for the period when your SLO wasn't collecting any data or was collecting data only partially.
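
If you use sloctl, Replay can also be started from the command line. This is a hedged sketch: my-slo and my-project are placeholders, and the exact flags can differ between sloctl versions, so check sloctl replay --help first:

# Refill historical SLI data for my-slo, starting from the given RFC 3339 timestamp
sloctl replay my-slo -p my-project --from=2024-01-15T00:00:00Z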

For a more in-depth look, consult the additional resources in the Nobl9 documentation.