⛭ Data sources

If data retrieval fails for any data source except Splunk Observability, you can check that data source's event logs.

⛭ Permissions for adding the Dynatrace data source

To connect the Nobl9 agent to Dynatrace, you need an access token with the metrics.read scope activated.

Read more about the Dynatrace integration with Nobl9.

⛭ Errors upon setting up an integration with Splunk Observability

Clear the Nobl9 cache in your browser and set up the integration again.

⛭ Nobl9 integration with new data sources

If the data source you need is missing from the list of available integrations, let us know through the contact form so we can assess the possibility of adding it.

⛭ No data from a data source

Depending on the budgeting method, "no data" is treated as follows:

  • Occurrences: no good and no total events, so there is no effect on the ratio of good to total events.
  • Time slices: minutes with no data are considered good.
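The two rules above can be illustrated with a minimal sketch. It assumes per-minute `(good, total)` counts, with `None` standing for a minute without data. The function names and the `slice_target` parameter are hypothetical and are not part of any Nobl9 API.

```python
from typing import Optional, Sequence, Tuple

Minute = Optional[Tuple[int, int]]  # (good, total) for one minute, or None for no data


def occurrences_reliability(minutes: Sequence[Minute]) -> Optional[float]:
    """Ratio of good to total events; minutes without data add nothing to either sum."""
    good = sum(m[0] for m in minutes if m is not None)
    total = sum(m[1] for m in minutes if m is not None)
    return good / total if total else None


def time_slices_reliability(minutes: Sequence[Minute], slice_target: float) -> float:
    """Share of good minutes; a minute without data is counted as good."""
    good_minutes = 0
    for m in minutes:
        # Assumption: a minute with zero total events is treated like "no data".
        if m is None or m[1] == 0:
            good_minutes += 1
        elif m[0] / m[1] >= slice_target:
            good_minutes += 1
    return good_minutes / len(minutes)


minutes = [(9, 10), None, (10, 10), None]
print(occurrences_reliability(minutes))        # gaps do not change good/total
print(time_slices_reliability(minutes, 0.95))  # gaps count as good minutes
```

In this sketch the two gaps leave the occurrences ratio untouched, while under time slices they raise the share of good minutes.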

You can troubleshoot this situation in the following ways:

  • Verify your authentication credentials.
    To verify credentials in the Nobl9 UI, go to Integrations and click Edit on the required data source.
  • Verify the query.
    It’s a good practice to first apply the query you wrote in a test environment to make sure it returns valid metrics.
  • Make sure your query returns only one metric and one time series.
  • If you apply any filters in the query, they must point to the exact entity you want to observe.
    Nobl9 processes only one dataset; there is no aggregation on the Nobl9 side.
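For a Prometheus-compatible data source, the "one metric and one time series" check can be sketched against the JSON shape returned by the Prometheus HTTP API for range queries. The helper names are hypothetical, and the sketch only inspects an already parsed response; it does not contact any server.

```python
def series_count(response: dict) -> int:
    """Count the time series in a parsed Prometheus range-query response."""
    data = response.get("data", {})
    if data.get("resultType") != "matrix":
        # Range queries return a matrix; anything else is outside this sketch.
        raise ValueError("expected a matrix result")
    return len(data.get("result", []))


def returns_single_series(response: dict) -> bool:
    """True when the query produced exactly one time series."""
    return series_count(response) == 1


# Hand-written sample payload: the filter is too broad and matches two series.
too_broad = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {"metric": {"job": "api", "instance": "a"}, "values": [[1700000000, "1"]]},
            {"metric": {"job": "api", "instance": "b"}, "values": [[1700000000, "2"]]},
        ],
    },
}
print(returns_single_series(too_broad))
```

When the check fails, tightening the label filters or aggregating inside the query itself reduces the result to a single series.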

If you have verified all the steps above and the SLO still doesn't show any data, review the data source integration log for errors.

⛭ Changing streams in ServiceNow Cloud Observability

You can update the stream ID for SLOs based on ServiceNow Cloud Observability without losing the history.

⛭ Discrepancy between Nobl9 SLI chart and Grafana Cortex values

A discrepancy between Nobl9 and Grafana may occur when you create ratio SLOs with Prometheus as the data source. In the Nobl9 UI, the graphs for a ratio SLO are presented at a fixed 1-minute resolution. The SLI value at these rounded minutes differs from the Cortex values presented in Grafana.

Why is there a discrepancy?

Nobl9 interprets every data point of a non-incremental ratio SLI as an increase from the previous point. The resulting time series is then downsampled by summing its individual points before it is displayed on the chart. That’s why the chart in Nobl9 might show higher values than Prometheus: the more the chart is zoomed out, the higher the values will be. This is intended behavior. However, it is important to understand the exact semantics of a user query and how it can affect calculations. When querying time series in Prometheus, three parameters affect the result:

  • Query
  • Time range
  • Step
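
The downsampling by summing described above can be sketched as follows. The bucket sizes are hypothetical zoom levels chosen for illustration; the actual bucket sizes Nobl9 uses are not stated here.

```python
from typing import List


def downsample_by_sum(points: List[float], bucket_size: int) -> List[float]:
    """Merge every `bucket_size` consecutive points into one by summing them."""
    return [sum(points[i:i + bucket_size]) for i in range(0, len(points), bucket_size)]


# One hour of per-minute increases, five events in each minute.
per_minute = [5.0] * 60

for bucket_size in (1, 5, 15):
    charted = downsample_by_sum(per_minute, bucket_size)
    print(bucket_size, max(charted), sum(charted))
```

The highest charted value grows with the bucket size, while the overall sum of events stays the same.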

How can time range change the result?

In general, the time range only cuts out a section of the resulting time series. However, certain PromQL functions, such as increase, extrapolate values to the ends of the requested range. This means that the value for a given timestamp can differ depending on whether it is at the beginning, in the middle, or at the end of the requested time range. Nobl9 adjusts for that by extending the ends of the queried time range to full minutes.
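
A minimal sketch of extending a time range to full minutes, assuming the start is moved back and the end is moved forward to the nearest minute boundary. The function name is hypothetical, and the exact rounding Nobl9 applies is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def align_to_full_minutes(start: datetime, end: datetime) -> Tuple[datetime, datetime]:
    """Widen [start, end] so that both ends fall on full minutes."""
    aligned_start = start.replace(second=0, microsecond=0)  # move back to the minute
    aligned_end = end.replace(second=0, microsecond=0)
    if aligned_end < end:
        aligned_end += timedelta(minutes=1)                  # move forward to the next minute
    return aligned_start, aligned_end


start = datetime(2024, 1, 1, 12, 0, 17, tzinfo=timezone.utc)
end = datetime(2024, 1, 1, 12, 5, 42, tzinfo=timezone.utc)
print(align_to_full_minutes(start, end))
```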

How can step change the result?

increase is a function that accepts a range vector as an argument and returns an instant vector. To produce a range vector, you can use the [] selector in PromQL with a fixed time duration, for example, [2m].
The value at a given point in time then shows how much the counter increased during the two minutes leading up to that point.
If the step used to query the time series is smaller than the range vector duration, the result is based on a series of overlapping windows.
Since Nobl9 interprets every data point of a non-incremental ratio SLI as an increase from the previous point and uses the query_range API with a 15s step, a 2m range vector causes each HTTP request to be counted eight times (2m / 15s = 8).
This logic affects error budget calculations because it changes how errors are distributed over time. What used to be a single erroneous event is counted proportionally as 1/8 every 15 seconds over two minutes.
As a result, the reliability burn down is smoothed out in comparison to an SLO that uses a raw counter value. Error budgets are calculated based on the proportion of errors, so higher absolute values don’t affect them.
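
The factor of eight can be reproduced with a small simulation. It assumes evaluation timestamps at multiples of the step and a window that covers the two minutes up to and including each evaluation timestamp; the function name and the `horizon` parameter are hypothetical.

```python
def windows_containing(event_time: int, step: int = 15, window: int = 120,
                       horizon: int = 600) -> int:
    """Count the evaluation points whose look-back window includes the event.

    All values are in seconds. Assumption: the window is open on the left
    and closed on the right, that is, (eval_time - window, eval_time].
    """
    count = 0
    for eval_time in range(0, horizon + 1, step):
        if eval_time - window < event_time <= eval_time:
            count += 1
    return count


hits = windows_containing(event_time=100)
print(hits)      # number of overlapping windows that see the single event
print(1 / hits)  # share of the event attributed to each 15-second point
```

With a step equal to the window (for example, `step=120`), each event falls into exactly one window and the overlap disappears.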

What is the possible workaround?

According to the Prometheus documentation on the increase function:

Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for.

If this is not required for your counter, that is, your counter doesn’t get restarted, you can instead use the raw counter value in your query along with the incremental ratio SLI for the most accurate results.
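
The trade-off can be sketched by comparing plain differences between consecutive raw counter samples with a simplified reset adjustment. This is an illustration only: it omits the extrapolation that increase performs, and it does not show how Nobl9 processes incremental SLIs internally. Both function names are hypothetical.

```python
from typing import List


def plain_deltas(samples: List[float]) -> List[float]:
    """Difference between consecutive raw counter samples."""
    return [cur - prev for prev, cur in zip(samples, samples[1:])]


def reset_adjusted_deltas(samples: List[float]) -> List[float]:
    """Like plain_deltas, but a drop is treated as a restart from zero."""
    deltas = []
    for prev, cur in zip(samples, samples[1:]):
        # After a restart, everything counted so far is a new increase.
        deltas.append(cur - prev if cur >= prev else cur)
    return deltas


steady = [10, 15, 22]           # counter that never restarts
restarted = [10, 15, 22, 3, 9]  # counter that restarts after 22

print(plain_deltas(steady), reset_adjusted_deltas(steady))
print(plain_deltas(restarted), reset_adjusted_deltas(restarted))
```

For a counter that never restarts, both approaches agree, which is the case where the raw counter is a safe choice.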

Other notes regarding PromQL

Using offset in the query is a useful way to account for delays in data availability or eventual consistency in monitoring systems.
However, it also shifts the alert time, so consider this for alerts with time-sensitive policies.
Moreover, sum is an aggregation operator. In the context of time-series data, sum aggregates values across dimensions, not across time.
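
The difference between aggregating across dimensions and across time can be sketched with plain lists. The series names and values are made up for illustration.

```python
from typing import Dict, List


def sum_across_dimensions(series: Dict[str, List[float]]) -> List[float]:
    """Add up all series at each timestamp, which is what an aggregation like sum does."""
    return [sum(values) for values in zip(*series.values())]


def sum_across_time(series: Dict[str, List[float]]) -> Dict[str, float]:
    """Add up each series over time; the sum operator does NOT do this."""
    return {name: sum(values) for name, values in series.items()}


# Two hypothetical series sampled at the same three timestamps.
series = {"pod-a": [1.0, 2.0, 3.0], "pod-b": [10.0, 20.0, 30.0]}

print(sum_across_dimensions(series))  # one value per timestamp
print(sum_across_time(series))        # one value per series
```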

Check Nobl9 features that will help you troubleshoot:

  • Event logs
  • Metrics health notifier
  • Query checker
  • Nobl9 agent logging