Data sources FAQ

How does Nobl9 query a data source?

By default, Nobl9 queries a data source every minute for the last minute of data. However, the query interval and time window can vary depending on the data source and query configuration.
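The following Python sketch illustrates the default schedule: at each one-minute tick, the query covers the minute that ends at that tick. The `query_window` helper and the flooring of the tick to a whole minute are assumptions made for illustration. The actual interval depends on the data source and query configuration.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def query_window(tick: datetime,
                 interval: timedelta = timedelta(minutes=1)) -> Tuple[datetime, datetime]:
    """Return the (start, end) window for the interval ending at the tick.

    Assumption: the tick is floored to a whole minute before the window is built.
    """
    end = tick.replace(second=0, microsecond=0)
    return end - interval, end


tick = datetime(2024, 5, 1, 12, 0, 7, tzinfo=timezone.utc)
start, end = query_window(tick)
print(start.isoformat(), end.isoformat())
```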

Read more about your required data source and query parameters.

How does data collection resume after outages?

After an outage, data collection resumes from the last successfully fetched data point. This is ensured by the timestamp cache persistence feature, which is available for data sources regardless of the connection method.

For data sources connected using the agent method, timestamp cache persistence is inactive by default, so you may want to activate it for your data source.
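A minimal sketch of the idea behind timestamp cache persistence, assuming the last fetched timestamp is stored in a local JSON file (`timestamp_cache.json` is a hypothetical name) and that missed data is backfilled in one-minute windows. The helpers `save_last_timestamp`, `load_last_timestamp`, and `pending_windows` are hypothetical and don't describe the agent's actual storage format.

```python
import json
import tempfile
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import List, Optional, Tuple

Window = Tuple[datetime, datetime]


def save_last_timestamp(ts: datetime, path: Path) -> None:
    """Persist the timestamp of the last successfully fetched data point."""
    path.write_text(json.dumps({"last_fetched": ts.isoformat()}))


def load_last_timestamp(path: Path) -> Optional[datetime]:
    """Read the persisted timestamp, or None if nothing has been cached yet."""
    if not path.exists():
        return None
    return datetime.fromisoformat(json.loads(path.read_text())["last_fetched"])


def pending_windows(last: Optional[datetime], now: datetime,
                    step: timedelta = timedelta(minutes=1)) -> List[Window]:
    """One-minute windows to fetch, starting from the last fetched data point."""
    if last is None:
        # Without a cache, only the most recent interval can be fetched.
        return [(now - step, now)]
    windows: List[Window] = []
    start = last
    while start + step <= now:
        windows.append((start, start + step))
        start += step
    return windows


# Usage: the last point was fetched at 12:00 and collection recovers at 12:03.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "timestamp_cache.json"  # hypothetical cache location
    save_last_timestamp(datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc), cache)
    windows = pending_windows(load_last_timestamp(cache),
                              datetime(2024, 5, 1, 12, 3, tzinfo=timezone.utc))
print(len(windows))  # 3
```

Without a persisted timestamp, the sketch can only fetch the latest interval, which is why data from the outage period would otherwise be lost.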

Which permissions are required to add Dynatrace as a data source?

To connect the Dynatrace data source, you need an access token with the metrics.read scope activated.

Read more about the Dynatrace integration with Nobl9.

Can I change a stream for my existing ServiceNow Cloud Observability integration in Nobl9?

Yes. You can update the stream ID for SLOs based on ServiceNow Cloud Observability without losing their history.

Why do I see discrepancies between the Nobl9 SLI chart and Cortex values in Grafana?

A discrepancy between Nobl9 and Grafana may occur when you create ratio SLOs with Prometheus as the data source. In the Nobl9 web application, ratio SLO charts are presented at a 1-minute resolution, and the SLI value at these rounded minutes can differ from the Cortex values shown in Grafana.

Why is there a discrepancy?

Nobl9 interprets every data point of a non-incremental ratio SLI as an increase from the previous point. The time series is then downsampled by summing its individual points and displayed on the chart. That’s why Nobl9 charts can show higher values than Prometheus: the further you zoom out, the higher the values. This is intended behavior. However, it is important to understand the exact semantics of your query and how they can affect calculations. When querying time series in Prometheus, three parameters affect the result:

  • Query
  • Time range
  • Step
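To illustrate why zooming out raises the charted values, here is a simplified Python sketch that treats each 15-second point as an increase and downsamples by summing the points that fall into fixed buckets. The data and the `downsample_by_sum` helper are hypothetical, and Nobl9's actual downsampling may differ in detail.

```python
from typing import Dict, List, Tuple


def downsample_by_sum(points: List[Tuple[int, float]], bucket_seconds: int) -> Dict[int, float]:
    """Sum all points whose timestamp falls into the same bucket."""
    buckets: Dict[int, float] = {}
    for ts, value in points:
        bucket = ts - ts % bucket_seconds
        buckets[bucket] = buckets.get(bucket, 0.0) + value
    return buckets


# Hypothetical data: one request per 15-second point over 10 minutes.
points = [(t, 1.0) for t in range(0, 600, 15)]
print(max(downsample_by_sum(points, 60).values()))   # 4.0
print(max(downsample_by_sum(points, 300).values()))  # 20.0
```

The total number of events is the same at every zoom level; only the size of each bucket, and therefore each plotted value, grows.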

How can time range change the result?

In general, the time range only cuts out a section of the resulting time series. However, some PromQL functions, such as increase, extrapolate values to the boundaries of the range they are evaluated over. That means the value for a given timestamp can differ depending on whether it is at the beginning, middle, or end of the requested time range. Nobl9 adjusts for that by aligning both ends of the queried time range to full minutes.
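The effect of aligning the range can be sketched in Python with Unix timestamps. A Prometheus range query is evaluated at `start`, `start + step`, and so on up to `end`, so two requests whose start differs by a few seconds produce different evaluation grids. The `align_range` and `eval_timestamps` helpers are hypothetical, and flooring both ends (rather than another rounding rule) is an assumption.

```python
from typing import List, Tuple


def align_range(start: int, end: int, unit: int = 60) -> Tuple[int, int]:
    """Floor both ends of a Unix-seconds range to full minutes (assumed rounding)."""
    return start - start % unit, end - end % unit


def eval_timestamps(start: int, end: int, step: int = 15) -> List[int]:
    """Evaluation timestamps of a range query: start, start + step, ... up to end."""
    return list(range(start, end + 1, step))


# Two requests for roughly the same period, issued a few seconds apart.
a, b = (1000, 1300), (1007, 1310)
print(eval_timestamps(*a)[:2], eval_timestamps(*b)[:2])  # [1000, 1015] [1007, 1022]
print(align_range(*a), align_range(*b))  # (960, 1260) (960, 1260)
```

After alignment, both requests are evaluated on the same grid of timestamps, so a value at a given minute no longer depends on when the query was issued.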

How can step change the result?

The increase function accepts a range vector as an argument and returns an instant vector. To produce a range vector, you can use the [] range selector in PromQL with a fixed duration, for example, [2m].
The value at a given point in time is then the increase of the counter over the two minutes ending at that point.
If the step used to query the time series is smaller than the range vector duration, the result is computed over a series of overlapping windows.
Since Nobl9 interprets every data point of a non-incremental ratio SLI as an increase from the previous point and uses the query_range API with a 15s step, a 2m range vector causes each HTTP request to be counted eight times (2 minutes / 15 seconds = 8).
This logic affects error budget calculations because it changes how errors are distributed over time. Instead of being counted once, a single erroneous event contributes one eighth of its weight to each 15-second point over two minutes.
As a result, the reliability burn-down is smoother than for an SLO that uses the raw counter value. Error budgets are calculated from the proportion of errors, so the higher absolute values don’t affect them.
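The counting described above can be checked with a short Python sketch. It lists the 15-second evaluation timestamps whose 2-minute lookback window contains a single event, then shows that multiplying both good and total counts by the same factor leaves their ratio unchanged. The helper `windows_containing` is hypothetical, and the sketch ignores the extrapolation that increase applies in practice.

```python
from fractions import Fraction
from typing import List


def windows_containing(event_ts: int, range_s: int = 120, step_s: int = 15,
                       horizon_s: int = 600) -> List[int]:
    """Evaluation timestamps whose lookback window (ts - range_s, ts] contains the event."""
    return [ts for ts in range(0, horizon_s + 1, step_s)
            if ts - range_s < event_ts <= ts]


# A single request at t = 100 s falls into eight overlapping 2m windows.
hits = windows_containing(100)
print(hits)  # [105, 120, 135, 150, 165, 180, 195, 210]

# Good and total events are both multiplied by the same factor,
# so the ratio used for the error budget is unchanged.
factor = 120 // 15
good, total = 990, 1000
print(Fraction(good, total) == Fraction(good * factor, total * factor))  # True
```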

What is the possible workaround?

According to the Prometheus documentation on the increase function:

Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for.

If your counter doesn’t need this adjustment, that is, it never gets reset, you can use the raw counter value in your query together with an incremental ratio SLI for the most accurate results.
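Why the workaround is limited to counters that never reset can be sketched in Python. Taking plain differences between raw counter samples works for a steadily growing counter, but a restart produces a negative value. A reset-aware version, similar in spirit to how Prometheus adjusts for counter resets, treats any drop as a restart. Both helpers are hypothetical and don't describe how Nobl9 processes incremental SLIs internally.

```python
from typing import List


def naive_increments(counter: List[float]) -> List[float]:
    """Differences between consecutive raw counter samples, with no reset handling."""
    return [b - a for a, b in zip(counter, counter[1:])]


def reset_aware_increments(counter: List[float]) -> List[float]:
    """Treat a drop in the counter as a reset: the new value is the increase since the restart."""
    return [b - a if b >= a else b for a, b in zip(counter, counter[1:])]


steady = [10, 12, 15, 15, 20]
restarted = [10, 12, 15, 2, 5]  # the target restarted after the third sample
print(naive_increments(steady))           # [2, 3, 0, 5]
print(naive_increments(restarted))        # [2, 3, -13, 3]
print(reset_aware_increments(restarted))  # [2, 3, 2, 3]
```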

Other notes regarding PromQL

Using offset in the query is a useful way to account for delays in data availability or eventual consistency in monitoring systems.

However, it also shifts alerting by the offset duration, so take this into account for time-sensitive alert policies.
Moreover, sum is an aggregation operator. In the context of time series data, sum aggregates values across dimensions (series with different label values) at each timestamp, not across time.
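The difference between aggregating across dimensions and across time can be illustrated with a small Python sketch on hypothetical samples from two series. The first function mirrors what PromQL sum does (one value per timestamp, labels collapsed); the second sums each series over time, which is closer to what sum_over_time does and is not what sum does.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical samples: (labels, timestamp, value)
Sample = Tuple[Dict[str, str], int, float]
samples: List[Sample] = [
    ({"instance": "a"}, 0, 3.0),
    ({"instance": "b"}, 0, 4.0),
    ({"instance": "a"}, 60, 5.0),
    ({"instance": "b"}, 60, 1.0),
]


def sum_across_series(samples: List[Sample]) -> Dict[int, float]:
    """Like PromQL sum: one output value per timestamp, summed over all series."""
    totals: Dict[int, float] = defaultdict(float)
    for _labels, ts, value in samples:
        totals[ts] += value
    return dict(totals)


def sum_over_time_per_series(samples: List[Sample]) -> Dict[str, float]:
    """Sum each series over time, which sum does NOT do."""
    totals: Dict[str, float] = defaultdict(float)
    for labels, _ts, value in samples:
        totals[labels["instance"]] += value
    return dict(totals)


print(sum_across_series(samples))         # {0: 7.0, 60: 6.0}
print(sum_over_time_per_series(samples))  # {'a': 8.0, 'b': 5.0}
```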

How can multiple-value responses from Elasticsearch be handled?

Multiple values can be returned when using percentile aggregation. An example response containing multiple values:

"n9-val": {
  "values": {
    "95.0": 25.5,
    "99.0": 30.0
  }
}

This can be caused by the query configuration, for example, when more than one percentile is requested:

"percentiles": {
  "field": "transaction.duration.us",
  "percents": [95, 99]
}

However, the Nobl9 agent must send only one value to Nobl9.

To do so, the agent sends the first non-null value from the aggregation list and logs an error indicating that multiple values were returned but only one was retrieved.
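The selection rule described above can be sketched in Python, using the example response from this section. The `pick_single_value` helper, the logger name, and the log message are hypothetical and only illustrate the behavior; they are not the agent's actual code or log text. The handling of a single-value result shaped as `{"value": x}` is also an assumption.

```python
import json
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger("agent-sketch")  # hypothetical logger name


def pick_single_value(n9_val: Dict[str, Any]) -> Optional[float]:
    """Return one value from an aggregation result: the first non-null entry of "values"."""
    values = n9_val.get("values")
    if values is None:
        # Assumed shape for single-value aggregations: {"value": x}.
        return n9_val.get("value")
    if len(values) > 1:
        # Illustrative message only; not the agent's actual log text.
        logger.error("multiple values returned (%s); only one is used", ", ".join(values))
    return next((v for v in values.values() if v is not None), None)


response = json.loads('{"n9-val": {"values": {"95.0": 25.5, "99.0": 30.0}}}')
print(pick_single_value(response["n9-val"]))  # 25.5
```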

To avoid requesting redundant data and triggering this error log, ensure that the search query returns only one value.

To address this, specify a single percentile value in your query. For example, the above query can be modified as follows:

"percentiles": {
  "field": "transaction.duration.us",
  "percents": [95]
}

My data source is missing in the list of Nobl9 integrations. What do I do?

If your required data source is missing from the list of available integrations, let us know through the contact form so we can assess whether it can be added.