Data sources

If data retrieval fails for any data source other than Splunk Observability, you can check that data source's event logs, provided that:

  • The data source must be added with the direct connection method
  • Event logs must be activated in advance for your data source

Data collection after outages

After an outage, data collection resumes from the last successfully fetched data point. This is ensured by the timestamp cache persistence feature and applies to data sources regardless of the connection method used.

By default, persistence is inactive for data sources connected using the agent method, so you may want to activate it for your data source.

Permissions for adding the Dynatrace data source

To connect the Nobl9 agent to Dynatrace, you need an access token with the metrics.read scope activated.

Read more about the Dynatrace integration with Nobl9.

Errors when setting up an integration with Splunk Observability

Clear the Nobl9 cache in your browser and set up the integration again.

Nobl9 integration with new data sources

If the data source you need is missing from the list of available integrations, let us know through the contact form so we can assess adding it.

No data from a data source

Depending on the budgeting method, "no data" is treated as follows:

  • Occurrences: no good and no total events, so there is no effect on the ratio of good to total events.
  • Time slices: minutes with no data are considered good.
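The two treatments can be sketched as follows (a simplified model for illustration, not Nobl9's actual implementation):

```python
# Simplified model of how "no data" minutes affect each budgeting
# method. True = good minute, False = bad minute, None = no data.
minutes = [True, None, False, True, None]

# Occurrences: "no data" adds no good and no total events,
# so it does not change the good-to-total ratio.
good = sum(1 for m in minutes if m is True)
total = sum(1 for m in minutes if m is not None)
print(good, total)  # 2 3

# Time slices: minutes with no data are counted as good.
good_slices = sum(1 for m in minutes if m is not False)
print(good_slices, len(minutes))  # 4 5
```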

You can troubleshoot this situation in the following ways:

  • Verify your authentication credentials.
    To verify credentials on the Nobl9 Web, go to Integrations and click Edit on your required data source.
  • Verify the query.
    It's a good practice to first apply the query you wrote in a test environment to make sure it returns valid metrics.
  • Make sure your query returns only one metric and one time series.
  • If you apply any filters in the query, the query must point to the exact entity you want to observe.
    Nobl9 processes only one dataset; there is no aggregation on the Nobl9 side.
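As a quick sanity check that a query returns exactly one time series, you can inspect the raw response before wiring it into an SLO. The snippet below assumes a Prometheus-style HTTP API response shape; adapt the path for other data sources:

```python
# Sanity check: an SLO query should return exactly one time series.
# The response shape below follows the Prometheus HTTP API
# (data.result is a list of series); other sources differ.
response = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {"metric": {"job": "api"}, "values": [[1700000000, "0.99"]]},
        ],
    },
}

series = response["data"]["result"]
if len(series) != 1:
    raise ValueError(f"expected one time series, got {len(series)}")
print(len(series))  # 1
```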

When all the above steps are verified but the SLO still doesn't show any data, review the data source integration logs for errors.

Changing streams in ServiceNow Cloud Observability

You can update the stream ID for SLOs based on ServiceNow Cloud Observability without losing the history.

Discrepancy between the Nobl9 SLI chart and values from my data source

A discrepancy between Nobl9 and Grafana may occur when creating ratio SLOs with Prometheus as the data source. On the Nobl9 Web, ratio SLO graphs use a fixed 1-minute resolution. The SLI value at these rounded minutes can differ from the Cortex values presented in Grafana.

Why is there a discrepancy?

Nobl9 interprets every data point of a non-incremental ratio SLI as an increase from the previous point. Such a time series is then downsampled by summing its individual points and is displayed on the chart. That's why Nobl9 charts may show higher values than Prometheus: the more the chart is zoomed out, the higher the values will be. This is intended behavior. However, it is important to understand the exact semantics of a user query and how it can affect calculations. When querying time series in Prometheus, three parameters affect the result:

  • Query
  • Time range
  • Step

How can time range change the result?

In general, the time range only cuts out a section of the result time series. However, certain PromQL functions, such as increase, extrapolate values to the ends of the requested range. This means the value for a given timestamp can differ depending on whether it falls at the beginning, middle, or end of the requested time range. Nobl9 adjusts for this by aligning the ends of the queried time range to full minutes.
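A simplified sketch of that extrapolation (an approximation for illustration: Prometheus itself applies additional heuristics that limit extrapolation near window edges, which are omitted here):

```python
# Simplified increase() extrapolation: the raw counter delta observed
# between the first and last samples in the window is scaled up to
# cover the full requested range. (Real Prometheus caps extrapolation
# near the window boundaries; that detail is omitted here.)
def extrapolated_increase(samples, range_start, range_end):
    # samples: list of (timestamp, counter_value), ascending by time
    (first_t, first_v), (last_t, last_v) = samples[0], samples[-1]
    sampled = last_t - first_t
    if sampled <= 0:
        return 0.0
    return (last_v - first_v) * (range_end - range_start) / sampled

# 4 increments observed over 40s, extrapolated over a 60s range:
samples = [(10, 100), (50, 104)]
print(extrapolated_increase(samples, 0, 60))  # 6.0
```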

How can step change the result?

increase is a function that accepts a range vector as an argument and returns an instant vector. To produce a range vector, use the [] operator in PromQL with a fixed time duration, for example, [2m]. The value at a given point in time is then the amount by which the counter increased during the two minutes ending at that point.

If the step used to query the time series is smaller than the range vector duration, the result is based on a series of overlapping windows. Since Nobl9 interprets every data point of a non-incremental ratio SLI as an increase from the previous point and uses the query_range API with a 15s step, with a 2m range vector each HTTP request is counted eight times.

This logic affects error budget calculations because it changes the error distribution: what used to be a single erroneous event is counted proportionally, as 1/8 every 15 seconds over two minutes. As a result, reliability burn down is smoothed out compared to an SLO that uses the raw counter value. Error budgets are calculated from the proportion of errors, so higher absolute values don't affect them.
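The overlapping-windows arithmetic can be sketched as follows (an illustration, not the actual Nobl9 pipeline):

```python
# With a 2m range vector queried at a 15s step, any single event falls
# into 120 / 15 = 8 overlapping trailing windows.
WINDOW = 120  # range vector duration, seconds
STEP = 15     # query_range step, seconds

def windowed_counts(event_times, start, end):
    """Per-step increase: events inside each trailing window."""
    points, t = [], start
    while t <= end:
        points.append(sum(1 for e in event_times if t - WINDOW < e <= t))
        t += STEP
    return points

# One erroneous event at t=300s, queried well inside the range:
points = windowed_counts([300], start=120, end=600)

# Summing the per-step increases (as non-incremental downsampling does)
# counts the single event WINDOW / STEP = 8 times:
print(sum(points))  # 8
```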

What is the possible workaround?

Following the Prometheus documentation on the increase function:

Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for.

If this behavior is not required for your counter (that is, your counter doesn't get restarted), you can use the raw counter value in your query along with the incremental ratio SLI for the most accurate results.
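A toy comparison of the two interpretations, under the assumption of a monotonic counter with no restarts:

```python
# Raw cumulative counter samples (monotonic: no restarts).
counter = [100, 103, 103, 110]

# Incremental ratio SLI: per-interval deltas are derived from
# consecutive raw counter values, recovering exact event counts
# with no overlapping-window inflation.
deltas = [b - a for a, b in zip(counter, counter[1:])]
print(deltas)       # [3, 0, 7]
print(sum(deltas))  # 10
```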

Other notes regarding PromQL

Using offset in the query is a useful way to account for delays in data availability or eventual consistency in monitoring systems.
However, it also shifts the alert time, so consider this for alerts with time-sensitive policies.
Moreover, sum is an aggregation operator. In the context of time-series data, sum aggregates values across dimensions, not across time.
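A toy illustration of sum collapsing dimensions at each timestamp while leaving the time axis untouched:

```python
# Two series of the same metric, distinguished by a "pod" label,
# each with values at timestamps t0, t1, t2.
series = {
    "pod=a": [1, 2, 3],
    "pod=b": [4, 5, 6],
}

# PromQL-style `sum(metric)`: collapse the pod dimension by adding
# values per timestamp; no summing happens across time.
summed = [sum(values) for values in zip(*series.values())]
print(summed)  # [5, 7, 9]
```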

Elasticsearch: handling multiple value responses

Multiple values can be returned when using percentile aggregation. An example response containing multiple values:

"n9-val": {
  "values": {
    "95.0": 25.5,
    "99.0": 30.0
  }
}

This may be caused by the query configuration. For example:

"percentiles": {
  "field": "transaction.duration.us",
  "percents": [95, 99]
}

However, it's crucial for the Nobl9 agent to send only one value to Nobl9.

In this case, the agent sends Nobl9 the first non-null value from the aggregation list and logs an error indicating that multiple values were returned but only one was retrieved.
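A sketch of that fallback (simplified logic for illustration, not the agent's actual code):

```python
import logging

# Multi-value percentiles response, as in the example above.
response = {"n9-val": {"values": {"95.0": 25.5, "99.0": 30.0}}}

values = [v for v in response["n9-val"]["values"].values() if v is not None]
if len(values) > 1:
    logging.error("query returned %d values; only the first is used",
                  len(values))
value = values[0]  # the first non-null value is what reaches Nobl9
print(value)  # 25.5
```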

To avoid requesting redundant data and error logging, ensure only one value is returned from the search query.

To address this, specify a single percentile value in your query. For example, the above query can be modified as follows:

"percentiles": {
  "field": "transaction.duration.us",
  "percents": [95]
}

Check the Nobl9 features that can help you troubleshoot:

Event logs
Metrics health notifier
Query checker
Nobl9 agent logging