Splunk Observability
On demand

Splunk Observability allows users to search, monitor, and analyze machine-generated big data. Splunk Observability facilitates collecting and monitoring metrics, logs, and traces from common data sources. Data collection and monitoring in one place ensure full-stack, end-to-end observability of the entire infrastructure.

Splunk Observability is distinct from Splunk Core, the traditional log management solution from Splunk that powers Splunk Cloud and Splunk Enterprise. Nobl9 also integrates with Splunk Core, through a different set of APIs.

Splunk Observability parameters and supported features in Nobl9
General support:
Release channel: Alpha
Connection method: Agent, Direct
Replay and SLI Analyzer: Not supported
Event logs: Not supported
Query checker: Not supported
Query parameters retrieval: Supported
Timestamp cache persistence: Supported

Query parameters:
Query interval: 1 min
Query delay: 5 min
Jitter: 15 sec
Timeout: 30 sec

Agent details and minimum required versions for supported features:
Plugin name: n9splunk_observability
Query delay environment variable: SPLUNK_QUERY_DELAY
Timestamp cache persistence: 0.65.0

Additional notes:
Available on demand
Maximum query delay for Splunk Observability is 15 minutes

On-demand feature

The Splunk Observability integration with Nobl9 is available on demand. Fill in the request form to access it.

Creating SLOs with Splunk Observability

Nobl9 Web

Follow the instructions below to create your SLOs with Splunk Observability in the UI:

  1. Navigate to Service Level Objectives.

  2. Click the button that creates a new SLO to open the SLO wizard.
  3. In step 2, select Splunk Observability as the Data Source for your SLO, then specify the Metric. You can choose either a Threshold Metric, where a single time series is evaluated against a threshold, or a Ratio Metric, which allows you to enter two time series to compare (for example, a count of good requests and total requests).

    1. Choose the Data Count Method for your ratio metric:
    • Non-incremental: counts incoming metric values one by one, so the resulting SLO graph is spike-shaped.
    • Incremental: counts the incoming metric values incrementally, adding each new value to the previous ones, which results in a constantly increasing SLO graph.
  4. Enter a Program (for the Threshold metric), or a Program for good counter and a Program for total counter (for the Ratio metric). The following are program examples:

    1. Threshold metric for Splunk Observability:

      A = data('demo.trans.count', filter=filter('demo_datacenter', 'Tokyo'), rollup='rate').mean().publish(label='A', enable=False);
      B = data('demo.trans.count', filter=filter('demo_datacenter', 'Tokyo'), rollup='rate').stddev().publish(label='B', enable=False);
      C = (B/A).publish(label='C');
    2. Ratio metric for Splunk Observability:

      Program for good counter: data('demo.trans.count', filter=filter('demo_datacenter', 'Tokyo'), rollup='rate').stddev().publish()

      Program for total counter: data('demo.trans.count', filter=filter('demo_datacenter', 'Tokyo'), rollup='rate').mean().publish()

      SLI values for good and total
      When choosing the query for the ratio SLI (countMetrics), keep in mind that the values resulting from that query for both good and total:
      • Must be positive.
      • Should preferably be integers, although fractions are also acceptable.
        • If you use fractions, we recommend values larger than 1e-4 = 0.0001.
      • Shouldn't be larger than 1e+20.
  5. In step 3, define a Time Window for the SLO.

    • Rolling time windows are better for tracking the recent user experience of a service.

    • Calendar-aligned windows are best suited for SLOs that are intended to map to business metrics measured on a calendar-aligned basis, such as every calendar month or every quarter.

  6. In step 4, specify the Error Budget Calculation Method and your Objective(s).

    • Occurrences method counts good attempts against the count of total attempts.
    • Time Slices method measures how many good minutes were achieved (when a system operates within defined boundaries) during a time window.
    • You can define up to 12 objectives for an SLO.

    See the use case example and the SLO calculations guide for more information on the error budget calculation methods.

  7. In step 5, add the Display name, Name, and other settings for your SLO:

    • Create a composite SLO
    • Set notification on data, if this option is available for your data source.
      When activated, Nobl9 notifies you if your SLO hasn't received data or received incomplete data for more than 15 minutes.
    • Add alert policies, labels, and links, if required.
      You can add up to 20 links per SLO.
  8. Click Create SLO.
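
The SLI value guidelines from step 4 (positive values, fractions larger than 1e-4, nothing above 1e+20) can be checked before you commit to a ratio query. The following is a minimal sketch only: `check_sli_value` and the constant names are hypothetical helpers written for illustration and are not part of Nobl9 or Splunk Observability.

```python
import math

# Thresholds taken from the guidelines above; the names are illustrative.
MIN_RECOMMENDED_FRACTION = 1e-4  # fractions should be larger than this
MAX_VALUE = 1e20                 # values should not be larger than this


def check_sli_value(value: float) -> list[str]:
    """Return the guideline violations for one good/total data point.

    An empty list means the value follows all the guidelines.
    """
    if not math.isfinite(value) or value <= 0:
        return ["must be positive"]

    problems = []
    if value > MAX_VALUE:
        problems.append("should not be larger than 1e+20")

    # Integers are preferred; only non-integer values get the fraction check.
    is_fraction = value != math.floor(value)
    if is_fraction and value <= MIN_RECOMMENDED_FRACTION:
        problems.append("fractions should be larger than 1e-4")
    return problems


if __name__ == "__main__":
    for sample in (42, 0.5, 0.00005, 0, 1e21):
        print(sample, check_sli_value(sample))
```

Running such a check against a sample of query results for both the good and the total program helps spot values that are likely to cause trouble before the SLO is created.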

sloctl

Sample Splunk Observability threshold SLO
apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: api-server-slo
  displayName: API Server SLO
  project: default
  labels:
    area:
      - latency
      - slow-check
    env:
      - prod
      - dev
    region:
      - us
      - eu
    team:
      - green
      - sales
  annotations:
    area: latency
    env: prod
    region: us
    team: sales
spec:
  description: Example Splunk Observability SLO
  indicator:
    metricSource:
      name: splunk-observability
      project: default
      kind: Agent
  budgetingMethod: Occurrences
  objectives:
    - displayName: Good response (200)
      value: 200
      name: ok
      target: 0.95
      rawMetric:
        query:
          splunkObservability:
            program: >-
              data('demo.trans.count', filter=filter('api_server'),
              rollup='rate').mean().publish()
      op: lte
      primary: true
  service: api-server
  timeWindows:
    - unit: Month
      count: 1
      isRolling: false
      calendar:
        startTime: 2022-12-01T00:00:00.000Z
        timeZone: UTC
  alertPolicies:
    - fast-burn-5x-for-last-10m
  attachments:
    - url: https://docs.nobl9.com
      displayName: Nobl9 Documentation
  anomalyConfig:
    noData:
      alertMethods:
        - name: slack-notification
          project: default
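
To read the objective in the sample: with `op: lte` and `value: 200`, a data point counts as good when it is less than or equal to 200, and with the Occurrences budgeting method the share of good points is compared against `target: 0.95`. The sketch below works through that arithmetic on made-up data points. The function name, the sample values, and the operator names other than `lte` are assumptions for illustration, and the code is not Nobl9's implementation.

```python
import operator

# "lte" is the operator used in the sample SLO; the other entries are
# assumed here only to make the lookup table complete.
OPERATORS = {
    "lte": operator.le,
    "lt": operator.lt,
    "gte": operator.ge,
    "gt": operator.gt,
}


def occurrences_reliability(points, op, value):
    """Share of data points that satisfy `point <op> value`."""
    if not points:
        raise ValueError("at least one data point is required")
    compare = OPERATORS[op]
    good = sum(1 for point in points if compare(point, value))
    return good / len(points)


# Made-up raw metric values; 8 of the 10 are <= 200.
sample_points = [120, 180, 200, 240, 150, 90, 199, 201, 100, 160]
reliability = occurrences_reliability(sample_points, "lte", 200)
target = 0.95

print(f"reliability: {reliability:.2f}")       # reliability: 0.80
print(f"target met: {reliability >= target}")  # target met: False
```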

Important notes:

Metric specification from Splunk Observability has one field:

  • program is a mandatory string that contains a SignalFlow analytics program. The program must return exactly one time series, that is, only one key in the data map.
    For query examples, check SignalFlow: sample queries under the Useful links section.
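
The single-time-series requirement can be expressed as a simple guard. This is a sketch under assumptions: the data map is modeled as a plain dictionary from a time series identifier to a list of values, which is an illustrative shape rather than the actual response format, and `single_series` is a hypothetical helper.

```python
def single_series(data_map: dict) -> list:
    """Return the values of the only time series in `data_map`.

    Raises ValueError when the map holds zero or several series, which is
    the situation a program has to avoid.
    """
    if len(data_map) != 1:
        raise ValueError(
            f"expected exactly one time series, got {len(data_map)}"
        )
    (values,) = data_map.values()
    return values


if __name__ == "__main__":
    # One key: accepted.
    print(single_series({"series-a": [0.91, 0.93, 0.92, 0.95]}))

    # Two keys: rejected, for example when a filter matches several hosts
    # and the program does not aggregate them.
    try:
        single_series({"series-a": [0.91], "series-b": [0.42]})
    except ValueError as err:
        print(err)
```

In practice, an aggregation such as `.mean()` in the examples earlier on this page is what collapses several matching series into one.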

Querying the Splunk Observability server

Nobl9 queries Splunk Observability once every minute and requests 4 data points per query, which results in a 15-second resolution.
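
The resolution follows directly from the query interval and the number of points per query: 60 seconds / 4 points = 15 seconds. The sketch below reproduces that arithmetic; the assumption that the points are evenly spaced from the start of each one-minute window is made for illustration and is not confirmed by this page.

```python
from datetime import datetime, timedelta, timezone

QUERY_INTERVAL = timedelta(minutes=1)  # Nobl9 queries once per minute
POINTS_PER_QUERY = 4                   # data points requested per query

# 60 s / 4 = 15 s between consecutive data points.
resolution = QUERY_INTERVAL / POINTS_PER_QUERY


def point_timestamps(window_start: datetime) -> list[datetime]:
    """Timestamps of the points in one query window (assumed evenly spaced)."""
    return [window_start + i * resolution for i in range(POINTS_PER_QUERY)]


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(resolution.total_seconds())  # 15.0
    for ts in point_timestamps(start):
        print(ts.isoformat())
```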

Splunk Observability API rate limits

You can control your resource usage using org token (Access Tokens) limits. For more information, refer to the Org token limits | Splunk Observability documentation and the System limits for Splunk Infrastructure Monitoring | Splunk Observability documentation.

For a more in-depth look, consult additional resources: