
Agent metrics


While the Nobl9 agent is a stable and lightweight application, you still need data-based insight into its health: whether it is operational, how it is performing, and whether its resource allocation needs adjusting.

The agent metrics feature exposes agent health and resource utilization data that you can scrape at the /health and /metrics endpoints.

Requirements

You can activate the metrics configuration by setting environment variables on your Docker container or in your Kubernetes cluster.

To get the Nobl9 agent metrics, you need to have a monitoring solution that can scrape a Prometheus-compliant metrics endpoint.
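
For example, a minimal Prometheus scrape configuration for the agent could look like the following sketch; the job name and target address are placeholders, and port 9090 is the agent's default metrics port, described below:

scrape_configs:
  - job_name: "nobl9-agent"
    metrics_path: /metrics
    static_configs:
      # Placeholder target; point it at your agent's host or Kubernetes Service
      - targets: ["nobl9-agent.default.svc:9090"]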

Metric endpoints

/health endpoint

This endpoint returns an HTTP/1.1 200 OK response to a standard GET request if the agent is "healthy." The OK response means that the agent code has completed initialization and is running.
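
Since the endpoint answers plain GET requests, you can check it manually or wire it into your orchestrator's health checks. A minimal sketch, assuming the default metrics port 9090 (see below) and an agent reachable on localhost:

# Prints the HTTP status code; 200 means the agent is healthy
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9090/health

The same path can back a Kubernetes liveness probe on the agent container, for example:

livenessProbe:
  httpGet:
    path: /health
    port: 9090
  initialDelaySeconds: 10
  periodSeconds: 30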

/metrics endpoint

This endpoint returns Prometheus-scrapable metrics data in response to a standard GET request. It is a text-based endpoint handled by the Golang Prometheus libraries.
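
The response uses the standard Prometheus text exposition format. Here's an illustrative sketch of what a scrape might return; the label values and the sample value are made up, and the default port 9090 is assumed:

curl -s http://localhost:9090/metrics
# HELP n9_uptime Seconds since the agent start
# TYPE n9_uptime counter
n9_uptime{agent_version="0.73.2",organization="my-org",metric_source="prometheus"} 86400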

Agent's default port

To scrape the agent's metrics, define N9_METRICS_PORT as an environment variable when deploying your agent through the Kubernetes YAML or Docker invocation generated in the UI.

Here's a shortened example of a Kubernetes Deployment YAML with the N9_METRICS_PORT variable defined:

apiVersion: v1
kind: Secret
<...>
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nobl9-agent-example
  namespace: default
spec:
  <...>
  template:
    <...>
    spec:
      containers:
        - name: agent-container
          image: nobl9/agent:0.73.2
          resources:
            requests:
              memory: "350Mi"
              cpu: "0.1"
          env:
            <...>
            - name: N9_METRICS_PORT
              value: "9090"
NOTES
  • N9_METRICS_PORT specifies the TCP port on which the /metrics and /health endpoints are exposed.

  • Port 9090 is the default value; you can change it to match the port you're using in your infrastructure.

  • If you don't want the agent metrics to be exposed, comment out or delete the N9_METRICS_PORT variable.
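
The same variable applies to the Docker invocation generated in the UI. Here's a sketch, reusing the image tag from the YAML above; the other environment variables the UI generates (credentials and so on) are elided:

docker run -d \
  -e N9_METRICS_PORT="9090" \
  <...other environment variables from the UI-generated command...> \
  -p 9090:9090 \
  nobl9/agent:0.73.2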

List of available agent metrics

The following table lists all agent metrics available at the /health and /metrics endpoints in stable agent versions:

| Metric name | Type | Labels | Description | Splunk Observability? |
|---|---|---|---|---|
| n9_accumulated_points | countervec | organization, plugin_name, metric_source | Points added to the accumulator since the agent start | Y |
| n9_all_buffer_metrics_dropped | count | (none) | Total number of metrics dropped due to buffer overflow, unlabeled | Y |
| n9_buffer_dropped_metrics | countervec | organization, plugin_name, metric_source | Number of metrics dropped due to buffer overflow, labeled | Y |
| n9_buffer_capacity | gauge | (none) | Total capacity of the metrics buffer in the Nobl9 agent | Y |
| n9_buffer_load | gauge | (none) | Total count of metrics in the buffer that have not yet been successfully uploaded to the Nobl9 platform | Y |
| n9_bytes_received_total | count | (none) | Total count of bytes received by the agent since the last start | Y |
| n9_bytes_sent_total | count | (none) | Total count of bytes sent since the last agent start | Y |
| n9_emitted_points | countervec | organization, plugin_name | Points successfully emitted to the Nobl9 platform since the agent start | Y |
| n9_input_received_bytes | countervec | organization, plugin_name, metric_source | Bytes received in input responses' bodies since the last agent start | Y |
| n9_input_sent_bytes | countervec | organization, plugin_name, metric_source | Bytes sent in input requests' bodies since the last agent start | Y |
| n9_last_config_update_success_time | countervec | organization, metric_source | Seconds since the last successful config read | Y |
| n9_last_input_successful_response_time | gaugevec | organization, plugin_name, metric_source | Seconds since the last successful input response | Y |
| n9_last_query_success_time | count | (none) | Seconds since the last successful query was sent. Replaced with last_input_successful_response_time | Y |
| n9_last_upload_success_time | count | (none) | Seconds since the last successful data upload | Y |
| n9_output_received_bytes | counter | (none) | Bytes received in output responses' bodies since the last agent start | Y |
| n9_output_sent_bytes | counter | (none) | Bytes sent in output requests' bodies since the last agent start | Y |
| n9_payload_received_total | count | (none) | Total count of bytes requested in all responses' content since the last agent start. Replaced with input_received_bytes and output_received_bytes | Y |
| n9_payload_sent_total | count | (none) | Total count of bytes sent in all requests' content since the last agent start. Replaced with input_sent_bytes and output_sent_bytes | Y |
| n9_query_delay | gauge | (none) | Query delay configured in seconds. Deprecated; check the runtime log for this information | N/A |
| n9_query_errors | countervec | status_code | Total count of errors encountered while executing SLI queries since the last agent start. Replaced with n9_query_total | N |
| n9_query_interval | gaugevec | indicator_id | Query interval configured in seconds. Deprecated; check the runtime log for this information | N/A |
| n9_query_latency | histogramvec | organization, plugin_name, metric_source | Latency histogram of all SLI query requests | N |
| n9_query_total | countervec | organization, plugin_name, status_code, metric_source | Total count of all queries run since the last agent start | N |
| n9_sli_total | gaugevec | organization, plugin_name, metric_source | Total count of the SLIs the agent collects data for | Y |
| n9_upload_errors | countervec | status_code | Total count of errors encountered while uploading data to the Nobl9 platform since the last agent start. Replaced with n9_upload_total | Y |
| n9_upload_latency | histogramvec | organization, metric_source | Latency histogram of all data uploads by SLI to the Nobl9 platform | N |
| n9_upload_total | countervec | status_code | Total count of all data uploaded to the Nobl9 platform since the last agent start | Y |
| n9_uptime | countervec | agent_version, organization, metric_source | Seconds since the agent start | Y |

The following table lists all agent metrics available at the /health and /metrics endpoints in beta agent versions:

| Metric name | Type | Labels | Description | Splunk Observability? |
|---|---|---|---|---|
| n9_accumulated_points | countervec | organization, plugin_name, metric_source | Points added to the accumulator since the agent start | Y |
| n9_all_buffer_metrics_dropped | count | (none) | Total number of metrics dropped due to buffer overflow, unlabeled | Y |
| n9_buffer_dropped_metrics | countervec | organization, plugin_name, metric_source | Number of metrics dropped due to buffer overflow, labeled | Y |
| n9_buffer_capacity | gauge | (none) | Total capacity of the metrics buffer in the Nobl9 agent | Y |
| n9_buffer_load | gauge | (none) | Total count of metrics in the buffer that have not yet been successfully uploaded to the Nobl9 platform | Y |
| n9_bytes_received_total | count | (none) | Total count of bytes received by the agent since the last start | Y |
| n9_bytes_sent_total | count | (none) | Total count of bytes sent since the last agent start | Y |
| n9_emitted_points | countervec | organization, metric_source, plugin_name | Points successfully emitted to the Nobl9 platform since the agent start | Y |
| n9_input_received_bytes | countervec | organization, plugin_name, metric_source | Bytes received in input responses' bodies since the last agent start | Y |
| n9_input_sent_bytes | countervec | organization, plugin_name, metric_source | Bytes sent in input requests' bodies since the last agent start | Y |
| n9_last_config_update_success_time | countervec | organization, metric_source | Seconds since the last successful config read | Y |
| n9_last_input_successful_response_time | gaugevec | organization, plugin_name, metric_source | Seconds since the last successful input response | Y |
| n9_last_query_success_time | count | (none) | Seconds since the last successful query was sent. Replaced with last_input_successful_response_time | Y |
| n9_last_upload_success_time | count | (none) | Seconds since the last successful data upload | Y |
| n9_output_received_bytes | counter | (none) | Bytes received in output responses' bodies since the last agent start | Y |
| n9_output_sent_bytes | counter | (none) | Bytes sent in output requests' bodies since the last agent start | Y |
| n9_payload_received_total | count | (none) | Total count of bytes requested in all responses' content since the last agent start. Replaced with input_received_bytes and output_received_bytes | Y |
| n9_payload_sent_total | count | (none) | Total count of bytes sent in all requests' content since the last agent start. Replaced with input_sent_bytes and output_sent_bytes | Y |
| n9_query_delay | gauge | (none) | Query delay configured in seconds. Deprecated; check the runtime log for this information | N/A |
| n9_query_errors | countervec | status_code | Total count of errors encountered while executing SLI queries since the last agent start. Replaced with n9_query_total | N |
| n9_query_interval | gaugevec | indicator_id | Query interval configured in seconds. Deprecated; check the runtime log for this information | N/A |
| n9_query_latency | histogramvec | organization, plugin_name, metric_source | Latency histogram of all SLI query requests | N |
| n9_query_total | countervec | organization, plugin_name, status_code, metric_source | Total count of all queries run since the last agent start | N |
| n9_sli_total | gaugevec | organization, plugin_name, metric_source | Total count of the SLIs the agent collects data for | Y |
| n9_upload_errors | countervec | status_code | Total count of errors encountered while uploading data to the Nobl9 platform since the last agent start. Replaced with n9_upload_total | Y |
| n9_upload_latency | histogramvec | organization, metric_source | Latency histogram of all data uploads by SLI to the Nobl9 platform | N |
| n9_upload_total | countervec | status_code | Total count of all data uploaded to the Nobl9 platform since the last agent start | Y |
| n9_uptime | countervec | agent_version, organization, metric_source | Seconds since the agent start | Y |

Metrics use cases

Metrics can help you better understand what the Nobl9 agent is doing in your Kubernetes environment.

Peak in the agent's memory utilization

For instance, if you detect a peak in the memory usage of an active Nobl9 agent, metrics allow you to scrape the agent's utilization data with Prometheus and graph it in Grafana to see the agent's memory usage over time, as sketched below.
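
The /metrics endpoint is handled by the Golang Prometheus libraries, which typically also register the standard Go process collectors alongside the n9_* metrics (an assumption worth verifying against your endpoint's output). If they are present, a Grafana panel for memory over time can use a query such as:

# Resident memory of the agent process; "nobl9-agent" is the placeholder job name
process_resident_memory_bytes{job="nobl9-agent"}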

Breaks in the agent's operation

You might also encounter a situation where Kubernetes reports the pod as Running, but the agent appears to stop operating at times. By scraping metrics data with Prometheus and displaying graphs in Grafana, you can get details about the agent's operation over time.

For instance, you might see spikes in the graph displaying n9_upload_errors (deprecated; use n9_upload_total{status_code!~"200"} instead) while the graph displaying n9_bytes_sent_total has stopped growing. At the same time, the panel displaying n9_last_upload_success_time indicates that uploads have stopped, and the graph displaying n9_buffer_load has started growing. You can share this data with Nobl9 support and your local networking team to troubleshoot the issue further.
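
As a sketch, the following PromQL expressions surface each of those signals; the label selectors assume the placeholder job name from the scrape configuration above:

# Rate of non-successful uploads
rate(n9_upload_total{job="nobl9-agent",status_code!~"200"}[5m])

# Outbound traffic; a flat line means the agent has stopped sending data
rate(n9_bytes_sent_total{job="nobl9-agent"}[5m])

# Seconds since the last successful upload; a steadily growing value means uploads have stalled
n9_last_upload_success_time{job="nobl9-agent"}

# Buffer fill ratio; values approaching 1 mean metrics are about to be dropped
n9_buffer_load{job="nobl9-agent"} / n9_buffer_capacity{job="nobl9-agent"}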

Examples of agent metric visualizations

You can collect metrics with Prometheus and analyze them using a tool like Grafana.

  • Image 1: Grafana visualization of the n9_uptime metric
  • Image 2: Grafana visualization of the n9_query_total metric for several Nobl9 agents
  • Image 3: Grafana visualization of the n9_upload_total metric
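
For instance, a panel like Image 2 can be approximated with a per-source query rate. This is a sketch; the label you group by depends on how you distinguish your agents:

sum by (metric_source) (rate(n9_query_total[5m]))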