DS 7.1.7

Metric Types Reference

The following monitoring metrics are available in each interface:

Type Description

Counter

Cumulative metric for a numerical value that only increases while the server is running.

Counts that reflect volatile data, such as the number of requests, are reset to 0 when the server starts up.

Gauge

Metric for a numerical value that can increase or decrease.

Summary

Metric that samples observations, providing a count of observations, sum total of observed amounts, average rate of events, and moving average rates across sliding time windows.

Common REST and LDAP views show summaries as JSON objects. JSON summaries have the following fields:(1)

{
  "count": number,      // Number of events since the server started
  "total": number,      // Sum of quantities measured for each event
                        // since the server started
  // The following are related to the "count":
  "mean_rate": number,  // Average event rate per second
                        // since the server started
  "m1_rate": number,    // One-minute average event rate per second
                        // (exponentially decaying)
  "m5_rate": number,    // Five-minute average event rate per second
                        // (exponentially decaying)
  "m15_rate": number,   // Fifteen-minute average event rate per second
                        // (exponentially decaying)
}

The "total" depends on the type of events measured. For example, if the "count" is the number of requests, then the "total" is the total etime in milliseconds to process all the requests. If the "count" is the number of times the server read bytes of data, then the "total" is the total number of bytes read.

The Prometheus view does not provide time-based statistics, as rates can be calculated from the time-series data. Instead, the Prometheus view includes summary metrics whose names have the following suffixes or labels:

  • _count: number of events since the server started

  • _total: sum of quantities measured for each event since the server started

  • {quantile="0.5"}: 50% at or below this value since the server started

  • {quantile="0.75"}: 75% at or below this value since the server started

  • {quantile="0.95"}: 95% at or below this value since the server started

  • {quantile="0.98"}: 98% at or below this value since the server started

  • {quantile="0.99"}: 99% at or below this value since the server started

  • {quantile="0.999"}: 99.9% at or below this value since the server started

Timer

Metric combining a summary with other statistics.

Common REST and LDAP views show summaries as JSON objects. JSON summaries have the following fields(1)

{
  "count": number,     // Number of events since the server started
  "total": number,     // Total duration for all events
                       // since the server started, in ms
                       // (for requests, sum of the etimes
                       // since the server started, in ms)
  // The following are related to the "count":
  "mean_rate": number, // Average event rate per second
                       // since the server started
  "m1_rate": number,   // One-minute average event rate per second
                       // (exponentially decaying)
  "m5_rate": number,   // Five-minute average event rate per second
                       // (exponentially decaying)
  "m15_rate": number,  // Fifteen-minute average event rate per second
                       // (exponentially decaying)
  // The following are related to the "total":
  "mean": number,      // Average duration over all events
                       // since the server started, in ms
  "min": number,       // Minimum duration recorded
                       // since the server started, in ms
  "max": number,       // Maximum duration recorded
                       // since the server started, in ms
  "stddev": number,    // Standard deviation of durations
                       // since the server started, in ms
  "p50": number,       // 50% durations at or below this value
                       // (median) since the server started, in ms
  "p75": number,       // 75% durations at or below this value
                       // since the server started, in ms
  "p95": number,       // 95% durations at or below this value
                       // since the server started, in ms
  "p98": number,       // 98% durations at or below this value
                       // since the server started, in ms
  "p99": number,       // 99% durations at or below this value
                       // since the server started, in ms
  "p999": number,      // 99.9% durations at or below this value
                       // since the server started, in ms
  "p9999": number,     // 99.99% durations at or below this value
                       // since the server started, in ms
  "p99999": number     // 99.999% durations at or below this value
                       // since the server started, in ms
}

The Prometheus view does not provide time-based statistics. Rates can be calculated from the time-series data.

(1) Monitoring metrics reflect sample observations made while the server is running. The values are not saved when the server shuts down. As a result, metrics of this type reflect data recorded since the server started.

Metrics that show etime measurements in milliseconds (ms) continue to show values in ms even if the server is configured to log etimes in nanoseconds.

The calculation of moving averages is intended to be the same as that of the uptime and top commands, where the moving average plotted over time is smoothed by weighting that decreases exponentially. For an explanation of the mechanism, see the Wikipedia section, Exponential moving average.

Copyright © 2010-2023 ForgeRock, all rights reserved.