DS 7.2.5

Metric types reference

The following monitoring metrics are available in each interface:

Type Description

Counter

Cumulative metric for a numerical value that only increases while the server is running.

Counts that reflect volatile data, such as the number of requests, are reset to 0 when the server starts up.
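A counter restarts at 0 when the server starts. Any calculation of how much a counter grew between two samples therefore has to handle restarts. The following is a minimal Python sketch that assumes two samples of the same counter taken at different times. The function name `counter_increase` is hypothetical.

```python
def counter_increase(previous: float, current: float) -> float:
    """Return how much a counter grew between two samples.

    A counter only increases while the server runs, but it restarts
    from 0 when the server starts up. If the current sample is smaller
    than the previous one, assume a restart happened in between.
    """
    if current >= previous:
        return current - previous
    # Reset detected: everything in `current` was recorded after the
    # restart. Any increase between the previous sample and the
    # shutdown is lost, because values are not saved at shutdown.
    return current


# Hypothetical request counts sampled from the same server.
print(counter_increase(1200, 1500))  # 300
print(counter_increase(1500, 40))    # 40 (server restarted between samples)
```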

Gauge

Metric for a numerical value that can increase or decrease.

Histogram

Metric that samples observations, counts them in buckets, and provides a sum of all observed values.

Common REST and LDAP views show histograms as JSON objects. JSON histograms for entry sizes (in bytes) have the following fields:(1)

{
  "count": number,      // Number of events since the server started
  "sum": number,        // Sum of quantities measured for each event
                        // since the server started
  // The buckets in a histogram depend on what the server observes.
  // Each bucket for an entry size measurement has a ceiling size.
  // The first bucket field shows the number of entries of 500 bytes
  // or smaller, and the second shows the number of entries of
  // 1000 bytes or smaller. The final field shows the number of
  // entries larger than 1,000,000 bytes:
  "less-than-or-equal-to-500": number,
  "less-than-or-equal-to-1000": number,
  "less-than-or-equal-to-5000": number,
  "less-than-or-equal-to-10000": number,
  "less-than-or-equal-to-50000": number,
  "less-than-or-equal-to-100000": number,
  "less-than-or-equal-to-500000": number,
  "less-than-or-equal-to-1000000": number,
  "less-than-or-equal-to-inf": number
}
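The following Python sketch reads a histogram in this shape. It computes the mean entry size as `sum / count` and turns the bucket fields into per-range counts. It assumes each bucket counts all entries up to its ceiling, as the comments for the first two buckets describe, so the count for one range is the difference between adjacent buckets. The numbers are invented, and `bucket_ranges` is a hypothetical helper.

```python
import json

# Hypothetical values: the field names follow the entry-size histogram
# above, but the numbers are invented for illustration.
raw = """{
  "count": 10, "sum": 42000,
  "less-than-or-equal-to-500": 2,
  "less-than-or-equal-to-1000": 5,
  "less-than-or-equal-to-5000": 8,
  "less-than-or-equal-to-10000": 9,
  "less-than-or-equal-to-50000": 10,
  "less-than-or-equal-to-100000": 10,
  "less-than-or-equal-to-500000": 10,
  "less-than-or-equal-to-1000000": 10,
  "less-than-or-equal-to-inf": 10
}"""

PREFIX = "less-than-or-equal-to-"


def bucket_ranges(histogram: dict) -> list:
    """Turn bucket fields into (lower, upper, count) ranges.

    Assumption: bucket counts are cumulative, so the count for one
    range is the difference between adjacent buckets.
    """
    bounds = sorted(
        (float(key[len(PREFIX):]), value)  # float("inf") parses the last ceiling
        for key, value in histogram.items()
        if key.startswith(PREFIX)
    )
    ranges, lower, seen = [], 0.0, 0
    for upper, cumulative in bounds:
        ranges.append((lower, upper, cumulative - seen))
        lower, seen = upper, cumulative
    return ranges


histogram = json.loads(raw)
print(f"mean entry size: {histogram['sum'] / histogram['count']:g} bytes")
for lower, upper, n in bucket_ranges(histogram):
    print(f"({lower:g}, {upper:g}] bytes: {n}")
```

If the buckets turn out to hold per-range counts instead, skip the differencing step and read each bucket value directly.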

Summary

Metric that samples observations, providing a count of observations, sum total of observed amounts, average rate of events, and moving average rates across sliding time windows.

Common REST and LDAP views show summaries as JSON objects. JSON summaries have the following fields:(1)

{
  "count": number,      // Number of events since the server started
  "total": number,      // Sum of quantities measured for each event
                        // since the server started
  // The following are related to the "count":
  "mean_rate": number,  // Average event rate per second
                        // since the server started
  "m1_rate": number,    // One-minute average event rate per second
                        // (exponentially decaying)
  "m5_rate": number,    // Five-minute average event rate per second
                        // (exponentially decaying)
  "m15_rate": number    // Fifteen-minute average event rate per second
                        // (exponentially decaying)
}

The "total" depends on the type of events measured. For example, if the "count" is the number of requests, then the "total" is the total etime in milliseconds to process all the requests. If the "count" is the number of times the server read bytes of data, then the "total" is the total number of bytes read.
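For a request summary, dividing the "total" by the "count" gives the average etime per request in milliseconds. Here is a small Python sketch with invented numbers. The helper name `average_per_event` is hypothetical.

```python
import json

# Hypothetical summary for a request metric: "count" is the number of
# requests and "total" is the sum of their etimes in milliseconds.
summary = json.loads('{"count": 400, "total": 1800, "mean_rate": 2.5,'
                     ' "m1_rate": 3.1, "m5_rate": 2.8, "m15_rate": 2.6}')


def average_per_event(summary: dict) -> float:
    """Average quantity per event, for example average etime per request."""
    if summary["count"] == 0:
        return 0.0  # no events recorded since the server started
    return summary["total"] / summary["count"]


print(average_per_event(summary))  # 4.5 (ms per request)
```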

The Prometheus view does not provide time-based statistics, as rates can be calculated from the time-series data. Instead, the Prometheus view includes summary metrics whose names have the following suffixes or labels:

  • _count: number of events since the server started

  • _total: sum of quantities measured for each event since the server started

  • {quantile="0.5"}: 50% of observations since the server started are at or below this value

  • {quantile="0.75"}: 75% of observations since the server started are at or below this value

  • {quantile="0.95"}: 95% of observations since the server started are at or below this value

  • {quantile="0.98"}: 98% of observations since the server started are at or below this value

  • {quantile="0.99"}: 99% of observations since the server started are at or below this value

  • {quantile="0.999"}: 99.9% of observations since the server started are at or below this value
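The Prometheus view leaves rate calculations to the consumer. The following Python sketch collects the `_count`, `_total`, and quantile samples of one summary from two scrapes, then derives an event rate from the change in `_count`. The metric name `ds_example` and the scrape contents are invented. The parser assumes plain `name value` lines with no timestamps and no labels other than `quantile`, and that the server did not restart between scrapes.

```python
# Hypothetical scrapes of one summary metric taken 60 seconds apart.
# The metric name "ds_example" is invented for illustration.
SCRAPE_1 = """\
# TYPE ds_example summary
ds_example{quantile="0.5"} 3.0
ds_example{quantile="0.99"} 20.0
ds_example_count 400
ds_example_total 1800
"""
SCRAPE_2 = """\
# TYPE ds_example summary
ds_example{quantile="0.5"} 3.0
ds_example{quantile="0.99"} 22.0
ds_example_count 700
ds_example_total 3000
"""


def parse_summary(text: str, name: str) -> dict:
    """Collect the _count, _total, and quantile samples for one summary.

    Assumes each sample line is "<name or name{quantile=...}> <value>"
    with no timestamp and no labels other than quantile.
    """
    result = {"quantiles": {}}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        key, value = line.rsplit(maxsplit=1)
        if key == f"{name}_count":
            result["count"] = float(value)
        elif key == f"{name}_total":
            result["total"] = float(value)
        elif key.startswith(f'{name}{{quantile="'):
            quantile = key.split('"')[1]  # text between the quotes
            result["quantiles"][float(quantile)] = float(value)
    return result


def rate_per_second(count_then: float, count_now: float, seconds: float) -> float:
    """Average event rate between two scrapes (no server restart in between)."""
    return (count_now - count_then) / seconds


s1 = parse_summary(SCRAPE_1, "ds_example")
s2 = parse_summary(SCRAPE_2, "ds_example")
print(s1["quantiles"])                                  # {0.5: 3.0, 0.99: 20.0}
print(rate_per_second(s1["count"], s2["count"], 60.0))  # 5.0
```

In practice a Prometheus query function such as `rate()` does this work over stored time series, including counter resets.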

Timer

Metric combining a summary with other statistics.

Common REST and LDAP views show timers as JSON objects. JSON timers have the following fields:(1)

{
  "count": number,     // Number of events since the server started
  "total": number,     // Total duration for all events
                       // since the server started, in ms
                       // (for requests, sum of the etimes
                       // since the server started, in ms)
  // The following are related to the "count":
  "mean_rate": number, // Average event rate per second
                       // since the server started
  "m1_rate": number,   // One-minute average event rate per second
                       // (exponentially decaying)
  "m5_rate": number,   // Five-minute average event rate per second
                       // (exponentially decaying)
  "m15_rate": number,  // Fifteen-minute average event rate per second
                       // (exponentially decaying)
  // The following are related to the "total":
  "mean": number,      // Average duration over all events
                       // since the server started, in ms
  "min": number,       // Minimum duration recorded
                       // since the server started, in ms
  "max": number,       // Maximum duration recorded
                       // since the server started, in ms
  "stddev": number,    // Standard deviation of durations
                       // since the server started, in ms
  "p50": number,       // 50% of durations are at or below this value
                       // (median) since the server started, in ms
  "p75": number,       // 75% of durations are at or below this value
                       // since the server started, in ms
  "p95": number,       // 95% of durations are at or below this value
                       // since the server started, in ms
  "p98": number,       // 98% of durations are at or below this value
                       // since the server started, in ms
  "p99": number,       // 99% of durations are at or below this value
                       // since the server started, in ms
  "p999": number,      // 99.9% of durations are at or below this value
                       // since the server started, in ms
  "p9999": number,     // 99.99% of durations are at or below this value
                       // since the server started, in ms
  "p99999": number     // 99.999% of durations are at or below this value
                       // since the server started, in ms
}

The Prometheus view does not provide time-based statistics. Rates can be calculated from the time-series data.
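The following Python sketch reads a timer in the JSON shape above and builds a short latency report. All timer values are invented. The 50 ms p99 target is a hypothetical service-level objective. The "recent load ratio" (one-minute rate divided by the lifetime mean rate) is an illustrative heuristic, not something the server computes.

```python
import json

# Hypothetical timer values for request etimes; field names follow the
# timer description above, the numbers are invented for illustration.
timer = json.loads("""{
  "count": 1000, "total": 5200, "mean_rate": 2.0, "m1_rate": 4.0,
  "m5_rate": 3.0, "m15_rate": 2.5, "mean": 5.2, "min": 1, "max": 250,
  "stddev": 9.1, "p50": 3, "p75": 5, "p95": 12, "p98": 20, "p99": 35,
  "p999": 180, "p9999": 240, "p99999": 250
}""")

# Hypothetical service-level target: 99% of requests within 50 ms.
P99_TARGET_MS = 50


def latency_report(timer: dict, target_ms: float) -> dict:
    """Summarize a timer and flag whether p99 meets the target."""
    return {
        "median_ms": timer["p50"],
        "p99_ms": timer["p99"],
        "worst_ms": timer["max"],
        "p99_within_target": timer["p99"] <= target_ms,
        # Illustrative heuristic: a one-minute rate well above the
        # lifetime mean rate suggests load has recently increased.
        "recent_load_ratio": (
            timer["m1_rate"] / timer["mean_rate"] if timer["mean_rate"] else None
        ),
    }


print(latency_report(timer, P99_TARGET_MS))
```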

(1) Monitoring metrics reflect sample observations made while the server is running. The values are not saved when the server shuts down. As a result, metrics of this type reflect data recorded since the server started.

Metrics that show etime measurements in milliseconds (ms) continue to show values in ms even if the server is configured to log etimes in nanoseconds.
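When you compare etimes from access logs configured for nanoseconds with these metrics, convert the log values to milliseconds first. Here is a trivial Python sketch. The function name is hypothetical.

```python
NS_PER_MS = 1_000_000  # nanoseconds in one millisecond


def etime_ns_to_ms(etime_ns: int) -> float:
    """Convert an etime logged in nanoseconds to milliseconds."""
    return etime_ns / NS_PER_MS


print(etime_ns_to_ms(4_500_000))  # 4.5
```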

The calculation of moving averages is intended to be the same as that of the uptime and top commands, where the moving average plotted over time is smoothed by weighting that decreases exponentially. For an explanation of the mechanism, see the Wikipedia section, Exponential moving average.
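The following Python sketch shows an exponential moving average in the style of the load average reported by `uptime` and `top`. At each tick, the average moves toward the latest instantaneous rate by a factor `alpha = 1 - exp(-tick / window)`, so older ticks lose weight exponentially. It assumes 5-second ticks, similar to the load-average calculation. The server's actual tick interval and initialization are not described here.

```python
import math


def ewma_alpha(tick_seconds: float, window_seconds: float) -> float:
    """Smoothing factor so each tick's weight decays by exp(-tick / window)."""
    return 1.0 - math.exp(-tick_seconds / window_seconds)


def update(average: float, instant_rate: float, alpha: float) -> float:
    """One moving-average step: move a fraction alpha toward the new sample."""
    return average + alpha * (instant_rate - average)


# Assumption: 5-second ticks, similar to the load-average calculation.
TICK = 5.0
alpha_1m = ewma_alpha(TICK, 60.0)

m1 = 0.0
for _ in range(12):  # one minute of ticks at a steady 10 events/s
    m1 = update(m1, 10.0, alpha_1m)
print(round(m1, 2))  # 6.32, that is 10 * (1 - 1/e) after one window
```

Starting from 0, a steady rate is reached only gradually. After one full window the one-minute average reaches about 63% of the true rate.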

Copyright © 2010-2024 ForgeRock, all rights reserved.