Monitor Performance
Use the following interfaces to monitor Java Agent performance:
- Prometheus
-
Third-party software used for gathering and processing monitoring data.
For information about installing and running Prometheus, see the Prometheus documentation.
You can configure Java Agent to expose an endpoint which Prometheus scrapes to obtain performance metrics from your protected web applications.
Configure Prometheus to monitor the metrics endpoint exposed by the agent by using the prometheus.yml configuration file. For more information, see the Prometheus configuration documentation.
Prometheus provides monitoring and processing for the information provided by Java Agent. For further analysis and visualization, use tools such as Grafana to create customized charts and graphs based on the information collected by Prometheus. Download example Grafana dashboards from the ForgeRock BackStage website. For more information, see the Grafana website.
For more information, see Expose an Endpoint for Common REST and Prometheus Metrics.
- ForgeRock® Common REST
-
You can configure Java Agent to expose an endpoint that allows REST clients to gather metrics about your protected web applications, in JSON format.
For more information, see Expose an Endpoint for Common REST and Prometheus Metrics.
- CSV File-based
-
Write metrics to comma-separated value (CSV) files, without exposing an endpoint.
When enabled, the monitoring .csv files are written to the same directory as the agent instance debug files, for example in /path/to/java_agents/tomcat_agent/Agent_001/logs/debug/.
For more information, see Save Metrics to CSV Files.
Expose an Endpoint for Common REST and Prometheus Metrics
Common REST and Prometheus performance metrics are provided by an endpoint
configured in the protected web application’s web.xml
file. The endpoint
must be accessible to the REST client or Prometheus server that will be making
use of the performance data.
-
For each protected web application that will expose metrics, edit the web application’s web.xml file.
The following Tomcat example exposes a base endpoint named /metrics:
<servlet>
  <servlet-name>AgentMonitoring</servlet-name>
  <servlet-class>org.forgerock.http.servlet.HttpFrameworkServlet</servlet-class>
  <init-param>
    <param-name>application-loader</param-name>
    <param-value>guice</param-value>
  </init-param>
</servlet>
<servlet-mapping>
  <servlet-name>AgentMonitoring</servlet-name>
  <url-pattern>/metrics/*</url-pattern>
</servlet-mapping>
Choose any name for the exposed base endpoint, but make sure it does not conflict with any of the built-in agent endpoints, for example /sunwCDSSORedirectURI.
-
Allow access to the base endpoint used for monitoring web applications protected by the agent by using one of the following methods:
-
Create a Not Enforced URI rule for the base endpoint.
The following example rule allows access to the metrics base endpoint:
*/metrics/*
-
Create a Compound Not-Enforced URI and IP rule for the base endpoint.
A Compound Not-Enforced URI and IP rule can allow access from only the IP addresses of the REST clients or Prometheus server.
The following example rule allows access to the /metrics endpoint for HTTP requests that come from IP addresses in the range 192.168.1.1 to 192.168.1.3:
192.168.1.1-192.168.1.3 | */metrics/*
HTTP requests from other IP addresses are not able to access the metrics base endpoint.
-
Create an authorization policy in AM to restrict access to the metrics base endpoint.
Note that the metrics base endpoint does not require login credentials. You can use a policy to ensure that requests to the endpoint are authenticated against the AM instance.
For more information, see Configuring Policies in AM’s Authorization Guide.
-
-
The Common REST performance monitoring endpoint is now available in the path used by the protected web application, for example https://mydomain.example.com/myapp/metrics/crest.
Configure your REST clients to access the endpoint to gather performance metric data. If you are protecting the endpoint by using policies in AM, include the relevant credentials.
-
The Prometheus performance monitoring endpoint is available in the path used by the protected web application, for example https://mydomain.example.com/myapp/metrics/prometheus.
Configure your Prometheus server to access the endpoint to gather performance metric data. If you are protecting the endpoint by using policies in AM, include the relevant credentials.
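As a rough illustration of the last two steps, the following Python sketch derives both endpoint URLs from an application base URL and renders a minimal Prometheus scrape job for the Prometheus endpoint. It assumes the base endpoint is mapped to /metrics, as in the example above. The helper names (`metrics_urls`, `scrape_job_yaml`), the job name `java-agent`, and the 30-second interval are assumptions chosen for illustration. The keys `job_name`, `scheme`, `metrics_path`, `scrape_interval`, and `static_configs` are standard prometheus.yml settings. The sketch does not cover credentials for endpoints protected by AM policies.

```python
from urllib.parse import urlsplit


def metrics_urls(app_base_url, base_endpoint="/metrics"):
    """Return the Common REST and Prometheus endpoint URLs for one application."""
    root = app_base_url.rstrip("/") + "/" + base_endpoint.strip("/")
    return {"crest": root + "/crest", "prometheus": root + "/prometheus"}


def scrape_job_yaml(prometheus_url, job_name="java-agent", interval="30s"):
    """Render a minimal scrape job (job name and interval are assumptions)."""
    parts = urlsplit(prometheus_url)
    return "\n".join([
        "scrape_configs:",
        f"  - job_name: '{job_name}'",
        f"    scheme: {parts.scheme}",
        f"    metrics_path: {parts.path}",
        f"    scrape_interval: {interval}",
        "    static_configs:",
        f"      - targets: ['{parts.netloc}']",
    ])


urls = metrics_urls("https://mydomain.example.com/myapp")
print(urls["crest"])
print(scrape_job_yaml(urls["prometheus"]))
```

The printed YAML is only a starting point; merge it into your existing prometheus.yml rather than replacing the file.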
Save Metrics to CSV Files
-
Set Export Monitoring Metrics to CSV, as follows:
-
true to configure the agent to write metric information to CSV files.
-
false to prevent the agent from writing metric information to CSV files.
-
Metric Types
Timer Fields
Common REST Fields
Field | Description |
---|---|
_id | Metric ID. |
_type | Metric type. |
count | Number of events recorded for this metric. |
total | Sum of the durations recorded for this metric. |
min | Minimum duration recorded for this metric. |
max | Maximum duration recorded for this metric. |
mean | Average duration recorded for this metric. |
stddev | Standard deviation of durations recorded for this metric. |
duration_units | Units used for measuring the durations in the metric. |
p50 | 50% of the durations recorded are at or below this value. |
p75 | 75% of the durations recorded are at or below this value. |
p95 | 95% of the durations recorded are at or below this value. |
p98 | 98% of the durations recorded are at or below this value. |
p99 | 99% of the durations recorded are at or below this value. |
p999 | 99.9% of the durations recorded are at or below this value. |
m1_rate | One-minute average rate. |
m5_rate | Five-minute average rate. |
m15_rate | Fifteen-minute average rate. |
mean_rate | Average rate. |
rate_units | Units used for measuring the rate of the metric. |
Duration-based values, such as min, max, and p50, are weighted towards newer data. By representing approximately the last five minutes of data, the timers make it easier to see recent changes in behavior, rather than a uniform average of recordings since the server was started.
The following is an example of the requests.granted.not-enforced
metric from
the Common REST endpoint:
{
"_id" : "requests.granted.not-enforced",
"_type" : "timer",
"count" : 486,
"total" : 80.0,
"min" : 0.0,
"max" : 1.0,
"mean" : 0.1905615495053855,
"stddev" : 0.39274399467782056,
"duration_units" : "milliseconds",
"p50" : 0.0,
"p75" : 0.0,
"p95" : 1.0,
"p98" : 1.0,
"p99" : 1.0,
"p999" : 1.0,
"m1_rate" : 0.1819109974890356,
"m5_rate" : 0.05433445522996721,
"m15_rate" : 0.03155662103953588,
"mean_rate" : 0.020858521722211427,
"rate_units" : "calls/second"
}
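The weighting toward newer data shows up in the example above: dividing total by count gives the uniform average since startup, which differs from the reported mean. The following Python sketch makes that comparison using a subset of the fields from the example. It assumes that total is expressed in duration_units and that mean is one of the duration-based values weighted toward recent data.

```python
import json

# Subset of the example metric shown above.
sample = """
{
  "_id": "requests.granted.not-enforced",
  "_type": "timer",
  "count": 486,
  "total": 80.0,
  "mean": 0.1905615495053855,
  "duration_units": "milliseconds"
}
"""

metric = json.loads(sample)
units = metric["duration_units"]

# Uniform average over every recording since the server started.
lifetime_mean = metric["total"] / metric["count"]

# Reported mean, assumed to be weighted toward roughly the last five minutes.
recent_mean = metric["mean"]

print(f"lifetime mean: {lifetime_mean:.4f} {units}")
print(f"recent mean:   {recent_mean:.4f} {units}")
print("recent requests slower than lifetime average:", recent_mean > lifetime_mean)
```

A reported mean above the lifetime average, as here, suggests that recent requests took longer than earlier ones.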
Prometheus Fields
The Prometheus endpoint does not provide rate-based statistics, as rates can be calculated from the time-series data.
Field | Description |
---|---|
|
Metric ID, and type. Note that the |
|
Number of events recorded. |
|
Sum of the durations recorded. |
|
50% of the durations are at or below this value. |
|
75% of the durations are at or below this value. |
|
95% of the durations are at or below this value. |
|
98% of the durations are at or below this value. |
|
99% of the durations are at or below this value. |
|
99.9% of the durations are at or below this value. |
Duration-based quantile values are weighted towards newer data. By representing approximately the last five minutes of data, the timers make it easier to see recent changes in behavior, rather than a uniform average of recordings since the server was started.
The following is an example of the
ja_requests{access=granted,decision=allowed-by-policy}
metric from the
Prometheus endpoint:
ja_requests_seconds{access="granted",decision="allowed-by-policy",quantile="0.5",} 0.013000000000000001
ja_requests_seconds{access="granted",decision="allowed-by-policy",quantile="0.75",} 0.022000000000000002
ja_requests_seconds{access="granted",decision="allowed-by-policy",quantile="0.95",} 0.022000000000000002
ja_requests_seconds{access="granted",decision="allowed-by-policy",quantile="0.98",} 0.022000000000000002
ja_requests_seconds{access="granted",decision="allowed-by-policy",quantile="0.99",} 0.022000000000000002
ja_requests_seconds{access="granted",decision="allowed-by-policy",quantile="0.999",} 1.1380000000000001
ja_requests_count{access="granted",decision="allowed-by-policy",} 7.0
ja_requests_seconds_total{access="granted",decision="allowed-by-policy",} 1.21
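To inspect this output without a Prometheus server, a small parser is enough. The following Python sketch reads a few of the lines shown above and derives the average request duration from the `_seconds_total` and `_count` samples. The regular expressions are a simplification that handles the line shapes in this example, including the trailing comma inside the braces. They do not implement the full Prometheus exposition format, for example escaped quotes in label values.

```python
import re

SAMPLE = '''\
ja_requests_seconds{access="granted",decision="allowed-by-policy",quantile="0.5",} 0.013000000000000001
ja_requests_seconds{access="granted",decision="allowed-by-policy",quantile="0.999",} 1.1380000000000001
ja_requests_count{access="granted",decision="allowed-by-policy",} 7.0
ja_requests_seconds_total{access="granted",decision="allowed-by-policy",} 1.21
'''

LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"')


def parse(text):
    """Parse simple exposition lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments such as "# TYPE"
        match = LINE.match(line)
        if match:
            labels = dict(LABEL.findall(match.group("labels") or ""))
            samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


samples = parse(SAMPLE)

# Samples without a quantile label are the count and the sum of durations.
values = {name: value for name, labels, value in samples if "quantile" not in labels}
average_seconds = values["ja_requests_seconds_total"] / values["ja_requests_count"]
median_seconds = next(v for _, lbls, v in samples if lbls.get("quantile") == "0.5")

print(f"average: {average_seconds:.4f}s, median: {median_seconds:.4f}s")
```

In this example the average is well above the median because a single slow request, visible in the 0.999 quantile, dominates the total.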
Gauge Fields
Metric for a numerical value that can increase or decrease. The value for a gauge is calculated when requested, and represents the state of the metric at that specific time.
Common REST Fields
Field | Description |
---|---|
_id | Metric ID. |
_type | Metric type. |
value | Current value of the metric. |
The following is an example of the jvm.used-memory
metric from the Common
REST endpoint:
{
"_id" : "jvm.used-memory",
"_type" : "gauge",
"value" : 2.13385216E9
}
Prometheus Fields
Field | Description |
---|---|
|
Metric ID, and type. Formatted as a comment. |
|
Current value. Large values may be represented in scientific E-notation. |
The following is an example of the ja_jvm_used_memory_bytes
metric from the
Prometheus endpoint:
# TYPE ja_jvm_used_memory_bytes gauge
ja_jvm_used_memory_bytes 1.418723328E9
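Values in E-notation parse as ordinary floating-point numbers. The following Python sketch converts the sample above to mebibytes. The helper name `bytes_to_mib` is an assumption for illustration.

```python
def bytes_to_mib(raw_value):
    """Convert a gauge sample given as text (possibly E-notation) to mebibytes."""
    return float(raw_value) / (1024 * 1024)


line = "ja_jvm_used_memory_bytes 1.418723328E9"
name, raw = line.split()

print(f"{name}: {int(float(raw))} bytes = {bytes_to_mib(raw):.1f} MiB")
```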
DistinctCounter
Metric providing an estimate of the number of unique values recorded.
For example, this could be used to estimate the number of unique users who have authenticated, or unique client IP addresses.
The DistinctCounter metric is calculated per instance of AM, and cannot be aggregated across multiple instances to get a site-wide view.
Common REST Fields
Field | Description |
---|---|
_id | Metric ID. |
_type | Metric type. Note that the |
value | Calculated estimate of the number of unique values recorded in the metric. |
The following is an example of the authentication.unique-uuid.success
metric
from the Common REST endpoint:
{
"_id" : "authentication.unique-uuid.success",
"_type" : "gauge",
"value" : 3.0
}
Prometheus Fields
Field | Description |
---|---|
|
Metric ID, and type. Note that the |
|
Calculated estimate of the number of unique values recorded in the metric. |
The following is an example of the ja_notenforced_ip_unmatched_cache_size
metric from the Prometheus endpoint:
# TYPE ja_notenforced_ip_unmatched_cache_size gauge
ja_notenforced_ip_unmatched_cache_size 3.0
Exposed Metrics
Java Agent exposes the monitoring metrics described in this section.
Audit Handler Metrics
Metric | Prometheus name | Description |
---|---|---|
|
|
Time taken to generate an audit object. (Timer) |
|
|
Time taken to audit outcomes, both locally to the agent and remotely in AM. (Timer) |
Labels:
<handler-type>
- am-delegate. Remote auditing performed by AM. (Prometheus: am_delegate)
- json. Local audit logging using JSON.
<outcome>
- success
- failure
Endpoint and REST SDK Metrics
Metric | Prometheus name | Description |
---|---|---|
|
|
Time taken to retrieve user session information from AM. (Timer) |
|
|
Time taken to retrieve the user profile information from AM. (Timer) |
|
|
Time taken to retrieve policy decisions from AM. (Timer) |
JSON Web Token (JWT) Metrics
Metric | Prometheus name | Description |
---|---|---|
|
|
|
Size of the JWT cache. (Gauge) |
|
|
The eviction count for the JWT cache. (Gauge) |
|
|
The load count for the JWT cache. (Gauge) |
|
|
The load time for the JWT cache, in milliseconds. (Gauge) |
|
|
The hit count for the JWT cache. (Gauge) |
|
|
JVM Metrics
To get Metric name used by Prometheus, prepend |
Name | Description |
---|---|
|
Number of processors available to the Java virtual machine. (Gauge) |
|
Number of classes loaded since the Java virtual machine started. (Gauge) |
|
Number of classes unloaded since the Java virtual machine started. (Gauge) |
|
Number of collections performed by the "parallel scavenge mark sweep" garbage collection algorithm. (Gauge) |
|
Approximate accumulated time taken by the "parallel scavenge mark sweep" garbage collection algorithm. (Gauge) |
|
Number of collections performed by the "parallel scavenge" garbage collection algorithm. (Gauge) |
|
Approximate accumulated time taken by the "parallel scavenge" garbage collection algorithm. (Gauge) |
|
Amount of heap memory that the Java virtual machine initially requested from the operating system. (Gauge) |
|
Maximum amount of heap memory that the Java virtual machine will attempt to use. (Gauge) |
|
Amount of heap memory that is committed for the Java virtual machine to use. (Gauge) |
|
Amount of heap memory used by the Java virtual machine. (Gauge) |
|
Amount of memory that the Java virtual machine initially requested from the operating system. (Gauge) |
|
Maximum amount of memory that the Java virtual machine will attempt to use. (Gauge) |
|
Amount of non-heap memory that the Java virtual machine initially requested from the operating system. (Gauge) |
|
Maximum amount of non-heap memory that the Java virtual machine will attempt to use. (Gauge) |
|
Amount of non-heap memory that is committed for the Java virtual machine to use. (Gauge) |
|
Amount of non-heap memory used by the Java virtual machine. (Gauge) |
|
Amount of "code cache" memory that the Java virtual machine initially requested from the operating system. (Gauge) |
|
Maximum amount of "code cache" memory that the Java virtual machine will attempt to use. (Gauge) |
|
Amount of "code cache" memory that is committed for the Java virtual machine to use. (Gauge) |
|
Amount of "code cache" memory used by the Java virtual machine. (Gauge) |
|
Amount of "compressed class space" memory that the Java virtual machine initially requested from the operating system. (Gauge) |
|
Maximum amount of "compressed class space" memory that the Java virtual machine will attempt to use. (Gauge) |
|
Amount of "compressed class space" memory that is committed for the Java virtual machine to use. (Gauge) |
|
Amount of "compressed class space" memory used by the Java virtual machine. (Gauge) |
|
Amount of "metaspace" memory that the Java virtual machine initially requested from the operating system. (Gauge) |
|
Maximum amount of "metaspace" memory that the Java virtual machine will attempt to use. (Gauge) |
|
Amount of "metaspace" memory that is committed for the Java virtual machine to use. (Gauge) |
|
Amount of "metaspace" memory used by the Java virtual machine. (Gauge) |
|
Amount of "parallel scavenge eden space" memory that the Java virtual machine initially requested from the operating system. (Gauge) |
|
Maximum amount of "parallel scavenge eden space" memory that the Java virtual machine will attempt to use. (Gauge) |
|
Amount of "parallel scavenge eden space" memory that is committed for the Java virtual machine to use. (Gauge) |
|
Amount of "parallel scavenge eden space" memory after the last time garbage collection recycled unused objects in this memory pool. (Gauge) |
|
Amount of "parallel scavenge eden space" memory used by the Java virtual machine. (Gauge) |
|
Amount of "parallel scavenge old generation" memory that the Java virtual machine initially requested from the operating system. (Gauge) |
|
Maximum amount of "parallel scavenge old generation" memory that the Java virtual machine will attempt to use. (Gauge) |
|
Amount of "parallel scavenge old generation" memory that is committed for the Java virtual machine to use. (Gauge) |
|
Amount of "parallel scavenge old generation" memory after the last time garbage collection recycled unused objects in this memory pool. (Gauge) |
|
Amount of "parallel scavenge old generation" memory used by the Java virtual machine. (Gauge) |
|
Amount of "parallel scavenge survivor space" memory that the Java virtual machine initially requested from the operating system. (Gauge) |
|
Maximum amount of "parallel scavenge survivor space" memory that the Java virtual machine will attempt to use. (Gauge) |
|
Amount of "parallel scavenge survivor space" memory that is committed for the Java virtual machine to use. (Gauge) |
|
Amount of "parallel scavenge survivor space" memory after the last time garbage collection recycled unused objects in this memory pool. (Gauge) |
|
Amount of "parallel scavenge survivor space" memory used by the Java virtual machine. (Gauge) |
|
Amount of memory that is committed for the Java virtual machine to use. (Gauge) |
|
Amount of memory used by the Java virtual machine. (Gauge) |
|
Number of threads in the BLOCKED state. (Gauge) |
|
Number of live threads including both daemon and non-daemon threads. (Gauge) |
|
Number of live daemon threads. (Gauge) |
|
Number of threads in the NEW state. (Gauge) |
|
Number of threads in the RUNNABLE state. (Gauge) |
|
Number of threads in the TERMINATED state. (Gauge) |
|
Number of threads in the TIMED_WAITING state. (Gauge) |
|
Number of threads in the WAITING state. (Gauge) |
Not Enforced Rule Metrics
Metric | Prometheus name | Description |
---|---|---|
|
|
Size of the not-enforced URI matched cache. (Gauge) |
|
|
Eviction count for the not-enforced URI matched cache. (Gauge) |
|
|
Load count for the not-enforced URI matched cache. (Gauge) |
|
|
Load time for the not-enforced URI matched cache, in milliseconds. (Gauge) |
|
|
Hit count for the not-enforced URI matched cache. (Gauge) |
|
|
Miss count for the not-enforced URI matched cache. (Gauge) |
|
|
Size of the not-enforced URI unmatched cache. (Gauge) |
|
|
Eviction count for the not-enforced URI unmatched cache. (Gauge) |
|
|
Load count for the not-enforced URI unmatched cache. (Gauge) |
|
|
Load time for the not-enforced URI unmatched cache, in milliseconds. (Gauge) |
|
|
Hit count for the not-enforced URI unmatched cache. (Gauge) |
|
|
Miss count for the not-enforced URI unmatched cache. (Gauge) |
|
|
Size of the not-enforced IP matched cache. (Gauge) |
|
|
Eviction count for the not-enforced IP matched cache. (Gauge) |
|
|
Load count for the not-enforced IP matched cache. (Gauge) |
|
|
Load time for the not-enforced IP matched cache, in milliseconds. (Gauge) |
|
|
Hit count for the not-enforced IP matched cache. (Gauge) |
|
|
Miss count for the not-enforced IP matched cache. (Gauge) |
|
|
Size of the not-enforced IP unmatched cache. (Gauge) |
|
|
Eviction count for the not-enforced IP unmatched cache. (Gauge) |
|
|
Load count for the not-enforced IP unmatched cache. (Gauge) |
|
|
Load time for the not-enforced IP unmatched cache, in milliseconds. (Gauge) |
|
|
Hit count for the not-enforced IP unmatched cache. (Gauge) |
|
|
Miss count for the not-enforced IP unmatched cache. (Gauge) |
|
|
Size of the not-enforced compound matched cache. (Gauge) |
|
|
Eviction count for the not-enforced compound matched cache. (Gauge) |
|
|
Load count for the not-enforced compound matched cache. (Gauge) |
|
|
Load time for the not-enforced compound matched cache, in milliseconds. (Gauge) |
|
|
Hit count for the not-enforced compound matched cache. (Gauge) |
|
|
Miss count for the not-enforced compound matched cache. (Gauge) |
|
|
Size of the not-enforced compound unmatched cache. (Gauge) |
|
|
Eviction count for the not-enforced compound unmatched cache. (Gauge) |
|
|
Load count for the not-enforced compound unmatched cache. (Gauge) |
|
|
Load time for the not-enforced compound unmatched cache, in milliseconds. (Gauge) |
|
|
Hit count for the not-enforced compound unmatched cache. (Gauge) |
|
|
Miss count for the not-enforced compound unmatched cache. (Gauge) |
Policy Decision Metrics
Metric | Prometheus name | Description |
---|---|---|
|
|
Size of the policy decision cache. (Gauge) |
|
|
Eviction count for the policy decision cache. (Gauge) |
|
|
Load count for the policy decision cache. (Gauge) |
|
|
Load time for the policy decision cache, in milliseconds. (Gauge) |
|
|
Hit count for the policy decision cache. (Gauge) |
|
|
Miss count for the policy decision cache. (Gauge) |
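The hit and miss counts are easier to interpret together as a hit ratio. The following Python sketch shows the calculation, assuming both gauges are read at the same time and count lookups over the same period. The function name `hit_ratio` and the sample readings are hypothetical, not values from a real agent. The same calculation applies to the other caches in this section that report hit and miss counts.

```python
def hit_ratio(hit_count, miss_count):
    """Return hits / (hits + misses), or None when the cache has not been queried."""
    lookups = hit_count + miss_count
    if lookups == 0:
        return None
    return hit_count / lookups


# Hypothetical gauge readings, used only to illustrate the calculation.
ratio = hit_ratio(hit_count=90.0, miss_count=10.0)
print(f"policy decision cache hit ratio: {ratio:.0%}")
```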
POST Data Preservation Metrics
Metric | Prometheus name | Description |
---|---|---|
|
|
Size of the POST data preservation cache. (Gauge) |
|
|
Eviction count for the POST data preservation cache. (Gauge) |
|
|
Load count for the POST data preservation cache. (Gauge) |
|
|
Load time for the POST data preservation cache, in milliseconds. (Gauge) |
|
|
Hit count for the POST data preservation cache. (Gauge) |
|
|
Miss count for the POST data preservation cache. (Gauge) |
Request Metrics
Metric | Prometheus name | Description |
---|---|---|
|
|
Rate of granted/denied requests and their decision. (Timer) |
Labels:
<access>
- granted
- denied
<decision>
- not-enforced: Request matched a not enforced rule.
- no-valid-token: Request did not have a valid SSO token or an OpenID Connect JWT.
- allowed-by-policy: Request matched a policy, which allowed access.
- denied-by-policy: Request matched a policy, which denied access.
- am-unavailable: The AM instance was not reachable.
- agent-exception: An internal error (exception) occurred within the agent.
Session Information Metrics
Metric | Prometheus name | Description |
---|---|---|
|
|
Size of the session information cache. (Gauge) |
|
|
Eviction count for the session information cache. (Gauge) |
|
|
Load count for the session information cache. (Gauge) |
|
|
Load time for the session information cache, in milliseconds. (Gauge) |
|
|
Hit count for the session information cache. (Gauge) |
|
|
Miss count for the session information cache. (Gauge) |
SSO Token to JWT Exchange Metrics
Metric | Prometheus name | Description |
---|---|---|
|
|
Size of the SSO token exchange cache. (Gauge) |
|
|
Eviction count for the SSO token exchange cache. (Gauge) |
|
|
Load count for the SSO token exchange cache. (Gauge) |
|
|
Load time for the SSO token exchange, in milliseconds. (Gauge) |
|
|
Hit count for the SSO token exchange cache. (Gauge) |
|
|
Miss count for the SSO token exchange cache. (Gauge) |
Websocket Metrics
Metric | Prometheus name | Description |
---|---|---|
|
|
Number of milliseconds since anything was received over the websocket, for example a ping or a notification. (Gauge) |
|
|
Number of milliseconds since anything was sent over the websocket. (Gauge) |
|
|
Number of configuration change notifications received. Note that some may be ignored if the realm or agent name are not applicable. (DistinctCounter) |
|
|
Number of configuration change notifications processed, that were not ignored. (DistinctCounter) |
|
|
Number of policy change notifications received. Note that some may be ignored if the realm or agent name are not applicable. (DistinctCounter) |
|
|
Number of policy change notifications processed, that were not ignored. (DistinctCounter) |
|
|
Number of session logout notifications received. Note that some may be ignored if the realm or agent name are not applicable. (DistinctCounter) |
|
|
Number of session logout notifications processed, that were not ignored. (DistinctCounter) |
|
|
Ping/pong round trip time. (Timer) |