Tune performance

Tune deployments in the following steps:

Consider the issues that impact the performance of a deployment. See Define requirements and constraints.
Tune and test the downstream servers and applications:
1. Tune the downstream web container and JVM to achieve performance targets.
2. Test downstream servers and applications in a pre-production environment, under the expected load, and with common use cases.
Increase hardware resources as required, and then re-tune the deployment.

Define requirements and constraints

When you consider performance requirements, bear in mind the following points:

The capabilities and limitations of downstream services or applications on your performance goals.
The increase in response time due to the extra network hop and processing, when PingGateway is inserted as a proxy in front of a service or application.
The constraint that downstream limitations and response times place on PingGateway.

Service level objectives

A service level objective (SLO) is a target that you can measure quantitatively. Where possible, define SLOs to set out what performance your users expect. Even if your first version of an SLO consists of guesses, it is a first step towards creating a clear set of measurable goals for your performance tuning.

When you define SLOs, bear in mind that PingGateway can depend on external resources that can impact performance, such as AM’s response time for token validation, policy evaluation, and so on. Consider measuring remote interactions to take dependencies into account.

Consider defining SLOs for the following metrics of a route:

Average response time for a route.

The response time is the time to process and forward a request, and then receive, process, and forward the response from the protected application.

The average response time can range from less than a millisecond, for a low latency connection on the same network, to however long it takes your network to deliver the response.
Distribution of response times for a route.

Because applications set timeouts based on worst case scenarios, the distribution of response times can be more important than the average response time.
Peak throughput.

The maximum rate at which requests can be processed at peak times. Because applications are limited by their peak throughput, this SLO is arguably more important than an SLO for average throughput.
Average throughput.

The average rate at which requests are processed.

Metrics are returned at the monitoring endpoints. For information about monitoring endpoints, refer to Monitoring. For examples of how to set up monitoring in PingGateway, refer to Monitor services.

Available resources

With your defined SLOs, inventory the server, networks, storage, people, and other resources. Estimate whether it is possible to meet the requirements, with the resources at hand.

Benchmarks

Before you can improve the performance of your deployment, establish an accurate benchmark of its current performance. Consider creating a deployment scenario that you can control, measure, and reproduce.

For information about running Ping Identity Platform benchmark tests, refer to the ForgeOps documentation on benchmarks. Adapt the scenarios as necessary for your PingGateway deployment.

Tune PingGateway

Consider the following recommendations for improving performance, throughput, and response times. Adjust the tuning to your system workload and available resources, and then test suggestions before rolling them out into production.

Logs

Log messages in PingGateway and third-party dependencies are recorded using the Logback implementation of the Simple Logging Facade for Java (SLF4J) API. By default, logging level is INFO.

To reduce the number of log messages, consider setting the logging level to error. For information, refer to Manage logs.

Buffering message content

PingGateway creates a TemporaryStorage object to buffer content during processing. For information about this object and its default values, refer to TemporaryStorage.

Messages bigger than the buffer size are written to disk, consuming I/O resources and reducing throughput.

The default size of the buffer is 64 KB. If the number of concurrent messages in your application is generally bigger than the default, consider allocating more heap memory or changing the initial or maximum size of the buffer.

To change the values, add a TemporaryStorage object named TemporaryStorage, and use non-default values.

Caches

When caches are enabled, PingGateway can reuse cached information without making additional or repeated queries for the information. This gives the advantage of higher system performance, but the disadvantage of lower trust in results.

When caches are disabled, PingGateway must query a data store each time it needs data. This gives the disadvantage of lower system performance, and the advantage of higher trust in results.

All caches provide similar configuration properties for timeout, defining the duration to cache entries. When the timeout is lower, the cache is evicted more frequently, and consequently, the performance is lower but the trust in results is higher.

When you configure caches in PingGateway, make choices to balance your required performance with your security needs.

Learn more about PingGateway caches in Caches.

WebSocket notifications

By default, PingGateway receives WebSocket notifications from AM for the following events:

When a user logs out of AM, or when the AM session is modified, closed, or times out. PingGateway can use WebSocket notifications to evict entries from the session cache. For an example of setting up session cache eviction, refer to Session cache eviction.
When AM creates, deletes, or changes a policy decision. PingGateway can use WebSocket notifications to evict entries from the policy cache. For an example of setting up policy cache eviction, refer to Using WebSocket notifications to evict the policy cache.
When PingGateway automatically renews a WebSocket connection to AM. To configure WebSocket renewal, refer to the notifications.renewalDelay property of AmService.

If the WebSocket connection is lost, during that time the WebSocket is not connected, PingGateway behaves as follows:

Responds to session service calls with an empty SessionInfo result.

When the SingleSignOn filter recieves an empty SessionInfo call, it concludes that the user is not logged in, and triggers a login redirect.
Responds to policy evaluations with a deny policy result.

By default, PingGateway waits for five seconds before trying to re-establish the WebSocket connection. If it can’t re-establish the connection, it keeps trying every five seconds.

To disable WebSocket notifications, or change any of the parameters, configure the notifications property in AmService. For information, refer to AmService.

Tune the ClientHandler/ReverseProxyHandler

The ClientHandler/ReverseProxyHandler communicates as a client to a downstream third-party service or protected application. The performance of the communication is determined by the following parameters:

The number of available connections to the downstream service or application.
The connection timeout, which is the maximum time to connect to a server-side socket before timing out and abandoning the connection attempt.
The socket timeout, which is the maximum time a request can take before a response is received after which the request is deemed to have failed.

Configure PingGateway in conjunction with PingGateway’s first-class Vert.x configuration, and the vertx property of admin.json. For more information, refer to AdminHttpApplication (admin.json).

Vert.x options for tuning
Object	Vert.x Option	Description
PingGateway (first-class)	`gatewayUnits`	The number of deployed Vert.x Verticles. This setting is effectively the number of cores that PingGateway operates across, or in other words, the number of available threads. Each instance operates on the same port on its own event-loop thread. Default: Number of available cores. (This is the optimal value.)
root.vertx	`eventLoopPoolSize`	The size of the pool available to service Verticles for event-loop threads. To guarantee that a single thread handles all I/O events for a single request or response, PingGateway deploys a Verticle onto each event loop. Configure `eventLoopPoolSize` to be greater than or equal to `gatewayUnits`. Default: 2 * number of available cores. For more information, refer to Reactor and Multi-Reactor.
root.connectors.<connector>.vertx	`acceptBacklog`	The maximum number of connections to queue before refusing requests. Default: 1024
	`sendBufferSize`	The TCP connection send buffer size. Set this property according to the available RAM and required number of concurrent connections. Default: Use the Java TCP send buffer size default settings that Java inherits from the operating system.
	`receiveBufferSize`	The TCP receive buffer size. Set this property according to the available RAM and required number of concurrent connections. Default: Use the Java TCP receive buffer size default settings that Java inherits from the operating system.
	`maxHeaderSize`	Set this property when HTTP headers manage large values (such as JWT). Default: 8 KB (8,192 bytes)

Vert.x options for troubleshooting performance
Object	Vert.x Option	Description
root.vertx	`blockedThreadCheckInterval` and `blockedThreadCheckIntervalUnit`	The interval at which Vert.x checks for blocked threads and logs a warning. Default: 1 second
	`maxEventLoopExecuteTime` and `maxEventLoopExecuteTimeUnit`	The maximum execution time before Vert.x logs a warning. Default: 2 seconds
	`warningExceptionTime` and `warningExceptionTimeUnit`	The threshold at which warning logs are accompanied by a stack trace to identify cause. Default: 5 seconds
	`logActivity`	Whether to log network activity. Default: `false`

Set the maximum number of file descriptors and processes per user

Each PingGateway instance in your environment should have access to at least 65,536 file descriptors to handle multiple client connections.

Ensure that every PingGateway instance is allocated enough file descriptors. For example, use the ulimit -n command to check the limits for a particular user:

$ su - iguser
$ ulimit -n

It may also be necessary to increase the number of processes available to the user running the PingGateway processes.

For example, use the ulimit -u command to check the process limits for a user:

$ su - iguser
$ ulimit -u

Before increasing the file descriptors for the PingGateway instance, ensure that the total amount of file descriptors configured for the operating system is higher than 65,536.

If the PingGateway instance uses all of the file descriptors, the operating system will run out of file descriptors. This may prevent other services from working, including those required for logging in the system.

Refer to your operating system’s documentation for instructions on how to display and increase the file descriptors or process limits for the operating system and for a given user.

Tune PingGateway’s JVM

Start tuning the JVM with default values, and monitor the execution, paying particular attention to memory consumption, and GC collection time and frequency. Incrementally adjust the configuration, and retest to find the best settings for memory and garbage collection.

Make sure there is enough memory to accommodate the peak number of required connections, and make sure timeouts in PingGateway and its container support latency in downstream servers and applications.

PingGateway makes low memory demands, and consumes mostly YoungGen memory. However, using caches, or proxying large resources, increases the consumption of OldGen memory. For information about how to optimize JVM memory, refer to the Oracle documentation.

Consider these points when choosing a JVM:

Find out which version of the JVM is available. More recent JVMs usually contain performance improvements, especially for garbage collection.
Choose a 64-bit JVM if you need to maximize available memory.

Consider these points when choosing a GC:

Test GCs in realistic scenarios, and load them into a pre-production environment.
Choose a GC that is adapted to your requirements and limitations. Consider comparing the Garbage-First Collector (G1) and Parallel GC in typical business use cases.

The G1 is targeted for multi-processor environments with large memories. It provides good overall performance without the need for additional options. The G1 is designed to reduce garbage collection, through low-GC latency. It is largely self-tuning, with an adaptive optimization algorithm.

The Parallel GC aims to improve garbage collection by following a high-throughput strategy, but it requires more full garbage collections.

Learn more in Best practice for JVM Tuning with G1 GC.

PingGateway 2024.6