
High CPU and unresponsive servers when DS (All versions) is running behind an F5 Load Balancer

Last updated Aug 10, 2020

The purpose of this article is to help you troubleshoot high CPU usage in DS, or CPU and/or memory usage that keeps increasing until the server becomes unresponsive. This issue only occurs when DS is behind an F5® load balancer.


Symptoms

You will notice one or more of the following symptoms:

  • High CPU and/or memory usage.
  • Unresponsive servers.
  • An increasing number of connections.
  • Running out of Java heap space.
  • A restart restores service for a short period only.
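
To check the "increasing number of connections" symptom, you could count established connections on a DS port at regular intervals. The following is a minimal sketch in Python, not a ForgeRock tool. It assumes a Linux host, that you pass it the text output of `ss -tn`, and that the columns are State, Recv-Q, Send-Q, Local Address:Port and Peer Address:Port. The function name `count_established` and the sample addresses are hypothetical; port 8989 is taken from the example logs in this article.

```python
from collections import Counter


def count_established(ss_lines, local_port):
    """Count ESTAB connections on local_port, grouped by peer IP address.

    ss_lines: lines of text as printed by `ss -tn` (assumed column layout:
    State Recv-Q Send-Q Local-Address:Port Peer-Address:Port).
    """
    peers = Counter()
    for line in ss_lines:
        fields = line.split()
        # Skip the header line, short lines and non-established sockets.
        if len(fields) < 5 or fields[0] != "ESTAB":
            continue
        local, peer = fields[3], fields[4]
        # The port is whatever follows the last colon (also works for [::1]:8989).
        if local.rsplit(":", 1)[-1] != str(local_port):
            continue
        peers[peer.rsplit(":", 1)[0]] += 1
    return peers


if __name__ == "__main__":
    # Hypothetical sample output; replace with real `ss -tn` output.
    sample = """\
State      Recv-Q Send-Q Local Address:Port   Peer Address:Port
ESTAB      0      0      203.0.113.10:8989    203.0.113.0:51234
ESTAB      0      0      203.0.113.10:8989    203.0.113.0:51236
ESTAB      0      0      203.0.113.10:4444    203.0.113.7:40000
TIME-WAIT  0      0      203.0.113.10:8989    203.0.113.0:51000
""".splitlines()
    for peer, count in count_established(sample, 8989).items():
        print(peer, count)
```

Running this periodically and comparing the totals gives a rough view of whether connections are accumulating. Grouping by peer address may also show whether the growth comes from the load balancer address.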

DS logs and commands

You might see errors similar to the following when this happens:

[07/Jul/2020:14:14:56 +0200] category=SYNC severity=INFORMATION msgID=105 msg=Replication server accepted a connection from ds1.example.com/203.0.113.0:9989 to local address 203.0.113.10:8989 but the SSL handshake failed. This is probably benign, but may indicate a transient network outage or a misconfigured client application connecting to this replication server. The error was: Remote host closed connection during handshake

[07/Jul/2020:14:15:07 +0200] category=SYNC severity=WARNING msgID=97 msg=Directory server DS(1234) is closing its connection to replication server RS(5678) at ds1.example.com/203.0.113.0:9989 for domain "dc=example,dc=com" because it could not detect a heart beat

...

[07/Jul/2020:14:15:19 +0200] category=SYNC severity=ERROR msgID=178 msg=Directory server 1234 was attempting to connect to replication server 5678 but has disconnected in handshake phase. Error: SocketTimeoutException(Read timed out)
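
To see how often these messages occur, you could tally SYNC entries by severity and msgID. The following is a minimal sketch in Python that assumes every log line follows the layout of the examples above (a bracketed timestamp, then category=, severity=, msgID= and msg=). The function name `summarize_sync_messages` is hypothetical, and the sample lines are shortened versions of the examples above.

```python
import re
from collections import Counter

# Layout assumed from the example log lines in this article:
# [timestamp] category=... severity=... msgID=... msg=...
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+"
    r"category=(?P<category>\S+)\s+"
    r"severity=(?P<severity>\S+)\s+"
    r"msgID=(?P<msg_id>\d+)\s+"
    r"msg=(?P<msg>.*)$"
)


def summarize_sync_messages(lines):
    """Return a Counter keyed by (severity, msgID) for SYNC log entries."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        # Lines that do not match the assumed layout are ignored.
        if match and match.group("category") == "SYNC":
            key = (match.group("severity"), int(match.group("msg_id")))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "[07/Jul/2020:14:14:56 +0200] category=SYNC severity=INFORMATION "
        "msgID=105 msg=Replication server accepted a connection",
        "[07/Jul/2020:14:15:07 +0200] category=SYNC severity=WARNING "
        "msgID=97 msg=Directory server DS(1234) is closing its connection",
        "[07/Jul/2020:14:15:19 +0200] category=SYNC severity=ERROR "
        "msgID=178 msg=Directory server 1234 was attempting to connect",
    ]
    for (severity, msg_id), count in sorted(summarize_sync_messages(sample).items()):
        print(severity, msg_id, count)
```

A steadily growing count for msgID 105, 97 or 178 would be consistent with the symptoms described in this article, although these messages can also have other causes.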

If you try to run a command against the server, for example, the status command, you will see a response similar to the following:

Connect Error: The connection attempt to server
ds1.example.com/203.0.113.0:4444 has failed because the connection
timeout period of 30000 ms was exceeded
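
If you want a quick check that does not depend on the DS command-line tools, you could test whether the administration port accepts a TCP connection within a timeout. The following is a minimal sketch in Python. The function name `can_connect` is hypothetical, and the host and port are taken from the example error above. It only checks that the TCP connection opens; it does not perform a TLS handshake or an LDAP operation, so a True result does not prove that the server is responsive.

```python
import socket


def can_connect(host, port, timeout_seconds=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers timeouts, refused connections and name resolution failures.
        return False


if __name__ == "__main__":
    # Host and port from the example error message in this article.
    print(can_connect("ds1.example.com", 4444, timeout_seconds=5.0))
```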

Heap dumps

If you capture a heap dump, you will see that a grizzly.Buffer object is consuming nearly all the memory, for example:

One instance of "org.glassfish.grizzly.memory.BuffersBuffer" loaded by "sun.misc.Launcher$AppClassLoader @ 0x70f9bd440" occupies 2,914,703,452 (98.59%) bytes. The memory is accumulated in one instance of "org.glassfish.grizzly.Buffer[]" loaded by "sun.misc.Launcher$AppClassLoader @ 0x70f9bd440".

You can capture a heap dump as described in How do I collect JVM data for troubleshooting DS/OpenDJ (All versions)?. Alternatively, a heap dump is captured when you run the Support Extract tool; see How do I use the Support Extract tool in DS/OpenDJ (All versions) to capture troubleshooting data?.

Recent Changes

Enabled the F5 OneConnect feature.

Causes

The F5 OneConnect feature is designed to optimize HTTP/HTTPS traffic. When it is enabled for other protocols with long-lived connections such as LDAP, you will see unexpected behavior and performance issues such as the symptoms listed above.

Solution

You can resolve this issue by switching off the F5 OneConnect feature. See Configuration Guide › On Load Balancers for further recommendations on using a load balancer.

Note

Other external factors can also result in similar symptoms.

Additionally, there is a known issue related to the grizzly.Buffer object: OPENDJ-6681 (Build up of Grizzly TCPNIOConnection objects lead to a FilterChain Exception), which is fixed in DS 6.5.3 and later.

See Also

How do I enable Garbage Collector (GC) Logging for DS/OpenDJ (All versions)?

AskF5 - Overview of the OneConnect profile

Related Training

N/A

Related Issue Tracker IDs

N/A



Copyright © 2020 ForgeRock, all rights reserved.