High CPU in DS 5, 5.5, 5.5.1 and OpenDJ 3.5.2, 3.5.3 when RS is writing to another RS
The purpose of this article is to provide assistance if you observe high CPU in DS/OpenDJ when one Replication Server (RS) is writing to another RS.
Archived
This article has been archived and is no longer maintained by ForgeRock.
Symptoms
You observe high CPU utilization on one or more replication servers. Restarting the DS/OpenDJ service and/or server does not resolve it.
A thread similar to the following appears in the stack trace (jstack) when this happens:
CPU = 99.9
"Replication server RS(1000) writing to Replication server RS(2000) for domain "dc=example,dc=com" at ds1.example.com/203.0.113.0:8989" #111 prio=5 os_prio=0 tid=0x0007fbaf01c71800 nid=0x617d runnable [0x00007f6f7fbf9000]
   java.lang.Thread.State: RUNNABLE
        at java.io.RandomAccessFile.length(Native Method)
        at java.io.RandomAccessFile.skipBytes(RandomAccessFile.java:468)
        at org.opends.server.replication.server.changelog.file.BlockLogReader.readNextRecord(BlockLogReader.java:311)
        at org.opends.server.replication.server.changelog.file.BlockLogReader.readRecord(BlockLogReader.java:244)
        at org.opends.server.replication.server.changelog.file.BlockLogReader.searchClosestBlockStartToKey(BlockLogReader.java:399)
        at org.opends.server.replication.server.changelog.file.BlockLogReader.seekToRecord(BlockLogReader.java:158)
        at org.opends.server.replication.server.changelog.file.LogFile$LogFileCursor.positionTo(LogFile.java:634)
        at org.opends.server.replication.server.changelog.file.Log$InternalLogCursor.positionTo(Log.java:1286)
        at org.opends.server.replication.server.changelog.file.Log$AbortableLogCursor.positionTo(Log.java:1537)
        at org.opends.server.replication.server.changelog.file.FileReplicaDBCursor.nextWhenCursorIsExhaustedOrNotCorrectlyPositionned(FileReplicaDBCursor.java:117)
        at org.opends.server.replication.server.changelog.file.FileReplicaDBCursor.next(FileReplicaDBCursor.java:111)
        at org.opends.server.replication.server.changelog.file.ReplicaCursor.next(ReplicaCursor.java:123)
        at org.opends.server.replication.server.changelog.file.CompositeDBCursor.addCursor(CompositeDBCursor.java:170)
        at org.opends.server.replication.server.changelog.file.CompositeDBCursor.recycleExhaustedCursors(CompositeDBCursor.java:126)
        at org.opends.server.replication.server.changelog.file.CompositeDBCursor.next(CompositeDBCursor.java:107)
        at org.opends.server.replication.server.changelog.file.DomainDBCursor.next(DomainDBCursor.java:32)
You can collect a stack trace as shown in How do I collect JVM data for troubleshooting DS (All versions)?
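If you have several jstack dumps to check, a small script can pick out the threads that match this symptom. The following is a minimal sketch, not a ForgeRock tool: the function name `find_rs_writer_threads` is hypothetical, and it assumes the dump uses the standard jstack text layout (thread entries separated by blank lines, header line starting with the quoted thread name). It reports RUNNABLE "Replication server ... writing to Replication server ..." threads whose stack includes BlockLogReader, and converts the hexadecimal nid to decimal so that, on Linux, you can compare it with the thread IDs shown by `top -H`.

```python
import re

# Header line of a thread entry: quoted name, "#<n>", then attributes including nid=0x...
# The name itself may contain quotes (for example the domain DN), so match greedily.
HEADER = re.compile(r'^"(?P<name>.*)" #\d+ .*\bnid=(?P<nid>0x[0-9a-fA-F]+)')


def find_rs_writer_threads(dump):
    """Return RS-to-RS writer threads that are RUNNABLE inside BlockLogReader.

    `dump` is the full text of one jstack output (assumed standard layout).
    """
    hits = []
    # jstack separates thread entries with blank lines.
    for block in re.split(r"\n\s*\n", dump):
        lines = block.strip().splitlines()
        if not lines:
            continue
        match = HEADER.match(lines[0])
        if not match or "writing to Replication server" not in match.group("name"):
            continue
        body = "\n".join(lines[1:])
        if "RUNNABLE" in body and "BlockLogReader" in body:
            hits.append({
                "name": match.group("name"),
                "nid_hex": match.group("nid"),
                # Decimal form of the native thread ID, for comparison with top -H.
                "lwp": int(match.group("nid"), 16),
            })
    return hits
```

For example, `find_rs_writer_threads(open("jstack.txt").read())` returns one dictionary per matching thread, where `jstack.txt` is a placeholder for your own dump file. Seeing the same thread in several consecutive dumps is what suggests a CPU spin rather than a momentary read.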
In the changelogDb, you may notice DSID.server files for obsolete servers. Alternatively, you may have a large number of directory servers in your replication topology.
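To see which DSID.server entries do not correspond to a current server, you can compare the changelogDb contents against the server IDs you know are in use. The following is a read-only sketch under stated assumptions: the function name `obsolete_replica_entries` is hypothetical, the list of live server IDs is something you supply yourself, and it assumes each replica's data appears somewhere under the changelogDb directory as an entry named `<server ID>.server`. It only lists entries and never modifies or deletes anything, since cleaning up the changelogDb is part of the supervised workaround.

```python
from pathlib import Path


def obsolete_replica_entries(changelog_db, live_server_ids):
    """List <ID>.server entries under changelog_db whose ID is not in live_server_ids.

    changelog_db    -- path to the changelogDb directory (supplied by you)
    live_server_ids -- server IDs currently in the topology (supplied by you)
    """
    live = {str(server_id) for server_id in live_server_ids}
    suffix = ".server"
    obsolete = []
    # Search recursively so the exact nesting depth does not matter.
    for entry in sorted(Path(changelog_db).rglob("*" + suffix)):
        server_id = entry.name[: -len(suffix)]
        if server_id not in live:
            obsolete.append(entry)
    return obsolete
```

A call such as `obsolete_replica_entries("/path/to/ds/changelogDb", [1000, 2000])` returns the paths of any `.server` entries for IDs other than 1000 and 2000; the path and IDs here are placeholders. The output is useful evidence to attach to a support ticket.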
Recent Changes
Upgraded to, or installed DS 5, 5.5 or 5.5.1.
Upgraded to, or installed OpenDJ 3.5.2 or 3.5.3.
Repeatedly enabled and disabled replication (using dsreplication configure/unconfigure or dsreplication enable/disable depending on your version).
Causes
When you repeatedly enable and disable replication, data for old replica IDs (DSID.server) remains in the changelogDb. This obsolete DSID.server data, the sheer number of directory servers, or both mean that the changelogDb contains a lot of data. When one RS writes to another, it iterates through the changelogDb files, constantly opening and reading this data, which results in high CPU.
Solution
This issue can be resolved by upgrading to DS 5.5.2 or later; you can download this from BackStage.
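If you maintain an inventory of server versions, a quick check against the versions named in this article can flag which hosts to look at first. This is a minimal sketch: the names `AFFECTED`, `normalize` and `is_affected` are hypothetical, the version list is taken only from this article's title, and treating trailing `.0` components as insignificant (so that 5.0.0 is read as DS 5) is an assumption.

```python
# Versions listed in this article as affected.
AFFECTED = {
    "DS": {"5", "5.5", "5.5.1"},
    "OpenDJ": {"3.5.2", "3.5.3"},
}


def normalize(version):
    """Drop trailing '.0' components, e.g. '5.5.0' -> '5.5' (assumed equivalent)."""
    parts = version.strip().split(".")
    while len(parts) > 1 and parts[-1] == "0":
        parts.pop()
    return ".".join(parts)


def is_affected(product, version):
    """Return True if the product/version pair is one this article lists as affected."""
    return normalize(version) in AFFECTED.get(product, set())
```

For instance, `is_affected("DS", "5.5.1")` is true and `is_affected("DS", "5.5.2")` is false. A match only means the version is one of those listed here; confirm the symptom with a stack trace before concluding that a server is hitting this issue.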
Workaround
Please raise a ticket for assistance; the procedure to work around this issue should only be performed under support supervision.
See Also
How do I collect data for troubleshooting high CPU utilization on DS (All versions) servers?
How do I find which thread is consuming CPU in a Java process in DS (All versions)?
How do I migrate an existing DS+RS replication topology to a DS to RS topology in DS 5.x or 6.x?
Related Training
N/A
Related Issue Tracker IDs
OPENDJ-4598 (Replication Server cursoring through obsolete replica ID's causing high CPU spin)