
High CPU in DS 5, 5.5, 5.5.1 and OpenDJ 3.5.2, 3.5.3 when RS is writing to another RS

Last updated Oct 22, 2018

This article helps you diagnose and resolve high CPU utilization in DS/OpenDJ when one Replication Server (RS) is writing to another RS.


Symptoms

You observe high CPU utilization on one or more replication servers. Restarting the DS/OpenDJ service and/or server does not resolve it.

A thread similar to the following appears in the stack trace (jstack) when this happens, and it accounts for most of the CPU usage:

CPU = 99.9

"Replication server RS(1000) writing to Replication server RS(2000) for domain "dc=example,dc=com" at ds1.example.com/203.0.113.0:8989" #111 prio=5 os_prio=0 tid=0x0007fbaf01c71800 nid=0x617d runnable [0x00007f6f7fbf9000]

   java.lang.Thread.State: RUNNABLE
   at java.io.RandomAccessFile.length(Native Method)
   at java.io.RandomAccessFile.skipBytes(RandomAccessFile.java:468)
   at org.opends.server.replication.server.changelog.file.BlockLogReader.readNextRecord(BlockLogReader.java:311)
   at org.opends.server.replication.server.changelog.file.BlockLogReader.readRecord(BlockLogReader.java:244)
   at org.opends.server.replication.server.changelog.file.BlockLogReader.searchClosestBlockStartToKey(BlockLogReader.java:399)
   at org.opends.server.replication.server.changelog.file.BlockLogReader.seekToRecord(BlockLogReader.java:158)
   at org.opends.server.replication.server.changelog.file.LogFile$LogFileCursor.positionTo(LogFile.java:634)
   at org.opends.server.replication.server.changelog.file.Log$InternalLogCursor.positionTo(Log.java:1286)
   at org.opends.server.replication.server.changelog.file.Log$AbortableLogCursor.positionTo(Log.java:1537)
   at org.opends.server.replication.server.changelog.file.FileReplicaDBCursor.nextWhenCursorIsExhaustedOrNotCorrectlyPositionned(FileReplicaDBCursor.java:117)
   at org.opends.server.replication.server.changelog.file.FileReplicaDBCursor.next(FileReplicaDBCursor.java:111)
   at org.opends.server.replication.server.changelog.file.ReplicaCursor.next(ReplicaCursor.java:123)
   at org.opends.server.replication.server.changelog.file.CompositeDBCursor.addCursor(CompositeDBCursor.java:170)
   at org.opends.server.replication.server.changelog.file.CompositeDBCursor.recycleExhaustedCursors(CompositeDBCursor.java:126)
   at org.opends.server.replication.server.changelog.file.CompositeDBCursor.next(CompositeDBCursor.java:107)
   at org.opends.server.replication.server.changelog.file.DomainDBCursor.next(DomainDBCursor.java:32)

You can collect a stack trace as shown in How do I collect JVM data for troubleshooting DS/OpenDJ (All versions)?
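
Once you have a stack trace, a small script can help you spot threads that match the pattern above. The following is a minimal sketch, not a supported tool: the function name find_rs_writer_threads is hypothetical, and it assumes the dump uses the thread header layout shown in the example above. It reports each thread whose name contains "writing to Replication server" and whose stack includes a BlockLogReader frame, and converts the hexadecimal nid to decimal so you can compare it with the per-thread IDs that operating system tools report.

```python
import re

# Thread header as in the example above: "name" #111 ... nid=0x617d ...
# Assumption: the dump follows this layout; headers without "#N" are skipped.
HEADER = re.compile(r'^"(?P<name>.*)" #\d+ .*?nid=(?P<nid>0x[0-9a-fA-F]+)')


def find_rs_writer_threads(dump_text):
    """Return (thread name, native thread ID in decimal) for suspect threads."""
    matches = []
    # Each thread block starts on a line that begins with a double quote.
    for block in re.split(r'\n(?=")', dump_text):
        header = HEADER.match(block)
        if not header:
            continue
        name = header.group("name")
        # Only RS-to-RS writer threads that are busy reading the changelog.
        if "writing to Replication server" not in name:
            continue
        if "BlockLogReader" not in block:
            continue
        matches.append((name, int(header.group("nid"), 16)))
    return matches
```

For the example thread above, nid=0x617d corresponds to native thread ID 24957. You could call the function with the contents of a saved dump, for example find_rs_writer_threads(open("jstack.txt").read()), where jstack.txt is a placeholder file name.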

In the changelogDb, you may notice DSID.server files for obsolete servers (servers that are no longer part of the replication topology). Alternatively, your replication topology may contain a large number of directory servers.
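
To check for obsolete DSID.server entries, you can compare what is in the changelogDb with the server IDs you know are still in use. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the changelogDb path and the list of active server IDs are values you supply yourself, and the only naming convention relied on is the DSID.server pattern described above. The script is read-only; it reports entries and does not remove anything, since any cleanup should only be performed under support supervision.

```python
from pathlib import Path


def find_replica_entries(changelog_db):
    """Map each server ID to the '<ID>.server' entries found under changelog_db."""
    found = {}
    # rglob matches both files and directories named '<ID>.server'.
    for path in Path(changelog_db).rglob("*.server"):
        found.setdefault(path.stem, []).append(path)
    return found


def obsolete_ids(found, active_ids):
    """Return server IDs present on disk but absent from active_ids."""
    return sorted(set(found) - {str(i) for i in active_ids})
```

For example, obsolete_ids(find_replica_entries("/path/to/changelogDb"), [1000, 2000]) lists every server ID that has an entry on disk but is not one of the two IDs given. Both the path and the IDs here are placeholders.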

Recent Changes

Upgraded to, or installed DS 5, 5.5 or 5.5.1.

Upgraded to, or installed OpenDJ 3.5.2 or 3.5.3.

Repeatedly enabled and disabled replication (using dsreplication configure/unconfigure or dsreplication enable/disable depending on your version).

Causes

When you repeatedly enable and disable replication, old replica ID data (DSID.server) remains in the changelogDb. Because of this obsolete DSID.server data and/or the sheer number of directory servers, the changelogDb contains a large amount of data. When one RS writes to another, it iterates through the changelogDb files and repeatedly opens and reads this data, which results in high CPU.

Solution

This issue can be resolved by upgrading to DS 5.5.2 or later; you can download this from BackStage.

Workaround

Please raise a ticket for assistance; the procedure to work around this issue should only be performed under support supervision.

See Also

How do I troubleshoot high CPU utilization on DS/OpenDJ (All versions) servers?

How do I find which thread is consuming CPU in a Java process in DS/OpenDJ (All versions)?

How do I migrate an existing DS+RS replication topology to a DS to RS topology in DS/OpenDJ (All versions)?

Troubleshooting DS/OpenDJ

Related Training

N/A

Related Issue Tracker IDs

OPENDJ-4598 (Replication Server cursoring through obsolete replica ID's causing high CPU spin)



Copyright and Trademarks

Copyright © 2018 ForgeRock, all rights reserved.