Troubleshooting DS/OpenDJ

This book provides information on troubleshooting various issues in DS/OpenDJ, including collecting useful troubleshooting information such as heap dumps and stack traces.


Sending troubleshooting data to ForgeRock Support for analysis

The purpose of this article is to provide information on sending troubleshooting and diagnostic data that ForgeRock Support has requested for analysis using ticket attachments.

Sending data for analysis

You can attach troubleshooting and diagnostic data to tickets in BackStage. When using BackStage to access tickets, there are no limits to the attachment size; this is the recommended method of sending us troubleshooting and diagnostic data.

See Uploading Files for further information on uploading files to BackStage.

See Also

Best practice for raising a ticket with ForgeRock Support

Getting started with ForgeRock Support

AM/OpenAM

How do I collect all the data required for troubleshooting AM/OpenAM and Policy Agents (All versions)?

How do I record troubleshooting information in AM (All versions) and OpenAM 13.x? 

How do I create a HAR file for troubleshooting AM/OpenAM (All versions)?

How do I collect JVM data for troubleshooting AM/OpenAM (All versions)?

Troubleshooting AM/OpenAM and Policy Agents 

DS/OpenDJ

How do I use the Support Extract tool in DS/OpenDJ (All versions) to capture troubleshooting information?

How do I collect JVM data for troubleshooting DS/OpenDJ (All versions)?

Troubleshooting DS/OpenDJ

IDM/OpenIDM

How do I create a HAR file for troubleshooting IDM/OpenIDM (All versions)?

How do I collect JVM data for troubleshooting IDM/OpenIDM (All versions)?

Troubleshooting IDM/OpenIDM 

IG/OpenIG

How do I collect debug logs in IG/OpenIG (All versions) for troubleshooting?

How do I collect JVM data for troubleshooting IG/OpenIG (All versions)?

Troubleshooting IG/OpenIG 


How do I troubleshoot replication issues in DS/OpenDJ (All versions)?

The purpose of this article is to provide assistance for troubleshooting replication issues in DS/OpenDJ. It also provides other useful information about replication, including: background information, regular tasks you should perform to ensure replication is behaving as expected (to avoid future issues) and the recommended ways to recover and stop replication.

Warning

Do not compress, tamper with, or otherwise alter changelog database files directly unless specifically instructed to do so by a qualified ForgeRock technical support engineer. External changes to changelog database files can render them unusable by the server. By default, changelog database files are located under the /path/to/ds/changelogDb directory.

Overview

This article provides background information on replication and how it works, and also on how to monitor and troubleshoot replication issues. The following topics are covered:

Background information

DS/OpenDJ uses a DS and RS model for replication.

  • A DS is a Directory Server. DSs contain the backend databases and answer client requests.
  • A RS is a Replication Server. RSs contain a changelog and handle replication traffic with DSs and with other RSs; receiving, sending and storing only changes to directory data rather than directory data itself. A DS connects to an RS for replication purposes.

When installed without replication enabled, the DS/OpenDJ instance is a DS by default. If you enable replication, the instance spins up RS threads within the process. The instance then becomes a DS+RS.

A DS/OpenDJ replication topology can consist of the following types of instances:

  • DS+RS
  • Standalone DS
  • Standalone RS

Each DS and each RS has a unique ID assigned to it. The DS ID is used to track changes to the system and is included in the CSN. See Using the CSN to troubleshoot data consistency for further information on decoding the CSN and identifying the DS ID.

See Administration Guide › Understanding Directory Services › About Replication and Administration Guide › Managing Data Replication › Replication Per Suffix for further information on replication.

Note

You should be cautious about changing the hostname as this affects replication. If you need to change it, follow the procedure given in How do I change the hostname for DS/OpenDJ (All versions)? to ensure replication is correctly handled. Additionally, you should be consistent with your use of either FQDNs or IP addresses for hostnames as noted in Administration Guide › Managing Data Replication › Configuring Replication.

changelogDb

The changelog stores replication changes in the replication changes database (changelogDb directory). These changes are purged from the changelog according to the replication purge delay setting. You must ensure this is set appropriately to keep data long enough for replication and data recovery purposes; these changes are permanently lost once they are purged from the changelog. See How do I control how long replication changes are retained in DS/OpenDJ (All versions)? and FAQ: Backup and restore in DS/OpenDJ (Q. When does the replication purge take place?) for further information.
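As a minimal sketch, you can check and adjust the purge delay with dsconfig against the administration port; this assumes the default "Multimaster Synchronization" provider name and port 4444, and option names may vary slightly between versions:

$ ./dsconfig get-replication-server-prop --provider-name "Multimaster Synchronization" \
 --property replication-purge-delay --hostname ds1.example.com --port 4444 \
 --bindDN "cn=Directory Manager" --bindPassword password --trustAll --no-prompt

$ ./dsconfig set-replication-server-prop --provider-name "Multimaster Synchronization" \
 --set replication-purge-delay:3d --hostname ds1.example.com --port 4444 \
 --bindDN "cn=Directory Manager" --bindPassword password --trustAll --no-prompt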

Change Sequence Number (CSN)

The CSN is used to track changes via replication and the changelog. The CSN is an encoded value that represents the date and time, the DS's server ID and the change number for that timestamp. See Using the CSN to troubleshoot data consistency for further information on decoding the CSN.

Generation ID

The generation ID is a checksum of attributes from some of the entries and is used during replication to check that the suffix being updated is the same as the one offering the updates.
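Because all replicas of a suffix must share the same generation ID, a quick way to compare them is to read the generation-id attribute from the replication monitor entries on each server. This is only a sketch based on the monitoring search shown later in this article; adjust the port, credentials and suffix to your environment:

$ ./ldapsearch --port 1389 --bindDN "cn=Directory Manager" --bindPassword password \
 --baseDN "cn=Replication,cn=monitor" --searchScope sub "(domain-name=dc=example,dc=com)" generation-id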

Checking the status of replication

It is very important to check the status of replication on a regular basis so that you can be confident that all changes are being replayed successfully; in particular, it is worth checking whenever you notice that replication changes are being applied more slowly than expected.

You can use the dsreplication status command to give you an overall view of the replication topology. The output shows you information on a per server/suffix basis, including how many entries each server has as well as any missing changes. For example:

$ ./dsreplication status --adminUID admin --adminPassword password --hostname ds1.example.com --port 4444 --trustAll

Suffix DN         : Server               : Entries : Replication enabled : DS ID : RS ID : RS Port (1) : M.C. (2) : A.O.M.C. (3) : Security (4)
------------------:----------------------:---------:---------------------:-------:-------:-------------:----------:--------------:-------------
dc=example,dc=com : ds1.example.com:4444 : 2002    : true                : 10000 : 11500 : 8989        : 0        :              : true
dc=example,dc=com : ds2.example.com:5444 : 2002    : true                : 14057 : 12210 : 9989        : 0        :              : true

Compare the Entries values across the servers to ensure they match and check there are no missing changes (M.C.). If you see discrepancies, you should run dsreplication status a few times in short succession to see if the server catches up. If Entries values continue to be different and/or you still have missing changes, you know replication is out of sync.

The M.C. metric is deprecated in DS 6; you should monitor replication delay instead. See Administration Guide › Monitoring Replication Delay Over LDAP and Administration Guide › Monitoring Replication Delay Over HTTP for further information.

Note

It is possible that you will sometimes see missing changes; as long as they go away (that is, the count returns to 0), this is normal. dsreplication status searches each server's backend for the information it uses to calculate entries and missing changes. This is like trying to hit a moving target and can sometimes display missing changes when there aren't any, which is why it is important to monitor replication on a regular basis. There is also a known issue in OpenDJ 3.x that mistakenly reports missing changes: OPENDJ-3133 (dsreplication status reports M.C. (Missing Changes) when none exist.)

See Reference › Tools Reference › dsreplication for further information.

Monitoring replication

It is very important to monitor replication on a regular basis; you should also ensure your systems are not down for longer than the period set in your purge delay. If a server is down longer than the purge delay and the entry count has not changed, any LDAP MOD operations that took place during this time will not be seen as Missing Changes in the M.C. column of the dsreplication status output. Any ADD or DEL operations during this time will show a difference in the number of entries.  For example, if the purge delay is one day and Master 2 is down for two days, any modifications made to Master 1 will be purged after one day and never seen in the M.C. column in the replication status.

You can run an ldapsearch against the base DN "cn=Replication,cn=monitor" to monitor replication. You can either return all attributes or request specific ones as required. For example, the following command includes some useful attributes to check for replication purposes:

$ ./ldapsearch --port 1389 --bindDN "cn=Directory Manager" --bindPassword password --baseDN "cn=Replication,cn=monitor" --searchScope sub "(&(objectClass=*)(domain-name=dc=example,dc=com))" \* + last-status-change-date last-change lost-connections received-updates sent-updates replayed-updates pending-updates replayed-updates-ok resolved-modify-conflicts resolved-naming-conflicts unresolved-naming-conflicts missing-changes approximate-delay

Example of a partial response (you can see the full response in this file: ReplicationMonitoringResponse.txt):

dn: cn=Directory server DS(10000) ds1.example.com:54209,cn=dc_example_dc_com,cn=Replication,cn=monitor
objectClass: top
objectClass: ds-monitor-entry
objectClass: extensibleObject
domain-name: dc=example,dc=com
server-id: 10000
connected-to: ds1.example.com/198.51.100.0:8989
lost-connections: 0
received-updates: 0
sent-updates: 95
replayed-updates: 0
max-rcv-window: 100000
current-rcv-window: 100000
max-send-window: 100000
current-send-window: 99905
server-state: 0000015d13a119d23b470000005f Wed Jul 05 09:41:51 PDT 2017 1499272911314
ssl-encryption: false
generation-id: 19363628
pending-updates: 0
replayed-updates-ok: 0
resolved-modify-conflicts: 0
resolved-naming-conflicts: 0
unresolved-naming-conflicts: 0
remote-pending-changes-size: 0
dependent-changes-size: 0
changes-in-progress-size: 0
assured-sr-sent-updates: 0
assured-sr-acknowledged-updates: 0
assured-sr-not-acknowledged-updates: 0
assured-sr-timeout-updates: 0
assured-sr-wrong-status-updates: 0
assured-sr-replay-error-updates: 0
assured-sr-received-updates: 0
assured-sr-received-updates-acked: 0
assured-sr-received-updates-not-acked: 0
assured-sd-sent-updates: 0
assured-sd-acknowledged-updates: 0
assured-sd-timeout-updates: 0
last-status-change-date: 20170705163523.699Z
status: Normal
cn: Directory server DS(10000) ds1.example.com:54209
etag: 00000000345e8a3d
pwdPolicySubentry: cn=Default Password Policy,cn=Password Policies,cn=config
structuralObjectClass: ds-monitor-entry
subschemaSubentry: cn=schema
entryDN: cn=Directory server DS(10000) ds1.example.com:54209,cn=dc_example_dc_com,cn=Replication,cn=monitor
entryUUID: 39ed7ebc-6c79-38a7-ae4d-c84819eab121
hasSubordinates: false
numSubordinates: 0

Meanings of key attributes, such as the ones included in the above example:

last-status-change-date: The date of the last status change.
last-change: The CSN of the last change and its associated timestamp.
lost-connections: The number of times the connection has been lost between DSs and RSs. This value should roughly equate to the number of times you have stopped replication. If it is much greater, you should investigate to find out what is causing these connection losses.
received-updates: The number of replicated changes received by this server. This value should match replayed-updates.
sent-updates: The number of changes that have been sent by this server. You should see these being received and applied to other servers via their received-updates / replayed-updates attribute values. This value, in conjunction with the received-updates / replayed-updates values on the other servers, indicates how well replication is working.
replayed-updates: The number of replicated changes that have been applied to this server. This value should match received-updates.
pending-updates: The number of replicated changes waiting to be applied to this server.
replayed-updates-ok: The number of replicated changes that have been successfully applied to this server. This value should match received-updates.
resolved-modify-conflicts: The number of modify conflicts that have been resolved since the server was last started. Modify conflicts are always resolved automatically.
resolved-naming-conflicts: The number of naming conflicts that have been resolved since the server was last started. This value includes both automatically and manually resolved conflicts.
unresolved-naming-conflicts: The number of unresolved naming conflicts since the server was last started. This value should equal 0; otherwise, there are naming conflicts that you need to identify and resolve manually. Naming conflicts are identified by an additional entryuuid RDN in the DN, as demonstrated in the Identifying replication issues from the DS/OpenDJ log files section. This value does not decrease once a conflict has been manually resolved; an RFE exists for this: OPENDJ-251 (Provide count of unresolved replication naming conflicts as part of the Monitoring information).
missing-changes: The number of changes that have been sent by the other server but have not yet been applied to this server. This value should equal 0.
approximate-delay: The difference between the RS current time and the timestamp of the oldest update that has not yet been sent to the DS; this indicates replication latency. This value should equal 0.
numSubordinates: The number of entries below the baseDN.
Note

Some of these attributes (for example, missing-changes) can indicate issues with replication; however, there can be an innocent reason for these discrepancies since replication is not real time and replicated changes can be delayed because of things such as network speed or latency, CPU usage and overall load on the individual instances. That is why it is important to monitor replication on a regular basis to understand what is normal for your environment.
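If you do not yet feed these attributes into a monitoring system, a simple scheduled search is a reasonable starting point. The following loop is only a sketch (the five-minute interval and the attribute selection are illustrative); in production you would normally collect these values with your monitoring tool instead:

# Poll key replication attributes every 5 minutes (illustrative interval).
while true; do
  ./ldapsearch --port 1389 --bindDN "cn=Directory Manager" --bindPassword password \
    --baseDN "cn=Replication,cn=monitor" --searchScope sub \
    "(domain-name=dc=example,dc=com)" \
    missing-changes approximate-delay unresolved-naming-conflicts lost-connections
  sleep 300
done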

Using the CSN to troubleshoot data consistency

Some attributes, such as ds-sync-state and ds-sync-hist, give the CSN so you can determine the replication state. The CSN consists of the following information:

  • The first 16 digits are a timestamp.
  • The next 4 digits make up the replica-id (DS ID).
  • The last 8 digits are the sequence numbers that identify the change.

For example, consider the ds-sync-state value 00000155a8bd5aa3271000000002.

Decoding this number using the decodecsn tool gives the following output:

CSN 00000155a8bd5aa3271000000002
 -> ts=00000155a8bd5aa3 (1467414829731) Fri Jul  1 2016 17:13:49.731
 id=2710 (10000)
 no=00000002 (2)

This gives you the timestamp of the change, the ID of the DS server where the change was made (10000) and the sequence number. Comparing this DS ID to the information output from dsreplication status identifies this DS as Master 1.
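If the decodecsn tool is not to hand, you can split out the same fields with standard shell arithmetic. This is only a sketch and assumes a bash shell and GNU date for the timestamp conversion (the timestamp is in milliseconds, so it is divided by 1000):

$ csn=00000155a8bd5aa3271000000002
$ date -u -d @$(( 0x${csn:0:16} / 1000 ))
Fri Jul  1 23:13:49 UTC 2016
$ printf 'id=%d no=%d\n' $(( 0x${csn:16:4} )) $(( 0x${csn:20:8} ))
id=10000 no=2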

Innocent replication example

The following ldapsearch example shows an innocent replication issue where there are multiple ds-sync-state attributes:

$ ./ldapsearch --port 51389 --bindDN "cn=Directory Manager" --bindPassword password --searchScope base --baseDN dc=openam,dc=forgerock,dc=org "(objectClass=*)" \* ds-sync-state +
    
dn: dc=openam,dc=forgerock,dc=org
objectClass: top
objectClass: domain
dc: openam
ds-sync-state: 0000015d190d9ab64d710000003a
ds-sync-state: 0000015d18b72dd8550b0000000a

The presence of multiple ds-sync-state attributes indicates that you have (or have had) many different replicas in the lifetime of your replicated directory service.

ds-sync-hist example

The following ldapsearch example shows that the attributes sn, cn, postalAddress and givenname were changed to Doe, John Doe, John Doe$01251 Chestnut Street$Panama City, DE  50369 and John, respectively:

$ ./ldapsearch --port 1389 --bindDN "cn=Directory Manager" --bindPassword password --searchScope sub --baseDN dc=example,dc=com "(uid=user.0)" \* ds-sync-hist +

dn: uid=user.0,ou=People,dc=example,dc=com
objectClass: top
objectClass: inetOrgPerson
objectClass: organizationalPerson
objectClass: person
mail: user.0@maildomain.net
initials: ASA
homePhone: +1 225 216 5900
pager: +1 779 041 6341
givenName: John
employeeNumber: 0
telephoneNumber: +1 685 622 6202
mobile: +1 010 154 3228
sn: Doe
cn: John Doe
userPassword: {SSHA}f+6nCXygJSBwS9G3VDAOXNDRvI+YXI3CYswvug==
description: This is the description for Aaccf Amar.
street: 01251 Chestnut Street
st: DE
postalAddress: Aaccf Amar$01251 Chestnut Street$Panama City, DE  50369
uid: user.0
l: Panama City
postalCode: 50369
ds-sync-hist: sn:0000015d13a119d23b470000005f:repl:Doe
ds-sync-hist: cn:0000015d13a119d23b470000005f:repl:John Doe
ds-sync-hist: postaladdress:0000015d13a119d23b470000005f:repl:John Doe$01251 Chestnut Street$Panama City, DE  50369
ds-sync-hist: modifiersName:0000015d13a119d23b470000005f:repl:cn=Directory Manager,cn=Root DNs,cn=config
ds-sync-hist: modifyTimestamp:0000015d13a119d23b470000005f:repl:20170705164151Z
ds-sync-hist: givenname:0000015d13a119d23b470000005f:repl:John
modifyTimestamp: 20170705164151Z
modifiersName: cn=Directory Manager,cn=Root DNs,cn=config
entryUUID: 0d3ce3bf-4107-3b34-9e5a-fa71deb8b504
pwdPolicySubentry: cn=Default Password Policy,cn=Password Policies,cn=config
subschemaSubentry: cn=schema
hasSubordinates: false
numSubordinates: 0
etag: 000000001d86d123
structuralObjectClass: inetOrgPerson
entryDN: uid=user.0,ou=People,dc=example,dc=com

Data consistency example

Recall that each CSN represents a single change, be it an ADD, DELETE or MODIFY; the CSN therefore acts as the consistency ID for that change. Since dsreplication determines status based on the deltas between these CSNs, we can extrapolate that if all servers have all changes (CSNs), the data can be deemed consistent between the instances. If an instance is missing a change, you can assume the entries on the replicas have diverged.

The following worked example demonstrates using the CSN to determine data consistency:

  1. Let's take a two-master replication topology with 2000 entries. Since the following dsreplication status is taken just after instance setup, data creation and initialization, the data is known to be consistent; at this point, the backends are the same and there are no changes yet, that is, the changelogDb has 0 changes:
    $ ./dsreplication status --adminUID admin --adminPassword password --hostname ds1.example.com --port 4444 --trustAll
    
    Suffix DN         : Server               : Entries : Replication enabled : DS ID : RS ID : RS Port (1) : M.C. (2) : A.O.M.C. (3) : Security (4)
    ------------------:----------------------:---------:---------------------:-------:-------:-------------:----------:--------------:-------------
    dc=example,dc=com : ds1.example.com:4444 : 2000    : true                : 14409 : 5070  : 8989        : 0        :              : true
    dc=example,dc=com : ds1.example.com:5444 : 2000    : true                : 26696 : 3946  : 9989        : 0        :              : true
    
  2. Make a change in the form of an ADD (adding John Doe). The CSN for this change is 00000155c6188be0384900000001 as seen in the changelog (both masters' changelogs now have the same change - an ADD):
    dn: changeNumber=1,cn=changelog
    objectClass: top
    objectClass: changeLogEntry
    changeNumber: 1
    changeTime: 20160707160225Z
    changeType: add
    targetDN: uid=jdoe,ou=People,dc=example,dc=com
    changes:: b2JqZWN0Q2xhc3M6IG9yZ2FuaXphdGlvbmFsUGVyc29uCm9iamVjdENsYXNzOiB0b3AKb2JqZWN0Q2xhc3M6IHBlcnNvbgpvYmplY3RDbGFzczogaW5ldE9yZ1BlcnNvbgp1aWQ6IGpkb2UKZ2l2ZW5OYW1lOiBKb2huCnNuOiBEb2UKY246IEpvaG4gRG9lCnVzZXJQYXNzd29yZDoge1NTSEF9WmJTcnJDL05BMHEwUFBzQmRPaVdRZTRaV3FQTDQ5Nll2RmR2NVE9PQplbnRyeVVVSUQ6IGY0MTRmZmVkLTVlZDAtNDUzNy1iMDU5LTU5YzUyMTc5MmNkMApjcmVhdGVUaW1lc3RhbXA6IDIwMTYwNzA3MTYwMjI1Wgpwd2RDaGFuZ2VkVGltZTogMjAxNjA3MDcxNjAyMjUuMzc2WgpjcmVhdG9yc05hbWU6IGNuPURpcmVjdG9yeSBNYW5hZ2VyLGNuPVJvb3QgRE5zLGNuPWNvbmZpZw==
    subschemaSubentry: cn=schema
    numSubordinates: 0
    hasSubordinates: false
    entryDN: changeNumber=1,cn=changelog
    replicationCSN: 00000155c6188be0384900000001
    replicaIdentifier: 14409
    changeInitiatorsName: cn=Directory Manager,cn=Root DNs,cn=config
    targetEntryUUID: f414ffed-5ed0-4537-b059-59c521792cd0
    changeLogCookie: dc=example,dc=com:00000155c6188be0384900000001;
    
  3. Check the replication status again:
    $ ./dsreplication status --adminUID admin --adminPassword password --hostname ds1.example.com --port 4444 --trustAll
    
    Suffix DN         : Server               : Entries : Replication enabled : DS ID : RS ID : RS Port (1) : M.C. (2) : A.O.M.C. (3) : Security (4)
    ------------------:----------------------:---------:---------------------:-------:-------:-------------:----------:--------------:-------------
    dc=example,dc=com : ds1.example.com:4444 : 2001    : true                : 14409 : 5070  : 8989        : 0        :              : true
    dc=example,dc=com : ds1.example.com:5444 : 2001    : true                : 26696 : 3946  : 9989        : 0        :              : true
    
    Since the dsreplication command uses the CSN as its basis for replication and data consistency, we now have proof that:
    • The data element representing the ADD is on both servers.
    • Because the data element is on both servers and the Entries counts in the replication status match, we know that the entries in the backend are consistent.
    If the CSN (change) was not played to the other server, the servers' CSNs would not match and therefore the data would not be consistent. This would be seen as a difference in the entry count displayed by dsreplication status.
  4. Make a simple MODIFY to John's entry (add a description). The CSN for this change is 00000155c619858e384900000002:
    dn: changeNumber=2,cn=changelog
    objectClass: top
    objectClass: changeLogEntry
    changeNumber: 2
    changeTime: 20160707160329Z
    changeType: modify
    targetDN: uid=jdoe,ou=People,dc=example,dc=com
    changes:: YWRkOiBkZXNjcmlwdGlvbgpkZXNjcmlwdGlvbjogVGhpcyBpcyBKb2huJ3MgRGVzY3JpcHRpb24KLQpyZXBsYWNlOiBtb2RpZmllcnNOYW1lCm1vZGlmaWVyc05hbWU6IGNuPURpcmVjdG9yeSBNYW5hZ2VyLGNuPVJvb3QgRE5zLGNuPWNvbmZpZwotCnJlcGxhY2U6IG1vZGlmeVRpbWVzdGFtcAptb2RpZnlUaW1lc3RhbXA6IDIwMTYwNzA3MTYwMzI5Wgot
    subschemaSubentry: cn=schema
    numSubordinates: 0
    hasSubordinates: false
    entryDN: changeNumber=2,cn=changelog
    replicationCSN: 00000155c619858e384900000002
    replicaIdentifier: 14409
    changeInitiatorsName: cn=Directory Manager,cn=Root DNs,cn=config
    targetEntryUUID: f414ffed-5ed0-4537-b059-59c521792cd0
    changeLogCookie: dc=example,dc=com:00000155c619858e384900000002;
    
  5. Check the replication status again and observe there are no Missing Changes:
    $ ./dsreplication status --adminUID admin --adminPassword password --hostname ds1.example.com --port 4444 --trustAll
    Thu Jul  7 10:26:12 MDT 2016
    Suffix DN         : Server               : Entries : Replication enabled : DS ID : RS ID : RS Port (1) : M.C. (2) : A.O.M.C. (3) : Security (4)
    ------------------:----------------------:---------:---------------------:-------:-------:-------------:----------:--------------:-------------
    dc=example,dc=com : ds1.example.com:4444 : 2001    : true                : 14409 : 5070  : 8989        : 0        :              : true
    dc=example,dc=com : ds1.example.com:5444 : 2001    : true                : 26696 : 3946  : 9989        : 0        :              : true
    
    Applying the same criteria as above, all CSNs have been replayed from Master 1 to Master 2, so again we have proof that the data is consistent between Master 1 and Master 2.

Identifying replication issues from the DS/OpenDJ log files

The following table shows error messages you may see in your logs along with what they mean and possible resolutions:

Error Meaning / resolution
dn="entryuuid=bfbbd0fd-53ba-451f-93a1-2f446f4de18+uid=user1,dc=example,dc=com"

This entry indicates a naming conflict (that is, you have an additional entryuuid RDN in the DN), which can happen when changes are applied to two servers at the same time meaning replication then creates a duplicate entry.

It can also occur as a result of the following known issue: OPENDJ-3343 (Invalid Conflict resolution on Add sequence when Parent & Child are added on different replica).

This can be resolved by locating the duplicate entries on both servers and then deleting the entry that was modified first. DS/OpenDJ understands that you are deleting a conflicting entry and cleans up after itself.

category=SYNC severity=MILD_ERROR msgID=14876739 msg=Could not replay operation AddOperation(connID=-1, opID=72, dn=uid=user1,ou=People,dc=example,dc=com) with ChangeNumber 00000148a9d35134620200000002 error Unwilling to Perform There is not enough space on the disk for the database to perform the write operation

The server has low disk space, which prevents write operations from taking place.

This can be resolved by increasing or freeing up disk space. Once space is available, replication will resume with the next set of changes, although you will be missing the changes indicated in this error as they will have been skipped.

You should avoid running out of disk space by utilizing the disk space monitoring tools detailed in Administration Guide › Managing Directory Data › Setting Disk Space Thresholds For Database Backends.

category=SYNC severity=SEVERE_WARNING msgID=14811232 msg=Directory server DS(42134) has connected to replication server RS(42706) for domain "cn=admin data" at ds1.example.com/198.51.100.0:8080, but the generation IDs do not match, indicating that a full re-initialization is required. The local (DS) generation ID is 36215293 and the remote (RS) generation ID is 172193
The generation ID contained in the restored data is not the same as the one in the current replication topology. See Generation IDs do not match error after restoring a DS/OpenDJ (All versions) replica for further information on resolving this.
category=SYNC severity=INFORMATION msgID=14680180 msg=Late monitor data received for domain "cn=schema" from replication server RS(1797), and will be ignored
category=SYNC severity=SEVERE_WARNING msgID=14811242 msg=Timed out waiting for monitor data for the domain "dc=example,dc=com" from replication server RS(1797)

The other server is not responding quickly enough with monitoring information or there is a heavy load on the server.

This can be caused by an underlying network issue but is typically related to high levels of Garbage Collection (GC) and can be resolved by tuning your JVM. See Best practice for JVM Tuning for further information.

category=SYNC severity=SEVERE_ERROR msgID=14841194 msg=Replication server caught exception while listening for client connections Read timed out
This entry indicates network issues or there is a heavy load on the server.
category=SYNC severity=NOTICE msgID=14837116 msg=New replication connection from 198.51.100.0 started with unexpected message StopMsg and is being closed
The RS exits silently in a misconfigured network. This is a known issue, which is fixed in DS 5: OPENDJ-3309 (Replication server connection listener thread exits silently).
category=SYNC severity=ERROR msgID=org.opends.messages.replication.178 msg=Directory server 8764 was attempting to connect to replication server 1245 but has disconnected in handshake phase
category=SYNC severity=SEVERE_ERROR msgID=14942387 msg=Replication server 1797 was attempting to connect to replication server ds1.example.com/198.51.100.0:8989 but has disconnected in handshake phase

The DS cannot connect to the RS. See Directory server 1 was attempting to connect to replication server 2 but has disconnected in handshake phase error in DS 5 and OpenDJ 2.6.x, 3.0, 3.5, 3.5.1 for further information on resolving this.

The underlying issue is: OPENDJ-1135 (DS sometimes fail to connect to RS after server restart), which is fixed in OpenDJ 3.5.2.

The inclusion of the IP address in this message can suggest that the hostname cannot be resolved to the IP address and that, in many cases, the server is trying to connect to itself. DS/OpenDJ can be a DS+RS, so depending on your topology, the DS may be connecting to an RS on the same instance/system. If you believe the issue is related to the IP address, you can resolve this by ensuring each server can connect to the others' replication ports via the hostnames used, and that the hostnames resolve to IP addresses physically present on the servers. You should also ensure that you consistently use either FQDNs or IP addresses for hostnames.

category=SYNC severity=ERROR msgID=org.opends.messages.replication.11 msg=The replication server failed to start because the database /path/to/ds/changelogDb could not be read : Could not get or create replica DB for baseDN 'dc=example,dc=com', serverId '10566', generationId '72390'

The server fails to compute the newest record after being restarted. See Replication server fails to start after starting an OpenDJ 3 instance for further information on resolving this.

The underlying issue is:  OPENDJ-2969 (changelogDb could not be read on OpenDJ instance startup), which is fixed in OpenDJ 3.5.

Identifying replication issues in the embedded DS/OpenDJ

The following error in the AM/OpenAM Configuration log indicates replication is failing in the embedded DS/OpenDJ:

ERROR: EmbeddedOpenDS:syncReplication:cmd failed

OpenAM 13 and 12.x

In OpenAM 13 and earlier, you must ensure that the hostname and port details used for the embedded DS/OpenDJ during installation exactly match the details you use when re-enabling replication (including the case used for the hostname); otherwise, replication will fail if you make any subsequent changes to the embedded DS/OpenDJ configuration. See OPENAM-8254 (Uppercase characters in server URL hostnames break embedded replication) and OPENAM-11263 (Improve Embedded DJ logging to pick up replication configuration errors.) for further information. A mismatch in these connection details when re-enabling replication triggers AM/OpenAM to run its clean-up process, which removes the replication servers on startup. At this point, replication threads are shut down and replication fails.

Recovering replication

You can quickly recover replication by restoring a backup, or by initializing from a known good node. However, you should ensure the backup you restore does not itself contain a corrupted database, as restoring it will reintroduce the corruption. See the following articles for further information on recovering replication in different situations:

Stopping replication

Warning

Do not allow modifications on the DS server when replication is disabled, as no record of such changes is kept, and the changes cause replication to diverge.

The way in which you stop replication varies depending on which version you are using as follows:

DS 5 and later

The following dsreplication commands are used to stop replication; you should ensure you use the correct command for your use case as follows:

  • dsreplication suspend: this command is used to temporarily stop replication, for example, to do maintenance etc.
  • dsreplication unconfigure: this command is used to permanently stop replication and completely removes the replication configuration information from the server.
  • dsreplication unconfigure --unconfigureAll: this command is used to fully remove the local server's replication configuration from itself and all other servers in the topology.

Using the wrong command can cause issues with replication.
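For example, to temporarily stop and later resume replication on a server, you can use commands along the following lines (a sketch only, assuming the administration connector on port 4444):

$ ./dsreplication suspend --adminUID admin --adminPassword password \
 --hostname ds1.example.com --port 4444 --trustAll --no-prompt

$ ./dsreplication resume --adminUID admin --adminPassword password \
 --hostname ds1.example.com --port 4444 --trustAll --no-prompt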

See Reference › Tools Reference › dsreplication and Administration Guide › Managing Data Replication › Stopping Replication for further information and examples.

OpenDJ 3.5.x and earlier

The dsconfig and dsreplication disable commands are used to stop replication, but you should ensure you use the correct command for your use case as follows:

  • dsconfig set-synchronization-provider-prop: this command is used to temporarily stop replication, for example, to do maintenance etc.
  • dsreplication disable --disableAll: this command is used to fully remove the local server's replication configuration from itself and all other servers in the topology.

Using the wrong command can cause issues with replication.
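For example, to temporarily disable and later re-enable replication with dsconfig (a sketch only, assuming the default "Multimaster Synchronization" provider name and the administration connector on port 4444):

$ ./dsconfig set-synchronization-provider-prop --provider-name "Multimaster Synchronization" \
 --set enabled:false --hostname ds1.example.com --port 4444 \
 --bindDN "cn=Directory Manager" --bindPassword password --trustAll --no-prompt

Re-run the same command with --set enabled:true to resume replication.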

See OpenDJ Administration Guide > Stopping Replication for further information and examples.

See Also

FAQ: Monitoring DS/OpenDJ

How do I use cn=monitor entry in DS/OpenDJ (All versions) for monitoring?

FAQ: Backup and restore in DS/OpenDJ

Replication in DS/OpenDJ

Administration Guide › Understanding Directory Services › About Replication 

Administration Guide › Managing Data Replication › Configuring Replication

Administration Guide › Managing Data Replication

Reference › Tools Reference › dsreplication 

Related Training

ForgeRock Directory Services Core Concepts 

Related Issue Tracker IDs

OPENAM-11263 (Improve Embedded DJ logging to pick up replication configuration errors.) 

OPENAM-8254 (Uppercase characters in server URL hostnames break embedded replication) 

OPENDJ-3337 (dsreplication status on a DS, shows a DS+RS missing after the DS+RS is disabled/enabled.)

OPENDJ-3343 (Invalid Conflict resolution on Add sequence when Parent & Child are added on different replica)

OPENDJ-3309 (Replication server connection listener thread exits silently)

OPENDJ-3133 (dsreplication status reports M.C. (Missing Changes) when none exist.)

OPENDJ-2969 (changelogDb could not be read on OpenDJ instance startup)

OPENDJ-1577 (Server should stop all DB-related activity when disk full is detected)

OPENDJ-1135 (DS sometimes fail to connect to RS after server restart)

OPENDJ-251 (Provide count of unresolved replication naming conflicts as part of the Monitoring information)

OPENDJ-49 (Replication replay does not take into consideration the server/backend's writability mode.)


How do I find replication conflicts in DS/OpenDJ (All versions)?

The purpose of this article is to provide information on searching for replication conflicts in DS/OpenDJ so that you can fix them. A common error associated with replication conflicts is "Plug-in org.forgerock.openam.idrepo.ldap.DJLDAPv3Repo encountered a ldap exception. ldap errorcode=95", which indicates multiple matching entries.

Background information

Replication conflicts occur when incompatible changes are applied concurrently to multiple read-write replicas. There are two types of conflict possible:

  • Modify conflicts - these involve concurrent modifications to the same entry.
  • Naming conflicts - these involve other operations that affect the DN of the entry.

Replication can resolve modify conflicts and most naming conflicts without intervention. However, the following types of naming conflicts cannot be resolved during replication and must be fixed manually:

  • Different entries that share the same DN are added concurrently on multiple replicas.
  • An entry on one replica is moved (renamed) to use the same DN as a new entry concurrently added on another replica.
  • A parent entry is deleted on one replica while a child entry is added or renamed concurrently on another replica.

This is a brief summary of the information documented in the Administration Guide › Resolving Replication Conflicts.

Multiple matching entries

The following error is commonly seen when there are multiple matching entries:

Plug-in org.forgerock.openam.idrepo.ldap.DJLDAPv3Repo encountered a ldap exception. ldap errorcode=95

This error can occur in a variety of scenarios, including (but not limited to) the following common examples:

Finding replication conflicts

You can use the logs to identify all types of replication issues; this is covered in detail in How do I troubleshoot replication issues in DS/OpenDJ (All versions)? along with information on monitoring replication to identify issues quickly. However, if you know you have naming conflicts, you can run an ldapsearch command on ds-sync-conflict to identify the specific entries.

Note

You must run the ldapsearch command on each node since replication conflict entries can differ across nodes. Alternatively, you can resolve the replication conflicts on one node and re-initialize the remaining nodes.

Example ldapsearch command:

$ ./ldapsearch --bindDN "cn=Directory Manager" --bindPassword password --hostname ds1.example.com --port 1389 --trustAll --baseDN "dc=example,dc=com" "(ds-sync-conflict=*)" ds-sync-conflict > replication-conflict-entries.out

The naming conflicts written to the file would look similar to the output below (where the addition of entryuuid=[string]+ signifies a naming conflict):

entryuuid=bfbbd0fd-42f4-4d54-b0b2-69b32233cec9+cn=jdoe,ou=group,ou=employees,dc=example,dc=com
entryuuid=c49df422-5f96-ea2b-85a2-d921eb0c3309+uid=user1,dc=example,dc=com
entryuuid=eb8ad149-aa61-ea2b-8b0f-935663bcd8ed+uid=user74,dc=example,dc=com
...

You should resolve the naming conflicts as described in the documentation: Administration Guide › Resolving Replication Conflicts.
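Resolving a naming conflict typically ends with deleting or renaming the unwanted conflict entry by its full conflict DN. The following is only a sketch of the delete step, reusing one of the example DNs above; you should follow the documented procedure to decide which entry to keep before deleting anything:

$ ./ldapdelete --hostname ds1.example.com --port 1389 --bindDN "cn=Directory Manager" --bindPassword password \
 "entryuuid=c49df422-5f96-ea2b-85a2-d921eb0c3309+uid=user1,dc=example,dc=com"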

See Also

How do I use the Access log to troubleshoot DS/OpenDJ (All versions)?

Troubleshooting DS/OpenDJ

LDAP Schema Reference › ds-sync-conflict

Related Training

N/A

Related Issue Tracker IDs

OPENDJ-3343 (Invalid Conflict resolution on Add sequence when Parent & Child are added on different replica)


How do I perform a heartbeat check against DS/OpenDJ (All versions)?

The purpose of this article is to provide information on performing a heartbeat check against DS/OpenDJ. A heartbeat check allows you to perform a simple health check on the server to ensure it is up and running.

Overview

If your load balancer or application is capable of using a heartbeat connection to check whether DS/OpenDJ is online and responding, it is key that it uses a proper LDAP connection.

A proper heartbeat check should progress as follows to fully verify the status of the DS/OpenDJ server:

CONNECT -> BIND -> SEARCH -> UNBIND -> DISCONNECT

or

CONNECT -> BIND -> UNBIND -> DISCONNECT
Caution

It is important to disconnect again; otherwise, the heartbeat check can remain open and consume server resources.

Performing a heartbeat check

You can use a command such as the following to connect, bind, issue a base (scope) level search on the baseDN of "", unbind and finally disconnect:

$ ./ldapsearch --port 4444 --bindDN "cn=Directory Manager" --useSsl --trustAll --baseDN "" --searchScope base "(objectClass=*)" 1.1
dn:

The results of the search operation show "dn:" only. This is expected because the requested attribute "1.1" means return no attributes, so only the distinguished name is returned. In this case, you are searching the rootDSE, whose DN is the null (empty) DN, which is why the dn: line is empty.

Example entries from the access logs for a successful heartbeat check are shown below, depending on your DS/OpenDJ version:

  • DS 5 and later (JSON log format):
    {"eventName":"DJ-LDAP","client":{"ip":"203.0.113.0","port":52597},"server":{"ip":"203.0.113.0","port":1389},"request":{"protocol":"LDAP","operation":"CONNECT","connId":2},"transactionId":"0","response":{"status":"SUCCESSFUL","statusCode":"0","elapsedTime":0,"elapsedTimeUnits":"MILLISECONDS"},"timestamp":"2018-03-15T16:44:49.400Z","_id":"5009191b-a09a-4c7f-84cd-e240b2810a67-1"}
    {"eventName":"DJ-LDAP","client":{"ip":"203.0.113.0","port":52597},"server":{"ip":"203.0.113.0","port":1389},"request":{"protocol":"LDAP","operation":"BIND","connId":2,"msgId":1,"version":"3","dn":"cn=Directory Manager","authType":"SIMPLE"},"transactionId":"5009191b-a09a-4c7f-84cd-e240b2810a67-2","response":{"status":"SUCCESSFUL","statusCode":"0","elapsedTime":3,"elapsedTimeUnits":"MILLISECONDS"},"userId":"cn=Directory Manager,cn=Root DNs,cn=config","timestamp":"2018-0315T16:44:49.687Z","_id":"5009191b-a09a-4c7f-84cd-e240b2810a67-4"}
    {"eventName":"DJ-LDAP","client":{"ip":"203.0.113.0","port":52597},"server":{"ip":"203.0.113.0","port":1389},"request":{"protocol":"LDAP","operation":"SEARCH","connId":2,"msgId":2,"dn":"","scope":"base","filter":"(objectClass=*)","attrs":["1.1"]},"transactionId":"5009191b-a09a-4c7f-84cd-e240b2810a67-5","response":{"status":"SUCCESSFUL","statusCode":"0","elapsedTime":1,"elapsedTimeUnits":"MILLISECONDS","nentries":1},"timestamp":"2018-03-15T16:44:49.717Z","_id":"5009191b-a09a-4c7f-84cd-e240b2810a67-7"}
    {"eventName":"DJ-LDAP","client":{"ip":"203.0.113.0","port":52597},"server":{"ip":"203.0.113.0","port":1389},"request":{"protocol":"LDAP","operation":"UNBIND","connId":2,"msgId":3},"transactionId":"5009191b-a09a-4c7f-84cd-e240b2810a67-8","timestamp":"2018-03-15T16:44:49.729Z","_id":"5009191b-a09a-4c7f-84cd-e240b2810a67-10"}
    {"eventName":"DJ-LDAP","client":{"ip":"203.0.113.0","port":52597},"server":{"ip":"203.0.113.0","port":1389},"request":{"protocol":"LDAP","operation":"DISCONNECT","connId":2},"transactionId":"0","response":{"status":"SUCCESSFUL","statusCode":"0","elapsedTime":0,"elapsedTimeUnits":"MILLISECONDS","reason":"Client Unbind"},"timestamp":"2018-03-15T16:44:49.734Z","_id":"5009191b-a09a-4c7f-84cd-e240b2810a67-12"}
    
  • Pre-DS 5 (file-based log format):
    [15/Mar/2018:16:44:49 -0600] CONNECT conn=2 from=203.0.113.0:52597 to=203.0.113.0:1389 protocol=LDAP 
    [15/Mar/2018:16:44:49 -0600] BIND REQ conn=2 op=0 msgID=1 version=3 type=SIMPLE dn="cn=Directory Manager" 
    [15/Mar/2018:16:44:49 -0600] BIND RES conn=2 op=0 msgID=1 result=0 authDN="cn=Directory Manager,cn=Root DNs,cn=config" etime=3 
    [15/Mar/2018:16:44:49 -0600] SEARCH REQ conn=2 op=1 msgID=2 base="" scope=baseObject filter="(objectClass=*)" attrs="1.1" 
    [15/Mar/2018:16:44:49 -0600] SEARCH RES conn=2 op=1 msgID=2 result=0 nentries=1 etime=1 
    [15/Mar/2018:16:44:49 -0600] UNBIND REQ conn=2 op=2 msgID=3 
    [15/Mar/2018:16:44:49 -0600] DISCONNECT conn=2 reason="Client Unbind"
    

Alternatively, if your application can only perform a connect and a bind, this is also viable:

  • DS 5 and later (JSON log format):
    {"eventName":"DJ-LDAP","client":{"ip":"203.0.113.0","port":52597},"server":{"ip":"203.0.113.0","port":1389},"request":{"protocol":"LDAP","operation":"CONNECT","connId":2},"transactionId":"0","response":{"status":"SUCCESSFUL","statusCode":"0","elapsedTime":0,"elapsedTimeUnits":"MILLISECONDS"},"timestamp":"2018-03-15T16:44:49.400Z","_id":"5009191b-a09a-4c7f-84cd-e240b2810a67-1"}
    {"eventName":"DJ-LDAP","client":{"ip":"203.0.113.0","port":52597},"server":{"ip":"203.0.113.0","port":1389},"request":{"protocol":"LDAP","operation":"BIND","connId":2,"msgId":1,"version":"3","dn":"cn=Heartbeat Admins":"SUCCESSFUL","statusCode":"0","elapsedTime":3,"elapsedTimeUnits":"MILLISECONDS"},"userId":"cn=Heartbeat Admin,cn=Root DNs,cn=config","timestamp":"2018-0315T16:44:49.687Z","_id":"5009191b-a09a-4c7f-84cd-e240b2810a67-4"}
    {"eventName":"DJ-LDAP","client":{"ip":"203.0.113.0","port":52597},"server":{"ip":"203.0.113.0","port":1389},"request":{"protocol":"LDAP","operation":"UNBIND","connId":2,"msgId":3},"transactionId":"5009191b-a09a-4c7f-84cd-e240b2810a67-8","timestamp":"2018-03-15T16:44:49.729Z","_id":"5009191b-a09a-4c7f-84cd-e240b2810a67-10"}
    {"eventName":"DJ-LDAP","client":{"ip":"203.0.113.0","port":52597},"server":{"ip":"203.0.113.0","port":1389},"request":{"protocol":"LDAP","operation":"DISCONNECT","connId":2},"transactionId":"0","response":{"status":"SUCCESSFUL","statusCode":"0","elapsedTime":0,"elapsedTimeUnits":"MILLISECONDS","reason":"Client Unbind"},"timestamp":"2018-03-15T16:44:49.734Z","_id":"5009191b-a09a-4c7f-84cd-e240b2810a67-12"}
    
  • Pre-DS 5 (file-based log format):
    [15/Mar/2018:16:44:49 -0600] CONNECT conn=2 from=203.0.113.0:52597 to=203.0.113.0:1389 protocol=LDAP 
    [15/Mar/2018:16:44:49 -0600] BIND REQ conn=2 op=0 msgID=1 version=3 type=SIMPLE dn="cn=Heartbeat Admin" 
    [15/Mar/2018:16:44:49 -0600] BIND RES conn=2 op=0 msgID=1 result=0 authDN="cn=Heartbeat Admin,cn=Root DNs,cn=config" etime=3 
    [15/Mar/2018:16:44:49 -0600] UNBIND REQ conn=2 op=2 msgID=3 
    [15/Mar/2018:16:44:49 -0600] DISCONNECT conn=2 reason="Client Unbind"
Note

You can also perform an ldapsearch against cn=monitor for a complete status of the DS/OpenDJ server, as detailed in How do I use cn=monitor entry in DS/OpenDJ (All versions) for monitoring? Alternatively, you could use the Connections.newHeartBeatConnectionFactory() method, as detailed in Developer's Guide › Health Check Connections, if your load balancer can use a Java-based application.

See Also

How do I use cn=monitor entry in DS/OpenDJ (All versions) for monitoring?

How do I check if a backend is online in DS/OpenDJ (All versions)?

How do I verify that a DS/OpenDJ (All versions) server is responding to LDAP requests without providing a password?

How do I use the Access log to troubleshoot DS/OpenDJ (All versions)?

Performance tuning and monitoring ForgeRock products

Related Training

N/A

Related Issue Tracker IDs

N/A


How do I check if a backend is online in DS/OpenDJ (All versions)?

The purpose of this article is to provide information on checking if a backend is online in DS/OpenDJ. This can be a useful check as part of your DS/OpenDJ monitoring.

Checking if a backend is online

You can perform an ldapsearch against the backend's monitor entry to check that it is online. This example performs a check against the userRoot backend:

$ ./ldapsearch --port 389 --bindDN "cn=Directory Manager" --bindPassword password --baseDN "cn=userRoot Backend,cn=monitor" --searchScope sub "(objectClass=*)"

Online backend

If the backend is online, you would get a response such as the following:

dn: cn=userRoot Backend,cn=monitor
objectClass: top
objectClass: ds-monitor-entry
objectClass: ds-backend-monitor-entry
ds-backend-is-private: FALSE
ds-backend-writability-mode: enabled
cn: userRoot Backend
ds-backend-entry-count: 200002
ds-base-dn-entry-count: 200002 dc=forgerock,dc=com
ds-backend-id: userRoot
ds-backend-base-dn: dc=forgerock,dc=com

Offline backend

If the backend is offline, you would get a response such as the following:

SEARCH operation failed
Result Code:  32 (No Such Entry)
Additional Information:  Entry cn=userRoot Backend,cn=monitor does not exist in the memory-based backend
Matched DN:  cn=monitor

You would also see a corresponding error in the DS/OpenDJ log, for example:

[24/Mar/2015:14:41:59 -0600] category=CONFIG severity=SEVERE_ERROR msgID=3407988 msg=An error occurred while trying to initialize a backend loaded from class org.opends.server.backends.jeb.BackendImpl with the information in configuration entry ds-cfg-backend-id=userRoot,cn=Backends,cn=config:  The database environment could not be opened: (JE 5.0.73) Environment must be closed, caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 5.0.73) /Users/jdoe/Software/db/userRoot com.sleepycat.je.log.ChecksumException: Incomplete log entry header, size=0 lsn=0x10/0x67b292 LOG_CHECKSUM: Checksum invalid on read, log is likely invalid. Environment is invalid and must be closed. fetchTarget of 0x10/0x67b292 parent IN=3 IN class=com.sleepycat.je.tree.BIN lastFullVersion=0x1a/0x9136a5 lastLoggedVersion=0x1f/0xc74a1 parent.getDirty()=false state=0 (BackendImpl.java:1754 BackendImpl.java:319 BackendConfigManager.java:1298 BackendConfigManager.java:279 DirectoryServer.java:2210 DirectoryServer.java:1397 DirectoryServer.java:9651).  This backend will be disabled
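For monitoring purposes, you may want to reduce this check to a simple pass/fail test. The following is only a sketch (the grep on ds-backend-writability-mode and the messages are illustrative); adapt it to your monitoring tool:

$ ./ldapsearch --port 389 --bindDN "cn=Directory Manager" --bindPassword password \
 --baseDN "cn=userRoot Backend,cn=monitor" --searchScope base "(objectClass=*)" ds-backend-writability-mode \
 | grep -q "ds-backend-writability-mode: enabled" && echo "userRoot online" || echo "userRoot OFFLINE"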

See Also

How do I perform a heartbeat check against DS/OpenDJ (All versions)?

FAQ: Monitoring DS/OpenDJ

How do I use cn=monitor entry in DS/OpenDJ (All versions) for monitoring?

How do I verify that a DS/OpenDJ (All versions) server is responding to LDAP requests without providing a password?

Administration Guide › Monitoring, Logging, and Alerts

Related Training

ForgeRock Directory Services Core Concepts

Related Issue Tracker IDs

N/A


How do I verify that a DS/OpenDJ (All versions) server is responding to LDAP requests without providing a password?

The purpose of this article is to provide information on verifying that a DS/OpenDJ server is responding to LDAP requests without needing to authenticate or provide a password.

Verifying that a DS/OpenDJ server is responding

You can verify that a DS/OpenDJ server is responding in one of two ways:

  • Perform a base object search of the root entry, requesting the "1.1" pseudo attribute (which means return no attributes) in order to minimize the server's work and network usage. You can choose whether or not you want to authenticate; the default ACI does allow anonymous access to the root entry.
  • Perform an anonymous read of a user attribute (other than a password attribute); the ACIs allow anonymous read access.

Examples

Anonymous search of the root entry:

$ ./ldapsearch --hostname localhost --port 1636 --useSsl --trustAll --baseDN "" --searchScope base "(objectclass=*)" 1.1

Authenticated search of the root entry:

$ ./ldapsearch --hostname localhost --port 1636 --useSsl --trustAll --bindDN "cn=Directory Manager" --bindPassword password --baseDN "" --searchScope base "(objectclass=*)" 1.1

Anonymous read of a user attribute (dc):

$ ./ldapsearch --hostname localhost --port 1636 --useSsl --trustAll --baseDN dc=example,dc=com --searchScope base "(objectclass=*)" dc
Note

In situations where DS/OpenDJ has surpassed its low disk space threshold, the server will not be writing to disk, so these checks cannot tell you whether user authentications are failing. DS/OpenDJ would continue to respond to the searches above, but ordinary (restricted) user logins would fail.

See Also

How do I perform a heartbeat check against DS/OpenDJ (All versions)?

How do I check if a backend is online in DS/OpenDJ (All versions)?

How do I prevent anonymous access in DS/OpenDJ (All versions)?

Administration Guide › Configuring Privileges and Access Control › Configuring ACIs

Related Training

ForgeRock Directory Services Core Concepts

Related Issue Tracker IDs

N/A


How do I troubleshoot connection via LDAPS issues in DS/OpenDJ (All versions)?

The purpose of this article is to provide information on troubleshooting connection via LDAPS issues in DS/OpenDJ. This article assumes you are connecting to DS/OpenDJ from AM/OpenAM using a secure LDAP connection (over SSL).

Load balancer configuration

If you are using a load balancer in front of DS/OpenDJ, you should ensure you have specified the appropriate details in the SSL certificates depending on your configuration:

  • If the load balancer is doing SSL pass-through, you only need to specify the DS/OpenDJ hostnames in the SSL certificates.
  • If the load balancer is terminating SSL, you need to specify the load balancer hostname as the certificate subject and the DS/OpenDJ hostnames as Subject Alternative Names.

To help pinpoint these issues, you should try the following tests and let us know the results when you submit a ticket with the troubleshooting data:

  1. Bypass the load balancer and point straight to a DS/OpenDJ server; retest to determine if the load balancer is causing the connection issues.
  2. Point the load balancer to an individual DS/OpenDJ server and retest. Repeat with the other DS/OpenDJ servers. If this works, it suggests there is an issue with the DS/OpenDJ Subject Alternative Name or with a known feature in the Oracle® JDK that prevents a triple handshake vulnerability. An issue can manifest as a result of the JDK feature if the load balancer does SSL session resumption with multiple different SSL certificates. This is discussed in the following Stack Overflow article and Puppet Server: Known Issues: SSL Server Certificate Change and Virtual IP Addresses. If it is affecting you, you will also see "server certificate change is restricted" errors in your server log, for example:
    javax.net.ssl.SSLHandshakeException: server certificate change is restricted during renegotiation
    

Java upgrades

Some Java® upgrades have caused LDAPS issues in DS/OpenDJ as indicated in the following table:

Troubleshooting connection issues

Caution

If you have made any changes to the keystore or truststore, you should make sure you have restarted DS/OpenDJ to ensure it is using the updated files before proceeding to troubleshoot.

There are a number of different types of data that you should collect to help us troubleshoot connection issues via LDAPS. If you are experiencing connection issues, you should collect the following data and submit it to us when you raise the ticket to enable us to help you more quickly:

  • Server logs
  • SSL debug logs
  • Keystore information
  • Current JVM options
  • Third-party tools output

Server logs

Server logs (Access and Error) record details about all the operations processed by the server, in addition to details of all server events, errors and warnings. If there are CONNECT messages in the Access log, you know that the DS/OpenDJ server is at least receiving TCP connections from AM/OpenAM.

These files are located in the /path/to/ds/logs directory where DS/OpenDJ is installed.
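For example, a quick way to confirm that connections from AM/OpenAM are reaching the server is to look for CONNECT messages in the access log. This sketch uses the pre-DS 5 file-based access log; the file name differs for the JSON-based logs in DS 5 and later:

$ grep CONNECT /path/to/ds/logs/access | tail -5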

SSL debug logs

The SSL debug logs provide detailed SSL debugging information. When enabled, the SSL debug logs are output to the server.out file, which is located in the /path/to/ds/logs directory where DS/OpenDJ is installed. 

You can enable SSL debug logs by starting the DS/OpenDJ server with additional options:

OPENDJ_JAVA_ARGS='-server -Djavax.net.debug=SSL,handshake,trustmanager' bin/start-ds

Keystore information

Keystore information provides us with details about the contents of each keystore and truststore; keystores are used for private keys whereas truststores are used for public, signed certificates.

By default, truststore and/or keystore files and passwords are stored in different stores depending on their purpose:

  • ads-truststore for replication purposes.
  • admin-truststore and admin-keystore for administration purposes.
  • truststore and keystore for everything else.

These files are located in /path/to/ds/config; in DS 6 and later, the ads-truststore is located in /path/to/ds/db/ads-truststore for new installs.

See  Administration Guide › Changing Server Certificates for further information.

You can run the keytool command for each keystore and truststore to see the contents:

$ keytool -list -v -keystore [keystore] -storetype [storetype] -storepass [password]

replacing [keystore] with the full path and name of the keystore or truststore file, [storetype] with the type of keystore (for example JCEKS or JKS), and [password] with the corresponding keystore or truststore password.
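For example, for the default keystore (a sketch that assumes the default JKS store type and the default keystore.pin password file; adjust these if your deployment uses different store types or password storage):

$ keytool -list -v -keystore /path/to/ds/config/keystore -storetype JKS \
 -storepass "$(cat /path/to/ds/config/keystore.pin)"

Repeat the command for each keystore and truststore with its corresponding password.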

Current JVM options

Current JVM settings can be useful when troubleshooting connection issues. See How do I collect JVM data for troubleshooting DS/OpenDJ (All versions)? for information on finding your current JVM settings.

Third-party tools output

There are two third-party tools that can help with debugging LDAPS connection issues by providing information about the SSL connection as well as attempting an SSL handshake. These tools are:

  • openssl - you can run a command such as the following to provide this information:
    $ openssl s_client -connect [hostname:port] -showcerts
    Press Ctrl+C to quit openssl.
  • KeyStore Explorer - this is a GUI application that provides similar information. You need to specify the hostname and port to get the connection information.
Note

These are third-party tools that we suggest can be used for troubleshooting but are not supported by ForgeRock.

See Also

FAQ: SSL certificate management in DS/OpenDJ

AM/OpenAM (All versions) fail to connect to DS/OpenDJ using a secured connection with an ERROR: Connection factory became offline

How do I configure LDAPS clients in DS/OpenDJ (All versions)?

How do I make AM/OpenAM (All versions) communicate with a secured LDAP server?

Administration Guide › LDAP Client Access Over SSL

Administration Guide › Troubleshooting TLS/SSL Connections

Administration Guide › Server Logs

Configuration Reference › LDAP Connection Handler

Related Training

N/A

Related Issue Tracker IDs

OPENDJ-652 (Connections from Solaris 10 ldapclient can cause LDAPS request handler to spin)


How do I verify indexes in DS/OpenDJ (All versions) are correct?

The purpose of this article is to provide some guidance on checking that DS/OpenDJ indexes are correct.

Index Verification

Indexes in DS/OpenDJ can be incorrect in two different ways:

  1. An index contains a key that refers to an entry that no longer exists or no longer contains the indexed value.
  2. An index should contain a reference to an entry for a particular key, but does not.

Neither case should occur in general usage.

To detect the errors, the verify-index tool is provided with DS/OpenDJ. It needs to be run with different options to detect both kinds of index problems. As of OpenDJ 3, you must stop the server before verifying indexes.

Note

For an attribute index, the index name is simply the attribute name. You can verify multiple indexes in a single run, or all indexes if no index name is specified.

Verify Case 1

Run the following verify-index command to step through all the entries in the backend and verify that all indexes are correct:

$ ./verify-index --baseDN [baseDN] --countErrors

replacing [baseDN] with appropriate values.

Verify Case 2

Run the following verify-index command to step through all the keys in the index and verify the values for each key correctly match entries in the backend:

$ ./verify-index --baseDN [baseDN] --index [indexname] --clean

replacing [baseDN] and [indexname] with appropriate values.

Verify Case 3

Run the following verify-index command to step through all the entries in the backend and verify that each entry is referenced in the index if appropriate:

$ ./verify-index --baseDN [baseDN] --index [indexname]

replacing [baseDN] and [indexname] with appropriate values.

Examples

The following examples show these checks run against a DS/OpenDJ server where the objectClass index is completely valid:

$ ./verify-index --baseDN "dc=example,dc=com" --countErrors
[20/11/2017:10:35:54 -0700] category=BACKEND seq=1 severity=FINE msg=JE backend 'userRoot' does not specify the number of cleaner threads: defaulting to 8 threads
[20/11/2017:10:35:54 -0700] category=BACKEND seq=2 severity=FINE msg=JE backend 'userRoot' does not specify the number of lock tables: defaulting to 17
[20/11/2017:10:35:57 -0700] category=BACKEND seq=35 severity=INFO msg=Checked 10002 entries and found 0 error(s) in 2 seconds (average rate 3965.9/sec)
[20/11/2017:10:35:57 -0700] category=BACKEND seq=36 severity=FINE msg=Statistics for records that have exceeded the entry limit:
[20/11/2017:10:35:57 -0700] category=BACKEND seq=37 severity=FINE msg=File /dc=com,dc=example/mail.caseIgnoreIA5SubstringsMatch:6 has 15 such record(s) min=10000 max=10000 median=10000
[20/11/2017:10:35:57 -0700] category=BACKEND seq=38 severity=FINE msg=File /dc=com,dc=example/objectClass.objectIdentifierMatch has 4 such record(s) min=10000 max=10002 median=10000
$ ./verify-index --baseDN "dc=example,dc=com" --index objectClass --clean
[20/Oct/2016:11:44:42 +0100] category=BACKEND severity=INFORMATION msgID=9437595 msg=Local DB backend userRoot does not specify the number of lock tables: defaulting to 97
[20/Oct/2016:11:44:42 +0100] category=BACKEND severity=INFORMATION msgID=9437594 msg=Local DB backend userRoot does not specify the number of cleaner threads: defaulting to 24 threads
[20/Oct/2016:11:44:42 +0100] category=JEB severity=NOTICE msgID=8847461 msg=Checked 6 records and found 0 error(s) in 0 seconds (average rate 200.0/sec)
[20/Oct/2016:11:44:42 +0100] category=JEB severity=INFORMATION msgID=8388710 msg=Number of records referencing more than one entry: 4
[20/Oct/2016:11:44:42 +0100] category=JEB severity=INFORMATION msgID=8388711 msg=Number of records that exceed the entry limit: 4
[20/Oct/2016:11:44:42 +0100] category=JEB severity=INFORMATION msgID=8388712 msg=Average number of entries referenced is 0.33/record
[20/Oct/2016:11:44:42 +0100] category=JEB severity=INFORMATION msgID=8388713 msg=Maximum number of entries referenced by any record is 1
$ ./verify-index --baseDN "dc=example,dc=com" --index objectClass
[20/Oct/2016:11:43:07 +0100] category=BACKEND severity=INFORMATION msgID=9437595 msg=Local DB backend userRoot does not specify the number of lock tables: defaulting to 97
[20/Oct/2016:11:43:07 +0100] category=BACKEND severity=INFORMATION msgID=9437594 msg=Local DB backend userRoot does not specify the number of cleaner threads: defaulting to 24 threads
[20/Oct/2016:11:43:09 +0100] category=JEB severity=NOTICE msgID=8847466 msg=Checked 20002 entries and found 0 error(s) in 1 seconds (average rate 11984.4/sec)
[20/Oct/2016:11:43:09 +0100] category=JEB severity=INFORMATION msgID=8388715 msg=Statistics for records that have exceeded the entry limit:
[20/Oct/2016:11:43:09 +0100] category=JEB severity=INFORMATION msgID=8388716 msg=File dc_example_dc_com_objectClass.equality has 4 such record(s) min=20000 max=20002 median=20000

See Also

How do I rebuild indexes in DS/OpenDJ (All versions)?

Unindexed searches causing slow searches and poor performance on DS/OpenDJ (All versions) server

How do I troubleshoot issues with my indexes in DS/OpenDJ (All versions)?

How do I know what index types are needed for search filters in DS/OpenDJ (All versions)?

Administration Guide › Indexing Attribute Values › Verifying Indexes

Reference › Tools Reference › verify-index

Administration Guide › Indexing Attribute Values

Related Training

ForgeRock Directory Services Core Concepts

Related Issue Tracker IDs

OPENDJ-1266 (State index is not updated when an index is deleted)


How do I troubleshoot issues with my indexes in DS/OpenDJ (All versions)?

The purpose of this article is to provide assistance on troubleshooting issues with your indexes in DS/OpenDJ. You may be aware of issues with your indexes due to symptoms such as slow searches, poor performance, high CPU usage when searching and even long login times in AM/OpenAM when your user store is on DS/OpenDJ.

Troubleshooting indexes

Issues with indexes can be caused by one of the problems described in the sections below; a general health check is a good place to start.

General health check

You can use one of the following commands (depending on DS/OpenDJ version) to provide a general health check on all indexes to help you evaluate how well attributes are indexed:

  • For DS and OpenDJ 3.x, you can use the following backendstat command; you must stop the server prior to running this command:
    $ ./backendstat show-index-status --backendID userRoot --baseDN dc=example,dc=com
  • For OpenDJ 2.6.x, you can use the following dbtest command:
    $ ./dbtest list-index-status --backendID userRoot --baseDN dc=example,dc=com

These commands return a list of all indexes and include information about the type of index, the associated database name, whether the index is valid and whether the index has exceeded its index entry limit.
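
For example, to check the status of a single index rather than scanning the full list, you could filter the output (a sketch using the illustrative mail index):

$ ./backendstat show-index-status --backendID userRoot --baseDN dc=example,dc=com | grep -i mail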

Caution

These commands can be expensive and take a long time to complete, as they read all indexes for all backends.

Degraded indexes

Degraded indexes occur when you have created or made changes to indexes and the index becomes corrupt due to missing data. You can check for degraded indexes using the verify-index command. This is described in How do I verify indexes in DS/OpenDJ (All versions) are correct?

Any degraded indexes must be rebuilt. You can rebuild specific indexes, all indexes or only degraded indexes as described in How do I rebuild indexes in DS/OpenDJ (All versions)?

Missing Indexes 

Missing indexes (unindexed) result in long search times (high etime values in the access log). This can even affect logging into AM/OpenAM if your user store is on DS/OpenDJ and an attribute involved in the login process is not indexed, for example, the LDAP SAML attribute.

Missing indexes need to be created and built.

See Unindexed searches causing slow searches and poor performance on DS/OpenDJ (All versions) server for further information on the symptoms and solution.

Incomplete indexing

Incomplete indexing can also cause problems with slow searches, but sometimes it can be difficult to pinpoint which attribute is causing the issue if there are multiple attributes in the search filter. 

You can use the debugsearchindex attribute with the ldapsearch command to identify issues such as unindexed attributes and exceeded entry limits. The debugsearchindex attribute tells DS/OpenDJ to return an entry containing debug information that describes how the search operation would have been processed (rather than actually performing the search). See the debugsearchindex example below for further information.

Unnecessary indexes

Unused or rarely used indexes can also impact performance as DS/OpenDJ has to maintain entries for all indexes.

You can run a script (topfilters) to find out which indexes are used most often, which helps you identify underused indexes. You can then remove these indexes using the dsconfig delete-backend-index or delete-local-db-index command, depending on which version of DS/OpenDJ you are using.

The source for the topfilters script can be found here: GitHub: OpenDJ Utilities - topfilters.
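
For example, on DS and OpenDJ 3.x and later you could remove an unused index with a command similar to the following (a sketch; the description index name is illustrative, and on OpenDJ 2.6.x you would use delete-local-db-index instead):

$ ./dsconfig delete-backend-index --backend-name userRoot --index-name description --hostname ds1.example.com --port 4444 --bindDN "cn=Directory Manager" --bindPassword password --trustAll --no-prompt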

High index entry limits

Index entry limits are set to 4000 by default, which means that once the number of entries matching a given index key exceeds this number, DS/OpenDJ stops maintaining the list of matching entries for that key, as doing so is an inefficient use of resources. This default value should be sufficient for most indexes other than objectClass indexes.

Caution

Do not set the index-entry-limit to the number of entries in your backend.

If you have set the index entry limit much higher than this for any of your indexes, you may find performance is impacted as DS/OpenDJ has to maintain very large entry lists for those indexes. You should try reducing the index entry limit back to the default value to see if this improves performance.
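
For example, you could reset the limit for a single index with a command similar to the following (a sketch for DS and OpenDJ 3.x and later; the mail index name is illustrative, on OpenDJ 2.6.x you would use set-local-db-index-prop instead, and you may need to rebuild the affected index afterwards):

$ ./dsconfig set-backend-index-prop --backend-name userRoot --index-name mail --set index-entry-limit:4000 --hostname ds1.example.com --port 4444 --bindDN "cn=Directory Manager" --bindPassword password --trustAll --no-prompt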

Poorly defined search filters

Typically you should aim to have specific search filters in your searches to produce more accurate matches and also reduce the time taken to search.

For example, a search filter such as (objectClass=person) is very generalized and will produce a lot of matches. Instead, you should focus your search by adding other attributes to your search filter that would produce the matches you require. For example, you could use a filter similar to this to reduce the number of matches:

(&(mail=username@maildomain.net)(objectClass=person))

debugsearchindex example

You can use the information contained in the access logs to determine what you should include in your debugsearchindex. For example, the following sample access log snippets show high etimes for two different searches:

[19/Oct/2016:17:09:27 -0400] SEARCH conn=107918 op=1 msgID=2 base="ou=People,dc=example,dc=org" scope=one filter="(&(modifytimestamp>=20161019161001Z)(objectclass=person))" attrs="uid email" result=118 message="Client Disconnect" nentries=5 unindexed etime=16761

[19/Oct/2016:17:16:42 -0400] SEARCH conn=108320 op=1 msgID=2 base="ou=People,dc=example,dc=org" scope=one filter="(&(iscurrent=1)(objectclass=person))" attrs="uid email" result=118 message="Client Disconnect" nentries=9 unindexed etime=46961

You can then construct searches using this information to show which indexes are being used to help you identify incomplete or missing indexes. For example:

$ ./ldapsearch --hostname ds1.example.com --port 1389 --bindDN "cn=Directory Manager" --bindPassword password --baseDN ou=People,dc=example,dc=org --searchScope one "(&(modifytimestamp>=20161019161001Z)(objectclass=person))" debugsearchindex

dn: cn=debugsearch
debugsearchindex: filter=(&(objectClass=person)[INDEX:objectClass.equality][LIMIT-EXCEEDED](modifyTimestamp>=20161019161001Z)[NOT-INDEXED])[NOT-INDEXED] scope=one[LIMIT-EXCEEDED:24482] final=[NOT-INDEXED]
$ ./ldapsearch --hostname ds1.example.com --port 1389 --bindDN "cn=Directory Manager" --bindPassword password --baseDN ou=People,dc=example,dc=org --searchScope one "(&(iscurrent=1)(objectclass=person))" debugsearchindex 

dn: cn=debugsearch
debugsearchindex: filter=(&(objectClass=person)[INDEX:objectClass.equality][LIMIT-EXCEEDED](iscurrent=1)[NOT-INDEXED])[NOT-INDEXED] scope=one[LIMIT-EXCEEDED:26021] final=[NOT-INDEXED]

See Administration Guide › Indexing Attribute Values › About debugsearchindex Values for further information on the results obtained from the debugsearchindex option.

See Also

Unindexed searches causing slow searches and poor performance on DS/OpenDJ (All versions) server

How do I verify indexes in DS/OpenDJ (All versions) are correct?

How do I rebuild indexes in DS/OpenDJ (All versions)?

How do I troubleshoot high CPU utilization on DS/OpenDJ (All versions) servers?

Administration Guide › Rebuilding Indexes

Administration Guide › Indexing Attribute Values

Administration Guide › Indexing Attribute Values › What To Index

Reference › Tools Reference

Configuration Reference

Related Training

ForgeRock Directory Services Core Concepts

Related Issue Tracker IDs

N/A


How do I troubleshoot a DS/OpenDJ (All versions) REST configuration?

The purpose of this article is to provide advice on troubleshooting DS/OpenDJ REST configurations, including enabling logging for the REST interface.

Overview

DS/OpenDJ provides access to REST clients using two different mechanisms:

  1. via the HTTP connection handler, which is included in DS/OpenDJ but not enabled by default.
  2. via an external REST to LDAP (REST2LDAP) gateway servlet that runs in a web container such as Apache Tomcat™.

Troubleshooting what happens inside the REST interface can be difficult. This article provides some advice on the following troubleshooting steps:

  • Enabling logging in REST calls - this can be achieved by disabling the suppression of internal operations. REST calls via the HTTP connection handler generate internal operations against DS/OpenDJ, which can help you understand exactly what the interface is trying to do as opposed to what you've configured it to do. It is recommended that you do not use your DS/OpenDJ server for anything else while you are logging internal operations to avoid noise in the logs. As of OpenDJ 3.5, you can also enable logging in the REST to LDAP (REST2LDAP) gateway servlet if it runs in the Apache Tomcat™ web container. Logging will also help identify any access control issues (see next step).
  • Checking access controls (ACI) - this can be achieved by performing the same operation with the same credentials using an LDAP client. If you cannot perform the operation, then you will need to adjust the ACI in DS/OpenDJ to allow these operations. The DS/OpenDJ logs will also identify any access control issues.
  • Checking the format of your REST calls - you should ensure you have formed your REST call correctly according to which mechanism you are using for REST as the expected format differs.

Enabling logging in REST calls

The REST interface included in DS/OpenDJ's HTTP connection handler uses an internal connection inside DS/OpenDJ to translate the REST query into a series of LDAP operations. These are not logged by default. Logging these operations can be extremely beneficial when you are developing your REST configuration. However, this will result in a lot of extra access logging, so it is only recommended in your development environment.

Note

You can only log internal operations for the HTTP connection handler; although the external REST2LDAP gateway servlet uses normal LDAP connections to communicate with DS/OpenDJ, these operations are not included when logging is enabled since the servlet is an external application running within the web container. DS/OpenDJ may use internal operations during normal activities so changing this setting may result in a lot of additional logging.

To log these internal operations when using the HTTP connection handler, use the following dsconfig command:

$ ./dsconfig set-log-publisher-prop --publisher-name "File-Based Access Logger" --set suppress-internal-operations:false --hostname ds1.example.com --port 4444 --bindDN "cn=Directory Manager" --bindPassword password

Here is an example of the logging produced by a REST call to create a new user.

[18/Feb/2016:14:56:10 +0000] CONNECT conn=29 from=0:0:0:0:0:0:0:1:62536 to=0:0:0:0:0:0:0:1:8080 protocol=HTTP/1.1
[18/Feb/2016:14:56:10 +0000] SEARCH REQ conn=-1 op=75 msgID=76 base="ou=people,dc=example,dc=com" scope=wholeSubtree filter="(&(objectClass=inetOrgPerson)(uid=user.0))" attrs="1.1"
[18/Feb/2016:14:56:10 +0000] SEARCH RES conn=-1 op=75 msgID=76 result=0 nentries=1 etime=0
[18/Feb/2016:14:56:10 +0000] BIND REQ conn=29 op=0 msgID=0 version=3 type=SIMPLE dn="uid=user.0,ou=People,dc=example,dc=com"
[18/Feb/2016:14:56:10 +0000] BIND RES conn=29 op=0 msgID=0 result=0 authDN="uid=user.0,ou=People,dc=example,dc=com" etime=1
[18/Feb/2016:14:56:10 +0000] ADD REQ conn=29 op=1 msgID=1 base="uid=user.new,ou=People,dc=example,dc=com"
[18/Feb/2016:14:56:10 +0000] ADD RES conn=29 op=1 msgID=1 result=0 etime=6
[18/Feb/2016:14:56:10 +0000] UNBIND REQ conn=29 op=2 msgID=2
[18/Feb/2016:14:56:10 +0000] DISCONNECT conn=29 reason="Client Unbind"

As of OpenDJ 3.5, you can enable logging in the REST to LDAP (REST2LDAP) gateway servlet if it runs in the Apache Tomcat™ web container. This is done by specifying logging properties in the logging.properties file (located in the WEB-INF/classes directory where the gateway servlet is deployed).
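
For example, a logging.properties file along the following lines raises the logging level; this is a minimal java.util.logging sketch and the org.forgerock logger name is an assumption, so check the REST to LDAP documentation for the exact logger names used by your gateway version:

# Send log records to the console (Tomcat captures this in catalina.out)
handlers=java.util.logging.ConsoleHandler
java.util.logging.ConsoleHandler.level=FINEST
# Keep everything else at the default level
.level=INFO
# Raise the level for the gateway classes (logger name is an assumption)
org.forgerock.level=FINEST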

Checking access controls (ACI)

All REST requests result in a number of LDAP operations. As a consequence, any REST operation performed is subject to exactly the same access controls (ACI) configured in DS/OpenDJ as normal LDAP operations.

If you are having trouble reading or writing resources, it is recommended that you try to repeat the same operation (using the same credentials) using DS/OpenDJ's ldapmodify or ldapsearch tools. When they work as expected, you should get similar results from the REST interface.
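
For example, if a REST read of the jdoe resource fails, you could repeat the equivalent read over LDAP with the same credentials (a sketch using the jdoe example account and the standard example baseDN from this article):

$ ./ldapsearch --hostname ds1.example.com --port 1389 --bindDN "uid=jdoe,ou=People,dc=example,dc=com" --bindPassword changeit --baseDN ou=People,dc=example,dc=com "(uid=jdoe)"

If this search fails with an insufficient access rights error, adjust the ACI before revisiting the REST configuration.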

See Administration Guide › Troubleshooting Server Problems › Troubleshooting Client Operations for further information on using the logs to identify and resolve access control issues.

Note

It is good practice to put the ACI in the actual backend, as this means it will be replicated and also included when you export the data. Global ACIs are not replicated and are easy to forget when you're creating new instances.

Checking the format of your REST calls

When making a REST call, you must authenticate as a user (in this example, username = jdoe and password = changeit). You can authenticate in two different ways, regardless of the REST mechanism you are using:

  • By using curl's --user flag, for example using the HTTP connection handler:
    $ curl --user jdoe:changeit http://ds1.example.com:8080/users/jdoe
  • By embedding the credentials in the URL, for example using the REST2LDAP gateway servlet:
    $ curl http://jdoe:changeit@ds1.example.com:8080/rest2ldap/users/jdoe

See Also

How do I know what the default Global ACIs are used for in DS/OpenDJ (All versions)?

Troubleshooting DS/OpenDJ

Administration Guide › Configuring Connection Handlers › RESTful Client Access Over HTTP

Installation Guide › Installing the REST to LDAP Gateway

Reference › REST to LDAP Configuration

Administration Guide › Configuring Privileges and Access Control

Reference › Tools Reference › ldapmodify

Reference › Tools Reference › ldapsearch

Related Training

N/A

Related Issue Tracker IDs

OPENDJ-1183 (Cannot reset userPassword through REST interface due to lack of privileges)

OPENDJ-2871 (Add support for sub-resources)

OPENDJ-2955 (Add logging capabilities to REST2LDAP Gateway)

OPENDJ-2403 (DSML servlet does not log errors for LDAP operations)


How do I troubleshoot high CPU utilization on DS/OpenDJ (All versions) servers?

The purpose of this article is to provide troubleshooting guidance if you are experiencing consistently high CPU utilization on the DS/OpenDJ server. If CPU utilization is consistently in the 70-80% range or higher under load, you should investigate the cause, as performance is likely to be impacted.

Troubleshooting high CPU usage

High CPU utilization is usually experienced in conjunction with other symptoms such as servers becoming unresponsive and poor performance.

Note

On Linux®, Unix® and Solaris® systems, there is a script you can run to capture the stack traces and process information as detailed in How do I use the msnapshots script to capture information for troubleshooting DS/OpenDJ (All versions)?

There are a number of different types of data that you should collect to help us troubleshoot high CPU utilization. If you are seeing high CPU usage, you should collect the following data and submit it to us when you raise the ticket to enable us to help you more quickly:

  • Access log file.
  • Configuration file.
  • JVM stack traces and current options.
  • Process / thread information.
  • Garbage Collector (GC) logs.
Note

It is important that you collect all the data at the point you are experiencing high CPU usage so that the timestamps of the data collected and the high CPU usage correspond. Please notify us of the exact timestamp when high CPU was noticed so we can focus our attention on the relevant information. 

Access log file

The Access log file records details about all the operations processed by the server. This file is located in the /path/to/ds/logs directory where DS/OpenDJ is installed.

Configuration file

The configuration file (config.ldif, which is located in the /path/to/ds/config directory where DS/OpenDJ is installed) contains all the configuration settings for a DS/OpenDJ server instance and is useful for checking your configuration, including index settings.

JVM stack traces and current options

JVM stack traces must be collected before killing the affected process or restarting the server; otherwise the information relating to the affected process is lost forever, which may make it impossible to identify the root cause. In addition to collecting stack traces, you should also provide details of the current JVM settings as they can provide an opportunity for tuning, which can reduce CPU usage.

Collecting stack traces and finding current JVM settings is described in How do I collect JVM data for troubleshooting DS/OpenDJ (All versions)?
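
For example, you could capture a timestamped stack trace while CPU usage is high (a sketch assuming the JDK's jstack tool is on your path):

$ jstack -l [pid] > jstack-$(date +%Y%m%d-%H%M%S).txt

replacing [pid] with the process ID of the affected process.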

Process / thread information 

Detailed process / thread information is useful for identifying the exact thread within a process that is experiencing the high CPU. You should collect this data at the same time as the stack trace so we can look for the identified thread in the stack trace for more information.

  • On Unix® and Linux® systems, you can use the top command to output process information for all threads to a file:
    $ top -b -n 1 -H -p [pid] > top-output.txt
    replacing [pid] with the process ID of the affected process.
  • On Solaris®, you can use the prstat command to capture this information:
    $ prstat -L -p [pid]
    replacing [pid] with the process ID of the affected process.
  • On Microsoft® Windows®, you will need to use the Process Monitor.

See How do I find which thread is consuming CPU in a Java process in DS/OpenDJ (All versions)? for further information on identifying the thread that is causing high CPU usage.

GC logs

GC logs also provide useful information about tuning issues that may be contributing to the high CPU usage. Enabling GC logging is described in How do I enable Garbage Collector (GC) Logging for DS/OpenDJ (All versions)?
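
For example, on Java 8 you could start the server with GC logging enabled using options similar to the following (a sketch; the referenced article describes the recommended approach, and the flags differ on Java 9 and later):

OPENDJ_JAVA_ARGS='-server -Xloggc:/path/to/ds/logs/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps' bin/start-ds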

In addition, please include any information about other symptoms you are seeing at the same time as high CPU usage and any changes you have made to your system that may have contributed to the problem.

See Also

How do I troubleshoot issues with my indexes in DS/OpenDJ (All versions)?

FAQ: DS/OpenDJ performance and tuning

Troubleshooting DS/OpenDJ

Administration Guide › Monitoring, Logging, and Alerts › Server Logs

Related Training

ForgeRock Directory Services Core Concepts

Related Issue Tracker IDs

N/A


How do I find which thread is consuming CPU in a Java process in DS/OpenDJ (All versions)?

The purpose of this article is to provide assistance on identifying which thread is consuming CPU in a Java® process in DS/OpenDJ. This is a useful troubleshooting technique when you are experiencing high CPU under load.

Identifying the thread

You can identify which thread is consuming CPU in a Java process as follows:

  1. Determine the process ID (PID) of the affected server process using the following command:
    $ jps -v
  2. Determine which thread in the PID identified in step 1 is consuming the CPU:
    • On Unix® and Linux® systems, you can use the top command:
      $ top -n 1 -H -p [pid]
      replacing [pid] with the process ID of the affected process.
    • On Solaris®, you can use the prstat command:
      $ prstat -L -p [pid]
      replacing [pid] with the process ID of the affected process.
    • On Microsoft® Windows®, you can use the Process Monitor.
    These commands all give an output listing all the threads in the selected process along with various details, including the CPU usage percentage. For example, the top command gives an output similar to this:
    Tasks: 152 total,   0 running, 152 sleeping,   0 stopped,   0 zombie 
    Cpu(s):  2.0%us,  1.4%sy,  0.0%ni, 94.7%id,  0.8%wa,  0.0%hi,  1.1%si,  0.0%st 
    Mem:   5008484k total,  4854428k used,   154056k free,    85624k buffers 
    Swap:  5013496k total,     6516k used,  5006980k free,  3393948k cached
    PID  USER     PR  NI  VIRT  RES  SHR S  %CPU  %MEM   TIME+  COMMAND              
    8706 user.1   20   0 2433m 561m  10m R  32.0  11.5  0:35.45 java
    8624 user.1   20   0 2433m 561m  10m S   0.0  11.5  0:00.01 java                 
    8625 user.1   20   0 2433m 561m  10m S   0.0  11.5  0:10.12 java                 
    8626 user.1   20   0 2433m 561m  10m S   0.0  11.5  0:35.60 java                 
    8627 user.1   20   0 2433m 561m  10m S   0.0  11.5  0:00.01 java                 
    8628 user.1   20   0 2433m 561m  10m S   0.0  11.5  0:00.02 java              
    In this example, we can see that thread ID 8706 is consuming 32% of the CPU.
  3. Produce a stack trace for the PID identified in step 1 using the following command:
    $ jstack [PID] > jstack-output.txt
  4. Convert the thread ID identified as problematic in step 2 to a hex value (see the conversion sketch after this list). In our example output, the thread ID decimal value 8706 converts to hex value 2202.
  5. Search the stack trace output for this hex value using grep. You are looking for a thread nid that matches this hex value:
    $ cat jstack-output.txt | grep -i 2202
    This command gives an output similar to this:
    "Concurrent Mark-Sweep GC Thread" prio=10 tid=0x00007f1448029800 nid=0x2202 runnable
    In this example, we can see that the problem is a Garbage Collector (GC) thread, which suggests that GC is not properly tuned.
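
For step 4 above, you can let the shell do the decimal-to-hex conversion, for example:

$ printf '0x%x\n' 8706
0x2202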

See Also

How do I troubleshoot high CPU utilization on DS/OpenDJ (All versions) servers?

How do I use the msnapshots script to capture information for troubleshooting DS/OpenDJ (All versions)?

How do I enable Garbage Collector (GC) Logging for DS/OpenDJ (All versions)?

FAQ: DS/OpenDJ performance and tuning

Troubleshooting DS/OpenDJ

Java Thread Dump Analyzer

Related Training

N/A

Related Issue Tracker IDs

N/A


Monitoring


How do I create a dedicated user for monitoring in DS 6.x?

The purpose of this article is to provide assistance if you need to create a user for DS monitoring.

Overview

When installing DS 6 and later, you are given the option to create a default user for querying monitoring information (see Installation Guide › To Set Up a Directory Server for further information). If you did not create a user at this stage, you can use one of the options described below to create one post-install: either grant monitoring rights to an existing account, or create a dedicated monitoring account.

Once you have created your user/account, you can then enable the appropriate connection handler to expose the required monitoring endpoints.

After completing these steps, you will be ready to start monitoring using your user/account per Administration Guide › Monitoring, Logging, and Alerts

Granting monitoring rights to an existing account

To grant access to an existing user:

  1. Create an LDIF file to apply the ds-privilege-name: monitor-read to the user's entry, for example:
    $ cat monitor-read.ldif
    dn: uid=jdoe,ou=People,dc=example,dc=com
    changetype: modify
    add: ds-privilege-name
    ds-privilege-name: monitor-read
  2. Apply the changes using the following ldapmodify command:
    $ ./ldapmodify --port 1389 --bindDN "cn=Directory Manager" --bindPassword password monitor-read.ldif
  3. Verify the changes are applied, for example:
    $ ./ldapsearch --port 1389 --bindDN "uid=jdoe,ou=People,dc=example,dc=com" --bindPassword password --baseDN cn=monitor "(ds-cfg-backend-id=userRoot)"
    
    objectClass: top
    objectClass: ds-monitor
    objectClass: ds-monitor-backend
    objectClass: ds-monitor-backend-pluggable
    objectClass: ds-monitor-backend-db
    ds-mon-backend-is-private: false
    ds-mon-backend-entry-count: 2002
    ds-mon-backend-writability-mode: enabled
    ds-mon-backend-degraded-index-count: 0
    ds-mon-backend-ttl-is-running: false
    ds-mon-backend-ttl-last-run-time: 20180809151924.210Z
    ds-mon-backend-ttl-thread-count: 0
    ds-mon-backend-ttl-queue-size: 0
    ds-mon-backend-ttl-entries-deleted: {"count":0,"total":0.000,"mean_rate":0.000,"m1_rate":0.000,"m5_rate":0.000,"m15_rate":0.000}
    ds-mon-backend-filter-use-start-time: 19700101000000Z
    ds-mon-backend-filter-use-indexed: 0
    ds-mon-backend-filter-use-unindexed: 0
    ds-mon-db-version: 7.5.11
    ds-mon-db-cache-evict-internal-nodes-count: 0
    ds-mon-db-cache-evict-leaf-nodes-count: 0
    ds-mon-db-cache-total-tries-internal-nodes: 1045
    ds-mon-db-cache-total-tries-leaf-nodes: 882
    ds-mon-db-cache-misses-internal-nodes: 12
    ds-mon-db-cache-misses-leaf-nodes: 70
    ds-mon-db-cache-size-active: 3230029
    ds-mon-db-log-size-active: 4604407
    ds-mon-db-log-cleaner-file-deletion-count: 0
    ds-mon-db-log-utilization-min: 56
    ds-mon-db-log-utilization-max: 56
    ds-mon-db-log-size-total: 4604407
    ds-mon-db-log-files-open: 1
    ds-mon-db-log-files-opened: 5
    ds-mon-db-checkpoint-count: 0
    ds-cfg-backend-id: userRoot
    

Creating a dedicated monitoring account

To create a dedicated user, you can set up an LDIF backend and apply the privileges (which is what DS does during install if this option is selected): 

  1. Create a directory for the user in the /path/to/ds/db directory (this is where the special users are created during install), for example:
    $ mkdir /path/to/ds/db/monitorUser
    
  2. Create an encoded password for the monitoring user, for example:
    $ ./encode-password -c password -s SSHA512
    Encoded Password:  "{SSHA512}NUhN2CulFulVLDDJo+6uZ2NhjSpkl2iFn2sRNgFvjnZM/0LddL3hXPAecLTALCYnKfE+64lXiWwBfPvgYJR+0fL1ojGvsruE"
  3. Create an LDIF file that contains the monitoring user entry including the encoded password from step 2, for example:
    $ cat /path/to/ds/db/monitorUser/monitorUser.ldif
    dn: uid=Monitor
    objectClass: top
    objectClass: person
    objectClass: organizationalPerson
    objectClass: inetOrgPerson
    sn: User
    cn: Monitor
    ds-privilege-name: monitor-read
    userPassword: {SSHA512}NUhN2CulFulVLDDJo+6uZ2NhjSpkl2iFn2sRNgFvjnZM/0LddL3hXPAecLTALCYnKfE+64lXiWwBfPvgYJR+0fL1ojGvsruE
    
  4. Create an LDIF backend:
    $ ./dsconfig create-backend --set base-dn:uid=Monitor --set enabled:true --set ldif-file:/path/to/ds/db/monitorUser/monitorUser.ldif --type ldif --backend-name monitorUser --port 4444 --bindDN "cn=Directory Manager" --bindPassword password --trustAll --no-prompt
    
  5. Verify the monitoring user has read access to cn=monitor, for example:
    $ ./ldapsearch --port 1389 --bindDN "uid=Monitor" --bindPassword password --baseDN cn=monitor "(ds-cfg-backend-id=userRoot)"
    
    objectClass: top
    objectClass: ds-monitor
    objectClass: ds-monitor-backend
    objectClass: ds-monitor-backend-pluggable
    objectClass: ds-monitor-backend-db
    ds-mon-backend-is-private: false
    ds-mon-backend-entry-count: 2002
    ds-mon-backend-writability-mode: enabled
    ds-mon-backend-degraded-index-count: 0
    ds-mon-backend-ttl-is-running: false
    ds-mon-backend-ttl-last-run-time: 20180809151924.210Z
    ds-mon-backend-ttl-thread-count: 0
    ds-mon-backend-ttl-queue-size: 0
    ds-mon-backend-ttl-entries-deleted: {"count":0,"total":0.000,"mean_rate":0.000,"m1_rate":0.000,"m5_rate":0.000,"m15_rate":0.000}
    ds-mon-backend-filter-use-start-time: 19700101000000Z
    ds-mon-backend-filter-use-indexed: 0
    ds-mon-backend-filter-use-unindexed: 0
    ds-mon-db-version: 7.5.11
    ds-mon-db-cache-evict-internal-nodes-count: 0
    ds-mon-db-cache-evict-leaf-nodes-count: 0
    ds-mon-db-cache-total-tries-internal-nodes: 1045
    ds-mon-db-cache-total-tries-leaf-nodes: 882
    ds-mon-db-cache-misses-internal-nodes: 12
    ds-mon-db-cache-misses-leaf-nodes: 70
    ds-mon-db-cache-size-active: 3230029
    ds-mon-db-log-size-active: 4604407
    ds-mon-db-log-cleaner-file-deletion-count: 0
    ds-mon-db-log-utilization-min: 56
    ds-mon-db-log-utilization-max: 56
    ds-mon-db-log-size-total: 4604407
    ds-mon-db-log-files-open: 1
    ds-mon-db-log-files-opened: 5
    ds-mon-db-checkpoint-count: 0
    ds-cfg-backend-id: userRoot

See Also

How do I use cn=monitor entry in DS/OpenDJ (All versions) for monitoring?

FAQ: Monitoring DS/OpenDJ

Performance tuning and monitoring ForgeRock products

Release Notes › New Features

Related Training

N/A

Related Issue Tracker IDs

N/A


FAQ: Monitoring DS/OpenDJ

The purpose of this FAQ is to provide answers to commonly asked questions regarding monitoring in DS/OpenDJ.

Frequently asked questions

Q. How can I check that DS/OpenDJ is up and running?

A. You can perform a heartbeat check against DS/OpenDJ to verify that the server is up and running.

See How do I perform a heartbeat check against DS/OpenDJ (All versions)? for further information.

Q. How can I check if a backend is online?

A. You can check if a backend is online by performing an ldapsearch against the specific backend.

See How do I check if a backend is online in DS/OpenDJ (All versions)? for further information.

Q. How can I monitor replication?

A. You can use the dsreplication status command to give you an overall view of the replication topology, including whether the servers are synchronized. For example:

$ ./dsreplication status --adminUID admin --adminPassword password --hostname ds1.example.com --port 4444 --trustAll

See Reference › dsreplication for further information.

Note

The M.C. (missing changes) and A.O.M.C. (age of oldest missing change) metrics returned from the dsreplication status command are deprecated in DS 6; you should monitor replication delay instead. See Administration Guide › Monitoring Replication Delay Over LDAP and Administration Guide › Monitoring Replication Delay Over HTTP for further information.

Alternatively, you can perform an ldapsearch against the baseDN "cn=Replication,cn=monitor" as detailed in How do I use cn=monitor entry in DS/OpenDJ (All versions) for monitoring?

Q. How can I monitor performance?

A. You can monitor performance by performing an ldapsearch against cn=monitor to return operation statistics.

See How do I use cn=monitor entry in DS/OpenDJ (All versions) for monitoring? for further information on monitoring operation statistics.

Q. What can I monitor with the cn=monitor entry?

A. DS/OpenDJ exposes a lot of different monitoring information over LDAP under this entry.

See How do I use cn=monitor entry in DS/OpenDJ (All versions) for monitoring? for further information, including specific examples you may want to monitor.

Q. Do the statistics under the cn=monitor entry persist when the server is restarted?

A. No, the statistics under cn=monitor are reset each time the server is restarted.

Q. Do I have to use the Directory Manager user for JMX monitoring?

A. No, you can use any user for JMX monitoring; you just need to add the JMX privileges (jmx-notify, jmx-read, and jmx-write) to the user you want to access JMX monitoring. No users have access to JMX monitoring by default.

See Administration Guide › JMX-Based Monitoring for an example of how to add these privileges to a user (Directory Manager, but you can substitute any user you want).
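
For example, you could grant the JMX privileges to an existing user in the same way as the monitor-read example earlier in this book (a minimal sketch using the example jdoe entry; only grant the privileges you actually need):

$ cat jmx-privileges.ldif
dn: uid=jdoe,ou=People,dc=example,dc=com
changetype: modify
add: ds-privilege-name
ds-privilege-name: jmx-read
ds-privilege-name: jmx-notify

$ ./ldapmodify --port 1389 --bindDN "cn=Directory Manager" --bindPassword password jmx-privileges.ldif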

Q. How can I connect to JMX to ensure mbeans are returned?

A. You must be authenticated to the DS/OpenDJ server via a JMX client to see the mbeans with associated attributes and operations. Authenticating to the server is the only way to expose the sensitive elements within the mbeans; connecting to the process directly will not show them. Additionally, you must ensure the user who authenticates has JMX privileges.

See JMX Client Access and JMX-Based Monitoring for more information.

Q. Can I change the default listen-address for the JMX Connection Handler?

A. Yes, as of OpenDJ 2.6.2 you can change the default listen-address. The default listen-address for the JMX Connection Handler is 0.0.0.0.

Q. What URL should I use in JConsole to log in?

A. You must always use a remote URL, even if you are using a local connection to the JVM. For example:

service:jmx:rmi:///jndi/rmi://localhost:1689/org.opends.server.protocols.jmx.client-unknown

You can substitute a hostname alias (such as ds1.example.com) or an IP address for the localhost part of this URL. 

Note

If SSL is enabled, you must set up the truststore on the system where JConsole is running and configure SSL properly to monitor a remote application. See Using JConsole for further information.

See Also

FAQ: DS/OpenDJ performance and tuning

How do I generate sample user data for performance testing in DS/OpenDJ (All versions)?

Unindexed searches causing slow searches and poor performance on DS/OpenDJ (All versions) server

Administration Guide › Monitoring, Logging, and Alerts

Related Training

ForgeRock Directory Services Core Concepts


How do I use cn=monitor entry in DS/OpenDJ (All versions) for monitoring?

The purpose of this article is to provide information on using the cn=monitor entry in DS/OpenDJ for monitoring purposes. DS/OpenDJ exposes monitoring information over LDAP under this entry.

Using the cn=monitor entry

You can perform an ldapsearch against the cn=monitor entry and sub-entries to provide a variety of statistics that are useful for monitoring DS/OpenDJ.

Some key baseDNs to search are:

  • cn=monitor - Provides general server information (an example of the type of information returned is shown below).
  • cn=System Information,cn=monitor - Provides information about the system and the JVM.
  • cn=Disk Space Monitor,cn=monitor - Provides information about disks, including disk location and space available.
  • cn=Work Queue,cn=monitor - Provides information about the work queue, including its backlog, average and max backlog.
  • cn=Client Connections,cn=monitor - Provides information about all open client connections.
  • cn=JVM Memory Usage,cn=monitor - Provides information about memory usage in the JVM.
  • cn=userRoot Database Environment,cn=monitor - Provides monitoring information about the Berkeley DB Java Edition backend.

You can return all monitoring information using a command such as:

$ ./ldapsearch --port 389 --bindDN "cn=Directory Manager" --bindPassword password --baseDN "cn=monitor" --searchScope sub "(objectClass=*)" \*

This would return something similar to:

dn: cn=monitor
objectClass: top
objectClass: ds-monitor-entry
objectClass: extensibleObject
cn: monitor
productName: OpenDJ
vendorName: ForgeRock AS.
vendorVersion: OpenDJ 2.6.3
startTime: 20150615163856Z
currentTime: 20150615164754Z
upTime: 0 days 0 hours 8 minutes 57 seconds
currentConnections: 1
maxConnections: 1
totalConnections: 7

...

dn: cn=Client Connections,cn=monitor
objectClass: top
objectClass: ds-monitor-entry
objectClass: extensibleObject
cn: Client Connections
connection: connID="6" connectTime="20150615164754Z" source="127.0.0.1:53197" destination="127.0.0.1:1389" ldapVersion="3" authDN="cn=Directory Manager,cn=Root DNs,cn=config" security="none" opsInProgress="1"

...

and so on.

Alternatively, you can perform more specific searches against individual baseDNs and include particular objectClass parameters and/or attributes to filter the information returned. See below for some examples of useful statistics that you may want to monitor.

Note

Returning all monitoring information using the above command is a great way to initially find the information you are looking for; you can then construct specific searches once you know the baseDN and attributes you're interested in.

Monitoring replication

You can monitor replication for each Directory Server and Replication Server known to the server you are querying, using a command similar to the following:

$ ./ldapsearch --port 389 --bindDN "cn=Directory Manager" --bindPassword password --baseDN "cn=Replication,cn=monitor" --searchScope sub "(objectClass=*)" \*

Specifically, you would want to check the following attributes, which can signify issues with replication if they are not equal to 0:

  • missing-changes
  • approximate-delay
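
For example, you could limit the output to just these attributes (a sketch based on the command above):

$ ./ldapsearch --port 389 --bindDN "cn=Directory Manager" --bindPassword password --baseDN "cn=Replication,cn=monitor" --searchScope sub "(objectClass=*)" missing-changes approximate-delay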

Monitoring operation statistics

You can monitor operation statistics using a command similar to the following:

$ ./ldapsearch --port 389 --bindDN "cn=Directory Manager" --bindPassword password --baseDN "cn=monitor" --searchScope sub "(objectClass=ds-connectionhandler-statistics-monitor-entry)" \*

This command returns three sets of statistics:

  • Overall read/write statistics (count of messages, bytes etc).
  • Number of requests and responses per operation.
  • Performance metrics - the count of each operation type completed (xxx-total-count) and the total execution time of those operations (xxx-total-time).

This final set of statistics can be really useful for monitoring performance. You can also determine average operation times by querying these statistics at regular intervals and calculating the differences from the previous query; both the average operations per second and the average etime per operation can be derived this way.
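
For example, the following sketch samples one counter twice, 60 seconds apart, and derives an approximate operations-per-second rate; the modify-total-count attribute name used here is illustrative, so substitute whichever xxx-total-count attribute the search above actually returns for the operation you care about:

$ FILTER="(objectClass=ds-connectionhandler-statistics-monitor-entry)"
$ C1=$(./ldapsearch --port 389 --bindDN "cn=Directory Manager" --bindPassword password --baseDN "cn=monitor" --searchScope sub "$FILTER" modify-total-count | grep -i modify-total-count | head -1 | awk '{print $2}')
$ sleep 60
$ C2=$(./ldapsearch --port 389 --bindDN "cn=Directory Manager" --bindPassword password --baseDN "cn=monitor" --searchScope sub "$FILTER" modify-total-count | grep -i modify-total-count | head -1 | awk '{print $2}')
$ echo "scale=2; ($C2 - $C1) / 60" | bc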

Monitoring the work queue

You can monitor the work queue using a command similar to the following:

$ ./ldapsearch --port 389 --bindDN "cn=Directory Manager" --bindPassword password --baseDN "cn=Work Queue,cn=monitor" --searchScope sub "(objectClass=*)" \*

This command will give you an understanding of how busy your worker threads are, as the queue will typically stay empty if there are free worker threads. This can be seen by checking the requestsRejectedDueToQueueFull attribute, which only increments when the queue is full, and the backlog attributes (maxRequestBacklog, currentRequestBacklog and averageRequestBacklog), which indicate queue size.

Monitoring database size

You can monitor the database size using a command similar to the following:

$ ./ldapsearch --port 389 --bindDN "cn=Directory Manager" --bindPassword password --baseDN "cn=monitor" --searchScope sub "(objectClass=ds-backend-monitor-entry)" \*

This command will return database size details for each backend, where:

  • ds-backend-entry-count shows the total number of entries in the backend.
  • ds-base-dn-entry-count shows the total number of entries per base DN.

Monitoring active users

You can monitor users who are currently connected to the DS/OpenDJ server using a command similar to the following:

$ ./ldapsearch --port 389 --bindDN "cn=Directory Manager" --bindPassword password --baseDN "cn=monitor" --searchScope sub "(&(objectClass=ds-monitor-entry)(connection=*))" \*

This command searches the connection handlers and returns the connection attribute. An example output looks like this:

dn: cn=Client Connections,cn=Administration Connector 0.0.0.0 port 4444,cn=monitor
objectClass: top
objectClass: ds-monitor-entry
objectClass: extensibleObject
cn: Client Connections
connection: connID="1" connectTime="20140918144336Z" source="10.0.2.15:37097" destination="10.0.2.15:4444" ldapVersion="3" authDN="cn=Directory Manager,cn=Root DNs,cn=config" security="TLS" opsInProgress="0"

dn: cn=Client Connections,cn=monitor
objectClass: top
objectClass: ds-monitor-entry
objectClass: extensibleObject
cn: Client Connections
connection: connID="1" connectTime="20140918144336Z" source="10.0.2.15:37097" destination="10.0.2.15:4444" ldapVersion="3" authDN="cn=Directory Manager,cn=Root DNs,cn=config" security="TLS" opsInProgress="0"
connection: connID="77" connectTime="20140918181007Z" source="127.0.0.1:40796" destination="127.0.0.1:389" ldapVersion="3" authDN="cn=Directory Manager,cn=Root DNs,cn=config" security="none" opsInProgress="1"

dn: cn=Client Connections,cn=LDAP Connection Handler 0.0.0.0 port 389,cn=monitor
objectClass: top
objectClass: ds-monitor-entry
objectClass: extensibleObject
cn: Client Connections
connection: connID="77" connectTime="20140918181007Z" source="127.0.0.1:40796" destination="127.0.0.1:389" ldapVersion="3" authDN="cn=Directory Manager,cn=Root DNs,cn=config" security="none" opsInProgress="1"

Where:

  • The first client connections handler is a global view of all connections accessing all connection handlers. In this case, both the Admin handler and the LDAP handler have active connections and each connection is shown in the global "cn=Client Connections,cn=monitor" entry.
  • The second client connections handler shows only those connections that have active LDAP connections. As shown, the connection id 77 (connID="77") is seen in the LDAP Connection Handler as well as the global Client Connections entry. This is also the same connection ID that is shown in the access logs, for example:
    connection: connID="77"...
    
    [15/Oct/2016:14:43:19 -0600] CONNECT conn=77 from=127.0.0.1:54259 to=127.0.0.1:1389 protocol=LDAP
    [15/Oct/2016:14:43:19 -0600] BIND REQ conn=77 op=0 msgID=1 version=3 type=SIMPLE dn="cn=Directory Manager"
    ...

See Also

FAQ: Monitoring DS/OpenDJ

How do I perform a heartbeat check against DS/OpenDJ (All versions)?

How do I check if a backend is online in DS/OpenDJ (All versions)?

FAQ: DS/OpenDJ performance and tuning

Administration Guide › Monitoring, Logging, and Alerts

Related Training

ForgeRock Directory Services Core Concepts

Related Issue Tracker IDs

OPENDJ-1479 (monitoring data / attributes shown for 'cn=monitor' are not documented)

OPENDJ-1882 (currentConnections from cn=monitor is not decremented when JMX connections close)


Logging


Where can I find useful logs for troubleshooting ForgeRock products?

The purpose of this article is to provide information on finding logs across ForgeRock products (AM/OpenAM, DS/OpenDJ, IDM/OpenIDM and IG/OpenIG) for troubleshooting purposes. This article shows default log locations, although most of them can be changed.

AM/OpenAM logs

The following logs are available:

  • AM/OpenAM debug logs: $HOME/[am_instance]/openam/debug
  • AM/OpenAM audit logs: $HOME/[am_instance]/openam/log
  • ssoadm logs: /path/to/openam-tools/[ssoadm_install_folder]/debug
  • Fedlet debug logs: $HOME/fedlet/debug
  • Container logs if deployed on Apache Tomcat™ (localhost_access_log.YYYY-MM-DD.log and catalina.out): /path/to/tomcat/logs
  • SSL debug logs if deployed on Apache Tomcat (catalina.out): /path/to/tomcat/logs

Policy Agents logs

The following logs are available:

  • Web policy agents debug.log: /path/to/policy_agent/instances/agent_n/logs/debug directory where the policy agent is installed.
  • Web policy agents audit.log: /path/to/policy_agent/instances/agent_n/logs/audit directory where the policy agent is installed.
  • Web policy agents install log (installYYYYMMDDHHMMSS.log): /path/to/policy_agent/log directory where the policy agent is installed.
  • Web policy agents system information (system_n.log for Agents 5.5 and later; agent.log for pre-5.5 agents): /path/to/policy_agent/log directory where the policy agent is installed.
  • Java agents debug log (amAgent): /path/to/policy_agent/agent_n/logs/debug directory where the policy agent is installed.
  • Java agents audit logs: /path/to/policy_agent/agent_n/logs/audit directory where the policy agent is installed.
  • Java agents install logs: /path/to/policy_agent/installer-logs directory where the policy agent is installed.

DS/OpenDJ logs

The following logs are available:

  • Embedded DS/OpenDJ logs (access, errors and replication): $HOME/[am_instance]/opends/logs
  • External DS/OpenDJ logs (access, errors and replication): /path/to/ds/logs directory where DS/OpenDJ is installed.
  • SSL debug logs (server.out): /path/to/ds/logs directory where DS/OpenDJ is installed.

IDM/OpenIDM logs

The following logs are available:

  • IDM/OpenIDM log (openidm0.log.n): /path/to/idm/logs
  • IDM/OpenIDM audit logs: /path/to/idm/audit
  • Jetty® Request logs: /path/to/idm/logs
  • DS/OpenDJ password sync plugin: /path/to/ds/logs directory where DS/OpenDJ is installed.
  • Active Directory® password sync plugin (idm.log): the location is specified in the logPath registry entry under the HKEY_LOCAL_MACHINE\SOFTWARE\ForgeRock\OpenIDM\PasswordSync registry key.

IG/OpenIG logs

The following logs are available:

  • IG 5 and later: $HOME/.openig/logs
  • OpenIG 3.x and 4.x: stdout (catalina.out if you are using Apache Tomcat™) or /tmp/gateway.log, depending on how you have configured logging.
  • SAML logs: $HOME/.openig/SAML/debug

See Also

How do I collect all the data required for troubleshooting AM/OpenAM and Policy Agents (All versions)?

How do I enable debug logging for troubleshooting Policy Agents (All versions)?

How do I use the Support Extract tool in DS/OpenDJ (All versions) to capture troubleshooting information?

How do I collect debug logs in IG/OpenIG (All versions) for troubleshooting?

Troubleshooting AM/OpenAM and Policy Agents

Troubleshooting DS/OpenDJ

Troubleshooting IDM/OpenIDM

Troubleshooting IG/OpenIG

Related Training

N/A

Related Issue Tracker IDs

N/A


How do I trace transactions through the audit logs for troubleshooting across ForgeRock products?

The purpose of this article is to provide information on tracing transactions through the audit logs to help with troubleshooting across ForgeRock products. This article covers an IG/OpenIG --> AM/OpenAM --> DS/OpenDJ flow to demonstrate this usage and applies to AM/OpenAM (All versions) and OpenAM 13.x, DS/OpenDJ (All versions) and IG/OpenIG (All versions).

Overview

A Trust Transaction Header system property (org.forgerock.http.TrustTransactionHeader) exists to allow the propagation of transaction IDs across ForgeRock products using the HTTP header X-ForgeRock-TransactionId. This header can also be passed to ForgeRock products from client applications so that the same transaction ID is used in client applications and ForgeRock products. For example, in flows such as Client Application --> IG/OpenIG --> AM/OpenAM --> DS/OpenDJ or Client Application --> AM/OpenAM --> DS/OpenDJ.

By default, this system property is set to false so that a malicious actor cannot flood the system with requests using the same transaction ID header to hide their tracks. Once enabled, you can trace a single transaction across multiple products in order to troubleshoot their interactions. See the relevant product documentation for further information on enabling this property in each product.

This article demonstrates using this HTTP header to trace a single transaction through the audit logs for an IG/OpenIG --> AM/OpenAM --> DS/OpenDJ flow and includes the same transaction in the Apache Tomcat™ access logs for a complete view of the transaction's journey.

Note

A related article exists for tracing transaction IDs through IDM/OpenIDM logs using the thread ID: How do I add Thread IDs to log statements in IDM/OpenIDM (All versions)?

Tracing transaction IDs

The following example demonstrates configuring the products to propagate transaction IDs and then verifying that they appear in the logs, including the Tomcat log:

  1. Configure amHandler in IG/OpenIG to use the ForgeRockClientHandler in place of the ClientHandler. This change ensures the X-ForgeRock-TransactionId header values are passed to AM/OpenAM. For example, this demonstration is based on making this change to the amHandler in PolicyEnforcementFilter. See Configuration Reference › PolicyEnforcementFilter for further details. 
  2. Enable audit logging in IG/OpenIG as detailed in Gateway Guide › Auditing.
  3. Configure DS/OpenDJ to trust transaction IDs sent with client application requests using dsconfig, for example:
    $ ./dsconfig set-global-configuration-prop --advanced --port 4444 --hostname ds1.example.com --bindDN "cn=Directory Manager" --bindPassword password --set trust-transaction-ids:true --trustAll --no-prompt 
  4. Configure AM/OpenAM to trust any incoming X-ForgeRock-TransactionId headers as detailed in Setup and Maintenance Guide › Configuring the Trust Transaction Header System Property.
  5. Update Tomcat's server.xml file to log the X-ForgeRock-Transactionid. You should modify the pattern to match the following example for the org.apache.catalina.valves.AccessLogValve className:
    <Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
          prefix="localhost_access_log" suffix=".txt"
          pattern="%h %l %u %t &quot;%r&quot; %s %b %{X-ForgeRock-Transactionid}i" />
    
  6. Invoke the IG/OpenIG route.
  7. Note the transaction ID and then examine the following logs to trace the progress and corresponding operations of the request. In this example, we search the logs for transaction ID "7db465af-c0fc-4713-b1ac-0d2c889ea1b7-27":
    • IG/OpenIG access.audit.json log:
      {"realm":"/","transactionId":"2f34280f-c061-423c-8e64-adc9f8e12b95-508","userId":"id=jdoe,ou=user,o=employees,ou=services,dc=openam,dc=forgerock,dc=org","client":{"ip":"127.0.0.1","port":37796},"server":{"ip":"127.0.0.1","port":8989},"http":{"request":{"secure":false,"method":"POST","queryParameters":{"_action":["validate"]},"headers":{"accept-api-version":["protocol=1.0,resource=1.1"],"host":["host1.example.com:8989"],"user-agent":["Apache-HttpAsyncClient/4.1.2 (Java/1.8.0_151)"],"x-forgerock-transactionid":["7db465af-c0fc-4713-b1ac-0d2c889ea1b7-27/0"],"x-forwarded-for":["192.0.2.0"]},"cookies":{},"path":"http://host1.example.com:8989/openam/json/sessions"}},"request":{"protocol":"CREST","operation":"ACTION","detail":{"action":"validate"}},"timestamp":"2018-05-18T14:32:38.987Z","eventName":"AM-ACCESS-OUTCOME","component":"Session","response":{"status":"SUCCESSFUL","statusCode":"","elapsedTime":11,"elapsedTimeUnits":"MILLISECONDS"},"trackingIds":["2f34280f-c061-423c-8e64-adc9f8e12b95-469"],"_id":"2f34280f-c061-423c-8e64-adc9f8e12b95-510"}
      {"realm":"/","transactionId":"2f34280f-c061-423c-8e64-adc9f8e12b95-514","userId":"id=jdoe,ou=user,o=employees,ou=services,dc=openam,dc=forgerock,dc=org","client":{"ip":"192.0.2.0","port":55437},"server":{"ip":"192.0.2.24","port":8989},"http":{"request":{"secure":false,"method":"POST","queryParameters":{"_action":["getSessionInfo"]},"headers":{"accept":["application/json, text/javascript, */*; q=0.01"],"accept-api-version":["protocol=1.0,resource=2.0"],"dnt":["1"],"host":["host1.example.com:8989"],"referer":["http://host1.example.com:8989/openam/XUI/?service=LDAP&realm=/employees&goto=http://ig1.example.com:9292/home/pep"],"user-agent":["Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:59.0) Gecko/20100101 Firefox/59.0"],"x-requested-with":["XMLHttpRequest"]},"cookies":{"i18next":"en-US"},"path":"http://host1.example.com:8989/openam/json/sessions"}},"request":{"protocol":"CREST","operation":"ACTION","detail":{"action":"getSessionInfo"}},"timestamp":"2018-05-18T14:32:39.009Z","eventName":"AM-ACCESS-OUTCOME","component":"Session","response":{"status":"SUCCESSFUL","statusCode":"","elapsedTime":1,"elapsedTimeUnits":"MILLISECONDS"},"trackingIds":["2f34280f-c061-423c-8e64-adc9f8e12b95-469"],"_id":"2f34280f-c061-423c-8e64-adc9f8e12b95-518"}
      {"realm":"/employees","transactionId":"7db465af-c0fc-4713-b1ac-0d2c889ea1b7-27/1","userId":"id=policyAdmin,ou=user,o=employees,ou=services,dc=openam,dc=forgerock,dc=org","timestamp":"2018-05-18T14:32:39.057Z","eventName":"AM-ACCESS-OUTCOME","component":"Authentication","response":{"status":"SUCCESSFUL","statusCode":"","elapsedTime":54,"elapsedTimeUnits":"MILLISECONDS"},"client":{"ip":"127.0.0.1","port":37797},"http":{"request":{"secure":false,"method":"POST","path":"http://host1.example.com:8989/openam/json/employees/authenticate","queryParameters":{},"headers":{"Accept-API-Version":["protocol=2.1"],"host":["host1.example.com:8989"],"user-agent":["Apache-HttpAsyncClient/4.1.2 (Java/1.8.0_151)"],"x-forgerock-transactionid":["7db465af-c0fc-4713-b1ac-0d2c889ea1b7-27/1"],"x-forwarded-for":["192.0.2.0"],"x-openam-username":["policyAdmin"]},"cookies":{}}},"trackingIds":["2f34280f-c061-423c-8e64-adc9f8e12b95-516"],"_id":"2f34280f-c061-423c-8e64-adc9f8e12b95-527"}
      {"realm":"/employees","transactionId":"2f34280f-c061-423c-8e64-adc9f8e12b95-529","userId":"id=policyAdmin,ou=user,o=employees,ou=services,dc=openam,dc=forgerock,dc=org","client":{"ip":"127.0.0.1","port":37796},"server":{"ip":"127.0.0.1","port":8989},"http":{"request":{"secure":false,"method":"POST","path":"http://host1.example.com:8989/openam/json/employees/policies","queryParameters":{"_action":["evaluate"]},"headers":{"accept-api-version":["protocol=1.0,resource=2.0"],"host":["host1.example.com:8989"],"user-agent":["Apache-HttpAsyncClient/4.1.2 (Java/1.8.0_151)"],"x-forgerock-transactionid":["7db465af-c0fc-4713-b1ac-0d2c889ea1b7-27/2"],"x-forwarded-for":["192.0.2.0"]},"cookies":{}}},"request":{"protocol":"CREST","operation":"ACTION","detail":{"action":"evaluate"}},"timestamp":"2018-05-18T14:32:39.083Z","eventName":"AM-ACCESS-OUTCOME","component":"Policy","response":{"status":"SUCCESSFUL","statusCode":"","elapsedTime":20,"elapsedTimeUnits":"MILLISECONDS","detail":[{"advices":{},"ttl":9223372036854775807,"resource":"http://igApp.example.com:8081/home/pep","actions":{"POST":true,"GET":true},"attributes":{"givenName":["Jane"],"dn":["uid=jdoe,ou=People,dc=example,dc=com"],"isMemberOf":[],"cn":["Jane Doe"]}}]},"trackingIds":["2f34280f-c061-423c-8e64-adc9f8e12b95-516"],"_id":"2f34280f-c061-423c-8e64-adc9f8e12b95-533"}
      
    • Tomcat localhost_access_log file:
      127.0.0.1 - - [18/May/2018:10:32:38 -0400] "POST /openam/json/sessions?_action=validate HTTP/1.1" 200 61 7db465af-c0fc-4713-b1ac-0d2c889ea1b7-27/0
      192.0.2.0 - - [18/May/2018:10:32:38 -0400] "GET /openam/XUI/org/forgerock/openam/ui/common/models/JSONSchema.js?v=5.5.1 HTTP/1.1" 200 12619 -
      192.0.2.0 - - [18/May/2018:10:32:38 -0400] "GET /openam/XUI/org/forgerock/openam/ui/common/models/JSONValues.js?v=5.5.1 HTTP/1.1" 200 16418 -
      192.0.2.0 - - [18/May/2018:10:32:38 -0400] "GET /openam/XUI/org/forgerock/openam/ui/admin/services/SMSServiceUtils.js?v=5.5.1 HTTP/1.1" 200 6112 -
      192.0.2.0 - - [18/May/2018:10:32:38 -0400] "GET /openam/XUI/org/forgerock/openam/ui/common/util/Promise.js?v=5.5.1 HTTP/1.1" 200 1827 -
      192.0.2.0 - - [18/May/2018:10:32:39 -0400] "GET /openam/XUI/org/forgerock/openam/ui/common/models/schemaTransforms/transformEnumTypeToString.js?v=5.5.1 HTTP/1.1" 200 838 -
      192.0.2.0 - - [18/May/2018:10:32:39 -0400] "GET /openam/XUI/org/forgerock/openam/ui/common/models/schemaTransforms/transformBooleanTypeToCheckboxFormat.js?v=5.5.1 HTTP/1.1" 200 643 -
      192.0.2.0 - - [18/May/2018:10:32:39 -0400] "GET /openam/XUI/org/forgerock/openam/ui/common/models/schemaTransforms/warnOnInferredPasswordWithoutFormat.js?v=5.5.1 HTTP/1.1" 200 999 -
      192.0.2.0 - - [18/May/2018:10:32:39 -0400] "POST /openam/json/sessions?_action=getSessionInfo HTTP/1.1" 200 297 -
      127.0.0.1 - - [18/May/2018:10:32:39 -0400] "POST /openam/json/employees/authenticate HTTP/1.1" 200 164 7db465af-c0fc-4713-b1ac-0d2c889ea1b7-27/1
      127.0.0.1 - - [18/May/2018:10:32:39 -0400] "POST /openam/json/employees/policies?_action=evaluate HTTP/1.1" 200 266 7db465af-c0fc-4713-b1ac-0d2c889ea1b7-27/2​
      
    • AM/OpenAM access.audit.json file:
      {"realm":"/","transactionId":"2f34280f-c061-423c-8e64-adc9f8e12b95-508","userId":"id=jdoe,ou=user,o=employees,ou=services,dc=openam,dc=forgerock,dc=org","client":{"ip":"127.0.0.1","port":37796},"server":{"ip":"127.0.0.1","port":8989},"http":{"request":{"secure":false,"method":"POST","queryParameters":{"_action":["validate"]},"headers":{"accept-api-version":["protocol=1.0,resource=1.1"],"host":["host1.example.com:8989"],"user-agent":["Apache-HttpAsyncClient/4.1.2 (Java/1.8.0_151)"],"x-forgerock-transactionid":["7db465af-c0fc-4713-b1ac-0d2c889ea1b7-27/0"],"x-forwarded-for":["192.0.2.0"]},"cookies":{},"path":"http://host1.example.com:8989/openam/json/sessions"}},"request":{"protocol":"CREST","operation":"ACTION","detail":{"action":"validate"}},"timestamp":"2018-05-18T14:32:38.987Z","eventName":"AM-ACCESS-OUTCOME","component":"Session","response":{"status":"SUCCESSFUL","statusCode":"","elapsedTime":11,"elapsedTimeUnits":"MILLISECONDS"},"trackingIds":["2f34280f-c061-423c-8e64-adc9f8e12b95-469"],"_id":"2f34280f-c061-423c-8e64-adc9f8e12b95-510"}
      {"realm":"/","transactionId":"2f34280f-c061-423c-8e64-adc9f8e12b95-514","userId":"id=jdoe,ou=user,o=employees,ou=services,dc=openam,dc=forgerock,dc=org","client":{"ip":"192.0.2.0","port":55437},"server":{"ip":"192.0.2.24","port":8989},"http":{"request":{"secure":false,"method":"POST","queryParameters":{"_action":["getSessionInfo"]},"headers":{"accept":["application/json, text/javascript, */*; q=0.01"],"accept-api-version":["protocol=1.0,resource=2.0"],"dnt":["1"],"host":["host1.example.com:8989"],"referer":["http://host1.example.com:8989/openam/XUI/?service=LDAP&realm=/employees&goto=http://ig1.example.com:9292/home/pep"],"user-agent":["Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:59.0) Gecko/20100101 Firefox/59.0"],"x-requested-with":["XMLHttpRequest"]},"cookies":{"i18next":"en-US"},"path":"http://host1.example.com:8989/openam/json/sessions"}},"request":{"protocol":"CREST","operation":"ACTION","detail":{"action":"getSessionInfo"}},"timestamp":"2018-05-18T14:32:39.009Z","eventName":"AM-ACCESS-OUTCOME","component":"Session","response":{"status":"SUCCESSFUL","statusCode":"","elapsedTime":1,"elapsedTimeUnits":"MILLISECONDS"},"trackingIds":["2f34280f-c061-423c-8e64-adc9f8e12b95-469"],"_id":"2f34280f-c061-423c-8e64-adc9f8e12b95-518"}
      {"realm":"/employees","transactionId":"7db465af-c0fc-4713-b1ac-0d2c889ea1b7-27/1","userId":"id=policyAdmin,ou=user,o=employees,ou=services,dc=openam,dc=forgerock,dc=org","timestamp":"2018-05-18T14:32:39.057Z","eventName":"AM-ACCESS-OUTCOME","component":"Authentication","response":{"status":"SUCCESSFUL","statusCode":"","elapsedTime":54,"elapsedTimeUnits":"MILLISECONDS"},"client":{"ip":"127.0.0.1","port":37797},"http":{"request":{"secure":false,"method":"POST","path":"http://host1.example.com:8989/openam/json/employees/authenticate","queryParameters":{},"headers":{"Accept-API-Version":["protocol=2.1"],"host":["host1.example.com:8989"],"user-agent":["Apache-HttpAsyncClient/4.1.2 (Java/1.8.0_151)"],"x-forgerock-transactionid":["7db465af-c0fc-4713-b1ac-0d2c889ea1b7-27/1"],"x-forwarded-for":["192.0.2.0"],"x-openam-username":["policyAdmin"]},"cookies":{}}},"trackingIds":["2f34280f-c061-423c-8e64-adc9f8e12b95-516"],"_id":"2f34280f-c061-423c-8e64-adc9f8e12b95-527"}
      {"realm":"/employees","transactionId":"2f34280f-c061-423c-8e64-adc9f8e12b95-529","userId":"id=policyAdmin,ou=user,o=employees,ou=services,dc=openam,dc=forgerock,dc=org","client":{"ip":"127.0.0.1","port":37796},"server":{"ip":"127.0.0.1","port":8989},"http":{"request":{"secure":false,"method":"POST","path":"http://host1.example.com:8989/openam/json/employees/policies","queryParameters":{"_action":["evaluate"]},"headers":{"accept-api-version":["protocol=1.0,resource=2.0"],"host":["host1.example.com:8989"],"user-agent":["Apache-HttpAsyncClient/4.1.2 (Java/1.8.0_151)"],"x-forgerock-transactionid":["7db465af-c0fc-4713-b1ac-0d2c889ea1b7-27/2"],"x-forwarded-for":["192.0.2.0"]},"cookies":{}}},"request":{"protocol":"CREST","operation":"ACTION","detail":{"action":"evaluate"}},"timestamp":"2018-05-18T14:32:39.083Z","eventName":"AM-ACCESS-OUTCOME","component":"Policy","response":{"status":"SUCCESSFUL","statusCode":"","elapsedTime":20,"elapsedTimeUnits":"MILLISECONDS","detail":[{"advices":{},"ttl":9223372036854775807,"resource":"http://igApp.example.com:8081/home/pep","actions":{"POST":true,"GET":true},"attributes":{"givenName":["Jane"],"dn":["uid=jdoe,ou=People,dc=example,dc=com"],"isMemberOf":[],"cn":["Jane Doe"]}}]},"trackingIds":["2f34280f-c061-423c-8e64-adc9f8e12b95-516"],"_id":"2f34280f-c061-423c-8e64-adc9f8e12b95-533"}
      
    • DS/OpenDJ ldap-access.audit.json file:
      {"eventName":"DJ-LDAP","client":{"ip":"127.0.0.1","port":40459},"server":{"ip":"127.0.0.1","port":3389},"request":{"protocol":"LDAP","operation":"SEARCH","connId":20,"msgId":24,"dn":"dc=example,dc=com","scope":"sub","filter":"(uid=policyAdmin)","attrs":["dn","uid"]},"transactionId":"7db465af-c0fc-4713-b1ac-0d2c889ea1b7-27/1/0","response":{"status":"SUCCESSFUL","statusCode":"0","elapsedTime":1,"elapsedTimeUnits":"MILLISECONDS","nentries":1},"timestamp":"2018-05-18T14:32:39.018Z","_id":"7d2d88aa-26f4-4ca6-a248-0e1486292159-124403"}
      {"eventName":"DJ-LDAP","client":{"ip":"127.0.0.1","port":40460},"server":{"ip":"127.0.0.1","port":3389},"request":{"protocol":"LDAP","operation":"BIND","connId":21,"msgId":24,"version":"3","dn":"uid=policyAdmin,ou=people,dc=example,dc=com","authType":"SIMPLE"},"transactionId":"7db465af-c0fc-4713-b1ac-0d2c889ea1b7-27/1/1","response":{"status":"SUCCESSFUL","statusCode":"0","elapsedTime":1,"elapsedTimeUnits":"MILLISECONDS"},"userId":"uid=policyAdmin,ou=people,dc=example,dc=com","timestamp":"2018-05-18T14:32:39.020Z","_id":"7d2d88aa-26f4-4ca6-a248-0e1486292159-124405"}
      {"eventName":"DJ-LDAP","client":{"ip":"127.0.0.1","port":40453},"server":{"ip":"127.0.0.1","port":3389},"request":{"protocol":"LDAP","operation":"SEARCH","connId":18,"msgId":43,"dn":"uid=policyAdmin,ou=people,dc=example,dc=com","scope":"base","filter":"(objectClass=*)","attrs":["inetuserstatus"]},"transactionId":"7db465af-c0fc-4713-b1ac-0d2c889ea1b7-27/1/2","response":{"status":"SUCCESSFUL","statusCode":"0","elapsedTime":1,"elapsedTimeUnits":"MILLISECONDS","nentries":1},"timestamp":"2018-05-18T14:32:39.023Z","_id":"7d2d88aa-26f4-4ca6-a248-0e1486292159-124407"}
      {"eventName":"DJ-LDAP","client":{"ip":"127.0.0.1","port":40453},"server":{"ip":"127.0.0.1","port":3389},"request":{"protocol":"LDAP","operation":"SEARCH","connId":18,"msgId":44,"dn":"uid=policyAdmin,ou=people,dc=example,dc=com","scope":"base","filter":"(objectClass=*)","attrs":["inetuserstatus"]},"transactionId":"7db465af-c0fc-4713-b1ac-0d2c889ea1b7-27/1/3","response":{"status":"SUCCESSFUL","statusCode":"0","elapsedTime":1,"elapsedTimeUnits":"MILLISECONDS","nentries":1},"timestamp":"2018-05-18T14:32:39.025Z","_id":"7d2d88aa-26f4-4ca6-a248-0e1486292159-124409"}
      {"eventName":"DJ-LDAP","client":{"ip":"127.0.0.1","port":40453},"server":{"ip":"127.0.0.1","port":3389},"request":{"protocol":"LDAP","operation":"SEARCH","connId":18,"msgId":45,"dn":"ou=groups,dc=example,dc=com","scope":"sub","filter":"(&(objectclass=groupOfUniqueNames)(uniqueMember=uid=policyAdmin,ou=people,dc=example,dc=com))","attrs":["dn"]},"transactionId":"7db465af-c0fc-4713-b1ac-0d2c889ea1b7-27/2/0","response":{"status":"SUCCESSFUL","statusCode":"0","elapsedTime":1,"elapsedTimeUnits":"MILLISECONDS","nentries":1},"timestamp":"2018-05-18T14:32:39.070Z","_id":"7d2d88aa-26f4-4ca6-a248-0e1486292159-124411"}
      {"eventName":"DJ-LDAP","client":{"ip":"127.0.0.1","port":40453},"server":{"ip":"127.0.0.1","port":3389},"request":{"protocol":"LDAP","operation":"SEARCH","connId":18,"msgId":46,"dn":"ou=groups,dc=example,dc=com","scope":"sub","filter":"(&(objectclass=groupOfUniqueNames)(uniqueMember=uid=policyAdmin,ou=people,dc=example,dc=com))","attrs":["dn"]},"transactionId":"7db465af-c0fc-4713-b1ac-0d2c889ea1b7-27/2/3","response":{"status":"SUCCESSFUL","statusCode":"0","elapsedTime":1,"elapsedTimeUnits":"MILLISECONDS","nentries":1},"timestamp":"2018-05-18T14:32:39.079Z","_id":"7d2d88aa-26f4-4ca6-a248-0e1486292159-124413"}
      {"eventName":"DJ-LDAP","client":{"ip":"127.0.0.1","port":40272},"server":{"ip":"127.0.0.1","port":3389},"request":{"protocol":"LDAP","operation":"SEARCH","connId":16,"msgId":56,"dn":"","scope":"base","filter":"(objectClass=*)","attrs":["1.1"]},"transactionId":"0","response":{"status":"SUCCESSFUL","statusCode":"0","elapsedTime":1,"elapsedTimeUnits":"MILLISECONDS","nentries":1},"timestamp":"2018-05-18T14:32:41.824Z","_id":"7d2d88aa-26f4-4ca6-a248-0e1486292159-124415"}
      {"eventName":"DJ-LDAP","client":{"ip":"127.0.0.1","port":40273},"server":{"ip":"127.0.0.1","port":3389},"request":{"protocol":"LDAP","operation":"SEARCH","connId":17,"msgId":57,"dn":"","scope":"base","filter":"(objectClass=*)","attrs":["1.1"]},"transactionId":"0","response":{"status":"SUCCESSFUL","statusCode":"0","elapsedTime":0,"elapsedTimeUnits":"MILLISECONDS","nentries":1},"timestamp":"2018-05-18T14:32:41.879Z","_id":"7d2d88aa-26f4-4ca6-a248-0e1486292159-124417"}
      
Note

Tracing the journey of a single transaction can be very effective in locating the root cause when it is not possible to go by timestamp alone. For example, if you are experiencing delays in response times, you can see whether a specific user, or group of users, is missing expected attributes or needs special consideration during authentication (AuthN) processing of a request.
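For example, you can follow a single transaction across the Tomcat, AM/OpenAM and DS/OpenDJ logs shown above by grepping for its transaction ID. This is a minimal sketch based on the example log file names used in this article; adjust the file names and paths for your environment:

$ grep '7db465af-c0fc-4713-b1ac-0d2c889ea1b7-27' localhost_access_log.*.txt access.audit.json ldap-access.audit.json*

The matching lines include the sub-transaction IDs (for example, -27/1 and -27/1/0), so a single grep shows the container request, the AM/OpenAM authentication and policy evaluation, and the resulting DS/OpenDJ LDAP operations together.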

See Also

How do I configure audit logging via ssoadm in AM (All versions) and OpenAM 13.x?

How do I use the Access log to troubleshoot DS/OpenDJ (All versions)?

Troubleshooting AM/OpenAM and Policy Agents

Troubleshooting DS/OpenDJ

Troubleshooting IG/OpenIG

Related Training

N/A

Related Issue Tracker IDs

N/A


How do I use the Access log to troubleshoot DS/OpenDJ (All versions)?

The purpose of this article is to provide information on understanding the Access log to help you troubleshoot DS/OpenDJ.

Overview

This article provides information to help you understand the Access log and use it to troubleshoot DS/OpenDJ. The following topics are covered:

You can also analyze operation performance in the log files using a tool called slowops; the source for the slowops tool can be found here: GitHub: OpenDJ Utilities - slowops.

Log details

In DS 5, the default log format changed to JSON, with each entry providing details of the request (REQ) and the response (RES). Earlier versions of OpenDJ use a file-based log format. This article shows examples based on both the JSON and file-based log formats; each set of examples contains the same details to help you correlate the two formats.

In all versions of DS/OpenDJ, you can choose which log format to use: Administration Guide › Monitoring, Logging, and Alerts › Access Logs. The following table indicates which log formats are available depending on your version of DS/OpenDJ:

Access log format   DS 6.x          DS 5.x          OpenDJ 3.5.x    OpenDJ 3.0
File-based          --              --              Yes (default)   Yes (default)
JSON                Yes (default)   Yes (default)   --              --
CSV                 Yes             Yes             Yes             Yes
Syslog              Yes             Yes             Yes             Yes
JDBC                Yes             Yes             Yes             Yes
Elasticsearch       Yes             Yes             Yes             --
JMS                 Yes             Yes             --              --
Splunk              Yes             Yes             --              --

The log files are located in the following directory depending on whether you are using an embedded or external DS/OpenDJ:

  • Embedded: $HOME/[am_instance]/opends/logs
  • External: /path/to/ds/logs, where /path/to/ds is the directory in which DS/OpenDJ is installed.

The default log files are named as follows depending on version:

  • DS 5 and later: the files are called ldap-access.audit.json.<datestamp>, for example: ldap-access.audit.json.20170722112257
  • Pre-DS 5: the files are called access.<datestamp>, for example, access.20170707085044Z

Identifying source of the change

You can identify whether a change was made locally or via replication by checking the logs as follows depending on your DS/OpenDJ version:

DS 5 and later

  • Local changes are made on a real connection as indicated by actual client/server addresses and the connection value ("connId":nnnn), for example:
    {"eventName":"DJ-LDAP","client":{"ip":"198.51.100.0","port":8080},"server":{"ip":"198.51.100.0","port":1389},"request":{"protocol":"LDAP","operation":"ADD","connId":5799,"msgId":361,"dn":"uid=user1,ou=People,ou=employees,dc=example,dc=com"},"transactionId":"0"
    
  • Replicated changes are shown with internal IP addresses ("ip":"internal","port":-1), a negative connection value ("connId":-n) and a sync type operation ("opType":"sync"), for example:
    {"eventName":"DJ-LDAP","client":{"ip":"internal","port":-1},"server":{"ip":"internal","port":-1},"request":{"protocol":"internal","operation":"ADD","connId":-4,"msgId":361,"opType":"sync","dn":"uid=user101,ou=People,ou=employees,dc=example,dc=com"},"transactionId":"0"

See How do I troubleshoot replication issues in DS/OpenDJ (All versions)? for further information on troubleshooting any replication issues that are identified. 

Pre-DS 5

  • Local changes are made on a real connection (conn=nnnn), for example:
    [21/Jul/2017:15:47:01 +0000] ADD REQ conn=5799 op=210 msgID=361 dn="uid=user1,ou=People,ou=employees,dc=example,dc=com"
    
  • Replicated changes are shown with a conn=-1 (to indicate it's a replicated internal operation) and type=synchronization, for example:
    [21/Jul/2017:15:47:01 +0000] ADD REQ conn=-1 op=210 msgID=361 dn="uid=user101,ou=People,ou=employees,dc=example,dc=com" type=synchronization

See How do I troubleshoot replication issues in DS/OpenDJ (All versions)? for further information on troubleshooting any replication issues that are identified. 
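If you want to quantify how many changes arrived via replication rather than directly from clients, you can grep the access logs for the synchronization markers shown above. This is a minimal sketch based on the default log file names; adjust them for your environment:

  • DS 5 and later:
    $ grep -c '"opType":"sync"' ldap-access.audit.json*
  • Pre-DS 5:
    $ grep -c 'type=synchronization' access.*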

Identifying missing entries

Missing entries are indicated by a "statusCode":"32" or a result=32 in your logs (depending on DS/OpenDJ version). These log entries can indicate a genuine missing entry or have an innocent explanation (such as a client application testing to see if the entry exists by doing a read):

  • If the entry is really missing, you will need to add the entry or restructure your operation to use an entry that does exist.
  • If you have a misconfigured client application, for example pointing to the wrong base DN, you need to correct the configuration.
  • If you have a client application doing test reads, you can ignore these entries.

DS 5 and later

  • Search issue:
    "response":{"status":"FAILED","statusCode":"32","elapsedTime":2,"elapsedTimeUnits":"MILLISECONDS","detail":"The search base entry 'ou=agent1,ou=default,ou=1.0,ou=AgentService,ou=services,dc=example,dc=com' does not exist","nentries":0},"timestamp":"2017-07-21T15:04:53.850Z","_id":"2fff31b5-4682-4760-b32a-fbbfed6bd9ac-433"}
  • Add issues:
    "response":{"status":"FAILED","statusCode":"32","elapsedTime":1,"elapsedTimeUnits":"MILLISECONDS","detail":"The provided entry entryuuid=f3182f2c-0ed1-83e1-7d61-4f1c712e7fd2+ou=People,o=employees cannot be added because its suffix is not defined as one of the suffixes within the Directory Server"},"timestamp":"2017-07-21T15:04:53.850Z","_id":"2fff31b5-4682-4760-b32a-fbbfed6bd9ac-433"}
    "response":{"status":"FAILED","statusCode":"32","elapsedTime":0,"elapsedTimeUnits":"MILLISECONDS","detail":"Entry uid=jdoe,ou=People,ou=employees,dc=example,dc=com cannot be added because its parent entry ou=People,ou=employees,dc=example,dc=com does not exist in the server"},"timestamp":"2017-07-21T15:04:53.850Z","_id":"2fff31b5-4682-4760-b32a-fbbfed6bd9ac-433"}

Pre-DS 5

  • Search issue:
    [21/Jul/2017:15:04:51 +0800] SEARCH RES conn=189 op=213 msgID=14 result=32 message="The search base entry 'ou=agent1,ou=default,ou=1.0,ou=AgentService,ou=services,dc=example,dc=com' does not exist" nentries=0 etime=2
  • Add issues:
    [21/Jul/2017:15:04:51 +0800] ADD RES conn=-189 op=192 msgID=87 result=32 message="The provided entry entryuuid=f3182f2c-0ed1-83e1-7d61-4f1c712e7fd2+ou=People,o=employees cannot be added because its suffix is not defined as one of the suffixes within the Directory Server" etime=1
    [21/Jul/2017:15:04:51 +0800] ADD RES conn=189 op=176 msgID=29 result=32 message="Entry uid=jdoe,ou=People,ou=employees,dc=example,dc=com cannot be added because its parent entry ou=People,ou=employees,dc=example,dc=com does not exist in the server" etime=0
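To locate these failures in the first place, you can grep the access logs for the no-such-entry result code. This is a minimal sketch based on the default log file names; adjust them for your environment:

  • DS 5 and later:
    $ grep '"statusCode":"32"' ldap-access.audit.json*
  • Pre-DS 5:
    $ grep 'result=32' access.*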
    

Understanding slow searches

Slow searches are indicated by high etimes in the access log and are often the result of unindexed searches. Etimes are typically measured in milliseconds, although you can configure the server to use nanoseconds instead. You can also learn other things about the search, which may be causing an issue.

See How do I troubleshoot issues with my indexes in DS/OpenDJ (All versions)? for further information on resolving index issues.

The following examples show some common log entries and their meanings/resolution.

Unindexed search

  • DS 5 and later:
    {"eventName":"DJ-LDAP","client":{"ip":"198.51.100.0","port":8080},"server":{"ip":"198.51.100.0","port":1389},"request":{"protocol":"LDAP","operation":"SEARCH","connId":159078,"msgId":154,"dn":"ou=groups,dc=AMusers","scope":"wholeSubtree","filter":"(memberUid=JDoe)","attrs":["1.1"]},"transactionId":"0","response":{"status":"SUCCESSFUL","statusCode":"0","elapsedTime":11999,"elapsedTimeUnits":"MILLISECONDS","additionalItems":" unindexed","nentries":0},"timestamp":"2017-07-21T15:04:51.850Z","_id":"e3a44b25-4800-8cfa-1bad-6e2bb6461607-38233"}
  • Pre-DS 5:
    [21/Jul/2017:15:04:51 +0000] SEARCH REQ conn=159078 op=1403 msgID=154 base="ou=groups,dc=AMusers" scope=wholeSubtree filter="(memberUid=JDoe)" attrs="1.1"
    [21/Jul/2017:15:04:51 +0000] SEARCH RES conn=159078 op=1403 msgID=154 result=0 nentries=0 unindexed etime=11999

The search is shown as unindexed and has a high etime value (etime values should typically be 2 or 3). Even when the unindexed flag is not present, a high etime value is an indicator that the search may be unindexed.

This could be resolved by appropriate indexing as detailed in Unindexed searches causing slow searches and poor performance on DS/OpenDJ (All versions) server.
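For example, the following is a minimal sketch of adding an equality index for the memberUid attribute used in the filter above and then rebuilding it; the backend name, base DN, host name, port and credentials are illustrative, and the exact subcommand name differs in older OpenDJ versions:

$ ./dsconfig create-backend-index --backend-name userRoot --index-name memberUid --set index-type:equality --hostname ds1.example.com --port 4444 --bindDN "cn=Directory Manager" --bindPassword password --trustAll --no-prompt
$ ./rebuild-index --baseDN dc=example,dc=com --index memberUid --hostname ds1.example.com --port 4444 --bindDN "cn=Directory Manager" --bindPassword password --trustAll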

Unindexed search (objectclass=person filter)

  • DS 5 and later:
    {"eventName":"DJ-LDAP","client":{"ip":"198.51.100.0","port":8080},"server":{"ip":"198.51.100.0","port":1389},"request":{"protocol":"LDAP","operation":"SEARCH","connId":159078,"msgId":135,"dn":"dc=example,dc=com","scope":"sub","filter":"(objectclass=person)","attrs":"cn,memberof,ismemberof"},"transactionId":"0","response":{"status":"SUCCESSFUL","statusCode":"0","elapsedTime":11999,"elapsedTimeUnits":"MILLISECONDS","additionalItems":" unindexed","nentries":500},"timestamp":"2017-07-21T15:04:51.850Z","_id":"e3a44b25-4800-8cfa-1bad-6e2bb6461607-38233"}
  • Pre-DS 5:
    [21/Jul/2017:15:04:51 +0000] SEARCH REQ conn=159078 op=254 msgID=135 base="dc=example,dc=com" scope=sub filter="(objectclass=person)" attrs="cn,memberof,ismemberof"
    [21/Jul/2017:15:04:51 +0000] SEARCH RES conn=159078 op=254 msgID=135 result=0 nentries=500 unindexed etime=11999

This example also indicates the search is unindexed.

Additionally, you can see the search is using the objectclass=person filter. It is very likely that all user entries have an objectclass=person value, which would take the number of IDs maintained by the objectclass=person index over the index-entry-limit. This means all subsequent searches would be forced to search through the entire backend database to find the search candidates. These searches only hit the objectclass=person index; the requested return attributes listed do not touch any indexes, as the values for these attributes are simply returned by DS/OpenDJ.

This could be resolved by appropriate indexing and also by reconstructing the search to avoid using the objectclass=person filter.

Unindexed search (objectclass=* complex filter)

  • DS 5 and later:
    {"eventName":"DJ-LDAP","client":{"ip":"198.51.100.0","port":8080},"server":{"ip":"198.51.100.0","port":1389},"request":{"protocol":"LDAP","operation":"SEARCH","connId":159078,"msgId":135,"dn":"dc=example,dc=com","scope":"sub","filter":"(|(objectclass=*)(myAttr=12345))","attrs":"cn,uid,mail"},"transactionId":"0","response":{"status":"SUCCESSFUL","statusCode":"0","elapsedTime":20379,"elapsedTimeUnits":"MILLISECONDS","additionalItems":" unindexed","nentries":500},"timestamp":"2017-07-21T15:04:51.850Z","_id":"e3a44b25-4800-8cfa-1bad-6e2bb6461607-38233"}
  • Pre-DS 5:
    [21/Jul/2017:15:04:51 +0000] SEARCH REQ conn=159078 op=254 msgID=135 base="dc=example,dc=com" scope=sub filter="(|(objectclass=*)(myAttr=12345))" attrs="cn,uid,mail"
    [21/Jul/2017:15:04:51 +0000] SEARCH RES conn=159078 op=254 msgID=135 result=0 nentries=500 unindexed etime=20379

This example also indicates the search is unindexed.

Using objectclass=* is a presence filter, meaning it will match all objectclass values. Since all LDAP entries have at least one objectclass, this type of search will always be slow and come back unindexed unless your directory is tiny (fewer than 4,000 entries); above this size, searches for objectclass=* will always be unindexed.

To further troubleshoot these types of issue, you may also need to log internal operations. You can enable this logging using the following dsconfig command:

$ ./dsconfig set-log-publisher-prop --publisher-name "File-Based Access Logger" --set suppress-internal-operations:false --hostname ds1.example.com --port 4444 --bindDN "cn=Directory Manager" --bindPassword password

High etime

  • DS 5 and later:
    "response":{"status":"SUCCESSFUL","statusCode":"0","elapsedTime":1187,"elapsedTimeUnits":"MILLISECONDS","nentries":1},"timestamp":"2017-07-21T15:04:51.850Z","_id":"e3a44b25-4800-8cfa-1bad-6e2bb6461607-38233"}
  • Pre-DS 5:
    [21/Jul/2017:15:04:51 -0000] SEARCH RES conn=2220857 op=1335 msgID=1136 result=0 nentries=1 etime=1187

This example has a high etime but is not flagged as unindexed. The unindexed flag only appears in the access log if no indexes at all were used to process the search; a high etime without the flag therefore suggests a partially unindexed search.

This could be resolved by appropriate indexing.

Understanding slow write operations

Slow write operations can cause their own issues, but if they are too slow, they can eventually timeout causing the write to fail. The following example shows entry locks that prevent the write taking place:

  • DS 5 and later:
    {"eventName":"DJ-LDAP","client":{"ip":"198.51.100.0","port":8080},"server":{"ip":"198.51.100.0","port":1389},"request":{"protocol":"LDAP","operation":"MODIFY","connId":2131,"msgId":47,"dn":"uid=jdoe,ou=People,ou=employees,dc=example,dc=com"},"transactionId":"0","response":{"status":"FAILED","statusCode":"51","elapsedTime":9000,"elapsedTimeUnits":"MILLISECONDS","detail":"Entry uid=jdoe,ou=People,ou=employees,dc=example,dc=com cannot be modified because the server failed to obtain a write lock for this entry after multiple attempts"},"timestamp":"2017-08-08T14:22:43.683Z","_id":"c2a5dbd3-4960-b7d2-96e2-20370bf25b31-737"}
  • Pre-DS 5:
    [08/Aug/2017:14:22:43 -0000] MODIFY conn=2131 op=5428 msgID=47 dn="uid=jdoe,ou=People,ou=employees,dc=example,dc=com" result=51 message="Entry uid=jdoe,ou=People,ou=employees,dc=example,dc=com cannot be modified because the server failed to obtain a write lock for this entry after multiple attempts" etime=9000 

These types of errors indicate potential contention issues with clients updating the same entries at the same time. If one client updates an entry on a heavily loaded system and another client then tries to update the same entry, you will see these write errors.

Expensive indexes

Some index types (for example, substring) are expensive for the server to maintain, particularly with regards to write performance. Indexes should be configured for the types of searches specific to your client application's needs, and unnecessary indexes should be avoided. See the links in the See Also section below for further information.

Static groups

If you see writes taking place against static groups, for example:

  • DS 5 and later:
    {"eventName":"DJ-LDAP","client":{"ip":"198.51.100.0","port":8080},"server":{"ip":"198.51.100.0","port":1389},"request":{"protocol":"LDAP","operation":"MODIFY","connId":2131,"msgId":3,"dn":"cn=Admin1,ou=groups,ou=People,o=Test"},"transactionId":"0","response":{"status":"SUCCESSFUL","statusCode":"0","elapsedTime":26495,"elapsedTimeUnits":"MILLISECONDS"},"timestamp":"2017-05-23T09:56:50.683Z","_id":"c2a5dbd3-4960-b7d2-96e2-20370bf25b31-737"}
  • Pre-DS 5:
    [23/May/2017:09:56:50 -0000] MODIFY REQ conn=2131 op=2 msgID=3 dn="cn=Admin1,ou=groups,ou=People,o=Test"
    [23/May/2017:09:56:50 -0000] MODIFY RES conn=2131 op=2 msgID=3 result=0 etime=26495 

You need to ask yourself the following questions to help understand why these write operations are not as fast as expected:

  • How big is this static group? That is, how many uniqueMembers does it contain?
  • How is the application doing the MODIFY operation on these static groups? Is it replacing the whole group or just adding a single member?
  • Are you performing any isMemberOf searches while the group modifications are taking place? 

Finding out more about the operation

Once you have identified your issue, you can grep on a specific connId or conn value to discover more about that operation to help pinpoint the issue. For example:

Identify the origin of the call

Grepping for CONNECT in your logs for a specific connId or conn value will tell you where that call originated from; in these examples, the call came from the 198.51.100.0 IP address:

  • DS 5 and later:
    grep 'connId":159078' ldap-access.audit.* | grep CONNECT
    
    ldap-access.audit.json.20170722112257:{"eventName":"DJ-LDAP","client":{"ip":"198.51.100.0","port":8080},"server":{"ip":"198.51.100.0","port":1389},"request":{"protocol":"LDAP","operation":"CONNECT","connId":159078},"transactionId":"0","response":{"status":"SUCCESSFUL","statusCode":"0","elapsedTime":0,"elapsedTimeUnits":"MILLISECONDS"},"timestamp":"2017-07-21T00:38:33.437Z","_id":"3a6d298f-f9e7-4bbf-bc03-0b2c82224c15-1961"}
    
  • Pre-DS 5:
    grep conn=159078 access.* | grep CONNECT
    
    access.201704221607Z:[22/Apr/2017:11:28:43 +0000] CONNECT conn=159078 from=198.51.100.0:8080 to=198.51.100.0:1389 protocol=LDAP
    

Identify who made the call

Grepping for BIND or 'BIND REQ' in your logs for a specific connId or conn value will tell you about the authentication request; in these examples, the call was made by cn=Directory Manager:

  • DS 5 and later:
    grep 'connId":159078' ldap-access.audit.* | grep BIND
    
    ldap-access.audit.json.20170722112257:{"eventName":"DJ-LDAP","client":{"ip":"198.51.100.0","port":8080},"server":{"ip":"198.51.100.0","port":1389},"request":{"protocol":"LDAP","operation":"BIND","connId":159078,"msgId":1,"version":"3","dn":"cn=Directory Manager","authType":"SIMPLE"},"transactionId":"1f80c37c-436a-800f-9528-33ec2b6707be-0/16","response":{"status":"SUCCESSFUL","statusCode":"0","elapsedTime":37,"elapsedTimeUnits":"MILLISECONDS"},"userId":"cn=Directory Manager,cn=Root DNs,cn=config","timestamp":"2017-07-21T00:38:33.437Z","_id":"1f80c37c-800f-436a-9528-33ec2b6707be-661"}
    
  • Pre-DS 5:
    grep conn=159078 access.* | grep 'BIND REQ'
    
    access.201704221607Z:[22/Apr/2017:11:28:43 +0000] BIND REQ conn=159078 op=0 msgID=1 version=3 type=SIMPLE dn="cn=Directory\ Manager"
    

See Also

Unindexed searches causing slow searches and poor performance on DS/OpenDJ (All versions) server

How do I troubleshoot issues with my indexes in DS/OpenDJ (All versions)?

How do I use the Support Extract tool in DS/OpenDJ (All versions) to capture troubleshooting information?

How do I troubleshoot replication issues in DS/OpenDJ (All versions)?

How do I change the location of log files for DS/OpenDJ (All versions)?

How do I configure DS (All versions) and OpenDJ 3.x to use the Syslog audit event handler?

Troubleshooting DS/OpenDJ

Administration Guide › Monitoring, Logging, and Alerts › Access Logs

Administration Guide

Related Training

N/A

Related Issue Tracker IDs

OPENAM-11257 (Numerous "result=32" entries in the OpenDJ 's access log whenever OpenAM restarts )


How do I enable Garbage Collector (GC) Logging for DS/OpenDJ (All versions)?

The purpose of this article is to provide information on enabling GC Logging for DS/OpenDJ. It assumes you already have a working DS/OpenDJ server installed.

Enabling GC Logging

Note

You should ensure there is enough disk space for the GC logs; if not, you should enable GC log rotation as well by including the following options when you enable GC logging: -XX:+UseGCLogFileRotation, -XX:NumberOfGCLogFiles=n, -XX:GCLogFileSize=n
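For example, a GC logging configuration with rotation enabled might look like the following when appended to the options described below (the number and size of log files are illustrative values you should adjust for your environment):

-Xloggc:[filename] -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=10M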

 You can enable GC logging as follows:

  1. Locate and edit your /path/to/ds/config/java.properties file for the DS/OpenDJ instance for which you want to enable GC Logging.
  2. Append the following options to the entry start-ds.java-args=
    -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintGCCause -Xloggc:[filename]
    replacing [filename] with the path to the file that you would like to create to store the log file. For example:
    start-ds.java-args=-server -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintGCCause -Xloggc:/tmp/gc.log
  3. Pre-DS 5 only: Execute the /path/to/ds/bin/dsjavaproperties program (or the /path/to/ds/bat/dsjavaproperties.bat program if you are using Microsoft® Windows®). You should get the following output:
    The operation was successful. The server commands will use the java arguments and java home specified in the properties file located in /path/to/ds/config/java.properties. 
    The dsjavaproperties command has been removed in DS 5.
  4. Restart the DS/OpenDJ server.

Once the server has successfully restarted, there should be a GC log file located in the directory specified in the -Xloggc: option. You can use the Universal GC Log Analyzer to analyze your GC log. This is a third-party website that we suggest can be used for analysis but is not supported by ForgeRock.

See Also

Best practice for JVM Tuning

How do I tune DS/OpenDJ (All versions) process sizes: JVM heap and database cache?

How do I collect JVM data for troubleshooting DS/OpenDJ (All versions)?

FAQ: DS/OpenDJ performance and tuning

Administration Guide › Tuning Servers For Performance › Java Settings

Related Training

ForgeRock Directory Services Core Concepts 

Related Issue Tracker IDs

N/A


How do I collect JVM data for troubleshooting DS/OpenDJ (All versions)?

The purpose of this article is to provide information on collecting JVM data for troubleshooting DS/OpenDJ. JVM data (such as stack traces, heap histograms and heap dumps) can be very useful for troubleshooting DS/OpenDJ issues such as slow response times, consistently high CPU usage, Out of Memory errors and suspected memory leaks. This data must be collected before killing the affected process or restarting the server; otherwise the information relating to the affected process is lost forever, which may make it impossible to identify the root cause.

Collecting JVM data

Note

The following utilities are all JDK utilities and are not included with the JRE; you must have the JDK installed in your production environments in order to collect this data. Additionally, you must be the owner of the identified process to be able to collect this data.

To collect JVM data, you must first determine the process ID of the affected server process using the following command:

$ jps -v

You can then collect the following types of data:

Stack Traces

Stack traces allow us to see what is happening with JVM threads and are sometimes referred to as thread dumps. You should collect several different stack traces (minimum of three) at roughly one minute intervals to enable us to compare them. It is recommended that you also identify the threads associated with the process ID for troubleshooting high CPU as detailed in How do I find which thread is consuming CPU in a Java process in DS/OpenDJ (All versions)?

You can use the following commands to collect stack traces:

  • You can collect long listing stack traces using the jstack command:
    $ jstack -F -l [pid] > [outputfile.txt]
    replacing [pid] with the process ID of the affected server and [outputfile.txt] with the output file for the stack trace.
  • You can use the kill -3 command (Unix® and Linux® systems only), which consistently produces better results:
    $ kill -3 [pid]
    replacing [pid] with the process ID of the affected server.

The kill -3 command outputs the stack trace to the log file where the standard output is directed (typically the error log file). The kill -3 command does not stop the running process; it will continue to run after outputting the stack trace.
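If you want to automate collecting several stack traces, a minimal shell sketch such as the following can be used on Unix® and Linux® systems, assuming the JVM responds to jstack (the number of traces, interval and output file names are illustrative):

$ for i in 1 2 3; do jstack -l [pid] > stack-trace-$i.txt; sleep 60; done  # three traces, one minute apart

replacing [pid] with the process ID of the affected server.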

You can use the Java Thread Dump Analyzer to analyze your stack traces. This is a third-party website that we suggest can be used for analysis but is not supported by ForgeRock.

Heap Histograms

Heap histograms enable us to visualize the objects in the JVM heap - we are only interested in live objects. You can collect heap histograms using the jmap command:

$ jmap -histo:live [pid]

replacing [pid] with the process ID of the affected server.

Caution

The jmap utility must come from the same version of JVM that the container is running.

Heap Dumps

Heap dumps allow us to see what is happening with memory at the point the heap dump is taken, which can help us to identify high memory usage issues or memory leaks. You should collect heap dumps as close in time to the stack traces as possible to enable us to compare the contents. You can collect heap dumps using the jmap command:

$ jmap -F -dump:file=[target file] [pid]

replacing [target file] with the output file for the heap dump and [pid] with the process ID of the affected server.

Current JVM settings

You can find the current JVM settings for the server using one of the following commands (the jinfo command is the preferred way):

  • jinfo:
    $ jinfo [pid]
    replacing [pid] with the required process ID.
  • jps:
    $ jps -v

For Unix® and Linux® systems where you are using an embedded DS/OpenDJ (that is, where AM/OpenAM is running on an Apache Tomcat™ web container), you can also use: 

$ ps -ef | grep tomcat

See Also

How do I use the msnapshots script to capture information for troubleshooting DS/OpenDJ (All versions)?

How do I troubleshoot high CPU utilization on DS/OpenDJ (All versions) servers?

How do I enable Garbage Collector (GC) Logging for DS/OpenDJ (All versions)?

How do I tune DS/OpenDJ (All versions) process sizes: JVM heap and database cache?

Troubleshooting DS/OpenDJ

Related Training

N/A

Related Issue Tracker IDs

N/A


Performance Tuning


FAQ: DS/OpenDJ performance and tuning

The purpose of this FAQ is to provide answers to commonly asked questions regarding DS/OpenDJ performance and tuning.

Frequently asked questions

Q. Do I need to performance test DS/OpenDJ?

A. Yes, performance testing is a very important step to ensure that the system you have configured will cope with the expected load once it is in production. By performance testing your system, you can identify and resolve any performance issues before going live.

Before running any performance tests, you should ensure DS/OpenDJ is running and configured in a test environment that closely resembles your planned production environment. You should also be aware of the type of load expected in your production system and the types of user data involved to make your tests as realistic as possible.

Generating sample data in DS/OpenDJ is described in: How do I generate sample user data for performance testing in DS/OpenDJ (All versions)?

Q. How do I tune DS/OpenDJ to improve performance?

A. Tuning DS/OpenDJ is described in Administration Guide › Tuning Servers For Performance.

Q. What is the recommended way to load balance client connections?

A. In most scenarios, the way to load balance client connections is specific to your topology so we don't have definitive best practice recommendations. You may want to have pooled connections so that clients only open a set number of connections or your clients may not be able to pool connections and open individual connections for each request.

In either case, you may or may not want to load balance based on:

  • Number of opened connections.
  • System load of the DS/OpenDJ server.
  • Round robin.
  • Failover only.
  • Etc.

You should consider setting an idle time limit to ensure idle connections are closed; in particular, you must set this lower than the idle time limit for the load balancer as detailed in Administration Guide › Limiting Idle Time.

If you want more tailored advice, consider engaging Professional Services.

Q. What is the recommended Java® Virtual Machine (JVM) heap size for DS/OpenDJ?

A. There are no definitive rules for the JVM heap size required, as it will vary across individual environments and applications, but you should refer to Best practice for JVM Tuning for best practice advice. You can use the size of a backend after doing an initial import-ldif of all the expected data to give you an indication of the size required, and base your initial heap size on this.

See How do I tune DS/OpenDJ (All versions) process sizes: JVM heap and database cache? and Administration Guide › Java Settings for further information.

Note

You must ensure you configure JVM garbage collection appropriately as garbage collection runs can get quite long if you have large heap sizes.

Q. Can I change caching in DS/OpenDJ to improve performance?

A. Yes, you can improve performance in DS/OpenDJ by configuring the database cache and entry cache, although entry caches are generally not used in DS/OpenDJ (unlike DSEE). This is described in How do I tune DS/OpenDJ (All versions) process sizes: JVM heap and database cache?, Administration Guide › Database Cache Settings and Administration Guide › Caching Large, Frequently Used Entries. You need to balance performance against memory usage when configuring caching, as larger caches lead to improved performance but increase memory usage.

See Cache strategy for OpenDJ LDAP directory server for further information on devising a caching strategy.

Q. Are there things I should monitor on an ongoing basis in terms of DS/OpenDJ performance?

A. Yes, it is useful to monitor performance on an ongoing basis to allow you to react to changes quickly. Useful things to monitor include:

  • Heap size.
  • Number of open sessions / size of sessions.
  • CPU utilization - utilization of 70-80% is worth investigating.

See FAQ: Monitoring DS/OpenDJ, How do I use cn=monitor entry in DS/OpenDJ (All versions) for monitoring?, Administration Guide › Monitoring, Logging, and Alerts and How do I troubleshoot high CPU utilization on DS/OpenDJ (All versions) servers? for further information.

Note

You should also ensure the access log file is enabled; this log file contains information about the operations the server processes. This is described in Administration Guide › Server Logs.

See Also

Unindexed searches causing slow searches and poor performance on DS/OpenDJ (All versions) server

Performance tuning and monitoring ForgeRock products

Administration Guide › Tuning Servers For Performance › Testing Performance

Administration Guide › Tuning Servers For Performance › Tweaking OpenDJ Performance

Related Training

ForgeRock Directory Services Core Concepts


How do I tune DS/OpenDJ (All versions) process sizes: JVM heap and database cache?

The purpose of this article is to provide information on tuning DS/OpenDJ process sizes, including setting the JVM heap size and the database cache size.

Tuning heap size

Setting a larger heap size for DS/OpenDJ servers generally improves performance. You should also ensure all DS/OpenDJ servers in a cluster have the same heap sizes.

The correct way to set the heap size is to edit the start-ds.java-args line in the java.properties file in the /path/to/ds/config directory. For example, to set the minimum and maximum heap sizes to 2GB, you would change this line as follows:

start-ds.java-args=-server -Xms2g -Xmx2g
Note

It is recommended that you set the minimum and maximum heap sizes to the same value for best performance. Otherwise, the JVM runs a full Garbage Collector (GC) cycle when increasing its heap, during which time it can pause ongoing operations for up to a few seconds. Generally, a smaller heap will increase the frequency at which the JVM GC executes but reduce the duration; similarly, a larger heap will reduce the frequency and increase the duration. When tuning the JVM heap, the goal is to strike a balance between frequency and duration so as to reduce the impact of the GC on the application.

If you are using a 64-bit JVM (Java® SE 6u23 or later) and your heap size is less than 32GB, you should also specify CompressedOops by adding -XX:+UseCompressedOops to this line. For 64-bit JVMs, explicitly adding CompressedOops is necessary so that the DS/OpenDJ database can calculate its available memory and size its cache accordingly. For example:

start-ds.java-args=-server -Xms3g -Xmx3g -XX:+UseCompressedOops
Note

Typically, a 64-bit JVM needs a larger heap size than an equivalent 32-bit JVM as a 64-bit JVM has a larger memory footprint due to its object headers and object references being greater in size. An approximate guide is the heap size should be 1.5 times bigger in a 64-bit JVM compared to a similar 32-bit JVM. 

In DS 5.x, you must restart the DS server after you have edited the java.properties file.

In pre-DS 5 only: After you have edited the java.properties file, you must execute the /path/to/ds/bin/dsjavaproperties program (or the /path/to/ds/bat/dsjavaproperties.bat program if you are using Microsoft® Windows®) and restart the OpenDJ server. This command has been removed in DS 5.

Tuning database cache size

For optimal performance, the database (DB) cache size should be larger than the sum of the DB file sizes.

First check the size of /path/to/ds/db/userRoot. If the DB has just been imported or hasn't been updated much since import, double that size. This will give you a maximum size the DB can grow to with the current data. You can then budget for the growth of the user data.
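For example, you can check the current on-disk size of the userRoot backend with a standard du command (a minimal sketch; adjust the path to your installation):

$ du -sh /path/to/ds/db/userRoot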

Once you have an estimated overall size of the DB, you should make sure that it can fit in the DB cache, which is by default estimated as a percentage of the overall heap size. The default cache size percentage (db-cache-percent) is 50%. In OpenDJ 3 / DS, you should adjust the database cache settings to avoid allocating all memory available to the JVM to database cache when creating additional database backends. Over-allocating memory to database cache leads to out of memory errors.

You can change the db-cache-percent using the dsconfig command. For example, to increase it to 60%:

$ ./dsconfig set-backend-prop --hostname ds1.example.com --port 4444 --bindDN "cn=Directory Manager" --bindPassword password --backend-name userRoot --set db-cache-percent:60 --trustAll --no-prompt
Note

We've run tests with many heap sizes, including some very large ones, and most of the large deployments use between 2GB and 32GB of JVM heap size.

See Also

Best practice for JVM Tuning

FAQ: DS/OpenDJ performance and tuning

How do I tune the DS/OpenDJ (All versions) database file cache?

How do I tune Background Database Verification in DS (All versions) and OpenDJ 3.5.2, 3.5.3?

How do I enable Garbage Collector (GC) Logging for DS/OpenDJ (All versions)?

Administration Guide › Tuning Servers For Performance

Configuration Reference › JE Backend

Configuration Reference › Entry Cache

Related Training

ForgeRock Directory Services Core Concepts 

Related Issue Tracker IDs

N/A


Best practice for JVM Tuning

The purpose of this article is to provide best practice advice on JVM tuning, including key symptoms, and understanding heap space and GC logs. This best practice advice applies to AM/OpenAM, DS/OpenDJ, IDM/OpenIDM and IG/OpenIG. JVM tuning is not an exact science and will vary across individual environments and applications. Consistent load and stress testing of your environment is the only true way to gauge the effect of changes made.

JVM tuning considerations

Before you begin JVM tuning, you should consider the following points:

  • Costs - depending on your environments, it may be more cost effective to add more hardware to improve performance rather than spending the time to tune.
  • Desired outcome - tuning for stability is more important in the long run than tuning for performance, although there is an overlap.
  • Ongoing problems - tuning may only delay or mask a problem temporarily if there is an underlying issue you need to resolve.
  • Memory leaks - these will always cause Garbage Collection (GC) issues regardless of tuning.

What is Java® heap and how do I know it needs tuning?

The Java heap is the amount of memory allocated to applications running in the JVM. Objects in heap memory can be shared between threads. Garbage collection (GC) refers to the automatic process of managing the runtime memory. The JVM is subject to GC pauses that vary in frequency and duration; if these pauses become more frequent and/or last longer, you will see issues (such as application pauses) which indicate that you may need to tune your Java heap. The heap is initially created when the JVM starts and is split into different spaces or generations, the key ones being Young (Eden or Nursery) and Tenured (Old):

  • The Young generation is used for new objects. A GC process (ParNew) automatically runs when it becomes full which removes unused objects and moves all objects that have lived long enough to the Tenured generation, thereby freeing up space in the Young generation for more new objects. Typically objects in the Young generation are short lived and temporary. The Young generation is small but active and GC happens frequently but with limited impact on performance.
  • The Tenured generation is used for the longer lived objects. Another GC process (CMS) runs when it becomes full to remove any unused objects. The Tenured generation is larger and less active than the Young generation but GC tends to have more impact on performance.

The following diagram illustrates the heap and the corresponding GC options that can be set (Java 7 and earlier):

Key symptoms that indicate you need to tune your Java heap are:

  • High CPU usage. When GC threads consume a lot of CPU, you will see spikes in overall CPU usage.
  • Application hangs. When your application appears to hang and stops responding, there are gaps in your log files or you experience general slowness when performing routine tasks, poorly tuned GC can be the cause.

The first thing to check if you experience these symptoms is your GC logs; these are easy to check and may help you identify your issue quickly.

GC Logging

There are various different GCs available; ForgeRock recommends the use of CMS (ConcurrentMarkSweep), a low-pause collector that tends to give the best performance because it runs concurrently with the application, meaning fewer pauses. You should specify this collector by setting the following JVM option:

-XX:+UseConcMarkSweepGC
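For example, for DS/OpenDJ this option is appended to the start-ds.java-args line in the config/java.properties file alongside your heap settings; this is a sketch using the heap sizes from earlier in this book, which you should adjust for your environment:

start-ds.java-args=-server -Xms2g -Xmx2g -XX:+UseConcMarkSweepGC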

You can enable GC logging for your product as described in the relevant knowledge base article; for DS/OpenDJ, see How do I enable Garbage Collector (GC) Logging for DS/OpenDJ (All versions)?

Note

Typically, you can leave GC logging on without impacting your system; however, you can also choose to enable it only when you are experiencing issues.

Understanding GC Logs

These example log files come from a CMS collector, where ParNew refers to the Young generation and CMS refers to the Tenured generation. These log file examples also show typical GC logs for a system where tuning is appropriate.

ParNew section

An example GC log output for the ParNew section looks like this:

24.245: [GC24.245: [ParNew: 545344K->39439K(613440K), 0.0424360 secs] 594671K->88767K(4101632K), 0.0426850 secs] [Times: user=0.17 sys=0.00, real=0.04 secs] 
29.747: [GC29.747: [ParNew: 584783K->27488K(613440K), 0.0889130 secs] 634111K->104723K(4101632K), 0.0892180 secs] [Times: user=0.24 sys=0.01, real=0.09 secs]

Each line in this log file represents one GC and shows the following types of information for ParNew:

  • Timestamp information - if you did not set the PrintGCTimeStamps and PrintGCDateStamps options, this just shows as the number of seconds from JVM startup:
    24.245: [GC24.245:
    29.747: [GC29.747:
    This information is useful as it shows the frequency of GCs. You should be aiming for GCs to occur once every 1 to 5 seconds; if they are occurring more than once a second, you need to tune your JVM and it is likely that you will be seeing high CPU usage.
  • Young generation information:
    [ParNew: 545344K->39439K(613440K), 0.0424360 secs] 
    [ParNew: 584783K->27488K(613440K), 0.0889130 secs]
    This information shows the initial size of the Young space before doing the GC, the size after doing the GC, the total size available and the time it took to do the GC. If the time taken to do the GC is large, that is more than 0.1 seconds, then it is possible that your Young generation heap size is too big. The total size available will not grow if you set NewSize and MaxNewSize to the same value and this is what you want to see in your logs.
  • Overall heap information:
    594671K->88767K(4101632K), 0.0426850 secs]
    634111K->104723K(4101632K), 0.0892180 secs]
    This information shows initial size of the overall heap before doing the GC, the size after doing the GC and the total size available. With the second lot of figures (size after doing the GC), you can see how this incrementally grows up to the point when the GC is done and then reduces back down to a baseline figure. If you have a memory leak, you will see the baseline figure growing incrementally as it becomes increasingly full even after doing GC.
  • System information:
    [Times: user=0.17 sys=0.00, real=0.04 secs] 
    [Times: user=0.24 sys=0.01, real=0.09 secs]
    This information shows how much time was spent in user space, how much time spent in kernel or system space and what was the real impact to the user. If the user time is high, it indicates there isn't enough CPU for the number of user threads. If the system time is high, it indicates your system is swapping memory to disk which means that there isn't enough physical memory for your heap size.

CMS section

The CMS section of the log file is different to the ParNew section as it shows multiple lines for the same GC. An example GC log output for the CMS section looks like this:

40.146: [GC [1 CMS-initial-mark: 26386K(786432K)] 26404K(1048384K), 0.0074495 secs]
40.154: [CMS-concurrent-mark-start]
40.683: [CMS-concurrent-mark: 0.521/0.529 secs]
40.683: [CMS-concurrent-preclean-start]
40.701: [CMS-concurrent-preclean: 0.017/0.018 secs]
40.704: [GC40.704: [Rescan (parallel) , 0.1790103 secs]40.883: [weak refs processing, 0.0100966 secs] [1 CMS-remark: 26386K(786432K)] 52644K(1048384K), 0.1897792 secs]
40.894: [CMS-concurrent-sweep-start]
41.020: [CMS-concurrent-sweep: 0.126/0.126 secs]
41.020: [CMS-concurrent-reset-start]
41.147: [CMS-concurrent-reset: 0.127/0.127 secs]

This log shows the different phases the GC goes through. All the ones marked concurrent are happening while your application is running and therefore have little impact. The ones that don't say concurrent are stopping your application from running temporarily and therefore can have an impact. The CMS-initial-mark and CMS-remark show heap sizing details.

Note

Typically, you will see these sections intermingled, for example, some ParNews then a CMS, some more ParNews, another CMS and so on as the GC processes automatically occur to manage the heap.

Common issues shown in the GC logs and solutions

This section identifies common issues you might find in your GC logs and how you can resolve them. You can also use the Universal GC Log Analyzer to analyze your GC log. This is a third-party website that we suggest can be used for analysis but is not supported by ForgeRock.

Young generation - too small

The following log snippet shows an issue where the Young generation heap size is too small:

1.813: [GC1.813: [ParNew: 1152K->128K(1152K), 0.0008100 secs] 16620K->15756K(26936K), 0.0008350 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
1.816: [GC1.816: [ParNew: 1152K->128K(1152K), 0.0006430 secs] 16780K->15913K(26936K), 0.0006640 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
1.819: [GC1.819: [ParNew: 1152K->128K(1152K), 0.0005370 secs] 16937K->16038K(26936K), 0.0005570 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]

As you can see, there are multiple GCs occurring within a second. This means you need to increase the size of the Young generation (NewSize and MaxNewSize) and may also need to increase the total heap size to compensate for this change (Xms and Xmx). Alternatively, you could change the NewRatio option (this is set to 2 by default, which means the Tenured generation heap is twice as big as the Young generation heap; that is, the Young generation heap is 1/3 of the total heap size).
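For example, a sketch of the relevant options for a 4GB heap with a 1GB Young generation (the sizes are illustrative and must be validated with your own load tests):

-Xms4g -Xmx4g -XX:NewSize=1g -XX:MaxNewSize=1g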

Young generation - too large / Tenured generation - too small

The following log snippet shows an issue where the Young generation heap size is too large and consequently the Tenured generation heap size is too small:

276.716: [GC (CMS Initial Mark) [1 CMS-initial-mark: 104176K(135168K)] 2758985K(6741248K), 0.8762860 secs] [Times: user=0.88 sys=0.00, real=0.88 secs] 
277.592: [CMS-concurrent-mark-start]
277.657: [CMS-concurrent-mark: 0.063/0.065 secs] [Times: user=0.12 sys=0.00, real=0.06 secs] 
277.657: [CMS-concurrent-preclean-start]
277.658: [CMS-concurrent-preclean: 0.001/0.001 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
277.658: [CMS-concurrent-abortable-preclean-start]
277.658: [CMS-concurrent-abortable-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
277.658: [GC (CMS Final Remark)[YG occupancy: 2657419 K (6606080 K)]277.658: [Rescan (parallel) , 0.9815460 secs]278.639: [weak refs processing, 0.0000320 secs]278.640: [scrub string table, 0.0011700 secs] [1 CMS-remark: 104176K(135168K)] 2761595K(6741248K), 0.9828250 secs] [Times: user=7.18 sys=0.09, real=0.99 secs] 
278.641: [CMS-concurrent-sweep-start]
278.668: [CMS-concurrent-sweep: 0.026/0.027 secs] [Times: user=0.03 sys=0.00, real=0.02 secs] 
278.668: [CMS-concurrent-reset-start]
278.668: [CMS-concurrent-reset: 0.001/0.001 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
280.673: [GC (CMS Initial Mark) [1 CMS-initial-mark: 104079K(135168K)] 2774091K(6741248K), 0.9033730 secs] [Times: user=0.90 sys=0.00, real=0.90 secs] 
281.577: [CMS-concurrent-mark-start]
281.640: [CMS-concurrent-mark: 0.063/0.063 secs] [Times: user=0.13 sys=0.00, real=0.07 secs] 
281.640: [CMS-concurrent-preclean-start]
281.641: [CMS-concurrent-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
281.641: [CMS-concurrent-abortable-preclean-start]
281.641: [CMS-concurrent-abortable-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
281.641: [GC (CMS Final Remark)[YG occupancy: 2670011 K (6606080 K)]281.641: [Rescan (parallel) , 0.9914290 secs]282.633: [weak refs processing, 0.0000110 secs]282.633: [scrub string table, 0.0008100 secs] [1 CMS-remark: 104079K(135168K)] 2774091K(6741248K), 0.9923100 secs] [Times: user=7.14 sys=0.11, real=0.99 secs] 
282.634: [CMS-concurrent-sweep-start]
282.659: [CMS-concurrent-sweep: 0.024/0.025 secs] [Times: user=0.06 sys=0.01, real=0.02 secs] 
282.659: [CMS-concurrent-reset-start]
282.659: [CMS-concurrent-reset: 0.001/0.001 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 

As you can see, the CMS collections are occurring too frequently, without any ParNew collections in between. This is a result of the Young generation heap size being too large: each time a ParNew collection happens, the Tenured generation heap instantly fills up and a CMS collection runs. This could happen if you increased your Young generation heap size without increasing your total heap size. This pattern can also indicate a memory leak.
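
As a sketch only (actual sizes depend on your environment), one way to restore the balance is to increase the total heap while keeping the Young generation the same size, for example:

-Xms8g -Xmx8g -XX:NewSize=2g -XX:MaxNewSize=2g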

CMS failure

The following log snippet shows a CMS failure (concurrent mode failure):

(concurrent mode failure): 1551948K->1548050K(1572864K), 15.1154540 secs] 1986510K->1548050K(2044736K), [CMS Perm : 78597K->78588K(524288K)], 15.4406700 secs] [Times: user=15.52 sys=0.00, real=15.44 secs] 
703240.983: [GC [1 CMS-initial-mark: 1548050K(1572864K)] 1561545K(2044736K), 0.0059690 secs] [Times: user=0.01 sys=0.00, real=0.00 secs] 

This error means that the CMS collector cannot keep up with the amount of garbage being generated (the Tenured generation heap space is about 90% full) and GCs are taking longer, meaning your application stops for longer periods of time. This example shows the application stopping for over 15 seconds, which will have a significant impact on performance.

You can add the following GC options to try to avoid the CMS failure:

  • -XX:CMSInitiatingOccupancyFraction=<percentage-of-tenured> 
  • -XX:+UseCMSInitiatingOccupancyOnly

You should set -XX:CMSInitiatingOccupancyFraction to 70 or 80% (instead of the default of approximately 90%), which may enable the CMS collector to keep up with the amount of garbage being generated.
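
For example, you might add the following combination to your JVM options (70 is a suggested starting point rather than a definitive value):

-XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly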

Full GC

The following log snippet shows a Full GC:

25.466: [Full GC 25.466: [CMS: 68756K->75361K(1987392K), 0.2548820 secs] 96571K->75361K(2064064K), [CMS Perm : 61069K->60722K(61376K)], 0.2551850 secs] [Times: user=0.25 sys=0.01, real=0.25 secs]
26.185: [Full GC (System.gc())26.185: [CMS: 75361K->73283K(1987392K), 0.2398450 secs] 76136K->73283K(2064064K), [CMS Perm : 60723K->60723K(101204K)], 0.2400120 secs] [Times: user=0.24 sys=0.00, real=0.24 secs]
27.293: [Full GC (System.gc())27.293: [CMS: 73283K->73291K(1987392K), 0.2111960 secs] 74096K->73291K(2064064K), [CMS Perm : 60723K->60723K(101396K)], 0.2112730 secs] [Times: user=0.21 sys=0.00, real=0.22 secs]
68.081: [Full GC (Heap Inspection Initiated GC)68.081: [CMS: 73291K->73187K(1987392K), 0.2172670 secs] 77634K->73187K(2064064K), [CMS Perm : 60742K->60742K(101396K)], 0.2174520 secs] [Times: user=0.22 sys=0.00, real=0.21 secs]

These log snippets refer to Full GCs occurring, meaning that the entire heap is being cleared (Young and Tenured); this often happens due to system processes such as RMI, which performs a Full GC every hour on the hour. The PrintGCCause option is very useful for finding out why a GC happened (the last three lines in the log above show the additional information output by this option, making it clear which collections are system related). You can also set the DisableExplicitGC option to prevent system related Full GCs from happening.
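
For example, you could add the following standard HotSpot options to record the cause of each GC and suppress explicit System.gc() collections:

-XX:+PrintGCCause -XX:+DisableExplicitGC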

Permanent generation 

Note

The following information does not apply to Java 8 and later since the Permanent generation space has been removed.

The following log snippet shows a Full GC as a result of the Permanent generation space growing (the Permanent generation space contains metadata and other data that should be permanently stored and does not typically grow):

1.241: [Full GC (Ergonomics) [PSYoungGen: 6125K->0K(45056K)] [ParOldGen: 17659K->20350K(20480K)] 23785K->20350K(65536K) [PSPermGen: 13550K->13546K(27136K)], 0.0531680 secs] [Times: user=0.25 sys=0.00, real=0.05 secs]
1.513: [Full GC (Ergonomics) [PSYoungGen: 38912K->2783K(45056K)] [ParOldGen: 20350K->20424K(20480K)] 59262K->23208K(65536K) [PSPermGen: 15188K->15188K(31744K)], 0.0992090 secs] [Times: user=0.65 sys=0.02, real=0.10 secs]

These log snippets show the Permanent generation space growing (figure in brackets after PSPermGen). You can increase the MaxPermSize option to resolve this issue.
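
For example, you could raise the limit with the following JVM option (Java 7 and earlier only; the value shown is illustrative, not a recommendation):

-XX:MaxPermSize=256m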

Best practice advice on tuning

Initially, you should configure your JVM as per the recommended settings for your product (if available).

You can then tune your JVM as and when required.

You should always follow this basic process when tuning:

  1. Establish a baseline using a valid load testing process. This step is essential as it is the only way you will know if your tuning is having the desired effect.
  2. Change one thing at a time and document each change. This allows you to determine the impact of each change and revert if necessary.

Heap tuning guidelines

Although we cannot give recommended heap settings because they vary considerably between environments and applications, you should follow these guidelines when tuning:

  • You should always use the latest supported software versions.
  • You should use the CMS GC as recommended by ForgeRock.
  • Your total heap size must not exceed the amount of physical RAM available.
  • You must leave some physical RAM for other applications to run on your machine.
  • You should set Xms and Xmx to the same value for best performance. These options determine your total heap size.
  • You should set NewSize and MaxNewSize to the same value. These options determine how the heap is allocated between generations.
  • You should set the DisableExplicitGC option to prevent system related full GCs from running.
  • You must not add lots of options at once or add options you don't understand.
  • You must not copy and paste options from other applications.
  • You should use GC logs to help identify issues with your tuning.

Getting your total heap size and the allocation between generations right is key; once you have these set correctly, you likely won't need to do any more tuning.
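
Putting these guidelines together, a hypothetical starting set of JVM options for a CMS configuration with GC logging enabled might look like the following (all sizes and the log path are placeholders; replace them with values validated by your own testing):

-Xms2g -Xmx2g -XX:NewSize=512m -XX:MaxNewSize=512m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+DisableExplicitGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/path/to/gc.log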

Summary

In summary, the most important GC knobs to get right are:

  1. The collector, which should be CMS.
  2. The total heap size, where Xmx = Xms.
  3. The split between generations, using NewSize or NewRatio, where NewSize = MaxNewSize.

See Also

Java Platform, Standard Edition HotSpot Virtual Machine Garbage Collection Tuning Guide

AM/OpenAM

How do I change the JVM heap size for AM/OpenAM (All versions)?

FAQ: AM/OpenAM performance and tuning

How do I collect JVM data for troubleshooting AM/OpenAM (All versions)?

How do I diagnose a hung AM/OpenAM (All versions) server?

How do I troubleshoot high CPU utilization on AM/OpenAM (All versions) servers?

How do I find which thread is consuming CPU in a Java process in AM/OpenAM (All versions)?

How do I use the msnapshots script to capture information for troubleshooting AM/OpenAM (All versions)?

DS/OpenDJ

How do I tune DS/OpenDJ (All versions) process sizes: JVM heap and database cache?

FAQ: DS/OpenDJ performance and tuning

How do I collect JVM data for troubleshooting DS/OpenDJ (All versions)?

How do I troubleshoot high CPU utilization on DS/OpenDJ (All versions) servers?

How do I find which thread is consuming CPU in a Java process in DS/OpenDJ (All versions)?

How do I use the msnapshots script to capture information for troubleshooting DS/OpenDJ (All versions)?

IDM/OpenIDM

How do I change the JVM heap size for IDM/OpenIDM (All versions)?

FAQ: IDM/OpenIDM performance and tuning

How do I collect JVM data for troubleshooting IDM/OpenIDM (All versions)?

How do I troubleshoot high CPU utilization on IDM/OpenIDM (All versions) servers?

How do I find which thread is consuming CPU in a Java process in IDM/OpenIDM (All versions)?

How do I use the msnapshots script to capture information for troubleshooting IDM/OpenIDM (All versions)?

IG/OpenIG

How do I change the JVM heap size for IG/OpenIG (All versions)?

How do I collect JVM data for troubleshooting IG/OpenIG (All versions)?

How do I troubleshoot high CPU utilization or Out of Memory errors on IG/OpenIG (All versions) servers?

How do I find which thread is consuming CPU in a Java process in IG/OpenIG (All versions)?

How do I use the msnapshots script to capture information for troubleshooting IG/OpenIG (All versions)?

Related Training

ForgeRock Directory Services Core Concepts

Related Issue Tracker IDs

OPENAM-4414 (Apache Policy Agent does not complete cleanup / logout)

OPENIDM-1563 (Task scanner creates a new thread pool for each execution resulting in a thread leak.)


How do I tune the DS/OpenDJ (All versions) database file cache?

The purpose of this article is to provide information on tuning the DS/OpenDJ database file cache.

Tuning database file cache

The je.log.fileCacheSize property represents the number of file handles that are cached, that is, file handles that are kept open by the underlying database (DB). The default value is 200 (100 in versions prior to DS 6), which is probably far less than the number of .jdb files you have in the /path/to/ds/db/userRoot directory.

Note

You should also be aware of the maximum number of open files (file descriptors) allowed for DS/OpenDJ, as discussed in the release notes applicable to your version: Release Notes › Setting Maximum Open Files.

When the DB cannot be fully cached, the DB needs to access records from the log files. Since only a portion of the files can be open at one time, the DB keeps closing some files and opening the ones it needs. Each close forces a disk sync, which carries a large performance penalty.

The number of .jdb files depends on the DB content and also on the occupancy ratio. Berkeley DB JE uses a rolling log algorithm in which writes to the DB are always appended to the current log file. When the log reaches a certain size, JE creates a new file and writes to that instead. When records are deleted or even updated, the old record is marked as deleted and the new one is appended to the current log file, which leaves unused records in the earlier log files. The occupancy ratio is the ratio of used records to the total number of records per log file. When the ratio for a log file drops below a certain limit (50% by default), the remaining valid records are copied to the current log file (still appending) and the old log file is deleted. So in theory, and in the worst case, the number of log files cannot exceed twice the initial number of files after an import (with the same amount of data).

To set a proper value for the je.log.fileCacheSize, therefore, count the current number of .jdb files, double it and round down.
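
For example, assuming the default backend location, you could count the files as follows and then double the result:

$ ls /path/to/ds/db/userRoot/*.jdb | wc -l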

If the number of .jdb files exceeds the value set, the DB starts closing files and reopening others, slowly degrading performance.

This property can be maintained using the db-log-filecache-size configuration attribute, for example:

$ ./dsconfig set-backend-prop --port 4444 --bindPassword password --backend-name userRoot --advanced --set db-log-filecache-size:[cachesize] --trustAll --no-prompt

See Also

FAQ: DS/OpenDJ performance and tuning

How do I tune DS/OpenDJ (All versions) process sizes: JVM heap and database cache?

Performance tuning and monitoring ForgeRock products

Configuration Reference › JE Backend

Related Training

ForgeRock Directory Services Core Concepts

Related Issue Tracker IDs

OPENDJ-383 (Expose JE fileCacheSize property through OpenDJ configuration and admin)


How do I tune Background Database Verification in DS (All versions) and OpenDJ 3.5.2, 3.5.3?

The purpose of this article is to provide information on tuning Background Database Verification in DS/OpenDJ. The JE 7 database used in recent versions of DS/OpenDJ periodically performs automatic database verification, which can impact service availability.

Overview

The following DS/OpenDJ versions use the JE 7 embedded database for backends:

  • DS 5 and later.
  • OpenDJ 3.5.2 and OpenDJ 3.5.3.

In OpenDJ 3.5.2 and OpenDJ 3.5.3, you could also use the PDB database for backends; the functionality described here does not apply if you use the PDB backend. The PDB backend type was deprecated in DS 5 and removed in DS 6.

JE 7 verifier

Version 7 of JE introduced a feature that verifies the entire database at midnight every night to check for some common types of database corruption. The extra disk I/O can result in problems for performance-critical deployments. All the servers in a single timezone will run the verification at exactly the same time, which could also cause more widespread issues.

DS 6 and later provides advanced properties for configuring the JE verifier (db-run-log-verifier and db-log-verifier-schedule). See Configuration Reference › Advanced Properties for further information.
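
For example, in DS 6 and later you might disable the verifier on the userRoot backend as follows (a sketch that reuses the connection options from the other examples in this article; adjust them for your environment):

$ ./dsconfig set-backend-prop --backend-name userRoot --advanced --set db-run-log-verifier:false --hostname ds1.example.com --port 4444 --bindDn "cn=Directory Manager" --bindPassword password --trustAll --no-prompt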

In earlier versions, you can configure the JE verifier in one of two ways:

Configuring the JE verifier schedule

By default, the JE verifier runs automatically at midnight local time. It is best practice to alter the verification schedule on each server to avoid all servers simultaneously using extra disk I/O and potentially impacting the entire service.

The verifier schedule is a cron-style field and can be set using the dsconfig command. The following example sets the verifier for the userRoot backend to run at 1am (local time):

$ ./dsconfig set-backend-prop --backend-name userRoot --add "je-property:je.env.verifySchedule=0 1 * * *" --hostname ds1.example.com --port 4444 --bindDn "cn=Directory Manager" --bindPassword password --trustAll --no-prompt
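
The schedule uses cron-style fields (minute, hour, day of month, month, day of week). For example, to stagger a second server (a sketch assuming a host named ds2.example.com and the same connection options) so that it verifies at 2am local time instead:

$ ./dsconfig set-backend-prop --backend-name userRoot --add "je-property:je.env.verifySchedule=0 2 * * *" --hostname ds2.example.com --port 4444 --bindDn "cn=Directory Manager" --bindPassword password --trustAll --no-prompt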

Disabling the JE verifier

The JE verifier can be completely disabled, which means that some types of database corruption will not be detected. Recovering from any database corruption would typically involve reinitializing the replica directory server or restoring it from a backup taken before the corruption occurred. See FAQ: Backup and restore in DS/OpenDJ for further information.

You can use the dsconfig command to disable verification on a single backend. For example, to disable it on the userRoot backend:

$ ./dsconfig set-backend-prop --backend-name userRoot --add je-property:je.env.runVerifier=false --hostname ds1.example.com --port 4444 --bindDn "cn=Directory Manager" --bindPassword password --trustAll --no-prompt

See Also

Installing and Administering DS/OpenDJ

Performance tuning and monitoring ForgeRock products

Related Training

N/A

Related Issue Tracker IDs

OPENDJ-4465 (Facilitate configuration of JE backend's DB verification feature)

OPENDJ-4418 (Investigate how to disable automatic JE database verification using JE properties)


How do I improve performance when using the PBKDF2 storage scheme in DS/OpenDJ (All versions)?

The purpose of this article is to provide information on improving performance when using the PBKDF2 storage scheme in DS/OpenDJ, including how and why you might change the number of iterations performed. The default number of iterations is 10,000.

Overview

PBKDF2 is designed to be flexible in the number of iterations used for computing the hash. This is intended so that it can be extended over time to keep up to date with current CPU speeds, and so that an implementation can be tailored to specific hardware or security requirements. Requirements are also likely to be different depending on whether hashes are calculated client-side or server-side for authentication attempts.

The higher the number of iterations, the longer it takes to compute a hash from a plain text password, and therefore the harder it becomes to run brute force attacks against a known hash.

You can improve performance as follows:

  • Upgrade to OpenDJ 2.6.4 or later, or to DS; this improves the concurrency of the PBKDF2 password storage scheme.
  • Tune the number of iterations used for the storage scheme; this is described in detail below.
  • Use the Bouncy Castle JCE provider, which improves the PBKDF2 authentication rate; in tests, we have found this to improve authentication rates by about 30%. This is a third-party library for Java® which contains additional algorithms and also faster versions of algorithms already in Java. An overview of the setup required is provided below.

Note

Bouncy Castle is a third-party open source library that is not supported by ForgeRock.

Our performance tests

We've made some comparisons between the standard Java 7 JCE provider and Bouncy Castle; Bouncy Castle showed a marked improvement over Java 7.

BIND etimes with 100,000 PBKDF2 iterations:

  • Java 7 JCE: 300ms
  • Bouncy Castle v1.52: 200ms

BIND etimes for fewer iterations are proportionally faster with Bouncy Castle.

Tuning PBKDF2 performance

In a scenario where DS/OpenDJ (the server) receives passwords in plain text and needs to run the algorithm itself, the benefit of a large number of iterations should be balanced against the risk of overwhelming the server.

To tune this trade-off of security versus performance, you can alter the number of iterations used for the storage scheme using dsconfig in interactive mode, see:

dsconfig: Password Storage Scheme > View and edit an existing... > PBKDF2 > pbkdf2-iterations
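
Alternatively, as a non-interactive sketch (assuming the default scheme name PBKDF2 and the connection options used in the other dsconfig examples in this document; the iteration count shown is purely illustrative):

$ ./dsconfig set-password-storage-scheme-prop --scheme-name PBKDF2 --set pbkdf2-iterations:20000 --hostname ds1.example.com --port 4444 --bindDn "cn=Directory Manager" --bindPassword password --trustAll --no-prompt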

The change won't affect currently stored passwords. The new iteration count is used the next time each password is changed; at that point, the password is rehashed with the updated number of iterations.

The number of iterations is prepended to the final stored password hash, so users whose passwords were hashed with different iteration counts can co-exist simultaneously.
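
For illustration only (the exact on-disk encoding is internal to the server and may vary between versions), a stored PBKDF2 value embeds the iteration count ahead of the base64-encoded salt and hash, along these lines:

userPassword: {PBKDF2}10000:<base64-encoded salt and hash>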

Setting up Bouncy Castle

The following process provides an overview of installing release 1.54 of Bouncy Castle:

  1. Download bcprov-ext-jdk15on-154.jar and bcprov-jdk15on-154.jar from https://bouncycastle.org/latest_releases.html; they are listed in the SIGNED JAR FILES section.
  2. Copy these two jar files to a directory that the JVM searches; the JVM searches the $JAVA_HOME/jre/lib/ext/ directory by default, so this is a good place to put them.
  3. Ensure the file permissions for these two jar files are set to allow them to be read.
  4. Update the list of security providers in the JVM to put Bouncy Castle first and then renumber the other security providers to follow. This list is set in the java.security text file (located in the $JAVA_HOME/jre/lib/security/ directory). The security provider list should now look similar to this:
    security.provider.1=org.bouncycastle.jce.provider.BouncyCastleProvider
    security.provider.2=sun.security.provider.Sun
    [...]
    This step is recommended by Bouncy Castle and you can read more about it here: https://www.bouncycastle.org/specifications.html
  5. Save this file and restart DS/OpenDJ.
  6. You can check that the installation has been successful by using a command such as pfiles or lsof against the DS/OpenDJ process and looking for these two bcprov jar files, as shown in the example below.
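
The following is a sketch for Linux (replace <pid> with the process ID of the DS/OpenDJ java process; on Solaris, pfiles <pid> could be used instead):

$ lsof -p <pid> | grep bcprov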

See Also

Administration Guide › Configuring Password Policy

Configuration Reference › PBKDF2 Password Storage Scheme

Administration Guide › Tuning Servers For Performance

Administration Guide › Tuning Servers For Performance › Testing Performance

The Legion of the Bouncy Castle

Related Training

N/A

Related Issue Tracker IDs

OPENDJ-2151 (High contention with the PBKDF2 password storage scheme.)


Copyright and Trademarks

Copyright © 2018 ForgeRock, all rights reserved.
