Troubleshooting
Define the problem
To solve your problem, save time by clearly defining it first. A problem statement describes the difference between observed behavior and expected behavior:
- What exactly is the problem?
  What is the behavior you expected?
  What is the behavior you observed?
- How do you reproduce the problem?
- When did the problem begin?
  Under similar circumstances, when does the problem not occur?
- Is the problem permanent? Intermittent?
  Is it getting worse? Getting better? Staying the same?
Performance
Before troubleshooting performance, make sure:
- The system meets the DS installation requirements.
- The performance expectations are reasonable.
  For example, a deployment can use password policies with cost-based, resource-intensive password storage schemes such as Argon2, Bcrypt, or PBKDF2. These schemes strengthen password protection at the cost of slower LDAP simple binds and HTTP username/password authentication, and lower throughput.
When directory operations take too long, meaning request latency is high, fix the problem first in your test or staging environment. Perform these steps in order and stop when you find a fix:
- Check for unindexed searches and prevent them when possible.
  Unindexed searches are expensive operations, particularly for large directories. When unindexed searches consume the server's resources, performance suffers for concurrent operations, and for later operations if an unindexed search causes widespread changes to database and file system caches.
- Check performance settings for the server, including JVM heap size and DB cache size.
  Try adding more RAM if memory seems low.
- Read the request queue monitoring statistics over LDAP or over HTTP.
  If many requests are in the queue, the troubleshooting steps differ for read and write operations, so review the request statistics to find out which kind of request is waiting.
  If many requests are persistently pending:
  - For read requests, such as unindexed or large searches, try adding CPUs.
  - For write requests, try adding IOPS, such as faster or higher-throughput disks.
Installation problems
Use the logs
Installation and upgrade procedures result in a log file tracing the operation. The command output shows a message like the following:
See opendj-setup-profile-*.log for a detailed log of the failed operation.
Antivirus interference
Prevent antivirus and intrusion detection systems from interfering with DS software.
Before using DS software with antivirus or intrusion detection software, consider the following potential problems:
- Interference with normal file access
  Antivirus and intrusion detection systems that perform virus scanning, sweep scanning, or deep file inspection are not compatible with DS file access, particularly write access.
  Antivirus and intrusion detection software have incorrectly marked DS files as suspected of infection, because they misinterpret normal DS processing.
  Prevent antivirus and intrusion detection systems from scanning DS files, except these folders:
  /path/to/opendj/bat/
    Windows command-line tools
  /path/to/opendj/bin/
    UNIX/Linux command-line tools
  /path/to/opendj/extlib/
    Optional additional .jar files used by custom plugins
  /path/to/opendj/lib/
    Scripts and libraries shipped with DS servers
- Port blocking
  Antivirus and intrusion detection software can block ports that DS uses to provide directory services.
  Make sure that your software does not block the ports that DS software uses. For details, refer to Administrative access.
- Negative performance impact
  Antivirus software consumes system resources, reducing resources available to other services, including DS servers.
  Running antivirus software can therefore have a significant negative impact on DS server performance. Make sure that you test and account for the performance impact of running antivirus software before deploying DS software on the same systems.
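When you configure scanner exclusions, it can help to list exactly which paths to exclude. The following bash sketch, using the hypothetical function name list_av_exclusions, prints every top-level entry of a DS installation except the bat/, bin/, extlib/, and lib/ folders that the guidance above allows scanners to read. How you feed the list to a particular antivirus product is outside its scope.

```shell
#!/usr/bin/env bash
# Sketch: print the top-level entries of a DS installation that antivirus or
# intrusion detection software should NOT scan. Only bat/, bin/, extlib/, and
# lib/ may be scanned; everything else is listed for exclusion.
# list_av_exclusions is a hypothetical helper, not a DS command.
list_av_exclusions() (
  shopt -s nullglob dotglob          # include hidden entries; empty dir -> no output
  ds_root="${1:?usage: list_av_exclusions /path/to/opendj}"
  for entry in "$ds_root"/*; do
    case "${entry##*/}" in
      bat|bin|extlib|lib) ;;         # folders that scanners may read
      *) printf '%s\n' "$entry" ;;   # exclude everything else from scanning
    esac
  done
)
```

For example, `list_av_exclusions /path/to/opendj` would list folders such as config/, db/, and logs/ in a typical installation.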
JE initialization
When starting a directory server on a Linux system, make sure the server user can watch enough files. If the server user cannot watch enough files, you might read an error message in the server log like this:
InitializationException: The database environment could not be opened: com.sleepycat.je.EnvironmentFailureException: (JE version) /path/to/opendj/db/userData or its sub-directories to WatchService. UNEXPECTED_EXCEPTION: Unexpected internal Exception, may have side effects. Environment is invalid and must be closed.
File notification
A directory server backend database monitors file events.
On Linux systems, backend databases use the inotify API for this purpose.
The kernel tunable fs.inotify.max_user_watches indicates the maximum number of files a user can watch with the inotify API.
Make sure this tunable is set to at least 512K:
$ sysctl fs.inotify.max_user_watches
fs.inotify.max_user_watches = 524288
If this tunable is set lower than that, update the /etc/sysctl.conf file to change the setting permanently, and use the sysctl -p command to reload the settings:
$ echo fs.inotify.max_user_watches=524288 | sudo tee -a /etc/sysctl.conf
[sudo] password for admin:
$ sudo sysctl -p
fs.inotify.max_user_watches = 524288
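To run the same check from a script, for example on every host before starting DS, the following bash sketch compares the current value with the 524288 (512K) minimum. It reads /proc/sys/fs/inotify/max_user_watches, the proc file behind the sysctl name; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compare the inotify watch limit with the 512K minimum given above.
readonly MIN_INOTIFY_WATCHES=524288

# Returns 0 if the given value meets the minimum, 1 otherwise.
inotify_watches_ok() {
  local current="${1:?usage: inotify_watches_ok <value>}"
  [ "$current" -ge "$MIN_INOTIFY_WATCHES" ]
}

# Reads the live kernel value (same tunable as fs.inotify.max_user_watches).
check_inotify_watches() {
  local current
  current=$(cat /proc/sys/fs/inotify/max_user_watches) || return 2
  if inotify_watches_ok "$current"; then
    echo "OK: fs.inotify.max_user_watches = $current"
  else
    echo "Too low: fs.inotify.max_user_watches = $current (want >= $MIN_INOTIFY_WATCHES)"
    return 1
  fi
}
```

A non-zero status from check_inotify_watches means you should apply the /etc/sysctl.conf change shown above.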
NoSuchAlgorithmException
When running the dskeymgr create-deployment-id or setup command on an operating system with no support for the PBKDF2WithHmacSHA256 SecretKeyFactory algorithm, the command displays this error:
NoSuchAlgorithmException: PBKDF2WithHmacSHA256 SecretKeyFactory not available
This can occur on operating systems where the default settings limit the available algorithms.
To fix the issue, enable support for the algorithm and run the command again.
Forgotten superuser password
By default, DS servers store the entry for the directory superuser in an LDIF backend. Edit the file to reset the password:
- Generate the encoded version of the new password:
  $ encode-password --storageScheme PBKDF2-HMAC-SHA256 --clearPassword password
  {PBKDF2-HMAC-SHA256}10<hash>
- Stop the server while you edit the LDIF file for the backend:
  $ stop-ds
- Replace the existing password with the encoded version.
  In the db/rootUser/rootUser.ldif file, carefully replace the userPassword value with the new, encoded password:
  dn: uid=admin
  ...
  uid: admin
  userPassword: <encoded-password>
  Trailing whitespace is significant in LDIF. Take care not to add any trailing whitespace at the end of the line.
- Restart the server:
  $ start-ds
- Verify that you can use the directory superuser account with the new password:
  $ status \
  --bindDn uid=admin \
  --bindPassword password \
  --hostname localhost \
  --port 4444 \
  --usePkcs12TrustStore /path/to/opendj/config/keystore \
  --trustStorePassword:file /path/to/opendj/config/keystore.pin \
  --script-friendly
  ...
  "isRunning" : true,
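If you prefer to script the edit in the third step, the following bash sketch shows one way to do it, under these assumptions: the server is stopped, the file is db/rootUser/rootUser.ldif, and the old value may be folded onto LDIF continuation lines (lines starting with a single space). The helper names replace_root_password and find_trailing_whitespace are hypothetical, not DS tools. The helper keeps a .bak copy and strips trailing whitespace from the new value only.

```shell
#!/usr/bin/env bash
# Sketch: replace the userPassword value in an LDIF file. Run only while the
# server is stopped. Keeps a backup copy next to the original file.
replace_root_password() {
  local ldif="${1:?usage: replace_root_password <ldif-file> <encoded-password>}"
  local encoded="${2:?missing encoded password}"
  cp -p "$ldif" "$ldif.bak" || return 1
  PW="$encoded" awk '
    BEGIN { pw = ENVIRON["PW"]; sub(/[ \t\r]+$/, "", pw) }  # no trailing whitespace
    skipping && /^ / { next }              # drop folded continuation of old value
    { skipping = 0 }
    tolower($0) ~ /^userpassword:/ {       # matches "userPassword:" and "userPassword::"
      print "userPassword: " pw
      skipping = 1
      next
    }
    { print }
  ' "$ldif.bak" > "$ldif"
}

# Report any line that ends in whitespace, which is significant in LDIF.
find_trailing_whitespace() {
  grep -n '[[:space:]]$' "${1:?usage: find_trailing_whitespace <ldif-file>}"
}
```

For example, after stop-ds: `replace_root_password /path/to/opendj/db/rootUser/rootUser.ldif '<encoded-password>'`, then `find_trailing_whitespace` on the same file should print nothing before you run start-ds.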
Debug logging
DS debug logging can generate a high volume of debug messages. Use debug logging very sparingly on production systems.
- Create one or more debug targets.
  No debug targets are enabled by default:
  $ dsconfig \
  list-debug-targets \
  --hostname localhost \
  --port 4444 \
  --bindDN uid=admin \
  --bindPassword password \
  --publisher-name "File-Based Debug Logger" \
  --usePkcs12TrustStore /path/to/opendj/config/keystore \
  --trustStorePassword:file /path/to/opendj/config/keystore.pin \
  --no-prompt
  Debug Target : enabled : debug-exceptions-only
  -------------:---------:----------------------
  A debug target specifies a fully qualified DS Java package, class, or method:
  $ dsconfig \
  create-debug-target \
  --hostname localhost \
  --port 4444 \
  --bindDN uid=admin \
  --bindPassword password \
  --publisher-name "File-Based Debug Logger" \
  --type generic \
  --target-name org.opends.server.api \
  --set enabled:true \
  --usePkcs12TrustStore /path/to/opendj/config/keystore \
  --trustStorePassword:file /path/to/opendj/config/keystore.pin \
  --no-prompt
- Enable the debug log, opendj/logs/debug:
  $ dsconfig \
  set-log-publisher-prop \
  --hostname localhost \
  --port 4444 \
  --bindDN uid=admin \
  --bindPassword password \
  --publisher-name "File-Based Debug Logger" \
  --set enabled:true \
  --usePkcs12TrustStore /path/to/opendj/config/keystore \
  --trustStorePassword:file /path/to/opendj/config/keystore.pin \
  --no-prompt
  The server immediately begins to write debug messages to the log file.
- Read messages in the debug log file:
  $ tail -f /path/to/opendj/logs/debug
- Disable the debug log as soon as it is no longer required.
Lockdown mode
Misconfiguration can put the DS server in a state where you must prevent users and applications from accessing the directory until you have fixed the problem.
DS servers support lockdown mode.
Lockdown mode permits connections only on the loopback address, and permits only operations requested by superusers, such as uid=admin.
The server must be running to enter lockdown mode. To put the server into lockdown mode, start a task. Notice that the modify operation is performed over the loopback address, accessing the DS server on the local host:
$ ldapmodify \
--hostname localhost \
--port 1636 \
--useSsl \
--usePkcs12TrustStore /path/to/opendj/config/keystore \
--trustStorePassword:file /path/to/opendj/config/keystore.pin \
--bindDN uid=admin \
--bindPassword password << EOF
dn: ds-task-id=Enter Lockdown Mode,cn=Scheduled Tasks,cn=tasks
objectClass: top
objectClass: ds-task
ds-task-id: Enter Lockdown Mode
ds-task-class-name: org.opends.server.tasks.EnterLockdownModeTask
EOF
The DS server logs a notice message in logs/errors when lockdown mode takes effect:
...msg=Lockdown task Enter Lockdown Mode finished execution
Client applications that request operations get a message concerning lockdown mode:
$ ldapsearch \
--hostname localhost \
--port 1636 \
--useSsl \
--usePkcs12TrustStore /path/to/opendj/config/keystore \
--trustStorePassword:file /path/to/opendj/config/keystore.pin \
--baseDN "" \
--searchScope base \
"(objectclass=*)" \
+
# The LDAP search request failed: 53 (Unwilling to Perform)
# Additional Information: Rejecting the requested operation because the server is in lockdown mode and will only accept requests from root users over loopback connections
Leave lockdown mode by starting a task:
$ ldapmodify \
--hostname localhost \
--port 1636 \
--useSsl \
--usePkcs12TrustStore /path/to/opendj/config/keystore \
--trustStorePassword:file /path/to/opendj/config/keystore.pin \
--bindDN uid=admin \
--bindPassword password << EOF
dn: ds-task-id=Leave Lockdown Mode,cn=Scheduled Tasks,cn=tasks
objectClass: top
objectClass: ds-task
ds-task-id: Leave Lockdown Mode
ds-task-class-name: org.opends.server.tasks.LeaveLockdownModeTask
EOF
The DS server logs a notice message when leaving lockdown mode:
...msg=Leave Lockdown task Leave Lockdown Mode finished execution
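To check which state a server is in from its logs, the following bash sketch looks for the last of the two notice messages shown above in logs/errors. The function name lockdown_state is hypothetical, and the sketch assumes the message text matches the examples above; other DS versions might word it differently.

```shell
#!/usr/bin/env bash
# Sketch: report whether the most recent lockdown task recorded in the error
# log entered or left lockdown mode, based on the notice messages above.
lockdown_state() {
  local log="${1:?usage: lockdown_state /path/to/opendj/logs/errors}"
  local last
  last=$(grep -E 'msg=(Leave )?Lockdown task .* finished execution' "$log" | tail -n 1)
  case "$last" in
    *"msg=Leave Lockdown task"*) echo "normal" ;;
    *"msg=Lockdown task"*)       echo "lockdown" ;;
    *)                           echo "unknown" ;;   # no lockdown task found
  esac
}
```

The log only records completed tasks, so "unknown" means the log holds no lockdown task messages, not that the server is running normally.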
LDIF import
- By default, DS directory servers check that entries you import match the LDAP schema.
  You can temporarily bypass this check with the import-ldif --skipSchemaValidation option.
- By default, DS servers ensure that entries have only one structural object class.
  You can relax this behavior with the advanced global configuration property, single-structural-objectclass-behavior. This can be useful when importing data exported from Sun Directory Server.
  For example, warn when entries have more than one structural object class, rather than rejecting them:
  $ dsconfig \
  set-global-configuration-prop \
  --hostname localhost \
  --port 4444 \
  --bindDN uid=admin \
  --bindPassword password \
  --set single-structural-objectclass-behavior:warn \
  --usePkcs12TrustStore /path/to/opendj/config/keystore \
  --trustStorePassword:file /path/to/opendj/config/keystore.pin \
  --no-prompt
- By default, DS servers check syntax for several attribute types. Relax this behavior using the advanced global configuration property, invalid-attribute-syntax-behavior.
- Use the import-ldif -R rejectFile --countRejects options to log rejected entries and to return the number of rejected entries as the command's exit code.
Once you resolve the issues, reinstate the default behavior to avoid importing bad data.
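Because --countRejects reports its result through the exit status, a wrapper script has to capture that status explicitly. The following bash sketch, with the hypothetical name run_and_report_rejects, runs whatever import command you pass it and reports the status. Keep in mind that a non-zero status can also come from a failure unrelated to rejected entries, and that a process exit status only carries values from 0 to 255, so check the reject file itself before trusting the number.

```shell
#!/usr/bin/env bash
# Sketch: run an import command, for example
#   import-ldif ... -R rejectFile --countRejects
# and report its exit status. Passes the status through to the caller.
run_and_report_rejects() {
  "$@"
  local rc=$?
  if [ "$rc" -eq 0 ]; then
    echo "No entries rejected"
  else
    echo "Exit status $rc: check the reject file and the command output"
  fi
  return "$rc"
}
```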
Security problems
Incompatible Java versions
Due to a change in Java APIs, the same DS deployment ID generates different CA key pairs with Java 11 and with Java 17 and later.
When running the dskeymgr and setup commands, use the same Java environment everywhere in the deployment.
Using different Java versions is a problem if you use deployment ID-based CA certificates.
Replication breaks, for example, when you use the setup command for a new server with a more recent version of Java than was used to set up existing servers.
The error log includes a message such as the following:
...category=SYNC severity=ERROR msgID=119 msg=Directory server DS(server_id) encountered an unexpected error while connecting to replication server host:port for domain "base_dn": ValidatorException: PKIX path validation failed: java.security.cert.CertPathValidatorException: signature check failed
To work around the issue, follow these steps:
- Update all DS servers to use the same Java version.
  Make sure you have a required Java environment installed on the system.
  If your default Java environment is not appropriate, use one of the following solutions:
  - Edit the default.java-home setting in the opendj/config/java.properties file.
  - Set OPENDJ_JAVA_HOME to the path to the correct Java environment.
  - Set OPENDJ_JAVA_BIN to the absolute path of the java command.
- Export CA certificates generated with the different Java versions.
  - Export the CA certificate from an old server:
    $ keytool \
    -exportcert \
    -alias ca-cert \
    -keystore /path/to/old-server/config/keystore \
    -storepass:file /path/to/old-server/config/keystore.pin \
    -file java11-ca-cert.pem
  - Export the CA certificate from a new server:
    $ keytool \
    -exportcert \
    -alias ca-cert \
    -keystore /path/to/new-server/config/keystore \
    -storepass:file /path/to/new-server/config/keystore.pin \
    -file java17-ca-cert.pem
- On all existing DS servers, import the new CA certificate:
  $ keytool \
  -importcert \
  -trustcacerts \
  -alias alt-ca-cert \
  -keystore /path/to/old-server/config/keystore \
  -storepass:file /path/to/old-server/config/keystore.pin \
  -file java17-ca-cert.pem \
  -noprompt
- On all new DS servers, import the old CA certificate:
  $ keytool \
  -importcert \
  -trustcacerts \
  -alias alt-ca-cert \
  -keystore /path/to/new-server/config/keystore \
  -storepass:file /path/to/new-server/config/keystore.pin \
  -file java11-ca-cert.pem \
  -noprompt
The servers reload their keystores dynamically and replication works as expected.
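To compare Java environments across servers for the first step, the following bash sketch prints the java command a DS installation would probably use. The function name resolve_ds_java is hypothetical, and the lookup order (OPENDJ_JAVA_BIN, then OPENDJ_JAVA_HOME, then default.java-home in config/java.properties, then java on the PATH) is an assumption based on the settings listed above; confirm it against the scripts shipped with your DS version.

```shell
#!/usr/bin/env bash
# Sketch: print the java command a DS installation would probably use.
# ASSUMPTION: lookup order OPENDJ_JAVA_BIN, OPENDJ_JAVA_HOME,
# default.java-home in config/java.properties, then PATH.
resolve_ds_java() {
  local ds_root="${1:?usage: resolve_ds_java /path/to/opendj}"
  local props="$ds_root/config/java.properties" home
  if [ -n "${OPENDJ_JAVA_BIN:-}" ]; then
    echo "$OPENDJ_JAVA_BIN"
  elif [ -n "${OPENDJ_JAVA_HOME:-}" ]; then
    echo "$OPENDJ_JAVA_HOME/bin/java"
  elif [ -f "$props" ] &&
       home=$(sed -n 's/^default\.java-home[[:space:]]*[=:][[:space:]]*//p' "$props" | tail -n 1) &&
       [ -n "$home" ]; then
    echo "$home/bin/java"
  else
    command -v java || return 1   # fall back to java on the PATH
  fi
}
```

On each server, running `"$(resolve_ds_java /path/to/opendj)" -version` and comparing the output is one way to spot a mismatch.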
Certificate-based authentication
Replication uses TLS to protect directory data on the network. Misconfiguration can cause replicas to fail to connect due to handshake errors. This leads to repeated error log messages such as the following:
...msg=Replication server accepted a connection from address to local address address but the SSL handshake failed. This is probably benign, but may indicate a transient network outage or a misconfigured client application connecting to this replication server. The error was: Received fatal alert: certificate_unknown
You can collect debug trace messages to help determine the problem.
To display the TLS debug messages, start the server with javax.net.debug set:
$ OPENDJ_JAVA_ARGS="-Djavax.net.debug=all" start-ds
These debug trace settings produce a very large number of messages. To resolve the problem, review the output of starting the server, looking in particular for handshake errors.
If your PKI's chain of trust is broken, consider renewing or replacing keys, as described in Key management. Make sure that trusted CA certificates are configured as expected.
FIPS and key wrapping
DS servers use shared asymmetric keys to protect shared symmetric secret keys for data encryption.
By default, DS uses direct encryption to protect the secret keys.
When using a FIPS-compliant security provider that doesn't allow direct encryption, such as Bouncy Castle, change the Crypto Manager configuration to set the advanced property key-wrapping-mode to WRAP.
With this setting, DS uses wrap mode to protect the secret keys in a compliant way.
Compromised keys
How you handle the problem depends on which key was compromised:
- For keys generated by the server, or with a deployment ID and password, refer to Retire secret keys.
- For a private key whose certificate was signed by a CA, contact the CA for help. The CA might choose to publish a certificate revocation list (CRL) that identifies the certificate of the compromised key.
  Replace the key pair that has the compromised private key.
- For a private key whose certificate was self-signed, replace the key pair that has the compromised private key.
  Make sure the clients remove the compromised certificate from their truststores. They must replace the certificate of the compromised key with the new certificate.
Client problems
Use the logs
By default, DS servers record messages for LDAP client operations in the logs/ldap-access.audit.json log file.
Example log messages:
[
{
"eventName": "DJ-LDAP",
"client": {
"ip": "<clientIp>",
"port": 12345
},
"server": {
"ip": "<clientIp>",
"port": 1389
},
"request": {
"protocol": "LDAP",
"operation": "CONNECT",
"connId": 0
},
"transactionId": "0",
"response": {
"status": "SUCCESSFUL",
"statusCode": "0",
"elapsedTime": 0,
"elapsedTimeUnits": "MILLISECONDS"
},
"timestamp": "<timestamp>",
"_id": "<uuid>"
},
{
"eventName": "DJ-LDAP",
"client": {
"ip": "<clientIp>",
"port": 12345
},
"server": {
"ip": "<clientIp>",
"port": 1389
},
"request": {
"protocol": "LDAP",
"operation": "SEARCH",
"connId": 0,
"msgId": 1,
"dn": "dc=example,dc=com",
"scope": "sub",
"filter": "(uid=bjensen)",
"attrs": ["ALL"]
},
"transactionId": "0",
"response": {
"status": "SUCCESSFUL",
"statusCode": "0",
"elapsedTime": 9,
"elapsedTimeUnits": "MILLISECONDS",
"nentries": 1
},
"timestamp": "<timestamp>",
"_id": "<uuid>"
},
{
"eventName": "DJ-LDAP",
"client": {
"ip": "<clientIp>",
"port": 12345
},
"server": {
"ip": "<clientIp>",
"port": 1389
},
"request": {
"protocol": "LDAP",
"operation": "UNBIND",
"connId": 0,
"msgId": 2
},
"transactionId": "0",
"timestamp": "<timestamp>",
"_id": "<uuid>"
},
{
"eventName": "DJ-LDAP",
"client": {
"ip": "<clientIp>",
"port": 12345
},
"server": {
"ip": "<clientIp>",
"port": 1389
},
"request": {
"protocol": "LDAP",
"operation": "DISCONNECT",
"connId": 0
},
"transactionId": "0",
"response": {
"status": "SUCCESSFUL",
"statusCode": "0",
"elapsedTime": 0,
"elapsedTimeUnits": "MILLISECONDS",
"reason": "Client Unbind"
},
"timestamp": "<timestamp>",
"_id": "<uuid>"
}
]
Each message specifies the operation performed, the client that requested the operation, and when it completed.
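When latency is the concern, you can filter these messages by elapsed time. The following bash sketch, with the hypothetical name slow_operations, prints events whose elapsedTime exceeds a threshold. It assumes the file holds one JSON event per line (the example above is shown as an array for readability) and that elapsedTimeUnits is MILLISECONDS; a dedicated JSON tool such as jq is more robust, and this sketch uses only awk.

```shell
#!/usr/bin/env bash
# Sketch: print LDAP access log events with elapsedTime above a threshold.
# ASSUMPTIONS: one JSON event per line; elapsedTime is in milliseconds.
slow_operations() {
  local log="${1:?usage: slow_operations <ldap-access.audit.json> <min-ms>}"
  local min_ms="${2:?missing threshold in ms}"
  awk -v min="$min_ms" '
    match($0, /"elapsedTime": *[0-9]+/) {   # does not match "elapsedTimeUnits"
      t = substr($0, RSTART, RLENGTH)
      sub(/.*: */, "", t)                    # keep only the number
      if (t + 0 > min + 0) print
    }
  ' "$log"
}
```

For example, `slow_operations /path/to/opendj/logs/ldap-access.audit.json 100` lists operations that took longer than 100 ms.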
By default, the server does not log internal LDAP operations corresponding to HTTP requests. To match HTTP client operations to internal LDAP operations:
- Prevent the server from suppressing log messages for internal operations.
  Set suppress-internal-operations:false on the LDAP access log publisher.
- Match the request/connId field in the HTTP access log with the same field in the LDAP access log.
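To do the matching from the command line, the following bash sketch, with the hypothetical name events_for_conn, prints every event for one connection ID from the log files you pass, prefixed with the file name. It assumes one JSON event per line and a bare numeric connId value, as in the example above.

```shell
#!/usr/bin/env bash
# Sketch: print events for one connId from HTTP and LDAP access logs.
# ASSUMPTIONS: one JSON event per line; connId is a bare number.
events_for_conn() {
  local conn_id="${1:?usage: events_for_conn <connId> <log-file>...}"
  shift
  case "$conn_id" in
    ''|*[!0-9]*) echo "connId must be a number" >&2; return 2 ;;
  esac
  # The trailing [,} ] stops connId 7 from also matching 77.
  grep -H -E "\"connId\" *: *${conn_id}[,} ]" "$@"
}
```

For example, `events_for_conn 42 logs/http-access.audit.json logs/ldap-access.audit.json`, where the HTTP access log file name is an assumption; use the name configured on your server.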
Client access
To help diagnose client errors due to access permissions, refer to Effective rights.
Simple paged results
For some versions of Linux, you might read a message in the DS access logs such as the following:
The request control with Object Identifier (OID) "1.2.840.113556.1.4.319" cannot be used due to insufficient access rights
This message means clients are trying to use the simple paged results control without authenticating. By default, a global ACI allows only authenticated users to use the control.
To grant anonymous (unauthenticated) user access to the control, add a global ACI for anonymous use of the simple paged results control:
$ dsconfig \
set-access-control-handler-prop \
--hostname localhost \
--port 4444 \
--bindDN uid=admin \
--bindPassword "password" \
--add global-aci:"(targetcontrol=\"SimplePagedResults\") \
(version 3.0; acl \"Anonymous simple paged results access\"; allow(read) \
userdn=\"ldap:///anyone\";)" \
--usePkcs12TrustStore /path/to/opendj/config/keystore \
--trustStorePassword:file /path/to/opendj/config/keystore.pin \
--no-prompt
Replication problems
Replicas do not connect
If you set up servers with different deployment IDs, they cannot share encrypted data.
By default, they also cannot trust each other’s secure connections.
You may read messages like the following in the logs/errors log file:
msg=Replication server accepted a connection from /address:port to local address /address:port but the SSL handshake failed.
Unless the servers use your own CA, make sure their keys are generated with the same deployment ID/password. Either set up the servers again with the same deployment ID, or refer to Replace deployment IDs.
Temporary delays
Replication can generally recover from conflicts and transient issues. Temporary delays are normal and expected while replicas converge, especially when the write load is heavy. This is a feature of eventual convergence, not a bug.
For more information, refer to Replication delay (LDAP).
Use the logs
By default, replication records messages in the log file, logs/errors.
Replication messages have category=SYNC.
The messages have the following form:
...msg=Replication server accepted a connection from 10.10.0.10/10.10.0.10:52859 to local address 0.0.0.0/0.0.0.0:8989 but the SSL handshake failed. This is probably benign, but may indicate a transient network outage or a misconfigured client application connecting to this replication server. The error was: Remote host closed connection during handshake
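When the log fills with messages like this one, it helps to know which peers are failing. The following bash sketch, with the hypothetical name handshake_failures_by_peer, counts failed TLS handshakes per remote address, dropping the ephemeral port. It relies on the message wording shown above, which might differ in other DS versions.

```shell
#!/usr/bin/env bash
# Sketch: count "SSL handshake failed" messages in logs/errors per remote
# address, most frequent first. Relies on the message wording shown above.
handshake_failures_by_peer() {
  local log="${1:?usage: handshake_failures_by_peer /path/to/opendj/logs/errors}"
  grep 'but the SSL handshake failed' "$log" |
    sed -n 's/.*accepted a connection from \([^ ]*\):[0-9][0-9]* to local address.*/\1/p' |
    sort | uniq -c | sort -rn
}
```

A single peer dominating the counts points to a misconfigured replica or client at that address rather than a transient network issue.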
Stale data
DS servers maintain historical information to bring replicas up to date, and to resolve conflicts. To prevent historical information from growing without limit, servers purge historical information after a configurable delay (replication-purge-delay, default: 3 days). A replica can become irrevocably out of sync if you restore it from a backup that is older than the purge delay, or if you stop it for longer than the purge delay. If this happens, reinitialize the replica from a recent backup or from a server that is up to date.
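Before restoring a backup, you can check whether it is older than the purge delay. The following bash sketch, with the hypothetical name older_than_purge_delay, compares two Unix timestamps in seconds against the delay in days. It defaults to the 3-day value mentioned above, so pass your configured replication-purge-delay if it differs.

```shell
#!/usr/bin/env bash
# Sketch: succeed (exit 0) when a backup, or the time a replica was stopped,
# is older than the replication purge delay (default 3 days).
older_than_purge_delay() {
  local then_s="${1:?usage: older_than_purge_delay <epoch-s> [now-epoch-s] [delay-days]}"
  local now_s="${2:-$(date +%s)}"
  local delay_days="${3:-3}"
  local age_s=$(( now_s - then_s ))
  (( age_s > delay_days * 86400 ))   # 86400 seconds per day
}
```

For example, with GNU date, `older_than_purge_delay "$(date -d 2024-05-01 +%s)"` succeeds if that date is more than three days ago, meaning you should reinitialize the replica instead of relying on that backup alone.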
Incorrect configuration
When replication is configured incorrectly, fixing the problem can involve adjustments on multiple servers.
For example, adding or removing a bootstrap replication server means updating the bootstrap-replication-server settings in the synchronization provider configuration of other servers. (The settings can be hard-coded in the configuration, or read from the environment at startup time, as described in Property value substitution. In either case, changing them involves at least restarting the other servers.)
For details, refer to Replication and the related pages.
Support
Sometimes you cannot resolve a problem yourself, and must ask for help or technical support. In such cases, identify the problem, how to reproduce it, and the version where you observe it:
$ status --offline --version
ForgeRock Directory Services 7.3.5-20240506155118-fba46f79e3510f00d8e7e5228e53672e49833f71
Build <datestamp>
Be prepared to provide the following additional information:
- The Java home set in config/java.properties.
- Access and error logs showing what the server was doing when the problem started occurring.
- A copy of the server configuration file, config/config.ldif, in use when the problem started occurring.
- Other relevant logs or output, such as those from client applications experiencing the problem.
- A description of the environment where the server is running, including system characteristics, hostnames, IP addresses, Java versions, storage characteristics, and network characteristics. This helps support staff interpret the logs and other information.
- The .zip file generated using the supportextract command. For an example showing how to use the command, refer to supportextract.