Directory server 1 was attempting to connect to replication server 2 but has disconnected in handshake phase error in DS 5 and OpenDJ 3.0, 3.5, 3.5.1
The purpose of this article is to provide assistance if you encounter a "Directory server 1 was attempting to connect to replication server 2 but has disconnected in handshake phase" error in DS/OpenDJ. You may also see a result=53 message="The Replication is configured for suffix dc=openam,dc=example,dc=org but was not able to connect to any Replication Server". This issue also affects the embedded DS/OpenDJ in AM/OpenAM.
Archived
This article has been archived and is no longer maintained by ForgeRock.
Symptoms
The errors you will see vary slightly depending on whether DS/OpenDJ is standalone or embedded as follows:
Standalone DS/OpenDJ
The following errors are shown in the DS/OpenDJ Errors log:
[11/Nov/2016:13:22:51 -0400] category=SYNC severity=NOTICE msgID=org.opends.messages.replication.204 msg=Replication server RS(1245) started listening for new connections on address 0.0.0.0 port 8989
[11/Nov/2016:13:22:55 -0400] category=SYNC severity=WARNING msgID=org.opends.messages.replication.208 msg=Directory server DS(8764) was unable to connect to any replication servers for domain "cn=admin data"
[11/Nov/2016:13:22:59 -0400] category=SYNC severity=WARNING msgID=org.opends.messages.replication.208 msg=Directory server DS(8764) was unable to connect to any replication servers for domain "dc=example,dc=com"
[11/Nov/2016:13:23:04 -0400] category=SYNC severity=WARNING msgID=org.opends.messages.replication.208 msg=Directory server DS(8764) was unable to connect to any replication servers for domain "cn=schema"
[11/Nov/2016:13:23:26 -0400] category=SYNC severity=ERROR msgID=org.opends.messages.replication.178 msg=Directory server 8764 was attempting to connect to replication server 1245 but has disconnected in handshake phase
[11/Nov/2016:13:43:25 -0400] MODIFY RES conn=4 op=114 msgID=115 result=53 message="The Replication is configured for suffix dc=openam,dc=example,dc=org but was not able to connect to any Replication Server" etime=1
Note
A quick trace of the log messages, as demonstrated in OPENDJ-1135 (DS sometimes fail to connect to RS after server restart), points to connections timing out because the RS is unreachable. The logs show the RS listener accepting local connections only after they have already timed out on the client (DS) side.
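To check whether a server is showing these symptoms, the errors log can be searched for the two message patterns quoted above. The following is a minimal sketch rather than a supported tool: the function names are hypothetical, and the log path you pass in (for example, logs/errors under the instance directory) is an assumption about your installation.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, and the errors log path
# passed as $1 depends on where your DS/OpenDJ instance is installed.

# Print how many handshake-phase disconnects the log contains.
count_handshake_errors() {
  local log_file="$1"
  # grep -c prints the number of matching lines; it exits non-zero when
  # nothing matches, so "|| true" keeps this usable under "set -e".
  grep -c 'has disconnected in handshake phase' "$log_file" || true
}

# Print each replication domain (once) for which the DS reported that it
# could not connect to any replication server.
list_unconnected_domains() {
  local log_file="$1"
  grep 'was unable to connect to any replication servers for domain' "$log_file" \
    | sed -E 's/.*for domain "([^"]*)".*/\1/' \
    | sort -u
}
```

For example, `count_handshake_errors /path/to/opendj/logs/errors` prints the number of handshake-phase errors, and `list_unconnected_domains` with the same argument prints the affected domains, one per line.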
Embedded DS/OpenDJ
The following error is shown in the embedded DS/OpenDJ logs:
ERROR: Directory server DS(8764) encountered an unexpected error while connecting to replication server host1.example.com:8080 for domain "dc=example,dc=com": SocketException: Broken pipe (SocketOutputStream.java:-2 SocketOutputStream.java:113 SocketOutputStream.java:159 OutputRecord.java:377 OutputRecord.java:363 SSLSocketImpl.java:849 SSLSocketImpl.java:820 SSLSocketImpl.java:691 Handshaker.java:1011 ClientHandshaker.java:1187 ClientHandshaker.java:1099 ClientHandshaker.java:345 Handshaker.java:913 Handshaker.java:849 SSLSocketImpl.java:1035 SSLSocketImpl.java:1344 SSLSocketImpl.java:1371 SSLSocketImpl.java:1355 ReplSessionSecurity.java:196 ReplicationBroker.java:1080 ReplicationBroker.java:792 ...)
EmbeddedDJ:10/02/2016 12:26:48:618 AM CDT: Thread[Replication server RS(1245) connection listener on port 50889,5,Directory Server Thread Group]: TransactionId[ea06d007-df28-492b-b74c-7395e70e49bc-6038342] ERROR: Directory server 8764 was attempting to connect to replication server 1245 but has disconnected in handshake phase
The following error is shown in the AM/OpenAM Configuration debug log:
amSetupServlet:11/16/2016 10:18:52:187 AM CDT: Thread[localhost-startStop-2,5,main]: TransactionId[af908ced-ff94-4ae6-a375-dc06e33eb0d9-5856958] ERROR: EmbeddedOpenDS:shutdown hook failed
java.lang.NullPointerException
    at org.opends.server.core.DirectoryServer.shutDown(DirectoryServer.java:6170)
    at com.sun.identity.setup.EmbeddedOpenDS.shutdownServer(EmbeddedOpenDS.java:513)
    at com.sun.identity.setup.EmbeddedOpenDS$1.shutdown(EmbeddedOpenDS.java:490)
    at com.sun.identity.common.ShutdownManager.shutdown(ShutdownManager.java:211)
    at com.sun.identity.common.ShutdownServletContextListener.contextDestroyed(ShutdownServletContextListener.java:51)
    at org.apache.catalina.core.StandardContext.listenerStop(StandardContext.java:5014)
    at org.apache.catalina.core.StandardContext.stopInternal(StandardContext.java:5659)
    at org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:232)
    at org.apache.catalina.core.ContainerBase$StopChild.call(ContainerBase.java:1575)
    at org.apache.catalina.core.ContainerBase$StopChild.call(ContainerBase.java:1564)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
You will also see generic "Unwilling to Perform" errors such as the following in the AM/OpenAM Configuration log:
amSMSEmbeddedLdap:11/16/2016 12:09:52:289 PM CDT: Thread[http-nio-0.0.0.0-9443-exec-16,5,main]: TransactionId[daaeffd3-d4a3-4324-8f1d-865e43f34773-168] ERROR: SMSEmbeddedLdapObject.modify: Error modifying entry ou=AgentUsers,ou=default,ou=OrganizationConfig,ou=1.0,ou=sunidentityrepositoryservice,ou=services,o=agentusers,ou=services,dc=openam,dc=example,dc=com by Principal: id=amadmin,ou=user,dc=openam,dc=example,dc=com, error code = Unwilling to Perform
AM/OpenAM
If you are using DS/OpenDJ with AM/OpenAM, you will encounter a variety of issues if AM/OpenAM cannot reach the DS/OpenDJ configuration store and/or user store. These issues include failures when making configuration changes, users being unable to authenticate, and installations failing with the following error in the Install log:
AMSetupServlet.processRequest: error
org.forgerock.opendj.ldap.ConnectionException: Server Connection Closed: Heartbeat failed
Recent Changes
Shut down the DS/OpenDJ replication server (RS) or the RS is down for another reason.
Restarted the RS instance.
Rebooted the VM in which DS/OpenDJ is running.
Causes
In a replicated system, the directory server (DS) must be able to connect to the RS. If the RS cannot be contacted (for example, because it is down), any writes to the DS/OpenDJ server (whether they originate from DS/OpenDJ or AM/OpenAM) will fail. Restarting a server can also prevent the DS from connecting to its own RS during the handshake phase.
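Because the DS must be able to reach the RS, a quick TCP-level probe of the replication port can help separate "the RS is down or unreachable" from a failure later in the handshake. The sketch below is illustrative only: the function name is hypothetical, it relies on bash's /dev/tcp feature and the coreutils timeout command, and a successful connection shows only that something is listening on the port, not that the replication handshake will complete.

```shell
#!/usr/bin/env bash
# Sketch only: rs_reachable is a hypothetical helper, not a DS/OpenDJ tool.
# Returns 0 if a TCP connection to host:port opens within the timeout
# (default 5 seconds), non-zero otherwise.
rs_reachable() {
  local host="$1" port="$2" timeout_secs="${3:-5}"
  # /dev/tcp/<host>/<port> is handled by bash itself; timeout(1) is part
  # of GNU coreutils and may need installing on some platforms.
  timeout "$timeout_secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

For example, `rs_reachable rs1.example.com 8989 && echo reachable` probes the replication port shown in the standalone log excerpt; the host name here is a placeholder, and the port should be whatever your RS is configured to listen on.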
Solution
This issue can be resolved by upgrading to DS 5.5 or later, or to OpenDJ 3.5.2; you can download these from BackStage.
Workaround
As a workaround, you can restart the RS. If you are using the embedded DS/OpenDJ in AM/OpenAM, you should restart the web application container in which AM/OpenAM runs to do this.
If the other DS/OpenDJ instance is offline for a period longer than your purge delay, you will need to initialize it from a running server once it's back online to take account of any updates that occurred while it was down, for example:
$ ./dsreplication initialize --adminUID admin --adminPassword password --baseDN dc=example,dc=com --hostSource ds1.example.com --portSource 4444 --hostDestination ds2.example.com --portDestination 4444 --trustAll --no-prompt
See Also
How do I repair replication configuration in DS 6.x when dsreplication has failed?
Generation IDs do not match error after restoring a DS (All versions) replica
How do I use cn=monitor entry in DS 6.x for monitoring?
Related Training
N/A
Related Issue Tracker IDs
OPENDJ-1135 (DS sometimes fail to connect to RS after server restart)