Replication fails and you see an error such as the following in your logs:
[25/Aug/2017:10:27:23 +0000] category=SYNC severity=SEVERE_ERROR msgID=14942389 msg=The connection from this replication server RS(1234) to replication server RS(5678) at ds1.example.com/203.0.113.0:9989 for domain "dc=example,dc=com" has failed
Observing the connection flow
When a DS/OpenDJ instance starts, the connection flow should be as follows:
- The instance starts the Replication Server (RS), which listens for connections on the replication port.
- The local RS connects to a remote RS for each replication domain: cn=schema, cn=admin data, and the replicated backend (for example, dc=example,dc=com).
- The local Directory Server (DS) for each domain connects to its local RS.
When you encounter this error, the log files show the DS connecting to its local RS, but not the local RS connecting to the remote RS:
- You will see messages similar to the following, which indicate that the DS has connected to the local RS:
[21/Jun/2017:16:17:24 -0400] category=SYNC severity=INFORMATION msgID=131 msg=Replication server RS(1234) has accepted a connection from directory server DS(5678) for domain "dc=example,dc=com" at ds1.example.com/203.0.113.0:9989
- But you will not see messages like the following, which show the local RS connecting to the remote RS:
[21/Jun/2017:16:17:56 -0400] category=SYNC severity=INFORMATION msgID=116 msg=Replication server RS(9012) has accepted a connection from replication server RS(1234) for domain "dc=example,dc=com" at ds1.example.com/203.0.113.0:9989
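To check a log file for the two message types above, the following POSIX shell sketch counts them. It assumes you pass the path of the server's error log (on a default install this is usually logs/errors under the instance directory, but verify this against your own log configuration); the function name check_rs_connections is hypothetical. The RS-to-RS message may be logged on either side of the connection, so run it against the logs of each server, not only the one reporting the error.

```shell
#!/bin/sh
# Sketch: count replication connection messages in a DS/OpenDJ error log.
# ASSUMPTION: the log file path is passed as the first argument; the
# function name is hypothetical and the patterns match the sample messages.
check_rs_connections() {
  log_file="$1"
  if [ ! -r "$log_file" ]; then
    echo "cannot read log file: $log_file" >&2
    return 2
  fi
  # msgID=131 style: an RS accepted a connection from a directory server
  ds_count=$(grep -cF 'has accepted a connection from directory server' "$log_file" || true)
  # msgID=116 style: an RS accepted a connection from another RS
  rs_count=$(grep -cF 'has accepted a connection from replication server' "$log_file" || true)
  echo "DS-to-RS connections accepted: $ds_count"
  echo "RS-to-RS connections accepted: $rs_count"
  # The symptom described above: DS connections present, RS connections absent
  if [ "$ds_count" -gt 0 ] && [ "$rs_count" -eq 0 ]; then
    echo "DS-to-RS connections seen but no RS-to-RS connections: check connectivity to the remote RS"
  fi
}
```

For example, run check_rs_connections /path/to/opendj/logs/errors on each server and compare the results.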
The local RS cannot connect to the remote RS. This happens when the remote RS is down or unreachable, for example after network changes such as updates to the firewall.
Check that all of the following are true, and resolve any issues you find:
- The remote RS is up and running.
- The network is working correctly.
- The local RS can successfully connect to the remote RS over the network. Test connectivity as described below; if it fails, check the following common causes:
- Is there a firewall or other network device blocking the replication port and/or the admin port?
- Is hostname resolution working as expected? For example, can DNS resolve each hostname from the other server, and do the hostnames resolve to IP addresses that are actually configured on the servers?
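The hostname checks in the last bullet can be scripted. This sketch assumes a Linux host with getent and the iproute2 ip command; check_resolution and is_local_address are hypothetical helper names. Run check_resolution on each server for every peer hostname, and is_local_address on each server for the addresses its own configured hostname resolves to.

```shell
#!/bin/sh
# Sketch: hostname resolution checks for replication peers.
# ASSUMPTIONS: Linux with getent and iproute2 (ip); helper names are hypothetical.

# Resolve a hostname through NSS (/etc/hosts, DNS, ...) and print its addresses.
check_resolution() {
  host="$1"
  addrs=$(getent hosts "$host" 2>/dev/null | awk '{print $1}')
  if [ -z "$addrs" ]; then
    echo "FAIL: $host does not resolve"
    return 1
  fi
  # Unquoted expansion joins multiple addresses with single spaces
  echo "OK: $host resolves to $(echo $addrs)"
}

# Return 0 if the address is configured on one of this machine's interfaces.
is_local_address() {
  # Field 4 of `ip -o addr show` is ADDRESS/PREFIX, e.g. 127.0.0.1/8
  ip -o addr show 2>/dev/null | awk '{print $4}' | cut -d/ -f1 | grep -qxF "$1"
}
```

For example, on ds1.example.com you might run check_resolution ds2.example.com, then pass each address that ds1.example.com resolves to into is_local_address.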
Once you have resolved any issues and confirmed that the local RS can connect to the remote RS, reinitialize replication and ensure the servers are in sync. You can reinitialize replication using the dsreplication initialize subcommand, for example:
$ ./dsreplication initialize --adminUID admin --adminPassword password --baseDN dc=example,dc=com --hostSource ds1.example.com --portSource 4444 --hostDestination ds2.example.com --portDestination 5444 --trustAll --no-prompt
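If you need to reinitialize more than once (for example, once per replicated base DN), a small wrapper keeps the parameters in one place and follows up with dsreplication status so you can confirm the servers are in sync. This is a sketch: reinit_and_check, DSREPLICATION and ADMIN_PASSWORD are hypothetical names, the host names and ports are the placeholders from the command above, and you should confirm the status options against dsreplication status --help for your version.

```shell
#!/bin/sh
# Sketch: reinitialize one base DN, then show replication status.
# ASSUMPTIONS: DSREPLICATION points at your dsreplication script;
# ADMIN_PASSWORD is a hypothetical environment variable holding the
# admin password so it is not hard-coded in the script.
DSREPLICATION="${DSREPLICATION:-./dsreplication}"

reinit_and_check() {
  base_dn="$1" src_host="$2" src_port="$3" dst_host="$4" dst_port="$5"
  # Same initialize options as the example command above
  "$DSREPLICATION" initialize --adminUID admin --adminPassword "$ADMIN_PASSWORD" \
    --baseDN "$base_dn" --hostSource "$src_host" --portSource "$src_port" \
    --hostDestination "$dst_host" --portDestination "$dst_port" \
    --trustAll --no-prompt || return 1
  # Status output typically includes a missing-changes column per server
  "$DSREPLICATION" status --adminUID admin --adminPassword "$ADMIN_PASSWORD" \
    --hostname "$src_host" --port "$src_port" --trustAll --no-prompt
}
```

For example: ADMIN_PASSWORD=password reinit_and_check dc=example,dc=com ds1.example.com 4444 ds2.example.com 5444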
Test connectivity to ensure that each server can connect to the replication port of every other server. You can use a variety of tools for this, for example:
$ openssl s_client -connect [remote_server]:[replication_port]
$ telnet [remote_server] [replication_port]
Ensure that you test connectivity from all servers to verify that all connections are working as expected.
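To cover every direction without typing each command by hand, the sketch below tries a plain TCP connection from the current server to the replication port of each peer. It assumes bash (for /dev/tcp) and the coreutils timeout command; check_port and check_peers are hypothetical names, and 9989 is simply the replication port seen in the log excerpts above. It only tests TCP reachability, so the openssl s_client check is still useful for confirming the TLS handshake.

```shell
#!/usr/bin/env bash
# Sketch: TCP reachability check from this server to each peer's replication port.
# ASSUMPTIONS: bash (/dev/tcp) and coreutils timeout; function names are hypothetical.

# Try one TCP connection, giving up after 5 seconds.
check_port() {
  local host="$1" port="$2"
  # Opening /dev/tcp/HOST/PORT makes bash attempt a TCP connection
  if timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "OK: $host:$port"
  else
    echo "FAIL: $host:$port"
    return 1
  fi
}

# Check one port on every peer; the return value is the number of failures.
check_peers() {
  local port="$1" failures=0 peer
  shift
  for peer in "$@"; do
    check_port "$peer" "$port" || failures=$((failures + 1))
  done
  return "$failures"
}
```

On each server, run for example check_peers 9989 ds1.example.com ds2.example.com; a non-zero return value is the number of peers that could not be reached.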
Related Issue Tracker IDs