ForgeRock Identity Platform
Does not apply to Identity Cloud

How do I troubleshoot replication issues in DS 5.x and 6.x?

Last updated Apr 13, 2021

The purpose of this article is to help you troubleshoot replication issues in DS. It also provides other useful information about replication, including background information, regular tasks you should perform to ensure replication is behaving as expected (and so avoid future issues), and the recommended ways to recover and stop replication.



Warning

Do not compress, tamper with, or otherwise alter changelog database files directly unless specifically instructed to do so by a qualified ForgeRock technical support engineer. External changes to changelog database files can render them unusable by the server. By default, changelog database files are located under the /path/to/ds/changelogDb directory.

Overview

This article provides background information on replication and how it works, and explains how to monitor and troubleshoot replication issues.

Background information

DS uses a DS and RS model for replication.

  • A DS is a Directory Server. DSs contain the backend databases and answer client requests.
  • An RS is a Replication Server. RSs contain a changelog and handle replication traffic with DSs and with other RSs. They receive, send and store only changes to directory data, rather than the directory data itself. A DS connects to an RS for replication purposes.

An instance installed without replication enabled is a DS. If you enable replication, the instance also runs RS threads within the same process and becomes a DS+RS.

A DS replication topology can consist of the following types of instances:

  • DS+RS
  • Standalone DS
  • Standalone RS

Each DS and each RS has a unique ID. The DS ID is used to keep track of changes to the system and is included in the CSN. See Using the CSN to troubleshoot data consistency for further information on decoding the CSN and identifying the DS ID.

See Administration Guide › About Replication and Administration Guide › Replication Per Suffix for further information on replication.

Note

You should be cautious about changing the hostname as this affects replication. If you need to change it, follow the procedure given in How do I change the hostname in DS 5.x and 6.x? to ensure replication is correctly handled. Additionally, you should be consistent with your use of either FQDNs or IP addresses for hostnames as noted in Administration Guide › Configuring Replication.

changelogDb

The changelog stores replication changes in the replication changes database (changelogDb directory). These changes are purged from the changelog according to the replication purge delay setting. You must ensure this is set appropriately to keep data long enough for replication and data recovery purposes; these changes are permanently lost once they are purged from the changelog. See How do I control how long replication changes are retained in DS (All versions)? and FAQ: Backup and restore in DS 5.x and 6.x (Q. When does the replication purge take place?) for further information.

Change Sequence Number (CSN)

The CSN is used to track changes via replication and the changelog. The CSN is an encoded value that represents the date and time, the change number for that timestamp and the DS's server ID. See Using the CSN to troubleshoot data consistency for further information on decoding the CSN.

Generation ID

The generation ID is a checksum of attributes from some of the entries and is used during replication to check that the suffix being updated is the same as the one offering the updates.

Checking the status of replication

It is very important to check the status of replication regularly so that you can be confident all changes are being replayed successfully. It is particularly useful to check if you notice that replicated changes are arriving more slowly than expected.

You can use the dsreplication status command to get an overall view of the replication topology. The output shows information on a per server/suffix basis, including how many entries each server has and any replication delay (or missing changes, in versions before DS 6.5). For example:

$ ./dsreplication status --adminUID admin --adminPassword password --hostname ds1.example.com --port 4444 --trustAll
Suffix DN : Server : Entries : Replication enabled : DS ID : RS ID : RS Port (1) : Delay (ms) : Security (2)
------------------:----------------------:---------:---------------------:-------:-------:-------------:------------:--------------
dc=example,dc=com : ds1.example.com:4444 : 2002 : true : 12790 : 11500 : 8989 : 0 : true
dc=example,dc=com : ds2.example.com:5444 : 2002 : true : 14057 : 12210 : 9989 : 0 : true

The Delay (ms) metric replaces the M.C. and A.O.M.C. metrics returned in versions before DS 6.5.

Compare the Entries values across the servers to ensure they match, and monitor the replication delay. In versions before DS 6.5, you should also check that there are no missing changes (M.C.). If you see discrepancies, run dsreplication status a few times in quick succession to see whether the servers catch up. If the Entries values remain different, or replication delays or missing changes persist, replication is out of sync.
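If you run this check often, you can script the comparison. The following is a minimal Python sketch. It assumes the DS 6.5-style column layout shown above; pre-6.5 output has M.C./A.O.M.C. columns instead of Delay (ms), and the sketch simply skips the delay check in that case. The functions parse_status and find_discrepancies are hypothetical helpers, not part of DS. They split each row on " : ", which leaves host:port values intact, and report differing entry counts per suffix and any non-zero delay.

```python
import re
from collections import defaultdict


def parse_status(text):
    """Turn `dsreplication status` table output into one dict per data row."""
    header, rows = None, []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the dashed separator row.
        if not line or set(line) <= set("-:"):
            continue
        # Columns are separated by " : "; host:port has no spaces around its colon.
        cells = re.split(r"\s+:\s+", line)
        if header is None:
            if cells[0] == "Suffix DN":
                header = cells
            continue
        if len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


def _as_int(value):
    """Return value as an int, or None if it is not a plain number."""
    return int(value) if value.isdigit() else None


def find_discrepancies(rows):
    """Warn about entry counts that differ per suffix and any non-zero delay."""
    warnings = []
    entries_by_suffix = defaultdict(dict)
    for row in rows:
        suffix, server = row["Suffix DN"], row["Server"]
        entries_by_suffix[suffix][server] = _as_int(row["Entries"])
        delay = _as_int(row.get("Delay (ms)", ""))
        if delay:  # None (column absent or not numeric) and 0 are not reported
            warnings.append(f"{server} ({suffix}): replication delay {delay} ms")
    for suffix, counts in entries_by_suffix.items():
        if len(set(counts.values())) > 1:
            warnings.append(f"{suffix}: entry counts differ {counts}")
    return warnings


SAMPLE = """\
Suffix DN : Server : Entries : Replication enabled : DS ID : RS ID : RS Port (1) : Delay (ms) : Security (2)
------------------:----------------------:---------:---------------------:-------:-------:-------------:------------:--------------
dc=example,dc=com : ds1.example.com:4444 : 2002 : true : 12790 : 11500 : 8989 : 0 : true
dc=example,dc=com : ds2.example.com:5444 : 2002 : true : 14057 : 12210 : 9989 : 0 : true
"""

print(find_discrepancies(parse_status(SAMPLE)))  # → []
```

As the note below explains, a single non-empty result is not conclusive. Run the check several times before concluding that replication is out of sync.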

Note

You may sometimes see missing changes. This is normal as long as they go away (that is, the count returns to 0). dsreplication status searches each server's backend for the information it uses to calculate entries and missing changes. Because changes are still being replicated while it runs, this is like trying to hit a moving target, and it can sometimes report missing changes when there aren't any. This is why it is important to monitor replication regularly, so that you can tell transient discrepancies from persistent ones.

See Reference › dsreplication for further information.

Monitoring replication

It is very important to monitor replication regularly. You should also ensure your servers are never down for longer than your replication purge delay. If a server is down longer than the purge delay and the entry count has not changed, any LDAP MOD operations that took place during this time will not be reported as missing changes in the M.C. column of the dsreplication status output. Any ADD or DEL operations during this time will show up as a difference in the number of entries. For example, if the purge delay is one day and Master 2 is down for two days, modifications made to Master 1 are purged from the changelog one day after they are made. Those made early in the outage are therefore never seen in the M.C. column of the replication status.
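The arithmetic behind this example can be sketched as follows. The helper purged_during_outage and the dates are hypothetical. The sketch assumes purging is continuous; the real purge runs periodically, so treat the result as approximate. It returns the window of changes, made while the server was down, that had already been purged when the server came back.

```python
from datetime import datetime, timedelta


def purged_during_outage(down_since, back_up, purge_delay):
    """Return the (start, end) window of changes made while a server was down
    that had already been purged from the changelog when it came back,
    or None if the outage was shorter than the purge delay.

    Simplification: treats purging as continuous; the real purge is periodic.
    """
    cutoff = back_up - purge_delay  # changes older than this have been purged
    if cutoff <= down_since:
        return None
    return (down_since, cutoff)


# The example above: one-day purge delay, Master 2 down for two days
# (the calendar dates are hypothetical).
down = datetime(2021, 4, 1, 9, 0)
up = down + timedelta(days=2)
print(purged_during_outage(down, up, timedelta(days=1)))
# → (datetime.datetime(2021, 4, 1, 9, 0), datetime.datetime(2021, 4, 2, 9, 0))
```

A non-None result means the returning server cannot receive some changes through replication. You would then need to recover it as described in Recovering replication.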

You can run an ldapsearch against the base DN "cn=Replication,cn=monitor" to monitor replication. You can return either all LDAP attributes under cn=monitor or only specific attributes, as required. For example, you can use a command such as the following, which returns the attributes that are useful for checking replication:

  • DS 6.x: $ ./ldapsearch --port 389 --bindDN "cn=Directory Manager" --bindPassword password --baseDN "cn=Replication,cn=monitor" --searchScope sub "(&(objectClass=*)(ds-mon-domain-name=dc=example,dc=com))" \* +
  • DS 5.x: $ ./ldapsearch --port 389 --bindDN "cn=Directory Manager" --bindPassword password --baseDN "cn=Replication,cn=monitor" --searchScope sub "(&(objectClass=*)(domain-name=dc=example,dc=com))" \* +
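If you script this search, you can pass the arguments as a list (for example, to Python's subprocess.run) rather than through a shell. This avoids the `\*` escaping used in the commands above. The following is a minimal sketch that builds the argument list for either version. The helper monitor_search_args is hypothetical; the ldapsearch options are exactly those used in the commands above.

```python
import shlex


def monitor_search_args(ds_major, suffix, bind_password,
                        port=389, bind_dn="cn=Directory Manager"):
    """Build the cn=monitor ldapsearch command shown above as an argument list."""
    # DS 6.x monitor entries use ds-mon-domain-name; DS 5.x uses domain-name.
    domain_attr = "ds-mon-domain-name" if ds_major >= 6 else "domain-name"
    return [
        "./ldapsearch",
        "--port", str(port),
        "--bindDN", bind_dn,
        "--bindPassword", bind_password,
        "--baseDN", "cn=Replication,cn=monitor",
        "--searchScope", "sub",
        f"(&(objectClass=*)({domain_attr}={suffix}))",
        "*", "+",  # all user attributes plus all operational attributes
    ]


args = monitor_search_args(6, "dc=example,dc=com", "password")
print(shlex.join(args))  # shell-quoted form, handy for logging
# From the DS bin directory you could then run, for example:
# subprocess.run(args, capture_output=True, text=True, check=True)
```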

Example response in DS 6:

dn: ds-mon-domain-name=dc=example\,dc=com,cn=changelog,cn=replication,cn=monitor
objectClass: top
objectClass: ds-monitor
objectClass: ds-monitor-changelog-domain
ds-mon-domain-generation-id: 1938690
ds-mon-missing-changes: 0
ds-mon-domain-name: dc=example,dc=com
pwdPolicySubentry: cn=Default Password Policy,cn=Password Policies,cn=config
subschemaSubentry: cn=schema
hasSubordinates: true
entryUUID: 86861fcf-7c9b-36fe-a280-f4361a3e936c
numSubordinates: 3
etag: 0000000064c735d8
structuralObjectClass: ds-monitor-changelog-domain
entryDN: ds-mon-domain-name=dc=example\,dc=com,cn=changelog,cn=replication,cn=monitor

dn: ds-mon-domain-name=dc=example\,dc=com,cn=replicas,cn=replication,cn=monitor
objectClass: top
objectClass: ds-monitor
objectClass: ds-monitor-replica
ds-mon-ds-mon-server-id: 11406
ds-mon-domain-generation-id: 1938690
ds-mon-server-state: 00000168c95fd61b52f50001a196
ds-mon-sent-updates: 0
ds-mon-replayed-updates: {"count":4,"total":4957.000,"mean_rate":0.000,"m1_rate":0.000,"m5_rate":0.000,"m15_rate":0.000,"mean":1241.383,"min":207.618,"max":2332.033,"stddev":963.963,"p50":358.613,"p75":2080.375,"p95":2332.033,"p98":2332.033,"p99":2332.033,"p999":2332.033,"p9999":2332.033,"p99999":2332.033}
ds-mon-current-delay: 0
ds-mon-connected-to-server-id: 27782
ds-mon-connected-to-server-hostport: ds1.example.com:8989
ds-mon-lost-connections: 0
ds-mon-current-receive-window: 2147483643
ds-mon-ssl-encryption: false
ds-mon-updates-outbound-queue: 0
ds-mon-replayed-updates-conflicts-resolved: 0
ds-mon-replayed-updates-conflicts-unresolved: 0
ds-mon-entries-awaiting-updates-count: 0
ds-mon-updates-inbound-queue: 0
ds-mon-ds-mon-updates-totals-per-replay-thread: [0, 1, 4, 0]
ds-mon-assured-sr-sent-updates: 0
ds-mon-assured-sr-acknowledged-updates: 0
ds-mon-assured-sr-not-acknowledged-updates: 0
ds-mon-assured-sr-timeout-updates: 0
ds-mon-assured-sr-wrong-status-updates: 0
ds-mon-assured-sr-replay-error-updates: 0
ds-mon-assured-sr-received-updates: 0
ds-mon-assured-sr-received-updates-acked: 0
ds-mon-assured-sr-received-updates-not-acked: 0
ds-mon-assured-sd-sent-updates: 0
ds-mon-assured-sd-acknowledged-updates: 0
ds-mon-assured-sd-timeout-updates: 0
ds-mon-status-last-changed: 20190205153757.130-0500
ds-mon-status: Normal
ds-mon-domain-name: dc=example,dc=com
pwdPolicySubentry: cn=Default Password Policy,cn=Password Policies,cn=config
subschemaSubentry: cn=schema
hasSubordinates: true
entryUUID: 54f15ca8-98e3-3e6e-8e29-0cd2bdadce14
numSubordinates: 1
etag: 00000000a8eef794
structuralObjectClass: ds-monitor-replica
entryDN: ds-mon-domain-name=dc=example\,dc=com,cn=replicas,cn=replication,cn=monitor

dn: ds-mon-server-id=21237,cn=remote replicas,ds-mon-domain-name=dc=example\,dc=com,cn=replicas,cn=replication,cn=monitor
objectClass: top
objectClass: ds-monitor
objectClass: ds-monitor-remote-replica
ds-mon-domain-name: dc=example,dc=com
ds-mon-current-delay: 0
ds-mon-replayed-updates: {"count":4,"total":5064.000,"mean_rate":0.000,"m1_rate":0.000,"m5_rate":0.000,"m15_rate":0.000,"mean":1264.452,"min":207.618,"max":2365.587,"stddev":986.156,"p50":358.613,"p75":2139.095,"p95":2365.587,"p98":2365.587,"p99":2365.587,"p999":2365.587,"p9999":2365.587,"p99999":2365.587}
ds-mon-server-id: 21237
pwdPolicySubentry: cn=Default Password Policy,cn=Password Policies,cn=config
subschemaSubentry: cn=schema
hasSubordinates: false
entryUUID: 0d097c14-9b25-3768-a7a4-47540fe6b247
numSubordinates: 0
etag: 0000000069f67cef
structuralObjectClass: ds-monitor-remote-replica
entryDN: ds-mon-server-id=21237,cn=remote replicas,ds-mon-domain-name=dc=example\,dc=com,cn=replicas,cn=replication,cn=monitor

Meanings of key attributes, such as the ones included in the above example:

Each item below gives the DS 6.x attribute name, then the DS 5.x attribute name (-- means there is no equivalent), followed by its meaning:

  • ds-mon-status-last-changed / last-status-change-date: The date and time of the last status change.
  • ds-mon-lost-connections / lost-connections: The number of times the connection has been lost between DSs and RSs. This value should roughly equal the number of times you have stopped replication. If it is much greater, you should investigate what is causing these connection losses.
  • ds-mon-assured-sr-received-updates-acked / received-updates: The number of replicated changes received by this server. This value should match replayed-updates.
  • ds-mon-sent-updates / sent-updates: The number of changes that have been sent by this server. You should see these being received and applied on the other servers via the received-updates / replayed-updates attribute values.
    This value, in conjunction with the received-updates / replayed-updates values on the other servers, indicates how well replication is working.
  • ds-mon-replayed-updates / replayed-updates: The number of replicated changes that have been applied to this server. This value should match received-updates.
  • ds-mon-updates-outbound-queue / pending-updates: The number of replicated changes waiting to be applied to this server.
  • ds-mon-replayed-updates / replayed-updates-ok: The number of replicated changes that have been successfully applied to this server. This value should match received-updates.
  • -- / resolved-modify-conflicts: The number of modify conflicts that have been resolved since the server was last started. Modify conflicts are always resolved automatically.
  • ds-mon-replayed-updates-conflicts-resolved / resolved-naming-conflicts: The number of naming conflicts that have been resolved since the server was last started. This value includes both automatically and manually resolved conflicts.
  • ds-mon-replayed-updates-conflicts-unresolved / unresolved-naming-conflicts: The number of unresolved naming conflicts since the server was last started. This value should equal 0. Otherwise, there are naming conflicts that you need to identify and resolve manually.
    Naming conflicts are identified by an additional entryuuid RDN in the DN, as demonstrated in the Identifying replication issues from the DS log files section.
    This value does not decrease once a conflict has been manually resolved. An RFE exists for this: OPENDJ-251 (Provide count of unresolved replication naming conflicts as part of the Monitoring information).
  • ds-mon-current-delay / --: The time between the latest update that the replica has received and the latest update that the replica has replayed. This metric is accurate only when the replica receives updates quickly. In the event of a network partition, the delay cannot accurately reflect updates happening on the other side of the partition.
    Replaces the ds-mon-missing-changes and ds-mon-approximate-delay attributes in DS 6.x.
  • ds-mon-missing-changes / missing-changes: The number of changes that have been sent by the other server but have not yet been applied to this server. This value should equal 0.
    This attribute is deprecated in DS 6; use ds-mon-current-delay instead.
  • ds-mon-approximate-delay / approximate-delay: The difference between the RS current time and the timestamp of the oldest update that has not yet been sent to the DS. This indicates replication latency. This value should equal 0.
    This attribute is deprecated in DS 6; use ds-mon-current-delay instead.
  • numSubordinates / numSubordinates: The number of entries immediately below the entry.
Note

Some of these attributes can indicate issues with replication. However, there can be an innocent reason for a discrepancy, because replication is not real-time. Replicated changes can be delayed by factors such as network speed or latency, CPU usage and the overall load on the individual instances. That is why it is important to monitor replication regularly, so that you understand what is normal for your environment.
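To automate the checks described above, you can save the ldapsearch output to a file and scan it. The following is a minimal Python sketch with two hypothetical helpers, parse_ldif and check_replication. It is a deliberately small LDIF reader (blank-line separators, comments and folded lines only; base64 "::" values are not decoded). It flags the counters that the list above says should be 0, using both the DS 6.x and DS 5.x names. It also treats any ds-mon-status other than Normal as a warning, which is an assumption based on the example response above. Given the note above, a single warning is a prompt to look again, not proof of a problem.

```python
def parse_ldif(text):
    """Minimal LDIF reader: one {attribute: [values]} dict per entry.

    Handles blank-line entry separators, '#' comments and folded lines
    (continuations starting with a single space). Base64 values ('attr::')
    are kept undecoded, which is fine for cn=monitor output.
    """
    entries, current, last_attr = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            if current:
                entries.append(current)
            current, last_attr = {}, None
        elif line.startswith("#"):
            continue
        elif line.startswith(" ") and last_attr:
            current[last_attr][-1] += line[1:]  # folded continuation line
        elif ":" in line:
            attr, _, value = line.partition(":")  # split on the first colon only
            current.setdefault(attr, []).append(value.strip())
            last_attr = attr
    if current:
        entries.append(current)
    return entries


# Counters that should be 0 (DS 6.x and DS 5.x names).
SHOULD_BE_ZERO = (
    "ds-mon-replayed-updates-conflicts-unresolved", "unresolved-naming-conflicts",
    "ds-mon-missing-changes", "missing-changes",
    "ds-mon-approximate-delay", "approximate-delay",
    "ds-mon-current-delay",
)


def check_replication(entries):
    """Return warnings for non-zero counters or a replica status other than Normal."""
    warnings = []
    for entry in entries:
        dn = entry.get("dn", ["<no dn>"])[0]
        for name in SHOULD_BE_ZERO:
            for value in entry.get(name, []):
                if value.isdigit() and int(value) != 0:
                    warnings.append(f"{dn}: {name}={value}")
        # Assumption: "Normal" (as in the example response) is the healthy status.
        for status in entry.get("ds-mon-status", []):
            if status != "Normal":
                warnings.append(f"{dn}: ds-mon-status={status}")
    return warnings


SAMPLE_MONITOR = r"""
dn: ds-mon-domain-name=dc=example\,dc=com,cn=changelog,cn=replication,cn=monitor
ds-mon-missing-changes: 0

dn: ds-mon-domain-name=dc=example\,dc=com,cn=replicas,cn=replication,cn=monitor
ds-mon-current-delay: 0
ds-mon-connected-to-server-hostport: ds1.example.com:8989
ds-mon-replayed-updates-conflicts-unresolved: 0
ds-mon-status: Normal
"""

print(check_replication(parse_ldif(SAMPLE_MONITOR)))  # → []
```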

Using the CSN to troubleshoot data consistency

Some attributes, such as ds-sync-state and ds-sync-hist, contain CSNs, which you can use to determine the replication state. The format of the CSN varies slightly by version, as follows:

  • DS 6.5.x:
    • The first 4 digits provide versioning information (in the format 01xx for DS 6.5.x), where xx indicates the length of the server ID.
    • The next 12 digits are the timestamp.
    • The next 8 digits are the sequence numbers that identify the change.
    • The remaining characters are the server ID (DS ID).
  • Pre-DS 6.5:
    • The first 4 digits provide versioning information (always 0000 to indicate a pre-6.5 version).
    • The next 12 digits are the timestamp.
    • The next 4 digits are the server ID (DS ID), in hexadecimal. For example, 3849 is DS ID 14409.
    • The last 8 digits are the sequence numbers that identify the change.

For example, the ds-sync-state value of 0105016c7034aec2000f360e12790 in DS 6.5 decodes as follows using the decodecsn tool:

$ ./decodecsn 0105016c7034aec2000f360e12790
CSNv2 0105016c7034aec2000f360e12790
ts=016c7034aec2 (1565250596546) Thu Aug 8 2019 15:49:56.546 SGT
id=12790
no=000f360e (996878)

This gives you the timestamp of the change, the ID of the DS where the change was made (12790) and the sequence number. From the first 4 digits (0105), you can tell that the server ID is 5 characters long. Comparing this DS ID with the dsreplication status output identifies this DS as Master 1 (ds1.example.com).
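If the decodecsn tool is not to hand, the layouts described above can be decoded with a few lines of Python. The following is a minimal sketch, and decode_csn is a hypothetical helper. It rests on two assumptions. First, the pre-DS 6.5 server ID is hexadecimal, which is consistent with the changelog example later in this article, where CSN digits 3849 correspond to replicaIdentifier 14409. Second, in the DS 6.5 layout the characters after the sequence number are the server ID exactly as written. The timestamp is milliseconds since the Unix epoch, reported here in UTC; decodecsn showed the same instant in SGT (UTC+8).

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_csn(csn):
    """Decode a DS CSN using the layouts described above.

    Pre-DS 6.5 (28 chars): 0000 | 12-hex timestamp | 4-hex server ID | 8-hex sequence
    DS 6.5 (CSNv2):        01xx | 12-hex timestamp | 8-hex sequence | server ID
    """
    version = csn[:4]
    ts_ms = int(csn[4:16], 16)  # milliseconds since the Unix epoch
    if version == "0000":
        if len(csn) != 28:
            raise ValueError(f"unexpected pre-6.5 CSN length: {csn}")
        server_id = str(int(csn[16:20], 16))  # assumption: hex-encoded server ID
        seq = int(csn[20:28], 16)
    elif version.startswith("01"):
        seq = int(csn[16:24], 16)
        server_id = csn[24:]  # remaining characters; 'xx' in 01xx gives their length
    else:
        raise ValueError(f"unrecognized CSN version prefix: {version}")
    return {
        "version": version,
        "timestamp_ms": ts_ms,
        "time_utc": EPOCH + timedelta(milliseconds=ts_ms),  # exact integer arithmetic
        "server_id": server_id,
        "sequence": seq,
    }


d = decode_csn("0105016c7034aec2000f360e12790")
print(d["server_id"], d["sequence"], d["time_utc"].isoformat())
# → 12790 996878 2019-08-08T07:49:56.546000+00:00
```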

Innocent replication example

The following ldapsearch example shows an innocent replication issue where there are multiple ds-sync-state attributes:

$ ./ldapsearch --port 389 --bindDN "cn=Directory Manager" --bindPassword password --searchScope base --baseDN dc=example,dc=com "(objectClass=*)" \* ds-sync-state +
dn: dc=example,dc=com
objectClass: top
objectClass: domain
dc: example
ds-sync-state: 0000015d190d9ab64d712790003a
ds-sync-state: 0000015d18b72dd8550b0000000a

Multiple ds-sync-state values indicate that you have (or have had) more than one replica during the lifetime of your replicated directory service. This is expected and is not a problem in itself.
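To see which replicas the ds-sync-state values came from, you can extract the server ID from each CSN. The following short sketch follows the pre-DS 6.5 layout described above: 4 version digits, 12 timestamp digits, then 4 server ID digits. Like the changelog example later in this article, it assumes the server ID digits are hexadecimal. The helper name is hypothetical.

```python
def replica_ids_from_sync_state(values):
    """Map each pre-DS 6.5 ds-sync-state CSN to the decimal DS ID it came from."""
    ids = {}
    for csn in values:
        if not (csn.startswith("0000") and len(csn) == 28):
            raise ValueError(f"not a pre-6.5 CSN: {csn}")
        ids[csn] = int(csn[16:20], 16)  # 4 hex digits after version + timestamp
    return ids


state = ["0000015d190d9ab64d712790003a", "0000015d18b72dd8550b0000000a"]
print(sorted(set(replica_ids_from_sync_state(state).values())))  # → [19825, 21771]
```

Two distinct IDs means two replicas have contributed changes to this suffix.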

ds-sync-hist example

The following ldapsearch example shows that the attributes sn, cn, postalAddress, and givenname were changed to Doe, John Doe, John Doe$01251 Chestnut Street$Panama City, DE 50369 and John respectively:

$ ./ldapsearch --port 389 --bindDN "cn=Directory Manager" --bindPassword password --searchScope sub --baseDN dc=example,dc=com "(uid=user.0)" \* ds-sync-hist +
dn: uid=user.0,ou=People,dc=example,dc=com
objectClass: top
objectClass: inetOrgPerson
objectClass: organizationalPerson
objectClass: person
mail: user.0@maildomain.net
initials: ASA
homePhone: +1 225 216 5900
pager: +1 779 041 6341
givenName: John
employeeNumber: 0
telephoneNumber: +1 685 622 6202
mobile: +1 010 154 3228
sn: Doe
cn: John Doe
userPassword: {SSHA}f+6nCXygJSBwS9G3VDAOXNDRvI+YXI3CYswvug==
description: This is the description for Aaccf Amar.
street: 01251 Chestnut Street
st: DE
postalAddress: Aaccf Amar$01251 Chestnut Street$Panama City, DE 50369
uid: user.0
l: Panama City
postalCode: 50369
ds-sync-hist: sn:0000015d13a119d23b470000005f:repl:Doe
ds-sync-hist: cn:0000015d13a119d23b470000005f:repl:John Doe
ds-sync-hist: postaladdress:0000015d13a119d23b470000005f:repl:John Doe$01251 Chestnut Street$Panama City, DE 50369
ds-sync-hist: modifiersName:0000015d13a119d23b470000005f:repl:cn=Directory Manager
ds-sync-hist: modifyTimestamp:0000015d13a119d23b470000005f:repl:20170705164151Z
ds-sync-hist: givenname:0000015d13a119d23b470000005f:repl:John
modifyTimestamp: 20170705164151Z
modifiersName: cn=Directory Manager
entryUUID: 0d3ce3bf-4107-3b34-9e5a-fa71deb8b504
pwdPolicySubentry: cn=Default Password Policy,cn=Password Policies,cn=config
subschemaSubentry: cn=schema
hasSubordinates: false
numSubordinates: 0
etag: 000000001d86d123
structuralObjectClass: inetOrgPerson
entryDN: uid=user.0,ou=People,dc=example,dc=com
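Each ds-sync-hist value in this output has the form attribute:CSN:type:value. Grouping the values by CSN shows which attributes were changed together, and decoding the CSN timestamp lets you cross-check it against modifyTimestamp. The following minimal sketch uses hypothetical helper names and relies on the pre-DS 6.5 layout described earlier. It treats the type token ("repl" here) as an opaque string and does not interpret it. For this entry, the CSN timestamp decodes to 2017-07-05 16:41:51 UTC, which matches modifyTimestamp: 20170705164151Z.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_sync_hist(values):
    """Group ds-sync-hist values of the form attr:CSN:type[:value] by CSN."""
    by_csn = {}
    for raw in values:
        parts = raw.split(":", 3)  # maxsplit keeps any ':' inside the value intact
        attr, csn, change_type = parts[:3]
        value = parts[3] if len(parts) == 4 else None
        by_csn.setdefault(csn, []).append((attr, change_type, value))
    return by_csn


def pre65_csn_time(csn):
    """UTC time of a pre-DS 6.5 CSN (12 hex digits of milliseconds after '0000')."""
    return EPOCH + timedelta(milliseconds=int(csn[4:16], 16))


hist = [
    "sn:0000015d13a119d23b470000005f:repl:Doe",
    "cn:0000015d13a119d23b470000005f:repl:John Doe",
    "postaladdress:0000015d13a119d23b470000005f:repl:John Doe$01251 Chestnut Street$Panama City, DE 50369",
    "modifiersName:0000015d13a119d23b470000005f:repl:cn=Directory Manager",
    "modifyTimestamp:0000015d13a119d23b470000005f:repl:20170705164151Z",
    "givenname:0000015d13a119d23b470000005f:repl:John",
]

for csn, changes in parse_sync_hist(hist).items():
    when = pre65_csn_time(csn).strftime("%Y%m%d%H%M%SZ")
    print(csn, when, [attr for attr, _, _ in changes])
```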

Data consistency example

Remember that each CSN represents a single change, whether an ADD, DELETE or MODIFY. A CSN therefore identifies a data element (a change) for consistency purposes. Because dsreplication status bases its checks on the differences between these CSNs, if all servers have all the changes (CSNs), the data can be considered consistent between the instances. If an instance is missing a change, the affected entries have diverged between the databases.

The following worked example demonstrates using the CSN to determine data consistency:

  1. Take a two-master replication topology with 2000 entries. The following dsreplication status output was captured just after instance setup, data creation and initialization, so the data is known to be consistent. At this point the backends are the same and there are no changes yet; that is, the changelogDb contains 0 changes:
     $ ./dsreplication status --adminUID admin --adminPassword password --hostname ds1.example.com --port 4444 --trustAll
     Suffix DN : Server : Entries : Replication enabled : DS ID : RS ID : RS Port (1) : Delay (ms) : Security (2)
     ------------------:----------------------:---------:---------------------:-------:-------:-------------:------------:--------------
     dc=example,dc=com : ds1.example.com:4444 : 2000 : true : 14409 : 5070 : 8989 : 0 : true
     dc=example,dc=com : ds1.example.com:5444 : 2000 : true : 26696 : 3946 : 9989 : 0 : true
  2. Make a change in the form of an ADD (adding John Doe). The CSN for this change is 00000155c6188be0384900000001, as seen in the changelog. Both masters' changelogs now contain the same change (an ADD):
     dn: changeNumber=1,cn=changelog
     objectClass: top
     objectClass: changeLogEntry
     changeNumber: 1
     changeTime: 20160707160225Z
     changeType: add
     targetDN: uid=jdoe,ou=People,dc=example,dc=com
     changes:: b2JqZWN0Q2xhc3M6IG9yZ2FuaXphdGlvbmFsUGVyc29uCm9iamVjdENsYXNzOiB0b3AKb2JqZWN0Q2xhc3M6IHBlcnNvbgpvYmplY3RDbGFzczogaW5ldE9yZ1BlcnNvbgp1aWQ6IGpkb2UKZ2l2ZW5OYW1lOiBKb2huCnNuOiBEb2UKY246IEpvaG4gRG9lCnVzZXJQYXNzd29yZDoge1NTSEF9WmJTcnJDL05BMHEwUFBzQmRPaVdRZTRaV3FQTDQ5Nll2RmR2NVE9PQplbnRyeVVVSUQ6IGY0MTRmZmVkLTVlZDAtNDUzNy1iMDU5LTU5YzUyMTc5MmNkMApjcmVhdGVUaW1lc3RhbXA6IDIwMTYwNzA3MTYwMjI1Wgpwd2RDaGFuZ2VkVGltZTogMjAxNjA3MDcxNjAyMjUuMzc2WgpjcmVhdG9yc05hbWU6IGNuPURpcmVjdG9yeSBNYW5hZ2VyLGNuPVJvb3QgRE5zLGNuPWNvbmZpZw==
     subschemaSubentry: cn=schema
     numSubordinates: 0
     hasSubordinates: false
     entryDN: changeNumber=1,cn=changelog
     replicationCSN: 00000155c6188be0384900000001
     replicaIdentifier: 14409
     changeInitiatorsName: cn=Directory Manager
     targetEntryUUID: f414ffed-5ed0-4537-b059-59c521792cd0
     changeLogCookie: dc=example,dc=com:00000155c6188be0384900000001;
  3. Check the replication status again:
     $ ./dsreplication status --adminUID admin --adminPassword password --hostname ds1.example.com --port 4444 --trustAll
     Suffix DN : Server : Entries : Replication enabled : DS ID : RS ID : RS Port (1) : Delay (ms) : Security (2)
     ------------------:----------------------:---------:---------------------:-------:-------:-------------:------------:--------------
     dc=example,dc=com : ds1.example.com:4444 : 2001 : true : 14409 : 5070 : 8989 : 0 : true
     dc=example,dc=com : ds1.example.com:5444 : 2001 : true : 26696 : 3946 : 9989 : 0 : true
     Because dsreplication uses the CSN as its basis for replication and data consistency, this shows that the change representing the ADD is on both servers. Together with the matching entry counts in the replication status, this confirms that the entries in the backends are consistent. If the CSN (change) had not been replayed on the other server, the servers' CSNs would not match and the data would not be consistent. This would show up as a difference in the entry counts reported by dsreplication status.
  4. Make a simple MODIFY to John's entry (add a description). The CSN for this change is 00000155c619858e384900000002:
     dn: changeNumber=2,cn=changelog
     objectClass: top
     objectClass: changeLogEntry
     changeNumber: 2
     changeTime: 20160707160329Z
     changeType: modify
     targetDN: uid=jdoe,ou=People,dc=example,dc=com
     changes:: YWRkOiBkZXNjcmlwdGlvbgpkZXNjcmlwdGlvbjogVGhpcyBpcyBKb2huJ3MgRGVzY3JpcHRpb24KLQpyZXBsYWNlOiBtb2RpZmllcnNOYW1lCm1vZGlmaWVyc05hbWU6IGNuPURpcmVjdG9yeSBNYW5hZ2VyLGNuPVJvb3QgRE5zLGNuPWNvbmZpZwotCnJlcGxhY2U6IG1vZGlmeVRpbWVzdGFtcAptb2RpZnlUaW1lc3RhbXA6IDIwMTYwNzA3MTYwMzI5Wgot
     subschemaSubentry: cn=schema
     numSubordinates: 0
     hasSubordinates: false
     entryDN: changeNumber=2,cn=changelog
     replicationCSN: 00000155c619858e384900000002
     replicaIdentifier: 14409
     changeInitiatorsName: cn=Directory Manager
     targetEntryUUID: f414ffed-5ed0-4537-b059-59c521792cd0
     changeLogCookie: dc=example,dc=com:00000155c619858e384900000002;
  5. Check the replication status again and observe that there are no missing changes. (The Delay (ms) metric replaces the M.C. and A.O.M.C. metrics returned in versions before DS 6.5.)
     $ ./dsreplication status --adminUID admin --adminPassword password --hostname ds1.example.com --port 4444 --trustAll
     Thu Jul 7 10:26:12 MDT 2016
     Suffix DN : Server : Entries : Replication enabled : DS ID : RS ID : RS Port (1) : Delay (ms) : Security (2)
     ------------------:----------------------:---------:---------------------:-------:-------:-------------:------------:--------------
     dc=example,dc=com : ds1.example.com:4444 : 2001 : true : 14409 : 5070 : 8989 : 0 : true
     dc=example,dc=com : ds1.example.com:5444 : 2001 : true : 26696 : 3946 : 9989 : 0 : true
     Applying the same reasoning as above, all CSNs have been replayed from Master 1 to Master 2. Once again, this shows that the data is consistent between Master 1 and Master 2.
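The reasoning in this worked example reduces to a set difference: a server is consistent when it has applied every CSN that any server in the topology has applied. The following minimal sketch uses the two CSNs from steps 2 and 4. The helper missing_csns and the per-server CSN lists are hypothetical; in particular, the scenario where Master 2 has not yet replayed the MODIFY is invented for illustration. Sorting CSN strings gives chronological order only when they share the same fixed-width format, because the timestamp comes first.

```python
def missing_csns(applied):
    """applied: {server: iterable of CSNs it has applied}.

    Returns {server: sorted list of CSNs applied elsewhere but not on this server}.
    """
    every_csn = set().union(*applied.values())
    return {server: sorted(every_csn - set(csns)) for server, csns in applied.items()}


applied = {
    "Master 1": ["00000155c6188be0384900000001", "00000155c619858e384900000002"],
    "Master 2": ["00000155c6188be0384900000001"],  # hypothetical: MODIFY not yet replayed
}
print(missing_csns(applied))
# → {'Master 1': [], 'Master 2': ['00000155c619858e384900000002']}
```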

Identifying replication issues from the DS log files

The following list shows error messages you may see in your logs, each followed by what it means and possible resolutions:

  • dn="entryuuid=bfbbd0fd-53ba-451f-93a1-2f446f4de18+uid=user1,dc=example,dc=com"

    This entry indicates a naming conflict (that is, the DN has an additional entryuuid RDN). This can happen when conflicting changes are applied to two servers at the same time, so that replication then creates a duplicate entry.

    It can also occur as a result of the following known issue: OPENDJ-3343 (Invalid Conflict resolution on Add sequence when Parent & Child are added on different replica). This issue is fixed in DS 6.5.

    To resolve this, locate the duplicate entries on both servers and then delete the entry that was modified first. DS recognizes that you are deleting a conflicting entry and cleans up after itself. See Administration Guide › Resolving Replication Conflicts for further information.

  • category=SYNC severity=MILD_ERROR msgID=14876739 msg=Could not replay operation AddOperation(connID=-1, opID=72, dn=uid=user1,ou=People,dc=example,dc=com) with ChangeNumber 00000148a9d35134620200000002 error Unwilling to Perform There is not enough space on the disk for the database to perform the write operation"

    The server has low disk space, which prevents write operations from taking place.

    To resolve this, increase or free up disk space. Once space is available, replication resumes with the next set of changes. However, the changes indicated in this error will be missing, because they were skipped.

    To avoid running out of disk space, use the disk space monitoring features described in Administration Guide › Setting Disk Space Thresholds For Database Backends.

  • category=SYNC severity=SEVERE_WARNING msgID=14811232 msg=Directory server DS(42134) has connected to replication server RS(42706) for domain "cn=admin data" at ds1.example.com/198.51.100.0:8080, but the generation IDs do not match, indicating that a full re-initialization is required. The local (DS) generation ID is 36215293 and the remote (RS) generation ID is 172193

    The generation ID contained in the restored data is not the same as the one in the current replication topology. See Generation IDs do not match error after restoring a DS (All versions) replica for further information on resolving this.

  • category=SYNC severity=INFORMATION msgID=14680180 msg=Late monitor data received for domain "cn=schema" from replication server RS(1797), and will be ignored
    category=SYNC severity=SEVERE_WARNING msgID=14811242 msg=Timed out waiting for monitor data for the domain "dc=example,dc=com" from replication server RS(1797)

    The other server is not responding quickly enough with monitoring information, or there is a heavy load on the server.

    This can be caused by an underlying network issue, but it is typically related to high levels of Garbage Collection (GC) and can be resolved by tuning your JVM. See Best practice for JVM Tuning with CMS GC for further information.

  • category=SYNC severity=SEVERE_ERROR msgID=14841194 msg=Replication server caught exception while listening for client connections Read timed out

    This entry indicates network issues or a heavy load on the server.

  • category=SYNC severity=ERROR msgID=org.opends.messages.replication.178 msg=Directory server 8764 was attempting to connect to replication server 1245 but has disconnected in handshake phase
    category=SYNC severity=SEVERE_ERROR msgID=14942387 msg=Replication server 1797 was attempting to connect to replication server ds1.example.com/198.51.100.0:8989 but has disconnected in handshake phase

    The DS (or RS) cannot connect to the remote RS. See Directory server 1 was attempting to connect to replication server 2 but has disconnected in handshake phase error in DS 5 and OpenDJ 3.0, 3.5, 3.5.1 for further information on resolving this.

    The underlying issue is OPENDJ-1135 (DS sometimes fail to connect to RS after server restart), which is fixed in DS 5.5.

    An IP address in this message can suggest that the hostname cannot be resolved to the correct IP address and that, in many cases, the server is trying to connect to itself. Because an instance can be a DS+RS, depending on your topology the DS may be connecting to an RS in the same instance or on the same system. If you believe the issue is related to the IP address, ensure that each server can connect to every other server's replication port using the configured hostname. Also ensure that the hostnames resolve to IP addresses physically present on the servers. You should also consistently use either FQDNs or IP addresses for hostnames.
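To triage a large error log, you can classify the replication messages above automatically. The following minimal Python sketch assumes each log line contains the category=, severity=, msgID= and msg= fields in that order, as in the examples above. Check your own log format before relying on it. The msgID descriptions summarize the explanations above. The helper names and the extra non-SYNC sample line are hypothetical. Conflict entries are detected by the entryuuid=...+ RDN pattern.

```python
import re

SYNC_LINE = re.compile(
    r"category=(?P<category>\S+)\s+severity=(?P<severity>\S+)\s+"
    r"msgID=(?P<msg_id>\S+)\s+msg=(?P<msg>.*)")

# msgIDs taken from the examples above; extend this for your own logs.
KNOWN = {
    "14876739": "replay failed (check disk space; skipped changes are lost)",
    "14811232": "generation ID mismatch (re-initialization required)",
    "14680180": "late monitor data (network issue or GC pressure)",
    "14811242": "timed out waiting for monitor data (network issue or GC pressure)",
    "14841194": "RS exception while listening (network issue or heavy load)",
    "14942387": "disconnected in handshake phase",
    "org.opends.messages.replication.178": "disconnected in handshake phase",
}

# A naming-conflict DN carries an extra entryuuid RDN joined with '+'.
CONFLICT_DN = re.compile(r"entryuuid=[0-9a-f-]+\+", re.IGNORECASE)


def classify(lines):
    """Return (msgID or 'conflict', description) for each replication-related line."""
    findings = []
    for line in lines:
        m = SYNC_LINE.search(line)
        if m and m["category"] == "SYNC":
            findings.append((m["msg_id"], KNOWN.get(m["msg_id"], "unclassified SYNC message")))
        elif CONFLICT_DN.search(line):
            findings.append(("conflict", "naming conflict entry"))
    return findings


SAMPLE_LOG = [
    'category=SYNC severity=SEVERE_WARNING msgID=14811242 msg=Timed out waiting for monitor data for the domain "dc=example,dc=com" from replication server RS(1797)',
    'category=SYNC severity=ERROR msgID=org.opends.messages.replication.178 msg=Directory server 8764 was attempting to connect to replication server 1245 but has disconnected in handshake phase',
    'dn="entryuuid=bfbbd0fd-53ba-451f-93a1-2f446f4de18+uid=user1,dc=example,dc=com"',
    'category=BACKEND severity=NOTICE msgID=1 msg=hypothetical unrelated message',
]

for msg_id, meaning in classify(SAMPLE_LOG):
    print(msg_id, "->", meaning)
```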

Identifying replication issues in the embedded DS

The following error in the AM Configuration log indicates replication is failing in the embedded DS:

ERROR: EmbeddedOpenDS:syncReplication:cmd failed

Recovering replication

You can quickly recover replication by restoring a backup or by initializing from a known good node. However, make sure you do not restore a backup that contains a corrupted database, because this restores the corrupted database as well.

Stopping replication

Warning

Do not allow modifications on the DS while replication is disabled. No record of such changes is kept, and the changes cause replication to diverge.

The following dsreplication commands are used to stop replication; you should ensure you use the correct command for your use case as follows:

  • dsreplication suspend: this command is used to temporarily stop replication, for example, during maintenance.
  • dsreplication unconfigure: this command is used to permanently stop replication and completely remove the replication configuration information from the server.
  • dsreplication unconfigure --unconfigureAll: this command is used to fully remove the local server's replication configuration from itself and from all other servers in the topology.

Using the wrong command can cause issues with replication.

See Reference › dsreplication and Administration Guide › Stopping Replication for further information and examples.

See Also

How do I use cn=monitor entry in DS 5.x and 6.x for monitoring?

How do I use the Support Extract tool in DS (All versions) to capture troubleshooting data?

FAQ: Monitoring DS

FAQ: Backup and restore in DS 5.x and 6.x

Replication in DS

Getting Started › First Steps With Replication

Administration Guide › About Replication 

Administration Guide › Configuring Replication

Administration Guide › Managing Data Replication

Reference › dsreplication 

Related Training

ForgeRock Directory Services Core Concepts (DS-400) 

Related Issue Tracker IDs

OPENDJ-3343 (Invalid Conflict resolution on Add sequence when Parent & Child are added on different replica)


Copyright and Trademarks Copyright © 2021 ForgeRock, all rights reserved.