Improve Reliability With Queued Synchronization

By default, IDM implicitly synchronizes managed object changes out to all resources for which the managed object is configured as a source. If there are several targets that must be synchronized, these targets are synchronized one at a time, one after the other. If any of the targets is remote or has a high latency, the implicit synchronization operations can take some time, delaying the successful return of the managed object change.

To decouple the managed object changes from the corresponding synchronizations, you can configure queued synchronization, which persists implicit synchronization events to the IDM repository. Queued events are then read from the repository and executed according to the queued synchronization configuration.

Because synchronization operations are performed in parallel, queued synchronization can improve performance if you have several fast, reliable targets. However, queued synchronization is also useful when your targets are slow or unreliable, because the managed object changes can complete before all targets have been synchronized.

The following illustration shows how synchronization operations are added to a local, in-memory queue. Note that this queue is distinct from the repository queue for synchronization events:

Queued Synchronization

Configure Queued Synchronization

Queued synchronization is disabled by default. To enable it, add a queuedSync object to your mapping, as follows:

{
     "mappings" : [
         {
             "name" : "managedUser_systemLdapAccounts",
             "source" : "managed/user",
             "target" : "system/ldap/account",
             "links" : "systemLdapAccounts_managedUser",
             "queuedSync" : {
                 "enabled" : true,
                 "pageSize" : 100,
                 "pollingInterval" : 1000,
                 "maxQueueSize" : 20000,
                 "maxRetries" : 5,
                 "retryDelay" : 1000,
                 "postRetryAction" : "logged-ignore"
             },
             ...
         }
     ]
 }

Note

These settings apply only to the implicit synchronization operations for that mapping. Reconciliation is unaffected by queued synchronization settings. Events associated with mappings where queued synchronization is enabled are submitted to the synchronization queue for asynchronous processing. Events associated with mappings where queued synchronization is not enabled are processed immediately, and block further event processing until they are complete.

During implicit synchronization, mappings are processed in the order in which they are defined, regardless of whether queued synchronization is enabled for those mappings. If you want all queued synchronization mappings to be processed first, you must explicitly order your mappings accordingly.

The queuedSync object has the following configuration:

enabled: Specifies whether queued synchronization is enabled for that mapping. Boolean, true, or false.
pageSize (integer): Specifies the maximum number of events to retrieve from the repository queue within a single polling interval. The default is 100 events.
pollingInterval (integer): Specifies the repository queue polling interval, in milliseconds. The default is 1000 ms.
maxQueueSize (integer): Specifies the maximum number of synchronization events that can be accepted into the in-memory queue. The default is 20000 events.
maxRetries (integer): The number of retries to perform before invoking the postRetry action. Most sample configurations set the maximum number of retries to 5. To set an infinite number of retries, either omit the maxRetries property, or set it to a negative value, such as -1.
retryDelay (integer): In the event of a failed queued synchronization operation, this parameter specifies the number of milliseconds to delay before attempting the operation again. The default is 1000 ms.
postRetryAction: The action to perform after the retries have been exhausted. Possible options are logged-ignore, dead-letter-queue, and script. These options are described in "Configure the LiveSync Retry Policy". The default action is logged-ignore.

Note

Retries occur synchronously to the failure. For example, if the maxRetries is set to 10, at least 10 seconds will pass between the failing sync event and the next sync. (There are 10 retries, and the retryDelay is 1 second by default.) These 10 seconds do not take into account the latency of the ten sync requests. Retries are configured per-mapping and block processing of all subsequent sync events until the configured retries have been exhausted.

Tune Queued Synchronization

Queued synchronization employs a single worker thread. While implicit synchronization operations are being generated, that worker thread should always be occupied. The occupation of the worker thread is a function of the pageSize, the pollingInterval, the latency of the poll request, and the latency of each synchronization operation for the mapping.

For example, assume that a poll takes 500 milliseconds to complete. Your system must provide operations to the worker thread at approximately the same rate at which the thread can consume events (based on the page size, poll frequency, and poll latency). Operation consumption is a function of the notifyaction.execution for that particular mapping. If the system does not provide operations fast enough, implicit synchronization will not occur as optimally as it could. If the system provides operations too quickly, the operations in the queue could exceed the default maximum of 20000. If the maxQueueSize is reached, additional synchronization events will result in a RejectedExecutionException.

Depending on your hardware and workload, you might need to adjust the default pageSize, pollingInterval, and maxQueueSize.

Monitor the queued synchronization metrics; specifically, the rejected-executions, and adjust the maxQueueSize accordingly. Set a large enough maxQueueSize to prevent slow mappings and heavy loads from causing newly-submitted synchronization events to be rejected.

Monitor the synchronization latency using the sync.queue.mapping-name.poll-pending-events metric.

For more information on monitoring metrics, see Metrics Reference.

Manage the Synchronization Queue

You can manage queued synchronization events over the REST interface, at the openidm/sync/queue endpoint. The following examples show the operations that are supported on this endpoint:

List all events in the synchronization queue:

curl \
--header "X-OpenIDM-Username: openidm-admin" \
--header "X-OpenIDM-Password: openidm-admin" \
--header "Accept-API-Version: resource=1.0" \
--request GET \
"http://localhost:8080/openidm/sync/queue?_queryFilter=true"
{
  "result": [
    {
      "_id": "03e6ab3b-9e5f-43ac-a7a7-a889c5556955",
      "_rev": "0000000034dba395",
      "mapping": "managedUser_systemLdapAccounts",
      "resourceId": "e6533cfe-81ad-4fe8-8104-55e17bd9a1a9",
      "syncAction": "notifyCreate",
      "state": "PENDING",
      "resourceCollection": "managed/user",
      "nodeId": null,
      "createDate": "2018-11-12T07:45:00.072Z"
    },
    {
      "_id": "ed940f4b-ce80-4a7f-9690-1ad33ad309e6",
      "_rev": "000000007878a376",
      "mapping": "managedUser_systemLdapAccounts",
      "resourceId": "28b1bd90-f647-4ba9-8722-b51319f68613",
      "syncAction": "notifyCreate",
      "state": "PENDING",
      "resourceCollection": "managed/user",
      "nodeId": null,
      "createDate": "2018-11-12T07:45:00.150Z"
    },
    {
      "_id": "f5af2eed-d83f-4b70-8001-8bc86075134f",
      "_rev": "00000000099aa321",
      "mapping": "managedUser_systemLdapAccounts",
      "resourceId": "d2691a45-0a10-4f51-aa2a-b6854b2f8086",
      "syncAction": "notifyCreate",
      "state": "PENDING",
      "resourceCollection": "managed/user",
      "nodeId": null,
      "createDate": "2018-11-12T07:45:00.276Z"
    },
    ...
  ],
  "resultCount": 8,
  "pagedResultsCookie": null,
  "totalPagedResultsPolicy": "NONE",
  "totalPagedResults": -1,
  "remainingPagedResults": -1
}

Query the queued synchronization events based on the following properties:

mapping—the mapping associated with this event. For example:

curl \
--header "X-OpenIDM-Username: openidm-admin" \
--header "X-OpenIDM-Password: openidm-admin" \
--header "Accept-API-Version: resource=1.0" \
--request GET \
"http://localhost:8080/openidm/sync/queue?_queryFilter=mapping+eq+'managedUser_systemLdapAccount'"

nodeId—the ID of the node that has acquired this event.
resourceId—the source object resource ID.
resourceCollection—the source object resource collection.
_id—the ID of this sync event.

state—the state of the synchronization event. For example:

curl \
--header "X-OpenIDM-Username: openidm-admin" \
--header "X-OpenIDM-Password: openidm-admin" \
--header "Accept-API-Version: resource=1.0" \
--request GET \
"http://localhost:8080/openidm/sync/queue?_queryFilter=state+eq+'PENDING'"

The state of a queued synchronization event is one of the following:

PENDING—the event is waiting to be processed.

ACQUIRED—the event is being processed by a node.

remainingRetries—the number of retries available for this synchronization event before it is abandoned. For more information about how synchronization events are retried, see "Configure the LiveSync Retry Policy". For example:

curl \
--header "X-OpenIDM-Username: openidm-admin" \
--header "X-OpenIDM-Password: openidm-admin" \
--header "Accept-API-Version: resource=1.0" \
--request GET \
"http://localhost:8080/openidm/sync/queue?_queryFilter=remainingRetries+lt+2"

syncAction—the synchronization action that initiated this event. Possible synchronization actions are notifyCreate, notifyUpdate, and notifyDelete. For example:

curl \
--header "X-OpenIDM-Username: openidm-admin" \
--header "X-OpenIDM-Password: openidm-admin" \
--header "Accept-API-Version: resource=1.0" \
--request GET \
"http://localhost:8080/openidm/sync/queue?_queryFilter=syncAction+eq+'notifyCreate'"

createDate—the date that the event was created.

Recover Mappings When Nodes Are Down

Synchronization events for mappings with queued synchronization enabled are processed by a single cluster node. While a node is present in the cluster, that node holds a lock on the specific mapping. The node can release or reacquire the mapping lock if a balancing event occurs (see "Balance Mapping Locks Across Nodes"). However, the mapping lock is held across all events on that mapping. In a stable running cluster, a single node will hold the lock for a mapping indefinitely.

It is possible that a node goes down, or is removed from the cluster, while holding a mapping lock on operations in the synchronization queue. To prevent these operations from being lost, the queued synchronization facility includes a recovery monitor that checks for any orphaned mappings in the cluster.

A mapping is considered orphaned in the following cases:

No active node holds a lock on the mapping.
The node that holds a lock on the mapping has an instance state of STATE_DOWN.
The node that holds a lock on the mapping does not exist in the cluster.

The recovery monitor periodically checks for orphaned mappings. When all orphaned mappings have been recovered, it attempts to initialize new queue consumers.

The recovery monitor is enabled by default and executes every 300 seconds. To change the default behavior for a mapping, add the following to the mapping configuration and change the parameters as required:

{
    "mappings" : [...],
    "queueRecovery" : {
        "enabled" : true,
        "recoveryInterval" : 300
    }
}

Important

If a queued synchronization job has already been claimed by a node, and that node is shut down, IDM notifies the entire cluster of the shutdown. This enables a different node to pick up the job in progress. The recovery monitor takes over jobs in a synchronization queue that have not been fully processed by an available cluster node, so no job should be lost. If you have configured queued synchronization for one or more mappings, do not use the enabled flag in conf/cluster.json to remove a node from the cluster. Instead, shut down the node so that the remaining nodes in the cluster can take over the queued synchronization jobs.

Balance Mapping Locks Across Nodes

Queued synchronization mapping locks are balanced equitably across cluster nodes. At a specified interval, each node attempts to release and acquire mapping locks, based on the number of running cluster nodes. When new cluster nodes come online, existing nodes release sufficient mapping locks for new nodes to pick them up, resulting in an equitable distribution of locks.

Lock balancing is enabled by default, and the interval at which nodes attempt to balance locks in the queue is 5 seconds. To change the default configuration, add a queueBalancing object to your mapping and set the following parameters:

{
    "mappings" : [...],
    "queueBalancing" : {
        "enabled" : true,
        "balanceInterval" : 5
    }
}