Distributing Reconciliation Operations Across a Cluster

In a clustered deployment, you can configure reconciliation jobs to be distributed across multiple nodes in the cluster. Clustered reconciliation is configured per mapping and can improve reconciliation performance, particularly for very large data sets.

Clustered reconciliation uses the paged reconciliation mechanism and the scheduler service to divide the source data set into pages, and then to schedule reconciliation "sub-jobs" per page, distributing these sub-jobs across the nodes in the cluster.

Regular (non-clustered) reconciliation has two phases—a source phase and a target phase. Clustered reconciliation effectively has three phases:

Source page phase

During this phase, reconciliation sub-jobs are scheduled in succession, page by page. Each source page job does the following:

Executes a source query using the paging cookie from the invocation context.
Schedules the next source page job.
Performs the reconciliation of the source IDs returned by the query.
Writes statistics summary information which is aggregated so that you can obtain the status of the complete reconciliation run by performing a GET on the recon endpoint.
On completion, writes the repo_id, source_id, and target_id to the repository.

Source phase completion check

This phase is scheduled when the source query returns null. This check runs, and continues to reschedule itself, as long as source page jobs are running. When the completion check determines that all the source page jobs are complete, it schedules the target phase.

Target phase

This phase queries the target IDs, then removes all of the IDs that correspond to the repo_id, source_id, and target_id written by the source pages. The remaining target IDs are used to run the target phase, taking into account all records on the target system that were not correlated to a source ID during the source phase sub-jobs.