Tuning reconciliation performance

By default, reconciliation is configured to perform optimally. In some cases, the default optimizations might not be suitable for your deployment. The following sections describe these default optimizations, how they can be configured, and additional methods you can use to improve reconciliation performance.

Correlate empty target sets

To optimize performance, reconciliation does not correlate source objects to target objects if the set of target objects is empty when the correlation is started. This considerably speeds up the process the first time reconciliation is run. You can change this behavior for a specific mapping by adding the correlateEmptyTargetSet property to the mapping definition and setting it to true. For example:

{
    "mappings": [
        {
            "name"                     : "systemMyLDAPAccounts_managedUser",
            "source"                   : "system/MyLDAP/account",
            "target"                   : "managed/user",
            "correlateEmptyTargetSet"  : true
        },
    ]
}

json

Be aware that this setting will have a performance impact on the reconciliation process.

Prefetch links

All links are queried at the start of reconciliation and the results of that query are used. You can disable the link prefetching so that the reconciliation process looks up each link in the database as it processes each source or target object. To disable link prefetching, add the prefetchLinks property to your mapping, and set it to false:

{
    "mappings": [
        {
            "name": "systemMyLDAPAccounts_managedUser",
            "source": "system/MyLDAP/account",
            "target": "managed/user"
            "prefetchLinks" : false
        }
    ]
}

json

Be aware that this setting will have a performance impact on the reconciliation process.

Parallel reconciliation threads

By default, reconciliation is multithreaded; numerous threads are dedicated to the same reconciliation run. Multithreading generally improves reconciliation performance. The default number of threads for a single reconciliation run is 10 (plus the main reconciliation thread). Under normal circumstances, you should not need to change this number. However the default might not be appropriate in the following situations:

The hardware has many cores and supports more concurrent threads. As a rule of thumb for performance tuning, start with setting the thread number to twice the number of cores.
The source or target is an external system with high latency or slow response times. Threads may then spend considerable time waiting for a response from the external system. Increasing the available threads enables the system to prepare or continue with additional objects.

To change the number of threads, set the taskThreads property in your mapping:

"mappings" : [
    {
        "name" : "systemCsvfileAccounts_managedUser",
        "source" : "system/csvfile/account",
        "target" : "managed/user",
        "taskThreads" : 20
        ...
    }
]

json

A zero value runs reconciliation as a serialized process, on the main reconciliation thread.

Improve reconciliation query performance

Reconciliation operations are processed in two phases; a source phase and a target phase. In most reconciliation configurations, source and target queries make a read call to every record on the source and target systems to determine candidates for reconciliation. On slow source or target systems, these frequent calls can incur a substantial performance cost.

To improve query performance in these situations, you can preload the entire result set into memory on the source or target system, or on both systems. Subsequent read queries on known IDs are made against the data in memory, rather than the data on the remote system. For this optimization to be effective, the entire result set must fit into the available memory on the system for which it is enabled.

The optimization works by defining a sourceQuery or targetQuery in the synchronization mapping that returns not just the ID, but the complete object.

The following example query loads the full result set into memory during the source phase of the reconciliation. The example uses a common filter expression, called with the _queryFilter keyword. The query returns the complete object:

"mappings" : [
    {
        "name" : "systemLdapAccounts_managedUser",
        "source" : "system/ldap/account",
        "target" : "managed/user",
        "sourceQuery" : {
            "_queryFilter" : "true"
        },
  ...

json

IDM attempts to detect what data has been returned. The autodetection mechanism assumes that a result set that includes three or more fields per object (apart from the _id and rev fields) contains the complete object.

You can explicitly state whether a query is configured to return complete objects by setting the value of sourceQueryFullEntry or targetQueryFullEntry in the mapping. The setting of these properties overrides the autodetection mechanism.

Setting these properties to false indicates that the returned object is not the complete object. This might be required if a query returns more than three fields of an object, but not the complete object. Without this setting, the autodetect logic would assume that the complete object was being returned. IDM uses only the IDs from this query result. If the complete object is required, the object is queried on demand.

Setting these properties to true indicates that the complete object is returned. This setting is typically required only for very small objects, for which the number of returned fields does not reach the threshold required for the auto-detection mechanism to assume that it is a full object. In this case, the query result includes all the details required to pre-load the full object.

The following excerpt indicates that the full objects are returned and that IDM should not autodetect the result set:

"mappings" : [
    {
        "name" : "systemLdapAccounts_managedUser",
        "source" : "system/ldap/account",
        "target" : "managed/user",
        "sourceQueryFullEntry" : true,
        "sourceQuery" : {
            "_queryFilter" : "true"
        },
    ...

json

By default, all the attributes defined in the connector configuration are loaded into memory. If your mapping uses only a small subset of the attributes in the connector configuration, you can restrict your query to return only those attributes required for synchronization by using the _fields parameter with the query filter.

The following excerpt loads only a subset of attributes into memory, for all users in an LDAP directory.

"mappings" : [
    {
        "name" : "systemLdapAccounts_managedUser",
        "source" : "system/ldap/account",
        "target" : "managed/user",
        "sourceQuery" : {
            "_queryFilter" : "true",
            "_fields" : "cn,sn,dn,uid,employeeType,mail"
        },
    ...

json

The default source query for clustered reconciliations and for paged reconciliations is a queryFilter that returns the full source objects, not just their IDs. So, source queries for clustered and paged reconciliations are optimized for performance by default.

Paging reconciliation query results

Improve Reconciliation Query Performance describes how to improve reconciliation performance by loading all entries into memory to avoid making individual requests to the external system for every ID. However, this optimization depends on the entire result set fitting into the available memory on the system for which it is enabled. For particularly large data sets (for example, data sets of hundreds of millions of users), having the entire data set in memory might not be feasible.

To alleviate this constraint, you can use reconciliation paging, which breaks down extremely large data sets into chunks. It also lets you specify the number of entries that should be reconciled in each chunk or page.

Reconciliation paging is disabled by default, and can be enabled per mapping. To configure reconciliation paging, set the reconSourceQueryPaging property to true and set the reconSourceQueryPageSize in the synchronization mapping:

{
    "mappings" : [
        {
            "name" : "systemLdapAccounts_managedUser",
            "source" : "system/ldap/account",
            "target" : "managed/user",
            "reconSourceQueryPaging" : true,
            "reconSourceQueryPageSize" : 100,
            ...
        }

json

The value of reconSourceQueryPageSize must be a positive integer, and specifies the number of entries that will be processed in each page. If reconciliation paging is enabled but no page size is set, a default page size of 1000 is used.

If you are reconciling from a JDBC database using the Database Table connector, you must set the _sortkeys property in the source query and ensure that the corresponding column is indexed in the database.

The following excerpt of a mapping configures paged reconciliation queries using the Database Table connector:

{
    "mappings" : [
        {
            "name" : "systemHrdb_managedUser",
            "source" : "system/db/users",
            "target" : "managed/user",
            "reconSourceQueryPaging" : true,
            "reconSourceQueryPageSize" : 1000,
            "sourceQueryFullEntry" : true,
            "sourceQuery" : {
                "_queryFilter" : "true",
                "_sortKeys" : "email"
            },
        ...
    ]
}

json

PingIDM 7.5.0

Tuning reconciliation performance

Correlate empty target sets

Prefetch links

Parallel reconciliation threads

Improve reconciliation query performance

Paging reconciliation query results