Improving Reconciliation Query Performance
Reconciliation operations are processed in two phases: a source phase and a target phase. In most reconciliation configurations, source and target queries make a read call to every record on the source and target systems to determine candidates for reconciliation. On slow source or target systems, these frequent calls can incur a substantial performance cost.
To improve query performance in these situations, you can preload the entire result set into memory on the source or target system, or on both systems. Subsequent read queries on known IDs are made against the data in memory, rather than the data on the remote system. For this optimization to be effective, the entire result set must fit into the available memory on the system for which it is enabled.
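The preloading idea can be sketched in a few lines of Python. This is an illustration of the technique only, not IDM's implementation: PreloadedReader, query_all, and read_remote are hypothetical names, and the fallback to the remote system for unknown IDs is an assumption.

```python
from typing import Callable, Dict, Iterable, Optional

Record = Dict[str, object]


class PreloadedReader:
    """Serves reads by ID from memory after one bulk query (illustrative only)."""

    def __init__(self,
                 query_all: Callable[[], Iterable[Record]],
                 read_remote: Callable[[str], Optional[Record]]) -> None:
        self._read_remote = read_remote
        # One bulk call replaces many per-record calls.
        # The whole result set must fit in memory for this to work.
        self._cache: Dict[str, Record] = {str(r["_id"]): r for r in query_all()}

    def read(self, record_id: str) -> Optional[Record]:
        # Known IDs are answered from memory.
        if record_id in self._cache:
            return self._cache[record_id]
        # Assumption: unknown IDs fall back to a read on the remote system.
        return self._read_remote(record_id)
```

The trade-off is memory for latency: one large query is paid up front so that later reads by ID avoid a round trip to the slow system.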
The optimization works by defining a sourceQuery or targetQuery in the synchronization mapping that returns not just the ID, but the complete object.
The following example query loads the full result set into memory during the source phase of the reconciliation. The example uses a common filter expression, called with the _queryFilter keyword. The query returns the complete object:
"mappings" : [
    {
        "name" : "systemLdapAccounts_managedUser",
        "source" : "system/ldap/account",
        "target" : "managed/user",
        "sourceQuery" : {
            "_queryFilter" : "true"
        },
        ...
IDM attempts to detect what data has been returned. The autodetection mechanism assumes that a result set that includes three or more fields per object (apart from the _id and _rev fields) contains the complete object.
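The heuristic described above can be written as a short Python function. This is a sketch of the stated rule, not IDM's actual detection code: looks_like_full_entry is a hypothetical name, and the sketch assumes that the two ignored fields are named _id and _rev.

```python
from typing import Mapping

# Bookkeeping fields that the heuristic ignores (assumed names: _id and _rev).
IGNORED_FIELDS = frozenset({"_id", "_rev"})
# Threshold described in the text: three or more remaining fields.
FULL_ENTRY_THRESHOLD = 3


def looks_like_full_entry(obj: Mapping[str, object]) -> bool:
    """Return True if obj has at least three fields besides _id and _rev."""
    data_fields = [name for name in obj if name not in IGNORED_FIELDS]
    return len(data_fields) >= FULL_ENTRY_THRESHOLD
```

Under this rule, an object with _id, _rev, cn, and sn counts as IDs only (two data fields), while adding mail tips it over the threshold.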
You can explicitly state whether a query is configured to return complete objects by setting the value of sourceQueryFullEntry or targetQueryFullEntry in the mapping. The setting of these properties overrides the autodetection mechanism.
Setting these properties to false indicates that the returned object is not the complete object. This might be required if a query returns three or more fields of an object, but not the complete object. Without this setting, the autodetection logic would assume that the complete object was being returned. When the property is false, IDM uses only the IDs from the query result. If the complete object is required, the object is queried on demand.
Setting these properties to true indicates that the complete object is returned. This setting is typically required only for very small objects, for which the number of returned fields does not reach the threshold that the autodetection mechanism requires to treat the result as a full object. In this case, the query result includes all the details required to preload the full object.
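The precedence between the explicit properties and autodetection can be summarized in a small Python sketch. It models the behavior described above and is not IDM code: is_full_entry and autodetect_full_entry are hypothetical names, None stands for a property that is not set in the mapping, and the ignored fields are assumed to be _id and _rev.

```python
from typing import Mapping, Optional


def autodetect_full_entry(obj: Mapping[str, object]) -> bool:
    # Heuristic from the text: three or more fields besides _id and _rev.
    return sum(1 for name in obj if name not in ("_id", "_rev")) >= 3


def is_full_entry(explicit: Optional[bool], sample: Mapping[str, object]) -> bool:
    """Decide whether query results are treated as complete objects.

    explicit models sourceQueryFullEntry or targetQueryFullEntry;
    None means the property is absent, so autodetection applies.
    """
    if explicit is not None:
        # An explicit setting always overrides autodetection.
        return explicit
    return autodetect_full_entry(sample)
```

The two cases from the text map directly onto this function: a partial object with many fields needs explicit=False, and a very small complete object needs explicit=True.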
The following excerpt indicates that the full objects are returned and that IDM should not autodetect the result set:
"mappings" : [
    {
        "name" : "systemLdapAccounts_managedUser",
        "source" : "system/ldap/account",
        "target" : "managed/user",
        "sourceQueryFullEntry" : true,
        "sourceQuery" : {
            "_queryFilter" : "true"
        },
        ...
By default, all the attributes that are defined in the connector configuration file are loaded into memory. If your mapping uses only a small subset of the attributes in the connector configuration file, you can restrict your query to return only those attributes required for synchronization by using the _fields parameter with the query filter.
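To make the effect of _fields concrete, the following Python sketch projects an object onto a comma-separated attribute list. It is not IDM code: parse_fields and project are hypothetical helpers, and both the trimming of whitespace around attribute names and the retention of _id are assumptions made for illustration.

```python
from typing import Dict, List, Mapping


def parse_fields(fields: str) -> List[str]:
    # Split a comma-separated list and trim whitespace (assumed to be tolerated).
    return [name.strip() for name in fields.split(",") if name.strip()]


def project(obj: Mapping[str, object], fields: str) -> Dict[str, object]:
    """Keep _id (assumption) plus the requested attributes present on obj."""
    result: Dict[str, object] = {}
    if "_id" in obj:
        result["_id"] = obj["_id"]
    for name in parse_fields(fields):
        if name in obj:
            result[name] = obj[name]
    return result
```

Attributes that the mapping never reads, such as a hypothetical telephoneNumber, are dropped before they take up memory, which is the point of restricting the query.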
The following excerpt loads only a subset of attributes into memory, for all users in an LDAP directory:
"mappings" : [
    {
        "name" : "systemLdapAccounts_managedUser",
        "source" : "system/ldap/account",
        "target" : "managed/user",
        "sourceQuery" : {
            "_queryFilter" : "true",
            "_fields" : "cn, sn, dn, uid, employeeType, mail"
        },
        ...
Note
The default source query for clustered reconciliations and for paged reconciliations is a queryFilter that returns the full source objects, not just their IDs. So, source queries for clustered and paged reconciliations are optimized for performance by default.