Import Bulk Data

The bulk import facility lets you import large numbers of external entries over REST. You import entries from a comma-separated values (CSV) file, to a specified managed object type in the IDM repository. Bulk import works as follows:

  • Loads bulk CSV entries and stores them temporarily (in the IDM repository) as JSON objects

  • Creates a temporary mapping between those entries and the managed object store in the repository

  • Performs a reconciliation between the JSON objects and the objects in the repository

Note

The bulk import mechanism assumes that the CSV file is the authoritative data source. If you run an import more than once, the import overwrites all of the properties of the managed object (including timestamps) with the values in the CSV file.

To import bulk CSV entries into the repository, using the REST API, follow these steps:

The first time you upload entries, you must generate a CSV template. The template is essentially an empty CSV file with one header row that matches the managed object type to which you are importing. In most cases, you will be importing data that fits the managed/user object model, but you can import any managed object type, such as roles and assignments.

To generate the CSV template, send a GET request to the openidm/csv/template endpoint. The following request generates a CSV template for the managed user object type:

curl \
--header "X-OpenIDM-Username: openidm-admin" \
--header "X-OpenIDM-Password: openidm-admin" \
--header "Accept-API-Version: resource=1.0" \
--request GET \
"http://localhost:8080/openidm/csv/template?resourceCollection=managed/user&_fields=header&_mimeType='text/plain'"
{
  "_id": "template",
  "header": "\"userName\",\"givenName\",\"sn\",\"mail\",\"description\",\"accountStatus\",\"telephoneNumber\",
 \"postalAddress\",\"city\",\"postalCode\",\"country\",\"stateProvince\",\"preferences/updates\",
 \"preferences/marketing\""
}

The template is generated based on the specified resourceCollection, and includes a single header row. The names of each header column are derived from the schema of the managed object type. The template includes only a subset of managed user properties that can be represented by CSV fields.

Only the following managed object properties are included in the header row:

  • Properties of type string, boolean, and number

  • Properties that do not start with an underscore (such as _id or _rev)

    If you are importing entries to managed/user, the bulk import facility assumes that self-service password reset is enabled. This is because the import does not support upload of hashed passwords.

  • Properties whose scope is not private

Set the parameters _fields=header and _mimeType=text/csv to download the template as a CSV file.

When you have generated the template, export your external data to CSV format, using the headers in the generated template.

The default maximum file size for bulk import is 50MBytes. If you need to import a number of records that exceeds this size, divide the data into chunks and import each file separately. You can also increase the maximum file size by changing the value of the maxRequestSizeInMegabytes property in your conf/servletfilter-upload.json file.

When you have a CSV file, with the structure of the template generated in the previous example, upload the file to the IDM repository with the following request:

curl \
--header "X-OpenIDM-Username: openidm-admin" \
--header "X-OpenIDM-Password: openidm-admin" \
--header "Accept-API-Version: resource=1.0" \
--form upload=@/path/to/example-users.csv \
--request POST \
"http://localhost:8080/upload/csv/managed/user?uniqueProperty=userName"
{
  "importUUIDs": [
    "3ebd514f-bdd7-491f-928f-21b72f44e381"
  ]
}
--form (-F)

This option causes curl to POST data using the Content-Type multipart/form-data, which lets you upload binary files. To indicate that the form content is a file, prefix the file name with an @ sign.

To import more than one file at once, specify multiple --form options, for example:

--form upload=@/path/to/example-users-a-j.csv \
--form upload=@/path/to/example-users-k-z.csv \
uniqueProperty (required)

This parameter lets you correlate existing entries, based on a unique value field. This is useful if you need to upload the same file a number of times (for example, if data in the file changes, or if some entries in the file contained errors). You can specify any unique value property here. You can also correlate on more than one property by specifying multiple, comma-delimited unique properties.

A successful upload generates an array of importUUIDs. You need these UUIDs to perform other operations on the import records.

Important

Note that the endpoint (upload/csv) is not an IDM endpoint.

A query on the csv/metadata endpoint returns the import ID, the data structure (header fields in the CSV file), a recon ID, and a number of fields indicating the status of the import:

curl \
--header "X-OpenIDM-Username: openidm-admin" \
--header "X-OpenIDM-Password: openidm-admin" \
--header "Accept-API-Version: resource=1.0" \
--request GET \
"http://localhost:8080/openidm/csv/metadata/?_queryFilter=true"
{
  "result": [
    {
      "_id": "3ebd514f-bdd7-491f-928f-21b72f44e381",
      "_rev": "000000003e8ef4f7",
      "header": [
        "userName",
        "givenName",
        "sn",
        "mail",
        "description",
        "accountStatus",
        "country"
      ],
      "reconId": "2e2cf41a-c4b8-4dda-9d92-6e0af65a15fe-6528",
      "filename": "example-users.csv",
      "resourcePath": "managed/user",
      "total": 1000,
      "success": 1000,
      "failure": 0,
      "created": 1000,
      "updated": 0,
      "unchanged": 0,
      "begin": "2020-04-17T16:31:02.955Z",
      "end": "2020-04-17T16:31:09.861Z",
      "cancelled": false,
      "importDeleted": false,
      "tempRecords": 0,
      "purgedTempRecords": true,
      "purgedErrorRecords": false,
      "authId": "openidm-admin",
      "authzComponent": "internal/user"
    },
    {
      "_rev": "00000000d4392fc8"
    }
  ],
  ...
}

Use a query filter to restrict your query to imports to a specific managed object type. The following example queries uploads to the managed user object:

curl \
--header "X-OpenIDM-Username: openidm-admin" \
--header "X-OpenIDM-Password: openidm-admin" \
--header "Accept-API-Version: resource=1.0" \
--request GET \
'http://localhost:8080/openidm/csv/metadata/?_queryFilter=/resourcePath+eq+"managed/user"'
{
  "result": [
    {
      "_id": "82d9a643-8b03-4cec-86fc-3e09c4c2f01c",
      "_rev": "000000009b3ff60b",
      "header": [
        "userName",
        "givenName",
        "sn",
        "mail",
        "description",
        "accountStatus",
        "country"
      ],
      "reconId": "417dae3b-c939-4191-acbf-6eb1b9e802af-53335",
      "filename": "example-users.csv",
      "resourcePath": "managed/user",
      "total": 1001,
      "success": 1000,
      "failure": 1,
      "created": 0,
      "updated": 0,
      "unchanged": 1000,
      "begin": "2020-04-20T13:12:03.028Z",
      "end": "2020-04-20T13:12:05.222Z",
      "cancelled": false,
      "importDeleted": false,
      "tempRecords": 0,
      "purgedTempRecords": true,
      "purgedErrorRecords": false,
      "authId": "openidm-admin",
      "authzComponent": "internal/user"
    }
  ],
  ...
}  

The previous example showed the statistics that are returned when you query bulk imports. One of these fields is "failure": 0,. If the import was unsuccessful for any records, this failure field will have a positive value. You can then download the failed records, examine the failures and correct them in the CSV file, then run the import again.

To download failed records, send a GET request to the endpoint export/csvImportFailures/importUUID:

curl \
--header "X-OpenIDM-Username: openidm-admin" \
--header "X-OpenIDM-Password: openidm-admin" \
--request GET \
--header "Accept-API-Version: resource=1.0" \
"http://localhost:8080/export/csvImportFailures/82d9a643-8b03-4cec-86fc-3e09c4c2f01c"

userName,  givenName,  sn,      mail,     ...,  _importError
emacheke,  Edward,     Macheke, emacheke, ...,  "{code=403, reason=Forbidden, message=Policy validation
   failed, detail={result=false, failedPolicyRequirements=[{policyRequirements=[
   {policyRequirement=VALID_EMAIL_ADDRESS_FORMAT}], property=mail}]}}"

The output indicates the failed record or records, and the reason for the failure, in the _importError field. In this example, the import failed because of a policy validation error—the email address is not the correct format.

Warning

IDM does not scan for possible CSV injection attacks on uploaded files. Do not edit the downloaded CSV file with Microsoft Excel, as this can expose your data to CSV injection.

Cancel an import that is in progress by sending a POST request to the openidm/csv/metadata/importUUID endpoint, with the cancel action. You might want to cancel an import if the import is taking too long, or if you have noticed problems with the import data, for example:

curl \
--header "X-OpenIDM-Username: openidm-admin" \
--header "X-OpenIDM-Password: openidm-admin" \
--header "Accept-API-Version: resource=1.0" \
--request POST \
"http://localhost:8080/openidm/csv/metadata/92971c92-67bb-4ae7-b41b-96d249b0b2aa/?_action=cancel"
{
  "status": "OK"
}

By default, the timeout for the bulk import servlets is 30 seconds (or 30000 milliseconds). This parameter is set in your resolver/boot.properties file, as follows:

openidm.servlet.timeoutMillis=30000

If you are importing a very large number of records, you might need to increase the HTTP request timeout to prevent requests timing out.

In test environments, you can set this parameter to 0 to disable the request timeout. You should not disable the timeout in a production environment because no timeout can lead to DDoS attacks where thousands of slow HTTP connections are made.

For a list of all REST endpoints related to bulk import, see "Bulk Import".

Read a different version of :