Data Preparation
Once you have deployed Autonomous Identity, you can prepare your dataset in a format that meets the schema.
The initial step is to obtain the data as agreed upon between ForgeRock and your company. The files contain a subset of user attributes from the HR database and the entitlement metadata required for the analysis. Only the attributes necessary for analysis are used.
Clients can transfer the data to ForgeRock on portable media, such as a USB drive, or through a connector from the client systems. The analysts review the data to ensure that it is properly formatted.
A number of steps must be carried out before your production entitlement data is input into Autonomous Identity. These steps are summarized below:
Data Collection
Typically, the raw client data is not in a form that meets the Autonomous Identity schema. For example, a unique user identifier can have multiple names, such as `user_id`, `account_id`, `user_key`, or `key`. Similarly, entitlement columns can have several names, such as `access_point`, `privilege_name`, or `entitlement`.
To get the data into the correct format, follow these general rules (a minimal pandas sketch of these steps appears after the list):

- Submit the raw client data in various file formats: .csv, .xlsx, or .txt. The data can be in a single file or in multiple files, and includes user attributes, entitlement descriptions, and entitlement assignments.
- Remove duplicate values.
- Add optional columns for additional training attributes, for example, `MANAGERS_MANAGER` and `MANAGER_FLAG`.
- Merge the user attribute information and entitlement metadata into the entitlement assignments. This creates one large dataframe with an individual row for each assignment. Each row should contain the relevant user attribute profile information and entitlement metadata for the assignment.
- Rename any columns that Autonomous Identity uses to the appropriate names, for example, `employeeid` to `USR_KEY`, and `entitlement_name` to `ENT`.
- Build the nine dataframes needed for Autonomous Identity, for example, features, labels, HRName, and so on. This step may also include adding some additional columns to each dataframe, for example, `labels['IS_ASSIGNED'] = 'Y'`.
- Write out the nine dataframes to nine .csv files and store them in the `/data/input` directory.
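The following is a minimal pandas sketch of these rules. It assumes three raw files (hr_users.csv, assignments.csv, entitlements.xlsx) and the raw column names employeeid and entitlement_name; everything other than the `USR_KEY`, `ENT`, and `IS_ASSIGNED` names and the `/data/input` directory is an illustrative assumption to adapt to your own data.

```python
import pandas as pd

# Raw client files; the file and column names here are illustrative assumptions.
users = pd.read_csv("hr_users.csv")                # user attributes
assignments = pd.read_csv("assignments.csv")       # user-to-entitlement assignments
entitlements = pd.read_excel("entitlements.xlsx")  # entitlement metadata

# Remove duplicate rows before merging.
users = users.drop_duplicates()
assignments = assignments.drop_duplicates()
entitlements = entitlements.drop_duplicates()

# Rename client-specific columns to the names Autonomous Identity expects.
users = users.rename(columns={"employeeid": "USR_KEY"})
assignments = assignments.rename(columns={"employeeid": "USR_KEY",
                                          "entitlement_name": "ENT"})
entitlements = entitlements.rename(columns={"entitlement_name": "ENT"})

# Merge user attributes and entitlement metadata into the assignments so that
# each row carries the full profile and metadata for one assignment.
merged = (assignments
          .merge(users, on="USR_KEY", how="left")
          .merge(entitlements, on="ENT", how="left"))

# Build two of the dataframes as examples; the remaining frames follow the
# same pattern using the columns listed in the tables below.
features = users.copy()
labels = merged[["USR_KEY", "ENT"]].copy()
labels["IS_ASSIGNED"] = "Y"

# Write the results into the Autonomous Identity input directory.
features.to_csv("/data/input/features.csv", index=False)
labels.to_csv("/data/input/labels.csv", index=False)
```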
CSV Files and Schema
ForgeRock provides a transformation script that takes in the raw data and converts it to correctly formatted .csv files.
You can use a Python script template to transform your client files into the correct .csv files. Run the following steps:
1. On the target machine, go to the `/data/conf/` directory.
2. Open a text editor, and view the `zoran_client_transformation.py` template. You can edit this script for your company's dataset.
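The template's contents depend on your dataset, but the edits usually amount to mapping your raw column names onto the names Autonomous Identity expects. The snippet below is a hypothetical illustration of such a mapping, not the actual contents of `zoran_client_transformation.py`:

```python
import pandas as pd

# Hypothetical mapping from raw client column names (left) to the names
# Autonomous Identity expects (right); the raw names are examples only.
COLUMN_MAP = {
    "employeeid": "USR_KEY",
    "entitlement_name": "ENT",
}

def normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Rename any known raw columns to the Autonomous Identity names."""
    present = {raw: new for raw, new in COLUMN_MAP.items() if raw in df.columns}
    return df.rename(columns=present)
```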
The script outputs files with the following contents:
Files | Description |
---|---|
features.csv | Contains one row for each employee with all of their user attributes. |
labels.csv | Contains the user-to-entitlement mappings. Also includes usage data, if provided. |
HRName.csv | Maps user IDs to their names. This file is needed for the UI. |
EntName.csv | Maps entitlement IDs to their names. This file is needed for the UI. |
RoleOwner.csv | Maps entitlement IDs to the employees who "own" those entitlements, the people responsible for approving or revoking access to the entitlement. |
JobAndDeptDesc.csv | Maps user IDs to the department in which they work, and also includes a description of their job within the company. |
AppToEnt.csv | Maps entitlements to the applications they belong to. This file is needed for the UI. |
app_attributes.csv | Maps attributes to applications. This file is needed for attribute filtering on the applications page. |
ent_attributes.csv | Maps attributes to entitlements. This file is needed for entitlement attribute filtering on the applications page. |
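Once the transformation has run, it can be useful to confirm that all nine files listed above are present in `/data/input`. A quick check might look like the following; the file names come from the table above, and the check itself is a sketch rather than part of the product:

```python
from pathlib import Path

# The nine input files the transformation is expected to produce.
EXPECTED_FILES = [
    "features.csv", "labels.csv", "HRName.csv", "EntName.csv",
    "RoleOwner.csv", "JobAndDeptDesc.csv", "AppToEnt.csv",
    "app_attributes.csv", "ent_attributes.csv",
]

input_dir = Path("/data/input")
missing = [name for name in EXPECTED_FILES if not (input_dir / name).is_file()]
if missing:
    print("Missing input files:", ", ".join(missing))
else:
    print("All nine input files are present.")
```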
The schemas for the input files are as follows:
Files | Schema |
---|---|
features.csv | This file depends on the attributes that the client wants to include; some columns are required. |
labels.csv | |
HRName.csv | |
EntName.csv | |
RoleOwner.csv | |
JobAndDeptDesc.csv | |
AppToEnt.csv | |
app_attributes.csv | |
ent_attributes.csv | |
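Because the exact columns in several files depend on the attributes you choose to include, a simple way to review what each generated file actually contains, and compare it against the expected schema, is to print the column names. The following is a small inspection sketch, not part of the product:

```python
import pandas as pd
from pathlib import Path

# Print the column names of each generated .csv file so they can be compared
# against the expected schema before ingestion.
for csv_path in sorted(Path("/data/input").glob("*.csv")):
    columns = pd.read_csv(csv_path, nrows=0).columns.tolist()
    print(f"{csv_path.name}: {columns}")
```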