Appendix A: The analytics_init_config.yml File
The analytics_init_config.yml file is an important configuration file in Autonomous Identity. For each deployment, you customize its parameters for your environment. Configure this file before ingesting the input data into Cassandra.
The process for using the analytics_init_config.yml file is as follows:
1. On the target node, use the analytics create-template command to generate the analytics_init_config.yml file.
2. Make changes to the analytics_init_config.yml file, tailored to your deployment and production environment.
3. Run the analytics apply-template command to apply your changes. The output is the analytics_config.yml file, which the other analytics jobs use.
Note
Do not make changes to the analytics_config.yml file. To change the configuration, update the analytics_init_config.yml file and then re-run the analytics apply-template command.
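The workflow above can be scripted. The following is a minimal sketch, assuming the analytics CLI is on the PATH of the target node. The helper names run_analytics and regenerate_config are hypothetical; only the create-template and apply-template subcommands come from this appendix.

```python
import subprocess
from typing import Callable, List


def run_analytics(subcommand: str,
                  runner: Callable[..., object] = subprocess.run) -> List[str]:
    """Run one `analytics` subcommand and return the argv that was used.

    `runner` is injectable so the sketch can be exercised without the CLI.
    """
    argv = ["analytics", subcommand]
    runner(argv, check=True)  # check=True raises if the command exits non-zero
    return argv


def regenerate_config(runner: Callable[..., object] = subprocess.run) -> List[str]:
    """Apply an edited analytics_init_config.yml (the apply-template step)."""
    return run_analytics("apply-template", runner=runner)
```

On a first run, run_analytics("create-template") would come before any edits to analytics_init_config.yml; after each later edit, only regenerate_config() is needed.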
The file does the following:
Sets the input and output paths.
Configures the association rule options.
Sets the user attributes to be used for training.
Sets up the connection to the Cassandra database.
Configures column names and mappings for the UI dataload.
The following analytics_init_config.yml file is version v0.32.0. The file is not expected to change much between releases, but occasional revisions will occur.
################################
# Common configuration options #
################################
common:
  base_path: /data/                          # Base directory for analytics I/O. Configurable.
######################################
# Data-related configuration options #
# (Input & Output of files/rules) #
######################################
data:
  # input data
  input:
    input_path: input                        # Input file directory under base_path. Configurable.
    features_file: features.csv              # Contains user attribute data.
    labels_file: labels.csv                  # Contains user-to-entitlement mappings.
    application_file: AppToEnt.csv           # Contains entitlements-to-applications mappings.
    role_owner_file: RoleOwner.csv           # Contains entitlement IDs to employees who "own the entitlements".
    account_name_file: HRName.csv            # Contains user ID mappings to names.
    entitlement_name_file: EntName.csv       # Contains entitlement IDs to their names.
    job_dept_desc_file: JobAndDeptDesc.csv   # Contains user ID mappings to the departments where they work plus
                                             # job descriptions.
#########################################
# Extract Transform Load to Database #
# (Database Technologies i.e. Cassandra #
#########################################
etl:
  med_conf: 0.35                 # Confidence threshold for medium confidence assignments
  high_conf: 0.75                # Confidence threshold for high confidence assignments
  edf_thresh: 0.75               # Confidence threshold for driving factor rules
  org_column_value: test         # Use client organization identifier
  app_source_column: test        # Use client organization identifier
  filtering_columns: CITY:CITY   # Specifies any filtering columns
############################################
# Association Rules configuration options #
# (Training & As-Is/Recommend Predictions) #
############################################
assoc_rules:
  # base config
  features_filter: USR_KEY,CITY,USR_DEPARTMENT_NAME,          # update with columns you want to be
                   COST_CENTER,JOBCODE_NAME,                  # used in training (must contain USR_KEY
                   LINE_OF_BUSINESS,                          # or equivalent)
                   LINE_OF_BUSINESS_SUBGROUP,
                   CHIEF_YES_NO,USR_EMP_TYPE,
                   USR_DISPLAY_NAME,MANAGER_NAME,
                   USR_MANAGER_KEY,IS_ACTIVE
  features_table_columns: USR_KEY,CITY,USR_DEPARTMENT_NAME,   # update with list of all columns in
                          COST_CENTER,JOBCODE_NAME,           # features
                          LINE_OF_BUSINESS,
                          LINE_OF_BUSINESS_SUBGROUP,
                          CHIEF_YES_NO,USR_EMP_TYPE,
                          USR_DISPLAY_NAME,MANAGER_NAME,
                          USR_MANAGER_KEY,IS_ACTIVE
############################################
# User Column Description for the Feature CSV Headers
# ( user_name : User Name , user_manager :Manager Name)
############################################
ui_config: # Configurable column descriptions in the UI.
  user_column_descriptions: USR_KEY:User Key,CITY:City Location Building,USR_DEPARTMENT_NAME:User Department Name,COST_CENTER:User Cost Center,JOBCODE_NAME:Job Code Name,LINE_OF_BUSINESS:LOB,CHIEF_YES_NO:Manager Flag,USR_EMP_TYPE:Employee Type,USER_DISPLAY_NAME:User name,MANAGER_NAME:Manager Name,USR_MANAGER_KEY:Manager Key,IS_ACTIVE:Active
#########################################
# Spark-related configuration options #
#########################################
spark:
  # spark base
  logging_level: WARN                      # Logging level
  # spark config
  config:
    # spark-submit params
    spark.executor.memory: 6G              # Recommended value >= 6
    spark.driver.memory: 4G                # Memory allocation to job on master.
    spark.driver.maxResultSize: 2G         # Maximum size of results storage capacity.
    spark.executor.cores: 3                # Number of executor cores
    spark.total.cores: 3                   # Modify this value based on the cluster size. Number of
                                           # executors will be calculated automatically.
    # other spark configuration properties
    spark.scheduler.mode: FAIR             # Set the scheduler for resources.
    spark.sql.shuffle.partitions: 6
    spark.task.maxFailures: 200
    spark.driver.blockManager.port: 39999
    spark.blockManager.port: 40016
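
Because features_filter, features_table_columns, and user_column_descriptions all refer to the same feature column names, it can help to cross-check them before applying the template. The sketch below is an illustration only and is not part of Autonomous Identity. The function names are hypothetical, the values are passed in as plain strings copied from the file, and the rule that every training or described column should also appear in features_table_columns is an assumption drawn from the inline comments.

```python
from typing import Dict, List


def split_columns(value: str) -> List[str]:
    """Split a comma-separated column list, ignoring blanks and whitespace."""
    return [name.strip() for name in value.split(",") if name.strip()]


def parse_descriptions(value: str) -> Dict[str, str]:
    """Parse 'COLUMN:Description,COLUMN:Description' into a dict."""
    result: Dict[str, str] = {}
    for item in value.split(","):
        if not item.strip():
            continue
        name, sep, label = item.partition(":")
        if not sep:
            raise ValueError(f"expected COLUMN:Description, got {item!r}")
        result[name.strip()] = label.strip()
    return result


def check_columns(features_filter: str, table_columns: str,
                  descriptions: str, key: str = "USR_KEY") -> List[str]:
    """Return a list of readable problems; an empty list means consistent."""
    problems: List[str] = []
    training = split_columns(features_filter)
    table = set(split_columns(table_columns))
    if key not in training:
        problems.append(f"features_filter must contain {key} or its equivalent")
    for name in training:
        if name not in table:  # assumed rule: training columns exist in the table
            problems.append(f"{name} is not listed in features_table_columns")
    for name in parse_descriptions(descriptions):
        if name not in table:  # assumed rule: descriptions refer to table columns
            problems.append(f"{name} has a description but is not a feature column")
    return problems


# Shortened values taken from the listing above.
problems = check_columns(
    "USR_KEY,CITY,USR_DISPLAY_NAME",
    "USR_KEY,CITY,USR_DISPLAY_NAME",
    "USR_KEY:User Key,CITY:City Location Building,USER_DISPLAY_NAME:User name",
)
print(problems)
```

With these values, the check reports USER_DISPLAY_NAME, which appears in user_column_descriptions while the feature lists use USR_DISPLAY_NAME. Whether that mismatch matters in a given deployment depends on the actual headers in features.csv.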
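
The med_conf and high_conf values in the etl section split confidence scores into bands. The following sketch shows one way to read them, assuming scores lie between 0 and 1 and that a score at or above a threshold falls into that band. The function name confidence_band and the inclusive boundaries are assumptions, not documented behavior.

```python
def confidence_band(score: float,
                    med_conf: float = 0.35,
                    high_conf: float = 0.75) -> str:
    """Map a confidence score to 'low', 'medium', or 'high'.

    Defaults mirror med_conf and high_conf in the sample file.
    Inclusive lower bounds are an assumption made for this sketch.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if med_conf > high_conf:
        raise ValueError("med_conf must not exceed high_conf")
    if score >= high_conf:
        return "high"
    if score >= med_conf:
        return "medium"
    return "low"
```

Under these assumptions, raising med_conf moves borderline assignments from the medium band into the low band without affecting which assignments count as high.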