Appendix A: The analytics_init_config.yml File
The analytics_init_config.yml file is an important configuration file in Autonomous Identity. For each deployment, you customize its parameters to your environment. Deployers should configure this file before ingesting the input data into Cassandra.
The process for using the analytics_init_config.yml file is as follows:
- On the target node, use the analytics create-template command to generate the analytics_init_config.yml file.
- Make changes to analytics_init_config.yml, tailoring it to your deployment and production environment.
- Run the analytics apply-template command to apply your changes. The output is the analytics_config.yml file, which is used for the other analytics jobs.
Do not make changes to the analytics_config.yml file directly. Always edit analytics_init_config.yml and re-run the analytics apply-template command.
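For example, a typical pass through these steps looks like the following. This is a minimal sketch only: the working directory, file locations, and any command options depend on your deployment, and the editor shown is just an illustration.

    $ analytics create-template            # generates analytics_init_config.yml on the target node
    $ vi analytics_init_config.yml         # tailor the parameters to your environment
    $ analytics apply-template             # produces analytics_config.yml for the other analytics jobs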
The file is used to do the following:
- Sets the input and output paths.
- Configures the association rule options.
- Sets the user attributes to be used for training.
- Sets up the connection to the Cassandra database.
- Configures column names and mappings for the UI dataload.
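As a concrete illustration of the first point, with the default base_path (/data/) and input_path (input) shown in the file listing below, the analytics jobs expect the input CSV files under /data/input. The listing below is only a sketch assuming you keep those defaults; your paths and file names may differ.

    $ ls /data/input
    AppToEnt.csv  EntName.csv  HRName.csv  JobAndDeptDesc.csv  RoleOwner.csv  features.csv  labels.csv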
The following analytics_init_config.yml file is version v0.32.0. The file is not expected to change much between releases, but some revisions will occur intermittently.
# Common configuration options
common:
    base_path: /data/                          # Base directory for analytics I/O. Configurable.

# Data-related configuration options
# (Input & Output of files/rules)
data:
    # input data
    input:
        input_path: input                      # Input file directory under base_path. Configurable.
        features_file: features.csv            # Contains user attribute data
        labels_file: labels.csv                # Contains user-to-entitlement mappings.
        application_file: AppToEnt.csv         # Contains entitlements-to-applications mappings.
        role_owner_file: RoleOwner.csv         # Contains entitlement IDs to employees who "own the
                                               # entitlements"
        account_name_file: HRName.csv          # Contains user ID mappings to names.
        entitlement_name_file: EntName.csv     # Contains entitlement IDs to their names.
        job_dept_desc_file: JobAndDeptDesc.csv # Contains user ID mappings to the departments where
                                               # they work plus job descriptions

# Extract Transform Load to Database
# (Database Technologies, i.e. Cassandra)
etl:
    med_conf: 0.35            # Confidence threshold for medium confidence assignments
    high_conf: 0.75           # Confidence threshold for high confidence assignments
    edf_thresh: 0.75          # Confidence threshold for driving factor rules
    org_column_value: test    # Use client organization identifier
    app_source_column: test   # Use client organization identifier
    filtering_columns: CITY   # Specifies any filtering columns

# Association Rules configuration options
# (Training & As-Is/Recommend Predictions)
assoc_rules:
    # base config
    features_filter: USR_KEY,CITY,USR_DEPARTMENT_NAME,        # update with columns you want to be
                     COST_CENTER,JOBCODE_NAME,                # used in training (must contain
                     LINE_OF_BUSINESS,                        # USR_KEY or equivalent)
                     LINE_OF_BUSINESS_SUBGROUP,
                     CHIEF_YES_NO,USR_EMP_TYPE,
                     USR_DISPLAY_NAME,MANAGER_NAME,
                     USR_MANAGER_KEY,IS_ACTIVE
    features_table_columns: USR_KEY,CITY,USR_DEPARTMENT_NAME, # update with list of all columns in
                     COST_CENTER,JOBCODE_NAME,                # features
                     LINE_OF_BUSINESS,
                     LINE_OF_BUSINESS_SUBGROUP,
                     CHIEF_YES_NO,USR_EMP_TYPE,
                     USR_DISPLAY_NAME,MANAGER_NAME,
                     USR_MANAGER_KEY,IS_ACTIVE

# User Column Description for the Feature CSV Headers
# (user_name: User Name, user_manager: Manager Name)
ui_config:                    # Configurable column descriptions in the UI.
    user_column_descriptions: USR_KEY:User Key,CITY:City Location Building,USR_DEPARTMENT_NAME:User Department Name,
                              COST_CENTER:User Cost Center,JOBCODE_NAME:Job Code Name,LINE_OF_BUSINESS:LOB,CHIEF_YES_NO:Manager Flag,
                              USR_EMP_TYPE:Employee Type,USER_DISPLAY_NAME:User name,MANAGER_NAME:Manager Name,USR_MANAGER_KEY:Manager Key,
                              IS_ACTIVE:Active

# Spark-related configuration options
spark:
    logging_level: WARN                         # Logging level
    config:
        spark.executor.memory: 6G               # Recommended value >= 6
        spark.driver.memory: 4G                 # Memory allocation to job on master.
        spark.driver.maxResultSize: 2G          # Maximum size of results storage capacity.
        spark.executor.cores: 3                 # Number of executor cores
        spark.total.cores: 3                    # Modify this value based on the cluster size. Number of
                                                # executors will be calculated automatically.
        spark.scheduler.mode: FAIR              # Set the scheduler for resources.
        spark.sql.shuffle.partitions: 6
        spark.task.maxFailures: 200
        spark.driver.blockManager.port: 39999
        spark.blockManager.port: 40016

ingestion:
    drop_if_create: true
    catalog_step: false
    staging: false

connector:
    type: iiq
    batchsize: 100
    timeout: 30
    change_reconciliation:
        enabled: false
        time: '2013-05-17T00:00:00Z'
    features_mappings:
        chief_yes_no: User:CHIEF_YES_NO
        city: User:CITY
        jobcode_name: User:JOBCODE_NAME
        line_of_business: User:LINE_OF_BUSINESS
        line_of_business_subgroup: User:LINE_OF_BUSINESS_SUBGROUP
        usr_manager_name: User:MANAGER_NAME
        usr_dep_name: User:USR_DEPARTMENT_NAME
        usr_display_name: User:USR_DISPLAY_NAME
        usr_emp_type: User:USR_EMP_TYPE
        costcenter: User:COST_CENTER
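When tuning the spark section, it helps to check what the node actually provides before setting spark.total.cores, spark.executor.cores, spark.executor.memory, and spark.driver.memory. The commands below are a minimal sketch for a Linux node; they only report hardware totals, and how much of that you can budget for Spark depends on what else runs on the machine.

    $ nproc        # logical cores to consider for spark.total.cores and spark.executor.cores
    $ free -g      # memory (GB) to consider for spark.executor.memory and spark.driver.memory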