Appendix A: The analytics_init_config.yml File

The analytics_init_config.yml file is an important configuration file in Autonomous Identity. For each deployment, customize its parameters to match the environment. Configure this file before you ingest the input data into Cassandra.

The process to use the analytics_init_config.yml is as follows:

1. On the target node, run the analytics create-template command to generate the analytics_init_config.yml file.
2. Edit the analytics_init_config.yml file to match your deployment and production environment.
3. Run the analytics apply-template command to apply your changes. The command writes the analytics_config.yml file, which the other analytics jobs use.

Note

Do not edit the analytics_config.yml file directly. To change the configuration, update the analytics_init_config.yml file, and then re-run the analytics apply-template command to regenerate analytics_config.yml.
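
Because analytics_config.yml is generated from analytics_init_config.yml, a deployer may want a quick check that the generated file is not older than the last edit to the source file. The following is a minimal sketch of such a check. The helper name needs_apply and the use of file modification times are assumptions made for illustration; they are not part of the Autonomous Identity tooling.

```python
import os


def needs_apply(init_path: str, generated_path: str) -> bool:
    """Return True when the generated file is missing or older than its source.

    Hypothetical helper, not part of the analytics CLI. It compares file
    modification times only and does not inspect file contents.
    """
    if not os.path.exists(generated_path):
        return True
    return os.path.getmtime(init_path) > os.path.getmtime(generated_path)
```

For example, needs_apply("analytics_init_config.yml", "analytics_config.yml") returns True when analytics_init_config.yml was modified after analytics_config.yml was last written, assuming both files are in the current directory.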

The file does the following:

Sets the input and output paths.
Configures the association rule options.
Sets the user attributes to be used for training.
Sets up the connection to the Cassandra database.
Configures column names and mappings for the UI dataload.
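
To illustrate how the input path settings combine, the sketch below resolves each input file name against common.base_path and data.input.input_path. The values are copied from the sample file in this appendix. The function name resolve_input_files is hypothetical, and the sketch assumes that paths are joined as base_path/input_path/file name, which follows the "Input file directory under base_path" comment in the sample file.

```python
import posixpath

# Values copied from the sample analytics_init_config.yml in this appendix.
config = {
    "common": {"base_path": "/data/"},
    "data": {
        "input": {
            "input_path": "input",
            "features_file": "features.csv",
            "labels_file": "labels.csv",
            "application_file": "AppToEnt.csv",
        }
    },
}


def resolve_input_files(cfg: dict) -> dict:
    """Map each *_file key to base_path/input_path/<file name>.

    Hypothetical helper for illustration only.
    """
    section = cfg["data"]["input"]
    input_dir = posixpath.join(cfg["common"]["base_path"], section["input_path"])
    return {
        key: posixpath.join(input_dir, name)
        for key, name in section.items()
        if key.endswith("_file")
    }


paths = resolve_input_files(config)
print(paths["features_file"])  # /data/input/features.csv
```

If base_path is changed to another directory, every resolved input path changes with it, so the individual file entries do not need to be edited.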

The following analytics_init_config.yml file is version v0.32.0. The file is not expected to change much between releases, but occasional revisions will occur.

################################
# Common configuration options #
################################
common:
  base_path:                /data/              # Base directory for analytics I/O. Configurable.

######################################
# Data-related configuration options #
# (Input & Output of files/rules)    #
######################################
data:
  # input data
  input:
    input_path:             input               # Input file directory under base_path. Configurable.
    features_file:          features.csv        # Contains user attribute data
    labels_file:            labels.csv          # Contains user-to-entitlement mappings.
    application_file:       AppToEnt.csv        # Contains entitlements-to-applications mappings.
    role_owner_file:        RoleOwner.csv       # Contains entitlement IDs to employees who "own the
                                                #   entitlements"
    account_name_file:      HRName.csv          # Contains user ID mappings to names.
    entitlement_name_file:  EntName.csv         # Contains entitlement IDs to their names.
    job_dept_desc_file:     JobAndDeptDesc.csv  # Contains user ID mappings to the departments where
                                                #   they work plus job descriptions

##########################################
# Extract Transform Load to Database     #
# (Database Technologies i.e. Cassandra) #
##########################################
etl:
  med_conf:                 0.35                # Confidence threshold for medium confidence assignments
  high_conf:                0.75                # Confidence threshold for high confidence assignments
  edf_thresh:               0.75                # Confidence threshold for driving factor rules
  org_column_value:         test                # Use client organization identifier
  app_source_column:        test                # Use client organization identifier
  filtering_columns:        CITY                # Specifies any filtering columns

############################################
# Association Rules configuration options  #
# (Training & As-Is/Recommend Predictions) #
############################################
assoc_rules:
  # base config
  # features_filter: update with the columns you want to be used in training
  #   (must contain USR_KEY or equivalent).
  features_filter:        USR_KEY,CITY,USR_DEPARTMENT_NAME,
                          COST_CENTER,JOBCODE_NAME,
                          LINE_OF_BUSINESS,
                          LINE_OF_BUSINESS_SUBGROUP,
                          CHIEF_YES_NO,USR_EMP_TYPE,
                          USR_DISPLAY_NAME,MANAGER_NAME,
                          USR_MANAGER_KEY,IS_ACTIVE
  # features_table_columns: update with the list of all columns in features.
  features_table_columns: USR_KEY,CITY,USR_DEPARTMENT_NAME,
                          COST_CENTER,JOBCODE_NAME,
                          LINE_OF_BUSINESS,
                          LINE_OF_BUSINESS_SUBGROUP,
                          CHIEF_YES_NO,USR_EMP_TYPE,
                          USR_DISPLAY_NAME,MANAGER_NAME,
                          USR_MANAGER_KEY,IS_ACTIVE

############################################
# User Column Description for the Feature CSV Headers
# ( user_name : User Name , user_manager :Manager Name)
############################################
ui_config:                                                 # Configurable column descriptions in the UI.
  user_column_descriptions: USR_KEY:User Key,CITY:City Location Building,USR_DEPARTMENT_NAME:User Department Name,
    COST_CENTER:User Cost Center,JOBCODE_NAME:Job Code Name,LINE_OF_BUSINESS:LOB,CHIEF_YES_NO:Manager Flag,
    USR_EMP_TYPE:Employee Type,USR_DISPLAY_NAME:User name,MANAGER_NAME:Manager Name,USR_MANAGER_KEY:Manager Key,
    IS_ACTIVE:Active

#########################################
# Spark-related configuration options   #
#########################################
spark:
  logging_level:  WARN                 # Logging level
  config:
    spark.executor.memory:            6G        # Recommended value >= 6G
    spark.driver.memory:              4G        # Memory allocation to job on master.
    spark.driver.maxResultSize:       2G        # Maximum size of results storage capacity.
    spark.executor.cores:             3         # Number of executor cores
    spark.total.cores:                3         # Modify this value based on the cluster size. Number of
                                                #  executors will be calculated automatically.
    spark.scheduler.mode:             FAIR      # Set the scheduler for resources.
    spark.sql.shuffle.partitions:     6
    spark.task.maxFailures:           200
    spark.driver.blockManager.port:   39999
    spark.blockManager.port:          40016
ingestion:
  drop_if_create: true
  catalog_step: false
  staging: false
  connector:
    type: iiq
    batchsize: 100
    timeout: 30
    change_reconciliation:
      enabled: false
      time: '2013-05-17T00:00:00Z'
    features_mappings:
      chief_yes_no: User:CHIEF_YES_NO
      city: User:CITY
      jobcode_name: User:JOBCODE_NAME
      line_of_business: User:LINE_OF_BUSINESS
      line_of_business_subgroup: User:LINE_OF_BUSINESS_SUBGROUP
      usr_manager_name: User:MANAGER_NAME
      usr_dep_name: User:USR_DEPARTMENT_NAME
      usr_display_name: User:USR_DISPLAY_NAME
      usr_emp_type: User:USR_EMP_TYPE
      costcenter: User:COST_CENTER
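
The column settings in assoc_rules and ui_config refer to the same set of feature columns, so a mismatched name is easy to introduce when editing by hand. The sketch below shows one way to cross-check the three comma-separated values before applying the template. It is illustrative only: the function names are hypothetical, and the sketch assumes that every column in features_filter and every column named in user_column_descriptions should also appear in features_table_columns, and that whitespace around commas is not significant.

```python
def split_columns(value: str) -> list:
    """Split a comma-separated column list, ignoring surrounding whitespace."""
    return [item.strip() for item in value.split(",") if item.strip()]


def parse_descriptions(value: str) -> dict:
    """Parse 'COLUMN:Description,COLUMN:Description' into a dictionary."""
    result = {}
    for item in split_columns(value):
        name, _, text = item.partition(":")
        result[name.strip()] = text.strip()
    return result


def check_columns(features_filter: str, table_columns: str, descriptions: str) -> list:
    """Return a list of problems; an empty list means the settings agree."""
    table = set(split_columns(table_columns))
    problems = []
    for name in split_columns(features_filter):
        if name not in table:
            problems.append(
                f"features_filter column not in features_table_columns: {name}"
            )
    for name in parse_descriptions(descriptions):
        if name not in table:
            problems.append(
                f"described column not in features_table_columns: {name}"
            )
    return problems


# Shortened values for illustration; TITLE is a made-up column name.
print(check_columns(
    "USR_KEY,CITY",
    "USR_KEY,CITY,JOBCODE_NAME",
    "USR_KEY:User Key,CITY:City Location Building,TITLE:Job Title",
))
# ['described column not in features_table_columns: TITLE']
```

The description parser splits on commas first, so it does not handle a description that itself contains a comma.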