Install a Standalone Analytics Container

You can install a standalone deployment of the analytics container to set up your data files. The accuracy of these data files is important for a proper analysis of your entitlements.

The deployment uses the Docker deployer machine to set up a standalone analytics container on the target node. The analytics container is automatically installed on the node where the Spark master is running.

  • Review the hardware and software specifications as outlined in the Release Notes (LINK)

  • Ensure your deployers have the appropriate system access to the ForgeRock Google Cloud Repository (gcr.io) site to access the Docker image. The fully qualified image is: gcr.io/forgerock-autoid//autoid/dev-compact:deployer-rel-31-spark2.4.4-analytics-19-elektra

    • The image contains release 31 of the microservices

    • Apache Spark 2.4.4

    • Analytics container version 19

  • Make sure you have CentOS 7 and Python installed on both the control and target nodes. Python is packaged with CentOS 7. Autonomous Identity works with the following versions:

    • CentOS 7 uses Python 2.7.5 in its base package repository.

    • CentOS 7.7 uses Python 3.7.2 in its base package repository.

  • Ensure that you have access to the ForgeRock Git repository, which contains scripts that you can clone to your system.

Getting Started

  1. On the target node, download the registry_key.json file from here to access the Google Cloud Repository (gcr.io).

  2. Access the Google Cloud Repository using the registry_key.json.

    $ [../../resources/install.bash:#access-gcr]
  3. Download the analytics container.

    $ docker pull gcr.io/forgerock-autoid/analytics:v14
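The access step referenced above generally takes the following shape. This is a hedged sketch, not the exact install script: the DRY_RUN prefix prints each command for review, and the stdin redirection of registry_key.json is shown in a comment.

```shell
# Sketch of the gcr.io login and pull sequence; DRY_RUN=echo prints
# the commands for review instead of executing them.
DRY_RUN=echo

gcr_login() {
  # In a real run, remove DRY_RUN and feed the key on stdin:
  #   docker login -u _json_key --password-stdin https://gcr.io < registry_key.json
  # _json_key is the fixed username Docker expects for JSON key auth.
  $DRY_RUN docker login -u _json_key --password-stdin https://gcr.io
}

pull_analytics() {
  $DRY_RUN docker pull gcr.io/forgerock-autoid/analytics:v14
}

gcr_login
pull_analytics
```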

Start the Analytics Container

To start the analytics container, replace the following values with those for your environment:

  • SPARK_MASTER_URL

  • SPARK_HOST

  • DB_HOST

  • DB_PORT

  • DB_CERT_PASSWORD

  • DB_SSL_ENABLED

  • DB_USER

  • DB_PASSWORD

  • CONFIGURATION_SERVICE_URL

  • CONFIGURATION_SERVICE_USER

  • CONFIGURATION_SERVICE_PASSWORD

    1. Run the analytics container with the variables for your environment.

      $ [../../resources/install.bash:#start-standalone-analytics]
    2. Create a symlink to the Python runtime on the host.

      $ sudo ln -s /opt/zoran/python-3.6/bin/python3.6 /usr/local/bin/python3.6
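The install script referenced above generally passes the values listed earlier to the container as environment variables. A minimal sketch with placeholder values: the container name analytics and the image tag come from this guide, while the /data volume mount and all example values are assumptions to replace with your own.

```shell
# Placeholder values for illustration only; substitute your own.
SPARK_MASTER_URL="spark://spark-master.example.com:7077"
SPARK_HOST="spark-master.example.com"
DB_HOST="cassandra.example.com"
DB_PORT="9042"
DB_CERT_PASSWORD="changeit"
DB_SSL_ENABLED="true"
DB_USER="zoran_dba"
DB_PASSWORD="password"
CONFIGURATION_SERVICE_URL="https://configuration-service.example.com"
CONFIGURATION_SERVICE_USER="admin"
CONFIGURATION_SERVICE_PASSWORD="password"

# DRY_RUN=echo prints the command for review; remove it to execute.
DRY_RUN=echo

start_analytics() {
  $DRY_RUN docker run -d --name analytics \
    -e SPARK_MASTER_URL="$SPARK_MASTER_URL" \
    -e SPARK_HOST="$SPARK_HOST" \
    -e DB_HOST="$DB_HOST" \
    -e DB_PORT="$DB_PORT" \
    -e DB_CERT_PASSWORD="$DB_CERT_PASSWORD" \
    -e DB_SSL_ENABLED="$DB_SSL_ENABLED" \
    -e DB_USER="$DB_USER" \
    -e DB_PASSWORD="$DB_PASSWORD" \
    -e CONFIGURATION_SERVICE_URL="$CONFIGURATION_SERVICE_URL" \
    -e CONFIGURATION_SERVICE_USER="$CONFIGURATION_SERVICE_USER" \
    -e CONFIGURATION_SERVICE_PASSWORD="$CONFIGURATION_SERVICE_PASSWORD" \
    -v /data:/data \
    gcr.io/forgerock-autoid/analytics:v14
}

start_analytics
```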

Configure the Template

  1. Run the create-template command on the analytics container. The command creates the /data/conf/analytics_init_config.yml file.

    $ docker exec -it analytics bash analytics create-template
  2. Edit the /data/conf/analytics_init_config.yml file with the data directory and user column descriptions.

  3. Make sure the .csv data files are placed in the path specified in the /data/conf/analytics_init_config.yml file.

  4. Apply the template.

    $ docker exec -it analytics bash analytics apply-template
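For orientation, here is a hypothetical fragment of analytics_init_config.yml showing the settings this guide names (the data directory, user_column_descriptions, and spark.total.cores). The exact key names and layout come from the generated file, so treat these as placeholders.

```yaml
# Hypothetical fragment; generate the real file with analytics create-template.
data_directory: /data/input            # assumed key: where the .csv files live
user_column_descriptions:              # named in this guide; example columns
  usr_key: User Id
  usr_display_name: User Display Name
spark.total.cores: 8                   # cores allocated to the analytics
```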

Run the Analytics Pipeline

Analytics Container Commands

  • analytics create-template

    Creates the /data/conf/analytics_init_config.yml file. Edit the user_column_descriptions setting and the spark.total.cores setting, which represents the number of cores that you want to allocate to the analytics.

  • analytics apply-template

    Applies the template. Make sure to copy the .csv files to the /data/input/ folder before you run this command.

  • analytics create-ui-config

    Creates the analytics_config.yml file, which is used later for various analytics jobs.

  • analytics apply-ui-config

    Applies the analytics_config.yml file.

  • analytics validate

    Runs data validation.

  • analytics ingest

    Ingests data into Autonomous Identity.

  • analytics audit

    Runs a data audit to ensure that the data meets the specifications.

  • analytics train

    Runs an analytics training run.

  • analytics predict-as-is

    Runs as-is predictions.

  • analytics predict-recommendation

    Runs recommendation predictions.

  • analytics publish

    Publishes the analytics results.

  • Run the analytics pipeline using one of the commands above.

    $ docker exec -it analytics bash analytics <command>
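Taken together, the stages above can be scripted. A dry-run sketch that prints each docker exec call in pipeline order; in practice you edit the generated configuration files between the template and ui-config stages before continuing.

```shell
# Prints each pipeline command in order; DRY_RUN=echo makes this a
# dry run so the sequence can be reviewed before executing.
DRY_RUN=echo

# Stages in the order listed in the command table above.
STAGES="create-template apply-template create-ui-config apply-ui-config \
validate ingest audit train predict-as-is predict-recommendation publish"

run_pipeline() {
  for stage in $STAGES; do
    $DRY_RUN docker exec -it analytics bash analytics "$stage"
  done
}

run_pipeline
```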