Server Maintenance

Autonomous Identity administrators must conduct various tasks to maintain the service for their users.

The following are basic server maintenance tasks that may occur:

Stopping and Starting

The following commands are for Linux distributions.

Stopping Docker

  1. Stop Docker. This shuts down all of the containers.

    $ sudo systemctl stop docker

Restarting Docker

  1. To restart Docker, first set it to start on boot using the enable command.

    $ sudo systemctl enable docker
  2. To start docker, run the start command.

    $ sudo systemctl start docker

Shutting Down Cassandra

  1. On the deployer node, SSH to the target node.

  2. Check Cassandra status.

    $ nodetool status

    Datacenter: datacenter1
    =======================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address      Load       Tokens       Owns (effective)  Host ID                               Rack
    UN  10.128.0.38  1.17 MiB   256          100.0%            d134e7f6-408e-43e5-bf8a-7adff055637a  rack1
  3. To stop Cassandra, find the process ID and run the kill command.

    $ pgrep -u autoid -f cassandra | xargs kill -9
  4. Check the status again.

    $ nodetool status

    nodetool: Failed to connect to '127.0.0.1:7199' - ConnectException: 'Connection refused (Connection refused)'.

Restarting Cassandra

  1. On the deployer node, SSH to the target node.

  2. Restart Cassandra. When you see the "No gossip backlog; proceeding" message, press Enter to continue.

    $ cassandra
    
    ...
    INFO  [main] 2020-11-10 17:22:49,306 Gossiper.java:1670 - Waiting for gossip to settle...
    INFO  [main] 2020-11-10 17:22:57,307 Gossiper.java:1701 - No gossip backlog; proceeding
  3. Check the status of Cassandra. You should see that it is in UN status ("Up" and "Normal").

    $ nodetool status
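
The status check in the last step can also be scripted. The following is a minimal sketch, not part of the product: the function name cassandra_all_un is hypothetical, and it assumes the nodetool status layout shown earlier, where each node line starts with a two-letter status and state code.

```shell
# Hypothetical helper: reads `nodetool status` output on stdin.
# Succeeds only if at least one node is listed and every node is "UN".
cassandra_all_un() {
  awk '
    # Node lines start with a status letter (U/D) and a state letter (N/L/J/M).
    $1 ~ /^[UD][NLJM]$/ { total++; if ($1 != "UN") bad++ }
    END { if (total > 0 && bad == 0) exit 0; exit 1 }
  '
}
```

For example, `nodetool status | cassandra_all_un && echo "all nodes are UN"` prints the message only when every listed node is up and normal.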

Shutting Down MongoDB

  1. Check the status of MongoDB.

    $ ps -ef | grep mongod
  2. Connect to the Mongo shell.

    $ mongo --tls --tlsCAFile /opt/autoid/mongo/certs/rootCA.pem --tlsCertificateKeyFile /opt/autoid/mongo/certs/mongodb.pem \
        --tlsAllowInvalidHostnames --host <ip-address>
    
    MongoDB shell version v4.2.9
    connecting to: mongodb://<ip-address>:27017/?compressors=disabled&gssapiServiceName=mongodb
    2020-10-08T18:46:23.285+0000 W  NETWORK  [js] The server certificate does not match the host name. Hostname: <ip-address> does not match CN: mongonode
    Implicit session: session { "id" : UUID("22c0123-30e3-4dc9-9d16-5ec310e1ew7b") }
    MongoDB server version: 4.2.9
  3. Switch to the admin database.

    > use admin
    
    switched to db admin
  4. Authenticate using the password set in the vault.yml file.

    > db.auth("root", "Welcome123")
    
    1
  5. Start the shutdown process.

    > db.shutdownServer()
    
    2020-10-08T18:47:06.396+0000 I  NETWORK  [js] DBClientConnection failed to receive message from <ip-address>:27017 - SocketException: short read
    server should be down...
    2020-10-08T18:47:06.399+0000 I  NETWORK  [js] trying reconnect to <ip-address>:27017 failed
    2020-10-08T18:47:06.399+0000 I  NETWORK  [js] reconnect <ip-address>:27017 failed
  6. Exit the mongo shell by entering quit() or pressing Ctrl-C.

    > quit()
  7. Check the status of MongoDB.

    $ ps -ef | grep mongod
    
    no instance of mongod found

Restarting MongoDB

  1. Restart the MongoDB service.

    $ /usr/bin/mongod --config /opt/autoid/mongo/mongo.conf
    
    about to fork child process, waiting until server is ready for connections.
    forked process: 31227
    child process started successfully, parent exiting
  2. Check the status of MongoDB.

    $ ps -ef | grep mongod
    
    autoid    9245     1  0 18:48 ?        00:00:45 /usr/bin/mongod --config /opt/autoid/mongo/mongo.conf
    autoid   22003  6037  0 21:12 pts/1    00:00:00 grep --color=auto mongod
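
Note that ps -ef | grep mongod always lists the grep command itself, as in the second line of the output above. A minimal sketch of a check that ignores that line follows. The function name mongod_listed is hypothetical.

```shell
# Hypothetical helper: reads `ps -ef` output on stdin.
# Succeeds if a mongod process is listed, ignoring the grep command itself.
mongod_listed() {
  awk '/mongod/ && !/grep/ { found = 1 } END { exit (found ? 0 : 1) }'
}
```

For example, `ps -ef | mongod_listed && echo "mongod is running"` prints the message only when a mongod process exists.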

Shutting Down Spark

  1. On the deployer node, SSH to the target node.

  2. Check the Spark status. You should see that it is up and running.

  3. Stop the Spark Master and workers.

    $ /opt/autoid/spark/spark-2.4.4-bin-hadoop2.7/sbin/stop-all.sh
    
    localhost: stopping org.apache.spark.deploy.worker.Worker
    stopping org.apache.spark.deploy.master.Master
  4. Check the Spark status again. You should see: Unable to retrieve http://localhost:8080: Connection refused.

Restarting Spark

  1. On the deployer node, SSH to the target node.

  2. Start the Spark Master and workers. Enter the user password on the target node when prompted.

    $ /opt/autoid/spark/spark-2.4.4-bin-hadoop2.7/sbin/start-all.sh
    
    starting org.apache.spark.deploy.master.Master, logging to /opt/autoid/spark/spark-2.4.4-bin-hadoop2.7/logs/spark-autoid-org.apache.spark.deploy.master.Master-1.out
    autoid-2 password:
    localhost: starting org.apache.spark.deploy.worker.Worker, logging to /opt/autoid/spark/spark-2.4.4-bin-hadoop2.7/logs/spark-autoid-org.apache.spark.deploy.worker.Worker-1.out
  3. Check the Spark status again. You should see that it is up and running.
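
The Spark procedures refer to localhost port 8080 for the status check. As one way to script that check, the sketch below assumes that curl is installed on the target node and that the Spark master web UI listens on http://localhost:8080. The function name spark_ui_up is hypothetical.

```shell
# Hypothetical helper: succeeds if the Spark master web UI answers.
# Assumes curl is installed; the URL defaults to http://localhost:8080.
spark_ui_up() {
  # -s: silent, -f: fail on HTTP errors, -o /dev/null: discard the page body.
  curl -sf -o /dev/null "${1:-http://localhost:8080}"
}
```

For example, `spark_ui_up && echo "Spark is up" || echo "Spark is down"` reports the state after a stop or start.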

Backing Up and Restoring

Autonomous Identity stores its entitlement analytics results, association rules, predictions, and confidence scores in the Apache Cassandra, MongoDB, and Open Distro for Elasticsearch databases. Cassandra is an open-source, NoSQL database system where data is distributed across multiple nodes in a master-less cluster. MongoDB is a popular schema-free database that uses JSON-like documents. Open Distro for Elasticsearch is a distributed search engine based on Apache Lucene.

For single-node deployments, you need to back up Cassandra or MongoDB on a regular basis, so that you can restore the database if the machine goes down for any reason.

To simplify the backup process, ForgeRock provides backup and restore scripts in the target directory.

Backing Up Cassandra

  1. On the ForgeRock Google Cloud Registry (gcr.io), download the cassandra-backup.sh script.

  2. Move the script to the Cassandra home directory on your deployment.

  3. Run the backup.

    $ ./cassandra-backup.sh \
        -d <Cassandra Database path> \
        -b <Backup folder path> \
        -u <Cassandra Username> \
        -p <Cassandra Password> \
        -s <SSL enable true/false> \
        -k <Keyspace (optional) default value: zoran>

Restoring Cassandra

  1. On the ForgeRock Google Cloud Registry (gcr.io), download the cassandra-restore.sh script.

  2. Move the script to the Cassandra home directory on your deployment.

  3. Run the restore.

    $ ./cassandra-restore.sh \
        -d <Cassandra Database path> \
        -b <Snapshot Backup tar file> \
        -f <Schema file> \
        -u <Cassandra Username> \
        -p <Cassandra Password> \
        -c <Cassandra commitlog path> \
        -i <Cassandra install path> \
        -s <SSL enable true/false> \
        -k <Keyspace (optional) default value: zoran>

Backing Up Assignment Index Data in Elasticsearch

  1. From the deployer node, SSH to the target node.

  2. Change to the /opt/autoid/elastic directory. The directory was configured during the ./deployer.sh run.

    $ cd /opt/autoid/elastic
  3. Run the backup.

    $ ./assignment-index-backup.sh
    
    Elastic Host: 10.128.0.52
    Elastic Server Status : 200
    Elastic server is up and running ...
    assignment index exists status : 200
     registerSnapshotStatus 200
    backup snapshot name with time stamp : assignment_snapshot_2020_10_07__19_31_53
     entitlement-assignment backup status : 200
    * entitlement-assignment backup successful *
  4. Make note of the snapshot name. For example, assignment_snapshot_2020_10_07__19_31_53.
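
Rather than copying the snapshot name by hand, you could capture it from the script output. This is a minimal sketch that assumes the "backup snapshot name with time stamp :" line keeps the format shown in the sample output above. The function name snapshot_name_from_output is hypothetical.

```shell
# Hypothetical helper: reads the backup script output on stdin and prints
# only the snapshot name. Assumes the label text shown in the sample output.
snapshot_name_from_output() {
  sed -n 's/^.*backup snapshot name with time stamp : *//p'
}
```

For example, `./assignment-index-backup.sh | tee backup.out | snapshot_name_from_output` keeps the full output in backup.out and prints the snapshot name.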

Restoring Assignment Index Data in Elasticsearch

  1. From the deployer node, SSH to the target node.

  2. Change to the /opt/autoid/elastic directory.

    $ cd /opt/autoid/elastic
  3. Run the restore using the snapshot taken from the previous procedure. When prompted if you want to close the existing index, enter Y. When prompted for the snapshot name, enter the name of the snapshot.

    $ ./assignment-index-restore.sh
    
    Elastic Host: 10.128.0.55
    Elastic Server Status : 200
    Elastic server is up and running ...
    assignment index exists status : 200
    index with alias name --> entitlement-assignment exists and is in open state...
    Do you want to close the existing index --> entitlement-assignment. (Required for restoring from snapshot ) (Y/N) ?
     y
    Restore snapshot ? true
     registerSnapshotStatus 200
    registering assignment_index_backup successful...
    proceeding with index restore...
    Enter the snapshot name to restore [snapshot_01]: assignment_snapshot_2020_10_07__19_31_53
    snapshot to restore --> assignment_snapshot_2020_10_07__19_31_53
    entitlement-assignment index restore status --> 200
    * entitlement-assignment restore successful *

Accessing Elasticsearch Index Data using Kibana

During the Autonomous Identity deployment, Open Distro for Elasticsearch (ODFE) is installed to facilitate the efficient searching of entitlement data within the system. A typical deployment may have millions of different entitlements and assignments that require fast search processing. ODFE provides that performance.

ODFE comes bundled with its visualization console, Kibana, which lets you monitor and manage your Elasticsearch data. Once you run the analytics create-assignment-index command that populates the Elasticsearch index, you can configure an SSH tunnel to access Kibana. This is particularly useful when you want to retrieve a list of your backup snapshots.

  1. Open a local terminal, and set up an SSH tunnel to your target node. The syntax is as follows:

    $ ssh -L <local-port>:<private-ip-remote>:<remote-port> -i <private-key> <user@public-ip-remote>

    For example:

    $ ssh -L 5601:10.128.0.71:5601 -i ~/.ssh/id_rsa autoid@34.70.190.144
    
    Last login: Fri Oct  9 20:10:59 2020
  2. Open a browser and point it to localhost:5601. Log in as elasticadmin. Enter the password that you set in the ~/autoid-config/vault.yml file on the deployer node during installation.

  3. On the Elasticsearch page, click Explore on my own.

  4. On the Elasticsearch Home page, click the menu in the top left corner, and click Dev Tools.

  5. On the Dev Tools page, get a total count of the documents in the index.

    GET /entitlement-assignment/_count
  6. On the Dev Tools page, search the index.

    GET /entitlement-assignment/_search
  7. On the Dev Tools page, get the list of snapshot backups.

    GET /_cat/snapshots/assignment_index_backup

Exporting and Importing Data

Export Your Data

If you are migrating data, for example, from a development server to a QA server, follow this section to export the data from your current deployment. Autonomous Identity provides a Python script that exports your data to .csv files and stores them in a folder in your home directory.

  1. On the target machine, change to the dbutils directory.

    $ cd /opt/autoid/dbutils
  2. Export the database.

    $ python dbutils.py export ~/backup

Import the Data into the Autonomous Identity Keyspace

If you are moving your data from another server, import your data to the target environment using the following steps.

  1. First, create a zoran_user.cql file. This file is used to drop and re-create the Autonomous Identity user and user_history tables. Place the file in the same directory as the exported .csv files. Make sure to create this file on the source node, for example, the development server, from which you exported the data.

    Start cqlsh in the source environment, and use the output of these commands to create the zoran_user.cql file:

    cqlsh> describe zoran.user;
    cqlsh> describe zoran.user_history;

    Make sure that the DROP TABLE CQL commands precede the CREATE TABLE commands, as shown in the zoran_user.cql example file below:

    USE zoran ;
    
    DROP TABLE IF EXISTS  zoran.user_history ;
    
    DROP TABLE IF EXISTS zoran.user ;
    
    CREATE TABLE zoran.user (
        user text PRIMARY KEY,
        chiefyesno text,
        city text,
        costcenter text,
        isactive text,
        jobcodename text,
        lineofbusiness text,
        lineofbusinesssubgroup text,
        managername text,
        usrdepartmentname text,
        userdisplayname text,
        usremptype text,
        usrmanagerkey text
    ) WITH bloom_filter_fp_chance = 0.01
        AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
        AND comment = ''
        AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
        AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
        AND crc_check_chance = 1.0
        AND dclocal_read_repair_chance = 0.1
        AND default_time_to_live = 0
        AND gc_grace_seconds = 864000
        AND max_index_interval = 2048
        AND memtable_flush_period_in_ms = 0
        AND min_index_interval = 128
        AND read_repair_chance = 0.0
        AND speculative_retry = '99PERCENTILE';
    
    CREATE TABLE zoran.user_history (
        user text,
        batch_id int,
        chiefyesno text,
        city text,
        costcenter text,
        isactive text,
        jobcodename text,
        lineofbusiness text,
        lineofbusinesssubgroup text,
        managername text,
        usrdepartmentname text,
        userdisplayname text,
        usremptype text,
        usrmanagerkey text,
        PRIMARY KEY (user, batch_id)
    ) WITH CLUSTERING ORDER BY (batch_id ASC)
        AND bloom_filter_fp_chance = 0.01
        AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
        AND comment = ''
        AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
        AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
        AND crc_check_chance = 1.0
        AND dclocal_read_repair_chance = 0.1
        AND default_time_to_live = 0
        AND gc_grace_seconds = 864000
        AND max_index_interval = 2048
        AND memtable_flush_period_in_ms = 0
        AND min_index_interval = 128
        AND read_repair_chance = 0.0
        AND speculative_retry = '99PERCENTILE';
  2. Copy the ui-config.json from the source environment where you ran an analytics pipeline, usually under /data/config, to the same folder where you have your .csv files.

  3. On the target machine, change to the dbutils directory.

    $ cd /opt/autoid/dbutils
  4. Use the dbutils.py import command to populate the Autonomous Identity keyspace with the .csv files generated by the export command in the source environment. Before importing the data, the script truncates the existing tables to remove duplicates. Make sure that the zoran_user.cql and ui-config.json files are in the /import-dir directory.

    $ python dbutils.py import /import-dir

    For example:

    $ python dbutils.py import ~/import/AutoID-data
  5. Verify that the data is imported in the directory on your server.
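
Because the import depends on zoran_user.cql, ui-config.json, and the exported .csv files all being in the import directory, a pre-flight check before running the import can catch a missing file early. The following is a minimal sketch. The function name check_import_dir is hypothetical and is not part of dbutils.py.

```shell
# Hypothetical helper: verifies that an import directory contains the files
# that the import steps above require. Returns non-zero if any are missing.
check_import_dir() {
  dir=$1
  rc=0
  for required in zoran_user.cql ui-config.json; do
    if [ ! -f "$dir/$required" ]; then
      echo "missing: $dir/$required" >&2
      rc=1
    fi
  done
  # At least one .csv file from the export step should be present.
  set -- "$dir"/*.csv
  if [ ! -f "$1" ]; then
    echo "no .csv files found in $dir" >&2
    rc=1
  fi
  return "$rc"
}
```

For example, `check_import_dir ~/import/AutoID-data && python dbutils.py import ~/import/AutoID-data` runs the import only when the directory looks complete.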

Accessing Log Files

Autonomous Identity provides different log files to monitor or troubleshoot your system.

Getting Docker Container Information

  1. On the target node, get system-wide information about the Docker deployment. The output shows the number of running, paused, and stopped containers, as well as other information about the deployment.

    $ docker info
  2. If you want to get debug information, use the -D option. The option specifies that all docker commands will output additional debug information.

    $ docker -D info
  3. Get information on all of your containers on your system.

    $ docker ps -a
  4. Get information on the docker images on your system.

    $ docker images
  5. Get docker service information on your system.

    $ docker service ls
  6. Get the Docker logs for a service.

    $ docker service logs <service-name>

    For example, to see the nginx service:

    $ docker service logs nginx_nginx

    Other useful arguments:

    • --details. Show extra details.

    • --follow, -f. Follow log output. The command will stream new output from STDOUT and STDERR.

    • --no-trunc. Do not truncate output.

    • --tail {n|all}. Show the number of lines from the end of log files, where n is the number of lines or all for all lines.

    • --timestamps, -t. Show timestamps.
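
The options listed above can be combined. As a small sketch, the hypothetical wrapper below (the name recent_service_logs is not a Docker command) shows the last lines of a service log with timestamps.

```shell
# Hypothetical helper: show the last N lines (default 100) of a service log
# with timestamps, using only the options listed above.
recent_service_logs() {
  docker service logs --timestamps --tail "${2:-100}" "$1"
}
```

For example, `recent_service_logs nginx_nginx 50` shows the last 50 timestamped lines of the nginx service log.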

Getting Cassandra Logs

Apache Cassandra starts writing its output log at startup. Autonomous Identity pipes the output to a log file in the /opt/autoid/ directory.

  1. On the target node, get the log file for the Cassandra install.

    $ cat /opt/autoid/cassandra/installcassandra.log
  2. Get startup information. Cassandra writes to cassandra.out at startup.

    $ cat /opt/autoid/cassandra.out
  3. Get the general Cassandra log file.

    $ cat /opt/autoid/apache-cassandra-3.11.2/logs/system.log

    By default, the log level is set to INFO. You can change the log level by editing the /opt/autoid/apache-cassandra-3.11.2/conf/logback.xml file. After any edits, the change will take effect immediately. No restart is necessary. The log levels from most to least verbose are as follows:

    • TRACE

    • DEBUG

    • INFO

    • WARN

    • ERROR

    • FATAL

  4. Get the JVM garbage collector logs.

    $ cat /opt/autoid/apache-cassandra-3.11.2/logs/gc.log.<number>.current

    For example:

    $ cat /opt/autoid/apache-cassandra-3.11.2/logs/gc.log.0.current

    Garbage collection logging is configured in the /opt/autoid/apache-cassandra-3.11.2/conf/cassandra-env.sh file. Add the following JVM properties to enable it:

    • JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"

    • JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"

    • JVM_OPTS="$JVM_OPTS -XX:+PrintHeapAtGC"

    • JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"

  5. Get the debug log.

    $ cat /opt/autoid/apache-cassandra-3.11.2/logs/debug.log
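
When a log file is large, a count of entries per log level gives a quick overview before you read it in full. The sketch below assumes that each log entry starts with the level name, as in the Cassandra startup messages shown earlier. The function name count_log_levels is hypothetical.

```shell
# Hypothetical helper: reads a Cassandra log on stdin and prints a count per
# log level, assuming each entry starts with the level name.
count_log_levels() {
  awk '
    $1 ~ /^(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)$/ { n[$1]++ }
    END { for (level in n) print level, n[level] }
  ' | sort
}
```

For example, `count_log_levels < /opt/autoid/apache-cassandra-3.11.2/logs/system.log` prints one line per level that occurs in the file. Continuation lines, such as stack traces, are not counted.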

Other Useful Cassandra Monitoring Tools and Files

Apache Cassandra has other useful monitoring tools that you can use to observe or diagnose an issue. To see the complete list of options, see the Apache Cassandra documentation.

  1. View statistics for a cluster, such as IP address, load, and number of tokens.

    $ /opt/autoid/apache-cassandra-3.11.2/bin/nodetool status
  2. View statistics for a node, such as uptime, load, key cache hit rate, and other information.

    $ /opt/autoid/apache-cassandra-3.11.2/bin/nodetool info
  3. View the Cassandra configuration file to determine how properties are pre-set.

    $ cat /opt/autoid/apache-cassandra-3.11.2/conf/cassandra.yaml

Apache Spark Logs

Apache Spark provides several ways to monitor the server after an analytics run.

  1. To get an overall status of the Spark server, point your browser to http://<spark-master-ip>:8080.

  2. Print the logging message sent to the output file during an analytics run.

    $ cat /opt/autoid/spark/spark-2.4.4-bin-hadoop2.7/logs/<file-name>

    For example:

    $ cat /opt/autoid/spark/spark-2.4.4-bin-hadoop2.7/logs/spark-org.apache.spark.deploy.master.Master-1-autonomous-id-test.out
  3. Print the data logs that were written during an analytics run.

    $ cat /data/log/files/<filename>

    For example:

    $ cat /data/log/files/f6c0870e-5782-441e-b145-b0e662f05f79.log