Exporting and Importing Data

Export Your Data

If you are migrating data, for example, from a development server to a QA server, follow this section to export the data from your current deployment. Autonomous Identity provides a Python script that exports your data to .csv files and stores them in a folder in your home directory.

  1. On the source machine (the deployment you are exporting from), change to the dbutils directory.

    $ cd /opt/autoid/dbutils

  2. Export the database.

    $ python dbutils.py export ~/backup
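After the export finishes, it can help to confirm that the backup folder actually contains .csv files before you copy it to another server. The following is a minimal sketch and is not part of Autonomous Identity: the summarize_export helper is hypothetical, and it assumes only that the export writes its .csv files directly into the folder passed to the export command (~/backup in the example above).

```python
import csv
from pathlib import Path


def summarize_export(export_dir):
    """Return {file name: record count} for each .csv file in export_dir.

    Hypothetical helper. The count includes every CSV record in the file,
    so a header row, if the export writes one, is counted as well.
    """
    folder = Path(export_dir).expanduser()
    summary = {}
    # A missing folder simply yields no matches, so the result is empty.
    for csv_path in sorted(folder.glob("*.csv")):
        with csv_path.open(newline="") as handle:
            summary[csv_path.name] = sum(1 for _ in csv.reader(handle))
    return summary


if __name__ == "__main__":
    # Assumed location: the folder used in the export example.
    for name, records in summarize_export("~/backup").items():
        print(f"{name}: {records} records")
```

An empty result suggests that the export did not write to the folder you expected.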

Import the Data into the Autonomous Identity Keyspace

If you are moving your data from another server, import your data to the target environment using the following steps.

  1. Create a zoran_user.cql file. This file is used to drop and re-create the Autonomous Identity user and user_history tables. Place the file in the same directory as the exported .csv files. Create the file on the source node (for example, the development server) from which you exported the data.

    Start cqlsh in the source environment, and use the output of these commands to create the zoran_user.cql file:

    cqlsh> DESCRIBE zoran.user;
    cqlsh> DESCRIBE zoran.user_history;

    Make sure that the DROP TABLE commands precede the CREATE TABLE commands, as shown in the following example zoran_user.cql file:

    USE zoran;

    DROP TABLE IF EXISTS zoran.user_history;

    DROP TABLE IF EXISTS zoran.user;
    
    CREATE TABLE zoran.user (
        user text PRIMARY KEY,
        chiefyesno text,
        city text,
        costcenter text,
        isactive text,
        jobcodename text,
        lineofbusiness text,
        lineofbusinesssubgroup text,
        managername text,
        usrdepartmentname text,
        userdisplayname text,
        usremptype text,
        usrmanagerkey text
    ) WITH bloom_filter_fp_chance = 0.01
        AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
        AND comment = ''
        AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
        AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
        AND crc_check_chance = 1.0
        AND dclocal_read_repair_chance = 0.1
        AND default_time_to_live = 0
        AND gc_grace_seconds = 864000
        AND max_index_interval = 2048
        AND memtable_flush_period_in_ms = 0
        AND min_index_interval = 128
        AND read_repair_chance = 0.0
        AND speculative_retry = '99PERCENTILE';
    
    CREATE TABLE zoran.user_history (
        user text,
        batch_id int,
        chiefyesno text,
        city text,
        costcenter text,
        isactive text,
        jobcodename text,
        lineofbusiness text,
        lineofbusinesssubgroup text,
        managername text,
        usrdepartmentname text,
        userdisplayname text,
        usremptype text,
        usrmanagerkey text,
        PRIMARY KEY (user, batch_id)
    ) WITH CLUSTERING ORDER BY (batch_id ASC)
        AND bloom_filter_fp_chance = 0.01
        AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
        AND comment = ''
        AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
        AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
        AND crc_check_chance = 1.0
        AND dclocal_read_repair_chance = 0.1
        AND default_time_to_live = 0
        AND gc_grace_seconds = 864000
        AND max_index_interval = 2048
        AND memtable_flush_period_in_ms = 0
        AND min_index_interval = 128
        AND read_repair_chance = 0.0
        AND speculative_retry = '99PERCENTILE';

  2. Copy the ui-config.json file from the source environment where you ran an analytics pipeline, usually under /data/config, to the same folder where you have your .csv files.

  3. On the target machine, change to the dbutils directory.

    $ cd /opt/autoid/dbutils

  4. Use the dbutils.py import command to populate the Autonomous Identity keyspace with the .csv files generated by the export command in the source environment. Before importing the data, the script truncates the existing tables to prevent duplicates. Make sure that the zoran_user.cql and ui-config.json files are in /import-dir.

    $ python dbutils.py import /import-dir

    For example:

    $ python dbutils.py import ~/import/AutoID-data

  5. Verify that the data was imported into the Autonomous Identity keyspace on the target server.
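Because the import truncates the existing tables before loading, you may want to check the import directory before you run the command. The following is a minimal sketch and is not part of dbutils.py: the check_import_dir helper is hypothetical, and it encodes only the requirements stated in the steps above (zoran_user.cql and ui-config.json sit alongside the .csv files, and the DROP TABLE commands precede the CREATE TABLE commands). It inspects files only and never connects to the database.

```python
import re
from pathlib import Path

# File names taken from the import steps above.
REQUIRED_FILES = ("zoran_user.cql", "ui-config.json")


def check_import_dir(import_dir):
    """Return a list of problems found in import_dir (empty means OK).

    Hypothetical pre-flight check for the import directory.
    """
    folder = Path(import_dir).expanduser()
    if not folder.is_dir():
        return [f"{folder} is not a directory"]

    problems = []
    for name in REQUIRED_FILES:
        if not (folder / name).is_file():
            problems.append(f"missing {name}")
    if not any(folder.glob("*.csv")):
        problems.append("no .csv files found")

    cql_path = folder / "zoran_user.cql"
    if cql_path.is_file():
        text = cql_path.read_text()
        # Record where each statement type starts so the order can be compared.
        drops = [m.start() for m in re.finditer(r"DROP\s+TABLE", text, re.I)]
        creates = [m.start() for m in re.finditer(r"CREATE\s+TABLE", text, re.I)]
        if not drops:
            problems.append("zoran_user.cql has no DROP TABLE statement")
        if not creates:
            problems.append("zoran_user.cql has no CREATE TABLE statement")
        if drops and creates and max(drops) > min(creates):
            problems.append(
                "a DROP TABLE statement follows a CREATE TABLE statement"
            )
    return problems


if __name__ == "__main__":
    # Assumed location: the directory used in the import example.
    for problem in check_import_dir("~/import/AutoID-data"):
        print(problem)
```

If the helper prints nothing, the directory meets the requirements listed in the steps. It does not validate the contents of the .csv files or of ui-config.json.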