AWS Data Stores

Robust Intelligence supports loading data directly from AWS. Today, S3 is the only natively supported Cloud Store. Configure AWS Data Stores using the UI.

Configuring AWS Data Stores Integration

The AWS Data Store integration supports two authentication methods:

Access Key Based Authentication

To configure AWS Data Stores with access key based authentication, enter the following information:

  1. Access Key ID

  2. Secret Access Key

You can find more information by following this AWS Guide.
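
Before entering the key pair in the UI, it can help to confirm that it actually has read access to the bucket that holds your data. The following is a minimal sanity check using boto3; it is not part of the Robust Intelligence SDK, and the bucket name is illustrative.

import boto3

# Illustrative sanity check (not part of the product): confirm the access key
# pair can list objects in the bucket where your datasets and models live.
s3 = boto3.client(
    "s3",
    aws_access_key_id="<ACCESS_KEY_ID>",
    aws_secret_access_key="<SECRET_ACCESS_KEY>",
)
response = s3.list_objects_v2(Bucket="rime-blob-integration", MaxKeys=5)
for obj in response.get("Contents", []):
    print(obj["Key"])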

IAM Roles for Service Accounts Based Authentication

Integrations support authenticating with an AWS Role ARN in the same way as when deploying the agent.

To enable the integration using a Role ARN, associate the service account rime-agent-model-tester with the agents where you intend to use the integration. You can find more information by following this AWS Guide.

Note: You do not need to update the service account rime-agent-model-tester with the new Role ARN; it is updated at runtime when the integration is used.
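
As a quick check that the role is assumed correctly, you can query STS from a pod running with the rime-agent-model-tester service account. This is a generic boto3 sketch, not part of the Robust Intelligence SDK.

import boto3

# Illustrative check (not part of the product): when run from a pod that uses
# the rime-agent-model-tester service account, the caller identity should be
# the role ARN configured for the integration.
sts = boto3.client("sts")
identity = sts.get_caller_identity()
print(identity["Arn"])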

Using the AWS Integration

After configuring the integration, use it to register datasets and models. When registering datasets, the path field can point to files of type csv, Parquet, jsonl, jsonl.gz, and json.gz. In this case, data_type does not need to be specified and defaults to "DATA_TYPE_UNSPECIFIED".

integration_id = "abe23******"  # Choose the AWS integration ID

project.register_dataset(
    name=f"ref_data_{dt}",
    data_config={
        "connection_info": {
            "data_file": {
                "path": "s3://rime-blob-integration/data/fraud_ref.csv"  # Enter the S3 file path
            }
        },
        "data_params": {"label_col": "label"},
    },
    integration_id=integration_id,
)

model_id = project.register_model(
    name=f"model_{dt}",
    model_config={
        "model_path": {
            "path": "s3://rime-blob-integration/models/fraud_model.py"  # Enter the S3 file path
        }
    },
    integration_id=integration_id,
)
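
An evaluation dataset can be registered with the same call; the name and path below are illustrative.

# Illustrative: registering an evaluation dataset follows the same pattern.
project.register_dataset(
    name=f"eval_data_{dt}",
    data_config={
        "connection_info": {
            "data_file": {
                "path": "s3://rime-blob-integration/data/fraud_eval.csv"  # Enter the S3 file path
            }
        },
        "data_params": {"label_col": "label"},
    },
    integration_id=integration_id,
)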

The AWS integration can also be used to register a Delta Lake table on S3. The path must point to the S3 folder containing the Delta Lake table, and "data_type": "DATA_TYPE_DELTA_TABLE" must be specified.

integration_id = "abe23******"  # Choose the AWS integration ID

project.register_dataset(
    name=f"ref_data_{dt}",
    data_config={
        "connection_info": {
            "data_file": {
                "path": "s3://rime-blob-integration/data/delta_table/fraud_ref",  # Enter the Delta table path on S3
                "data_type": "DATA_TYPE_DELTA_TABLE"
            }
        },
        "data_params": {"label_col": "label"},
    },
    integration_id=integration_id,
)

More information on defining the configurations is available in the Data Configuration section.