AWS Data Stores
Robust Intelligence supports loading data directly from AWS. Today, S3 is the only natively supported Cloud Store. Configure AWS Data Stores using the UI.
Configuring AWS Data Stores Integration
The AWS Data Store integration supports two authentication methods:
Access Key Based Authentication
To configure AWS Data Stores with access key based authentication, enter the following information:
Access Key ID
Secret Access Key
You can find more information by following this AWS Guide.
IAM Roles for Service Accounts Based Authentication
Integrations support authenticating with an AWS Role ARN, in the same way as when deploying the Agent.
To enable the integration using a Role ARN, you only need to associate the service account rime-agent-model-tester with the agents where you intend to use the integration. You can find more information by following this AWS Guide.
Note: You do not need to update the service account rime-agent-model-tester with the new Role ARN, as this is updated at runtime when the integration is used.
Using the AWS Integration
After configuring the integration, use it to register datasets and models. When registering datasets, the path field can point to files of type csv, Parquet, jsonl, jsonl.gz, and json.gz. In this case, data_type does not need to be specified and defaults to "DATA_TYPE_UNSPECIFIED".
# `project` and `dt` are assumed to be defined earlier in the session.
integration_id = "abe23******"  # Choose the AWS Integration ID
project.register_dataset(
    name=f"ref_data_{dt}",
    data_config={
        "connection_info": {
            "data_file": {
                "path": "s3://rime-blob-integration/data/fraud_ref.csv"  # Enter the S3 file path
            }
        },
        "data_params": {"label_col": "label"},
    },
    integration_id=integration_id,
)
model_id = project.register_model(
    name=f"model_{dt}",
    model_config={
        "model_path": {
            "path": "s3://rime-blob-integration/models/fraud_model.py"  # Enter the S3 file path
        }
    },
    integration_id=integration_id,
)
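Because only the listed file types are accepted for dataset paths, it can help to check a path locally before registering it. The following is a minimal sketch, not part of the Robust Intelligence SDK: the helper name is_supported_s3_file is hypothetical, the ".parquet" suffix is an assumed spelling of the Parquet extension, and the case-insensitive comparison is an assumption about how strict the check should be.

```python
from urllib.parse import urlparse

# Suffixes taken from the file types listed above; ".parquet" is an assumed
# spelling of the Parquet file extension.
SUPPORTED_SUFFIXES = (".csv", ".parquet", ".jsonl", ".jsonl.gz", ".json.gz")


def is_supported_s3_file(path: str) -> bool:
    """Return True if `path` is an s3:// URI that ends in a supported suffix.

    Hypothetical helper for illustration; it only inspects the string and
    never contacts AWS.
    """
    parsed = urlparse(path)
    # Require the s3 scheme and a non-empty bucket name.
    if parsed.scheme != "s3" or not parsed.netloc:
        return False
    # Compare case-insensitively (an assumption, not documented behavior).
    return parsed.path.lower().endswith(SUPPORTED_SUFFIXES)
```

A check like this only catches obvious mistakes such as a local path or an unsupported extension. It does not confirm that the object exists or that the integration has permission to read it.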
The AWS integration can also be used to register a Delta Lake table on S3. The path must point to the S3 folder where the Delta Lake table exists, and "data_type": "DATA_TYPE_DELTA_TABLE" must be specified.
integration_id = "abe23******"  # Choose the AWS Integration ID
project.register_dataset(
    name=f"ref_data_{dt}",
    data_config={
        "connection_info": {
            "data_file": {
                "path": "s3://rime-blob-integration/data/delta_table/fraud_ref",  # Enter the Delta table path on S3
                "data_type": "DATA_TYPE_DELTA_TABLE",
            }
        },
        "data_params": {"label_col": "label"},
    },
    integration_id=integration_id,
)
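The file and Delta table configurations differ only in the data_type key, so both can be produced by one small function. The sketch below is an illustration under that assumption: make_data_config and its delta_table parameter are hypothetical names, not part of the SDK, and the returned dictionary simply mirrors the structure used in the examples above.

```python
from typing import Any, Dict


def make_data_config(path: str, label_col: str, delta_table: bool = False) -> Dict[str, Any]:
    """Build a data_config dict shaped like the examples above.

    Hypothetical helper for illustration. For plain files, data_type is
    omitted so the documented default applies; for Delta tables it is set
    explicitly.
    """
    data_file: Dict[str, Any] = {"path": path}
    if delta_table:
        data_file["data_type"] = "DATA_TYPE_DELTA_TABLE"
    return {
        "connection_info": {"data_file": data_file},
        "data_params": {"label_col": label_col},
    }
```

The result could then be passed as the data_config argument of project.register_dataset, together with the integration_id, in the same way as the hand-written dictionaries.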
More information on defining the configurations is available in the Data Configuration section.