AWS Data Stores
Robust Intelligence supports loading data directly from AWS. Today, S3 is the only natively supported Cloud Store. Configure AWS Data Stores using the UI.
Configuring AWS Data Store Integration
To integrate Robust Intelligence with an AWS Data Store, you must configure one of the following authentication methods:
Access Key-Based Authentication
IAM Role-Based Authentication
See the relevant steps for your approach below.
Access Key-Based Authentication
To configure an AWS Data Store that Robust Intelligence connects to using access key-based authentication, follow the steps below. This procedure provides the Access Key ID and Secret Access Key that you need when you create the Robust Intelligence-AWS integration.
You can learn more about access key-based authentication in the AWS document, AWS Account and Access Keys.
Procedure:
Create or find the AWS bucket that your Robust Intelligence instance will use. This will serve as the agent’s default data source. See the AWS document, Creating a bucket.
Create a policy to get and list objects for the designated S3 bucket. This will be similar to the policy shown here:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::<YOUR_BUCKET>"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::<YOUR_BUCKET>/*"
    }
  ]
}
Add the policy to the relevant user group. See the AWS document, Attaching a policy to an IAM user group.
Note: If you do not want to create a new user, proceed to Step 5 (getting the access key).
If needed, create users and add them to this user group. See the AWS document, Adding and removing users.
Get the access key as explained in the AWS document, Get your access key ID and secret access key.
Once you have the access key:
If you did not create a new user, then copy the Access Key ID and Secret Access Key. You’ll need these when you configure the integration in the next step.
If you did create a new user, then you must create the access key for the new user. See the section, “To create an access key” in the AWS document, Managing access keys.
Add your AWS integration through the Robust Intelligence UI: Workspace: Integrations: Add.
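The policy from the procedure above can also be generated rather than edited by hand. The following is a minimal sketch, not part of the Robust Intelligence SDK: the function name build_read_only_policy and the bucket name my-example-bucket are hypothetical, and the output mirrors the list-and-get policy shown earlier.

```python
import json


def build_read_only_policy(bucket: str) -> dict:
    """Return an IAM policy document that allows listing one bucket
    and reading the objects inside it."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {
                # GetObject applies to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    # "my-example-bucket" is a placeholder; substitute your own bucket name.
    print(json.dumps(build_read_only_policy("my-example-bucket"), indent=2))
```

The printed JSON can be pasted into the policy editor when you create the policy for the designated bucket.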
IAM Role-Based Authentication
In order to configure an AWS Data Store that Robust Intelligence will connect to using IAM role-based authentication, follow the steps below. This procedure will provide the AWS Role ARN that you will need when you create the Robust Intelligence-AWS integration. Learn more about IAM role-based access in the AWS document, Configuring a Kubernetes service account to assume an IAM role.
Note: Integrations authenticate with an AWS Role ARN in the same way as when deploying the Robust Intelligence Agent.
Procedure:
Create or find the AWS bucket that your Robust Intelligence instance will use. For information, see the AWS document, Creating a bucket.
If one is not already configured for your cluster, create an IAM OpenID Connect (OIDC) provider that enables your cluster to use IAM roles for service accounts. See the AWS document, Creating an IAM OIDC provider for your cluster.
For the designated S3 bucket, create an access policy that allows listing and reading objects:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::<YOUR_BUCKET>"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::<YOUR_BUCKET>/*"
    }
  ]
}
Create an IAM role that the two K8s service accounts (see top of page) can assume, and associate it with the policy created in the previous step.
Save the ARN of this role. You will need to specify it during the Robust Intelligence integration setup step.
Create a trust relationship to allow the two K8s service accounts to assume the IAM role. See the AWS document, Configuring a K8s service account to assume an IAM role.
Your trust relationship should have a definition similar to the template shown below. Note the two K8s service accounts in the subject field.
Note: You do not need to update the service accounts rime-agent-model-tester and rime-agent-rime-cross-plane-server with the new Role ARN, as this will be populated at runtime when the integration is used.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::111122223333:oidc-provider/oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:sub": [
            "system:serviceaccount:$AGENT_NAMESPACE:rime-agent-rime-cross-plane-server",
            "system:serviceaccount:$AGENT_NAMESPACE:rime-agent-model-tester"
          ]
        }
      }
    }
  ]
}
See also the AWS document, Configuring role and service account.
Add your AWS integration through the Robust Intelligence UI: Workspace: Integrations: Add.
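The trust relationship from the procedure above has three values that change per deployment: the AWS account ID, the OIDC provider, and the agent namespace. The sketch below fills those into the same template. It is an illustration only: the function name build_trust_policy and its parameter names are hypothetical, and the sample values are the placeholders used in the template, not real identifiers.

```python
import json

# The two K8s service accounts named in the trust relationship template.
SERVICE_ACCOUNTS = [
    "rime-agent-rime-cross-plane-server",
    "rime-agent-model-tester",
]


def build_trust_policy(account_id: str, oidc_provider: str, agent_namespace: str) -> dict:
    """Return a trust policy that lets both agent service accounts
    assume the role through the cluster's OIDC provider."""
    subjects = [
        f"system:serviceaccount:{agent_namespace}:{name}"
        for name in SERVICE_ACCOUNTS
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {f"{oidc_provider}:sub": subjects}
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values copied from the template; replace with your own.
    policy = build_trust_policy(
        account_id="111122223333",
        oidc_provider="oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE",
        agent_namespace="my-agent-namespace",  # hypothetical namespace
    )
    print(json.dumps(policy, indent=2))
```

Generating the document this way keeps the OIDC provider string identical in the Principal and in the condition key, which is easy to get wrong when editing the JSON by hand.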
Using the AWS Integration
After configuring the integration, use it to register datasets and models. When registering datasets, the path field can point to files of type csv, Parquet, jsonl, jsonl.gz, and json.gz. In this case, data_type does not need to be specified and defaults to "DATA_TYPE_UNSPECIFIED".
integration_id = "abe23******"  # Choose the AWS Integration ID

project.register_dataset(
    name=f"ref_data_{dt}",
    data_config={
        "connection_info": {
            "data_file": {
                "path": "s3://rime-blob-integration/data/fraud_ref.csv"  # Enter the S3 file path
            }
        },
        "data_params": {"label_col": "label"},
    },
    integration_id=integration_id,
)

model_id = project.register_model(
    name=f"model_{dt}",
    model_config={
        "model_path": {
            "path": "s3://rime-blob-integration/models/fraud_model.py"  # Enter the S3 file path
        }
    },
    integration_id=integration_id,
)
The AWS integration can also be used to register a Delta Lake table on S3. The path must point to the S3 folder where the Delta Lake table exists, and "data_type": "DATA_TYPE_DELTA_TABLE" must be specified.
integration_id = "abe23******"  # Choose the AWS Integration ID

project.register_dataset(
    name=f"ref_data_{dt}",
    data_config={
        "connection_info": {
            "data_file": {
                "path": "s3://rime-blob-integration/data/delta_table/fraud_ref",  # Enter the Delta table path on S3
                "data_type": "DATA_TYPE_DELTA_TABLE",
            }
        },
        "data_params": {"label_col": "label"},
    },
    integration_id=integration_id,
)
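The two dataset registrations above differ only in whether data_type is set. A small helper can make that difference explicit. This is a sketch under assumptions: build_data_config and its is_delta_table flag are hypothetical names and are not part of the Robust Intelligence SDK. The dictionary it returns has the same shape as the data_config values shown above.

```python
def build_data_config(path: str, label_col: str, is_delta_table: bool = False) -> dict:
    """Build a data_config dictionary for a dataset stored on S3.

    For ordinary files, data_type is left out so that it falls back to
    the default. For a Delta Lake table, data_type is set explicitly.
    """
    if not path.startswith("s3://"):
        raise ValueError(f"expected an s3:// path, got: {path!r}")

    data_file = {"path": path}
    if is_delta_table:
        data_file["data_type"] = "DATA_TYPE_DELTA_TABLE"

    return {
        "connection_info": {"data_file": data_file},
        "data_params": {"label_col": label_col},
    }


if __name__ == "__main__":
    # Paths reuse the examples from this page.
    print(build_data_config("s3://rime-blob-integration/data/fraud_ref.csv", "label"))
    print(
        build_data_config(
            "s3://rime-blob-integration/data/delta_table/fraud_ref",
            "label",
            is_delta_table=True,
        )
    )
```

The result can be passed as the data_config argument of project.register_dataset, together with the name and integration_id arguments used in the examples above.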
More information on defining the configurations is available in the Data Configuration section.