Configuring integrations

Administrators can configure integrations for a RIME instance. Integrations can provide data or compute resources to RIME.

Sensitive information, such as access tokens, is stored in a secret manager service built on HashiCorp Vault.

All configured integrations can be managed from the Integrations tab in Workspace settings.

Supported integrations

RIME supports integrations with the following vendors:

  • Google Cloud Storage

  • Databricks Deltalake

  • Amazon S3

Configurable through the RIME web UI

The Databricks Deltalake integration can be configured directly from the web UI of a RIME instance.

These integrations can only be configured when deploying a RIME instance:

  • Amazon S3

  • Google Cloud Storage

Adding a data integration through the RIME web UI

An administrator can add or modify a data integration that the RIME instance can use.

  1. Sign in to a RIME instance.

    The Workspaces page appears.

  2. Click the three-dot menu at the right side of a workspace and select Manage Workspace.

    Alternatively, select Workspace Settings from a workspace page. The Workspace Settings page appears.

  3. Click Integrations in the left navigation bar.

    The Integrations configuration pane appears.

  4. Click Add Integration.

    The Add Integration dialog box appears.

  5. In Name, type a name for the integration.

  6. From the Integration Type drop-down, choose a type.

    Supported types are S3, GCS, Databricks, and Custom.

  7. (Adding a Databricks integration) Type the following information in the corresponding fields.

    • Name

    • Server hostname

    • HTTP Path

    • Databricks Access Token

  8. (Adding a Custom integration) Type a name for the integration in Name.

  9. (Adding a Custom integration) Type the configuration information for the custom integration as key/value pairs and select a sensitivity for each pair.

    Sensitivity levels are Not sensitive, Workspace level, and Members level.

  10. (Optional) Click Add Variable to add specific environment variables as key/value pairs.

    A set of Key and Value fields appears with a Sensitivity drop-down selector between them.

  11. (Adding a variable) In Key and Value, type a key and the value for that key.

  12. (Adding a variable) From the Sensitivity drop-down, select a sensitivity level for the key/value pair.

    Sensitivity levels are Not sensitive, Workspace level, and Members level.

  13. Click Save Integration.

The new integration is now available to the RI Platform.

Custom integrations in the Python SDK

RIME can load data from arbitrary sources defined in a Python file for use in a Firewall or stress test. To use a custom integration, pass a custom data dictionary to either the Client.start_stress_test() or the Firewall.start_continuous_test() function. The custom data dictionary must specify:

  • A file path to a Python file

  • The name of a function in that Python file

  • Arguments to that function, if any

The Python file specified in the custom data dictionary must supply any credentials required to access the location where the data is stored. The data loading function must return a Pandas DataFrame.
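A minimal sketch of the loader contract follows. The function name, parameters, and columns below are illustrative placeholders, not part of the RIME SDK; only the contract matters: RIME imports the specified file, calls the named function with the keyword arguments from loader_kwargs_json, and expects a Pandas DataFrame in return.

# Minimal loader sketch. All names here are hypothetical; only the
# contract (kwargs in, DataFrame out) is fixed by RIME.
import pandas as pd

def load_data(start_time: int, end_time: int) -> pd.DataFrame:
    # Any credentials or client setup needed to reach the data source
    # belongs here; RIME only invokes this function.
    return pd.DataFrame({"timestamp": [start_time, end_time], "temperature": [70.1, 68.4]})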

Example stress test with a custom integration

The following code specifies a custom data dictionary and the Python SDK call that starts the stress test.

custom_config = {
    "run_name": "Weather Predictor Test",
    "data_info": {
        "type": "split",
        "ref_data_info": {
            "type": "custom",
            "load_path": "s3://rime-datasets/custom-loader-weather/custom_weather_loader.py",
            "load_func_name": "get_weather_data",
            "loader_kwargs_json": '{"start_time": 0, "end_time": 1758608114}',
        },
        "eval_data_info": {
            "type": "custom",
            "load_path": "s3://rime-datasets/custom-loader-weather/custom_weather_loader.py",
            "load_func_name": "get_weather_data",
            "loader_kwargs_json": '{"start_time": 0, "end_time": 1758608114}',
        },
    }
}

job = client.start_stress_test(
    test_run_config=custom_config,
)
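The returned job object can be polled for completion. A hedged usage sketch, assuming the rime_sdk job object exposes get_status with these parameters:

# Assumed rime_sdk Job API: block until the stress test finishes,
# printing status updates along the way.
job.get_status(verbose=True, wait_until_finish=True)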

The custom_weather_loader.py code specifies the actual data loading logic to use. This example parses data from an Amazon S3 bucket, but a custom data loader can be written to handle data from any source.

"""Data loader file for weather file."""
import boto3
from datetime import datetime
import pandas as pd
​
BUCKET_NAME = "rime-datasets"
ACCESS_KEY = "*access key*"
SECRET_ACCESS_KEY = "*secret key*"
​
​
def get_weather_data(start_time: int, end_time: int) -> pd.DataFrame:
    start_time = datetime.fromtimestamp(start_time)
    end_time = datetime.fromtimestamp(end_time)
​
    master_df = pd.DataFrame()
    s3 = boto3.resource('s3', aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_ACCESS_KEY)
    my_bucket = s3.Bucket(BUCKET_NAME)
    for object_summary in my_bucket.objects.filter(Prefix="custom-loader-weather/"):
        if ".csv" in object_summary.key:
            date_str = object_summary.key.split("/")[1].replace(".csv", "")
            date_str = date_str.replace("day_", "")
            file_time = datetime.strptime(date_str, "%Y-%m-%d")
            if start_time <= file_time <= end_time:
                obj = s3.Object(BUCKET_NAME, object_summary.key)
                curr_df = pd.read_csv(obj.get()["Body"])
                master_df = pd.concat([master_df, curr_df], ignore_index=True)
    return master_df
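Before pointing a test at the loader, it can be worth checking locally that the function returns a DataFrame for the intended window. An illustrative check, not part of the RIME workflow:

# Run the loader directly with the same arguments that appear in
# "loader_kwargs_json" and confirm the output type.
import pandas as pd
from custom_weather_loader import get_weather_data

df = get_weather_data(start_time=0, end_time=1758608114)
assert isinstance(df, pd.DataFrame)
print(df.shape)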

Example continuous test with a custom integration

Continuous tests rely on an established reference data source and do not need to specify one in the data dictionary. The contents of custom_weather_loader.py do not change, but the custom data dictionary is different.

incremental_config = {
    "eval_data_info": {
        "type": "custom",
        "load_path": "s3://rime-datasets/custom-loader-weather/custom_weather_loader.py",
        "load_func_name": "get_weather_data",
        "loader_kwargs_json": '{"start_time": 0, "end_time": 1758608114}',
    }
}
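As with the stress test, the config is then passed to the SDK call. A hedged sketch, assuming a firewall object previously obtained from the client:

# Start a continuous test with the incremental config; the "firewall"
# object is assumed to come from an earlier rime_sdk call.
job = firewall.start_continuous_test(
    test_run_config=incremental_config,
)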