Scheduling Continuous Tests

Schedules for RIME AI Firewall Continuous Tests can be configured through the SDK or the UI.

See the Scheduled CT How-To Guide for background.

Arguments

  • location_type: string, required

    Type of location that data is housed in. Can be “data_collector”, “delta_lake” or “custom_loader”.

  • location_info: Dict or null, default = null, required for some location types

    Dict containing arguments needed to access the location provided in the location_type field. While some types don’t require extra arguments, others do.

  • data_params: Dict or null, default = null, required for some location types

    Dict containing arguments needed to process the data once it is loaded from the location, e.g., timestamp_col.

  • reference_set_window: Tuple[datetime, datetime] or null, default = null

    Time range (start, end) to use as the reference set for scheduled runs.

  • rolling_window_duration: timedelta or null, default = null

    The length of the rolling window to use as a reference set in scheduled runs.
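
The argument types listed above can be checked locally before a schedule is activated. The following is a minimal sketch, not part of the SDK: check_schedule_args is a hypothetical helper, and the rules that a window must start before it ends and that a rolling window must be positive are assumptions rather than documented constraints.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

# The three location types named in the argument list above.
ALLOWED_LOCATION_TYPES = {"data_collector", "delta_lake", "custom_loader"}


def check_schedule_args(
    location_type: str,
    reference_set_window: Optional[Tuple[datetime, datetime]] = None,
    rolling_window_duration: Optional[timedelta] = None,
) -> None:
    """Raise ValueError if the arguments look malformed (hypothetical helper)."""
    if location_type not in ALLOWED_LOCATION_TYPES:
        raise ValueError(f"unknown location_type: {location_type!r}")
    if reference_set_window is not None:
        start, end = reference_set_window
        # Assumption: the window start should precede its end.
        if start >= end:
            raise ValueError("reference_set_window start must be before end")
    if rolling_window_duration is not None:
        # Assumption: a rolling window should have a positive length.
        if rolling_window_duration <= timedelta(0):
            raise ValueError("rolling_window_duration must be positive")
```

Calling the helper just before activate_ct_schedule would surface a typo in location_type or a reversed window locally instead of at schedule time.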

Templates for Data Location Types

Data Collector

Data Collector does not take a location_info argument, since the service is internal.

data_params_dict = {}
# Arguments for activating a CT
# Schedule with the Data Collector
firewall.activate_ct_schedule(
    location_type="data_collector",
    data_params=data_params_dict
)

Delta Lake

Delta Lake requires a location_info argument, since the service is external. One of these arguments is a data source name. See our data source guide to configure authentication. You can also specify parameters like pred_col.

location_info_dict = {
  "time_col": "", # Name of the timestamp column in your delta lake table
  "table_name": "", # Delta lake table name
  "data_source_name": "" # Data Source name configured from UI
}

data_params_dict = {}
# Arguments for activating a CT
# Schedule with Databricks Delta Lake
firewall.activate_ct_schedule(
    location_type="delta_lake", 
    location_info=location_info_dict, 
    data_params=data_params_dict
)

Custom Loader

Our custom loader integration is designed to work with any location. It requires a location_info argument defining the path to load from, the function that does the loading, and any extra parameters. Your load function must accept start_time and end_time as parameters. You will also need to specify a data_params dict with the timestamp_col parameter. You can also specify parameters like pred_col.

location_info_dict = {
  "load_path": "", # the path to the file where you define how to load in your data
  "load_func_name": "", # the function name in the file where you load your data. This must accept parameters start_time, end_time
  "loader_kwargs_json": "",  # any additional parameters and values needed for your function
}
data_params_dict = {
  "timestamp_col": "" # Required for custom loader
}
# Arguments for activating a CT
# Schedule with Custom Loader
firewall.activate_ct_schedule(
    location_type="custom_loader", 
    location_info=location_info_dict, 
    data_params=data_params_dict
)
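
As an illustration of the custom loader contract described above, here is a minimal sketch of a file that load_path could point to. Everything beyond the start_time and end_time parameters is an assumption: the names load_rows and filter_rows, the CSV source, the half-open time range, and the list-of-dicts return value are hypothetical, and the return type the platform actually expects is not stated here. Encoding loader_kwargs_json with json.dumps is likewise assumed from the key name.

```python
import csv
import json
from datetime import datetime
from typing import Dict, List


def filter_rows(rows: List[Dict[str, str]], start_time: datetime,
                end_time: datetime, timestamp_col: str = "timestamp") -> List[Dict[str, str]]:
    """Keep rows whose ISO-format timestamp falls in [start_time, end_time)."""
    return [
        row for row in rows
        if start_time <= datetime.fromisoformat(row[timestamp_col]) < end_time
    ]


def load_rows(start_time: datetime, end_time: datetime,
              path: str = "events.csv", timestamp_col: str = "timestamp") -> List[Dict[str, str]]:
    """Hypothetical load function: read a CSV file and return rows in the time range."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    return filter_rows(rows, start_time, end_time, timestamp_col)


# Extra keyword arguments for the load function, serialized as a JSON string
# (assumed format for the "loader_kwargs_json" entry).
loader_kwargs_json = json.dumps({"path": "events.csv", "timestamp_col": "timestamp"})
```

With this file, load_func_name would be "load_rows", and the timestamp_col entry in data_params would name the same column the loader filters on.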

Templates for Configuring Data Params

The dictionary below outlines all the keys you can specify for data params. If these are not provided, the defaults are taken from the reference set. For more info about the types see our General Tabular Parameters Section.

data_params = {
  "label_col": "",
  "pred_col": "",
  "nrows": None,  # replace with an integer
  "categorical_features": [],
  "protected_features": [],
  "features_not_in_model": [],
  "loading_kwargs": "",
  "feature_type_path": "",
  "pred_path": "",
  "ranking_info": {
    "query_col": "",
    "nqueries": None,  # replace with an integer
    "nrows_per_query": None,  # replace with an integer
    "drop_query_id": True
  },
  "embeddings": {
    "name": "",
    "cols": [],
  }
}
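
Since keys that are not provided fall back to the reference set defaults, it can help to strip unfilled placeholders from the template before passing it on. The following is a minimal sketch under that reading. drop_unset is a hypothetical helper, not an SDK function, and treating None, empty strings, empty lists, and empty dicts as "not provided" is an assumption.

```python
from typing import Any, Dict


def drop_unset(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of params without empty placeholders (hypothetical helper)."""
    cleaned: Dict[str, Any] = {}
    for key, value in params.items():
        if isinstance(value, dict):
            # Clean nested sections such as "ranking_info" first.
            value = drop_unset(value)
        # Treat None, "", [] and {} as "not provided"; keep False and 0.
        if value is None or value == "" or value == [] or value == {}:
            continue
        cleaned[key] = value
    return cleaned
```

For example, a template in which only label_col is filled in reduces to a dict with that single key, so every other setting is left to the defaults.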

Templates for Configuring Reference Sets

When specifying a reference set you can choose to use the default value or change the set to a rolling window or time period.

To choose the default value, you don’t need to specify rolling_window_duration or reference_set_window:

# Fill in the dictionaries
# with the appropriate keys as described above
data_params_dict = {}
location_info_dict = {}

# General Arguments for Activating a Schedule
# if using a default reference set
firewall.activate_ct_schedule(
    location_type="<YOUR_LOCATION_TYPE>", 
    location_info=location_info_dict, 
    data_params=data_params_dict
)

To choose a rolling window:

from datetime import timedelta
# Fill in the dictionaries
# with the appropriate keys as described above
data_params_dict = {}
location_info_dict = {}

rolling_window_period = timedelta(days=1)
# General Arguments for Activating a Schedule
# if specifying a reference set with a rolling window
firewall.activate_ct_schedule(
    location_type="<YOUR_LOCATION_TYPE>", 
    location_info=location_info_dict, 
    rolling_window_duration=rolling_window_period,
    data_params=data_params_dict
)

To choose a time period:

from datetime import datetime
# Fill in the dictionaries
# with the appropriate keys as described above
data_params_dict = {}
location_info_dict = {}

reference_start_time = datetime(2021, 1, 3)
reference_end_time = datetime(2022, 1, 3)
reference_set_bin = (reference_start_time, reference_end_time)

# General Arguments for Activating a Schedule
# if specifying a reference set with a time period
firewall.activate_ct_schedule(
    location_type="<YOUR_LOCATION_TYPE>", 
    location_info=location_info_dict, 
    reference_set_window=reference_set_bin, 
    data_params=data_params_dict
)
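
A fixed time period is often easier to express as "the N days before a given date". The following is a minimal sketch of how such a tuple could be built for reference_set_window. trailing_window is a hypothetical helper, not part of the SDK.

```python
from datetime import datetime, timedelta
from typing import Tuple


def trailing_window(end: datetime, days: int) -> Tuple[datetime, datetime]:
    """Return (start, end) covering the `days` days before `end` (hypothetical helper)."""
    if days <= 0:
        raise ValueError("days must be positive")
    return (end - timedelta(days=days), end)


# One year of data ending on 2022-01-03, in the (start, end) order
# used for reference_set_window.
reference_set_bin = trailing_window(datetime(2022, 1, 3), days=365)
```

Building the tuple this way guarantees that the start precedes the end.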

Specifying Data Locations for Manual Testing Runs

Each of the location types above can also be run independently through continuous testing configs on the Firewall. Additionally, the Delta Lake and Custom Loader types can be run in an offline setting as stress tests. To learn how to specify the configs, please visit our Tabular Configuration Page.

For Delta Lake configs specifically, you will need to specify a data_source_name when running a test. This is the name of the Delta Lake Data Source, which can be configured through the UI.

For a continuous test, this might look like:

delta_lake_incremental_config = {
  "eval_data_info": {
    "type": "delta_lake",
    "table_name": "",
    "start_time": None,  # replace with the start of the time range
    "end_time": None,  # replace with the end of the time range
    "time_col": "",
  }
}
firewall.start_continuous_test(delta_lake_incremental_config, data_source_name="data source")

For a stress test, this might look like:

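delta_lake_config = {
  "run_name": "Delta Lake Run",
  "data_info": {
    "type": "split",
    "ref_data_info": {
      "type": "delta_lake",
      "table_name": "",
      "start_time": None,  # replace with the start of the reference range
      "end_time": None,  # replace with the end of the reference range
      "time_col": "",
    },
    "eval_data_info": {
      "type": "delta_lake",
      "table_name": "",
      "start_time": None,  # replace with the start of the evaluation range
      "end_time": None,  # replace with the end of the evaluation range
      "time_col": "",
    },
  },
}
client.start_stress_test(delta_lake_config, data_source_name="data source")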
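
The reference and evaluation entries in the stress test config share the same shape, so they can be produced by one function. The following is a minimal sketch using only the keys shown in the templates above. delta_lake_data_info and delta_lake_stress_config are hypothetical helpers, and start_time and end_time are passed through untouched because their expected type is not specified here.

```python
from typing import Any, Dict


def delta_lake_data_info(table_name: str, time_col: str,
                         start_time: Any, end_time: Any) -> Dict[str, Any]:
    """Build one Delta Lake data_info entry (hypothetical helper)."""
    return {
        "type": "delta_lake",
        "table_name": table_name,
        "start_time": start_time,
        "end_time": end_time,
        "time_col": time_col,
    }


def delta_lake_stress_config(run_name: str, ref_info: Dict[str, Any],
                             eval_info: Dict[str, Any]) -> Dict[str, Any]:
    """Assemble a split stress test config from two data_info entries."""
    return {
        "run_name": run_name,
        "data_info": {
            "type": "split",
            "ref_data_info": ref_info,
            "eval_data_info": eval_info,
        },
    }
```

The same delta_lake_data_info entry can also serve as the eval_data_info value in a continuous test config.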