Scheduling Continuous Tests ============================= Configuration of schedules for RIME AI Firewall Continuous Tests is done through the SDK or the UI. See [Scheduled CT How-To-Guide](../../how_to_guides/common_use_cases/scheduling_ct_runs.md) for some background. ### Arguments - **`location_type`**: string, ***required*** Type of location that data is housed in. Can be "data_collector", "delta_lake" or "custom_loader". - **`location_info`**: Dict or null, *default* = `null`, ***required for some location types*** Dict containing arguments needed to access the location provided in the `location_type` field. While some types don't require extra arguments, others do. - **`data_params`**: Dict or null, *default* = `null`, ***required for some location types*** Dict containing arguments needed to process the data once loaded from the location eg. timestamp_col. - `reference_set_window`: Tuple[datetime, datetime] or null, *default* = `null` Time Range to set the reference set to for scheduled runs. - `rolling_window_duration`: timedelta or null, *default* = `null` The length of the rolling window to use as a reference set in scheduled runs. ## Templates for Data Location Types ### Data Collector Data Collector does not take a location_info argument, since the service is internal ```python data_params_dict = {} # Arguments for activating a CT # Schedule with the Data Collector firewall.activate_ct_schedule( location_type="", reference_set_window=reference_set_bin, ) ``` ### Delta Lake Delta Lake requires a location_info argument, since the service is external. One of these arguments is a data source name. See our data source guide to configure authentication. You can also specify parameters like `pred_col`. ```python location_info = { "time_col": "", # Name of the timestamp column in your delta lake table "table_name": "", # Delta lake table name "data_source_name": "" # Data Source name configured from UI } data_params_dict = {} # Arguments for activating a CT # Schedule with Databricks Delta Lake firewall.activate_ct_schedule( location_type="delta_lake", location_info=location_info_dict, data_params=data_params_dict ) ``` ### Custom Loader Our custom loader integration is designed to integrate with any location. This requires a `location_info` argument defining the path to load from, the function that does the loading, as well as any extra parameters. Your load function must accept parameters `start_time`, `end_time` as options. You will need to specify a data params dict with the `timestamp_col` parameter. You can also specify parameters like `pred_col`. ```python location_info = { "load_path": "", # the path to the file where you define how to load in your data "load_func_name": "", # the function name in the file where you load your data. This must accept parameters start_time, end_time "loader_kwargs_json": "", # any additional parameters and values needed for your function } data_params_dict = { "timestamp_col": "" # Required for custom loader } # Arguments for activating a CT # Schedule with Custom Loader firewall.activate_ct_schedule( location_type="custom_loader", location_info=location_info_dict, data_params=data_params_dict ) ``` ## Templates for Configuring Data Params The dictionary below outlines all the keys you can specify for data params. If these are not provided, the defaults are taken from the reference set. For more info about the types see our [General Tabular Parameters Section](data_source.md#general-tabular-parameters-for-single-data-info). ```python data_params = { "label_col": "", "pred_col": "", "nrows": #, "categorical_features": [], "protected_features": [], "features_not_in_model": [], "loading_kwargs": "", "feature_type_path": "", "pred_path": "", "ranking_info": { "query_col": "", "nqueries": #, "nrows_per_query": #, "drop_query_id": true }, "embeddings": { "name": "", "cols": [], } } ``` ## Templates for Configuring Reference Sets When specifying a reference set you can choose to use the default value or change the set to a rolling window or time period. To choose the default value, you don't need to specify rolling_window_duration or reference_set_window: ```python # Fill in the dictionaries # with the appropriate keys as below data_params_dict = {} location_info_dict= {} # General Arguments for Activating a Schedule # if using a default reference set firewall.activate_ct_schedule( location_type="", location_info=location_info_dict, data_params=data_params_dict ) ``` To choose a rolling window: ```python from datetime import timedelta # Fill in the dictionaries # with the appropriate keys as below data_params_dict = {} location_info_dict= {} rolling_window_period = timedelta(days=1) # General Arguments for Activating a Schedule # if specifying a reference set with a rolling window firewall.activate_ct_schedule( location_type="", location_info=location_info_dict, rolling_window_duration=rolling_window_period, data_params=data_params_dict ) ``` To choose a time period: ```python from datetime import datetime # Fill in the dictionaries # with the appropriate keys as below data_params_dict = {} location_info_dict= {} reference_start_time = datetime(2022, 1, 3) reference_end_time = datetime(2021, 1, 3) reference_set_bin = (reference_start_time, reference_end_time) # General Arguments for Activating a Schedule # if specifying a reference set with a time period firewall.activate_ct_schedule( location_type="", location_info=location_info_dict, reference_set_window=reference_set_bin, data_params=data_params_dict ) ``` ## Specifying Data Locations for Manual Testing Runs Each of the location types above can also be run independently through continuous testing configs on the Firewall. Additionally, the Delta Lake and Custom Loader types can be run in an offline setting as stress tests. To learn about how to specify the configs, please visit our [Tabular Configuration Page](data_source.md). For Delta Lake configs specifically, you will need to specify a `data_source_name` when running a test. This is the name of the Delta Lake Data Source, which can be configured through the UI. For a continuous test, this might look like: ```python delta_lake_incremental_config = { "eval_data_info": { "type": "delta_lake", "table_name": "", "start_time": #, "end_time": #, "time_col": "", } } firewall.start_continuous_test(delta_lake_incremental_config, data_source_name="data source") ``` For a stress test, this might look like: ```python delta_lake_config = { "run_name": "Delta Lake Run" "data_info": { "type": "split", "ref_data_info": { "type": "delta_lake", "table_name": "", "start_time": #, "end_time": #, "time_col": "", }, "eval_data_info": { "type": "delta_lake", "table_name": "", "start_time": #, "end_time": #, "time_col": "", }, }, } client.start_stress_test(delta_lake_config, data_source_name="data source") ```