Scheduling Continuous Tests
=============================
 
Schedules for RIME AI Firewall Continuous Tests can be configured through the SDK or the UI.

See the [Scheduled CT How-To-Guide](../../how_to_guides/common_use_cases/scheduling_ct_runs.md) for background.

### Arguments
- **`location_type`**: string, ***required***

  Type of location that the data is housed in. Can be `"data_collector"`, `"delta_lake"`, or `"custom_loader"`.

- **`location_info`**: Dict or null, *default* = `null`, ***required for some location types***

  Dict containing arguments needed to access the location provided in the `location_type` field.
  While some types don't require extra arguments, others do.

- **`data_params`**: Dict or null, *default* = `null`, ***required for some location types***
  
  Dict containing arguments needed to process the data once it is loaded from the location, e.g. `timestamp_col`.

- **`reference_set_window`**: Tuple[datetime, datetime] or null, *default* = `null`

  Time range `(start, end)` to use as the reference set for scheduled runs.

- **`rolling_window_duration`**: timedelta or null, *default* = `null`

  Length of the rolling window to use as the reference set in scheduled runs.
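
A quick client-side check can catch a missing `location_info` key before a schedule is activated. The sketch below is illustrative only: `REQUIRED_LOCATION_KEYS` and `check_location_args` are hypothetical names, not part of the RIME SDK, and the required keys are an assumption inferred from the templates on this page.

```python
# Hypothetical helper, not part of the RIME SDK.
# The required keys are assumed from the templates on this page.
REQUIRED_LOCATION_KEYS = {
    "data_collector": [],
    "delta_lake": ["time_col", "table_name", "data_source_name"],
    "custom_loader": ["load_path", "load_func_name"],
}


def check_location_args(location_type, location_info=None):
    """Raise ValueError if location_type is unknown or keys are missing."""
    if location_type not in REQUIRED_LOCATION_KEYS:
        raise ValueError(f"unknown location_type: {location_type!r}")
    info = location_info or {}
    # A key counts as missing if it is absent or left as an empty placeholder.
    missing = [k for k in REQUIRED_LOCATION_KEYS[location_type] if not info.get(k)]
    if missing:
        raise ValueError(f"location_info is missing: {', '.join(missing)}")
```

Calling `check_location_args("delta_lake", location_info_dict)` just before `firewall.activate_ct_schedule(...)` would surface an unfilled template value locally rather than at run time.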

## Templates for Data Location Types

### Data Collector
The Data Collector does not take a `location_info` argument, since the service is internal.
```python
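from datetime import datetime

data_params_dict = {}
# Optional (start, end) time range to use as the reference set
reference_set_bin = (datetime(2021, 1, 3), datetime(2022, 1, 3))
# Arguments for activating a CT
# Schedule with the Data Collector
firewall.activate_ct_schedule(
    location_type="data_collector",
    data_params=data_params_dict,
    reference_set_window=reference_set_bin,
)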
```

### Delta Lake
Delta Lake requires a `location_info` argument, since the service is external. One
of its fields is a data source name. See our data source guide to configure authentication.
You can also specify data params such as `pred_col`.
```python
location_info_dict = {
  "time_col": "", # Name of the timestamp column in your delta lake table
  "table_name": "", # Delta lake table name
  "data_source_name": "" # Data Source name configured from UI
}

data_params_dict = {}
# Arguments for activating a CT
# Schedule with Databricks Delta Lake
firewall.activate_ct_schedule(
    location_type="delta_lake", 
    location_info=location_info_dict, 
    data_params=data_params_dict
)
```

### Custom Loader
Our custom loader integration is designed to work with any location. It requires a `location_info` argument defining the path to load from,
the function that does the loading, and any extra parameters. Your load function must accept `start_time` and `end_time` parameters.
You will need to specify a data params dict with the `timestamp_col` parameter. You can also specify parameters like `pred_col`.
```python
location_info_dict = {
  "load_path": "", # the path to the file where you define how to load in your data
  "load_func_name": "", # the function name in the file where you load your data. This must accept parameters start_time, end_time
  "loader_kwargs_json": "",  # any additional parameters and values needed for your function
}
data_params_dict = {
  "timestamp_col": "" # Required for custom loader
}
# Arguments for activating a CT
# Schedule with Custom Loader
firewall.activate_ct_schedule(
    location_type="custom_loader", 
    location_info=location_info_dict, 
    data_params=data_params_dict
)
```
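
For illustration, a load function in the file referenced by `load_path` might look like the sketch below. Nearly everything here is an assumption: the function name `load_events`, the in-memory `_ROWS` stand-in for a real data store, the `min_score` extra argument, the half-open time interval, and above all the return type (a list of dicts is used only to stay dependency-free; check the custom loader documentation for the type RIME expects). The one requirement taken from this page is that the function accepts `start_time` and `end_time`.

```python
import json
from datetime import datetime

# Stand-in for a real data store; replace with your own query logic.
_ROWS = [
    {"timestamp": datetime(2022, 1, 1), "score": 0.2},
    {"timestamp": datetime(2022, 1, 2), "score": 0.7},
    {"timestamp": datetime(2022, 1, 3), "score": 0.9},
]


def load_events(start_time, end_time, min_score=0.0):
    """Return rows with start_time <= timestamp < end_time (assumed bounds)."""
    return [
        row
        for row in _ROWS
        if start_time <= row["timestamp"] < end_time and row["score"] >= min_score
    ]


# Extra keyword arguments serialized as a JSON string. The format is an
# assumption based on the key name "loader_kwargs_json".
loader_kwargs_json = json.dumps({"min_score": 0.5})
```

In this sketch, `"load_func_name"` would be `"load_events"` and `"loader_kwargs_json"` would carry the serialized extra arguments.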

## Templates for Configuring Data Params
The dictionary below outlines all the keys you can specify for data params. If these are
not provided, the defaults are taken from the reference set. For more info about the types
see our [General Tabular Parameters Section](data_source.md#general-tabular-parameters-for-single-data-info).
```python
data_params = {
  "label_col": "",
  "pred_col": "",
  "nrows": None,  # fill in: integer
  "categorical_features": [],
  "protected_features": [],
  "features_not_in_model": [],
  "loading_kwargs": "",
  "feature_type_path": "",
  "pred_path": "",
  "ranking_info": {
    "query_col": "",
    "nqueries": None,  # fill in: integer
    "nrows_per_query": None,  # fill in: integer
    "drop_query_id": True
  },
  "embeddings": {
    "name": "",
    "cols": [],
  }
}
```  
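
Because unspecified keys fall back to the reference set's values, it can help to drop template entries you left blank before passing the dict along. The helper below is a sketch, not SDK functionality: `drop_unset` is a hypothetical name, and treating `""`, `None`, `[]`, and `{}` as "unset" is an assumption about how you fill in the template.

```python
def drop_unset(params):
    """Recursively remove keys whose value is "", None, [] or {}."""
    cleaned = {}
    for key, value in params.items():
        if isinstance(value, dict):
            value = drop_unset(value)
        # Equality checks keep legitimate falsy values such as 0 and False.
        if value in ("", None, [], {}):
            continue
        cleaned[key] = value
    return cleaned


# Example with made-up values: only the filled-in key survives.
data_params = {"label_col": "label", "pred_col": "", "ranking_info": {"query_col": ""}}
data_params_dict = drop_unset(data_params)  # {"label_col": "label"}
```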

## Templates for Configuring Reference Sets
When specifying a reference set you can choose to use the default value or change the set to a rolling window or time period.

To use the default value, omit both `rolling_window_duration` and `reference_set_window`:
```python
# Fill in the dictionaries
# with the appropriate keys described above
data_params_dict = {}
location_info_dict = {}

# General Arguments for Activating a Schedule
# if using a default reference set
firewall.activate_ct_schedule(
    location_type="<YOUR_LOCATION_TYPE>", 
    location_info=location_info_dict, 
    data_params=data_params_dict
)
```

To choose a rolling window: 
```python
from datetime import timedelta
# Fill in the dictionaries
# with the appropriate keys described above
data_params_dict = {}
location_info_dict = {}

rolling_window_period = timedelta(days=1)
# General Arguments for Activating a Schedule
# if specifying a reference set with a rolling window
firewall.activate_ct_schedule(
    location_type="<YOUR_LOCATION_TYPE>", 
    location_info=location_info_dict, 
    rolling_window_duration=rolling_window_period,
    data_params=data_params_dict
)
```

To choose a time period: 
```python
from datetime import datetime
# Fill in the dictionaries
# with the appropriate keys described above
data_params_dict = {}
location_info_dict = {}

# The window start must precede the window end
reference_start_time = datetime(2021, 1, 3)
reference_end_time = datetime(2022, 1, 3)
reference_set_bin = (reference_start_time, reference_end_time)

# General Arguments for Activating a Schedule
# if specifying a reference set with a time period
firewall.activate_ct_schedule(
    location_type="<YOUR_LOCATION_TYPE>", 
    location_info=location_info_dict, 
    reference_set_window=reference_set_bin, 
    data_params=data_params_dict
)
```
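
The three options can be folded into one small helper that returns the right keyword arguments. This is a sketch: `reference_set_kwargs` is a hypothetical name, and the checks (the two options being mutually exclusive, the window start preceding its end, a positive duration) are assumptions about sensible input rather than documented SDK behavior.

```python
from datetime import datetime, timedelta


def reference_set_kwargs(window=None, rolling=None):
    """Build reference-set keyword arguments for activate_ct_schedule."""
    if window is not None and rolling is not None:
        raise ValueError("pass a window or a rolling duration, not both")
    if window is not None:
        start, end = window
        if start >= end:
            raise ValueError("window start must be before window end")
        return {"reference_set_window": (start, end)}
    if rolling is not None:
        if rolling <= timedelta(0):
            raise ValueError("rolling duration must be positive")
        return {"rolling_window_duration": rolling}
    return {}  # neither given: use the default reference set
```

The result can be unpacked into the call alongside the other arguments, for example `**reference_set_kwargs(rolling=timedelta(days=1))`.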

## Specifying Data Locations for Manual Testing Runs
Each of the location types above can also be run independently through
continuous testing configs on the Firewall. Additionally, the Delta Lake and Custom Loader
types can be run in an offline setting as stress tests. To learn about how to specify the configs, please visit
our [Tabular Configuration Page](data_source.md).

For Delta Lake configs specifically, you will need to specify a `data_source_name` 
when running a test. This is the name of the Delta Lake Data Source, which can be configured through the UI.

For a continuous test, this might look like:
```python
delta_lake_incremental_config = {
  "eval_data_info": {
    "type": "delta_lake",
    "table_name": "",
    "start_time": None,  # fill in: start of the time range
    "end_time": None,  # fill in: end of the time range
    "time_col": "",
  }
}
firewall.start_continuous_test(delta_lake_incremental_config, data_source_name="data source")
```

For a stress test, this might look like:

```python
delta_lake_config = {
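  "run_name": "Delta Lake Run",
  "data_info": {
    "type": "split",
    "ref_data_info": {
      "type": "delta_lake",
      "table_name": "",
      "start_time": None,  # fill in: start of the time range
      "end_time": None,  # fill in: end of the time range
      "time_col": "",
    },
    "eval_data_info": {
      "type": "delta_lake",
      "table_name": "",
      "start_time": None,  # fill in: start of the time range
      "end_time": None,  # fill in: end of the time range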
      "time_col": "",
    },
  },
}
client.start_stress_test(delta_lake_config, data_source_name="data source")
```
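
Since the reference and evaluation blocks share the same shape, a small builder can cut the repetition. This is a sketch under assumptions: `delta_lake_data_info` is a hypothetical helper, the table and column names are made up, and the time bounds are passed through unchanged because this page does not state which format the platform expects (the integers below are placeholders only).

```python
def delta_lake_data_info(table_name, start_time, end_time, time_col):
    """Return one Delta Lake data-info block in the shape shown above."""
    return {
        "type": "delta_lake",
        "table_name": table_name,
        "start_time": start_time,
        "end_time": end_time,
        "time_col": time_col,
    }


# Placeholder table name, column name, and time bounds.
delta_lake_config = {
    "run_name": "Delta Lake Run",
    "data_info": {
        "type": "split",
        "ref_data_info": delta_lake_data_info("events", 1640995200, 1643673600, "ts"),
        "eval_data_info": delta_lake_data_info("events", 1643673600, 1646092800, "ts"),
    },
}
```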