Test Suite Configuration
All of the tests RIME runs are configurable via the "test_suite_config" key in the test run config. This key expects a dictionary as its value; the available configuration options are presented below.
Test Suite Configuration Template
{
    "test_suite_config": {
        "global_test_sensitivity": "TEST_SENSITIVITY_DEFAULT",
        "custom_tests": [],
        "categories": [],
        "individual_tests_config": {...}
    }
}
Global Configuration Options
This dictionary contains several global configuration options which, if specified, apply to all relevant tests. Where an option defaults to null, RIME relies on the specific test configuration to provide the value. These are:
categories
: List[dict], default = []
Test categories to run. Options include Abnormal Inputs, Attacks, Bias and Fairness, Data Cleanliness, Data Poisoning Detection, Drift, Model Performance, Subset Performance, and Transformations. If no categories are specified, the default categories are run. The default categories for Stress Testing are Attacks, Model Performance, Subset Performance, and Transformations, and the default categories for Continuous Testing are Abnormal Inputs, Drift, Model Performance, and Subset Performance Degradation. Each category configuration is a dictionary with the "name" of the category and the boolean flags "run_st" and "run_ct", which specify whether to run that category in Stress Testing and Continuous Testing, respectively. For example, this categories configuration runs Model Performance in both Stress Testing and Continuous Testing, and Data Cleanliness in Continuous Testing only:
{
    # ...
    "test_suite_config": {
        "categories": [
            {"name": "Model Performance", "run_st": True, "run_ct": True},
            {"name": "Data Cleanliness", "run_st": False, "run_ct": True}
        ]
    }
    # ...
}
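A list like this can also be assembled programmatically. The sketch below is a minimal illustration, not part of RIME: the helper `make_category` and the constant `KNOWN_CATEGORIES` are hypothetical names, and the set of accepted names is simply copied from the options listed in this section.

```python
# Category names copied from the options listed in this section.
# The text also names "Subset Performance Degradation" as a Continuous
# Testing default; add it here if your RIME version accepts that name.
KNOWN_CATEGORIES = {
    "Abnormal Inputs",
    "Attacks",
    "Bias and Fairness",
    "Data Cleanliness",
    "Data Poisoning Detection",
    "Drift",
    "Model Performance",
    "Subset Performance",
    "Transformations",
}


def make_category(name: str, run_st: bool, run_ct: bool) -> dict:
    """Build one category entry (hypothetical helper, not a RIME function)."""
    if name not in KNOWN_CATEGORIES:
        raise ValueError(f"Unknown test category: {name!r}")
    return {"name": name, "run_st": run_st, "run_ct": run_ct}


# Same configuration as the example above, built through the helper.
test_suite_config = {
    "categories": [
        make_category("Model Performance", run_st=True, run_ct=True),
        make_category("Data Cleanliness", run_st=False, run_ct=True),
    ]
}
```

Rejecting unknown names locally catches typos before the configuration is submitted; whether RIME itself performs such a check is not stated here.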
global_test_sensitivity
: str, default = "TEST_SENSITIVITY_DEFAULT"
The global setting for test sensitivity, applied to all tests. The options are "TEST_SENSITIVITY_LESS_SENSITIVE", "TEST_SENSITIVITY_DEFAULT", and "TEST_SENSITIVITY_MORE_SENSITIVE".
custom_tests
: List[dict], default = []
Specification for custom tests. For more information on custom test configuration, see Custom Test Configuration.
individual_tests_config
: Optional[dict], default = null
Configuration for the individual tests run by RIME. The first few parameters affect the global behavior of tests.
global_exclude_columns
: Optional[List[str]], default = null
Columns to exclude from all tests.
global_abnormal_inputs_performance_change_config
: Optional[mapping], default = null
Parameters for measuring the impact of abnormal inputs on model performance (applies to all abnormal input tests). The keys of this mapping are:
severity_thresholds
: List[float, float]
Ascending list of three float thresholds, giving the observed or simulated performance change that must be reached for the test to return Low, Medium, or High severity, respectively. This is a logical OR: if both types of performance change are measured, the maximum of the two is taken and the severity corresponding to the highest threshold exceeded is returned. If there are observed failing rows but neither the observed nor the simulated performance change exceeds any threshold, Low severity is returned.
min_num_samples
: int
The minimum number of rows needed to reliably compute performance change. If there are fewer than this many abnormal inputs, the observed model performance change is not taken into account when determining test status and severity.
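The severity rule described above can be sketched in a few lines of Python. This is one reading of the prose, not RIME's implementation. The function name `performance_change_severity`, its parameters, and the threshold values in the usage comment are hypothetical. The sketch also assumes that a threshold counts as exceeded only when the change is strictly greater than it, that performance changes are expressed as non-negative magnitudes, and that "observed failing rows" can be approximated by a positive count of abnormal rows.

```python
from typing import Optional, Sequence

SEVERITIES = ("Low", "Medium", "High")


def performance_change_severity(
    severity_thresholds: Sequence[float],
    observed_change: Optional[float],
    simulated_change: Optional[float],
    num_abnormal_rows: int,
    min_num_samples: int,
) -> Optional[str]:
    """Return "Low", "Medium", "High", or None (assumed: nothing to report)."""
    thresholds = list(severity_thresholds)
    if len(thresholds) != 3 or thresholds != sorted(thresholds):
        raise ValueError("severity_thresholds must be three ascending floats")

    # Too few abnormal rows: ignore the observed change (min_num_samples).
    if num_abnormal_rows < min_num_samples:
        observed_change = None

    # Logical OR: use the larger of whichever changes were measured.
    measured = [c for c in (observed_change, simulated_change) if c is not None]
    change = max(measured) if measured else None

    # Keep the label of the highest threshold that the change exceeds.
    severity = None
    if change is not None:
        for threshold, label in zip(thresholds, SEVERITIES):
            if change > threshold:
                severity = label

    # Failing rows were observed but no threshold was exceeded: Low severity.
    if severity is None and num_abnormal_rows > 0:
        severity = "Low"
    return severity


# Usage with illustrative thresholds (not RIME defaults):
# performance_change_severity([0.01, 0.05, 0.1], 0.07, 0.02, 50, 10)
```

Returning `None` when no rows fail and no threshold is exceeded is a choice made for this sketch; the text does not say what RIME reports in that case.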
global_transformation_performance_change_config
: Optional[mapping], default = null
Parameters for measuring the impact of transformations on model performance (applies to all transformation tests). The keys of this mapping are:
severity_thresholds
: List[float, float]
Ascending list of three float thresholds, giving the observed or simulated performance change that must be reached for the test to return Low, Medium, or High severity, respectively. This is a logical OR: if both types of performance change are measured, the maximum of the two is taken and the severity corresponding to the highest threshold exceeded is returned. If there are observed failing rows but neither the observed nor the simulated performance change exceeds any threshold, Low severity is returned.
ignore_errors
: bool
If False, the test case fails with High severity whenever the model raises an error on inputs with the given abnormality.
num_samples_to_simulate
: int
The number of clean rows to sample and perturb in order to measure the simulated performance change.
global_drift_scaling_factor
: float
Used for drift tests. How large an estimated change in predictions is needed to increase the Model Impact Level by 1. Defaults to 0.005.
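To show how the global parameters fit together, here is a minimal sketch of an "individual_tests_config" written as a Python dictionary. The key names come from this section; the column names, threshold values, and sample counts are illustrative assumptions and are not RIME defaults. Only the drift scaling factor of 0.005 is the documented default.

```python
import json

individual_tests_config = {
    # Hypothetical column names, for illustration only.
    "global_exclude_columns": ["customer_id", "event_timestamp"],
    "global_abnormal_inputs_performance_change_config": {
        # Illustrative thresholds for Low, Medium, High (ascending).
        "severity_thresholds": [0.01, 0.05, 0.1],
        # Illustrative minimum number of abnormal rows.
        "min_num_samples": 50,
    },
    "global_transformation_performance_change_config": {
        "severity_thresholds": [0.01, 0.05, 0.1],
        # False: a model error on a transformed input fails with High severity.
        "ignore_errors": False,
        # Illustrative number of clean rows to sample and perturb.
        "num_samples_to_simulate": 100,
    },
    # Documented default value.
    "global_drift_scaling_factor": 0.005,
}

test_suite_config = {
    "global_test_sensitivity": "TEST_SENSITIVITY_DEFAULT",
    "individual_tests_config": individual_tests_config,
}

# Serialize to JSON, e.g. for saving alongside other run configuration.
print(json.dumps(test_suite_config, indent=2))
```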
Besides these global parameters in the individual_tests_config, there are also many individual tests that can be configured. See below for an example.
Default configuration
The default configuration for all tests is available in the rime_trial bundle, at examples/test_configs/default_test_config.json.
To use such a configuration, load the JSON file into Python as a dictionary, for example with json.load, and specify that dictionary as the value of the "test_suite_config" key in the test run configuration.
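The loading step can be sketched as follows. The helper `build_test_run_config` and its `base_run_config` parameter are hypothetical names introduced for illustration, and the path in the usage comment assumes the working directory is the root of the rime_trial bundle.

```python
import json
from pathlib import Path
from typing import Optional


def build_test_run_config(config_path: str, base_run_config: Optional[dict] = None) -> dict:
    """Load a test suite config JSON file and attach it to a test run config.

    Hypothetical helper, not part of RIME. Any other keys of the test run
    configuration can be passed in through base_run_config.
    """
    with Path(config_path).open() as f:
        test_suite_config = json.load(f)

    # Copy so the caller's dictionary is not modified in place.
    run_config = dict(base_run_config or {})
    run_config["test_suite_config"] = test_suite_config
    return run_config


# Usage (assumes the file exists relative to the current directory):
# run_config = build_test_run_config("examples/test_configs/default_test_config.json")
```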