Test Suite Configuration

All of the tests RIME runs are easily configurable via the "test_suite_config" key in the test run config. The "test_suite_config" key expects a dictionary as its value, the configuration options for which are presented below.

Test Suite Configuration Template

{
    "test_suite_config": {
        "global_test_sensitivity": "TEST_SENSITIVITY_DEFAULT",
        "custom_tests": [],
        "categories": [],
        "individual_tests_config": {...}
    }
}

Global Configuration Options

This dictionary contains several global configuration options, which, if specified, apply to all relevant tests. Any option left as null falls back to the specific test configuration to provide its value. The options are:

  • categories: List[dict], default = []

    Test categories to run. The options are Abnormal Inputs, Attacks, Bias and Fairness, Data Cleanliness, Data Poisoning Detection, Drift, Model Performance, Subset Performance, and Transformations. If no categories are specified, the defaults are run: Attacks, Model Performance, Subset Performance, and Transformations for Stress Testing; Abnormal Inputs, Drift, Model Performance, and Subset Performance Degradation for Continuous Testing. Each category configuration is a dictionary containing the "name" of the category and the boolean flags "run_st" and "run_ct", which specify whether to run that category in Stress Testing and Continuous Testing, respectively. For example, the following configuration runs Model Performance in both Stress Testing and Continuous Testing, and Data Cleanliness in Continuous Testing only:

{
    # ...
    "categories": [
        {"name": "Model Performance", "run_st": True, "run_ct": True},
        {"name": "Data Cleanliness", "run_st": False, "run_ct": True}
    ],
    # ...
}
  • global_test_sensitivity: str, default = "TEST_SENSITIVITY_DEFAULT" The global test sensitivity setting, applied to all tests. The options are "TEST_SENSITIVITY_LESS_SENSITIVE", "TEST_SENSITIVITY_DEFAULT", and "TEST_SENSITIVITY_MORE_SENSITIVE".
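
    For example, to make every test flag issues more aggressively (a minimal sketch; all other keys are omitted):

{
    "test_suite_config": {
        "global_test_sensitivity": "TEST_SENSITIVITY_MORE_SENSITIVE"
    }
}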

  • custom_tests: List[dict], default = [] Specifications for custom tests. For more information on custom test configuration, see Custom Test Configuration.

  • individual_tests_config: Optional[dict], default = null Configuration for the individual tests that RIME runs. The first few parameters below affect the global behavior of tests.

    • global_exclude_columns: Optional[List[str]], default = null

      Columns to exclude from all tests.
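
      For example, to omit two columns from every test (the column names here are hypothetical placeholders for columns in your dataset):

{
    # ...
    "individual_tests_config": {
        "global_exclude_columns": ["user_id", "internal_notes"]
    }
    # ...
}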

    • global_abnormal_inputs_performance_change_config: Optional[mapping], default = null

      Parameters for measuring the impact of abnormal inputs on model performance (applies to all abnormal input tests). This mapping accepts the following keys:

      • severity_thresholds: List[float]

        Ascending list of three float thresholds on the observed or simulated performance change, corresponding respectively to Low, Medium, and High severity. The two measurements combine as a logical OR: if both types of performance change are measured, the maximum of the two is taken, and the severity corresponding to the highest threshold reached is returned. If there are observed failing rows but neither performance change reaches any of the thresholds, the test returns Low severity.

      • min_num_samples: int

        The minimum number of rows needed to reliably compute performance change. If there are fewer than this many abnormal inputs, the observed model performance change is not taken into account when determining test status and severity.
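
      For example, a configuration that requires at least 20 abnormal rows and uses performance-change thresholds of 0.01, 0.05, and 0.1 (the values are illustrative, not defaults):

{
    # ...
    "individual_tests_config": {
        "global_abnormal_inputs_performance_change_config": {
            "severity_thresholds": [0.01, 0.05, 0.1],
            "min_num_samples": 20
        }
    }
    # ...
}

      The threshold rule can be sketched in Python as follows; this is an illustrative re-implementation of the behavior described above, not RIME's actual code:

def resolve_severity(observed, simulated, severity_thresholds, has_failing_rows):
    # observed/simulated are performance changes (None if not measured);
    # severity_thresholds is the ascending [Low, Medium, High] list.
    measured = [c for c in (observed, simulated) if c is not None]
    change = max(measured, default=0.0)  # logical OR: take the maximum
    severity = None
    for threshold, level in zip(severity_thresholds, ["Low", "Medium", "High"]):
        if change >= threshold:
            severity = level  # the highest threshold reached wins
    if severity is None and has_failing_rows:
        severity = "Low"  # failing rows alone still yield Low severity
    return severity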

    • global_transformation_performance_change_config: Optional[mapping], default = null

      Parameters for measuring the impact of transformations on model performance (applies to all transformation tests). This mapping accepts the following keys:

      • severity_thresholds: List[float]

        Ascending list of three float thresholds on the observed or simulated performance change, corresponding respectively to Low, Medium, and High severity. The two measurements combine as a logical OR: if both types of performance change are measured, the maximum of the two is taken, and the severity corresponding to the highest threshold reached is returned. If there are observed failing rows but neither performance change reaches any of the thresholds, the test returns Low severity.

      • ignore_errors: bool

        If False and the model raises an error on inputs with the given transformation applied, the test case fails with High severity.

      • num_samples_to_simulate: int

        The number of clean rows to sample and perturb for the sake of measuring the simulated performance change.
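
      For example (the values are illustrative, not defaults):

{
    # ...
    "individual_tests_config": {
        "global_transformation_performance_change_config": {
            "severity_thresholds": [0.01, 0.05, 0.1],
            "ignore_errors": False,
            "num_samples_to_simulate": 100
        }
    }
    # ...
}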

    • global_drift_scaling_factor: float, default = 0.005

      Used for drift tests. The size of the estimated change in predictions needed to increase the Model Impact Level by 1.

    Besides these global parameters, individual_tests_config also accepts entries that configure the many individual tests RIME runs; see the sketch below for an example of the overall shape.
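
    The sketch below combines the drift scaling factor with a hypothetical per-test entry; consult the default configuration described in the next section for the real test names and their parameters:

{
    # ...
    "individual_tests_config": {
        "global_drift_scaling_factor": 0.005,
        # Hypothetical entry, shown only to illustrate the shape of a
        # per-test configuration; it is not a real RIME test name.
        "some_individual_test": {
            "exclude_columns": ["user_id"]
        }
    }
    # ...
}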

Default Configuration

The default configuration for all tests is available in the rime_trial bundle, at examples/test_configs/default_test_config.json.

To use such a configuration JSON file, load it into Python as a dictionary, for example with json.load, and specify the dictionary as the value of the "test_suite_config" key in the test run configuration.
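
A minimal sketch, assuming the default configuration file from the rime_trial bundle and an otherwise-complete test run config:

import json

# Load the test suite configuration shipped with the rime_trial bundle.
with open("examples/test_configs/default_test_config.json") as f:
    test_suite_config = json.load(f)

# Specify it as the value of the "test_suite_config" key in the test run config.
test_run_config = {
    # ... other test run configuration keys ...
    "test_suite_config": test_suite_config,
}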