Tests Configuration =================== All of the tests RIME runs are easily configurable via a JSON configuration file. In order to use this configuration for a run, you should specify the path to this JSON file in the overall configuration file using the `"tests_config_path"` key. ## Global Configuration Options This JSON file contains several global configuration options, which, if specified, will apply to all relevant tests. All of these default to `null`, which means RIME will rely on the specific test configuration to provide this value. These are: - **`categories`**: List[str], *default* = ``[]`` Test categories to run. Options include `Abnormal Inputs`, `Drift`, `Subset Performance`, `Data Cleanliness`, `Transformations`, and `Compliance`. - **`run_default`**: Optional[bool], *default* = ``null`` Whether to run default categories or not. Defaults to `True` if no `categories` are specified, `False` if any are. The default categories are `Abnormal Inputs`, `Drift`, `Subset Performance`, `Data Cleanliness` and `Transformations`. - **`global_exclude_columns`**: Optional[List[str]], *default* = `null` Columns to exclude from all tests. - **`global_abnormal_inputs_performance_change_config`**: Optional[mapping], *default* = `null` Parameters for measuring the impact of abnormal inputs on model performance (applies to all abnormal input tests). The different values of this mapping should be: - `severity_thresholds`: List[float, float, float] Ascending list of three float thresholds, corresponding to the observed or simulated performance change which must be achieved in order for the test to return, respectively, Low, Medium, or High severity. This is a logical OR: if both types of performance change are measured, take the maximum of the two and return the severity corresponding to the highest threshold that was exceeded. If there are observed failing rows but the observed or simulated performance changes do not exceed any of the thresholds, return a Low severity. - `min_num_samples`: int The minimum number of rows needed to reliably compute performance change. If there are fewer than this many abnormal inputs, the observed model performance change will not be taken into when determining test status and severity. - **`global_transformation_performance_change_config`**: Optional[mapping], *default* = `null` Parameters for measuring the impact of transformation on model performance (applies to all transformation tests). The different values of this mapping should be: - `severity_thresholds`: List[float, float, float] Ascending list of three float thresholds, corresponding to the observed or simulated performance change which must be achieved in order for the test to return, respectively, Low, Medium, or High severity. This is a logical OR: if both types of performance change are measured, take the maximum of the two and return the severity corresponding to the highest threshold that was exceeded. If there are observed failing rows but the observed or simulated performance changes do not exceed any of the thresholds, return a Low severity. - `ignore_errors`: bool If False, if the model raises an error on inputs with the given abnormality then the test case will fail with High severity. - `num_samples_to_simulate`: int The number of clean rows to sample and perturb for the sake of measuring the simulated performance change. - **`global_drift_scaling_factor`**: float Used for drift tests. How large of an estimated change in predictions is needed to increase the Model Impact Level by 1. Defaults to `0.005`. Besides these global parameters, there are also keys for configuration for individual tests. ## Default configuration Below is the default configuration for all tests. A copy of this can also be found in your `rime_trial` bundle (inside the `examples/test_configs/default_test_config.json`). ```json { "categories": [], "run_default": null, "custom_tests": null, "numeric_outlier": { "exclude_columns": [], "run": true, "performance_change_config": { "severity_thresholds": null, "min_num_samples": 10 }, "min_normal_prop": 0.99, "baseline_quantile": 0.1, "perturb_multiplier": 1.0 }, "unseen_categorical": { "exclude_columns": [], "run": true, "performance_change_config": { "severity_thresholds": null, "min_num_samples": 10 } }, "unseen_domain": { "exclude_columns": [], "run": true, "performance_change_config": { "severity_thresholds": null, "min_num_samples": 10 } }, "unseen_email": { "exclude_columns": [], "run": true, "performance_change_config": { "severity_thresholds": null, "min_num_samples": 10 } }, "unseen_url": { "exclude_columns": [], "run": true, "performance_change_config": { "severity_thresholds": null, "min_num_samples": 10 } }, "rare_categories": { "exclude_columns": [], "run": true, "performance_change_config": { "severity_thresholds": null, "min_num_samples": 10 }, "include_columns": [], "min_num_occurrences": 0, "min_pct_occurrences": 0, "min_ratio_rel_uniform": 0.005 }, "out_of_range": { "exclude_columns": [], "run": true, "performance_change_config": { "severity_thresholds": null, "min_num_samples": 10 }, "std_factor": 3 }, "req_characters": { "column_specific_params": {}, "exclude_columns": [], "run": true, "performance_change_config": { "severity_thresholds": null, "min_num_samples": 10 } }, "inconsistencies": { "exclude_columns": [], "run": true, "performance_change_config": { "severity_thresholds": null, "min_num_samples": 10 }, "freq_ratio_threshold": 0.02, "min_correlation": 0.1, "max_pairwise_tests": 200, "max_unique_pairs_for_firewall": 15 }, "capitalization": { "exclude_columns": [], "run": true, "performance_change_config": { "severity_thresholds": null, "min_num_samples": 10 } }, "empty_string": { "exclude_columns": [], "run": true, "performance_change_config": { "severity_thresholds": null, "min_num_samples": 10 } }, "feat_subset_auc": { "exclude_columns": [], "run": true, "min_sample_size": 20, "performance_change_thresholds": null }, "feat_subset_accuracy": { "exclude_columns": [], "run": true, "min_sample_size": 20, "performance_change_thresholds": null }, "feat_subset_f1": { "exclude_columns": [], "run": true, "min_sample_size": 20, "performance_change_thresholds": null }, "feat_subset_macro_f1": { "exclude_columns": [], "run": true, "min_sample_size": 20, "performance_change_thresholds": null }, "feat_subset_precision": { "exclude_columns": [], "run": true, "min_sample_size": 20, "performance_change_thresholds": null }, "feat_subset_macro_precision": { "exclude_columns": [], "run": true, "min_sample_size": 20, "performance_change_thresholds": null }, "feat_subset_fpr": { "exclude_columns": [], "run": true, "min_sample_size": 20, "performance_change_thresholds": null }, "feat_subset_recall": { "exclude_columns": [], "run": true, "min_sample_size": 20, "performance_change_thresholds": null }, "feat_subset_macro_recall": { "exclude_columns": [], "run": true, "min_sample_size": 20, "performance_change_thresholds": null }, "feat_subset_pred_variance_pos": { "exclude_columns": [], "run": true, "min_sample_size": 20, "performance_change_thresholds": null }, "feat_subset_pred_variance_neg": { "exclude_columns": [], "run": true, "min_sample_size": 20, "performance_change_thresholds": null }, "feat_subset_pred_variance_all": { "exclude_columns": [], "run": true, "min_sample_size": 20, "performance_change_thresholds": null }, "feat_subset_rmse": { "exclude_columns": [], "run": true, "min_sample_size": 20, "performance_change_thresholds": null }, "feat_subset_mae": { "exclude_columns": [], "run": true, "min_sample_size": 20, "performance_change_thresholds": null }, "feat_subset_rank_correlation": { "exclude_columns": [], "run": true, "min_sample_size": 20, "performance_change_thresholds": null }, "feat_subset_ndcg": { "exclude_columns": [], "run": true, "min_sample_size": 20, "performance_change_thresholds": null }, "feat_subset_mrr": { "exclude_columns": [], "run": true, "min_sample_size": 20, "performance_change_thresholds": null }, "correlation_drift": { "exclude_columns": [], "run": true, "min_correlation": 0.1, "correlation_thresholds": [ 0.1, 0.2, 0.3 ], "p_value_threshold": 0.05, "max_pairwise_tests": 200 }, "mutual_information_feat_drift": { "exclude_columns": [], "run": true, "min_mutual_information": 0.1, "mutual_information_thresholds": [ 0.1, 0.2, 0.3 ], "max_pairwise_tests": 200, "min_sample_size": 100 }, "mutual_information_label_drift": { "exclude_columns": [], "run": true, "min_mutual_information": 0.1, "mutual_information_thresholds": [ 0.1, 0.2, 0.3 ], "max_pairwise_tests": 200, "min_sample_size": 100 }, "categorical_label_drift": { "run": true, "drift_statistic": "Population Stability Index", "params": { "run": true, "num_values_for_graph": 5, "distance_thresholds": [ 0.2, 0.4, 0.6 ] } }, "multiclass_pred_label_drift": { "run": true, "drift_statistic": "Population Stability Index", "params": { "run": true, "num_values_for_graph": 5, "distance_thresholds": [ 0.2, 0.4, 0.6 ] } }, "regression_label_drift": { "run": true, "p_value_threshold": 0.05, "ks_stat_thresholds": [ 0.1, 0.33, 0.67 ] }, "categorical_drift": { "exclude_columns": [], "run": true, "drift_statistic": "Population Stability Index", "params": { "run": true, "drift_scaling_factor": 0.005, "performance_change_thresholds": null, "min_sample_size": 100, "max_sample_size": null, "distance_threshold": 0.2 } }, "continuous_drift": { "exclude_columns": [], "run": true, "drift_scaling_factor": 0.005, "performance_change_thresholds": null, "drift_statistic": "Population Stability Index", "params": { "run": true, "drift_scaling_factor": 0.005, "performance_change_thresholds": null, "min_sample_size": 100, "min_num_quantiles": 1000, "distance_threshold": 0.2, "num_bins": 100 } }, "overall_metrics": { "run": true, "metrics_specific_thresholds": {} }, "prediction_drift": { "run": true, "drift_statistic": "Population Stability Index", "params": { "run": true, "min_sample_size": 100, "min_num_quantiles": 1000, "psi_thresholds": [ 0.2, 0.4, 0.6 ], "num_bins": 100 } }, "calibration_comparison": { "run": true, "severity_level_thresholds": [ 0.02, 0.06, 0.1 ] }, "global_exclude_columns": null, "global_abnormal_inputs_performance_change_config": null, "global_transformation_performance_change_config": null, "global_drift_scaling_factor": null, "out_of_range_substitution": { "exclude_columns": [], "run": true, "performance_change_config": { "ignore_errors": false, "severity_thresholds": null, "num_samples_to_simulate": 100 }, "std_factor": 3 }, "outlier_substitution": { "exclude_columns": [], "run": true, "performance_change_config": { "ignore_errors": false, "severity_thresholds": null, "num_samples_to_simulate": 100 }, "min_normal_prop": 0.99, "baseline_quantile": 0.1, "perturb_multiplier": 1.0 }, "int_feature_type_change": { "exclude_columns": [], "run": true, "performance_change_config": { "ignore_errors": false, "severity_thresholds": null, "num_samples_to_simulate": 100 } }, "float_feature_type_change": { "exclude_columns": [], "run": true, "performance_change_config": { "ignore_errors": false, "severity_thresholds": null, "num_samples_to_simulate": 100 } }, "str_feature_type_change": { "exclude_columns": [], "run": true, "performance_change_config": { "ignore_errors": false, "severity_thresholds": null, "num_samples_to_simulate": 100 } }, "bool_feature_type_change": { "exclude_columns": [], "run": true, "performance_change_config": { "ignore_errors": false, "severity_thresholds": null, "num_samples_to_simulate": 100 } }, "url_feature_type_change": { "exclude_columns": [], "run": true, "performance_change_config": { "ignore_errors": false, "severity_thresholds": null, "num_samples_to_simulate": 100 } }, "domain_feature_type_change": { "exclude_columns": [], "run": true, "performance_change_config": { "ignore_errors": false, "severity_thresholds": null, "num_samples_to_simulate": 100 } }, "email_feature_type_change": { "exclude_columns": [], "run": true, "performance_change_config": { "ignore_errors": false, "severity_thresholds": null, "num_samples_to_simulate": 100 } }, "empty_string_substitution": { "exclude_columns": [], "run": true, "performance_change_config": { "ignore_errors": false, "severity_thresholds": null, "num_samples_to_simulate": 100 } }, "req_characters_deletion": { "column_specific_params": {}, "exclude_columns": [], "run": true, "performance_change_config": { "ignore_errors": false, "severity_thresholds": null, "num_samples_to_simulate": 100 } }, "unseen_categorical_substitution": { "exclude_columns": [], "run": true, "performance_change_config": { "ignore_errors": false, "severity_thresholds": null, "num_samples_to_simulate": 100 } }, "unseen_domain_substitution": { "exclude_columns": [], "run": true, "performance_change_config": { "ignore_errors": false, "severity_thresholds": null, "num_samples_to_simulate": 100 } }, "unseen_email_substitution": { "exclude_columns": [], "run": true, "performance_change_config": { "ignore_errors": false, "severity_thresholds": null, "num_samples_to_simulate": 100 } }, "unseen_url_substitution": { "exclude_columns": [], "run": true, "performance_change_config": { "ignore_errors": false, "severity_thresholds": null, "num_samples_to_simulate": 100 } }, "null_substitution": { "exclude_columns": [], "run": true, "performance_change_config": { "ignore_errors": false, "severity_thresholds": null, "num_samples_to_simulate": 100 } }, "capitalization_change": { "exclude_columns": [], "run": true, "performance_change_config": { "ignore_errors": false, "severity_thresholds": null, "num_samples_to_simulate": 100 } }, "vulnerability": { "exclude_columns": [], "run": true, "severity_level_thresholds": null, "sample_size": 10, "search_count": 10 }, "sensitivity": { "exclude_columns": [], "run": true, "severity_level_thresholds": null, "linf_constraint": 0.01, "sample_size": 10 }, "multi_feat_vulnerability": { "exclude_columns": [], "run": true, "severity_level_thresholds": null, "l0_constraint": 3, "sample_size": 10, "search_count": 10 }, "multi_feat_sensitivity": { "exclude_columns": [], "run": true, "severity_level_thresholds": null, "l0_constraint": 3, "sample_size": 10, "linf_constraint": 0.01 }, "int_feature_type": { "exclude_columns": [], "run": true, "performance_change_config": { "severity_thresholds": null, "min_num_samples": 10 } }, "float_feature_type": { "exclude_columns": [], "run": true, "performance_change_config": { "severity_thresholds": null, "min_num_samples": 10 } }, "str_feature_type": { "exclude_columns": [], "run": true, "performance_change_config": { "severity_thresholds": null, "min_num_samples": 10 } }, "bool_feature_type": { "exclude_columns": [], "run": true, "performance_change_config": { "severity_thresholds": null, "min_num_samples": 10 } }, "url_feature_type": { "exclude_columns": [], "run": true, "performance_change_config": { "severity_thresholds": null, "min_num_samples": 10 } }, "domain_feature_type": { "exclude_columns": [], "run": true, "performance_change_config": { "severity_thresholds": null, "min_num_samples": 10 } }, "email_feature_type": { "exclude_columns": [], "run": true, "performance_change_config": { "severity_thresholds": null, "min_num_samples": 10 } }, "null_check": { "exclude_columns": [], "run": true, "performance_change_config": { "severity_thresholds": null, "min_num_samples": 10 } }, "null_proportion": { "exclude_columns": [], "run": true, "drift_scaling_factor": 0.005, "performance_change_thresholds": null, "p_value_threshold": 0.05, "min_sample_size": 100 }, "row_null_proportion": { "exclude_columns": [], "run": true, "drift_statistic": "Population Stability Index", "params": { "exclude_columns": [], "run": true, "drift_scaling_factor": 0.005, "performance_change_thresholds": null, "psi_threshold": 0.2 } }, "required_features": { "run": true, "required_feats": null, "allowed_feats": null, "ordered": false, "required_only": false }, "duplicate_rows": { "exclude_columns": [], "run": false }, "feature_leakage": { "exclude_columns": [], "run": true, "severity_thresholds": [ 0.1, 0.2, 0.3 ], "min_mutual_information_requirement": 0.2 }, "demographic_parity": { "exclude_columns": [], "run": true, "min_sample_size": 20, "performance_change_thresholds": null }, "protected_feature_drift": { "exclude_columns": [], "run": true, "drift_statistic": "Chi Squared", "params": { "run": true, "drift_scaling_factor": 0.005, "performance_change_thresholds": null, "min_sample_size": 100, "max_sample_size": null, "p_value_threshold": 0.05 } }, "protected_proxies": { "exclude_columns": [], "run": true, "severity_thresholds": [ 0.2, 0.3, 0.4 ] }, "intersectional_group_fairness": { "exclude_columns": [], "run": true, "min_sample_size": 20, "performance_change_thresholds": null }, "selection_rate": { "exclude_columns": [], "run": true, "min_sample_size": 20, "performance_change_thresholds": [ 0.8, 0.7, 0.6 ] }, "chi_squared_independence": { "run": true, "p_value_thresholds": [ 0.01, 0.05, 0.1 ], "min_sample_size": 100 }, "subset_sensitivity": { "exclude_columns": [], "run": true, "min_sample_size": 20, "performance_change_thresholds": null, "num_samples_to_simulate": 100 } } ```