Data Configuration
A data source is configured by specifying a mapping under the data_info argument of the main RIME JSON configuration file.
NOTE: AI Continuous Testing requires predictions. Specify either pred_col, or both ref_pred_path and eval_pred_path.
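The prediction requirement in the note above can be checked before starting a run. The following is a minimal sketch, not part of RIME: the helper name has_predictions is hypothetical, and it inspects only the keys documented on this page.

```python
def has_predictions(data_info):
    """Return True if data_info supplies predictions in one of the two documented ways."""
    # Option 1: a prediction column inside the data files themselves.
    if data_info.get("pred_col") is not None:
        return True
    # Option 2: separate prediction files for BOTH the reference and evaluation data.
    return (
        data_info.get("ref_pred_path") is not None
        and data_info.get("eval_pred_path") is not None
    )
```

A configuration that sets only one of the two prediction paths would fail this check, which matches the wording of the note.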
Template
{
    "data_info": {
        "ref_path": "path/to/ref.csv",        (REQUIRED)
        "eval_path": "path/to/eval.csv",      (REQUIRED)
        "label_col": "Label",
        "pred_col": "Prediction",             (WORKS FOR ALL TASKS EXCEPT MULTI-CLASS, REQUIRED FOR CONTINUOUS TESTING)
        "ref_pred_path": "path/to/ref/preds.csv",  (ONLY SPECIFY FOR MULTI-CLASS, REQUIRED FOR CONTINUOUS TESTING)
        "eval_pred_path": "path/to/eval/preds.csv",  (ONLY SPECIFY FOR MULTI-CLASS, REQUIRED FOR CONTINUOUS TESTING)
        "nrows": null,
        "categorical_features": null,
        "loading_kwargs": null,
        "ranking_info": null,
        "protected_features": null
    },
    ...
}
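The parenthesized annotations in the template are explanatory and are not valid JSON, so a real configuration file should leave them out. As an illustration, the sketch below builds a data_info mapping in Python and serializes it. It assumes a task where predictions live in a column of the data files; the paths are the placeholders from the template, and any other top-level keys of the configuration are left out.

```python
import json

# Placeholder values; only ref_path and eval_path are required.
data_info = {
    "ref_path": "path/to/ref.csv",
    "eval_path": "path/to/eval.csv",
    "label_col": "Label",
    "pred_col": "Prediction",
    "nrows": None,  # Python None is serialized as JSON null
}

# Other top-level keys of the RIME configuration are omitted in this sketch.
config = {"data_info": data_info}
config_text = json.dumps(config, indent=4)
print(config_text)
```

Optional keys that keep their default of null can also be dropped from the mapping entirely.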
Arguments
ref_path: string, required
    Path to reference data file.
eval_path: string, required
    Path to evaluation data file.
label_col: string or null, default = null
    Name of column in data that corresponds to the labels.
pred_col: string or null, default = null
    Name of column in data that corresponds to the predictions.
ref_pred_path: string or null, default = null
    Path to a csv or parquet file containing the predictions on the reference dataset. This is how predictions are specified for multi-class models.
eval_pred_path: string or null, default = null
    Path to a csv or parquet file containing the predictions on the evaluation dataset. This is how predictions are specified for multi-class models.
nrows: int or null, default = null
    Number of rows of data to load and test. If null, will load all rows.
categorical_features: list or null, default = null
    List of categorical features in data. If provided, these should be ALL the categorical features. If null, RIME will automatically determine whether a column is categorical or not.
loading_kwargs: mapping, default = null
    Keyword arguments to be passed to the pandas loading function (either pd.read_csv or pd.read_parquet, depending on your data format). NOTE: if you wish to specify nrows, this should NOT be done with these kwargs but rather with the nrows parameter above.
ranking_info: mapping, default = null
    Arguments to be used for Ranking tasks. If you are not running RIME on a Ranking task, this value should be null. If you are running on a Ranking task, the following keys should be provided:
    query_col: string, required
        Name of column in dataset that contains the query ids.
    nqueries: int or null, default = null
        Number of queries to consider when running RIME. If null, will use all queries.
    nrows_per_query: int or null, default = null
        Number of rows to use per query when running RIME. If null, will use all rows.
    drop_query_id: bool, default = True
        Whether to drop the query ID column from the dataset to avoid passing it as a feature to the model.
protected_features: list or null, default = null
    List of protected features in data. If the Compliance category is added to categories in the test config (see TestSuiteConfig()) and protected_features is provided, a set of compliance tests will be run over the protected features.
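The rule that nrows belongs in its own parameter rather than in loading_kwargs can be illustrated with a small helper. This is a sketch under stated assumptions: build_read_kwargs is a hypothetical name, it covers only the pd.read_csv case, and how RIME combines these values internally is not described on this page.

```python
def build_read_kwargs(nrows=None, loading_kwargs=None):
    """Combine nrows and loading_kwargs into one set of reader keyword arguments."""
    # Copy so the caller's mapping is never modified; null/None means no extra kwargs.
    kwargs = dict(loading_kwargs or {})
    # Reject the combination the documentation warns against.
    if "nrows" in kwargs:
        raise ValueError("set nrows with the nrows parameter, not loading_kwargs")
    # Only pass nrows through when a row limit was actually requested.
    if nrows is not None:
        kwargs["nrows"] = nrows
    return kwargs
```

The result could then be passed along as, for example, pd.read_csv("path/to/ref.csv", **build_read_kwargs(1000, {"sep": ";"})).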