Hugging Face Datasets
RIME now offers a data provider that integrates natively with Hugging Face’s load_dataset API. To use it, set "type": "huggingface" in the data_info configuration argument and give the dataset to load in dataset_uri.
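As a rough orientation only, the sketch below shows what loading a reference split and an evaluation split through load_dataset looks like when done by hand. It is not RIME's implementation. The helper name load_ref_and_eval and its injectable loader parameter are hypothetical, added so the sketch can run without the datasets library. With the real library, load_dataset(path, **kwargs) returns a DatasetDict that can be indexed by split name.

```python
from typing import Any, Callable, Dict, Optional, Tuple


def load_ref_and_eval(
    dataset_uri: str,
    ref_split: str = "train",
    eval_split: str = "test",
    loading_params: Optional[Dict[str, Any]] = None,
    loader: Optional[Callable[..., Any]] = None,
) -> Tuple[Any, Any]:
    """Hypothetical helper (not RIME's code): return (reference, evaluation) splits."""
    if loader is None:
        # Imported lazily so the sketch can be exercised with a stand-in loader.
        from datasets import load_dataset

        loader = load_dataset
    # Extra keyword arguments (e.g. a configuration name) are forwarded unchanged.
    splits = loader(dataset_uri, **(loading_params or {}))
    # A DatasetDict is indexed by split name.
    return splits[ref_split], splits[eval_split]
```

Under these assumptions, load_ref_and_eval("imdb") would return the train and test splits of the public IMDB dataset, whose "text" and "label" features match the default text_key and label_key.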
Data Info Configuration
{
  "data_info": {
    "type": "huggingface",
    "dataset_uri": "path",
    "ref_split": "train",
    "eval_split": "test",
    "text_key": "text",
    "label_key": "label",
    "eval_label_key": "label",
    "loading_params": {}
  },
  ...
}
Only "type" and "dataset_uri" are required; the remaining data_info keys are optional.
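For illustration, the snippet below builds only the data_info portion of a configuration for a dataset whose layout differs from the defaults. It assumes the GLUE SST-2 configuration on the Hugging Face Hub (path "nyu-mll/glue", configuration name "sst2"), where the input text is stored in a "sentence" feature and the test split's labels are hidden, so the labelled validation split is used for evaluation. The dataset choice is an assumption made for the example, not a recommendation from this page.

```python
import json

# Illustrative data_info for GLUE SST-2 (dataset choice is an assumption).
config = {
    "data_info": {
        "type": "huggingface",
        "dataset_uri": "nyu-mll/glue",
        "ref_split": "train",
        # The SST-2 test split has hidden labels, so evaluate on validation.
        "eval_split": "validation",
        # SST-2 stores its input text in "sentence" rather than "text".
        "text_key": "sentence",
        "label_key": "label",
        "eval_label_key": "label",
        # Forwarded to load_dataset to select the "sst2" configuration.
        "loading_params": {"name": "sst2"},
    }
}

# Serialize to JSON, e.g. for inclusion in a larger configuration file.
print(json.dumps(config, indent=2))
```

The other top-level keys of a full configuration (the "..." above) are omitted here.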
Arguments
dataset_uri: string, required
  The path or name of the dataset.
ref_split: string, default = "train"
  The name of the split to use for the reference set.
eval_split: string, default = "test"
  The name of the split to use for the evaluation set.
text_key: string, default = "text"
  The name of the feature holding the input text.
label_key: string or null, default = "label"
  The name of the feature holding the classification label in the reference set. If null, labels are assumed not to be provided for the reference set.
eval_label_key: string or null, default = "label"
  The name of the feature holding the classification label in the evaluation set. If null, labels are assumed not to be provided for the evaluation set.
loading_params: object or null, default = null
  Additional keyword arguments passed to load_dataset. This can be used, for example, to specify a dataset configuration name when multiple configurations are available.
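The sketch below illustrates how the arguments above fit together: required keys are checked, documented defaults are filled in, and a null label key yields an unlabelled split. The helpers resolve_data_info and split_columns and the DEFAULTS table are hypothetical names written for this example; they show one plausible reading of the argument reference, not how RIME processes the configuration internally. Rows are plain dicts, which is also what iterating over a Hugging Face Dataset yields.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

# Defaults taken from the argument reference (hypothetical table).
DEFAULTS: Dict[str, Any] = {
    "ref_split": "train",
    "eval_split": "test",
    "text_key": "text",
    "label_key": "label",
    "eval_label_key": "label",
    "loading_params": None,
}


def resolve_data_info(data_info: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: validate required keys and fill in defaults."""
    if data_info.get("type") != "huggingface":
        raise ValueError('data_info["type"] must be "huggingface"')
    if not data_info.get("dataset_uri"):
        raise ValueError('data_info["dataset_uri"] is required')
    # User-supplied values (including explicit nulls) override the defaults.
    return {**DEFAULTS, **data_info}


def split_columns(
    rows: Iterable[Dict[str, Any]], text_key: str, label_key: Optional[str]
) -> Tuple[List[Any], Optional[List[Any]]]:
    """Hypothetical helper: extract texts and, if label_key is set, labels."""
    rows = list(rows)
    texts = [row[text_key] for row in rows]
    # A null (None) label key means the split is treated as unlabelled.
    labels = None if label_key is None else [row[label_key] for row in rows]
    return texts, labels
```

For example, resolve_data_info({"type": "huggingface", "dataset_uri": "imdb", "eval_label_key": None}) keeps the default "train"/"test" splits and "label" reference labels, while split_columns applied to the evaluation rows with the resolved eval_label_key returns labels of None.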