Input Data Format

Automated Validation

The Python SDK exposes a command-line utility that can automatically validate your input data:

rime-data-format-check <ARGS>

A successful run prints output similar to the following:

Inspecting <REFERENCE_SET>
Done!

Inspecting <EVALUATION_SET>
Done!

---

Your data should work with RIME!

Instructions are available here.

Supported File Formats

RIME Tabular currently supports CSV (.csv) and Parquet (.parquet) files, with task-specific nuances defined below. Each input file should include a header row of string column names, which are used as feature names.

RIME is most effective when both a label column and a prediction column are provided; however, neither is required for most tasks*.

Requirements By Task

Binary Classification

Labels should be integer values 0 or 1
Predictions should be float values between 0 and 1 that represent the probability of the positive class (label = 1)
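
The two binary classification rules above can be checked locally before uploading. The following is a minimal sketch in plain Python, not part of the RIME SDK: the function name check_binary_columns is hypothetical, and it assumes the label and prediction columns have already been read into Python lists.

```python
def check_binary_columns(labels, predictions):
    """Return a list of problems; an empty list means both columns look valid.

    Hypothetical helper for illustration only (not a RIME SDK function).
    """
    problems = []
    for i, label in enumerate(labels):
        # Labels must be the integers 0 or 1 (a float such as 1.0 is flagged).
        if not isinstance(label, int) or label not in (0, 1):
            problems.append(f"row {i}: label {label!r} is not the integer 0 or 1")
    for i, pred in enumerate(predictions):
        # Predictions must be floats in [0, 1]; NaN fails the range comparison.
        if not isinstance(pred, float) or not 0.0 <= pred <= 1.0:
            problems.append(f"row {i}: prediction {pred!r} is not a float in [0, 1]")
    return problems


if __name__ == "__main__":
    for problem in check_binary_columns([0, 1, 2], [0.2, 0.9, 1.3]):
        print(problem)
```

Returning a list of messages rather than raising on the first failure makes it easier to see every offending row in one pass.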

Multi-Class Classification

Labels should be integer class indices
Predictions should be uploaded as a separate .csv or .parquet file. Columns should be ordered, with the ith column representing the probability of the ith class. Each row of predictions should sum to 1.
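
As a rough illustration of the multi-class rules above, the sketch below checks a list of labels against a list of prediction rows. It is not a RIME SDK function. The name check_multiclass and the tolerance tol are hypothetical, and the sketch assumes that class indices are 0-based and that row k of the predictions file corresponds to row k of the data file; the source does not state either point.

```python
import math


def check_multiclass(labels, prediction_rows, tol=1e-6):
    """Return a list of problems; an empty list means the inputs look valid.

    Hypothetical helper for illustration only. Assumes 0-based class indices
    and that labels and prediction rows are aligned by position.
    """
    problems = []
    if len(labels) != len(prediction_rows):
        problems.append("labels and predictions have different row counts")
    for i, (label, row) in enumerate(zip(labels, prediction_rows)):
        n_classes = len(row)
        # The label must index one of the prediction columns.
        if not isinstance(label, int) or not 0 <= label < n_classes:
            problems.append(f"row {i}: label {label!r} is not a valid class index")
        # Every entry is a probability, so it must lie in [0, 1].
        if any(not 0.0 <= p <= 1.0 for p in row):
            problems.append(f"row {i}: a probability falls outside [0, 1]")
        # The class probabilities in a row must sum to 1, up to rounding error.
        if not math.isclose(sum(row), 1.0, abs_tol=tol):
            problems.append(f"row {i}: probabilities sum to {sum(row)!r}, not 1")
    return problems


if __name__ == "__main__":
    print(check_multiclass([0, 2], [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]))
```

The sum is compared with a small absolute tolerance because probabilities written to CSV and read back rarely add up to exactly 1.0 in floating point.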

Ranking

* Labels are required
Labels should be any real number
Predictions should be any real number
ranking_info must be provided in the data configuration

Regression

Labels should be any real number
Predictions should be any real number
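
The "any real number" requirement can be checked with a short helper like the one below. This is a sketch under stated assumptions, not part of the RIME SDK: the name non_real_rows is hypothetical, and treating NaN and infinity as invalid is one interpretation of "real number", not something the source specifies.

```python
import math
from numbers import Real


def non_real_rows(values):
    """Return the row indices whose value is not a finite real number.

    Hypothetical helper for illustration only. NaN and infinities are
    treated as invalid, which is an assumption about the requirement.
    """
    bad = []
    for i, value in enumerate(values):
        # Strings, None, and other non-numeric types fail the Real check;
        # NaN and +/-inf fail the finiteness check.
        if not isinstance(value, Real) or not math.isfinite(value):
            bad.append(i)
    return bad


if __name__ == "__main__":
    print(non_real_rows([1, 2.5, float("nan"), "4"]))
```

The same check applies to both the label and the prediction column, so it can be called once per column.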