Input Data Format

Automated Validation

The Python SDK exposes a command-line utility that can automatically validate your input data:

rime-data-format-check <ARGS>

If the data passes validation, the utility prints output similar to the following:

Inspecting <REFERENCE_SET>
Done!

Inspecting <EVALUATION_SET>
Done!

---

Your data should work with RIME!

Instructions are available here.


Supported File Formats

RIME Tabular currently supports both CSV (.csv) and Parquet (.parquet) file formats, with task-specific nuances defined below. Input files should have string column headers; these are used as feature names.

RIME is most effective when both label and prediction columns are provided; however, neither is required for most tasks*.

Requirements By Task

Binary Classification

  • Labels should be integer values 0 or 1

  • Predictions should be float values between 0 and 1 that represent the positive class (label = 1) probability
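The binary classification requirements above can be checked locally before running the validation utility. The following is a minimal sketch using only the Python standard library, not part of the RIME SDK. The column names `label` and `pred`, the helper `check_binary_rows`, and the sample data are all hypothetical, and the sketch assumes both columns are present.

```python
import csv
import io


def check_binary_rows(rows, label_col="label", pred_col="pred"):
    """Return a list of (row_number, message) problems; empty if none.

    `label_col` and `pred_col` are hypothetical column names.
    """
    problems = []
    for i, row in enumerate(rows, start=1):
        # Labels must be the integers 0 or 1.
        label = row.get(label_col)
        if label not in ("0", "1"):
            problems.append((i, f"label {label!r} is not 0 or 1"))
        # Predictions must parse as floats in the closed interval [0, 1].
        try:
            pred = float(row.get(pred_col))
        except (TypeError, ValueError):
            problems.append((i, "prediction is not a float"))
            continue
        if not 0.0 <= pred <= 1.0:
            problems.append((i, f"prediction {pred} is outside [0, 1]"))
    return problems


# Made-up sample data: the third row violates both requirements.
sample = io.StringIO("age,label,pred\n34,1,0.91\n51,0,0.12\n29,2,1.4\n")
problems = check_binary_rows(csv.DictReader(sample))
for row_number, message in problems:
    print(f"row {row_number}: {message}")
```

To check a file on disk, pass `csv.DictReader(open(path, newline=""))` in place of the in-memory sample.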

Multi-Class Classification

  • Labels should be integers referring to class index

  • Predictions should be uploaded as a separate .csv or .parquet file. Columns should be ordered, with the ith column representing the probability of the ith class. The predicted probabilities in each row should sum to 1.
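As an illustration of the multi-class layout, the sketch below pairs a data file with a separate predictions file and checks each row. It is not part of the RIME SDK. It assumes zero-based class indices (the source does not state the base), a hypothetical label column named `label`, and an arbitrary tolerance of 1e-6 for the sum check.

```python
import csv
import io
import math


def check_multiclass(data_file, preds_file, label_col="label", tol=1e-6):
    """Return (row_number, message) problems for a data/predictions pair.

    Assumes zero-based class indices; `label_col` and `tol` are
    hypothetical choices, not values defined by RIME.
    """
    problems = []
    data_rows = csv.DictReader(data_file)
    pred_rows = csv.reader(preds_file)
    next(pred_rows)  # skip the header row of the predictions file
    for i, (data_row, pred_row) in enumerate(zip(data_rows, pred_rows), start=1):
        probs = [float(value) for value in pred_row]
        # Each row of class probabilities should sum to 1.
        if not math.isclose(sum(probs), 1.0, abs_tol=tol):
            problems.append((i, f"probabilities sum to {sum(probs):.4f}"))
        # The label should index one of the prediction columns.
        label = int(data_row[label_col])
        if not 0 <= label < len(probs):
            problems.append((i, f"label {label} has no prediction column"))
    return problems


# Made-up sample data: the third row fails both checks.
data = io.StringIO("feature,label\n1.5,0\n2.0,2\n0.3,3\n")
preds = io.StringIO("p0,p1,p2\n0.7,0.2,0.1\n0.1,0.1,0.8\n0.5,0.4,0.3\n")
problems = check_multiclass(data, preds)
for row_number, message in problems:
    print(f"row {row_number}: {message}")
```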

Ranking

  • * Labels are required for this task

  • Labels should be any real number

  • Predictions should be any real number

  • ranking_info must be provided in the data configuration

Regression

  • Labels should be any real number

  • Predictions should be any real number
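The "any real number" requirement for ranking and regression can be approximated with a small parsing check. This is a sketch under one stated assumption: it treats NaN and infinite values as invalid because they are not real numbers, which is an interpretation of the requirement rather than documented RIME behavior. The helper name `is_real_number` is hypothetical.

```python
import math


def is_real_number(text):
    """Return True if `text` parses as a finite float.

    Rejecting NaN and infinity is an assumption of this sketch.
    """
    try:
        return math.isfinite(float(text))
    except (TypeError, ValueError):
        return False


# Made-up values as they might appear in a label or prediction column.
values = ["3.5", "-2e3", "0", "nan", "inf", "abc", ""]
invalid = [value for value in values if not is_real_number(value)]
print(invalid)
```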