# Improving Model Performance

{{ rime_library_setup_note }}

## Overview

This tutorial will guide you through various ways to use RIME to improve model performance. For more information, please see [the corresponding reference](../reference/performance.rst).

## Preprocessing

The RIME Python Library offers standard preprocessing methods to automatically handle tasks like mapping categorical values to numbers or imputing null values.

To use it, import the function and apply it to your existing DataFrame:
```python
from rime.tabular.performance import preprocess_df

# Apply RIME's standard preprocessing (e.g., categorical encoding, null imputation).
preprocessed_df = preprocess_df(df)
```
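
To sanity-check the result (standard pandas, not part of the RIME API), you can confirm that categorical columns are now numeric and that null values have been imputed:

```python
# Inspect the preprocessed output: categorical columns should now be numeric,
# and previously null values should have been filled in.
print(preprocessed_df.dtypes)
print(preprocessed_df.isnull().sum())
```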

## Active Learning
RIME also provides functionality for "active learning".

Concretely, RIME can take an unlabeled dataset and one or more models and suggest the points that would be most valuable to label.

To use this functionality, let's first import the relevant functions:

```python
from rime.tabular.performance import single_model_active_learning, two_model_active_learning
```

We can then request the `N` points that would be most valuable to label:
```python
N = 10
indices_to_label = single_model_active_learning(df, model_wrapper, N)
```

This returns the indices of the rows in the original DataFrame that would be most valuable to label.
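
For example, you could pull out those rows for annotation. This is a minimal sketch that assumes the returned values index into `df` by position; if they are DataFrame index labels instead, use `df.loc`:

```python
# Select the suggested rows for labeling.
rows_to_label = df.iloc[indices_to_label]
print(rows_to_label)
```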

We also expose functionality that does this using two models. This can be useful if you have two versions of a model (trained either over the same dataset or over different slices). Given a second container (`container_2`), we can then do:

```python
# Pull the underlying model wrapper out of the second container.
model_wrapper_2 = container_2.model.base_model
indices_to_label = two_model_active_learning(df, model_wrapper, model_wrapper_2, N)
```

Like the single-model function, this returns the indices of points that would be most valuable to label.

Note that both of these algorithms involve some randomness; pass the `seed` parameter if you want deterministic results.
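
For example (a minimal sketch, assuming `seed` is accepted as a keyword argument):

```python
# Fix the random seed so repeated runs suggest the same points.
indices_to_label = single_model_active_learning(df, model_wrapper, N, seed=42)
```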