RI NYC Taxi and Limousine Data Walkthrough 🚖

In this walkthrough, we’ll run AI Stress Testing and AI Continuous Testing on public NYC Taxi and Limousine Commission data (https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page) to demonstrate how RIME can be used with regression models. For every taxi trip in New York City, the data records the pickup and dropoff locations, the pickup and dropoff times, the fare amount and the number of passengers. We’ll be predicting the duration of each trip from the other information recorded about it.

As you might imagine, the COVID-19 pandemic caused a significant change in the number and nature of the taxi rides that occur in New York City. We’ve included data from 2018 to 2021 for this walkthrough to demonstrate how AI Continuous Testing can help you identify and understand such distribution drifts.


To get started, provide your API token and the URL of your RIME backend so that the SDK can connect to your instance.

[ ]:

API_TOKEN = '' # PASTE THE API TOKEN RECEIVED IN EMAIL
CLUSTER_URL = '' # PASTE DEDICATED BACKEND ENDPOINT

Libraries 📕

Run the cells below to install our SDK and import the libraries used for loading and analyzing the data.

[ ]:

!pip install rime-sdk &> /dev/null

[ ]:

import pandas as pd
from pathlib import Path
from rime_sdk import Client

Data and Model ☁️

Run the cell below to download and unzip a preprocessed dataset and pretrained model based on public NYC Taxi and Limousine Commission data.

[ ]:

!pip install git+https://github.com/RobustIntelligence/ri-public-examples.git
from ri_public_examples.download_files import download_files
download_files('tabular-2.0/nyc_tlc', 'nyc_tlc')

Next, let’s take a quick look at the reference data (in this case, this was the data used to train the model).

[ ]:

pd.read_csv("nyc_tlc/data/ref.csv", nrows=5)

The key columns to look at above are TripDuration, the duration of the trip in seconds, and Prediction, our model’s estimate of that duration. The other columns are features the model uses to predict the trip duration. We’ll now proceed to run RIME Stress Testing on our data and model! We’ll start by creating a project and uploading our datasets and model.
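Before creating the project, it can help to see how TripDuration and Prediction relate numerically. The sketch below computes the mean absolute error (in seconds) between the two columns. The rows are made-up toy values, not taken from ref.csv, and the helper name `mean_absolute_error` is our own for illustration, not part of the RIME SDK.

```python
import pandas as pd

# Toy rows standing in for ref.csv; the numbers are invented for illustration.
toy = pd.DataFrame(
    {
        "TripDuration": [600.0, 900.0, 1200.0, 300.0],  # true duration, seconds
        "Prediction": [660.0, 840.0, 1500.0, 300.0],  # model estimate, seconds
    }
)


def mean_absolute_error(
    df: pd.DataFrame,
    label_col: str = "TripDuration",
    pred_col: str = "Prediction",
) -> float:
    """Average absolute gap, in seconds, between label and prediction."""
    return float((df[label_col] - df[pred_col]).abs().mean())


toy_mae = mean_absolute_error(toy)
print(f"MAE on toy rows: {toy_mae:.1f} s")
```

The same helper could be pointed at a DataFrame that holds both columns, for example one loaded with `pd.read_csv`.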

[ ]:

client = Client(CLUSTER_URL, API_TOKEN)

[ ]:

description = (
    "Run Stress Testing, Continuous Testing and AI Firewall on a"
    " tabular regression model and dataset. Demonstration uses the"
    " NYC Taxi and Limousine Commission trip duration dataset"
    " (https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page)."
)
project = client.create_project(
    'Tabular Regression Demo',
    description,
    "MODEL_TASK_REGRESSION",
)

[ ]:

from datetime import datetime

# Note: All registered models and datasets need to have unique names.
dt = str(datetime.now())
upload_path = "ri_public_examples_nyc_tlc"

model_s3_path = client.upload_directory(Path("nyc_tlc/models"), upload_path=upload_path)
model_id = project.register_model_from_path(f"model_{dt}", model_s3_path + "/model.py")

def upload_and_register_data(dataset, **kwargs):
    s3_path = client.upload_file(Path(f"nyc_tlc/data/{dataset}.csv"), upload_path=upload_path)
    dataset_id = project.register_dataset_from_file(f"{dataset}_{dt}", s3_path, {"label_col": "TripDuration", **kwargs})
    preds_s3_path = client.upload_file(Path(f"nyc_tlc/data/{dataset}_preds.csv"), upload_path=upload_path)
    project.register_predictions_from_file(dataset_id, model_id, preds_s3_path)
    return dataset_id

ref_id = upload_and_register_data("ref")
eval_id = upload_and_register_data("eval")

AI Stress Testing

Next, we’ll create a stress testing configuration specifying relevant metadata for our datasets and model and run stress testing! When running stress testing, the reference data should be data used to train the model, and the evaluation data should be data used to evaluate the model. In this case, the reference and evaluation datasets are random splits of the NYC TLC data collected from 2018.
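For intuition, a random reference/evaluation split like the one described above can be sketched with pandas. This is only an illustration: the walkthrough does not state the split ratio or seed used to prepare ref.csv and eval.csv, so `eval_fraction`, `seed` and the toy rows below are assumptions.

```python
import pandas as pd


def random_split(df: pd.DataFrame, eval_fraction: float = 0.2, seed: int = 0):
    """Randomly hold out a fraction of rows; assumes df has a unique index."""
    eval_df = df.sample(frac=eval_fraction, random_state=seed)
    ref_df = df.drop(eval_df.index)  # everything not held out
    return ref_df, eval_df


# Toy trips; the values are invented for illustration.
trips = pd.DataFrame(
    {
        "TripDistance": [float(i) for i in range(10)],
        "TripDuration": [60.0 * i for i in range(10)],
    }
)
ref_df, eval_df = random_split(trips)
print(len(ref_df), len(eval_df))
```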

[ ]:

stress_test_config = {
  "run_name": "NYC TLC",
  "data_info": {
    "ref_dataset_id": ref_id,
    "eval_dataset_id": eval_id,
  },
  "model_id": model_id
}
stress_test_job = client.start_stress_test(
    stress_test_config,
    project.project_id,
)
stress_test_job.get_status(verbose=True, wait_until_finish=True)

You can view the detailed results in the UI by running the cell below and following the generated link. This page shows granular results for a given AI Stress Test run.

[ ]:

test_run = stress_test_job.get_test_run()
test_run

Stress testing should be used during model development to inform us about various issues with the data and model that we might want to address before the model is deployed.

AI Firewall and Continuous Testing

In this walkthrough, we’ll be focusing on the production setting, where we’ve deployed a model and would like to ensure that it continues to perform well as the underlying data drifts and evolves. To do this, we’ll create an AI Firewall, which allows us to perform AI Continuous Testing. Run the following snippet to create an AI Firewall and set up continuous testing to split the data into 4-week bins.

[ ]:

from datetime import timedelta

firewall = project.create_firewall(model_id, ref_id, timedelta(weeks=4))

Our firewall is ready to go! Next, we’ll upload some incoming production data. The data we’re uploading here spans 2019 through 2021 and will look substantially different from the 2018 data the model saw during training. It will be automatically split into bins based on the timestamp column specified and the bin size set for this AI Firewall. We’ll use AI Continuous Testing to identify the differences and understand how they’re impacting our model.

[ ]:

test_id = upload_and_register_data("test", timestamp_col="PickupDatetime")
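To picture what splitting by timestamp into 4-week bins means, here is a minimal pandas sketch. It counts bins from the earliest timestamp in the data. That alignment is an assumption, since this walkthrough does not describe how RIME places bin boundaries. The helper `assign_bins` and the toy pickup times are ours for illustration.

```python
from datetime import timedelta

import pandas as pd


def assign_bins(timestamps: pd.Series, bin_size: timedelta) -> pd.Series:
    """Integer bin index per row, counted from the earliest timestamp."""
    elapsed = timestamps - timestamps.min()
    return elapsed // pd.Timedelta(bin_size)


# Toy pickup times; the real column is PickupDatetime in the test data.
pickups = pd.Series(
    pd.to_datetime(
        [
            "2019-01-01 08:00",
            "2019-01-20 12:30",
            "2019-01-29 08:00",
            "2019-03-01 09:15",
        ]
    ),
    name="PickupDatetime",
)
bins = assign_bins(pickups, timedelta(weeks=4))
rows_per_bin = bins.value_counts().sort_index()
print(rows_per_bin)
```

Counting rows per bin like this is also a quick way to see changes in ride volume over time.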

[ ]:

ct_job = firewall.start_continuous_test(test_id)
ct_job.get_status(verbose=True, wait_until_finish=True)
firewall

Time to see how our model is doing! Navigate to the “Continuous Tests” tab in the left navigation bar. Here, you’ll see some of the key metrics that are being tracked over time, along with active monitoring alerts.

If we head to the “Operational Risk” page, we can see all of the metrics and alerting related to the operational health of our model and data pipelines. In the “Model Performance” category (available when labels are provided), we can take a look at the Mean Absolute Error plotted over time. The error increases significantly around the spring of 2020, when the COVID-19 pandemic caused a substantial decrease in the number of taxi rides in New York City.
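The kind of series behind such a plot can be sketched as a mean absolute error per 4-week bin. The rows below are invented to mimic an error jump in March 2020. They are not values from the dataset, and the bin alignment (counted from the earliest pickup time) is an assumption rather than RIME’s documented behavior.

```python
import pandas as pd

# Toy production rows; values are invented for illustration.
prod = pd.DataFrame(
    {
        "PickupDatetime": pd.to_datetime(
            [
                "2020-01-06",
                "2020-01-20",
                "2020-02-10",
                "2020-02-24",
                "2020-03-16",
                "2020-03-23",
            ]
        ),
        "TripDuration": [600.0, 900.0, 700.0, 800.0, 400.0, 300.0],
        "Prediction": [630.0, 870.0, 650.0, 850.0, 800.0, 900.0],
    }
)

bin_size = pd.Timedelta(weeks=4)
start = prod["PickupDatetime"].min()

# Label each row with the start date of the 4-week bin it falls into.
bin_index = (prod["PickupDatetime"] - start) // bin_size
prod["bin_start"] = start + bin_index * bin_size

# Absolute error per row, then its average within each bin.
prod["abs_error"] = (prod["TripDuration"] - prod["Prediction"]).abs()
mae_per_bin = prod.groupby("bin_start")["abs_error"].mean()
print(mae_per_bin)
```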

If labels weren’t available (as is often the case with production data) we could look at changes in prediction distributions as a proxy for changes in model performance.
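One common way to quantify a change in prediction distributions is the Population Stability Index (PSI). The sketch below is a generic NumPy implementation on synthetic predictions. The walkthrough does not say which drift statistic RIME computes, so treat PSI here as an illustrative stand-in, and the function name, bin count and synthetic data as assumptions.

```python
import numpy as np


def population_stability_index(reference, production, n_bins=10, eps=1e-6):
    """PSI between two samples, using bins built from reference quantiles."""
    reference = np.asarray(reference, dtype=float)
    production = np.asarray(production, dtype=float)
    # Quantile edges give each bin roughly equal reference mass.
    edges = np.quantile(reference, np.linspace(0.0, 1.0, n_bins + 1))
    # Use interior edges only, so out-of-range values land in the outer bins.
    interior = edges[1:-1]
    ref_counts = np.bincount(
        np.searchsorted(interior, reference, side="right"), minlength=n_bins
    )
    prod_counts = np.bincount(
        np.searchsorted(interior, production, side="right"), minlength=n_bins
    )
    # Clip to avoid log(0) when a bin is empty.
    ref_frac = np.clip(ref_counts / ref_counts.sum(), eps, None)
    prod_frac = np.clip(prod_counts / prod_counts.sum(), eps, None)
    return float(np.sum((prod_frac - ref_frac) * np.log(prod_frac / ref_frac)))


# Synthetic predicted durations in seconds, not taken from the dataset.
rng = np.random.default_rng(0)
ref_preds = rng.normal(900.0, 300.0, 5000)
similar_preds = rng.normal(900.0, 300.0, 5000)
shifted_preds = rng.normal(600.0, 300.0, 5000)

psi_similar = population_stability_index(ref_preds, similar_preds)
psi_shifted = population_stability_index(ref_preds, shifted_preds)
print(f"similar: {psi_similar:.4f}, shifted: {psi_shifted:.4f}")
```

A PSI near zero means the production predictions are distributed like the reference ones; larger values indicate drift that may warrant a closer look even without labels.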

We can also inspect the input data for abnormal values. The overall abnormality rate shows the percentage of abnormal inputs over time, counting all types of abnormalities, including outliers, missing values and unseen categories. We can see that there’s a spike in the abnormality rate in November of 2019, after which it remains high. Further exploration reveals that much of this comes from Numeric Outlier values in the TripDistance feature.
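As a rough illustration of an abnormality rate, the sketch below flags a row as abnormal if it has a numeric outlier, a missing value or a category not seen in the reference data. The outlier rule (1.5 times the interquartile range of the reference data) is our assumption, not RIME’s documented definition. The `PaymentType` column is a hypothetical categorical feature, and all rows are invented.

```python
import pandas as pd


def abnormal_mask(ref, batch, numeric_col, categorical_col):
    """Boolean mask of batch rows that look abnormal relative to ref."""
    q1 = ref[numeric_col].quantile(0.25)
    q3 = ref[numeric_col].quantile(0.75)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Comparisons against NaN are False, so missing values are handled below.
    outlier = (batch[numeric_col] < low) | (batch[numeric_col] > high)
    missing = batch[numeric_col].isna() | batch[categorical_col].isna()
    known = ref[categorical_col].dropna().unique()
    unseen = batch[categorical_col].notna() & ~batch[categorical_col].isin(known)
    return outlier | missing | unseen


# Toy reference and production rows; PaymentType is a hypothetical column.
ref = pd.DataFrame(
    {
        "TripDistance": [1.0, 2.0, 3.0, 4.0, 5.0],
        "PaymentType": ["card", "cash", "card", "cash", "card"],
    }
)
batch = pd.DataFrame(
    {
        "TripDistance": [2.5, 40.0, None, 3.0],
        "PaymentType": ["card", "cash", "card", "voucher"],
    }
)

mask = abnormal_mask(ref, batch, "TripDistance", "PaymentType")
abnormality_rate = float(mask.mean())
print(f"Abnormality rate: {abnormality_rate:.0%}")
```

Computing this rate per time bin would give a series comparable in spirit to the one shown in the UI.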