RI Image Classification Walkthrough

▶️ Try this in Colab! Run the RI Image Classification Walkthrough in Google Colab.

You are a data scientist working for a wildlife research foundation. The data science team has been tasked with implementing an animal classifier and monitoring how that model performs over time. The performance of this model directly impacts the foundation's bottom line. To ensure the team develops the best possible model and that its performance doesn't degrade over time, the VP of Data Science purchases the RIME platform.

In this Notebook Walkthrough, we will walk through two of RIME's core products - AI Stress Testing and AI Continuous Testing.

  1. AI Stress Testing is used in the model development stage. Using AI Stress Testing, you can test the developed model. RIME goes beyond simply measuring basic performance metrics like accuracy and automatically discovers the model's weaknesses.

  2. AI Continuous Testing is used after the model is deployed in production. Using AI Continuous Testing, you can automate the monitoring, discovery and remediation of issues that occur post-deployment.

Install Dependencies, Import Libraries and Download Data

Run the cells below to install our SDK, import the required libraries, and download the data for this walkthrough.

[ ]:
!pip install rime-sdk &> /dev/null

[ ]:
from rime_sdk import Client
from pathlib import Path

[ ]:
!pip install https://github.com/RobustIntelligence/ri-public-examples/archive/master.zip

from ri_public_examples.download_files import download_files

download_files("images/classification/awa2", "awa2")

Establish the RIME Client

To get started, provide the API credentials and the base domain/address of the RIME Cluster. You can generate and copy an API token from the API Access Tokens Page under Workspace settings. For the domain/address of the RIME Cluster, contact your admin.

Image of getting an API token
Image of creating an API token

[ ]:
API_TOKEN = '' # PASTE API_KEY
CLUSTER_URL = '' # PASTE DEDICATED DOMAIN OF RIME SERVICE (e.g., https://rime.example.rbst.io)
AGENT_ID = '' # PASTE AGENT_ID IF USING AN AGENT THAT IS NOT THE DEFAULT

rime_client = Client(CLUSTER_URL, API_TOKEN)

Create a New Project

You can create projects in RIME to organize your test runs. Each project represents a workspace for a given machine learning task. It can contain multiple candidate models, but should only contain one promoted production model.

[ ]:
description = (
    "Run Stress Testing and Continuous Testing on an "
    "image classification model and dataset. Demonstration uses "
    "the Animals with Attributes 2 (AwA2) dataset."
)
project = rime_client.create_project(
    'Image Classification Demo',
    description,
    "MODEL_TASK_MULTICLASS_CLASSIFICATION",
)
project

Go back to the UI or click on the link above to see the Project.

Preparing the Model + Datasets

For this demo, we are going to use the predictions of an image classification model for animals. The dataset we will be using is Animals with Attributes 2 (AwA2), a benchmark image dataset that records features and labels for numerous animals in the wild. The model you have trained is a ResNet (resnet18) designed to predict on the images in this diverse dataset.

The model classifies an image into one of a number of different categories, such as:

  1. Sheep

  2. Killer Whale

  3. Monkey

We now want to kick off RIME Stress Tests that will help us evaluate the model in further depth, beyond basic performance metrics like accuracy, precision, and recall.

Define the Model Interface

We will load and execute this model natively in the RIME product. To do so, a function predict_dict() (or predict_df()) must be implemented in a Python file that is uploaded to the RIME S3 bucket along with the model artifact(s).

For more details on implementing this function, see “Defining a Model Interface” in the product documentation.

The implementation for this ResNet model is shown below:

from typing import Dict, List
from pathlib import Path

import numpy as np
import torch
import torch.nn as nn
from torchvision.io import read_image, ImageReadMode
import torchvision.models as models
import torchvision.transforms as transforms


IMG_SIZE=224
NUM_CLASSES=40
NUM_FEATURES=512
MODEL_FOLDER_PATH = Path(__file__).parent.absolute()


class Net(nn.Module):
    def __init__(self, backbone, features_size, num_classes):
        super(Net, self).__init__()
        # Resnet Backbone (includes avg pooling layer, takes off last FC layer)
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.out = nn.Linear(features_size, num_classes)

    def forward(self, inputs):
        """Returns network outputs and the features """
        # put images through ResNet backbone
        img_features = self.features(inputs)
        img_features = torch.flatten(img_features, start_dim=1)
        outputs = self.out(img_features)
        return outputs


backbone = models.resnet18(pretrained=False)
model = Net(backbone, NUM_FEATURES, NUM_CLASSES)
model.load_state_dict(
    torch.load(
        MODEL_FOLDER_PATH / "model.pt",
        map_location=torch.device('cpu')
    )
)
model.eval()
train_mean = [0.485, 0.456, 0.406]
train_std = [0.229, 0.224, 0.225]
img_normalize = transforms.Normalize(mean=train_mean, std=train_std)
transform = transforms.Compose(
    [
        transforms.Resize((IMG_SIZE, IMG_SIZE)),
        transforms.ConvertImageDtype(torch.float),
        img_normalize,
    ]
)


def predict_dict(x: dict) -> np.ndarray:
    """Predicts class probabilities for a single datapoint."""
    with torch.no_grad():
        # Load the image referenced by the datapoint as a uint8 CxHxW tensor.
        image = read_image(x["image_path"], mode=ImageReadMode.RGB)
        # Resize, convert to float, normalize, and add a batch dimension.
        image = transform(image)
        image = torch.unsqueeze(image, 0)
        output = model(image)
        probs = torch.squeeze(torch.softmax(output, dim=1))
    return np.array(probs)
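
Before uploading the interface, you can sanity-check it locally by calling predict_dict() on a single datapoint. The snippet below is a minimal sketch that assumes it runs alongside the interface code above (with model.pt available); the local image path is a hypothetical example. The returned array should contain 40 probabilities summing to approximately 1.

# Hypothetical local path to one of the dataset's images.
sample = {"image_path": "awa2/data/JPEGImages/rabbit_10567.jpg"}
probs = predict_dict(sample)
print(probs.shape)          # (40,)
print(probs.sum())          # ~1.0
print(int(probs.argmax()))  # index of the predicted class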

Uploading Artifacts to Blob Storage

For SaaS environments using the default S3 storage location, the Python SDK supports direct file uploads using upload_*().

For other environments and storage technologies, artifacts must be managed through alternate means.

[ ]:
IS_SAAS = False # TOGGLE True/False (Note: SaaS environments use URLs ending in "rbst.io" and have an "Internal Agent")

[ ]:
if not IS_SAAS:
    BLOB_STORE_URI = "" # PROVIDE BLOB STORE URI (e.g., "s3://acmecorp-rime")
    assert BLOB_STORE_URI != ""

UPLOAD_PATH = "ri_public_examples_awa2"

This file and the pre-trained model artifact should be uploaded to the S3 bucket in a directory, like so:

s3://acmecorp-rime/
└── ri_public_examples_awa2/
    └── models/
        ├── awa2_cpu.py # The model interface.
        └── model.pt    # The pre-trained model artifact.
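
For self-hosted environments, one way to stage these files is with boto3. The snippet below is a minimal sketch; the bucket name and local paths are hypothetical and should be adjusted to your environment.

import boto3

s3 = boto3.client("s3")

# Upload the model interface and the pre-trained model artifact (hypothetical paths).
s3.upload_file("awa2/models/awa2_cpu.py", "acmecorp-rime", "ri_public_examples_awa2/models/awa2_cpu.py")
s3.upload_file("awa2/models/model.pt", "acmecorp-rime", "ri_public_examples_awa2/models/model.pt")
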
[ ]:
if IS_SAAS:
    model_directory = rime_client.upload_directory(
        Path("awa2/models"), upload_path=UPLOAD_PATH
    )
    model_path = model_directory + "/awa2_cpu.py"
else:
    model_path = f"{BLOB_STORE_URI}/{UPLOAD_PATH}/models/awa2_cpu.py"

Prepare the Datasets

Datasets

Datasets used for Image Classification are treated similarly to those for other multi-class classification tasks; however, because these models require loading many separate image files, an index file is used to define how each image file can be accessed.

This is a JSON list with the following schema:

[
  {
    "image_path": "s3://acmecorp-rime/ri_public_examples_awa2/data/JPEGImages/rabbit_10567.jpg",
    "label": 22
  },
  {
    "image_path": "s3://acmecorp-rime/ri_public_examples_awa2/data/JPEGImages/gorilla_10007.jpg",
    "label": 15
  },
  ...
]
  • image_path: The remote path (S3 URI in this case) to the actual image file.

  • label: The index corresponding to the correct classification for this image.

Each index file represents a single dataset, so two index files should be made: one to represent the train set and another for the test set.
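
An index file like this can be produced with a few lines of Python. The snippet below is a minimal sketch; the (image URI, label) pairs and the output file name are hypothetical placeholders for your own data.

import json

# Hypothetical (image S3 URI, label index) pairs for the train split.
train_examples = [
    ("s3://acmecorp-rime/ri_public_examples_awa2/data/JPEGImages/rabbit_10567.jpg", 22),
    ("s3://acmecorp-rime/ri_public_examples_awa2/data/JPEGImages/gorilla_10007.jpg", 15),
]

index = [{"image_path": path, "label": label} for path, label in train_examples]

# Write the train index file; repeat with the test split to produce the second index file.
with open("train_inputs_trial.json", "w") as f:
    json.dump(index, f, indent=2)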

Prediction Logs (Recommended)

Prediction logs for image classification can be provided as JSON lists as well, taking the following schema:

[
  [
    1.1104089002456075e-11,
    3.6050631924133825e-10,
    4.1508294722951724e-20,
    7.330719148032627e-13,
    9.293689337846649e-18,
    3.868183646115429e-14,
    1.704536202851159e-06,
    2.4145534558567857e-20,
    2.8761520942666152e-11,
    1.7608596601225573e-13,
    4.269430989300993e-11,
    1.116736217388059e-11,
    7.042922902655846e-07,
    9.716905014986454e-13,
    1.7269354671120438e-18,
    6.485594957761229e-17,
    7.252261366040482e-15,
    2.3613219823914733e-08,
    4.920715087296506e-15,
    3.4614554078545067e-15,
    6.401289276425359e-09,
    1.0357405715423831e-13,
    0.999997615814209,
    2.574246371622735e-10,
    2.507005291297465e-13,
    1.0043908760248854e-10,
    8.996452888210271e-11,
    1.1165354862896493e-08,
    5.0673867879602597e-11,
    4.4046667355135405e-13,
    7.908640749394995e-16,
    2.0289467883571888e-08,
    9.762042285643702e-10,
    6.997168686476152e-13,
    5.562191263130956e-10,
    6.239210708830989e-23,
    4.135972814234279e-12,
    9.318691202733309e-16,
    1.6564650431887982e-15,
    4.727699351794734e-13
  ],
  ...
]

Each value corresponds to a prediction for the nth class, where n corresponds to the index of the class.

(e.g., in the above list, the model has predicted the class corresponding to index 22: a "rabbit")
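
If you need to produce these prediction logs yourself, you can run the model interface over an index file and dump each probability vector to JSON. This is a minimal sketch, assuming predict_dict() from the interface above is importable and that the image_path values resolve to files readable on the local machine; the file names are hypothetical.

import json
import numpy as np

# Hypothetical local copy of the index file for one dataset.
with open("train_inputs_trial.json") as f:
    index = json.load(f)

# One probability vector (length 40) per datapoint, in the same order as the index file.
preds = [predict_dict(datapoint).tolist() for datapoint in index]

with open("train_preds_trial.json", "w") as f:
    json.dump(preds, f)

# Recover the predicted class index for the first datapoint, e.g., 22 -> "rabbit".
print(int(np.argmax(preds[0])))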

All datasets for this example should be uploaded to the RIME S3 bucket in a directory, like so:

s3://acmecorp-rime/
└── ri_public_examples_awa2/
    └── data/
        ├── train_inputs_trial.json   # Index file for train set.
        ├── train_preds_trial.json    # Prediction logs for train set.
        ├── test_inputs_trial.json    # Index file for test set.
        ├── test_preds_trial.json     # Prediction logs for test set.
        └── JPEGImages/               # Directory of raw image files.
            ├── rabbit_10567.jpg
            ├── gorilla_10007.jpg
            └── ...
[ ]:
if IS_SAAS:
    ref_inputs_local_path = "awa2/data/train_inputs_trial.json"
    eval_inputs_local_path = "awa2/data/test_inputs_trial.json"
    _, ref_inputs_path = rime_client.upload_local_image_dataset_file(
        ref_inputs_local_path, ["image_path"], upload_path=UPLOAD_PATH
    )
    _, eval_inputs_path = rime_client.upload_local_image_dataset_file(
        eval_inputs_local_path, ["image_path"], upload_path=UPLOAD_PATH
    )

    ref_preds_path = rime_client.upload_file(
        Path("awa2/data/train_preds_trial.json"), upload_path=UPLOAD_PATH
    )
    eval_preds_path = rime_client.upload_file(
        Path("awa2/data/test_preds_trial.json"), upload_path=UPLOAD_PATH
    )
else:
    # The reference and evaluation dataset index files.
    ref_inputs_path = f"{BLOB_STORE_URI}/{UPLOAD_PATH}/data/train_inputs_trial.json"
    eval_inputs_path = f"{BLOB_STORE_URI}/{UPLOAD_PATH}/data/test_inputs_trial.json"

    # Prediction logs for the reference and evaluation datasets.
    ref_preds_path = f"{BLOB_STORE_URI}/{UPLOAD_PATH}/data/train_preds_trial.json"
    eval_preds_path = f"{BLOB_STORE_URI}/{UPLOAD_PATH}/data/test_preds_trial.json"

Register the Artifacts

[ ]:
class_names = [
    "antelope",
    "grizzly+bear",
    "killer+whale",
    "beaver",
    "dalmatian",
    "horse",
    "german+shepherd",
    "blue+whale",
    "siamese+cat",
    "skunk",
    "mole",
    "tiger",
    "moose",
    "spider+monkey",
    "elephant",
    "gorilla",
    "ox",
    "fox",
    "sheep",
    "hamster",
    "squirrel",
    "rhinoceros",
    "rabbit",
    "bat",
    "giraffe",
    "wolf",
    "chihuahua",
    "weasel",
    "otter",
    "buffalo",
    "zebra",
    "deer",
    "bobcat",
    "lion",
    "mouse",
    "polar+bear",
    "collie",
    "walrus",
    "cow",
    "dolphin",
]

data_info = {
    "image_features": ["image_path"],
    "label_col": "label",
    "class_names": class_names,
}
[ ]:
from datetime import datetime

dt = str(datetime.now())

# All registered resources need to have unique names, so we append the current
# timestamp in case this notebook is rerun.
model_id = project.register_model_from_path(f"model_{dt}", model_path, agent_id=AGENT_ID)
[ ]:
ref_id = project.register_dataset_from_file(
    f"ref_set_{dt}",
    ref_inputs_path,
    data_info,
    agent_id=AGENT_ID
)
eval_id = project.register_dataset_from_file(
    f"eval_set_{dt}",
    eval_inputs_path,
    data_info,
    agent_id=AGENT_ID
)
[ ]:
project.register_predictions_from_file(
    ref_id, model_id, ref_preds_path, agent_id=AGENT_ID
)
project.register_predictions_from_file(
    eval_id, model_id, eval_preds_path, agent_id=AGENT_ID
)

Run a Stress Test

AI Stress Tests allow you to test your data and model before deployment. They are a comprehensive suite of hundreds of tests that automatically identify implicit assumptions and weaknesses of pre-production models. Each stress test is run on a single model and its associated reference and evaluation datasets.

Below is a sample configuration showing how to set up and run a RIME Stress Test for images.

[ ]:
stress_test_config = {
    "run_name": "Image Classification AWA2",
    "data_info": {
        "ref_dataset_id": ref_id,
        "eval_dataset_id": eval_id,
    },
    "model_id": model_id,
    "categories": [
        "TEST_CATEGORY_TYPE_TRANSFORMATIONS",
        "TEST_CATEGORY_TYPE_ADVERSARIAL",
        "TEST_CATEGORY_TYPE_SUBSET_PERFORMANCE",
        "TEST_CATEGORY_TYPE_DRIFT",
    ]
}
stress_job = rime_client.start_stress_test(stress_test_config, project.project_id, agent_id=AGENT_ID)
stress_job.get_status(verbose=True, wait_until_finish=True)

Analyze the Stress Test Results

Stress tests are grouped first by risk category and then by test categories that measure various aspects of model robustness (subset performance, distribution drift, adversarial robustness, transformations). Key findings to improve your model are aggregated at the category level as well. Tests are ranked by a shared severity metric by default. Clicking on an individual test surfaces more detailed information.

You can view the detailed results in the UI by running the cell below and following the generated link. The resulting page shows granular results for a given AI Stress Test run.

[ ]:
test_run = stress_job.get_test_run()
test_run

Analyzing the Results

Below you can see a snapshot of the results. Some of these tests, such as the Subset Performance tests, analyze how your model performs on different subsets of your data defined by image metadata, while others, such as the Transformation tests, analyze how your model reacts to augmented and perturbed images.

Image of stress tests results for an image classification model

Subset Performance Tests

Here are the results of the Subset Performance tests. These can be thought of as more granular performance tests that identify subsets of your data, defined by image metadata, on which the model underperforms. They help ensure that the model works equally well across different styles of images.

Image of subset performance results for an image classification model

Below we are exploring the “Subset F1 score” test cases for the image metadata feature ImageBrightness. We can see that even though the model has an overall F1 score of 0.52, it performs poorly on images at the tails of the brightness distribution - images that are either very dim or very bright.

Image of subset results for an image brightness feature of an image classification model

Transformation Tests

The results of the transformation tests are below. These can be thought of as ways to test your model's response to augmented image data, which often occurs in the real world. They help ensure that your model is invariant to such changes in your data.

Image of transformation test results for an image classification model

Programmatically Query the Results

RIME not only provides you with an intuitive UI to visualize and explore these results, but also allows you to programmatically query them. This allows customers to integrate with their MLOps pipelines, log results to experiment management tools like MLflow, bring automated decision-making to their ML practices, or store these results for future reference.

Run the cell below to programmatically query the results. The results are output as a pandas DataFrame.

To access results at the test run overview level:

[ ]:
test_run_result = test_run.get_result_df()
test_run_result.to_csv("AWA2_Test_Run_Results.csv")
test_run_result

To access detailed results at the individual test case level:

[ ]:
test_case_result = test_run.get_test_cases_df()
test_case_result.to_csv("AWA2_Test_Case_Results.csv")
test_case_result
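
As an example of integrating with experiment management, the exported CSVs can be attached to a tracking run. Below is a minimal sketch using MLflow; it assumes an MLflow tracking server (or a local mlruns directory) is already configured, and the run name is a hypothetical choice.

import mlflow

with mlflow.start_run(run_name="rime-stress-test-awa2"):
    # Attach the RIME results exported above as run artifacts for future reference.
    mlflow.log_artifact("AWA2_Test_Run_Results.csv")
    mlflow.log_artifact("AWA2_Test_Case_Results.csv")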

Deploy to Production and Set Up Continuous Testing

Assume that the image classification model has been deployed to production, and that production data and predictions have been collected for the past two weeks.

Now, we can use Continuous Testing to track how the model performed during this time!

[ ]:
from datetime import timedelta

ct_instance = project.create_ct(model_id, ref_id, timedelta(days=1))
ct_instance

Prepare an Incremental Batch of Data

Data used in Continuous Testing follows the same formatting guidelines as indicated in Prepare the Datasets above, with the addition of a "timestamp" key to the JSON index file to indicate when the data was received:

[
  {
    "image_path": "s3://acmecorp-rime/ri_public_examples_awa2/data/JPEGImages/antelope_10453.jpg",
    "label": 0,
    "timestamp": "2022-03-01"
  },
  {
    "image_path": "s3://acmecorp-rime/ri_public_examples_awa2/data/JPEGImages/rhinoceros_10475.jpg",
    "label": 21,
    "timestamp": "2022-03-01"
  },
  ...
]
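
If an existing index file lacks timestamps, they can be added with a small script. The snippet below is a minimal sketch; the file name and the single fixed date are hypothetical, and in practice each datapoint would carry the date it was actually received.

import json

# Hypothetical local copy of the incremental batch's index file.
with open("test_inputs_monitoring_trial.json") as f:
    index = json.load(f)

# Tag every datapoint with the date it was received (a single daily batch here).
for datapoint in index:
    datapoint["timestamp"] = "2022-03-01"

with open("test_inputs_monitoring_trial.json", "w") as f:
    json.dump(index, f, indent=2)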

An incremental dataset for this example should be uploaded to the S3 bucket in a directory, like so:

s3://acmecorp-rime/
└── ri_public_examples_awa2/
    └── data/
        ├── ...
        ├── test_inputs_monitoring_trial.json  # Index file for the incremental batch.
        ├── monitoring_preds_trial.json        # Prediction logs for the incremental batch.
        └── JPEGImages/                        # Directory of raw image files.
            ├── rabbit_10567.jpg
            ├── gorilla_10007.jpg
            └── ...
[ ]:
if IS_SAAS:
    monitoring_inputs_local_path = "awa2/data/test_inputs_monitoring_trial.json"
    _, monitoring_inputs_path = rime_client.upload_local_image_dataset_file(
        monitoring_inputs_local_path, ["image_path"], upload_path=UPLOAD_PATH
    )
    monitoring_preds_path = rime_client.upload_file(
        Path("awa2/data/monitoring_preds_trial.json"), upload_path=UPLOAD_PATH
    )
else:
    # The index file for the incremental batch.
    monitoring_inputs_path = f"{BLOB_STORE_URI}/{UPLOAD_PATH}/data/test_inputs_monitoring_trial.json"

    # Prediction logs for the incremental batch.
    monitoring_preds_path = f"{BLOB_STORE_URI}/{UPLOAD_PATH}/data/monitoring_preds_trial.json"

Register the Artifacts

[ ]:
monitoring_id = project.register_dataset_from_file(
    f"monitoring_set_{dt}",
    monitoring_inputs_path,
    {
        "image_features": ["image_path"],
        "label_col": "label",
        "class_names": class_names,
        "timestamp_col": "timestamp" # Indicates the name of the timestamp key in the JSON schema.
    },
    agent_id=AGENT_ID
)

project.register_predictions_from_file(
    monitoring_id, model_id, monitoring_preds_path, agent_id=AGENT_ID
)

Run a Continuous Test

[ ]:
ct_job = ct_instance.start_continuous_test(
    monitoring_id,
    override_existing_bins=True,
    agent_id=AGENT_ID
)

ct_job.get_status(verbose=True, wait_until_finish=True)

Wait a couple of minutes and your results will appear in the UI.

Analyze the Continuous Test Results

The Continuous Tests operate at the batch level and provide a mechanism to monitor the health of ML deployments in production. They allow the user to understand when errors begin to occur and surface the underlying drivers of such errors.

You can explore the results in the UI by running the cell below and following the generated link.

[ ]:
ct_instance