Updating your Continuous Test
In this notebook walkthrough, we will show how to update a Continuous Test after it has been deployed to production. The Continuous Test can be updated live to account for many service changes, such as swapping in a new reference dataset, upgrading the model, or configuring individual tests.
Latest Colab version of this notebook available here
Install dependencies
[ ]:
!pip install rime-sdk &> /dev/null
!pip install https://github.com/RobustIntelligence/ri-public-examples/archive/master.zip
[ ]:
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import List
import pandas as pd
from ri_public_examples.download_files import download_files
from rime_sdk import Client
Download and prep data
[ ]:
download_files('tabular-2.0/fraud', 'fraud')

# Split the incremental production data (and the corresponding predictions)
# into two equal batches, so Continuous Testing can be run once before and
# once after the configuration update.
ct_data = pd.read_csv("fraud/data/fraud_incremental.csv")
ct_data[:len(ct_data)//2].to_csv("fraud/data/fraud_incremental_0.csv", index=False)
ct_data[len(ct_data)//2:].to_csv("fraud/data/fraud_incremental_1.csv", index=False)

ct_preds = pd.read_csv("fraud/data/fraud_incremental_preds.csv")
ct_preds[:len(ct_preds)//2].to_csv("fraud/data/fraud_incremental_0_preds.csv", index=False)
ct_preds[len(ct_preds)//2:].to_csv("fraud/data/fraud_incremental_1_preds.csv", index=False)
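As a quick sanity check (not part of the original walkthrough), the half-split pattern above partitions the rows exactly: the two batches together reproduce the original frame with no overlap or gap. A minimal sketch on a toy frame with illustrative column names:

```python
import pandas as pd

# Toy stand-in for fraud_incremental.csv; columns are illustrative.
df = pd.DataFrame({"timestamp": range(10), "label": [0, 1] * 5})

# Same half-split slicing used above for the incremental data.
first = df[: len(df) // 2]
second = df[len(df) // 2 :]

# The two batches partition the original rows exactly.
rejoined = pd.concat([first, second])
assert rejoined.equals(df)
print(len(first), len(second))  # 5 5
```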
Instantiate RIME client and create project
[ ]:
API_TOKEN = '' # PASTE API_KEY
CLUSTER_URL = '' # PASTE DEDICATED DOMAIN OF RIME SERVICE (e.g., rime.stable.rbst.io)
AGENT_ID = '' # PASTE AGENT_ID IF USING AN AGENT THAT IS NOT THE DEFAULT
[ ]:
client = Client(CLUSTER_URL, API_TOKEN)
[ ]:
description = (
    "Create a Continuous Test and update the configuration after it is deployed to production."
    " Demonstration uses a tabular binary classification dataset"
    " and model that simulates credit card fraud detection."
)
project = client.create_ct(
    "Continuous Testing Configuration Demo",
    description,
    "MODEL_TASK_BINARY_CLASSIFICATION",
)
Upload data to S3 and register dataset and prediction set
[ ]:
from datetime import datetime
dt = str(datetime.now())
# Note: models and datasets need to have unique names.
model_id = project.register_model(f"fraud_model_{dt}", None, agent_id=AGENT_ID)
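The `dt` suffix above exists because model and dataset names must be unique within a project. The same naming pattern can be sketched standalone (the `unique_name` helper is ours for illustration, not part of the SDK):

```python
from datetime import datetime

def unique_name(prefix: str) -> str:
    """Append the current timestamp so repeated notebook runs don't collide."""
    return f"{prefix}_{datetime.now()}"

name = unique_name("fraud_model")
assert name.startswith("fraud_model_")
print(name[:12])  # fraud_model_
```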
[ ]:
upload_path = "ri_public_examples_fraud"

def upload_and_register_data(dataset_name, **kwargs):
    dt = str(datetime.now())
    s3_path = client.upload_file(
        Path(f"fraud/data/fraud_{dataset_name}.csv"), upload_path=upload_path
    )
    preds_s3_path = client.upload_file(
        Path(f"fraud/data/fraud_{dataset_name}_preds.csv"), upload_path=upload_path
    )
    dataset_id = project.register_dataset_from_file(
        f"{dataset_name}_dataset_{dt}",
        s3_path,
        data_params={"label_col": "label", **kwargs},
        agent_id=AGENT_ID,
    )
    project.register_predictions_from_file(
        dataset_id, model_id, preds_s3_path, agent_id=AGENT_ID
    )
    return dataset_id

ref_data_id = upload_and_register_data("ref")
Create a Continuous Test
[ ]:
from datetime import timedelta
ct = project.create_ct(model_id, ref_data_id, timedelta(days=1))
ct
Run Continuous Testing on a batch of production data
[ ]:
ct_data_0_id = upload_and_register_data("incremental_0", timestamp_col="timestamp")
ct_job = ct.start_continuous_test(ct_data_0_id, agent_id=AGENT_ID)
ct_job.get_status(verbose=True, wait_until_finish=True)
Update the Reference Dataset
Suppose a week has passed and we have retrained our model on new data. We want to update the deployed Continuous Test so that it uses the new reference dataset.
[ ]:
new_ref_data_id = upload_and_register_data("eval")
# Point the Continuous Test at the new reference dataset
ct.update_ct(ref_data_id=new_ref_data_id)
# The project overview will now highlight the updated Continuous Test
project
Run Continuous Testing on the latest batch of production data
This time the updated reference dataset serves as the baseline against which the production data is compared.
[ ]:
ct_data_1_id = upload_and_register_data("incremental_1", timestamp_col="timestamp")
ct_job = ct.start_continuous_test(ct_data_1_id, override_existing_bins=True, agent_id=AGENT_ID)
ct_job.get_status(verbose=True, wait_until_finish=True)