Scheduled Testing Feedback and Observability
Once a model is in production, Robust Intelligence can provide detailed information about the model's performance so that you can identify and correct issues.
A model under test displays summary health information on the overview panel of the project that contains it. Robust Intelligence monitors machine learning model performance across the following three risk categories:
Operational tests a model’s overall performance and data stability over time.
Security tests a model’s resilience against compromise from external attacks.
Fairness tests a model’s outcome for fair treatment among subsets in the data.
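If you consume test results programmatically, you can roll individual outcomes up into these same three categories. The sketch below is illustrative only; the `results` records use a hypothetical shape, not the product's actual data model.

```python
# Illustrative sketch: roll per-test outcomes up into the three risk
# categories shown on the overview panel. The `results` records use a
# hypothetical shape, not the product's actual data model.
from collections import Counter

results = [
    {"test": "Numeric Outliers", "category": "Operational", "status": "Pass"},
    {"test": "Model Evasion", "category": "Security", "status": "Alert"},
    {"test": "Demographic Parity", "category": "Fairness", "status": "Warning"},
]

summary: dict[str, Counter] = {}
for record in results:
    summary.setdefault(record["category"], Counter())[record["status"]] += 1

for category, counts in summary.items():
    print(category, dict(counts))
```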
Viewing a stress test schedule
1. Sign in to the Robust Intelligence UI. The Workspaces page appears.
2. Click a workspace. The workspace summary page appears.
3. Select a project.
You can filter or sort the list of projects in a workspace with the Sort and Filter controls in the upper right. Click the glyph to the right of the Filter control to switch between list and card display for projects. Type a string in Search Projects… to display only projects that match the string.
The Production panel on the right shows the Schedule ID and Schedule Status for the test. If no schedule is configured, see Scheduled Stress Testing.
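You may also be able to check the schedule outside the UI. The sketch below assumes the rime_sdk Python package; the `Client` constructor is real, but the project lookup and schedule accessor shown are hypothetical names that should be confirmed against the SDK reference for your release.

```python
# Illustrative only: look up a project's continuous testing schedule
# outside the UI. `Client` comes from the rime_sdk package, but the
# `get_project` lookup and `ct_schedule` accessor below are HYPOTHETICAL
# names -- confirm them against the SDK reference for your release.
from rime_sdk import Client

client = Client("my-cluster.rbst.io", api_key="<API_KEY>")  # your RI endpoint
project = client.get_project("<PROJECT_ID>")                # hypothetical lookup

schedule = project.ct_schedule                              # hypothetical accessor
print("Schedule ID:", schedule.schedule_id)
print("Schedule status:", schedule.status)
```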
Viewing the CT risk pages
1. Sign in to the Robust Intelligence UI. The Workspaces page appears.
2. Click a workspace. The workspace summary page appears.
3. Select a project.
4. Select the Continuous Testing tab in the left panel. The Continuous Testing overview panel for the selected project appears.
Continuous Testing overview panel
The Continuous Testing overview panel shows, for this project:
a graph of test results over time; and
a list of events summarizing the results of each test run with respect to measures of Operational, Security, and Fairness risk.
Click the Filter button at the upper right to help find an event. See the next section for help with filters.
Click on a row to inspect an event. This displays the inspection page for the continuous testing run, including a list of all tests that ran. Click on a test to see its measures, Key Insights, and any features that were flagged in this test run.
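The same run-level details can be pulled programmatically. The sketch below assumes the rime_sdk package; `get_test_run` and `get_test_cases_df` reflect our understanding of the SDK surface, but verify them against the SDK reference for your release.

```python
# Illustrative sketch: inspect a continuous testing run programmatically.
# Assumes the rime_sdk package; verify these method names against the
# SDK reference for your release before relying on them.
from rime_sdk import Client

client = Client("my-cluster.rbst.io", api_key="<API_KEY>")

# Use a test run ID from the run's inspection page.
test_run = client.get_test_run("<TEST_RUN_ID>")

# One row per test case; the exact column layout varies by SDK version.
cases = test_run.get_test_cases_df()
print(cases.head())
```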
Filter events in the Continuous Testing overview panel
Use the Filter button at the upper right to help find an event. See the filter criteria explanations below.
Filter | Description | Example states |
---|---|---|
Bin date | Show only events generated from data points whose dates fall within the range you specify. | Last 7 days |
Run date | Show only events generated from test runs that were completed within the range you specify. | Last 7 days |
Model | Show only events generated by the model you specify. | "My image classification" |
Status | Run status of the continuous test run. | Completed, Failed, In Progress |
Operational | Operational risk result of the test run. | Alert, Warning, Pass |
Fairness | Fairness risk result of the test run. | Alert, Warning, Pass |
Security | Security risk result of the test run. | Alert, Warning, Pass |
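These filters are straightforward to mirror if you export events for offline analysis. The pandas sketch below assumes a hypothetical export format whose columns are named after the filters above; it is not a documented schema.

```python
# Illustrative sketch: reproduce the UI filters over an exported events
# table. The column names mirror the filters above; this is a
# hypothetical export format, not a documented schema.
import pandas as pd

now = pd.Timestamp.now()
events = pd.DataFrame(
    {
        "model": ["My image classification", "Churn model"],
        "status": ["Completed", "Failed"],
        "operational": ["Warning", "Alert"],
        "fairness": ["Pass", "Warning"],
        "security": ["Pass", "Pass"],
        "run_date": [now - pd.Timedelta(days=2), now - pd.Timedelta(days=10)],
    }
)

# "Run date: Last 7 days" + "Status: Completed" + "Operational: Alert or Warning"
mask = (
    (events["run_date"] >= now - pd.Timedelta(days=7))
    & (events["status"] == "Completed")
    & (events["operational"].isin(["Alert", "Warning"]))
)
print(events[mask])
```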
Tests
Abnormal inputs test
Abnormal inputs monitor | Description |
---|---|
Unseen Categorical | Tests the model's response to data that contains categorical values that are never observed in the reference dataset. |
Rare Categories | Tests the model's response to data that contains categorical values that are rarely observed in the reference dataset. |
Numeric Outliers | Tests the model's response to data that contains numeric values outside the typical range for that feature in the reference dataset. |
Abnormality Rate | Tests the total percentage of rows with any abnormalities. |
Feature Type Check - Count | Tests the number of feature values that are of the incorrect type. |
Null Check - Count | Tests the number of null values for features that do not have nulls in the reference dataset. |
Empty String - Count | Tests the number of empty or null strings for each string feature. |
Capitalization - Count | Tests the number of string values that are capitalized differently from those observed in the reference dataset. |
Inconsistencies - Count | Tests the number of data points with pairs of feature values that are inconsistent with each other. |
Required Characters - Count | Tests the number of feature values that are missing required characters. |
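To make a couple of these checks concrete, here is a minimal sketch of the Unseen Categorical, Numeric Outliers, and Abnormality Rate ideas using pandas. It is a simplified illustration of the underlying comparisons, not Robust Intelligence's implementation.

```python
# Minimal sketch of the Unseen Categorical, Numeric Outliers, and
# Abnormality Rate ideas against a reference dataset. Simplified for
# illustration; not Robust Intelligence's implementation.
import pandas as pd

reference = pd.DataFrame({"color": ["red", "blue", "red"], "amount": [10.0, 11.0, 12.0]})
incoming = pd.DataFrame({"color": ["red", "green"], "amount": [11.5, 900.0]})

# Unseen Categorical: values never observed in the reference dataset.
unseen = ~incoming["color"].isin(set(reference["color"]))
print("Unseen categorical rows:", incoming.index[unseen].tolist())

# Numeric Outliers: values outside a typical range for the feature.
# A simple IQR fence stands in for the real test's bounds.
q1, q3 = reference["amount"].quantile([0.25, 0.75])
fence = 1.5 * (q3 - q1)
outliers = ~incoming["amount"].between(q1 - fence, q3 + fence)
print("Numeric outlier rows:", incoming.index[outliers].tolist())

# Abnormality Rate: percent of rows with any abnormality.
print(f"Abnormality rate: {(unseen | outliers).mean():.0%}")
```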
Security risk
Tests for security risk assess the security of the model and underlying dataset, providing alerts in cases of model evasion or subversion.
Security risk monitor | Description |
---|---|
Data Poisoning | Tests for corrupted input data. |
Model Evasion | Tests for adversarial evasion attacks. |
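As a rough intuition for the model evasion test, one common robustness probe checks whether predictions flip under tiny input perturbations. The sketch below illustrates that generic idea with numpy and a stand-in classifier; it is not Robust Intelligence's detection method.

```python
# Generic robustness probe: flag inputs whose predicted class flips
# under a tiny random perturbation. This is an intuition-building
# sketch only, not Robust Intelligence's evasion detection method.
import numpy as np

rng = np.random.default_rng(0)

def predict(x: np.ndarray) -> np.ndarray:
    """Stand-in binary classifier: class 1 when the feature sum is positive."""
    return (x.sum(axis=1) > 0).astype(int)

x = rng.normal(size=(100, 5))                 # batch of incoming data points
eps = 0.05                                    # perturbation budget
noise = rng.uniform(-eps, eps, size=x.shape)

flipped = predict(x) != predict(x + noise)
print(f"{flipped.mean():.1%} of points flip under a ±{eps} perturbation")
```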
Fairness and Compliance risk
Tests for fairness and compliance risk assess a model’s outcome for fair treatment among subcategories in the data.
Fairness monitor | Description |
---|---|
Intersectional Group Fairness | Tests for changes in the model performance over different slices of data from the intersection of two protected features. |
Positive Prediction Rate | Tests whether the model's positive prediction rate differs significantly across different subsets of protected features. |
Predictive Equality | Tests whether model performance differs significantly across different subsets of protected features. |
Equal Opportunity Recall | Tests whether model recall differs significantly across different subsets of protected features. |
Class Imbalance | Tests whether any subset of a feature exhibits class imbalance bias as a result of having a significantly smaller sample size. |
Demographic Parity | Tests how model performance over subsets of protected features compares to the subset with the highest performance. |
Protected Feature Drift | Tests the change in distribution of protected features. |
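For a concrete sense of these comparisons, the sketch below computes per-subset positive prediction rates over a protected feature with pandas, which is the quantity the Positive Prediction Rate test compares. The threshold shown is arbitrary; the product's significance testing is more involved.

```python
# Sketch of a positive-prediction-rate comparison across subsets of a
# protected feature. Simplified; the product's significance testing
# and thresholds are more involved.
import pandas as pd

df = pd.DataFrame(
    {
        "gender": ["f", "f", "m", "m", "m", "f"],
        "prediction": [1, 0, 1, 1, 1, 0],  # positive class = 1
    }
)

rates = df.groupby("gender")["prediction"].mean()
print(rates)

# Flag subsets whose rate deviates from the overall rate by more than
# 0.2 (an arbitrary illustrative threshold).
overall = df["prediction"].mean()
flagged = rates[(rates - overall).abs() > 0.2]
print("Flagged subsets:", flagged.index.tolist())
```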
Model monitors
Monitors are an optional Robust Intelligence feature that works with continuous tests to alert you when a test fails. Learn more about optional Monitors.