AI Stress Testing

This tutorial will get you started with the RIME NLP CLI by stress testing a Text Classification model on the CARER Emotion Recognition Dataset.

Setup

Please ensure that the extra RIME NLP dependencies have been installed from the nlp_requirements.txt file included with the installation. If you run into a ModuleNotFoundError at any point during this walkthrough, you likely need to install the RIME NLP Extras!

pip install -r nlp_requirements.txt

Running Stress Testing on the Text Classification Example

In this tutorial, we will cover pre-production testing of a DistilBERT emotion recognition model using the RIME NLP automated test suite.

The example NLP models provided for this tutorial rely on a couple of additional dependencies. To install them, please run the following command:

pip install -r nlp_examples/trial_model_requirements.txt

Then, to kick off an NLP Test run, execute the following command in your local terminal:

rime-engine run-nlp --config-path nlp_examples/classification/emotion_recognition/stress_tests_config.json

NOTE: If the above command throws a ModuleNotFoundError, you likely forgot to install the RIME NLP Extras (see Setup above).

After this run finishes, the results are uploaded to the Default Project, where you can view them in the web client. For a full guide to what you are seeing, please see our UI Documentation.

If you explore the test config in nlp_examples/classification/emotion_recognition/stress_tests_config.json, you'll see that we've configured a few parameters to specify the data, the model, and other task-specific information. For a full reference on the configuration file, see the NLP Configuration Reference.

Running Stress Testing on a Text Classification Example with Metadata

In this tutorial, we will cover adding custom metadata to your test run. This tutorial uses a RoBERTa-based model trained on tweets and fine-tuned for sentiment analysis. The dataset in this example was scraped from Twitter to analyze how travelers expressed their feelings about airlines.

Alongside text and label, the data includes several attributes that RIME will also run automated tests on:

  • Custom numeric metadata: Retweet_count

  • Custom categorical metadata: Reason, Airline, Location

These attributes exist for each datapoint in the meta dict, as key-value pairs. For example:

{"text": "@USAirways You have no idea how upset and frustrated I am right now. I'll be sure to follow through with never flying with you again.", "label": 0, "meta": {"Reason": "Can't Tell", "Airline": "US Airways", "Location": "Saratoga Springs", "Retweet_count": 0}}

Kicking off a test run on the Sentiment Analysis (Twitter Airline) example is super simple: all we need to do is update the --config-path argument:

rime-engine run-nlp --config-path nlp_examples/classification/sentiment_analysis/stress_tests_config_with_metadata.json

If you poke around in stress_tests_config_with_metadata.json, you'll see that we've added custom_numeric_metadata and custom_categorical_metadata to data_profiling_info. For a full reference on the data profiling configuration, see the Data Profile config document.
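
For illustration, such a data_profiling_info section might look like the sketch below. This is not a verbatim excerpt of the example file; we assume here that each field takes a list of metadata key names, so check the Data Profile config document for the exact schema.

"data_profiling_info": {
    "custom_numeric_metadata": ["Retweet_count"],
    "custom_categorical_metadata": ["Reason", "Airline", "Location"]
}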

Running Stress Testing on the NER Example

Kicking off a test run on the Named Entity Recognition (NER) example is super simple: all we need to do is update the --config-path argument:

rime-engine run-nlp --config-path nlp_examples/ner/conll/stress_tests_config.json

If you poke around in nlp_examples/ner/conll/stress_tests_config.json, you'll see that we've changed the model_task to Named Entity Recognition, along with a couple of other parameters. For a full reference on the configuration file, see the NLP Configuration Reference.
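
As a minimal sketch, the task override in that config amounts to the line below; the surrounding data and model parameters are omitted here, so treat this as illustrative rather than a complete config.

"model_task": "Named Entity Recognition"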

Running Stress Testing on Your Own Model and Datasets

To run RIME using your own data and model, please consult the NLP Data Guide for the expected data format and How to Create an NLP Model File for step-by-step instructions on how to connect your model to the testing framework.

Because model inference is usually the most time-consuming part of the testing framework, we recommend specifying cached prediction logs using the prediction_info argument of the runtime config.
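
For example, a prediction_info section pointing at the cached prediction logs from the first walkthrough might look roughly like the sketch below. The field names inside the section (ref_path and eval_path) are illustrative assumptions rather than the exact schema, so consult the NLP Configuration Reference for the real field names.

"prediction_info": {
    "ref_path": "nlp_examples/classification/emotion_recognition/data/train_preds.json",
    "eval_path": "nlp_examples/classification/emotion_recognition/data/val_preds.json"
}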

In the classification example above, we ran RIME using model predictions saved in the files nlp_examples/classification/emotion_recognition/data/{train|val}_preds.json. An example prediction can be viewed by running the following command from your terminal:

cat nlp_examples/classification/emotion_recognition/data/val_preds.json | jq '.[0]'

RIME also supports predictions stored in compressed JSON or JSONL format and accepts predictions added within the datafile itself (by adding the “probabilities” key to each data sample).
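
For instance, an emotion recognition data sample with an inline prediction might look like the following. The text, label, and probability values here are made up for illustration, and we assume the six CARER emotion classes; the length and ordering of the probabilities list should match your own label set.

{"text": "i am feeling pretty good today", "label": 1, "probabilities": [0.05, 0.85, 0.02, 0.03, 0.03, 0.02]}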

However, if you do not wish to create a prediction log beforehand, RIME can call your model during a test run and infer its performance using a subsample of the provided datasets. The following command initiates a run similar to the text classification run above, except that the specified config does not point to any prediction files.

rime-engine run-nlp --config-path nlp_examples/classification/emotion_recognition/stress_tests_config_no_preds.json

Note that the command is exactly the same EXCEPT for the --config-path provided. Comparing this runtime config with the previous one, you can see that this file simply omits the "prediction_info" section.

Conclusion

Congratulations! You’ve successfully used RIME to stress test a variety of NLP models.

Once again, we strongly recommend that you run RIME with a cached predictions file, like the one provided in the first part of this tutorial. This will greatly improve both the RIME runtime and the quality of the test suite results. Model inference tends to be the most computationally expensive part of each RIME run, especially for large transformer models. While access to the model is still required for some tests by design (e.g., tests that use randomness or iterative attacks), providing a prediction file helps RIME avoid redundant computation so each run stays fast and focused.

Troubleshooting

If you run into errors while running this walkthrough, please reference the RIME Python Package section of our FAQ. Additionally, your RI representative will be happy to assist; feel free to reach out!