Troubleshooting

RIME Installation

How do I upgrade my production version of RIME?

New major versions of RIME are released on a six-week cycle. Minor versions with bug fixes and similar changes are often released in between. The upgrade process involves updating the RIME cluster image, updating the image itself, and then updating the SDK. See Updating RIME for detailed instructions.

RIME Python Package

I’m seeing “Missing option” errors with my rime-engine CLI commands! How do I resolve them?
```
rime-engine run-stress-tests --config-path examples/income/stress_tests_model.json

Error: Missing option '--upload-endpoint'.

..
```
The commands in the RIME CLI Walkthroughs use environment variables to keep the commands short and readable.

Be sure to set these up in your terminal session before running any commands:

Local

NOTE: Disabling TLS is recommended for local uploads only!
```
export RIME_UPLOAD_URL=localhost:5001
export RIME_FIREWALL_URL=localhost:5002
export RIME_DISABLE_TLS=True
```
Cloud

Be sure to replace <YOUR_ORG_NAME> and <YOUR_API_KEY> with the specific values for your RIME Cloud instance!
```
export RIME_UPLOAD_URL=rime-backend.<YOUR_ORG_NAME>.rime.dev
export RIME_FIREWALL_URL=rime-backend.<YOUR_ORG_NAME>.rime.dev
export RIME_API_KEY=<YOUR_API_KEY>
```
Alternatively, missing options can be provided in the command itself, as flags (e.g., --upload-endpoint below):
```
rime-engine run-stress-tests --config-path examples/income/stress_tests_model.json --upload-endpoint <YOUR_UPLOAD_ENDPOINT>
```
Mappings of environment variables to their option names can be found by running the chosen command with --help:
```
rime-engine run-stress-tests --help
```
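Before running a command, it can help to confirm which variables are actually set in the current session. The following is a minimal Python sketch and not part of RIME: the helper name `missing_env_vars` is hypothetical, and the variable list simply mirrors the local setup above.

```python
import os

# Variable names taken from the local setup above; for the cloud setup,
# swap RIME_DISABLE_TLS for RIME_API_KEY.
LOCAL_VARS = ["RIME_UPLOAD_URL", "RIME_FIREWALL_URL", "RIME_DISABLE_TLS"]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(LOCAL_VARS)
    if missing:
        print("Not set in this session:", ", ".join(missing))
    else:
        print("All expected variables are set.")
```

Any name it reports must either be exported in the terminal session or passed to the command as the corresponding flag.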
I’m seeing ModuleNotFound errors in the console. How do I resolve them?

These errors likely result from not having the extras installed for your use case.

Make sure you are inside of the rime_trial/ directory using your rime-venv virtual environment before proceeding.

(If not already run during installation) Run the following to generate the necessary requirements lists:
```
python rime_helper.py generate-rime-requirements --token-file $PATH_TO_TOKEN_TXT_FILE   
```
For Natural Language Processing (NLP) use cases (i.e., text data):
```
pip install -r nlp_requirements.txt
```
For Computer Vision (CV) use cases (i.e., image data):
```
pip install -r cv_requirements.txt
```
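To confirm that the install resolved the problem, you can check whether the module named in the error is now importable from the active environment. This is a rough sketch using only the Python standard library; the helper names `in_virtualenv` and `missing_modules` are hypothetical, and the module list is a placeholder to replace with the name from your own ModuleNotFoundError.

```python
import importlib.util
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv-style virtual environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Placeholder: replace with the module named in your ModuleNotFoundError.
    wanted = ["json"]
    print("virtual environment active:", in_virtualenv())
    print("missing modules:", missing_modules(wanted))
```

If the virtual environment check prints False, activate rime-venv first; packages installed into a different environment will not be visible to RIME.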
I’m seeing grpc.FutureTimeoutError(s) when trying to upload stress tests. How do I resolve them?

This error can be caused by differences in DNS resolution across operating systems and protocols (IPv4 vs. IPv6).

If that is the case, running the following in your terminal session can resolve the issue:
```
export GRPC_DNS_RESOLVER=native
```
Otherwise, make sure your machine has access to the endpoint(s) in question (e.g., by enabling VPN).
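A quick way to tell the two causes apart is to look at which address families the endpoint resolves to and whether a plain TCP connection succeeds. The sketch below uses only Python's standard `socket` module. The helper names are hypothetical, the default port of 443 is an assumption for endpoints given without a port, and IPv6 literal addresses are not handled. A successful TCP connection only shows basic reachability, not that the gRPC service itself is healthy.

```python
import socket


def split_endpoint(endpoint, default_port=443):
    """Split 'host:port' into (host, port); use default_port when none is given."""
    host, sep, port = endpoint.rpartition(":")
    if not sep:
        return endpoint, default_port
    return host, int(port)


def resolved_families(host, port):
    """Return the address families (e.g. AF_INET, AF_INET6) that resolution yields."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return sorted({info[0].name for info in infos})


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, call `split_endpoint` on the value of RIME_UPLOAD_URL and pass the result to the other two helpers. If `resolved_families` raises or lists only an address family your network cannot route, DNS resolution is the likelier culprit; if resolution looks fine but `can_connect` returns False, check VPN and firewall access.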

RIME SDK

Troubleshooting RIME Stress Tests

When running a suite of stress tests on arbitrary models and datasets on a custom image, things can go wrong. The RIME SDK has tools available to help you debug your stress test jobs. This document includes a few common failure scenarios and recommended debugging techniques.

Lost RIMEStressTestJob Object

If you close your Python notebook or scripting session, you will lose access to the ephemeral in-memory objects such as RIMEStressTestJob. To recover these objects, connect the client to the same backend service.

```
# Assumes the SDK is installed as the rime_sdk package.
from rime_sdk import RIMEClient

rime_client = RIMEClient("my_vpc.rime.com", "api-key")
```

Then, use rime_client.list_stress_test_jobs() to query the server for a list of jobs from the past two days. You can filter by status and project ID to reduce the number of jobs returned. Next, call get_status() on each job. Its return value includes the job's start time and status, which should help you identify the job you started.

```
jobs = rime_client.list_stress_test_jobs(status_filters=['RUNNING', 'FAILING'], project_id="bar")
# Print out the metadata for each job to see which one you started most recently.
for job in jobs:
    print(job.get_status())
```

Test Run Results Don’t Show Up in UI

This indicates that the RIMEStressTestJob executing the suite of stress tests failed along the way. There are a number of reasons why this would happen. Here are a few:

Misspecified test_run_config, dataset, or model.
CustomImage cannot be pulled.
Resource limits exceeded.

The best place to start is the get_status() of the RIMEStressTestJob object. If the job status is 'FAILING' and the verbose flag is set to True, get_status() will dump the logs to stdout. For configuration issues, this can be very helpful.

```
# Assume the job is 'FAILING'.
# With verbose=True, this dumps the logs, if there are any, to stdout.
status = job.get_status(verbose=True, wait_until_finish=True)
```

Looking at the logs can help solve a lot of problems. If you have trouble making additional progress with debugging, please contact RI support.