Troubleshooting

RIME Installation

  1. How do I upgrade my production version of RIME?

    New major versions of RIME are released on a six-week cycle, often with minor bug-fix releases in between. The upgrade process involves updating the RIME cluster image, updating the image itself, and then updating the SDK. See Updating RIME for detailed instructions on this process.

RIME Python Package

  1. I’m seeing “Missing option” errors with my rime-engine CLI commands! How do I resolve them?

    rime-engine run-stress-tests --config-path examples/income/stress_tests_model.json
    
    Error: Missing option '--upload-endpoint'.
    
    ..
    

    The commands in the RIME CLI Walkthroughs use environment variables to keep the commands short and readable.

    Be sure to set these up in your terminal session before running any commands:

    Local

    NOTE: Disabling TLS is recommended for local uploads only!

    export RIME_UPLOAD_URL=localhost:5001
    export RIME_FIREWALL_URL=localhost:5002
    export RIME_DISABLE_TLS=True
    

    Cloud

    Be sure to replace <YOUR_ORG_NAME> and <YOUR_API_KEY> with the specific values for your RIME Cloud instance!

    export RIME_UPLOAD_URL=rime-backend.<YOUR_ORG_NAME>.rime.dev
    export RIME_FIREWALL_URL=rime-backend.<YOUR_ORG_NAME>.rime.dev
    export RIME_API_KEY=<YOUR_API_KEY>
    

    Alternatively, missing options can be provided in the command itself, as flags (e.g., --upload-endpoint below):

    rime-engine run-stress-tests --config-path examples/income/stress_tests_model.json --upload-endpoint <YOUR_UPLOAD_ENDPOINT>
    

    Mappings of environment variables to their option names can be found by running --help for the chosen command:

    rime-engine run-stress-tests --help
    
  2. I’m seeing ModuleNotFound errors in the console. How do I resolve them?

    These errors likely result from not having the extras installed for your use case.

    Make sure you are inside the rime_trial/ directory and using your rime-venv virtual environment before proceeding.

    If you did not already do so during installation, run the following to generate the necessary requirements lists:

    python rime_helper.py generate-rime-requirements --token-file $PATH_TO_TOKEN_TXT_FILE
    

    For Natural Language Processing (NLP) use cases (i.e., text data):

    pip install -r nlp_requirements.txt
    

    For Computer Vision (CV) use cases (i.e., image data):

    pip install -r cv_requirements.txt
    
  3. I’m seeing grpc.FutureTimeoutError(s) when trying to upload stress tests. How do I resolve them?

    This error can be caused by differences in DNS resolution across operating systems and protocols (IPv4 vs. IPv6).

    If that is the case, running the following in your terminal session can resolve the issue:

    export GRPC_DNS_RESOLVER=native
    

    Otherwise, make sure your machine has access to the endpoint(s) in question (e.g., by enabling VPN).
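
Both the "Missing option" errors and the timeout errors above can be narrowed down with a quick preflight check before running any rime-engine command. The following is a minimal sketch, not part of RIME: it reads RIME_UPLOAD_URL and RIME_FIREWALL_URL from the environment, reports any that are unset, and attempts a plain TCP connection to each one that is set. The helper names and the fallback port of 443 (used only when the value has no ":port" suffix) are assumptions made for illustration.

```python
import os
import socket

# Variable names taken from the export commands above.
REQUIRED_VARS = ("RIME_UPLOAD_URL", "RIME_FIREWALL_URL")
# Assumption: port used only when the value has no ":port" suffix.
DEFAULT_PORT = 443


def missing_vars(env):
    """Return the required variables that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def split_host_port(value, default_port=DEFAULT_PORT):
    """Split 'host:port' into (host, port); fall back to default_port."""
    host, sep, port = value.rpartition(":")
    if sep and port.isdigit():
        return host, int(port)
    return value, default_port


def can_connect(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # Covers DNS failures, refusals, and timeouts.
        return False


def preflight(env):
    """Return one human-readable report line per required variable."""
    report = []
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            report.append(f"{name}: not set")
            continue
        host, port = split_host_port(value)
        state = "reachable" if can_connect(host, port) else "NOT reachable"
        report.append(f"{name}: {host}:{port} is {state}")
    return report


if __name__ == "__main__":
    for line in preflight(os.environ):
        print(line)
```

A successful TCP connection only shows that the host resolves and the port accepts connections; it does not validate TLS settings or the API key. If an endpoint is reported as not reachable, check VPN access and the GRPC_DNS_RESOLVER setting described above.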

RIME SDK

Troubleshooting RIME Stress Tests

When running a suite of stress tests on arbitrary models and datasets on a custom image, things can go wrong. The RIME SDK has tools available to help you debug your stress test jobs. This document includes a few common failure scenarios and recommended debugging techniques.

Lost RIMEStressTestJob Object

If you close your Python notebook or scripting session, you will lose access to the ephemeral in-memory objects such as RIMEStressTestJob. To recover these objects, connect the client to the same backend service.

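# Assumes the SDK is installed and importable as the rime_sdk package.
from rime_sdk import RIMEClient

rime_client = RIMEClient("my_vpc.rime.com", "api-key")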

Then, use rime_client.list_stress_test_jobs() to query the server for a list of jobs from the past two days. You can filter by status and project ID to reduce the number of jobs returned. Next, call get_status() on each job. Its return value includes the start time and status of the job, which should help you identify the job you started.

jobs = rime_client.list_stress_test_jobs(status_filters=['RUNNING', 'FAILING'], project_id="bar")
# Print out the metadata for each job to see which one you started most recently.
for job in jobs:
    print(job.get_status())
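
If many jobs come back, picking the most recent one can be automated. The sketch below is an illustration only: it assumes get_status() returns a dict-like object, and the key name "start_time" is hypothetical. Print one status first to find the actual field name, and pass it as start_key. The start values must all be of the same comparable type (for example, all timestamps).

```python
def most_recent_job(jobs, start_key="start_time"):
    """Return the job whose status reports the latest start time, or None.

    `start_key` is a hypothetical field name; adjust it to match the
    actual return value of get_status().
    """
    best_job, best_start = None, None
    for job in jobs:
        started = job.get_status().get(start_key)
        if started is None:
            continue  # Skip jobs that do not report a start time.
        if best_start is None or started > best_start:
            best_job, best_start = job, started
    return best_job
```

Calling most_recent_job(jobs) on the list returned by list_stress_test_jobs() then yields a single candidate job to inspect further.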

Test Run Results Don’t Show Up in UI

This indicates that the RIMEStressTestJob executing the suite of stress tests failed along the way. This can happen for a number of reasons, including:

  • Misspecified test_run_config, dataset, or model.

  • CustomImage cannot be pulled.

  • Resource limits exceeded.

The best place to start is the get_status() of the RIMEStressTestJob object. If the job status is 'FAILING' and the verbose flag is set to True, get_status() will dump the logs to stdout. For configuration issues, this can be very helpful.

# Assume the job is 'FAILING'.
# With verbose=True, this dumps the logs, if there are any, to stdout.
status = job.get_status(verbose=True, wait_until_finish=True)

Looking at the logs can help solve a lot of problems. If you have trouble making additional progress with debugging, please contact RI support.
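
When contacting support, it can help to attach the logs as a file. The sketch below uses only the Python standard library to capture whatever get_status(verbose=True) prints and write it to a path of your choice. The helper name save_job_logs is hypothetical, and the approach assumes the SDK writes the logs through Python's sys.stdout; output written directly to the underlying file descriptor would not be captured.

```python
import contextlib
import io


def save_job_logs(job, path):
    """Call get_status(verbose=True) and write what it prints to `path`."""
    buffer = io.StringIO()
    # Redirect Python-level stdout writes into the in-memory buffer.
    with contextlib.redirect_stdout(buffer):
        status = job.get_status(verbose=True, wait_until_finish=True)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(buffer.getvalue())
    return status
```

For example, status = save_job_logs(job, "stress_test_job.log") returns the same value as get_status() and leaves the captured output in stress_test_job.log.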