Deploying Your RI Platform Cluster

With requirements satisfied and configuration files populated, you are now ready to deploy the RI Platform.

At a high level, there are two main ways to do this: as a standalone Kubernetes cluster (recommended) or integrated into an existing Kubernetes cluster. For the former, set create_eks = true; for the latter, set create_eks = false and provide a cluster_name.
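
The rule above can be written down as a small pre-flight check. This is only a sketch: describe_cluster_mode is a hypothetical helper, not part of the RI tooling, and it encodes nothing beyond the create_eks / cluster_name rule stated in this section.

```python
from typing import Optional


def describe_cluster_mode(create_eks: bool, cluster_name: Optional[str] = None) -> str:
    """Return which deployment mode a pair of settings selects.

    Hypothetical helper: mirrors the create_eks / cluster_name rule only.
    """
    if create_eks:
        # Standalone mode: Terraform creates a new EKS cluster.
        return "standalone"
    if not cluster_name:
        # Integrated mode needs to know which existing cluster to target.
        raise ValueError("create_eks = false requires a cluster_name")
    return f"existing:{cluster_name}"
```

Calling it with the values from your own configuration before running Terraform may catch a missing cluster_name early.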

The steps below may vary, depending on your infrastructure.

Deploy the Cluster

Open a terminal session on a machine that has the installation tools.
Verify that your Terraform configuration files (main.tf and backend.tf) are present in your working directory.
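
If you want to script the file check, a minimal sketch follows. It assumes only the two file names mentioned above; missing_config_files is a hypothetical helper name.

```python
from pathlib import Path
from typing import Iterable, List

# File names taken from this guide; adjust if your layout differs.
REQUIRED_FILES = ("main.tf", "backend.tf")


def missing_config_files(workdir: str = ".", required: Iterable[str] = REQUIRED_FILES) -> List[str]:
    """Return the required Terraform files that are absent from workdir."""
    root = Path(workdir)
    return [name for name in required if not (root / name).is_file()]
```

An empty list means both files are present in the directory you passed in.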

Authenticate your AWS CLI.

aws sts get-caller-identity # verify you're in the right account
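
The account check can also be automated. The sketch below assumes the command was run with JSON output (for example, by adding --output json) and that the result contains an Account field; the function names and the expected account ID are placeholders you supply.

```python
import json


def caller_account(identity_json: str) -> str:
    """Extract the account ID from `aws sts get-caller-identity` JSON output."""
    return json.loads(identity_json)["Account"]


def in_expected_account(identity_json: str, expected_account: str) -> bool:
    """True when the authenticated identity belongs to expected_account."""
    return caller_account(identity_json) == expected_account
```

Comparing against a known account ID is less error-prone than reading the ID by eye, especially when you work across several accounts.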

Add the Robust Intelligence Helm repository (or your private registry, if configured).

helm repo add robustintelligence https://robustintelligence.github.io/helm --force-update

Initialize your Terraform environment.
```
terraform init
```

Verify your Terraform plan (recommended).

terraform plan -out "rime.plan" | tee "rime-plan.txt"
less rime-plan.txt # proof-read the changes
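
As a complement to reading the plan by hand, the summary line can be pulled out of rime-plan.txt programmatically. This sketch assumes the usual Terraform summary of the form "Plan: N to add, N to change, N to destroy." and strips ANSI color codes, which may end up in the file when the output is piped through tee. plan_summary is a hypothetical helper.

```python
import re
from typing import Dict, Optional

ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")          # terminal color codes
SUMMARY_LINE = re.compile(r"^Plan: (.+)$", re.MULTILINE)
COUNT = re.compile(r"(\d+) to (\w+)")                 # e.g. "3 to add"


def plan_summary(plan_text: str) -> Optional[Dict[str, int]]:
    """Return {"add": n, "change": n, "destroy": n, ...} or None if no summary line is found."""
    clean = ANSI_ESCAPE.sub("", plan_text)
    match = SUMMARY_LINE.search(clean)
    if match is None:
        return None  # no "Plan:" line, e.g. nothing to change or an unexpected format
    return {action: int(n) for n, action in COUNT.findall(match.group(1))}
```

A nonzero destroy count on a first-time deployment is a signal to reread the full plan before applying.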

Apply the Terraform plan. (This step can take up to 30 minutes.)

terraform apply "rime.plan" # if you skipped the plan step, omit "rime.plan"

Validate Your Deployment

Once the terraform apply command completes, your cluster should be operational! The following actions can help verify all services are up and running.

Load Balancer ALPN Policy

Find the load balancer used by the rime-kong-proxy with kubectl get svc rime-kong-proxy.
Locate the Load Balancer in your AWS console.
In the “Listeners” section, verify that the TLS: 443 listener’s ALPN policy is set to HTTP2Preferred.
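
The same check can be done without the console. The sketch below is an assumption-laden alternative: it expects the JSON printed by aws elbv2 describe-listeners --load-balancer-arn <arn> --output json and relies on the Listeners, Protocol, Port, and AlpnPolicy fields of that output. tls_listener_uses_alpn is a hypothetical helper name.

```python
import json


def tls_listener_uses_alpn(listeners_json: str, port: int = 443, policy: str = "HTTP2Preferred") -> bool:
    """True only if a TLS listener exists on `port` and every such listener uses `policy`."""
    listeners = json.loads(listeners_json).get("Listeners", [])
    tls = [
        listener for listener in listeners
        if listener.get("Protocol") == "TLS" and listener.get("Port") == port
    ]
    # An absent TLS listener counts as a failure rather than a silent pass.
    return bool(tls) and all(policy in (listener.get("AlpnPolicy") or []) for listener in tls)
```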

Kubernetes Services

Point your local kubectl to the new cluster.

aws eks --region us-west-2 update-kubeconfig --name <cluster-name> # replace us-west-2 with your cluster's region

Inspect the running pods.

kubectl get pods -n <rime-namespace>

Your output should look something like this:

NAME                                             READY   STATUS      RESTARTS   AGE
rime-agent-job-monitor-6bddd4697d-t9118          1/1     Running     0          5m26s
rime-agent-launcher-56bc47549c-dod60             1/1     Running     0          5m26s
rime-frontend-cd6c89884-8ljrl                    1/1     Running     0          5m26s
...
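
For clusters with many pods, scanning this table by eye gets tedious. The following sketch parses the plain-text output of kubectl get pods and lists pods that look unhealthy. It assumes the default column order shown above (NAME, READY, STATUS first); unhealthy_pods is a hypothetical helper.

```python
from typing import List


def unhealthy_pods(kubectl_output: str) -> List[str]:
    """Return names of pods that are neither fully ready and Running nor Completed."""
    bad = []
    for line in kubectl_output.strip().splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "NAME":
            continue  # skip the header, blank lines, and elisions such as "..."
        name, ready, status = fields[:3]
        if status == "Completed":
            continue  # finished jobs legitimately report 0/1 in READY
        ready_count, _, total = ready.partition("/")
        if status != "Running" or ready_count != total:
            bad.append(name)
    return bad
```

An empty result means every listed pod is either fully ready or has completed.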

Verify that you can access the web client at your RIME subdomain. This is the domain you configured during DNS setup, and it has the form rime.<DOMAIN>.com.
Query the version endpoint with cURL and verify that metadata is returned:
```
curl --location --request GET rime.<DOMAIN>.com/v1/rime-info
```
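
The same request can be made from Python using only the standard library. This is a sketch under two assumptions: that the endpoint is served over HTTPS, and that the function names below are illustrative rather than part of any RI tooling. Like curl --location, urlopen follows redirects; it raises an error on 4xx and 5xx responses.

```python
import urllib.request
from typing import Tuple


def rime_info_url(domain: str, scheme: str = "https") -> str:
    """Build the version endpoint URL for a RIME domain (HTTPS is an assumption)."""
    return f"{scheme}://{domain}/v1/rime-info"


def fetch_rime_info(domain: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Return (HTTP status, response body) from the version endpoint."""
    with urllib.request.urlopen(rime_info_url(domain), timeout=timeout) as response:
        return response.status, response.read().decode("utf-8", errors="replace")
```

Pass your own domain, for example fetch_rime_info("rime.<DOMAIN>.com"), and check that the status is 200 and the body is not empty.
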
Verify you can make an API token in the web client using this guide.

Test your Python SDK connection using the API token you made:

pip install rime-sdk

from rime_sdk import Client

rime_client = Client("rime.<DOMAIN>.com", "<API_TOKEN>")
project = rime_client.create_project("Health Check", "Testing the SDK's upstream connection.")

Return to the web client and verify that the project was created. If everything succeeds, you are ready to achieve ML Integrity with the RI Platform!

Configure Backups

Backups ensure that your team can restore your testing data in the event of a disaster. If your cluster has been successfully deployed (and you opted in via install_velero = true), you can configure backups using the steps below.

Download Velero.

curl -fsSL -o velero-v1.6.3-linux-amd64.tar.gz https://github.com/vmware-tanzu/velero/releases/download/v1.6.3/velero-v1.6.3-linux-amd64.tar.gz
tar -xvf velero-v1.6.3-linux-amd64.tar.gz

Ensure that your backups are scheduled properly.
```
./velero schedule get -n rime-extras
```

Troubleshooting

If the SDK times out, ensure that you are connected to your VPN.
If the web client is marked as insecure, verify that you have an ACM SSL/TLS certificate for it.
On older operating systems, you may need to run export GRPC_DNS_RESOLVER=native in your shell; otherwise, requests may hang due to IPv4 vs. IPv6 issues.
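
If you run the SDK from a script or notebook rather than a shell, the resolver setting can be applied in Python instead. This is a sketch, assuming the variable must be set before the SDK (and gRPC) is imported; use_native_dns_resolver is a hypothetical helper.

```python
import os
from typing import MutableMapping


def use_native_dns_resolver(env: MutableMapping[str, str] = os.environ) -> None:
    """Set GRPC_DNS_RESOLVER=native unless a resolver was already chosen."""
    env.setdefault("GRPC_DNS_RESOLVER", "native")


use_native_dns_resolver()
# Import the SDK only after this point, e.g. `from rime_sdk import Client`.
```

Using setdefault leaves any value you exported in the shell untouched.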