Deploying Your RI Platform Cluster
With requirements satisfied and configuration files populated, you are now ready to deploy the RI Platform.
At a high level, there are two main ways to do this: as a standalone Kubernetes cluster (recommended) or integrated within an existing Kubernetes cluster. The former is achieved by specifying create_eks = true, whereas the latter is specified by create_eks = false and including a cluster_name.
The steps below may vary, depending on your infrastructure.
Deploy the Cluster
Open a terminal session on a box containing the installation tools.
Verify that your Terraform configuration files (main.tf and backend.tf) are present in your working directory.
Authenticate your AWS CLI.
aws sts get-caller-identity # verify you're in the right account
Add the Robust Intelligence Helm repository (or your private registry, if configured).
helm repo add robustintelligence https://robustintelligence.github.io/helm --force-update
Initialize your Terraform environment.
terraform init
Verify your Terraform plan (recommended).
terraform plan -out "rime.plan" | tee "rime-plan.txt"
less rime-plan.txt # proof-read the changes
Apply the Terraform plan. (This step can take up to 30 minutes.)
terraform apply "rime.plan" # if you skipped the plan step above, omit "rime.plan"
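The command sequence above can be wrapped in a small script that stops at the first failure. This is a minimal sketch, not part of the RI Platform tooling: the function names deployment_commands and run_all are hypothetical, and it assumes aws, helm, and terraform are on your PATH. It defaults to a dry run because it does not pause for you to proof-read the plan.

```python
import subprocess


def deployment_commands(plan_file="rime.plan"):
    """Return the deployment commands from the steps above, in order."""
    return [
        ["aws", "sts", "get-caller-identity"],
        ["helm", "repo", "add", "robustintelligence",
         "https://robustintelligence.github.io/helm", "--force-update"],
        ["terraform", "init"],
        ["terraform", "plan", "-out", plan_file],
        ["terraform", "apply", plan_file],
    ]


def run_all(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it.

    check=True raises CalledProcessError on the first non-zero exit,
    so later steps never run after a failure.
    """
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(deployment_commands(), dry_run=True)  # set dry_run=False to execute
```

Keeping the command list separate from the runner makes it easy to review or adapt the sequence for your infrastructure before anything is executed.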
Validate Your Deployment
Once the terraform apply command completes, your cluster should be operational! The following actions can help verify all services are up and running.
Load Balancer ALPN Policy
Find the load balancer used by the rime-kong-proxy service with kubectl get svc rime-kong-proxy.
Locate the Load Balancer in your AWS console.
In the “Listeners” section, verify that the TLS: 443 listener’s ALPN policy is set to HTTP2Preferred.
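If you prefer to check the ALPN policy from a script instead of the console, the same check can be sketched with boto3. This is an illustrative sketch under assumptions: boto3 is installed, your AWS credentials are configured, and you already know the load balancer's ARN. The helper names alpn_policy_ok and check_load_balancer are hypothetical, and the default region simply mirrors the us-west-2 example used in this guide.

```python
def alpn_policy_ok(listeners_response, port=443, expected="HTTP2Preferred"):
    """Return True if the listener on `port` lists `expected` as its ALPN policy.

    `listeners_response` is the dict returned by the elbv2
    describe_listeners call.
    """
    for listener in listeners_response.get("Listeners", []):
        if listener.get("Port") == port:
            return expected in listener.get("AlpnPolicy", [])
    return False  # no listener found on that port


def check_load_balancer(load_balancer_arn, region="us-west-2"):
    """Fetch the listeners for one load balancer and check the 443 listener."""
    import boto3  # imported lazily so alpn_policy_ok works without boto3

    client = boto3.client("elbv2", region_name=region)
    response = client.describe_listeners(LoadBalancerArn=load_balancer_arn)
    return alpn_policy_ok(response)
```

Calling check_load_balancer("<load-balancer-arn>") returns True only when the port 443 listener reports HTTP2Preferred.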
Kubernetes Services
Point your local kubectl to the new cluster.
aws eks --region us-west-2 update-kubeconfig --name <cluster-name>
Inspect the running pods.
kubectl get pods -n <rime-namespace>
Your output should look something like this:
NAME                                      READY   STATUS    RESTARTS   AGE
rime-agent-job-monitor-6bddd4697d-t9118   1/1     Running   0          5m26s
rime-agent-launcher-56bc47549c-dod60      1/1     Running   0          5m26s
rime-frontend-cd6c89884-8ljrl             1/1     Running   0          5m26s
...
Verify you can access the web client at your rime subdomain. This domain is the value you configured during DNS setup and will be of the form rime.<DOMAIN>.com.
cURL your version endpoint and verify that metadata is successfully returned:
curl --location --request GET rime.<DOMAIN>.com/v1/rime-info
Verify you can make an API token in the web client using this guide.
Test your Python SDK connection using the API token you made:
pip install rime-sdk
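from rime_sdk import Client

# Connect to the cluster using the API token created in the web client.
rime_client = Client("rime.<DOMAIN>.com", "<API_TOKEN>")
# Creating a project exercises the full upstream connection.
project = rime_client.create_project("Health Check", "Testing the SDK's upstream connection.")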
Return to the web client and verify that a project was created. If everything succeeds, you are ready to achieve ML Integrity with the RI Platform!
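The pod inspection and version-endpoint checks above can be partly scripted. The following is a minimal sketch using only the Python standard library. The helper names (pods_not_ready, rime_info_url, fetch_rime_info) are hypothetical, and the https scheme is an assumption, since the curl example above does not specify one.

```python
import urllib.request


def pods_not_ready(kubectl_output):
    """Return names of pods that are not Running with all containers ready.

    Expects the plain-text table printed by `kubectl get pods`.
    """
    failing = []
    for line in kubectl_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated lines such as "..."
        name, ready, status = fields[0], fields[1], fields[2]
        ready_count, _, total = ready.partition("/")
        if status != "Running" or ready_count != total:
            failing.append(name)
    return failing


def rime_info_url(domain, scheme="https"):
    """Build the version endpoint URL for rime.<DOMAIN>.com (scheme is assumed)."""
    return f"{scheme}://rime.{domain}.com/v1/rime-info"


def fetch_rime_info(domain, timeout=10):
    """Return (HTTP status, body text) from the version endpoint."""
    with urllib.request.urlopen(rime_info_url(domain), timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")
```

To use it, pass the captured output of kubectl get pods -n <rime-namespace> to pods_not_ready. An empty list means every listed pod is Running and fully ready. Pods from finished jobs (for example, status Completed) are reported too, so review the list before treating it as a failure.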
Configure Backups
Backups ensure that your team can restore your testing data in the event of a disaster.
If your cluster has been successfully deployed (and you opted in via install_velero = true), you can configure backups using the steps below.
Download Velero.
curl -fsSL -o velero-v1.6.3-linux-amd64.tar.gz https://github.com/vmware-tanzu/velero/releases/download/v1.6.3/velero-v1.6.3-linux-amd64.tar.gz
tar -xvf velero-v1.6.3-linux-amd64.tar.gz
Ensure that your backups are scheduled properly.
./velero-v1.6.3-linux-amd64/velero schedule get -n rime-extras # the tarball extracts into its own directory
Troubleshooting
If you are getting timeouts in the SDK, ensure that you are connected to your VPN.
If the webapp is marked as insecure, verify that you have an ACM SSL/TLS cert for your webapp.
On older operating systems, you may need to run export GRPC_DNS_RESOLVER=native in the shell. Otherwise, requests may hang due to IPv4 vs. IPv6 issues.
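If you run the SDK from a notebook or script where exporting a shell variable is inconvenient, the resolver setting can also be applied from Python. This is a sketch under the assumption that the SDK picks up GRPC_DNS_RESOLVER from the process environment. To be safe, set it before the SDK is imported.

```python
import os

# Set the resolver before any gRPC-based library is imported, so the
# setting is already in the environment when gRPC initializes.
# setdefault keeps any value you have already exported in the shell.
os.environ.setdefault("GRPC_DNS_RESOLVER", "native")

# Import the SDK only after the variable is set, for example:
# from rime_sdk import Client
```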