
Deploying models with NVIDIA Triton Inference Server on Scaleway Object Storage

Reviewed on 19 September 2024 • Published on 23 August 2023
  • gpu
  • nvidia
  • triton
  • object-storage

In this tutorial, we will walk you through the process of deploying machine learning models using NVIDIA Triton Inference Server on Scaleway Object Storage.

We will cover how to set up Triton Inference Server, store your model in an Object Storage bucket, and enable metric export for monitoring.

Before you start

To complete the actions presented below, you must have:

  • A Scaleway account logged into the console
  • Owner status or IAM permissions allowing you to perform actions in the intended Organization
  • An SSH key
  • A GPU Instance running Ubuntu GPU OS 12
  • sudo privileges or access to the root user
  • Basic understanding of machine learning models and Docker

Storing models in Scaleway Object Storage

In this step, we will create an Object Storage bucket on Scaleway and upload your machine-learning model. If you have already created and uploaded your model to an Object Storage bucket, you can skip this step.

For this tutorial, we will use a pre-trained model available in the Triton Inference Server model repository. You can replace this with your own model if desired.

  1. Click Object Storage on the left side menu of the console. The Object Storage dashboard displays.
  2. Create a new bucket to store your model artifacts.
  3. Download the example model repository from NVIDIA’s GitHub to your local computer:
    git clone https://github.com/triton-inference-server/server.git
  4. Download the example models:
    cd server/docs/examples
    ./fetch_models.sh
  5. Navigate to the server/docs/examples/model_repository directory within the cloned repository.
  6. Upload the example model folder to your bucket in Scaleway Object Storage. You can use the Scaleway Object Storage API, any Amazon S3-compatible tool, or the console's web interface to upload the model folder (see the example command after this list).
    Tip

    You can use the s3cmd command-line tool, the AWS CLI, or any other Amazon S3-compatible tool to upload your data.
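
As an illustration, the upload could look like the following with the AWS CLI pointed at the Scaleway endpoint. This is a minimal sketch: it assumes the AWS CLI is installed and configured with your Scaleway credentials, and the bucket name is a placeholder.

    # Create the bucket on Scaleway Object Storage (fr-par region)
    aws s3 mb s3://<your-bucket-name> --endpoint-url https://s3.fr-par.scw.cloud

    # Upload the example model repository folder to the bucket
    aws s3 sync server/docs/examples/model_repository \
      s3://<your-bucket-name>/model_repository --endpoint-url https://s3.fr-par.scw.cloud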

Configuring Triton Inference Server

  1. Log into your GPU Instance using SSH.
    ssh root@<gpu_instance_ip>
  2. Triton does not use a global configuration file for the repository location: the model repository path is passed on the command line when the server starts (as shown in the next section). Each model in the repository carries its own config.pbtxt file, and the example models fetched earlier already include one.
  3. Prepare the model repository path for your bucket. For an S3-compatible endpoint such as Scaleway Object Storage, Triton expects the endpoint to be embedded in the URL:
    s3://https://s3.fr-par.scw.cloud:443/<your-bucket-name>/<path-to-model-folder>
    Replace <your-bucket-name> and <path-to-model-folder> with the appropriate values for your Scaleway Object Storage setup.
    Tip

    When using S3, the credentials and default region can be passed either through an AWS CLI configuration file or through the corresponding environment variables. If the environment variables are set, they take precedence and are used by Triton instead of the credentials from the AWS CLI configuration file.

  4. Make your Scaleway credentials available to Triton, for example by exporting them as environment variables that the docker run command in the next section forwards into the container (see the example after this list).
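
A minimal sketch of this, assuming you keep the credentials in shell variables; the names SCW_ACCESS_KEY and SCW_SECRET_ACCESS are reused by the docker run command in the next section, and the key values are placeholders.

    # Scaleway API credentials kept in shell variables (placeholder values);
    # the docker run command below forwards them to Triton as
    # AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
    export SCW_ACCESS_KEY="<your-scaleway-access-key>"
    export SCW_SECRET_ACCESS="<your-scaleway-secret-key>"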

Starting Triton Inference Server

  1. Pull the Triton Inference Server container image:

    docker pull nvcr.io/nvidia/tritonserver:23.07-py3
    Tip

    If Docker is not preinstalled on your GPU Instance, follow the Docker installation guide before pulling the image.

  2. Start the Triton Inference Server container:

    docker run --gpus all --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
    -e AWS_ACCESS_KEY_ID=$SCW_ACCESS_KEY -e AWS_SECRET_ACCESS_KEY=$SCW_SECRET_ACCESS \
    -v ${PWD}/model:/models nvcr.io/nvidia/tritonserver:23.07-py3 tritonserver \
    --model-repository=s3://https://s3.fr-par.scw.cloud:443/<your-bucket-name>/<path-to-model-folder> \
    --model-control-mode=explicit --log-verbose=1
    • --gpus all: Uses all available GPUs for inference (if present).
    • -p 8000:8000 -p 8001:8001 -p 8002:8002: Exposes the ports for HTTP, GRPC, and metrics.
    • -e AWS_ACCESS_KEY_ID=$SCW_ACCESS_KEY -e AWS_SECRET_ACCESS_KEY=$SCW_SECRET_ACCESS: Passes your Scaleway access key and secret key into the container as the AWS-style environment variables Triton reads for S3 access.
    • -v ${PWD}/model:/models: Mounts a local model directory inside the container (optional here, since the models are served from Object Storage).
    • --model-repository=s3://https://s3.fr-par.scw.cloud:443/<your-bucket-name>/<path-to-model-folder>: The path to your model folder in the Scaleway Object Storage bucket.

    Once Triton Server is running, you can access it for inference using the HTTP/REST and GRPC APIs or the client SDKs provided by NVIDIA (see the example at the end of this section).

    Tip

    You can run a health check on your deployment by running the following command:

    curl <ip>:8000/v2/health/ready
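
Because the server was started with --model-control-mode=explicit, models are not loaded automatically. The commands below are a minimal sketch using Triton's HTTP model-repository API; densenet_onnx is one of the example models fetched earlier, so substitute your own model name if you uploaded a different repository.

    # List the models Triton sees in the Object Storage repository
    curl -X POST <gpu_instance_ip>:8000/v2/repository/index

    # Explicitly load one model from the repository
    curl -X POST <gpu_instance_ip>:8000/v2/repository/models/densenet_onnx/load

    # Check that the model is ready to serve inference requests
    curl <gpu_instance_ip>:8000/v2/models/densenet_onnx/ready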

Exporting metrics to Scaleway Observability Cockpit

Triton Inference Server provides metrics that can be exported for monitoring. In this section, we will set up Prometheus and Scaleway Cockpit to visualize these metrics.

Triton already exposes its metrics in Prometheus format on port 8002 of the GPU Instance, at http://<gpu_instance_ip>:8002/metrics (this is the metrics port published by the docker run command above). To get these metrics into Cockpit, run a Prometheus instance (or a Prometheus-compatible agent) that scrapes this endpoint and forwards the samples to Cockpit's push endpoint, https://metrics.cockpit.fr-par.scw.cloud/api/v1/push, authenticating with a Cockpit token that has push permissions.
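
The following Prometheus configuration is a minimal sketch of this setup. It assumes Prometheus runs on the GPU Instance next to Triton and that Cockpit accepts the push token through an X-Token header; <COCKPIT_TOKEN> is a placeholder for a Cockpit token with push permissions.

    global:
      scrape_interval: 15s

    scrape_configs:
      - job_name: triton
        static_configs:
          - targets: ["localhost:8002"]   # Triton's metrics endpoint (/metrics)

    remote_write:
      - url: "https://metrics.cockpit.fr-par.scw.cloud/api/v1/push"
        headers:
          X-Token: "<COCKPIT_TOKEN>"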

Your metrics are now pushed to Cockpit and you can access Grafana to visualize them.
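
In Grafana, you can then graph Triton's counters and gauges. For example, a query along the following lines (using metric names from Triton's metrics documentation) shows the per-second rate of successful inference requests; adapt it to the metrics and labels visible in your setup.

    rate(nv_inference_request_success[5m])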

Conclusion

You have successfully deployed a machine learning model using NVIDIA Triton Inference Server on Scaleway Object Storage and set up metrics monitoring using Cockpit. This tutorial highlights how Scaleway’s Object Storage and AI capabilities can be combined to build and deploy powerful AI applications.

Remember that this tutorial provides just a basic overview. Refer to the official Triton Inference Server documentation for more advanced configurations, security settings, and additional features to tailor the deployment to your specific needs.
