Scaleway Generative APIs - Dedicated Deployment lets you deploy and run machine learning models on Scaleway's infrastructure. The service provides scalable and efficient endpoints for model inference. The Generative APIs - Dedicated Deployment API lets you manage these endpoints and perform inference operations with any OpenAI API-compatible software.
Tip
To retrieve information about the different models available for deployment on Scaleway Generative APIs - Dedicated Deployment, check out our model documentation.
Concepts
Refer to our dedicated concepts page to find the definitions of all concepts and terminology related to Generative APIs - Dedicated Deployment.
Quickstart
- Configure your environment variables
Note
This optional step simplifies your usage of the Generative APIs - Dedicated Deployment API. You can find your Project ID in the Scaleway console.
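As a minimal sketch of this step, the Python snippet below reads credentials from environment variables and builds the request headers. The variable names `SCW_SECRET_KEY` and `SCW_PROJECT_ID` and the helpers `auth_headers` and `project_id` are assumptions made for illustration and are not confirmed by this page. `X-Auth-Token` is the header Scaleway APIs generally expect for API key authentication.

```python
import os


def auth_headers(env=None):
    """Build HTTP headers from environment variables.

    SCW_SECRET_KEY is an assumed variable name, not one defined on this page.
    """
    env = os.environ if env is None else env
    secret_key = env.get("SCW_SECRET_KEY")
    if not secret_key:
        raise RuntimeError("SCW_SECRET_KEY is not set")
    return {
        "X-Auth-Token": secret_key,
        "Content-Type": "application/json",
    }


def project_id(env=None):
    """Return the Project ID from SCW_PROJECT_ID (assumed variable name)."""
    env = os.environ if env is None else env
    value = env.get("SCW_PROJECT_ID")
    if not value:
        raise RuntimeError("SCW_PROJECT_ID is not set")
    return value
```

Passing a plain dictionary as `env` makes the helpers easy to try without touching your real environment.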
- List available models: Get a list of all the models available for deployment, with their details.
- Create a model deployment: Create a deployment, customizing the details in the payload (name, model, description, tags, etc.) to your needs.
| Parameter | Description | Valid values |
|---|---|---|
| project_id | The Project in which the deployment should be created (string) | Any valid Scaleway Project ID, e.g., "b4bd99e0-b389-11ed-afa1-0242ac120002" |
| name | A name of your choice for the deployment (string) | Any string containing only alphanumeric characters, dots, spaces, and dashes, e.g., "my-inference-deployment" |
| model_id | The model to deploy (string) | Any valid model ID found in your model library (see models listing) |
| node_type | The type of node to use for the deployment (string) | Example: "L4" |
| min_size | Minimum number of replicas for the deployment (integer) | Any integer, e.g., 1 |
| max_size | Maximum number of replicas for the deployment (integer) | Any integer, e.g., 3 |
| accept_eula | Indicates acceptance of the End User License Agreement (boolean) | true |
| endpoints | Defines the endpoints for the deployment (array) | At least one endpoint, e.g., [ { "public": {} } ] |
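To illustrate how the parameters above fit together, here is a minimal Python sketch that assembles a deployment payload and applies the constraints listed in the table. The helper name `build_deployment_payload` is hypothetical. Reading "alphanumeric" as ASCII-only and requiring `min_size` to be no greater than `max_size` are both assumptions, not rules stated on this page.

```python
import json
import re

# Characters allowed in a deployment name per the parameter table:
# alphanumerics, dots, spaces, and dashes (ASCII-only is an assumption).
_NAME_PATTERN = re.compile(r"[A-Za-z0-9. -]+")


def build_deployment_payload(project_id, name, model_id, node_type,
                             accept_eula, min_size=1, max_size=1,
                             endpoints=None):
    """Return a dict ready to be serialized as the request body."""
    if not _NAME_PATTERN.fullmatch(name):
        raise ValueError("name may only contain alphanumerics, dots, "
                         "spaces, and dashes")
    if min_size > max_size:  # assumption: min must not exceed max
        raise ValueError("min_size must not exceed max_size")
    if endpoints is None:
        endpoints = [{"public": {}}]  # example value from the table
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    return {
        "project_id": project_id,
        "name": name,
        "model_id": model_id,
        "node_type": node_type,
        "min_size": min_size,
        "max_size": max_size,
        "accept_eula": accept_eula,
        "endpoints": endpoints,
    }


def to_json(payload):
    """Serialize the payload for use as an HTTP request body."""
    return json.dumps(payload)
```

For example, `build_deployment_payload(pid, "my-inference-deployment", mid, "L4", accept_eula=True, max_size=3)` yields a payload with a single public endpoint, where `pid` and `mid` stand for your own Project and model IDs.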
- Create a model endpoint: Create an inference endpoint for the deployment, either public or private. Customize the details in the payload to your needs.
| Parameter | Description | Valid values |
|---|---|---|
| project_id | The Project in which the endpoint should be created (string) | Any valid Scaleway Project ID, e.g., "b4bd99e0-b389-11ed-afa1-0242ac120002" |
| deployment_id | The deployment ID to which the endpoint will be associated (string) | Any valid deployment ID, e.g., "bcb0976d-98d6-49c1-b6b5-17804941c0b7" |
| disable_auth | Specifies whether to disable authentication (boolean) | true or false |
| public | Public endpoint configuration (object) | {} for public endpoint |
| private_network | Private endpoint configuration including the private network ID (object) | { "private_network_id": "private-network-id" } |
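The difference between the public and private variants comes down to which configuration object is present in the payload. The Python sketch below shows one way to build either variant. The helper name `build_endpoint_payload` is hypothetical, and the flat field layout simply mirrors the parameter table, so the real request body may nest these fields differently.

```python
def build_endpoint_payload(project_id, deployment_id,
                           private_network_id=None, disable_auth=False):
    """Build an endpoint payload: public by default, private when a
    Private Network ID is given.

    The flat layout follows the parameter table and is an assumption
    about the actual request body.
    """
    payload = {
        "project_id": project_id,
        "deployment_id": deployment_id,
        "disable_auth": disable_auth,
    }
    if private_network_id is None:
        # Public endpoint: an empty configuration object.
        payload["public"] = {}
    else:
        # Private endpoint: reference the Private Network by its ID.
        payload["private_network"] = {
            "private_network_id": private_network_id,
        }
    return payload
```

Calling the helper without `private_network_id` produces the public form. Passing an ID produces the private form.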
- List your deployments: Get a list of all the deployments in your account, with their details.
- List your endpoints: Get a list of all the inference endpoints in your account, with their details.
- Delete an endpoint: Delete an inference endpoint, specified by its endpoint ID.
The expected successful response is empty.
Important
Dedicated Generative APIs deployments must have at least one endpoint, either public or private.
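The listing and deletion steps above can be sketched in Python as follows. The base URL, the resource paths (`models`, `deployments`, `endpoints`), and all helper names are assumptions made for illustration, so check the API reference for the actual routes and version. The requests are only constructed here, not sent. The final helper checks the "at least one endpoint" rule on the client side before a deletion.

```python
import urllib.request

# Assumed base URL for the fr-par region; the real path and API version
# may differ from this sketch.
BASE_URL = "https://api.scaleway.com/inference/v1/regions/fr-par"


def build_request(method, path, secret_key, base_url=BASE_URL):
    """Create (but do not send) an authenticated request object."""
    return urllib.request.Request(
        url=f"{base_url}/{path.lstrip('/')}",
        headers={"X-Auth-Token": secret_key},
        method=method,
    )


def list_models_request(secret_key):
    return build_request("GET", "models", secret_key)


def list_deployments_request(secret_key):
    return build_request("GET", "deployments", secret_key)


def list_endpoints_request(secret_key):
    return build_request("GET", "endpoints", secret_key)


def delete_endpoint_request(secret_key, endpoint_id):
    return build_request("DELETE", f"endpoints/{endpoint_id}", secret_key)


def can_delete_endpoint(endpoint_ids, endpoint_id):
    """Return True if the deployment keeps at least one endpoint after
    removing endpoint_id from its list of endpoint IDs."""
    remaining = [e for e in endpoint_ids if e != endpoint_id]
    return len(remaining) >= 1
```

A request built this way can be sent with `urllib.request.urlopen(request)`. For a deletion, the response body is expected to be empty on success.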
Requirements
- You have a Scaleway account
- You have created an API key and the API key has sufficient IAM permissions to perform the actions described on this page
- You have installed curl
Technical information
Region
Generative APIs - Dedicated Deployment endpoints are available in the following region:
| Name | API ID |
|---|---|
| Paris | fr-par |
Pagination
Most listing requests receive a paginated response. Requests against paginated endpoints accept two query arguments:
- page, a positive integer to choose which page to return
- per_page, a positive integer lower than or equal to 100 to select the number of items to return per page. The default value is 50.
Paginated endpoints usually also accept filters to search and sort results. These filters are documented in each endpoint's documentation.
The X-Total-Count header contains the total number of items returned.
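The pagination rules above can be sketched in Python as follows. The helper names are hypothetical, and `page_count` assumes that `X-Total-Count` reports the total number of items across all pages rather than the size of the current page.

```python
import math
from urllib.parse import urlencode

MAX_PER_PAGE = 100      # upper bound stated in this section
DEFAULT_PER_PAGE = 50   # default stated in this section


def pagination_query(page=1, per_page=DEFAULT_PER_PAGE):
    """Return a query string such as 'page=2&per_page=100'."""
    if page < 1:
        raise ValueError("page must be a positive integer")
    if not 1 <= per_page <= MAX_PER_PAGE:
        raise ValueError("per_page must be between 1 and 100")
    return urlencode({"page": page, "per_page": per_page})


def page_count(total_count, per_page=DEFAULT_PER_PAGE):
    """Number of pages needed to fetch total_count items.

    Assumes total_count comes from the X-Total-Count header and counts
    all matching items, not only those on the current page.
    """
    return math.ceil(total_count / per_page)
```

With the default page size, 120 items would span three pages, so a client would request pages 1 through 3.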
Creating a deployment: the model object
When creating a deployment, the model_id parameter is required. This specifies the model to deploy. Use the List Models endpoint to retrieve available model IDs.
Note
This information is designed to help you correctly configure the model_id parameter when using the Create a deployment method.
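As a rough illustration of picking a `model_id`, the Python sketch below looks up a model by name in a decoded List Models response. The response shape (a `models` array whose items carry `id` and `name` fields) and the helper name `find_model_id` are assumptions, so adjust them to the actual response format.

```python
def find_model_id(list_models_response, model_name):
    """Return the ID of the first model whose name matches, or None.

    Assumes a response shaped like:
    {"models": [{"id": "...", "name": "..."}, ...]}
    """
    for model in list_models_response.get("models", []):
        if model.get("name") == model_name:
            return model.get("id")
    return None
```

The returned value can then be passed as the `model_id` parameter when creating a deployment.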
Going further
For more help using Scaleway Generative APIs - Dedicated Deployment, check out the following resources:
- Our main documentation
- The #ai channel on our Slack Community