List inference deployments
List all your inference deployments.
Path Parameters
region: The region you want to target.
Query Parameters
page: Page number to return.
page_size: Maximum number of deployments to return per page.
order_by: Order in which to return results.
project_id: Filter by Project ID. (UUID format)
organization_id: Filter by Organization ID. (UUID format)
name: Filter by deployment name.
tags: Filter by tags.
List inference deployments › Responses
List of deployments on the current page.
total_count: Total number of deployments.
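The query parameters above can be assembled into a request URL, and `total_count` can be combined with `page_size` to work out how many pages to fetch. The following is a minimal sketch rather than an official client: the base URL, the `/regions/{region}/deployments` path layout, the region value `fr-par`, and the encoding of `tags` as a repeated parameter are all assumptions that the text above does not confirm.

```python
from urllib.parse import urlencode

# Assumed placeholder; the source does not state the host or API version.
BASE_URL = "https://api.example.com/inference"


def build_list_url(region, base_url=BASE_URL, **filters):
    """Return a URL for listing deployments in `region`.

    Filters set to None are left out. A list value such as `tags` is
    repeated once per item (an assumption about how lists are encoded).
    """
    params = {key: value for key, value in filters.items() if value is not None}
    query = urlencode(params, doseq=True)
    # The path layout below is assumed, not taken from the source.
    url = f"{base_url}/regions/{region}/deployments"
    return f"{url}?{query}" if query else url


def page_count(total_count, page_size):
    """Number of pages needed to cover `total_count` items (ceiling division)."""
    return -(-total_count // page_size)


print(build_list_url("fr-par", page=1, page_size=20, tags=["prod", "llm"]))
print(page_count(41, 20))
```

Keeping URL construction separate from the HTTP call makes the filter handling easy to check without network access.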
Create a deployment
Create a new inference deployment related to a specific model.
Path Parameters
region: The region you want to target.
Create a deployment › Request Body
name: Name of the deployment.
project_id: ID of the Project to create the deployment in. (UUID format)
model_id: ID of the model to use. (UUID format)
node_type_name: Name of the node type to use.
List of endpoints to create.
accept_eula: Accept the model's End User License Agreement (EULA).
If the model has an EULA, you must accept it before proceeding.
The terms of the EULA can be retrieved using the GetModelEula API call.
tags: List of tags to apply to the deployment.
min_size: Defines the minimum size of the pool.
max_size: Defines the maximum size of the pool. Autoscaling is not yet supported, so this value must be equal to min_size.
Quantization settings to apply to this deployment.
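As an illustration of the request body described above, the sketch below assembles the scalar fields into a JSON-serializable dictionary and enforces the stated rule that `max_size` must equal `min_size`. The function name and all example values (the UUIDs and the node type name) are hypothetical. The endpoint list and quantization settings are left out because their structure is not described here, so a real request may need them added.

```python
import json


def build_create_body(name, project_id, model_id, node_type_name,
                      min_size=1, max_size=1, accept_eula=False, tags=None):
    """Assemble a request body for creating a deployment (hypothetical helper)."""
    # Autoscaling is not yet supported, so the two bounds must match.
    if max_size != min_size:
        raise ValueError("max_size must be equal to min_size")
    return {
        "name": name,
        "project_id": project_id,
        "model_id": model_id,
        "node_type_name": node_type_name,
        "accept_eula": accept_eula,
        "tags": list(tags or []),
        "min_size": min_size,
        "max_size": max_size,
    }


# All values below are placeholders, not real identifiers.
body = build_create_body(
    name="demo-deployment",
    project_id="11111111-1111-1111-1111-111111111111",
    model_id="22222222-2222-2222-2222-222222222222",
    node_type_name="example-node-type",
    accept_eula=True,
    tags=["demo"],
)
print(json.dumps(body, indent=2))
```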
Create a deployment › Responses
id: Unique identifier. (UUID format)
name: Name of the deployment.
project_id: Project ID. (UUID format)
status: Status of the deployment.
tags: List of tags applied to the deployment.
node_type_name: Node type of the deployment.
List of endpoints.
size: Current size of the pool.
min_size: Defines the minimum size of the pool.
max_size: Defines the maximum size of the pool. Autoscaling is not yet supported, so this value must be equal to min_size.
error_message: Displays information if your deployment is in an error state.
model_id: ID of the model used for the deployment. (UUID format)
Quantization parameters for this deployment.
model_name: Name of the deployed model.
created_at: Creation date of the deployment. (RFC 3339 format)
updated_at: Last modification date of the deployment. (RFC 3339 format)
region: Region of the deployment.
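The response fields above map naturally onto a small typed record. The sketch below assumes the response arrives as a JSON object whose keys match the field names listed. It covers only a subset of the fields, and the `Deployment` class, the `parse_rfc3339` helper, and the sample values (including the status string) are illustrative rather than part of the API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


def parse_rfc3339(value):
    """Parse an RFC 3339 timestamp, accepting a trailing 'Z' for UTC.

    Older Python versions are stricter about the number of
    fractional-second digits that datetime.fromisoformat accepts.
    """
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


@dataclass
class Deployment:
    id: str
    name: str
    status: str
    size: int
    created_at: datetime
    updated_at: datetime
    error_message: str = ""

    @classmethod
    def from_dict(cls, data):
        """Build a Deployment from a decoded JSON response."""
        return cls(
            id=data["id"],
            name=data["name"],
            status=data["status"],
            size=int(data["size"]),
            created_at=parse_rfc3339(data["created_at"]),
            updated_at=parse_rfc3339(data["updated_at"]),
            error_message=data.get("error_message") or "",
        )


# Sample payload with placeholder values.
sample = {
    "id": "33333333-3333-3333-3333-333333333333",
    "name": "demo-deployment",
    "status": "example-status",
    "size": 1,
    "created_at": "2024-01-02T03:04:05Z",
    "updated_at": "2024-01-02T04:00:00+00:00",
}
print(Deployment.from_dict(sample))
```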
Get a deployment
Get the deployment for the given ID.
Path Parameters
region: The region you want to target.
deployment_id: ID of the deployment to get. (UUID format)
Get a deployment › Responses
id: Unique identifier. (UUID format)
name: Name of the deployment.
project_id: Project ID. (UUID format)
status: Status of the deployment.
tags: List of tags applied to the deployment.
node_type_name: Node type of the deployment.
List of endpoints.
size: Current size of the pool.
min_size: Defines the minimum size of the pool.
max_size: Defines the maximum size of the pool. Autoscaling is not yet supported, so this value must be equal to min_size.
error_message: Displays information if your deployment is in an error state.
model_id: ID of the model used for the deployment. (UUID format)
Quantization parameters for this deployment.
model_name: Name of the deployed model.
created_at: Creation date of the deployment. (RFC 3339 format)
updated_at: Last modification date of the deployment. (RFC 3339 format)
region: Region of the deployment.
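Because the response carries `status` and `error_message`, one common pattern is to fetch the deployment repeatedly until it reaches a desired state. The sketch below is one possible way to do this and makes several assumptions: the caller supplies `fetch`, a function that returns the decoded response for one deployment, and the status strings are placeholders, since the possible values of `status` are not listed here.

```python
import time


def wait_for_status(fetch, wanted, failed=(), interval=5.0, attempts=60,
                    sleep=time.sleep):
    """Call `fetch` until the deployment status is in `wanted`.

    Raises RuntimeError when the status is in `failed`, and TimeoutError
    when `attempts` polls pass without reaching a wanted status.
    """
    for _ in range(attempts):
        deployment = fetch()
        status = deployment["status"]
        if status in wanted:
            return deployment
        if status in failed:
            message = deployment.get("error_message") or f"deployment is {status}"
            raise RuntimeError(message)
        sleep(interval)
    raise TimeoutError(f"status not in {sorted(wanted)} after {attempts} attempts")


# Usage with a stand-in for the real HTTP call; status names are placeholders.
responses = iter([{"status": "placeholder-pending"}, {"status": "placeholder-ready"}])
final = wait_for_status(lambda: next(responses),
                        wanted={"placeholder-ready"},
                        failed={"placeholder-error"},
                        sleep=lambda seconds: None)
print(final["status"])
```

Injecting `fetch` and `sleep` keeps the loop independent of any particular HTTP library and lets it be exercised without waiting.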
Delete a deployment
Delete an existing inference deployment.
Path Parameters
region: The region you want to target.
deployment_id: ID of the deployment to delete. (UUID format)
Delete a deployment › Responses
id: Unique identifier. (UUID format)
name: Name of the deployment.
project_id: Project ID. (UUID format)
status: Status of the deployment.
tags: List of tags applied to the deployment.
node_type_name: Node type of the deployment.
List of endpoints.
size: Current size of the pool.
min_size: Defines the minimum size of the pool.
max_size: Defines the maximum size of the pool. Autoscaling is not yet supported, so this value must be equal to min_size.
error_message: Displays information if your deployment is in an error state.
model_id: ID of the model used for the deployment. (UUID format)
Quantization parameters for this deployment.
model_name: Name of the deployed model.
created_at: Creation date of the deployment. (RFC 3339 format)
updated_at: Last modification date of the deployment. (RFC 3339 format)
region: Region of the deployment.
Update a deployment
Update an existing inference deployment.
Path Parameters
region: The region you want to target.
deployment_id: ID of the deployment to update. (UUID format)
Update a deployment › Request Body
name: Name of the deployment.
tags: List of tags to apply to the deployment.
min_size: Defines the new minimum size of the pool.
max_size: Defines the maximum size of the pool. Autoscaling is not yet supported, so this value must be equal to min_size.
model_id: ID of the model to assign to the deployment.
Quantization to use for the deployment.
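An update body can be built so that it contains only the fields being changed. The sketch below assumes that fields omitted from the body are left unchanged by the API, which is typical for partial updates but is not stated above. It also sets `min_size` and `max_size` together so the two stay equal. The helper name and example values are hypothetical, and quantization is omitted because its structure is not described.

```python
def build_update_body(name=None, tags=None, size=None, model_id=None):
    """Return a request body holding only the fields that were provided."""
    body = {}
    if name is not None:
        body["name"] = name
    if tags is not None:
        body["tags"] = list(tags)
    if size is not None:
        # max_size must equal min_size while autoscaling is unsupported.
        body["min_size"] = size
        body["max_size"] = size
    if model_id is not None:
        body["model_id"] = model_id
    return body


print(build_update_body(size=2))
print(build_update_body(name="renamed-deployment", tags=[]))
```

Passing an empty list for `tags` is kept distinct from passing nothing, so a caller can still send an explicit empty tag list.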
Update a deployment › Responses
id: Unique identifier. (UUID format)
name: Name of the deployment.
project_id: Project ID. (UUID format)
status: Status of the deployment.
tags: List of tags applied to the deployment.
node_type_name: Node type of the deployment.
List of endpoints.
size: Current size of the pool.
min_size: Defines the minimum size of the pool.
max_size: Defines the maximum size of the pool. Autoscaling is not yet supported, so this value must be equal to min_size.
error_message: Displays information if your deployment is in an error state.
model_id: ID of the model used for the deployment. (UUID format)
Quantization parameters for this deployment.
model_name: Name of the deployed model.
created_at: Creation date of the deployment. (RFC 3339 format)
updated_at: Last modification date of the deployment. (RFC 3339 format)
region: Region of the deployment.
Get the CA certificate
Get the CA certificate used for the deployment of private endpoints. The CA certificate will be returned as a PEM file.
Path Parameters
region: The region you want to target.
deployment_id: (UUID format)
Get the CA certificate › Responses
name
content_type
content
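The returned `name`, `content_type`, and `content` fields can be turned into a PEM file on disk. The sketch below assumes that `content` is base64-encoded inside the JSON response, which is an assumption to verify against a real response. If the API returns the PEM text directly, the decoding step should be dropped. The helper names and the sample response are illustrative.

```python
import base64
from pathlib import Path


def certificate_bytes(response):
    """Return the certificate bytes from a decoded JSON response.

    Assumes `content` is base64-encoded; this is not confirmed by the source.
    """
    return base64.b64decode(response["content"])


def save_certificate(response, directory="."):
    """Write the certificate under `directory` and return the file path."""
    # Keep only the final path component of the returned file name.
    target = Path(directory) / Path(response["name"]).name
    target.write_bytes(certificate_bytes(response))
    return target


# Stand-in response with a dummy body; a real certificate would be longer.
fake_pem = b"-----BEGIN CERTIFICATE-----\nZHVtbXk=\n-----END CERTIFICATE-----\n"
sample_response = {
    "name": "example-ca.pem",
    "content_type": "application/x-pem-file",
    "content": base64.b64encode(fake_pem).decode("ascii"),
}
print(certificate_bytes(sample_response).decode("ascii"))
```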