NavigationContentFooter
Jump toSuggest an edit

Scaleway Managed Inference as drop-in replacement for the OpenAI APIs

Reviewed on 26 August 2024Published on 06 May 2024

You can use any of the OpenAI official libraries, for example, the OpenAI Python client library to interact with your Scaleway Managed Inference deployment. This feature is especially beneficial for those looking to seamlessly transition applications already utilizing OpenAI.

Chat Completions API

The Chat Completions API is designed for models fine-tuned for conversational tasks (such as X-chat and X-instruct variants).

CURL

To invoke Scaleway Managed Inference’s OpenAI-compatible Chat API, simply edit your dedicated endpoints with a suffix /v1/chat/completions:

https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/chat/completions

OpenAI Python client library

Use OpenAI’s SDK how you normally would.

from openai import OpenAI
client = OpenAI(
base_url='https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/',
api_key='<IAM API key>'
)
chat_completion = client.chat.completions.create(
messages=[
{ "role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Sing me a song about Scaleway"
}
],
model='<Model name>' #e.g 'meta/llama-3.1-8b-instruct:fp8'
)
print(chat_completion.choices[0].message.content)
Note

More OpenAI-like APIs (e.. audio) will be released step by step once related models are supported.

Supported parameters

  • messages (required)
  • model (required)
  • max_tokens
  • temperature (default 0.7)
  • top_p (default 1)
  • presence_penalty
  • response_format
  • logprobs
  • stop
  • seed
  • stream
  • tools
  • tool_choice

Unsupported parameters

Currently, the following options are not supported:

  • frequency_penalty
  • n
  • top_logprobs
  • logit_bias
  • user

If you have a use case requiring one of these unsupported features, please contact us via Slack.

Embeddings API

The Embeddings API is designed to get a vector representation of an input that can be easily consumed by other machine learning models.

CURL

Use your dedicated endpoints as follows:

https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/embeddings
curl https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/embeddings \
-H "Authorization: Bearer $SCW_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"input": "Embeddings can represent text in a numerical format.",
"model": "$MODEL_NAME"
}'
# model e.g 'sentence-transformers/sentence-t5-xxl:fp32'

OpenAI Python client library

from openai import OpenAI
client = OpenAI(
base_url='https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/',
api_key='<IAM API key>'
)
embedding = client.embeddings.create(
input=["Embeddings can represent text in a numerical format.","Machine learning models use embeddings for various tasks."]
model='<Model name>' #e.g 'sentence-transformers/sentence-t5-xxl:fp32'
)
print(embedding)

Supported parameters

  • input (required) in string or array of strings
  • model (required)

Unsupported parameters

  • encoding_format (default float)
  • dimensions

Models API

The Models API returns the model(s) available for inferencing.

In the context of a Scaleway Managed Inference deployment, it returns the name of the current model being served.

https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/models
curl https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/models \
-H "Authorization: Bearer $SCW_API_KEY" \
-H "Content-Type: application/json"

Differences

Token usage stats

OpenAI API doesn’t return usage stats (number of tokens in prompt and completion) for streaming responses.

Scaleway Managed Inference endpoints return usage stats for both streaming and non-streaming responses.

For streaming responses, the usage field is incremented in each chunk, and completed in the very last chunk of the response. For example:

data: {...,"choices":[{"index":0,"delta":{"content":" Hello","role":"assistant","name":""},"finish_reason":null}],...,"usage":{"prompt_tokens":9,"completion_tokens":1,"total_tokens":10}}
data: {...,"choices":[{"index":0,"delta":{"content":"!","role":"assistant","name":""},"finish_reason":null}],...,"usage":{"prompt_tokens":9,"completion_tokens":2,"total_tokens":11}}
data: {...,"choices":[{"index":0,"delta":{"content":"","role":"assistant","name":""},"finish_reason":"stop"}],...,"usage":{"prompt_tokens":9,"completion_tokens":2,"total_tokens":11}}
data: [DONE]

Future developments

This documentation covers the initial phase of experimental support for the OpenAI API. Gradually, we plan to introduce additional APIs such as:

  • Audio API
  • Images API
Note

We will progressively roll out more OpenAI-like APIs as we expand model support.

If you have a use case requiring one of these unsupported APIs, please contact us via Slack.

Was this page helpful?
API DocsScaleway consoleDedibox consoleScaleway LearningScaleway.comPricingBlogCareers
© 2023-2024 – Scaleway