
Understanding the Sentence-t5-xxl embedding model

Reviewed on 03 December 2024 · Published on 22 May 2024

Model overview

| Attribute | Details |
|-----------|---------|
| Provider | sentence-transformers |
| Compatible Instances | L4 (FP32) |
| Context size | 512 tokens |

Model name

sentence-transformers/sentence-t5-xxl:fp32

Compatible Instances

| Instance type | Max context length |
|---------------|--------------------|
| L4 | 512 (FP32) |

Model introduction

The Sentence-T5-XXL model represents a significant evolution in sentence embeddings, building on the Text-To-Text Transfer Transformer (T5) architecture. Designed for a broad range of language processing tasks, Sentence-T5-XXL uses the encoder of the T5 encoder-decoder model to generate high-dimensional vectors that capture rich semantic information. The model has been fine-tuned for tasks such as text classification, semantic similarity, and clustering, making it a useful building block in Retrieval-Augmented Generation (RAG) pipelines. It excels at sentence similarity tasks, though its performance on semantic search tasks is weaker.
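For context, this is what using the model looks like locally with the sentence-transformers Python library. This is a sketch, not part of Managed Inference; it assumes the library is installed and downloads the model weights on first run:

```python
# Local usage sketch with the sentence-transformers library
# (assumes `pip install sentence-transformers`; weights download on first use).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/sentence-t5-xxl")

sentences = [
    "Embeddings can represent text in a numerical format.",
    "Text can be encoded as vectors of numbers.",
    "The weather in Paris is sunny today.",
]

# encode() returns one 768-dimensional vector per sentence.
embeddings = model.encode(sentences)

# Cosine similarity: the paraphrase pair should score higher than the
# unrelated pair, reflecting the model's strength at sentence similarity.
print(util.cos_sim(embeddings[0], embeddings[1]))
print(util.cos_sim(embeddings[0], embeddings[2]))
```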

Why is it useful?

The Sentence-T5-XXL model ranks highly on the MTEB leaderboard among open models released under the Apache 2.0 license:

  • Sentence-T5-XXL encodes text into 768-dimensional vectors, providing a detailed and nuanced representation of sentence semantics (see the retrieval sketch after this list).
  • The model was trained on a diverse dataset of two billion question-answer pairs from various online communities, ensuring broad applicability and robustness.
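To make the use of these vectors concrete, here is a minimal NumPy sketch of the retrieval step in a RAG pipeline: documents are ranked by the cosine similarity of their embeddings to a query embedding. Random vectors stand in for real model outputs:

```python
# Retrieval step of a RAG pipeline: rank documents by cosine similarity
# between their embeddings and the query embedding. Random 768-dimensional
# vectors stand in for real Sentence-T5-XXL outputs.
import numpy as np

rng = np.random.default_rng(0)
query_embedding = rng.normal(size=768)
document_embeddings = rng.normal(size=(5, 768))  # five documents

def cosine_similarities(query: np.ndarray, docs: np.ndarray) -> np.ndarray:
    """Cosine similarity of each row of `docs` against `query`."""
    return (docs @ query) / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))

scores = cosine_similarities(query_embedding, document_embeddings)
ranking = np.argsort(scores)[::-1]  # most relevant documents first
print(ranking, scores[ranking])
```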

How to use it

Sending Managed Inference requests

To perform inference tasks with your embedding model deployed at Scaleway, use the following command:

curl https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/embeddings \
  -H "Authorization: Bearer <IAM API key>" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Embeddings can represent text in a numerical format.",
    "model": "sentence-transformers/sentence-t5-xxl:fp32"
  }'

Make sure to replace <IAM API key> and <Deployment UUID> with your actual IAM API key and the Deployment UUID you are targeting.
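The same request can be sent from Python. The sketch below uses the requests library and mirrors the curl example exactly, keeping the same placeholder values; only the endpoint and payload shown above are assumed:

```python
# Python equivalent of the curl request above, using the requests library.
# Replace the placeholders with your actual values, as with the curl example.
import requests

DEPLOYMENT_UUID = "<Deployment UUID>"
IAM_API_KEY = "<IAM API key>"

response = requests.post(
    f"https://{DEPLOYMENT_UUID}.ifr.fr-par.scaleway.com/v1/embeddings",
    headers={
        "Authorization": f"Bearer {IAM_API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "input": "Embeddings can represent text in a numerical format.",
        "model": "sentence-transformers/sentence-t5-xxl:fp32",
    },
    timeout=30,
)
response.raise_for_status()
print(response.json())
```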

Receiving Inference responses

After sending the HTTP request to the public or private endpoint exposed by your deployment, you will receive a response from the Managed Inference server. The response contains the embeddings generated by the model from the input provided in the request; process this output according to your application's needs.
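Continuing from the Python request sketch above, and assuming the endpoint follows the OpenAI-style embeddings response schema (which the /v1/embeddings path suggests, though this is an assumption here), the vector can be extracted like this:

```python
# Continues the request sketch above; assumes an OpenAI-style embeddings
# response shape: {"data": [{"embedding": [...], ...}], ...}.
payload = response.json()
vector = payload["data"][0]["embedding"]
print(f"Embedding dimension: {len(vector)}")  # expected: 768 for this model
```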
