Getting started with Amazon Titan Text Embeddings

Embeddings play a key role in natural language processing (NLP) and machine learning (ML). Text embedding refers to the process of transforming text into numerical representations that reside in a high-dimensional vector space. This technique is achieved through the use of ML algorithms that enable the understanding of the meaning and context of data (semantic relationships) and the learning of complex relationships and patterns within the data (syntactic relationships). You can use the resulting vector representations for a wide range of applications, such as information retrieval, text classification, natural language processing, and many others.

Amazon Titan Text Embeddings is a text embeddings model that converts natural language text—consisting of single words, phrases, or even large documents—into numerical representations that can be used to power use cases such as search, personalization, and clustering based on semantic similarity.

In this post, we discuss the Amazon Titan Text Embeddings model, its features, and example use cases.

Some key concepts include:

Numerical representation of text (vectors) captures semantics and relationships between words
Rich embeddings can be used to compare text similarity
Multilingual text embeddings can identify meaning in different languages

How is a piece of text converted into a vector?

There are multiple techniques to convert a sentence into a vector. One popular method is using word embeddings algorithms, such as Word2Vec, GloVe, or FastText, and then aggregating the word embeddings to form a sentence-level vector representation.

Another common approach is to use large language models (LLMs), like BERT or GPT, which can provide contextualized embeddings for entire sentences. These models are based on deep learning architectures such as Transformers, which can capture the contextual information and relationships between words in a sentence more effectively.

Why do we need an embeddings model?

Vector embeddings are fundamental for LLMs to understand the semantic degrees of language, and also enable LLMs to perform well on downstream NLP tasks like sentiment analysis, named entity recognition, and text classification.

In addition to semantic search, you can use embeddings to augment your prompts for more accurate results through Retrieval Augmented Generation (RAG)—but in order to use them, you’ll need to store them in a database with vector capabilities.

The Amazon Titan Text Embeddings model is optimized for text retrieval to enable RAG use cases. It enables you to first convert your text data into numerical representations or vectors, and then use those vectors to accurately search for relevant passages from a vector database, allowing you to make the most of your proprietary data in combination with other foundation models.

Because Amazon Titan Text Embeddings is a managed model on Amazon Bedrock, it’s offered as an entirely serverless experience. You can use it via either the Amazon Bedrock REST API or the AWS SDK. The required parameters are the text that you would like to generate the embeddings of and the modelID parameter, which represents the name of the Amazon Titan Text Embeddings model. The following code is an example using the AWS SDK for Python (Boto3):

import boto3
import json
 
#Create the connection to Bedrock
bedrock = boto3.client(
    service_name='bedrock',
    region_name='us-west-2', 
    
)
 
bedrock_runtime = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-west-2', 
    
)
 
# Let's see all available Amazon Models
available_models = bedrock.list_foundation_models()
 
for model in available_models['modelSummaries']:
  if 'amazon' in model['modelId']:
    print(model)
 
# Define prompt and model parameters
prompt_data = """Write me a poem about apples"""
 
body = json.dumps({
    "inputText": prompt_data,
})
 
model_id = 'amazon.titan-embed-text-v1' #look for embeddings in the modelID
accept = 'application/json' 
content_type = 'application/json'
 
# Invoke model 
response = bedrock_runtime.invoke_model(
    body=body, 
    modelId=model_id, 
    accept=accept, 
    contentType=content_type
)
 
# Print response
response_body = json.loads(response['body'].read())
embedding = response_body.get('embedding')
 
#Print the Embedding
 
print(embedding)

The output will look something like the following:

[-0.057861328, -0.15039062, -0.4296875, 0.31054688, ..., -0.15625]

Refer to Amazon Bedrock boto3 Setup for more details on how to install the required packages, connect to Amazon Bedrock, and invoke models.

Features of Amazon Titan Text Embeddings

With Amazon Titan Text Embeddings, you can input up to 8,000 tokens, making it well suited to work with single words, phrases, or entire documents based on your use case. Amazon Titan returns output vectors of dimension 1536, giving it a high degree of accuracy, while also optimizing for low-latency, cost-effective results.

Amazon Titan Text Embeddings supports creating and querying embeddings for text in over 25 different languages. This means you can apply the model to your use cases without needing to create and maintain separate models for each language you want to support.

Having a single embeddings model trained on many languages provides the following key benefits:

Broader reach – By supporting over 25 languages out of the box, you can expand the reach of your applications to users and content in many international markets.
Consistent performance – With a unified model covering multiple languages, you get consistent results across languages instead of optimizing separately per language. The model is trained holistically so you get the advantage across languages.
Multilingual query support – Amazon Titan Text Embeddings allows querying text embeddings in any of the supported languages. This provides flexibility to retrieve semantically similar content across languages without being restricted to a single language. You can build applications that query and analyze multilingual data using the same unified embeddings space.

As of this writing, the following languages are supported:

Arabic
Chinese (Simplified)
Chinese (Traditional)
Czech
Dutch
English
French
German
Hebrew
Hindi
Italian
Japanese
Kannada
Korean
Malayalam
Marathi
Polish
Portuguese
Russian
Spanish
Swedish
Filipino Tagalog
Tamil
Telugu
Turkish

Using Amazon Titan Text Embeddings with LangChain

LangChain is a popular open source framework for working with generative AI models and supporting technologies. It includes a BedrockEmbeddings client that conveniently wraps the Boto3 SDK with an abstraction layer. The BedrockEmbeddings client allows you to work with text and embeddings directly, without knowing the details of the JSON request or response structures. The following is a simple example:

from langchain.embeddings import BedrockEmbeddings

#create an Amazon Titan Text Embeddings client
embeddings_client = BedrockEmbeddings() 

#Define the text from which to create embeddings
text = "Can you please tell me how to get to the bakery?"

#Invoke the model
embedding = embeddings_client.embed_query(text)

#Print response
print(embedding)

You can also use LangChain’s BedrockEmbeddings client alongside the Amazon Bedrock LLM client to simplify implementing RAG, semantic search, and other embeddings-related patterns.

Use cases for embeddings

Although RAG is currently the most popular use case for working with embeddings, there are many other use cases where embeddings can be applied. The following are some additional scenarios where you can use embeddings to solve specific problems, either on their own or in cooperation with an LLM:

Question and answer – Embeddings can help support question and answer interfaces through the RAG pattern. Embeddings generation paired with a vector database allow you to find close matches between questions and content in a knowledge repository.
Personalized recommendations – Similar to question and answer, you can use embeddings to find vacation destinations, colleges, vehicles, or other products based on the criteria provided by the user. This could take the form of a simple list of matches, or you could then use an LLM to process each recommendation and explain how it satisfies the user’s criteria. You could also use this approach to generate custom “10 best” articles for a user based on their specific needs.
Data management – When you have data sources that don’t map cleanly to each other, but you do have text content that describes the data record, you can use embeddings to identify potential duplicate records. For example, you could use embeddings to identify duplicate candidates that might use different formatting, abbreviations, or even have translated names.
Application portfolio rationalization – When looking to align application portfolios across a parent company and an acquisition, it’s not always obvious where to start finding potential overlap. The quality of configuration management data can be a limiting factor, and it can be difficult coordinating across teams to understand the application landscape. By using semantic matching with embeddings, we can do a quick analysis across application portfolios to identify high-potential candidate applications for rationalization.
Content grouping – You can use embeddings to help facilitate grouping similar content into categories that you might not know ahead of time. For example, let’s say you had a collection of customer emails or online product reviews. You could create embeddings for each item, then run those embeddings through k-means clustering to identify logical groupings of customer concerns, product praise or complaints, or other themes. You can then generate focused summaries from those groupings’ content using an LLM.

Semantic search example

In our example on GitHub, we demonstrate a simple embeddings search application with Amazon Titan Text Embeddings, LangChain, and Streamlit.

The example matches a user’s query to the closest entries in an in-memory vector database. We then display those matches directly in the user interface. This can be useful if you want to troubleshoot a RAG application, or directly evaluate an embeddings model.

For simplicity, we use the in-memory FAISS database to store and search for embeddings vectors. In a real-world scenario at scale, you will likely want to use a persistent data store like the vector engine for Amazon OpenSearch Serverless or the pgvector extension for PostgreSQL.

Try a few prompts from the web application in different languages, such as the following:

How can I monitor my usage?
How can I customize models?
Which programming languages can I use?
Comment mes données sont-elles sécurisées ?
私のデータはどのように保護されていますか？
Quais fornecedores de modelos estão disponíveis por meio do Bedrock?
In welchen Regionen ist Amazon Bedrock verfügbar?
有哪些级别的支持？

Note that even though the source material was in English, the queries in other languages were matched with relevant entries.

Conclusion

The text generation capabilities of foundation models are very exciting, but it’s important to remember that understanding text, finding relevant content from a body of knowledge, and making connections between passages are crucial to achieving the full value of generative AI. We will continue to see new and interesting use cases for embeddings emerge over the next years as these models continue to improve.

Next steps

You can find additional examples of embeddings as notebooks or demo applications in the following workshops:

About the Authors

Jason Stehle is a Senior Solutions Architect at AWS, based in the New England area. He works with customers to align AWS capabilities with their greatest business challenges. Outside of work, he spends his time building things and watching comic book movies with his family.

Nitin Eusebius is a Sr. Enterprise Solutions Architect at AWS, experienced in Software Engineering, Enterprise Architecture, and AI/ML. He is deeply passionate about exploring the possibilities of generative AI. He collaborates with customers to help them build well-architected applications on the AWS platform, and is dedicated to solving technology challenges and assisting with their cloud journey.

Raj Pathak is a Principal Solutions Architect and Technical Advisor to large Fortune 50 companies and mid-sized financial services institutions (FSI) across Canada and the United States. He specializes in machine learning applications such as generative AI, natural language processing, intelligent document processing, and MLOps.

Mani Khanuja is a Tech Lead – Generative AI Specialists, author of the book – Applied Machine Learning and High Performance Computing on AWS, and a member of the Board of Directors for Women in Manufacturing Education Foundation Board. She leads machine learning (ML) projects in various domains such as computer vision, natural language processing and generative AI. She helps customers to build, train and deploy large machine learning models at scale. She speaks in internal and external conferences such re:Invent, Women in Manufacturing West, YouTube webinars and GHC 23. In her free time, she likes to go for long runs along the beach.

Mark Roy is a Principal Machine Learning Architect for AWS, helping customers design and build AI/ML solutions. Mark’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. He has helped companies in many industries, including insurance, financial services, media and entertainment, healthcare, utilities, and manufacturing. Mark holds six AWS Certifications, including the ML Specialty Certification. Prior to joining AWS, Mark was an architect, developer, and technology leader for over 25 years, including 19 years in financial services.