
Build generative AI and similarity search applications at virtually unlimited scale with Spanner

Spanner, Google Cloud’s fully managed, highly available distributed database service, combines virtually unlimited horizontal scalability with relational semantics for both relational and non-relational workloads, all with a 99.999% availability SLA. As data volumes grow and applications demand more from their operational databases, customers need scale. We recently announced support for searching vector embeddings with exact k-nearest neighbor (KNN) search in preview, helping businesses build generative AI applications at virtually unlimited scale. These capabilities are built into Spanner, so you can perform vector search on your transactional data without moving it to another database, keeping operations simple.

In this blog post, we discuss how vector search can enhance gen AI applications and how Spanner’s underlying architecture supports extremely large-scale vector search deployments. We also discuss the operational benefits of using Spanner instead of a dedicated vector database.

Generative AI and vector embeddings

Generative AI is enabling all kinds of new applications, from virtual assistants that can have personalized conversations, to generating new content from simple text prompts. Pre-trained large language models (LLMs), on which gen AI relies, open the door for the broader developer community to easily build gen AI applications, even without specialized machine learning expertise. But because LLMs sometimes hallucinate and provide incorrect responses, combining LLMs with vector search and operational databases can help build gen AI applications that are grounded on contextual, domain-specific, and real-time data, for high-quality AI-assisted user experiences.

Imagine a financial institution with a virtual assistant that answers customers’ questions about their accounts, performs account management tasks, and recommends the financial products that best fit each customer’s needs. In complex scenarios, a customer’s decision-making process can span multiple chat sessions with the assistant. Performing vector search over the conversation history helps the assistant find the most relevant earlier content, enabling a highly relevant and informative chat experience.

Vector search relies on vector embeddings, which are numerical representations of content such as text, images, or video generated by embedding models. It helps a gen AI application identify the most relevant data to include in LLM prompts, thereby customizing and improving the quality of the LLM’s responses. Vector search works by computing the distance between vector embeddings: the closer two embeddings are in the vector space, the more similar their content.
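To make the distance idea concrete, here is a minimal plain-Python sketch of exact KNN using cosine distance. It illustrates the math only and is not Spanner’s implementation: the function names, the document ids, and the toy three-dimensional embeddings are all hypothetical (real embedding models produce vectors with hundreds of dimensions). “Exact” means every candidate embedding is scored, rather than an approximate subset.

```python
import math


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity; 0.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return 1.0 - dot / (norm_a * norm_b)


def exact_knn(query, corpus, k):
    """Exact KNN: score every stored embedding and return the k closest ids."""
    ranked = sorted(corpus.items(), key=lambda item: cosine_distance(query, item[1]))
    return [doc_id for doc_id, _ in ranked[:k]]


# Hypothetical toy embeddings for three pieces of content.
corpus = {
    "savings_rates": [0.9, 0.1, 0.0],
    "mortgage_terms": [0.1, 0.9, 0.1],
    "card_fees": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # hypothetical embedding of a user's question
print(exact_knn(query, corpus, k=2))  # ['savings_rates', 'mortgage_terms']
```

The ids returned here are what an application would then look up and place into the LLM prompt as grounding context.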

Bring virtually unlimited scale to vector search with Spanner

Vector workloads that serve a large number of users can easily reach very large scale, as in the financial virtual assistant example above. Large-scale vector search workloads can involve a large number of vectors (e.g., more than 10 billion), a high query rate (e.g., millions of queries per second), or both. Not surprisingly, this can be challenging for many database systems. However, many of these searches are highly partitionable: each search is constrained to the data associated with a particular user. Such workloads are a great fit for Spanner KNN search because Spanner efficiently reduces the search space to provide accurate, real-time results with low latency. Spanner’s horizontally scalable architecture lets it support vector search over trillions of vectors for highly partitionable workloads.
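The partitioning argument can be sketched with a small in-memory model. This is a hypothetical illustration, assuming each embedding is stored under a `user_id` key; it does not show how Spanner physically stores or searches data. It only demonstrates why constraining a search to one user’s rows keeps exact KNN cheap: distances are computed for one partition, not the whole corpus.

```python
import heapq
import math
from collections import defaultdict


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


class PartitionedVectorStore:
    """Hypothetical store that groups embeddings by user_id."""

    def __init__(self):
        self._partitions = defaultdict(dict)  # user_id -> {doc_id: embedding}

    def add(self, user_id, doc_id, embedding):
        self._partitions[user_id][doc_id] = embedding

    def knn(self, user_id, query, k):
        """Exact KNN restricted to one user's partition.

        Returns (doc_ids, rows_scanned) so the reduced search space is visible.
        """
        partition = self._partitions.get(user_id, {})
        nearest = heapq.nsmallest(
            k, partition.items(), key=lambda item: cosine_distance(query, item[1])
        )
        return [doc_id for doc_id, _ in nearest], len(partition)


store = PartitionedVectorStore()
store.add("alice", "chat-1", [1.0, 0.0])
store.add("alice", "chat-2", [0.0, 1.0])
store.add("bob", "chat-3", [1.0, 0.1])  # close to the query, but Bob's data

ids, scanned = store.knn("alice", [0.9, 0.1], k=1)
print(ids, scanned)
```

Bob’s `chat-3` is nearly identical to the query but is never scored, because the search only scans Alice’s two rows. The cost of each exact search therefore grows with the size of one user’s partition rather than with the total number of vectors.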

Spanner also lets you query and filter vector embeddings using SQL, maintaining application simplicity. Using SQL, you can easily join vector embeddings with operational data, and combine regular queries with vector search. For example, you can use secondary indexes to efficiently filter rows of interest before performing a vector search. Spanner’s vector search queries return fresh, real-time data as soon as transactions are committed, just like any other query on your operational data.
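The following sketch shows the shape of a query that combines a join, an index-backed filter, and a vector distance in a single SQL statement. It uses Python’s built-in `sqlite3` module purely as a runnable stand-in, so the syntax is SQLite’s, not Spanner’s. The `cosine_distance` function is a hypothetical user-defined function registered for this example; Spanner provides its own built-in vector distance functions, whose exact names and signatures should be checked in the Spanner documentation. The table names, columns, and JSON storage of embeddings are likewise assumptions.

```python
import json
import math
import sqlite3


def cosine_distance(a_json, b_json):
    """Hypothetical SQL UDF: embeddings are stored as JSON arrays of floats."""
    a, b = json.loads(a_json), json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms  # assumes non-zero vectors


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    # Register the Python function so SQL can call cosine_distance(a, b).
    conn.create_function("cosine_distance", 2, cosine_distance)
    conn.executescript(
        """
        CREATE TABLE customers (customer_id TEXT PRIMARY KEY, segment TEXT);
        CREATE TABLE messages (
            message_id TEXT PRIMARY KEY,
            customer_id TEXT REFERENCES customers(customer_id),
            body TEXT,
            embedding TEXT  -- JSON-encoded vector
        );
        -- Stands in for a secondary index used to narrow rows before scoring.
        CREATE INDEX messages_by_customer ON messages(customer_id);
        """
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [("c1", "premium"), ("c2", "standard")],
    )
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?, ?, ?)",
        [
            ("m1", "c1", "Asked about high-yield savings", json.dumps([0.9, 0.1, 0.0])),
            ("m2", "c1", "Asked about mortgage refinancing", json.dumps([0.1, 0.9, 0.1])),
            ("m3", "c2", "Asked about savings accounts", json.dumps([0.95, 0.05, 0.0])),
        ],
    )
    return conn


def search_messages(conn, customer_id, query_embedding, k):
    """Filter to one customer, join operational data, then rank by distance."""
    sql = """
        SELECT m.message_id, m.body, cosine_distance(m.embedding, ?) AS distance
        FROM messages AS m
        JOIN customers AS c ON c.customer_id = m.customer_id
        WHERE m.customer_id = ? AND c.segment = 'premium'
        ORDER BY distance
        LIMIT ?
    """
    params = (json.dumps(query_embedding), customer_id, k)
    return conn.execute(sql, params).fetchall()


conn = build_demo_db()
for message_id, body, distance in search_messages(conn, "c1", [0.8, 0.2, 0.1], k=2):
    print(message_id, body, round(distance, 3))
```

Message `m3` is semantically close to the query but belongs to another customer, so the `WHERE` clause excludes it before any ranking happens, mirroring the filter-then-search pattern described above.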

Operational simplicity and resource efficiency with Spanner

Further, Spanner’s in-database vector search capabilities eliminate the cost and complexity of managing a separate vector database, streamlining your operational workflow. In Spanner, vector embeddings and operational data are stored together and managed the same way, so vector embeddings benefit from all of Spanner’s enterprise features, including 99.999% availability, managed backups, point-in-time recovery (PITR), security and access control features, and change streams. Compute resources are shared between operational and vector queries, enabling better resource utilization and cost savings. These capabilities are also supported by Spanner’s PostgreSQL interface, giving users coming from PostgreSQL a familiar and portable interface.

Spanner is also integrated with popular AI developer tools including LangChain Vector Store, Document Loader, and Memory, helping developers to easily build AI applications with their preferred tooling.

Getting started

The rise of gen AI has spurred new interest in vector search capabilities. With support for KNN vector search on top of its virtually unlimited scale, Spanner is well-suited to support your large-scale vector search needs, all on the same platform that you already rely on for your demanding, distributed workloads. To learn more about Spanner and its vector search (and get started for free), check out the following resources:
