
Find sensitive data faster (but safely) with Google Distributed Cloud’s gen AI search solution

Today, generative AI is giving organizations new ways to process and analyze data, discover hidden insights, increase productivity and build new applications. However, data sovereignty, regulatory compliance, and low-latency requirements can be a challenge. The need to keep sensitive data in certain locations, adhere to strict regulations, and respond swiftly can make it difficult to capitalize on the cloud’s innovation, scalability, and cost-efficiency advantages.

Google Distributed Cloud (GDC) brings Google’s AI services anywhere you need them — in your own data center or at the edge. Designed with AI and data-intensive workloads in mind, GDC is a fully managed hardware and software solution featuring a rich set of services. It comes in a range of extensible hardware form factors, with leading industry independent software vendor (ISV) solutions integrated via GDC Marketplace, and your choice of whether to run it connected to Google Cloud’s systems or air-gapped from the public internet.

In this blog post, we dive into the details of how GDC’s new AI-optimized servers with NVIDIA H100 Tensor Core GPUs and our gen AI search packaged solution — now available in preview — allow you to bring increasingly popular retrieval-augmented generation (RAG) to your on-premises environment, and unlock multimodal and multilingual natural-language search experiences across your text, image, voice, and video data.


Gen AI-optimized infrastructure

GDC air-gapped now incorporates servers with NVIDIA H100 GPUs, built on the NVIDIA Hopper architecture and 5th Gen Intel Xeon Scalable processors. These servers bring the GPU-optimized A3 VM family, designed for the NVIDIA NVLink interconnect, to GDC, enabling faster shared compute and memory for AI workloads using large language models (LLMs) with up to 100 billion parameters. They also extend the set of NVIDIA Multi-Instance GPU (MIG) profiles, supporting a variety of new GPU slicing schemes (both uniform and mixed-mode) and dynamic allocation of GPU resources, so AI services can run at a lower total cost of ownership.
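As a rough illustration of what mixed-mode GPU slicing looks like, the following is how MIG partitioning is typically configured with the standard `nvidia-smi mig` tooling. This is a generic sketch, not GDC-specific administration: the profile names are illustrative, and the exact profiles available depend on the GPU model and driver version.

```shell
# List the MIG instance profiles supported on GPU 0.
nvidia-smi mig -i 0 -lgip

# Enable MIG mode on GPU 0 (requires a GPU reset).
sudo nvidia-smi -i 0 -mig 1

# Mixed-mode slicing example: carve one GPU into one larger and two
# smaller GPU instances, then create default compute instances (-C).
# Use the profile IDs reported by -lgip on your own hardware.
sudo nvidia-smi mig -i 0 -cgi 3g.40gb,2g.20gb,2g.20gb -C
```

Uniform slicing would instead request the same profile repeatedly (for example, seven `1g.10gb` instances), trading per-instance capacity for more concurrent tenants.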

Ready-to-deploy on-prem conversational search

With GDC’s new gen AI Search solution, you get a ready-to-deploy, on-prem conversational search solution based on the Gemma 2 LLM with 9 billion parameters. You can easily ingest your sensitive on-prem data into the search solution and quickly find the most relevant information and content via natural language search, boosting employee productivity and knowledge sharing, while helping ensure that the search queries and data remain on-prem.

Responses also include citation links to your original documents, so you can easily verify each answer against its source material and catch hallucinations. Watch the demo below to see the solution in action:

For more accurate responses, the GDC gen AI search solution uses a RAG architecture to combine the benefits of traditional search and generative AI: user queries are augmented with relevant on-prem data before they are sent to the LLM to generate responses. Other core integrations available out of the box include Vertex AI pre-trained APIs (translation for 105 languages, speech-to-text for 13 languages, and optical character recognition for 46 supported and 24 experimental languages) for multimodal and multilingual data ingestion across text, images, and audio, as well as the AlloyDB Omni database service for embedding storage and semantic search across ingested data.
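The RAG flow described above can be sketched in a few lines of Python. This is a toy model, not the solution's actual code: the bag-of-words "embedding" stands in for the dense vectors AlloyDB Omni stores, and the returned prompt stands in for a call to the on-prem Gemma 2 endpoint.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; the real solution stores dense
    vectors and searches them semantically in AlloyDB Omni."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. Ingest: embed on-prem documents once, keeping IDs for citations.
corpus = {
    "hr-policy.pdf": "Employees accrue vacation days monthly.",
    "it-runbook.md": "Reset a VPN token from the self-service portal.",
}
index = {doc_id: embed(text) for doc_id, text in corpus.items()}

def retrieve(query: str, k: int = 1) -> list[str]:
    """2. Retrieve: rank documents by similarity to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda d: cosine(q, index[d]), reverse=True)
    return ranked[:k]

def answer(query: str) -> str:
    """3. Augment: prepend retrieved context (with citations) to the
    prompt. In the real solution this prompt goes to Gemma 2."""
    sources = retrieve(query)
    context = "\n".join(f"[{s}] {corpus[s]}" for s in sources)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer with citations:"

print(answer("How do I reset my VPN token?"))
```

Because the context carries document IDs, the generated answer can cite its sources, which is what enables the citation links described above.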

GDC’s open cloud approach also allows you to customize this solution according to your needs and swap any components as you see fit, including for other database services like Elasticsearch, other open-source models and LLMs, or your own proprietary models.
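That kind of component swap is easiest to picture as a small interface boundary. The sketch below is illustrative, not GDC's actual component contract: any backend that satisfies the `Retriever` protocol (AlloyDB Omni, Elasticsearch, or a custom store) can be plugged into the same search pipeline.

```python
from typing import Protocol

class Retriever(Protocol):
    """Hypothetical interface; not the actual GDC component contract."""
    def search(self, query: str, k: int) -> list[str]: ...

class InMemoryRetriever:
    """Stand-in backend for exercising the pipeline locally."""
    def __init__(self, docs: dict[str, str]):
        self.docs = docs

    def search(self, query: str, k: int) -> list[str]:
        terms = set(query.lower().split())
        scored = sorted(
            self.docs,
            key=lambda d: len(terms & set(self.docs[d].lower().split())),
            reverse=True,
        )
        return scored[:k]

class ElasticsearchRetriever:
    """Hypothetical adapter around an Elasticsearch client, sketched
    only to show the pipeline is backend-agnostic."""
    def __init__(self, client, index_name: str):
        self.client, self.index_name = client, index_name

    def search(self, query: str, k: int) -> list[str]:
        hits = self.client.search(
            index=self.index_name,
            query={"match": {"body": query}},
            size=k,
        )
        return [h["_id"] for h in hits["hits"]["hits"]]

def run_search(retriever: Retriever, query: str) -> list[str]:
    # The rest of the pipeline depends only on the interface,
    # so backends can be swapped without touching this code.
    return retriever.search(query, k=3)
```

For example, `run_search(InMemoryRetriever(corpus), "vpn token")` and `run_search(ElasticsearchRetriever(client, "docs"), "vpn token")` exercise identical pipeline code against different stores.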

Get started on your GDC development journey

To join GDC’s gen AI search solution preview and experience how on-prem gen AI search can transform how your organization retrieves information, contact your Google account representative. Note that you will need a GDC deployment where you can deploy and run the preview.
