Today, generative AI is giving organizations new ways to process and analyze data, discover hidden insights, increase productivity, and build new applications. However, data sovereignty, regulatory compliance, and low-latency requirements can be a challenge. The need to keep sensitive data in certain locations, adhere to strict regulations, and respond swiftly can make it difficult to capitalize on the cloud’s innovation, scalability, and cost-efficiency advantages.
Google Distributed Cloud (GDC) brings Google’s AI services anywhere you need them — in your own data center or at the edge. Designed with AI and data-intensive workloads in mind, GDC is a fully managed hardware and software solution featuring a rich set of services. It comes in a range of extensible hardware form factors, with leading industry independent software vendor (ISV) solutions integrated via GDC Marketplace, and your choice of whether to run it connected to Google Cloud’s systems or air-gapped from the public internet.
In this blog post, we dive into the details of how GDC’s new AI-optimized servers with NVIDIA H100 Tensor Core GPUs and our gen AI search packaged solution — now available in preview — allow you to bring increasingly popular retrieval-augmented generation (RAG) to your on-premises environment, and unlock multimodal and multilingual natural-language search experiences across your text, image, voice, and video data.
GDC air-gapped now incorporates servers with NVIDIA H100 GPUs, built on the NVIDIA Hopper architecture and paired with 5th Gen Intel Xeon Scalable processors. These servers bring the GPU-optimized A3 VM family, designed around the NVIDIA NVLink interconnect, to GDC, enabling faster shared compute and memory for AI workloads using large language models (LLMs) with up to 100 billion parameters. The release also extends the set of NVIDIA Multi-Instance GPU (MIG) profiles, supporting a variety of new GPU slicing schemes (both uniform and mixed-mode) and dynamic allocation of GPU resources, so AI services can run at a lower total cost of ownership.
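To make the MIG model concrete, here is a minimal sketch of how a workload could request a single MIG slice instead of a full H100 on a Kubernetes cluster, using the Kubernetes Python client. The pod name, container image, and the exact MIG resource name (which depends on the slicing profile your operators configure) are illustrative assumptions, not GDC specifics:

```python
# A minimal sketch, assuming the NVIDIA device plugin exposes H100 MIG
# slices as Kubernetes extended resources (e.g., "nvidia.com/mig-3g.40gb").
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running in-cluster

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="mig-inference-demo"),  # hypothetical name
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="inference",
                image="my-registry/llm-inference:latest",  # hypothetical image
                resources=client.V1ResourceRequirements(
                    # Request one 3g.40gb MIG slice rather than a whole GPU,
                    # leaving the remaining slices free for other services.
                    limits={"nvidia.com/mig-3g.40gb": "1"},
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```

Mixed-mode slicing extends the same idea: different pods request differently sized slices of the same physical GPU, which is how smaller auxiliary services can share hardware with a primary LLM serving workload.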
With GDC’s new gen AI search solution, you get a ready-to-deploy, on-prem conversational search experience based on the Gemma 2 LLM with 9 billion parameters. You can easily ingest your sensitive on-prem data into the solution and quickly find the most relevant information and content via natural-language search, boosting employee productivity and knowledge sharing while helping ensure that search queries and data remain on-prem.
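To sketch what ingestion can look like under the hood, the snippet below loads document chunks and their embeddings into a PostgreSQL-compatible database such as AlloyDB Omni using the pgvector extension. The connection string, table layout, embedding dimensionality, and the embed() placeholder are illustrative assumptions rather than details of the packaged solution:

```python
# A minimal ingestion sketch, assuming a PostgreSQL-compatible database
# (such as AlloyDB Omni) with the pgvector extension available.
import psycopg2

EMBED_DIM = 768  # assumed embedding dimensionality

def embed(text: str) -> list[float]:
    # Placeholder so the sketch runs end-to-end; swap in a call to your
    # on-prem embedding model to get real semantic vectors.
    return [0.0] * EMBED_DIM

def to_vector_literal(vec: list[float]) -> str:
    # pgvector accepts text literals of the form "[0.1,0.2,...]".
    return "[" + ",".join(f"{x:g}" for x in vec) + "]"

conn = psycopg2.connect("host=alloydb-omni dbname=search user=app")  # hypothetical DSN
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
    cur.execute(f"""
        CREATE TABLE IF NOT EXISTS chunks (
            id BIGSERIAL PRIMARY KEY,
            source TEXT,          -- original document, used later for citations
            content TEXT,
            embedding VECTOR({EMBED_DIM})
        );
    """)
    # In practice, each document would be split into overlapping chunks.
    for source, chunk in [("handbook.pdf", "Example paragraph of on-prem text.")]:
        cur.execute(
            "INSERT INTO chunks (source, content, embedding) VALUES (%s, %s, %s::vector)",
            (source, chunk, to_vector_literal(embed(chunk))),
        )
```

Storing the source column alongside each chunk is what makes per-answer citations possible at query time.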
Responses also include citation links to your original documents, so you can easily verify answers and guard against hallucinations. Watch the demo below to see the solution in action:
To improve response accuracy, the GDC gen AI search solution relies on a RAG architecture that combines the benefits of traditional search and generative AI: user queries are augmented with relevant on-prem data before they’re sent to the LLM. Other core integrations available out of the box include Vertex AI pre-trained APIs (translation for 105 languages, speech-to-text for 13 languages, and optical character recognition for 46 supported and 24 experimental languages) for multimodal and multilingual data ingestion across text, images, and audio, as well as the AlloyDB Omni database service for embeddings storage and semantic search across ingested data.
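As an illustrative sketch of that query path, the snippet below retrieves the nearest chunks from the table built in the ingestion example above and augments the prompt before calling the LLM. The endpoint URL, model name, and prompt format are assumptions (here, a generic OpenAI-compatible serving endpoint), not specifics of the GDC solution:

```python
# A minimal RAG query sketch; reuses embed() and to_vector_literal()
# from the ingestion example above.
import psycopg2
import requests

def answer(question: str, k: int = 5) -> str:
    # Embed the query with the same model used at ingestion time.
    q_literal = to_vector_literal(embed(question))
    with psycopg2.connect("host=alloydb-omni dbname=search user=app") as conn:
        with conn.cursor() as cur:
            # Retrieve the k chunks nearest to the query embedding
            # (<=> is pgvector's cosine-distance operator).
            cur.execute(
                "SELECT source, content FROM chunks "
                "ORDER BY embedding <=> %s::vector LIMIT %s",
                (q_literal, k),
            )
            hits = cur.fetchall()
    # Augment the prompt with retrieved on-prem context and its sources,
    # so the model can ground its answer and cite documents.
    context = "\n\n".join(f"[{src}] {text}" for src, text in hits)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    resp = requests.post(
        "http://gemma-service/v1/chat/completions",  # hypothetical endpoint
        json={
            "model": "gemma-2-9b-it",  # assumed model name
            "messages": [{"role": "user", "content": prompt}],
        },
    )
    return resp.json()["choices"][0]["message"]["content"]
```

Because retrieval happens against the local database and generation runs on a local endpoint, neither the query nor the retrieved context needs to leave the on-prem environment.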
GDC’s open cloud approach also lets you customize this solution to your needs and swap out components as you see fit, whether for other database services like Elasticsearch, other open-source models and LLMs, or your own proprietary models.
To join the GDC gen AI search solution preview and experience how on-prem gen AI search can transform how your organization retrieves information, contact your Google account representative. Note that you will need a GDC environment in which to deploy and run the preview.