Developers and engineers face several major challenges when scaling AI/ML workloads. One is getting access to the AI infrastructure they need: AI/ML workloads require significant computational resources, such as CPUs and GPUs, and developers need enough of them to run their workloads. Another is handling the diverse patterns and programming interfaces required to scale AI/ML workloads effectively. Developers may need to adapt their code to run efficiently on the specific infrastructure available to them, which can be time-consuming and complex.
To address these challenges, Ray provides a comprehensive and easy-to-use distributed Python framework. With Ray, you configure a scalable cluster of computational resources and use a collection of domain-specific libraries to efficiently distribute common AI/ML tasks like training, serving, and tuning.
Today, we are thrilled to announce that our seamless integration of Ray, a powerful distributed Python framework, with Google Cloud’s Vertex AI is generally available. This integration empowers AI developers to effortlessly scale their AI workloads on Vertex AI’s versatile infrastructure, unlocking the full potential of machine learning, data processing, and distributed computing.
Why Ray on Vertex AI?
Accelerated and Scalable AI Development: Ray’s distributed computing framework integrates seamlessly with Vertex AI’s infrastructure services and provides a unified experience for both generative AI and predictive AI. Scale your Python-based machine learning, deep learning, reinforcement learning, data processing, and scientific computing workloads from a single machine to a massive cluster, so you can tackle even the most demanding AI challenges without the complexity of managing the underlying infrastructure.
Unified Development Experience: By integrating Ray’s ergonomic API with the Vertex AI SDK for Python, AI developers can now seamlessly transition from interactive prototyping in their local development environment or in Vertex AI Colab Enterprise to production deployment on Vertex AI’s managed infrastructure with minimal code changes.
Enterprise-Grade Security: Vertex AI’s robust security features, including VPC Service Controls, Private Service Connect, and Customer-Managed Encryption Keys (CMEK), can help safeguard your sensitive data and models while you leverage the power of Ray’s distributed computing capabilities. Vertex AI’s comprehensive security framework can help ensure that your Ray applications comply with strict enterprise security requirements.
Get started with Ray and Vertex AI
Let’s assume that you want to fine-tune a small language model (SLM) such as Llama or Gemma. To fine-tune Gemma using Ray on Vertex AI, you first need a Ray cluster on Vertex AI, which you can create in just a few minutes using either the console or the Vertex AI SDK for Python. You can monitor the cluster either through the integration with Google Cloud Logging or with the Ray Dashboard.
Currently, Ray on Vertex AI supports Ray 2.9.3. Moreover, you can define a custom image, providing more flexibility in terms of the dependencies included in your Ray cluster.
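As a rough sketch of the SDK path, the snippet below shows what creating a cluster might look like. It assumes the `google-cloud-aiplatform[ray]` package is installed and exposes the `vertex_ray` module with `create_ray_cluster` and `Resources`. The helper names (`build_cluster_config`, `create_cluster`), the machine types, and the accelerator type are illustrative placeholders, not recommendations.

```python
"""Sketch: create a Ray cluster on Vertex AI with the Vertex AI SDK for Python."""


def build_cluster_config(cluster_name: str, worker_count: int = 2) -> dict:
    """Return illustrative settings for the head node and one worker pool.

    Machine and accelerator types are placeholders; pick the ones that
    fit your workload and quota.
    """
    return {
        "cluster_name": cluster_name,
        "ray_version": "2.9",  # assumed SDK value for the Ray 2.9.x line
        "head": {"machine_type": "n1-standard-16", "node_count": 1},
        "workers": [
            {
                "machine_type": "n1-standard-8",
                "node_count": worker_count,
                "accelerator_type": "NVIDIA_TESLA_T4",
                "accelerator_count": 1,
            }
        ],
    }


def create_cluster(project_id: str, region: str, config: dict) -> str:
    """Create the cluster and return its resource name."""
    # Imported lazily so this module loads even where the SDK is not installed.
    from google.cloud import aiplatform
    import vertex_ray
    from vertex_ray import Resources

    aiplatform.init(project=project_id, location=region)
    return vertex_ray.create_ray_cluster(
        head_node_type=Resources(**config["head"]),
        worker_node_types=[Resources(**w) for w in config["workers"]],
        ray_version=config["ray_version"],
        cluster_name=config["cluster_name"],
    )
```

Calling `create_cluster("your-project-id", "us-central1", build_cluster_config("your-cluster"))` would then provision the cluster; the project ID, region, and cluster name here are placeholders.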
After you get your Ray cluster running, using Ray on Vertex AI for developing AI/ML applications is straightforward. The process can vary based on your development environment. You can connect to the Ray cluster and run your application interactively by using the Vertex AI SDK for Python, either within Colab Enterprise or in any IDE you prefer. Alternatively, you can create a Python script and submit it to the Ray cluster on Vertex AI programmatically using the Ray Jobs API.
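Both options can be sketched as follows. This is a minimal illustration, assuming the `vertex_ray` module from `google-cloud-aiplatform[ray]` registers the `vertex_ray://` address scheme for `ray.init` and `JobSubmissionClient`. The helper names, the script name `tune.py`, and the working directory are hypothetical.

```python
"""Sketch: connect to, or submit a job to, a Ray cluster on Vertex AI."""


def cluster_address(project_id: str, region: str, cluster_name: str) -> str:
    """Build the vertex_ray:// address from the cluster's resource name."""
    resource_name = (
        f"projects/{project_id}/locations/{region}"
        f"/persistentResources/{cluster_name}"
    )
    return f"vertex_ray://{resource_name}"


def connect(project_id: str, region: str, cluster_name: str):
    """Option 1: attach to the cluster for interactive development."""
    # Lazy imports so this module loads without Ray or the SDK installed.
    import ray
    import vertex_ray  # noqa: F401  (assumed to enable the vertex_ray:// scheme)
    from google.cloud import aiplatform

    aiplatform.init(project=project_id, location=region)
    return ray.init(address=cluster_address(project_id, region, cluster_name))


def submit_script(project_id: str, region: str, cluster_name: str,
                  script: str = "tune.py", working_dir: str = ".") -> str:
    """Option 2: submit a script with the Ray Jobs API and return the job ID."""
    import vertex_ray  # noqa: F401
    from google.cloud import aiplatform
    from ray.job_submission import JobSubmissionClient

    aiplatform.init(project=project_id, location=region)
    client = JobSubmissionClient(cluster_address(project_id, region, cluster_name))
    return client.submit_job(
        entrypoint=f"python {script}",
        # Upload the local directory that contains the script.
        runtime_env={"working_dir": working_dir},
    )
```

The interactive path suits notebooks and exploratory work, while job submission fits scripted or scheduled runs where you do not want to keep a client session open.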
Using Ray on Vertex AI for developing AI/ML applications offers various benefits. In this scenario, you can use Vertex AI TensorBoard to validate your tuning jobs. Vertex AI TensorBoard provides a managed TensorBoard service that lets you track, visualize, and compare your tuning jobs, and collaborate effectively with your team. You can also use Cloud Storage to conveniently store model checkpoints, metrics, and more. This allows you to quickly consume the model for downstream AI/ML tasks, including generating batch predictions using Ray Data, as you can see in the following code.
```python
# Libraries
import datasets
import ray

# dataset_id, config, and Summarizer are assumed to be defined earlier in the script
input_data = datasets.load_dataset(dataset_id)
ray_input_data = ray.data.from_huggingface(input_data)

# Run the Summarizer over the dataset in batches, one GPU per worker
predictions_data = ray_input_data.map_batches(
    Summarizer,
    concurrency=config["num_gpus"],
    num_gpus=1,
    batch_size=config["batch_size"],
)

# Store resulting predictions
predictions_data.write_json("your-bucket-uri/preds.json", try_create_dir=True)
```
How H-E-B and eDreams ODIGEO scale AI with Ray on Vertex AI
As in any large operation, and especially in grocery retail, accurate demand forecasting is directly tied to the profitability of the business. It’s hard enough to get the forecast right for one item, but imagine having millions of items to forecast for hundreds of stores. Scaling the forecasting model is not an easy task. H-E-B, one of the largest grocery chains in the US, uses Ray on Vertex AI to achieve speed, reliability, and cost savings.
“Ray has enabled us to achieve transformative efficiencies that have been critical to our business. We especially appreciate Ray’s easy to use API and enterprise capabilities,” said Philippe Dagher, Principal Data Scientist at H-E-B. “We are excited about the increased accessibility to Vertex AI’s infrastructure that the integration of Ray on Vertex presents, so much that we have chosen it as our production ML platform.”
eDreams ODIGEO, the world’s leading travel subscription platform and one of the largest e-commerce businesses in Europe, offers the best quality products in regular flights, low-cost airlines, hotels, dynamic packages, car rental and travel insurance to make travel easier, more accessible, and better value for consumers across the globe. The company processes 100 million daily user searches, combining travel options from nearly 700 global airlines and 2.1 million hotels enabled by 1.8 billion daily machine learning predictions.
The eDreams ODIGEO Data Science team is currently using Ray on Vertex AI to train its ranking models, which help deliver the best travel experiences at the best price with minimum effort.
José Luis González, eDreams ODIGEO Data Science Director, said, “We are creating the best ranking models, personalized to the preferences of our 5.4 million Prime customers at scale, with the largest base of accommodation and flight options. With Ray on Vertex AI taking care of the infrastructure for distributed hyper-parameter tuning, we are focusing on building the best experience to drive better value for our customers.”
What’s next
Are you struggling to scale your AI/ML applications? Start by creating a Ray cluster on Vertex AI in the Google Cloud console. New customers get $300 in free credits on signup.
Ray on Vertex AI will empower you to build innovative and scalable applications. With Ray on Vertex AI, it’s never been easier to scale both your generative AI and predictive AI workloads on Vertex AI and unlock new possibilities for your organization.
If you want to know more about Ray on Vertex AI, join the vibrant Ray community and Vertex AI Google Cloud community to share your experiences, ask questions, and collaborate on new projects. Also check out the following resources:
- Documentation
- GitHub samples
- Community blog posts