Announcing new BigQuery inference engine to bring ML closer to your data

Organizations worldwide are excited about the potential of artificial intelligence (AI) and machine learning (ML). However, according to HBR, only 20% see their ML models go into production, because ML is often deployed separately from the core data analytics environment. To bridge this growing gap between data and AI, organizations need to build massive data pipelines, hire people skilled in Python and other advanced programming languages, manage governance, and scale deployment infrastructure. This approach to harnessing ML is expensive and exposes organizations to several security risks.

BigQuery ML addresses this gap by bringing ML directly to your data. Since BigQuery ML became generally available in 2019, customers have run hundreds of millions of prediction and training queries on it, and its usage grew by over 200% YoY in 2022. 

Today, we are announcing the BigQuery ML inference engine, which allows you to run predictions not only with popular model formats directly in BigQuery, but also with remotely hosted models and Google’s state-of-the-art pretrained models. This is a major step toward seamless integration of predictive analytics in the data warehouse. With this new feature, you can run ML inference across:

  • Imported custom models trained outside of BigQuery with a variety of formats (e.g. ONNX, XGBoost and TensorFlow)

  • Remotely hosted models on Vertex AI Prediction

  • State-of-the-art pretrained Cloud AI models (e.g. Vision, NLP, Translate and more)

All these capabilities are available right within BigQuery where your data resides. This eliminates the need for data movement, reducing your costs and security risks. You can harness a broad range of ML capabilities using familiar SQL without knowledge of advanced programming languages. And with BigQuery’s petabyte scale and performance, you don’t have to worry about setting up serving infrastructure. It just works, regardless of the workload!

Let’s look at each of these capabilities in detail.

Imported custom models trained outside of BigQuery

BigQuery ML can import models that were trained outside of BigQuery. Import was previously limited to TensorFlow models; we are now expanding it to TensorFlow Lite, XGBoost, and ONNX. For example, you can convert models from many common ML frameworks, such as PyTorch and scikit-learn, to ONNX and then import them into BigQuery ML. This allows you to run predictions with state-of-the-art models that were developed elsewhere directly within BigQuery, without moving your data. By running inference inside BigQuery, you also get better performance for batch inference tasks, because the work runs on BigQuery’s distributed query engine.

Here’s a basic workflow:

  • Store a pre-trained model artifact in a Cloud Storage bucket

  • Run the CREATE MODEL statement to import the model artifact into BigQuery

  • Run an ML.PREDICT query to make predictions with the imported model
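
The import workflow above can be sketched as two SQL statements. The Python helpers below only render those statements as strings; they do not call BigQuery. The option names (MODEL_TYPE, MODEL_PATH) follow the publicly documented CREATE MODEL syntax for imported models as we understand it, and every dataset, table, and bucket name is a hypothetical placeholder, so treat this as a minimal sketch rather than the definitive syntax for your project.

```python
# Sketch only: renders SQL text for the import workflow.
# All resource names passed in are hypothetical placeholders.

# Model types named in this post (assumed spelling of the MODEL_TYPE values).
IMPORTABLE_TYPES = ("TENSORFLOW", "TENSORFLOW_LITE", "XGBOOST", "ONNX")


def import_model_sql(model, model_type, gcs_uri):
    """Render a CREATE MODEL statement that imports a model artifact
    stored in a Cloud Storage bucket."""
    if model_type not in IMPORTABLE_TYPES:
        raise ValueError("unsupported model type: " + model_type)
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (MODEL_TYPE = '{model_type}', MODEL_PATH = '{gcs_uri}')"
    )


def predict_sql(model, input_table):
    """Render an ML.PREDICT query that scores every row of input_table."""
    return (
        "SELECT *\n"
        f"FROM ML.PREDICT(MODEL `{model}`, TABLE `{input_table}`)"
    )


# Example: import an ONNX model, then score a table with it.
print(import_model_sql("mydataset.imported_onnx", "ONNX",
                       "gs://my-bucket/model.onnx"))
print(predict_sql("mydataset.imported_onnx", "mydataset.input_rows"))
```

The rendered statements can be pasted into the BigQuery console or submitted through any BigQuery client. The input table's column names need to match the input names the imported model expects.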

Inference on remote models 

Some models need dedicated serving infrastructure to handle low-latency requests and a large number of parameters. Vertex AI endpoints make this easy by auto-scaling to handle requests and by providing access to accelerated compute options with GPU and multi-GPU serving nodes. These endpoints can be configured for virtually limitless model types, with many options for pre-built containers, custom containers, custom prediction routines, and even NVIDIA Triton Inference Server. Now, you can run inference with these remote models from right inside BigQuery ML.

Here’s a basic workflow:

  • Host your model on a Vertex AI endpoint

  • Run the CREATE MODEL statement in BigQuery, pointing it to the Vertex AI endpoint

  • Use ML.PREDICT to send BigQuery data to the remote Vertex AI endpoint for inference and get the results back in BigQuery
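
A minimal sketch of the remote-model workflow above, again rendering SQL as strings from Python without contacting any service. The INPUT/OUTPUT schema clauses, the REMOTE WITH CONNECTION clause, and the ENDPOINT option follow the publicly documented remote-model syntax as we understand it. The connection name, endpoint URL, and column names are hypothetical placeholders (PROJECT_ID and ENDPOINT_ID are left unfilled on purpose).

```python
# Sketch only: renders SQL text for the remote-model workflow.
# Connection, endpoint, and column names are hypothetical placeholders.

def remote_model_sql(model, connection, endpoint, input_schema, output_schema):
    """Render a CREATE MODEL statement that registers a model hosted on a
    Vertex AI endpoint. Schemas are SQL column lists, e.g. 'f1 FLOAT64'."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"INPUT ({input_schema})\n"
        f"OUTPUT ({output_schema})\n"
        f"REMOTE WITH CONNECTION `{connection}`\n"
        f"OPTIONS (ENDPOINT = '{endpoint}')"
    )


def remote_predict_sql(model, input_query):
    """Render an ML.PREDICT query whose input rows come from a subquery."""
    return (
        "SELECT *\n"
        f"FROM ML.PREDICT(MODEL `{model}`, ({input_query}))"
    )


ENDPOINT = (
    "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID"
    "/locations/us-central1/endpoints/ENDPOINT_ID"
)

print(remote_model_sql(
    "mydataset.remote_model",
    "PROJECT_ID.us.my_connection",
    ENDPOINT,
    "f1 FLOAT64, f2 FLOAT64",
    "score FLOAT64",
))
print(remote_predict_sql(
    "mydataset.remote_model",
    "SELECT f1, f2 FROM `mydataset.input_rows`",
))
```

Declaring the input and output columns in the CREATE MODEL statement tells BigQuery how to map table columns to the request and response of the endpoint, since BigQuery cannot inspect a remotely hosted model.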

Inference on Vertex AI APIs with unstructured data

Earlier this year, we announced BigQuery ML’s support for unstructured data such as images. Today, we are taking it one step further by enabling you to run inference with Vertex AI’s state-of-the-art pretrained models for images (Vision AI), text understanding (Natural Language AI), and translation (Translate AI) right inside BigQuery. Each of these models is exposed through its own prediction function in the BigQuery ML inference engine. These APIs take text or images as input and return JSON responses, which are stored in BigQuery using the JSON data type.

Here’s a basic workflow: 

  • If working with images, first create an Object Table with your images. This step is not required if you are working with text in BigQuery.

  • Run the CREATE MODEL statement and use a remote connection along with the Vertex AI model type as a parameter.

  • Use one of the functions below to send BigQuery data to Vertex AI to get inference results.

    • ML.ANNOTATE_IMAGE 

    • ML.TRANSLATE  

    • ML.UNDERSTAND_TEXT 
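
The workflow above can be sketched in the same way. The helpers below render a CREATE MODEL statement for a pretrained Cloud AI service and a call to one of the three functions. The REMOTE_SERVICE_TYPE values and the STRUCT option names (vision_features, translate_mode, target_language_code, nlu_option) are assumptions based on the publicly documented syntax and may differ in the early-access release, so check them against the documentation. All model, table, and connection names are hypothetical.

```python
# Sketch only: renders SQL text for the pretrained Cloud AI workflow.
# Service type values and STRUCT option names are assumptions; resource
# names are hypothetical placeholders.

# Assumed mapping from prediction function to REMOTE_SERVICE_TYPE.
SERVICE_TYPES = {
    "ML.ANNOTATE_IMAGE": "CLOUD_AI_VISION_V1",
    "ML.TRANSLATE": "CLOUD_AI_TRANSLATE_V3",
    "ML.UNDERSTAND_TEXT": "CLOUD_AI_NATURAL_LANGUAGE_V1",
}


def cloud_ai_model_sql(model, connection, function):
    """Render CREATE MODEL for the service that backs the given function."""
    service = SERVICE_TYPES[function]  # KeyError for an unknown function
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"REMOTE WITH CONNECTION `{connection}`\n"
        f"OPTIONS (REMOTE_SERVICE_TYPE = '{service}')"
    )


def cloud_ai_call_sql(function, model, table, struct_args):
    """Render a query that sends rows of `table` to the pretrained model."""
    if function not in SERVICE_TYPES:
        raise ValueError("unknown function: " + function)
    return (
        "SELECT *\n"
        f"FROM {function}(MODEL `{model}`, TABLE `{table}`, "
        f"STRUCT({struct_args}))"
    )


# Images: the table is an object table over the image files.
print(cloud_ai_model_sql("mydataset.vision_model",
                         "PROJECT_ID.us.my_connection",
                         "ML.ANNOTATE_IMAGE"))
print(cloud_ai_call_sql("ML.ANNOTATE_IMAGE", "mydataset.vision_model",
                        "mydataset.image_object_table",
                        "['label_detection'] AS vision_features"))

# Text: translation and sentiment analysis over a regular table.
print(cloud_ai_call_sql("ML.TRANSLATE", "mydataset.translate_model",
                        "mydataset.reviews",
                        "'translate_text' AS translate_mode, "
                        "'es' AS target_language_code"))
print(cloud_ai_call_sql("ML.UNDERSTAND_TEXT", "mydataset.nlp_model",
                        "mydataset.reviews",
                        "'analyze_sentiment' AS nlu_option"))
```

Each function returns the service's JSON response alongside the input rows, so the results can be parsed with BigQuery's JSON functions in a follow-up query.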

Get started

By extending inference support to a broad range of open-source and platform-hosted models, BigQuery ML makes it simple and cost-effective to harness the power of machine learning for your business data. To learn more about these new features, check out the documentation, and be sure to sign up for early access.


Googlers Firat Tekiner, Jiashang Liu, Mike Henderson, Yunmeng Xie, Xiaoqiu Huang, Bo Yang, Mingge Deng, Manoj Gunti and Tony Lu contributed to this blog post. Many Googlers contributed to make these features a reality.