Today, the vast majority of data generated in the world is unstructured (text, audio, images), yet only a fraction of it is ever analyzed. The AI pipelines required to unlock the value of this data are siloed from mainstream analytic systems, so engineers must build custom data infrastructure to combine insights from structured and unstructured data.
Our goal is to help you realize the potential of all your data, whatever its type and format. To make this easier, we launched the preview of BigQuery object tables at Google Cloud Next 2022. Powered by BigLake, object tables give BigQuery users a structured record interface for unstructured data stored in Cloud Storage. This lets you use existing BigQuery frameworks to process and manage that data in a secure and governed manner.
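To make the "structured record interface" concrete, here is a minimal Python sketch that composes the `CREATE EXTERNAL TABLE` statement BigQuery uses for object tables: a Cloud Storage connection, `object_metadata = 'SIMPLE'`, and one or more bucket URIs. The project, dataset, connection, and bucket names are hypothetical placeholders. Once the table exists, each row describes one object through metadata columns such as `uri`, `content_type`, `size`, and `updated`.

```python
from __future__ import annotations


def object_table_ddl(table: str, connection: str, uris: list[str]) -> str:
    """Compose a CREATE EXTERNAL TABLE statement for a BigQuery object table.

    `table` is `project.dataset.table` and `connection` is
    `project.region.connection_id`; both are placeholders you supply.
    """
    if not uris:
        raise ValueError("at least one Cloud Storage URI is required")
    uri_list = ", ".join(f"'{uri}'" for uri in uris)
    return (
        f"CREATE EXTERNAL TABLE `{table}`\n"
        f"WITH CONNECTION `{connection}`\n"
        "OPTIONS (\n"
        "  object_metadata = 'SIMPLE',\n"  # marks this as an object table
        f"  uris = [{uri_list}]\n"
        ");"
    )


if __name__ == "__main__":
    # Hypothetical table, connection, and bucket path.
    print(object_table_ddl("my-project.media.images",
                           "my-project.us.gcs-connection",
                           ["gs://my-bucket/images/*"]))
```

The printed statement can be run in the BigQuery console or through any BigQuery client.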
Since we launched the preview, we have seen customers use object tables for many use cases and are excited to announce that object tables are now generally available.
Analyzing unstructured data with BigQuery object tables
Object tables let you leverage the simplicity of SQL to run a wide range of AI models on your unstructured data. There are three key mechanisms for using AI models, all enabled through the BigQuery Inference engine.
First, you can import your models and run queries on the object table to process the data within BigQuery. This approach works well for customers looking for an integrated BigQuery solution that lets them use their existing BigQuery resources. Since the preview, we’ve expanded support beyond TensorFlow models to include TensorFlow Lite and ONNX models, and introduced new scalar functions to pre-process images. We also added support for saving pre-processed tensors, so the same tensors can be reused efficiently across multiple models, helping you reduce slot usage.
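The import-then-predict flow above can be sketched as two SQL statements, composed here in Python: `CREATE MODEL` with a `MODEL_TYPE` and a Cloud Storage `MODEL_PATH`, then `ML.PREDICT` over the object table, with `ML.DECODE_IMAGE` and `ML.RESIZE_IMAGE` handling pre-processing in SQL. The model, table, and input-column names are hypothetical; in particular, the alias given to the resized image must match the input tensor name your model expects.

```python
from __future__ import annotations

# Model formats BigQuery can import for inference, per the text above.
SUPPORTED_MODEL_TYPES = {"TENSORFLOW", "TENSORFLOW_LITE", "ONNX"}


def import_model_ddl(model: str, model_type: str, model_path: str) -> str:
    """CREATE MODEL statement that imports a model file from Cloud Storage."""
    if model_type not in SUPPORTED_MODEL_TYPES:
        raise ValueError(f"unsupported model type: {model_type}")
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (MODEL_TYPE = '{model_type}', MODEL_PATH = '{model_path}');"
    )


def predict_images_sql(model: str, object_table: str, input_name: str,
                       height: int, width: int) -> str:
    """ML.PREDICT over JPEG objects, decoding and resizing images in SQL.

    `input_name` must match the model's input tensor name (hypothetical here).
    """
    return (
        "SELECT *\n"
        "FROM ML.PREDICT(\n"
        f"  MODEL `{model}`,\n"
        "  (SELECT uri,\n"
        f"          ML.RESIZE_IMAGE(ML.DECODE_IMAGE(data), {height}, {width}, FALSE)"
        f" AS {input_name}\n"
        f"   FROM `{object_table}`\n"
        "   WHERE content_type = 'image/jpeg'));"
    )


if __name__ == "__main__":
    # Hypothetical dataset `media`, model file, and object table.
    print(import_model_ddl("media.classifier", "ONNX",
                           "gs://my-bucket/models/classifier.onnx"))
    print(predict_images_sql("media.classifier", "media.images",
                             "input", 224, 224))
```

Passing `FALSE` as the last argument to `ML.RESIZE_IMAGE` stretches images to the exact size instead of preserving their aspect ratio. Models with fixed input shapes usually need this.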
Second, you can choose from various pre-trained Google models such as Cloud Vision API, Cloud Natural Language API, and Cloud Translation API. For these, we have added predefined SQL table-valued functions that invoke the APIs when you query an object table. The results of the inference are stored as a BigQuery table.
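As a sketch of the pre-trained path, the Python below composes a query that runs Cloud Vision label detection over every object and materializes the results as a table. It assumes the table-valued function `ML.ANNOTATE_IMAGE` with a `vision_features` struct argument, and a remote model over the Cloud Vision API that already exists. Both follow our reading of current BigQuery syntax and should be checked against the documentation. All table and model names are hypothetical.

```python
from __future__ import annotations


def annotate_images_sql(dest_table: str, vision_model: str,
                        object_table: str, features: list[str]) -> str:
    """Compose SQL that stores Cloud Vision annotations as a BigQuery table.

    Assumes `vision_model` is a remote model over the Cloud Vision API and
    that ML.ANNOTATE_IMAGE accepts a `vision_features` struct argument.
    """
    if not features:
        raise ValueError("at least one Vision feature is required")
    feature_list = ", ".join(f"'{feat}'" for feat in features)
    return (
        f"CREATE OR REPLACE TABLE `{dest_table}` AS\n"
        "SELECT *\n"
        "FROM ML.ANNOTATE_IMAGE(\n"
        f"  MODEL `{vision_model}`,\n"
        f"  TABLE `{object_table}`,\n"
        f"  STRUCT([{feature_list}] AS vision_features));"
    )


if __name__ == "__main__":
    # Hypothetical names: dataset `media`, object table `images`.
    print(annotate_images_sql("media.image_labels", "media.vision_model",
                              "media.images", ["label_detection"]))
```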
Third, you can integrate customer-hosted AI models or custom models built through Vertex AI using remote functions. You can call these remote functions from BigQuery SQL to serve objects to models, and the results are returned as BigQuery tables. This option is well suited if you run your own model infrastructure such as GPUs, or have externally maintained models.
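On the model side, a remote function is an HTTPS endpoint, such as a Cloud Functions or Cloud Run service, that BigQuery calls with batches of rows. The request JSON carries a `calls` list containing one argument list per row. The response must contain `replies` with exactly one value per call, in the same order, or an `errorMessage`. The framework-free Python sketch below implements that contract. `predict` is a hypothetical stand-in for your hosted model, and it assumes each row passes a single argument, for example a signed URL for one object.

```python
from __future__ import annotations

import json
from typing import Any, Callable


def handle_remote_call(request_body: str,
                       predict: Callable[[str], Any]) -> tuple[int, str]:
    """Process one BigQuery remote function request; return (status, body).

    `predict` is a hypothetical callable wrapping your own model.
    """
    try:
        calls = json.loads(request_body)["calls"]
        # One reply per row, preserving BigQuery's row order.
        replies = [predict(args[0]) for args in calls]
        return 200, json.dumps({"replies": replies})
    except Exception as exc:  # report failures in the shape BigQuery expects
        return 400, json.dumps({"errorMessage": str(exc)})


if __name__ == "__main__":
    def fake_model(url: str) -> str:  # hypothetical model stub
        return "cat" if "cat" in url else "other"

    body = json.dumps({"requestId": "r1",
                       "calls": [["https://example.com/cat.jpg"],
                                 ["https://example.com/dog.jpg"]]})
    print(handle_remote_call(body, fake_model))
```

To deploy it, you would wrap the handler in your serving framework of choice, which passes in the raw request body and returns the status and JSON payload.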
During the preview, customers used a mix of these integration mechanisms to unify their AI workloads with data already present in BigQuery. For example, Semios, an agro-tech company, uses imported and remote image processing models to serve precision agriculture use cases.
“With the new imported model capability with object tables, we are able to import state-of-the-art PyTorch vision models to process image data and improve in-orchard temperature prediction using BigQuery. And with the new remote model capability, we can greatly simplify our pipelines and improve maintainability.” – Semios
Storage insights, fine-grained security, sharing and more
Beyond processing with AI models, customers are extending existing data management frameworks to unstructured data, resulting in several novel use cases such as:
Cloud Storage insights – Object tables provide a SQL interface to Cloud Storage metadata (e.g., storage class), making it easy to build analytics on Cloud Storage usage, understand growth, optimize costs, and inform decisions to better manage data.
Fine-grained access control at scale – Object tables are built on BigLake’s unified lakehouse infrastructure and support row- and column-level access controls. You can use these controls to secure specific objects with governed signed URLs. Fine-grained access control applies broadly to unstructured data use cases, for example, securing specific documents or images based on PII inferences returned by an AI model.
Sharing with Analytics Hub – You can share object tables, similar to BigLake tables, via Analytics Hub, expanding the set of sharing use cases for unstructured data. Instead of sharing buckets, you now get finer control over the objects you wish to share with partners, customers, or suppliers.
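The first two use cases above can be sketched in Python using only the object-table columns `uri`, `content_type`, and `size`. `storage_by_type` computes locally what the `GROUP BY` query in `STORAGE_BY_TYPE_SQL` would return. `row_policy_ddl` composes a `CREATE ROW ACCESS POLICY` statement that limits a group to objects under one Cloud Storage prefix. Filtering on `uri` is an illustrative assumption, and the table, policy, and group names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict

# The local aggregation below mirrors this query over a hypothetical table.
STORAGE_BY_TYPE_SQL = """
SELECT content_type, COUNT(*) AS objects, SUM(size) AS total_bytes
FROM `media.images`
GROUP BY content_type
ORDER BY total_bytes DESC
"""


def storage_by_type(rows: list[dict]) -> list[tuple[str, int, int]]:
    """Group object-table rows by content_type: (type, object count, bytes)."""
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for row in rows:
        entry = totals[row["content_type"]]
        entry[0] += 1
        entry[1] += row["size"]
    result = [(ctype, count, size) for ctype, (count, size) in totals.items()]
    # Largest consumers of storage first, like ORDER BY total_bytes DESC.
    return sorted(result, key=lambda item: item[2], reverse=True)


def row_policy_ddl(policy: str, table: str, grantee: str,
                   uri_prefix: str) -> str:
    """Row access policy limiting `grantee` to objects under `uri_prefix`."""
    return (
        f"CREATE ROW ACCESS POLICY {policy}\n"
        f"ON `{table}`\n"
        f"GRANT TO ('{grantee}')\n"
        f"FILTER USING (STARTS_WITH(uri, '{uri_prefix}'));"
    )


if __name__ == "__main__":
    sample = [  # hypothetical rows shaped like object-table metadata
        {"uri": "gs://b/a.jpg", "content_type": "image/jpeg", "size": 100},
        {"uri": "gs://b/r.pdf", "content_type": "application/pdf", "size": 500},
    ]
    print(storage_by_type(sample))
    print(row_policy_ddl("finance_only", "my-project.docs.contracts",
                         "group:finance@example.com", "gs://b/finance/"))
```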
Run generative AI workloads using object tables (Preview)
Members of Google Cloud AI’s trusted tester program can run a wide range of generative AI models available in Model Garden on object tables. You can use Generative AI Studio to choose a foundation model, or fine-tune one, and deploy it as a custom API endpoint. You can then call this endpoint from BigQuery through the remote function integration, passing prompts or inputs and returning the text results from large language models (LLMs) in a BigQuery table. In the coming months, we will enable SQL functions through the BigQuery Inference engine to call LLMs directly, further simplifying these workloads.
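The BigQuery side of this workflow can be sketched as two statements, composed here in Python. The first is a `CREATE FUNCTION ... REMOTE WITH CONNECTION` statement whose `endpoint` is assumed to be a Cloud Functions or Cloud Run URL that forwards each input to your deployed model. The second is a query that uses `EXTERNAL_OBJECT_TRANSFORM(..., ['SIGNED_URL'])` to give each object a `signed_url` column, calls the function on it, and stores the text results as a table. The function, connection, endpoint, and table names, as well as the `summary` column, are hypothetical.

```python
from __future__ import annotations


def remote_function_ddl(function: str, connection: str, endpoint: str,
                        max_batching_rows: int = 10) -> str:
    """CREATE FUNCTION ... REMOTE statement pointing at an HTTPS endpoint.

    `endpoint` is assumed to forward each input to your LLM (hypothetical).
    """
    if max_batching_rows < 1:
        raise ValueError("max_batching_rows must be positive")
    return (
        f"CREATE OR REPLACE FUNCTION `{function}`(signed_url STRING)\n"
        "RETURNS STRING\n"
        f"REMOTE WITH CONNECTION `{connection}`\n"
        f"OPTIONS (endpoint = '{endpoint}', "
        f"max_batching_rows = {max_batching_rows});"
    )


def llm_results_sql(dest_table: str, function: str, object_table: str) -> str:
    """Call the remote function once per object and keep the text output."""
    return (
        f"CREATE OR REPLACE TABLE `{dest_table}` AS\n"
        f"SELECT uri, `{function}`(signed_url) AS summary\n"
        f"FROM EXTERNAL_OBJECT_TRANSFORM(TABLE `{object_table}`, ['SIGNED_URL']);"
    )


if __name__ == "__main__":
    # Hypothetical project, dataset, connection, and endpoint.
    print(remote_function_ddl("my-project.docs.summarize",
                              "my-project.us.llm-connection",
                              "https://example.com/summarize"))
    print(llm_results_sql("my-project.docs.summaries",
                          "my-project.docs.summarize",
                          "my-project.docs.contracts"))
```

A small `max_batching_rows` value keeps each request to the LLM endpoint short, which helps when every row triggers a slow generation call.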
To get started, follow along with a guided lab or tutorials to run your first unstructured data analysis in BigQuery. Learn more by referring to our documentation.
Acknowledgments: Abhinav Khushraj, Amir Hormati, Anoop Johnson, Bo Yang, Eric Hao, Gaurangi Saxena, Jeff Nelson, Jian Guo, Jiashang Liu, Justin Levandoski, Mingge Deng, Mujie Zhang, Oliver Zhuang, Yuri Volobuev and the rest of the BigQuery engineering team who contributed to this launch.