BigQuery provides access to a variety of LLMs for text and embedding generation, including Google's Gemini models and Google-managed models from partners like Anthropic and Mistral. Using Gemini models and Google-managed partner models in BigQuery is simple: just create the model with the foundation model name and run inference directly in SQL queries. Today, we are bringing this same simplicity and power to any model you may choose from Hugging Face or Vertex AI Model Garden.
A SQL-native workflow with automated management
With the launch of managed third-party generative AI inference in BigQuery (Preview), you can now run open models using just two SQL statements.
This new capability delivers four key benefits:
- Simplified deployment: Deploy open models using a single CREATE MODEL SQL statement with the model ID string (e.g., google/gemma-3-1b-it). BigQuery automatically provisions the compute resources with default configurations.
- Automated resource management: BigQuery automatically releases idle compute resources, preventing unintended costs. You can configure the idle time via endpoint_idle_ttl.
- Granular resource control: You can customize backend computing resources (like machine types and min/max replicas) directly within your CREATE MODEL statement to meet your performance and cost needs.
- Unified SQL interface: The entire workflow, from model creation and inference to cost management and cleanup, is managed directly in BigQuery using SQL.
How it works: A practical example
Let's walk through the process of creating and using an open model.
Step 1: Create a BigQuery managed open model
To use an open model from Hugging Face or Vertex AI Model Garden, use a CREATE MODEL statement along with the open model ID. It typically takes a few minutes for the query to complete, depending on the model size and machine types.
Hugging Face models
Specify the option hugging_face_model_id in the format provider_name/model_name. For example, sentence-transformers/all-MiniLM-L6-v2.
```sql
CREATE OR REPLACE MODEL my_dataset.managed_embedding_model
REMOTE WITH CONNECTION DEFAULT
OPTIONS (
  hugging_face_model_id = 'sentence-transformers/all-MiniLM-L6-v2'
);
```
Vertex AI Model Garden models
Specify the option model_garden_model_name in the format publishers/publisher/models/model_name@model_version. For example, publishers/google/models/gemma3@gemma-3-1b-it.
```sql
CREATE OR REPLACE MODEL my_dataset.managed_text_model
REMOTE WITH CONNECTION DEFAULT
OPTIONS (
  model_garden_model_name = 'publishers/google/models/gemma3@gemma-3-1b-it'
);
```
For demanding workloads, you can customize deployment settings (machine types, replica counts, endpoint idle time) to improve scalability and manage costs. You can also use Compute Engine reservations to secure GPU instances for consistent performance. See CREATE MODEL syntax for all the options.
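For illustration, a customized deployment might look like the sketch below. The machine_type, min_replica_count, and max_replica_count option names shown here are assumptions for this example; confirm them against the CREATE MODEL syntax reference before use.

```sql
-- A minimal sketch of a customized deployment. The option names
-- machine_type, min_replica_count, and max_replica_count are assumptions;
-- confirm them against the CREATE MODEL syntax reference.
CREATE OR REPLACE MODEL my_dataset.managed_text_model
REMOTE WITH CONNECTION DEFAULT
OPTIONS (
  model_garden_model_name = 'publishers/google/models/gemma3@gemma-3-1b-it',
  machine_type = 'g2-standard-12',  -- assumed GPU machine type
  min_replica_count = 1,
  max_replica_count = 2
);
```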
Step 2: Run batch inference
Once the above CREATE MODEL job finishes, you can use it with AI.GENERATE_TEXT (for LLM inference) or AI.GENERATE_EMBEDDING (for embedding generation) on your data in BigQuery.
```sql
-- For embedding generation
SELECT *
FROM
  AI.GENERATE_EMBEDDING(
    MODEL my_dataset.managed_embedding_model,
    (
      SELECT text AS content
      FROM `bigquery-public-data.hacker_news.full`
      WHERE text != ''
      LIMIT 10
    ));

-- For LLM inference
SELECT *
FROM
  AI.GENERATE_TEXT(
    MODEL my_dataset.managed_text_model,
    (
      SELECT 'Summarize the text: ' || text AS prompt
      FROM `bigquery-public-data.hacker_news.full`
      WHERE text != ''
      LIMIT 10
    ));
```
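Because both functions return a table, you can persist the output with a standard CREATE TABLE AS statement for reuse in downstream queries. In the sketch below, my_dataset.hn_embeddings is a hypothetical destination table name.

```sql
-- Persist generated embeddings for reuse in downstream queries.
-- my_dataset.hn_embeddings is a hypothetical destination table.
CREATE OR REPLACE TABLE my_dataset.hn_embeddings AS
SELECT *
FROM
  AI.GENERATE_EMBEDDING(
    MODEL my_dataset.managed_embedding_model,
    (
      SELECT text AS content
      FROM `bigquery-public-data.hacker_news.full`
      WHERE text != ''
      LIMIT 10
    ));
```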
Vertex AI endpoint lifecycle management and cost control
BigQuery offers flexible controls over Vertex AI endpoint lifecycle and costs through both automated and manual options.
- Automated control: The endpoint_idle_ttl option enables automated resource recycling. If the model isn't used for the specified duration (e.g., INTERVAL 10 HOUR), BigQuery automatically "undeploys" the Vertex AI endpoint for you, stopping all costs.
- Manual control: You can also manually "undeploy" an endpoint to immediately stop the cost, or redeploy an endpoint using a simple ALTER MODEL statement, as shown below.
```sql
-- Manually undeploy the model to save costs
ALTER MODEL my_dataset.managed_embedding_model
SET OPTIONS(deploy_model = FALSE);

-- Manually redeploy the model for the next inference job.
ALTER MODEL my_dataset.managed_embedding_model
SET OPTIONS(deploy_model = TRUE);
```
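To rely on the automated path instead, set endpoint_idle_ttl when you create the model; the 10-hour interval below is illustrative.

```sql
-- Automatically undeploy the endpoint after the model sits idle
-- for 10 hours (the interval value is illustrative).
CREATE OR REPLACE MODEL my_dataset.managed_embedding_model
REMOTE WITH CONNECTION DEFAULT
OPTIONS (
  hugging_face_model_id = 'sentence-transformers/all-MiniLM-L6-v2',
  endpoint_idle_ttl = INTERVAL 10 HOUR
);
```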
Easy resource cleanup
When you are done using a model, simply drop it. BigQuery automatically cleans up all associated Vertex AI resources (like the endpoint and model) for you, so you are no longer charged for them.
```sql
-- Model deletion and all backend resource cleanup
DROP MODEL my_dataset.managed_embedding_model;
```
Get started today
BigQuery's new managed inference capability fundamentally changes how data teams access and use third-party gen AI models. By consolidating the entire model lifecycle, from creation and inference to cost management and cleanup, into a familiar SQL interface, we're removing operational friction and making powerful open models accessible to every BigQuery user, from data analysts to AI/ML engineers. For comprehensive documentation and tutorials, please refer to the following resources:
Read the documentation: Creating automatically-deployed open models
Try the text generation tutorial: Generate text with the Gemma model
Try the embedding generation tutorial: Generate text embeddings with open models
We look forward to seeing what you build!