Transforming customer feedback: analyzing audio customer reviews with BigQuery ML’s speech-to-text

BigQuery’s integrated speech-to-text functionality offers a powerful tool for unlocking valuable insights hidden within audio data. This service transcribes audio files, such as customer review calls, into text format, making them ready for analysis within BigQuery’s robust data platform. By combining speech-to-text with BigQuery’s analytics capabilities, you can delve into customer sentiment, identify recurring product issues, and gain a better understanding of the voice of your customer.

BigQuery speech-to-text transforms audio data into actionable insights, offering potential benefits across industries and enabling a deeper understanding of customer interactions across multiple channels. You can also use BigQuery ML with Gemini 1.0 Pro to apply further processing, such as entity extraction and sentiment analysis, to the text extracted from audio files using BigQuery ML’s native speech-to-text capability. Below are some use cases and the business value for specific industries:


Use Cases and Business Potential

Analyzing customer call recordings to identify common pain points, product preferences, and overall sentiment

  • Improved product development by addressing issues mentioned in feedback

  • Enhanced customer service through personalization and targeted assistance

  • Enhanced marketing campaigns based on insights discovered in customer calls

Transcribing patient-doctor interactions to automatically populate medical records, summarize diagnoses, and track treatment progress

  • More streamlined workflows for healthcare providers, reducing administrative burden

  • Comprehensive patient records for better decision-making

  • Potential identification of trends in patient concerns for research and improved care

Analyzing earnings calls and shareholder meetings to gauge market sentiment, identify potential risks, and extract key insights

  • Support for more informed investment decisions

  • Prompt identification of emerging trends or potential issues

  • Proactive risk management strategies

Media & Entertainment: Transcribing podcasts, interviews, and focus groups for content analysis and audience insights

  • Earlier identification of trending topics and themes for new content creation

  • Understanding audience preferences for program development or advertising

  • Accessibility improvements through automated closed-captioning

Even when using advanced AI features such as BigQuery ML, you still have access to all of BigQuery’s built-in governance features, including access-control passthrough: you can restrict insights derived from customer audio files based on the row-level security defined on your BigQuery object table.

Ready to turn your audio data into insights? Let’s dive into how you can use speech-to-text in BigQuery:

Imagine you have a collection of customer feedback calls stored as audio files in a Google Cloud Storage bucket. BigQuery’s ML.TRANSCRIBE function, connected to a pre-trained speech-to-text model hosted on Google’s Vertex AI platform, lets you automatically convert these audio files into readable text within BigQuery. Think of it as a specialized translator for audio data. You tell the ML.TRANSCRIBE function where your audio files are located (in your object table) and which speech-to-text model to use. It then handles the transcription process, using the power of machine learning, and delivers the text results directly into BigQuery. This makes it easy to analyze customer conversations alongside other business data.

Let’s walk through the process together in BigQuery.

Setup instructions:

  1. Before starting, choose your Google Cloud project, link a billing account, and enable the necessary API (full instructions here)

  2. Create a recognizer (optional): a recognizer stores the configuration for speech recognition

  3. Create a cloud resource connection and get the connection’s service account, full guide here

  4. Grant access to the service account by following the steps here

  5. Create a dataset that will contain the model and the object table by following the steps here

  6. Download and store the audio files in Google Cloud Storage:

    1. Download 5 audio files from here

    2. Create a bucket in Google Cloud Storage and a folder within the bucket

    3. Upload the downloaded audio files to the folder

Create a model

Create a remote model with a REMOTE_SERVICE_TYPE of CLOUD_AI_SPEECH_TO_TEXT_V2. The remote model makes the Speech-to-Text API available within BigQuery.


```sql
CREATE OR REPLACE MODEL
`PROJECT_ID.DATASET_ID.MODEL_NAME`
REMOTE WITH CONNECTION `PROJECT_ID.REGION.CONNECTION_ID`
OPTIONS (
  REMOTE_SERVICE_TYPE = 'CLOUD_AI_SPEECH_TO_TEXT_V2',
  SPEECH_RECOGNIZER = 'projects/PROJECT_NUMBER/locations/LOCATION/recognizers/RECOGNIZER_ID'
);
```

Example query:

```sql
CREATE OR REPLACE MODEL
`demo_project.speech_to_text_demo_dataset.speech_to_text_bq_model`
REMOTE WITH CONNECTION ``
OPTIONS (
  REMOTE_SERVICE_TYPE = 'CLOUD_AI_SPEECH_TO_TEXT_V2'
);
```

Create an object table to reference the audio files


```sql
CREATE EXTERNAL TABLE `PROJECT_ID.DATASET_ID.TABLE_NAME`
WITH CONNECTION `PROJECT_ID.REGION.CONNECTION_ID`
OPTIONS(
  object_metadata = 'SIMPLE',
  uris = ['BUCKET_PATH'[,...]],
  max_staleness = STALENESS_INTERVAL,
  metadata_cache_mode = 'CACHE_MODE');
```

Sample code:

Replace 'BUCKET_PATH' with the Google Cloud Storage path to the bucket/folder where your audio files are stored (for example, 'gs://your-bucket/your-folder/*')

```sql
CREATE EXTERNAL TABLE `demo_project.speech_to_text_demo_dataset.demo_obj_tb`
WITH CONNECTION ``
OPTIONS(
  object_metadata = 'SIMPLE',
  uris = ['BUCKET_PATH'],
  max_staleness = INTERVAL 1 DAY,
  metadata_cache_mode = 'AUTOMATIC');
```

Transcribe audio files using BigQuery ML


```sql
SELECT *
FROM ML.TRANSCRIBE(
  MODEL `PROJECT_ID.DATASET_ID.MODEL_NAME`,
  TABLE `PROJECT_ID.DATASET_ID.OBJECT_TABLE_NAME`,
  RECOGNITION_CONFIG => (JSON 'recognition_config')
);
```

Sample query:

```sql
SELECT *
FROM ML.TRANSCRIBE(
  MODEL `demo_project.speech_to_text_demo_dataset.speech_to_text_bq_model`,
  TABLE `demo_project.speech_to_text_demo_dataset.demo_obj_tb`,
  RECOGNITION_CONFIG => (JSON '{"language_codes": ["en-US"], "model": "telephony", "auto_decoding_config": {}}')
);
```
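The RECOGNITION_CONFIG argument is just a JSON string, so if you build ML.TRANSCRIBE queries programmatically it can be safer to construct it with a JSON library than by hand. A minimal Python sketch (the key names mirror the sample query above; the telephony model choice is only an example):

```python
import json

# Build the RECOGNITION_CONFIG JSON for ML.TRANSCRIBE.
# Keys mirror the sample query: language, model, and automatic decoding.
config = {
    "language_codes": ["en-US"],
    "model": "telephony",        # model tuned for phone-call audio
    "auto_decoding_config": {},  # let the service detect the audio encoding
}
config_json = json.dumps(config)
print(config_json)
```

The resulting string can then be interpolated into the `RECOGNITION_CONFIG => (JSON '...')` clause of the query.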


The results of ML.TRANSCRIBE include these columns:

  • transcripts: Contains the text transcription of the processed audio files

  • ml_transcribe_result: JSON value that contains the result from the Speech-to-Text API

  • ml_transcribe_status: Contains a string value that indicates the success or failure of the transcription process for each row. It will be empty if the process is successful

  • The object table columns
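If you want to go beyond the ready-made transcripts column, the ml_transcribe_result JSON can be unpacked, for example client-side in Python. This is a hedged sketch: the sample payload below is invented for illustration, using a results → alternatives → transcript shape; inspect your own ml_transcribe_result values to confirm the exact structure.

```python
import json

# Hypothetical ml_transcribe_result payload (values are made up for
# illustration; verify the real field names against your own results).
sample_result = json.dumps({
    "results": [
        {"alternatives": [{"transcript": "Thanks for calling support.", "confidence": 0.94}]},
        {"alternatives": [{"transcript": "My order arrived damaged.", "confidence": 0.91}]},
    ]
})

def extract_transcripts(ml_transcribe_result: str) -> list:
    """Pull the top-alternative transcript out of each recognition result."""
    parsed = json.loads(ml_transcribe_result)
    return [
        segment["alternatives"][0]["transcript"]
        for segment in parsed.get("results", [])
        if segment.get("alternatives")
    ]

print(extract_transcripts(sample_result))
```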


The ML.TRANSCRIBE function eliminates the need for manual transcription, saving time and effort. Transcribed text becomes easily searchable and analyzable within BigQuery, enabling you to extract valuable insights from your audio data.

Follow-up Ideas

  • Take the text extracted from the audio files and use Gemini 1.0 Pro with BigQuery ML’s ML.GENERATE_TEXT function to extract entities such as product names or stock prices, and structure them as JSON.

  • Use Gemini 1.0 Pro with BigQuery ML to perform sentiment analysis on the extracted text, and structure positive & negative sentiments as JSON.

  • Join customer feedback verbatims & sentiment scores with customer lifetime value scores or other relevant customer data to see how quantitative and qualitative data relate to each other.

  • Generate embeddings over the extracted text, and use vector search to search the audio files for specific content.
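To make the last idea concrete: vector search ranks transcripts by embedding similarity. Inside BigQuery you would generate the embeddings with ML.GENERATE_EMBEDDING and query them with VECTOR_SEARCH; the toy Python sketch below, with made-up 3-dimensional "embeddings", only illustrates the cosine-similarity ranking underneath.

```python
import math

# Toy transcript "embeddings" -- the vectors are invented for illustration.
# Real embeddings from ML.GENERATE_EMBEDDING would have hundreds of dimensions.
transcripts = {
    "call_001": [0.90, 0.10, 0.00],
    "call_002": [0.10, 0.80, 0.10],
    "call_003": [0.85, 0.20, 0.05],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Pretend embedding of a search phrase like "damaged order".
query_vec = [1.0, 0.0, 0.0]

# Rank the calls by similarity to the query, most similar first.
ranked = sorted(transcripts,
                key=lambda k: cosine_similarity(query_vec, transcripts[k]),
                reverse=True)
print(ranked)  # call_001 and call_003 point the same way as the query
```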

Curious to learn more? The official Google Cloud documentation on ML.TRANSCRIBE has all the details. Please also check out the blog on Gemini 1.0 Pro support for BigQuery ML to see other GenAI use cases as outlined in the Follow-up ideas.