Document AI Workbench is now Generally Available to train document extraction models for your production use cases

Each day, companies create and use more documents to make decisions. However, most of the value in these documents is locked in unstructured data, which makes it difficult and labor-intensive to extract and use in business processes.

As the number and variety of documents used by businesses grow, machine learning (ML) solutions need to be flexible enough to handle a broader set of use cases. That’s why we introduced the first Document AI Workbench model, Custom Document Extractor (CDE), in Public Preview at Google Cloud Next ’22. CDE makes it fast and easy to apply ML to virtually any document-based workflow: it extracts structured data from unstructured documents so that business processes can be automated.

CDE lets developers and analysts use their own data to train models that extract the fields their business needs from documents. Because organizations can build these models faster and with less data, they get value sooner from processing and analyzing the data in their documents.

Today, we announce that Document AI Workbench is Generally Available (GA): open to all customers and ready for production use through APIs and the Google Cloud Console. Document AI Workbench is covered by the Document AI SLA, which supports online and batch document prediction with at least 99.9% uptime. Furthermore, Document AI Workbench is now covered by Google Cloud’s GA product terms. For example, Google will notify customers at least 12 months before significantly modifying a customer-facing Google API in a backwards-incompatible manner.
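To put the uptime figure in perspective, here is a rough back-of-the-envelope calculation. It is only an illustration: the 30-day month and the helper name `max_downtime_minutes` are our own assumptions, and the SLA itself defines how uptime is actually measured.

```python
from decimal import Decimal


def max_downtime_minutes(uptime_percent: str, days: int = 30) -> Decimal:
    """Minutes of downtime that still meet an uptime target over `days` days.

    Assumption: a simple calendar window; the real SLA terms may define
    the measurement period and what counts as downtime differently.
    """
    total_minutes = Decimal(days * 24 * 60)
    allowed_fraction = (Decimal(100) - Decimal(uptime_percent)) / Decimal(100)
    return total_minutes * allowed_fraction


# Illustrative only: a 99.9% target over an assumed 30-day month.
print(max_downtime_minutes("99.9"))
```

Under these assumptions, a 99.9% target leaves room for roughly 43 minutes of downtime in a 30-day month.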

In this blog post, we’ll explore how customers are already using CDE, and then walk through Document AI Workbench’s updated capabilities.

What users are saying about Workbench

Deliver higher model accuracy with Workbench

Users turn to Workbench to save time and money. A third party evaluated Document AI Workbench and concluded that it extracts data more accurately¹ than several competing products for document types with variable layouts (e.g., invoices, receipts, bank statements, and paystubs). Better accuracy drives higher automation rates, which is where those savings come from.

Chris Jangareddy, managing director for Artificial Intelligence & Data at Deloitte Consulting LLP said, “Google Cloud Document AI is a leading document processing solution packed with rich features like multi-step classify and text extraction to automate sorting, classification, extraction, and quality assurance. By combining Document AI with Workbench, Google Cloud has created a forward-thinking and powerful AI platform for intelligent document processing that will allow for process transformation at an enterprise scale with predictable outcomes that can benefit businesses.”

Mansoor Khan, CEO of OneClinic said, “We help medical professionals scale their clinics through automation. We used Google’s Document AI Workbench to create a model to automatically extract data from patients’ insurance cards as part of our patient check-in software. Workbench is easy to use and we are really happy with the model accuracy — it extracts data more accurately than what we would expect from human data entry.”

Rajnish Palande, VP, Google Business Unit for BFSI, TCS said, “The Google Cloud Document AI Workbench leverages artificial intelligence (AI) to manage and glean insights from unstructured data. The Workbench brings together the power of classification, auto-annotation, page-number identification and multi-language support to help organizations rapidly deliver enhanced accuracy, improved operational efficiency, higher confidence in the information extract, and increased return on investment.” 

Build production-ready models faster with Workbench

Document AI Workbench helps users create machine learning models faster. For example, a third-party evaluation shows that Document AI Workbench trains machine learning models up to 3x faster than a leading competitor. Faster training lowers total cost of ownership and delivers value sooner.

Dallas Dolen, Partner, Google Alliance Leader at PwC said, “Google Document AI Workbench helps to accelerate our custom parser models training as well as improves accuracy and performance using a custom document extractor with human in the loop. It helps us solve complex business problems for our clients in the financial services and healthcare industries.”

Ziang Jia, Senior DocAI Development Lead at Resultant, said, “Document AI Workbench has unlocked a brand-new machine learning development experience for information extraction solutions. Its simplicity and robustness enabled us to build models and deliver a highly accurate outcome in an agile way for a large government agency. We couldn’t be more impressed by its simplicity and robustness and are excited to see how the product will evolve in the future.”

Sean Earley, VP of Delivery Services of Zencore said, “Document AI Workbench allows us to develop highly accurate document parsing models in a matter of days. Our customers have automated tasks that formerly required significant human labor. For example, using Document AI Workbench, a team of two trained a model to split, classify and extract data from 15 document types to automate Home Mortgage Disclosure Act reporting. The mean trained model accuracy was 94%, drastically reducing the operational cost of our customer’s compliance reporting procedures.”

What’s new with Document AI Workbench 

The latest Workbench capabilities make it even easier to train and deploy an extraction model: 

  • With Workbench’s public APIs, you can programmatically create, delete, train, evaluate and deploy models.

  • Our updated dataset management tools automatically detect the labels in your pre-annotated documents and create the matching schema labels. They also give you more flexibility when creating and managing schemas.

  • Our new DocAI Toolkit includes a labeled document converter so that you can easily convert your labeled documents to DocAI’s format and start training faster.

  • We’ve reduced the cognitive load for labelers with efficiency enhancements to our Labeling UI.    

  • The revamped Processor Gallery helps you quickly identify the best model for your use case.
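To illustrate what programmatic model management can look like, here is a minimal sketch of a train, evaluate, and deploy flow. It is not the Document AI client library: `ModelLifecycleClient`, its method names, the `FakeClient` stand-in, and the F1 threshold are all hypothetical, chosen only so the example runs offline. Consult the Document AI API reference for the real calls.

```python
from dataclasses import dataclass, field
from typing import Protocol


class ModelLifecycleClient(Protocol):
    """Hypothetical interface; NOT the Document AI client library."""

    def create_model(self, display_name: str) -> str: ...
    def train_version(self, model: str) -> str: ...
    def evaluate_version(self, version: str) -> float: ...
    def deploy_version(self, version: str) -> None: ...
    def delete_model(self, model: str) -> None: ...


@dataclass
class FakeClient:
    """In-memory stand-in that records calls so the sketch runs offline."""

    f1_score: float  # the score this fake reports for any trained version
    calls: list = field(default_factory=list)

    def create_model(self, display_name: str) -> str:
        self.calls.append("create")
        return f"models/{display_name}"

    def train_version(self, model: str) -> str:
        self.calls.append("train")
        return f"{model}/versions/1"

    def evaluate_version(self, version: str) -> float:
        self.calls.append("evaluate")
        return self.f1_score

    def deploy_version(self, version: str) -> None:
        self.calls.append("deploy")

    def delete_model(self, model: str) -> None:
        self.calls.append("delete")


def train_and_maybe_deploy(
    client: ModelLifecycleClient, display_name: str, min_f1: float
) -> bool:
    """Create and train a model, then deploy it only if it scores well enough.

    Deleting the model on a failed evaluation is a design choice made for
    this sketch, not a recommendation from the product documentation.
    """
    model = client.create_model(display_name)
    version = client.train_version(model)
    score = client.evaluate_version(version)
    if score >= min_f1:
        client.deploy_version(version)
        return True
    client.delete_model(model)
    return False


client = FakeClient(f1_score=0.94)
deployed = train_and_maybe_deploy(client, "invoices", min_f1=0.90)
print(deployed, client.calls)
```

Gating deployment on an evaluation score is one way to keep an automated pipeline from promoting a weak model. In a real pipeline, the fake client would be replaced by calls to the actual Workbench APIs.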

What’s next for Document AI Workbench

We continue to invest in Document AI Workbench to help you automate document processing. Here are a few things we’re working on that we’re excited about:

  • Classify document types with the Custom Document Classifier (CDC), coming soon in public preview

  • Copy processor versions across projects and processors to streamline managing development and production environments

  • Support larger documents (e.g., longer than 50 pages) so you can process a wider array of documents

  • Broader (non-Latin) language support, equivalent to Document AI OCR

  • And many more investments, using state-of-the-art technology, to help you build world-class models faster and automate document processing

Document AI Workbench is in GA and ready for production workloads. Learn more via Document AI Workbench documentation or try it out in the Google Cloud Console.


Acknowledgements: Tomas Moreno, Outbound Product Manager; Lukas Rutishauser, Software Engineering Manager; Michael Kwong, Software Engineering Manager; Rajagopal Janani, Software Engineering Manager; Michael Lanning, UX Designer.

1. When trained with 200+ documents