For several years, businesses have used Google Cloud’s Document AI to process documents faster and more accurately, improving how they handle invoices and customer forms and how they deliver services that rely on documents.
Generative AI is transforming enterprise document processing by letting users input natural language prompts to classify, extract, and get deeper insights from documents, all with high accuracy and limited-to-no machine learning (ML) training. We’re pleased to bring generative AI to Document AI, unlocking powerful and more efficient ways for organizations to structure, manage, and get insights from documents.
Announcing generative AI-powered extraction and summarization in Document AI Workbench
Document AI Workbench enables users to customize models for document processing tasks. In February 2023, we launched the Custom Extractor in General Availability (GA) to help users extract structured data from documents. In March 2023, we launched the Custom Classifier in GA to help automatically classify document types. In July, we launched the Custom Splitter in GA to help automatically split and classify multiple documents within a single file.
At Google Next ’23, we built on this momentum by announcing the public preview launches of two generative AI-powered features in Document AI Workbench: a version of Custom Extractor that uses foundation models, and Summarizer.
Generative AI-powered extraction can help pull data from documents that have lots of free-form text (like contracts), complex layouts (such as invoices, W-2s, and bills of lading), or little or no training data available. Now that a foundation model is available in the Custom Extractor, users can call the endpoint with any document and get structured data quickly, without any configuration required.
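As a rough illustration of what calling a processor endpoint and reading structured data can look like, here is a minimal sketch that builds a request for the Document AI REST `process` method and flattens the entities of a response into rows. It makes no network call. The project, location, and processor IDs are placeholders, the sample response is invented for illustration, and the field names (`rawDocument`, `mimeType`, `entities`, `mentionText`, `confidence`) are assumed from the public v1 REST reference, so check the current documentation before relying on them.

```python
import base64
import json


def build_process_request(project_id, location, processor_id, pdf_bytes):
    """Return (url, json_body) for a Document AI v1 `process` call (assumed shape)."""
    url = (
        f"https://{location}-documentai.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/"
        f"processors/{processor_id}:process"
    )
    body = {
        "rawDocument": {
            # The REST API expects the file content as base64 text.
            "content": base64.b64encode(pdf_bytes).decode("ascii"),
            "mimeType": "application/pdf",
        }
    }
    return url, json.dumps(body)


def entities_to_rows(response):
    """Flatten response['document']['entities'] into (type, text, confidence) rows."""
    entities = response.get("document", {}).get("entities", [])
    return [
        (e.get("type", ""), e.get("mentionText", ""), e.get("confidence", 0.0))
        for e in entities
    ]


# Placeholder identifiers and an invented response, for illustration only.
url, body = build_process_request("my-project", "us", "my-processor-id", b"%PDF-1.4 ...")
sample_response = {
    "document": {
        "entities": [
            {"type": "invoice_id", "mentionText": "INV-001", "confidence": 0.97},
            {"type": "total_amount", "mentionText": "1,250.00", "confidence": 0.91},
        ]
    }
}
for row in entities_to_rows(sample_response):
    print(row)
```

In a real integration, the URL and body would be sent with an authenticated POST request, and the returned JSON passed to `entities_to_rows`.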
Summarizer can be used out of the box without training to provide summaries for documents up to 250 pages long. Most generative AI solutions do not have context windows that can support long documents, requiring that information be broken into small chunks, but Summarizer removes these concerns, making it easy to generate custom summaries based on the user’s preferred length and format.
Here’s what some customers with early access are saying:
“Deutsche Bank (DB) divisions are digitizing high volume documents and extracting data using the Document AI Workbench Custom Extractor for simple, scalable use cases such as KYC and payment forms. Automating the content review process leads to reduced operational risk, increased capacity and a better customer experience. With the introduction of generative AI to Workbench, we hope to automate more complex documents with reduced time to train a model and explore new use cases for faster intelligence such as Q&A and summarization.”
– Inwha Huh, Managing Director – Corporate and Investment Bank Transformation, Deutsche Bank
“BBVA is committed to providing our customers with the best possible experience, and that includes using AI to automate our business processes. By using generative AI now available in Document AI Workbench, we will extract data in complex, highly dense and non-structured documents and prevent errors and potential fraud. This will allow us to provide our customers with a faster, more accurate, and more secure service.”
– Antonio Valle, Global Head of Intelligent Process Automation, BBVA
You can read more about how many other customers are using Document AI Workbench and its generative AI features in this detailed blog post. To get started, customers can visit Document AI Workbench within the Google Cloud Console or view our Custom Extractor and Summarizer demo videos online.
Announcing generative AI-powered search in Document AI Warehouse
In October 2022, we announced the GA of Document AI Warehouse, a fully managed cloud-native service to search, store, and govern documents and their extracted data.
We’re now bringing customers the best of Enterprise Search technology from Google Cloud integrated into Document AI Warehouse, where users can retrieve documents containing answers to their natural language questions. Generative AI also helps summarize the answer from each document, which saves users hours in finding the right answer, as shown in the animated image.
Here are the new features, powered by generative AI, that help organizations better manage their documents:
- Generative AI search box and answer snippets: This feature returns up to five of the top documents matching the search, along with the relevant snippets from these documents.
- Grounded summary answers: With generative AI support, users can get a summary from the first (i.e., most relevant) document found in their search. By clicking on the other document links, users can generate summary answers from the other relevant documents as well. The answers are grounded in the text found in the document, to mitigate LLM hallucination. The user can click on each document to review the answer generated from the document text.
- Cross-document analysis and summarized answers: When searching for a particular topic, users often need answers composed from information in multiple documents or to summarize or compare answers across multiple documents. Users can select a handful of short-listed documents and summarize or compare the answer from content across them.
- Access control: Access Control Lists on the documents are enforced in generative AI-powered search, which means users need View Access in order to see the document snippets in the search results and to generate answers to questions from the document.
- Confidence Scores: How can organizations measure the quality of search results generated for queries? Document AI Warehouse now offers confidence scores for the top results in the API and UI, which can be used for analytics or data science operations.
- Text and Faceted Search: For users who want to search on specific text, Document AI Warehouse continues to support semantic text search and filtering/faceted search capabilities in the product. There’s a separate search box for this (labeled “Filters” in the UI), different from the generative AI search box.
- Full integration with Document AI Processors: Since Document AI Warehouse is part of the Document AI solutions suite, structured documents can be run through a bulk extraction and ingestion pipeline where entities are extracted from documents via Document AI processors.
More details can be found in the Feature Documentation for Trusted Tester Program members with Private Preview access.
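To make the interaction between access control, confidence scores, and the top-five limit more concrete, here is a small in-memory sketch. It does not use the Document AI Warehouse API: the `SearchHit` type, its fields, and the `visible_top_hits` helper are hypothetical names invented for this illustration, and the order in which the real service applies these steps is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class SearchHit:
    """Hypothetical search result; not a Document AI Warehouse type."""
    doc_id: str
    snippet: str
    confidence: float
    viewers: set = field(default_factory=set)  # users with view access


def visible_top_hits(hits, user, min_confidence=0.0, limit=5):
    """Keep hits the user may view, drop low-confidence ones, return the best `limit`."""
    allowed = [
        h for h in hits
        if user in h.viewers and h.confidence >= min_confidence
    ]
    return sorted(allowed, key=lambda h: h.confidence, reverse=True)[:limit]


hits = [
    SearchHit("contract-a", "Termination requires 30 days notice.", 0.92, {"ana", "ben"}),
    SearchHit("contract-b", "Notice periods vary by region.", 0.40, {"ana"}),
    SearchHit("contract-c", "Either party may terminate.", 0.88, {"ben"}),
]
result = visible_top_hits(hits, user="ana", min_confidence=0.5)
print([h.doc_id for h in result])  # ['contract-a']
```

A confidence threshold like the one above is one way the scores could feed analytics or downstream review queues; the right cutoff depends on the use case.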
Announcing a new specialized model and advanced AI add-ons in Enterprise Document OCR v2.0
The combination of LLMs and Optical Character Recognition (OCR) marks a significant advancement in data processing and analysis. By leveraging LLMs’ ability to understand context and OCR’s text and layout extraction capabilities, businesses can unlock valuable insights from data and streamline workflows. Enterprise Document OCR v2.0 represents the latest evolution in Document AI’s OCR technology, offering businesses a powerful extraction tool for better downstream processing.
With Enterprise Document OCR v2.0, users can take advantage of:
- Google’s specialized OCR model: Our latest OCR model is designed for diverse document use cases, enhancing read order precision and recognition of over 200 languages.
- Visual element detectors: Document OCR now includes visual element detectors to improve accuracy on hard-to-read document properties, making it even more versatile.
- Advanced features in GA: Leverage image quality scoring for better pre-processing, language hints for improved text detection, and rotation correction for enhanced accuracy.
On top of this, Enterprise Document OCR v2.0 now offers premium OCR add-ons, which users can enable based on their desired processing or quality requirements. These include:
- Selection mark detection: Detects and extracts selection marks, like checkboxes, directly from the OCR processor.
- MathOCR: Identifies and extracts formulas from documents as LaTeX output.
- Font style detection: Extracts the font style and background color at the token level so that users can understand word context programmatically.
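The features and add-ons above are switched on per request. As a minimal sketch, assuming the `processOptions.ocrConfig` field names from the public v1 REST reference (`hints.languageHints`, `enableImageQualityScores`, and `premiumFeatures` with `enableSelectionMarkDetection`, `enableMathOcr`, and `computeStyleInfo`), the request fragment could be assembled like this. The helper name and its parameters are hypothetical, and the field names should be verified against the current documentation.

```python
import json


def build_ocr_process_options(language_hints=(), image_quality=False,
                              selection_marks=False, math_ocr=False,
                              style_info=False):
    """Build the `processOptions` part of a process request body (assumed shape)."""
    ocr_config = {}
    if language_hints:
        ocr_config["hints"] = {"languageHints": list(language_hints)}
    if image_quality:
        ocr_config["enableImageQualityScores"] = True

    # Premium add-ons are grouped under their own sub-object.
    premium = {}
    if selection_marks:
        premium["enableSelectionMarkDetection"] = True
    if math_ocr:
        premium["enableMathOcr"] = True
    if style_info:
        premium["computeStyleInfo"] = True
    if premium:
        ocr_config["premiumFeatures"] = premium

    return {"processOptions": {"ocrConfig": ocr_config}}


options = build_ocr_process_options(
    language_hints=["en", "de"],
    image_quality=True,
    selection_marks=True,
    math_ocr=True,
)
print(json.dumps(options, indent=2))
```

The resulting dictionary would be merged into the body of a `process` request alongside the document content. Only the add-ons a workload needs have to be enabled.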
The versatility of Enterprise Document OCR v2.0 provides a strong foundation for LLM-driven applications, ensuring rich, secure, and highly accurate text and layout extraction. In LLM-powered applications, high-quality OCR is paramount. Ryan Walker, Chief Technology Officer at Casetext, attests to the importance of OCR quality:
“As a creator of legal AI solutions—most recently our AI legal assistant, CoCounsel—we build products that must correctly process large, complex collections of legal documents. These might be thousands of pages long, contain images, or be poorly scanned. Missing even a single word can make the difference between winning or losing a case. Google’s OCR accurately extracts text from files far better than every other system we’ve evaluated. Incorporating this technology into our products lets us deliver the highest-quality answers for the lawyers who rely on us, which in turn means they’re able to deliver the best possible service and results for their clients.”
Explore the potential of Enterprise Document OCR v2.0 to streamline your document understanding workflows.
Getting started
We’re very excited about what the future holds for Document AI as a platform for businesses to simplify document automation. Learn more about all these exciting developments in our session at Next ’23 or try out one of our offerings today.