A massive number of business documents are processed daily across industries. Many of these documents are paper-based, scanned into your system as images, or stored in an unstructured format such as PDF. Each company may apply unique rules, shaped by its business background, when processing these documents. Extracting information accurately and processing it flexibly is a challenge many companies face.
Amazon Intelligent Document Processing (IDP) allows you to take advantage of industry-leading machine learning (ML) technology without previous ML experience. This post introduces a solution included in the Amazon IDP workshop showcasing how to process documents to serve flexible business rules using Amazon AI services. You can use the following step-by-step Jupyter notebook to complete the lab.
Amazon Textract helps you easily extract text from various documents, and Amazon Augmented AI (Amazon A2I) allows you to implement a human review of ML predictions. The default Amazon A2I template allows you to build a human review pipeline based on rules, such as when the extraction confidence score is lower than a pre-defined threshold or required keys are missing. But in a production environment, you need the document processing pipeline to support flexible business rules, such as validating the string format, verifying the data type and range, and validating fields across documents. This post shows how you can use Amazon Textract and Amazon A2I to customize a generic document processing pipeline supporting flexible business rules.
For our sample solution, we use the Tax Form 990, a US IRS (Internal Revenue Service) form that provides the public with financial information about a non-profit organization. For this example, we only cover the extraction logic for some of the fields on the first page of the form. You can find more sample documents on the IRS website.
The following diagram illustrates the IDP pipeline that supports customized business rules with human review.
The architecture is composed of three logical stages:
You can define a generic business rule in the following JSON format. In the sample code, we define three rules:
The following code shows our business rules in JSON format:
You can expand the solution by adding more business rules following the same structure.
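As a purely illustrative sketch of what such a rule set might look like, the following Python snippet loads and validates a small list of rules. The key names (description, field_name, condition_type, condition_setting), the condition types, and the three example rules are assumptions made for illustration; they are not the schema or the rules used in the workshop repo.

```python
import json

# Hypothetical rule definitions. The keys and condition types are assumptions
# for illustration only, not the schema used in the sample solution.
RULES_JSON = """
[
  {"description": "EIN must match the pattern NN-NNNNNNN",
   "field_name": "ein",
   "condition_type": "Regex",
   "condition_setting": "[0-9]{2}-[0-9]{7}"},
  {"description": "DLN must be extracted with at least 95% confidence",
   "field_name": "dln",
   "condition_type": "ConfidenceThreshold",
   "condition_setting": 95},
  {"description": "Total revenue is required",
   "field_name": "total_revenue",
   "condition_type": "Required",
   "condition_setting": null}
]
"""

REQUIRED_KEYS = {"description", "field_name", "condition_type", "condition_setting"}


def load_rules(text):
    """Parse the rule list and reject rules that lack an expected key."""
    rules = json.loads(text)
    for rule in rules:
        missing = REQUIRED_KEYS - rule.keys()
        if missing:
            raise ValueError(f"Rule {rule!r} is missing keys: {sorted(missing)}")
    return rules


rules = load_rules(RULES_JSON)
for rule in rules:
    print(rule["field_name"], "->", rule["condition_type"])
```

Keeping every rule in the same shape means a new rule is only a new list entry, with no code change needed as long as its condition type is already supported.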
In the sample solution, we call the Amazon Textract analyze_document API query feature to extract fields by asking specific questions. You don’t need to know the structure of the data in the document (table, form, implied field, nested data) or worry about variations across document versions and formats. Queries use a combination of visual, spatial, and language cues to extract the information you seek with high accuracy.
To extract the value of the DLN field, you can send a request with a question in natural language, such as “What is the DLN?” Amazon Textract returns the text, confidence, and other metadata if it finds corresponding information on the image or document. The following is an example of an Amazon Textract query request:
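A minimal sketch of such a request in Python, assuming boto3 and a document page stored in Amazon S3, might look like the following. The bucket name, object key, alias, and helper function names are placeholders introduced here, not values from the workshop. The parsing helper relies on Textract returning QUERY blocks that link to QUERY_RESULT blocks through ANSWER relationships.

```python
def build_query_request(bucket, key, questions):
    """Build keyword arguments for analyze_document with the QUERIES feature.

    `questions` maps an alias (used to label the answer) to a question text.
    """
    return {
        "Document": {"S3Object": {"Bucket": bucket, "Name": key}},
        "FeatureTypes": ["QUERIES"],
        "QueriesConfig": {
            "Queries": [
                {"Text": text, "Alias": alias} for alias, text in questions.items()
            ]
        },
    }


def parse_query_answers(response):
    """Map each query alias to the text and confidence of its answer."""
    blocks = response.get("Blocks", [])
    results = {b["Id"]: b for b in blocks if b["BlockType"] == "QUERY_RESULT"}
    answers = {}
    for block in blocks:
        if block["BlockType"] != "QUERY":
            continue
        alias = block["Query"].get("Alias") or block["Query"]["Text"]
        for relationship in block.get("Relationships", []):
            if relationship["Type"] != "ANSWER":
                continue
            for result_id in relationship["Ids"]:
                result = results.get(result_id)
                if result is not None:
                    # If a query has several answers, the last one wins here.
                    answers[alias] = {
                        "text": result.get("Text", ""),
                        "confidence": result.get("Confidence", 0.0),
                    }
    return answers


def run_queries(bucket, key, questions):
    """Call Amazon Textract; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above work offline

    client = boto3.client("textract")
    response = client.analyze_document(**build_query_request(bucket, key, questions))
    return parse_query_answers(response)


# Placeholder bucket and key; replace them with your own document location.
request = build_query_request(
    "example-bucket", "form990/page1.png", {"DLN": "What is the DLN?"}
)
```

Queries without an answer on the page simply do not appear in the returned dictionary, which a downstream rule can treat as a missing required field.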
The sample solution constructs the data in a structured format to serve the generic business rule evaluation. To keep extracted values, you can define a data model for each document page. The following image shows how the text on page 1 maps to the JSON fields.
Each field represents a document’s text, check box, or table/form cell on the page. The JSON object looks like the following code:
You can find the detailed JSON structure definition in the GitHub repo.
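To make the idea of a page-level data model concrete, here is a small sketch using Python dataclasses. The class names, attribute names, and sample values are hypothetical and do not reproduce the structure defined in the GitHub repo; the sketch only shows one way to keep each field's value, confidence, and kind (text, check box, or cell) together and serialize the page to JSON.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ExtractedField:
    """One value extracted from the page (hypothetical structure)."""
    name: str
    value: str
    confidence: float
    kind: str = "text"  # "text", "checkbox", or "cell"


@dataclass
class PageModel:
    """All fields extracted from a single document page."""
    page: int
    fields: list = field(default_factory=list)

    def get(self, name):
        """Return the field with the given name, or None if it is absent."""
        for item in self.fields:
            if item.name == name:
                return item
        return None


# Placeholder values, not taken from a real form.
page = PageModel(page=1)
page.fields.append(ExtractedField("dln", "12345678901234", 98.7))
page.fields.append(
    ExtractedField("is_amended_return", "NOT_SELECTED", 99.1, kind="checkbox")
)

page_json = json.dumps(asdict(page), indent=2)
print(page_json)
```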
The sample solution comes with a Condition class, a generic rules engine that takes the extracted data (as defined in the data model) and the rules (as defined in the customized business rules). It returns two lists: the failed conditions and the satisfied conditions. We can use the result to decide whether to send the document to Amazon A2I for human review.
The Condition class source code is in the sample GitHub repo. It supports basic validation logic, such as validating a string’s length, value range, and confidence score threshold. You can modify the code to support more condition types and complex validation logic.
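The following sketch shows the general shape of such a rules engine. It is not the Condition class from the repo: the function names, rule keys, condition types, and sample data are assumptions chosen for illustration. It evaluates each rule against the extracted fields and returns the failed and satisfied rules as two lists.

```python
import re


def check(rule, extracted):
    """Return True if the extracted field satisfies the rule."""
    kind = rule["condition_type"]
    setting = rule.get("condition_setting")
    if extracted is None or extracted.get("value") in (None, ""):
        return False  # in this sketch a missing value fails every condition
    if kind == "Required":
        return True
    if kind == "Regex":
        return re.fullmatch(setting, extracted["value"]) is not None
    if kind == "ConfidenceThreshold":
        return extracted["confidence"] >= setting
    if kind == "ValueRange":
        low, high = setting
        try:
            number = float(extracted["value"].replace(",", ""))
        except ValueError:
            return False
        return low <= number <= high
    raise ValueError(f"Unsupported condition type: {kind}")


def evaluate(rules, fields):
    """Split the rules into (failed, satisfied) lists."""
    failed, satisfied = [], []
    for rule in rules:
        passed = check(rule, fields.get(rule["field_name"]))
        (satisfied if passed else failed).append(rule)
    return failed, satisfied


# Hypothetical rules and extracted data.
rules = [
    {"field_name": "ein", "condition_type": "Regex",
     "condition_setting": r"[0-9]{2}-[0-9]{7}"},
    {"field_name": "dln", "condition_type": "ConfidenceThreshold",
     "condition_setting": 95},
    {"field_name": "total_revenue", "condition_type": "Required"},
]
fields = {
    "ein": {"value": "12-3456789", "confidence": 99.5},
    "dln": {"value": "12345678901234", "confidence": 91.0},
}

failed, satisfied = evaluate(rules, fields)
needs_human_review = bool(failed)
print("Failed:", [rule["field_name"] for rule in failed])
print("Needs human review:", needs_human_review)
```

Supporting a new kind of validation in a design like this means adding one branch to the check function, while the rule definitions and the evaluation loop stay unchanged.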
Amazon A2I allows you to customize the reviewer’s web UI by defining a worker task template. The template is a static webpage in HTML and JavaScript. You can pass data to the customized reviewer page using the Liquid syntax.
In the sample solution, the custom Amazon A2I UI template displays the page on the left and the failure conditions on the right. Reviewers can use it to correct the extraction value and add their comments.
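As a rough sketch of how the page image and failed conditions could be passed to a custom template, the following Python code builds the arguments for the Amazon A2I runtime start_human_loop call. The input keys (document_image, failed_conditions), the loop name prefix, the ARN, and the S3 URI are placeholders assumed for illustration; a template would read whatever keys you choose through Liquid variables under task.input.

```python
import json
import uuid


def build_human_loop_request(flow_definition_arn, image_s3_uri, failed_conditions):
    """Build keyword arguments for start_human_loop (hypothetical input keys)."""
    input_content = {
        "document_image": image_s3_uri,          # shown to the reviewer
        "failed_conditions": failed_conditions,  # listed next to the image
    }
    return {
        # Loop names must be unique; lowercase letters, digits, and hyphens only.
        "HumanLoopName": f"idp-review-{uuid.uuid4()}",
        "FlowDefinitionArn": flow_definition_arn,
        "HumanLoopInput": {"InputContent": json.dumps(input_content)},
    }


def start_review(request):
    """Start the human loop; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works offline

    client = boto3.client("sagemaker-a2i-runtime")
    return client.start_human_loop(**request)


# Placeholder ARN, S3 URI, and condition for illustration.
request = build_human_loop_request(
    "arn:aws:sagemaker:us-east-1:111122223333:flow-definition/example",
    "s3://example-bucket/form990/page1.png",
    [{"field_name": "dln", "message": "Confidence below threshold"}],
)
print(request["HumanLoopInput"]["InputContent"])
```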
The following screenshot shows our customized Amazon A2I UI. It shows the original image document on the left and the following failed conditions on the right:
The reviewers can manually verify these results and add comments in the CHANGE REASON text boxes.
For more information about integrating Amazon A2I into any custom ML workflow, refer to over 60 pre-built worker templates on the GitHub repo and Use Amazon Augmented AI with Custom Task Types.
After the reviewer using the Amazon A2I customized UI verifies the result and chooses Submit, Amazon A2I stores a JSON file in the S3 bucket folder. The JSON file includes the following information on the root level:
The following is a sample JSON generated by Amazon A2I:
You can implement extract, transform, and load (ETL) logic to parse information from the Amazon A2I output JSON and store it in a file or database. The sample solution comes with a CSV file with processed data. You can use it to build a BI dashboard by following the instructions in the next section.
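A minimal ETL sketch along these lines is shown below. It assumes the Amazon A2I output has humanLoopName and humanAnswers at the root level, with each answer carrying answerContent, workerId, and submissionTime. The keys inside answerContent depend on your template, so the ones used here, like all the sample values, are hypothetical.

```python
import csv
import io

COLUMNS = ["human_loop_name", "worker_id", "submission_time", "field_name", "value"]


def flatten_a2i_output(output):
    """Turn one A2I output document into one row per reviewed field."""
    rows = []
    for answer in output.get("humanAnswers", []):
        for field_name, value in answer.get("answerContent", {}).items():
            rows.append({
                "human_loop_name": output.get("humanLoopName", ""),
                "worker_id": answer.get("workerId", ""),
                "submission_time": answer.get("submissionTime", ""),
                "field_name": field_name,
                "value": value,
            })
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder output; the answerContent keys depend on your template.
sample_output = {
    "humanLoopName": "idp-review-example",
    "humanAnswers": [
        {
            "workerId": "worker-1",
            "submissionTime": "2022-09-21T13:43:46Z",
            "answerContent": {
                "dln": "12345678901234",
                "dln_change_reason": "Digit misread",
            },
        }
    ],
}

rows = flatten_a2i_output(sample_output)
csv_text = rows_to_csv(rows)
print(csv_text)
```

A long format with one row per field keeps the file easy to aggregate later, for example to count how often each field needed correction.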
The sample solution includes a reporting stage with a visualization dashboard served by Amazon QuickSight. The BI dashboard shows key metrics such as the number of documents processed automatically or manually, the fields that most often required human review, and other insights. This dashboard can help you get an overview of the document processing pipeline and analyze the common reasons for human review. You can then optimize the workflow to further reduce human input.
The sample dashboard includes basic metrics. You can expand the solution using Amazon QuickSight to show more insights into the data.
To expand the solution to support more document pages with corresponding business rules, you need to make the following changes:
The custom Amazon A2I UI in the solution is generic and doesn’t require changes to support new business rules.
Intelligent document processing is in high demand, and companies need a customized pipeline to support their unique business logic. Amazon A2I offers a built-in template integrated with Amazon Textract to implement your human review use cases. It also allows you to customize the reviewer page to serve flexible requirements.
This post guided you through a reference solution using Amazon Textract and Amazon A2I to build an IDP pipeline that supports flexible business rules. You can try it out using the Jupyter notebook in the GitHub IDP workshop repo.
Sonali Sahu leads the Intelligent Document Processing AI/ML Solutions Architect team at Amazon Web Services. She is a passionate technophile and enjoys working with customers to solve complex problems using innovation. Her core areas of focus are Artificial Intelligence & Machine Learning for Intelligent Document Processing.