As one of the largest home loan servicers in the country, Mr. Cooper has been helping people with homeownership since 1994. Believing that the process could be streamlined, we saw an opportunity to revolutionize the way people buy their dream homes, starting with transforming paper-based processes. We believed that digitizing and automating as much of that journey as possible was the way forward.
Traditionally, mortgage lenders require borrowers to submit various documents, such as payslips and W-2 wage and tax statements, for loan applications. Since each document requires manual classification and verification, the process creates significant delays in the borrower’s home-buying journey.
Adding to the complexity is the fact that there are more than 3,000 counties across the 50 U.S. states. Each county has its own set of fees to record a deed when someone purchases a home. These fees often change and are difficult to determine. In some counties, you have to call the county office and discuss the fee amount over the phone. Every mortgage company must disclose fees, including the county recording fee, in every closing statement for every mortgage.
To improve the customer experience and efficiency, we needed to streamline the mortgage process, from pre-approval to closing and from post-closing to servicing. Our solution is to digitize and optimize business processes (such as classifying documents, extracting data, and predicting fees) using machine learning (ML) throughout the entire lifecycle of the mortgage process.
Improving process efficiency using mortgage ML document classification and extraction
When we started this project in 2018, there wasn’t an off-the-shelf solution that could meet our functional and technical needs. We decided to build Pyro, our document management solution, on Google Cloud. It is based on products like BigQuery and Vertex AI, which enabled us to quickly scale resources in the cloud to meet changes in demand.
Fast forward to today, and Document AI is a product suite that provides simple and cost-effective solutions across the document lifecycle. These include pre-trained processors that classify and extract data from business documents.
Using Cloud AutoML on Vertex AI, we could quickly build and deploy models with minimal effort at a low cost. With Cloud AutoML, even business analysts with no ML programming background can train models and create endpoints with high confidence scores and accuracy.
Our ML model processes more than 2,200 pages per minute and classifies documents into predetermined categories with more than 90% accuracy, so customer service agents have accurate and real-time information when they speak to customers. The goal is to digitize as much of the mortgage process as possible so our customer service agents can focus on the customer, not paperwork. Our agents provide a human touch with sympathy and empathy to help customers overcome challenges in the mortgage process.
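As a rough illustration of what classifying pages into predetermined categories can look like downstream of a model, here is a minimal Python sketch. It is not Pyro's implementation: the category names, the 0.90 threshold, and the idea of sending low-confidence pages to manual review are all assumptions made for the example, and the scores stand in for whatever a deployed endpoint would return.

```python
# Hypothetical threshold: pages below it are set aside for a person to check.
REVIEW_THRESHOLD = 0.90


def classify_page(scores: dict[str, float]) -> tuple[str, float]:
    """Return the highest-scoring category and its confidence score."""
    label = max(scores, key=scores.get)
    return label, scores[label]


def route_pages(pages: dict[str, dict[str, float]]):
    """Split pages into auto-filed pages and pages needing manual review."""
    filed: dict[str, str] = {}
    review: list[str] = []
    for page_id, scores in pages.items():
        label, confidence = classify_page(scores)
        if confidence >= REVIEW_THRESHOLD:
            filed[page_id] = label
        else:
            review.append(page_id)
    return filed, review


if __name__ == "__main__":
    # Made-up scores for two pages; real scores would come from the model.
    sample = {
        "page-1": {"payslip": 0.97, "w2": 0.02, "other": 0.01},
        "page-2": {"payslip": 0.55, "w2": 0.40, "other": 0.05},
    }
    print(route_pages(sample))
```

In this sketch the first page is filed automatically as a payslip, while the second falls below the threshold and is held back.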
Within a year of launch, Pyro processed more than 932 million pages of mortgage documents, including a backlog of documents that would have taken 4.5 years to process manually.
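The two figures quoted above can be checked against each other with some quick arithmetic. The short Python sketch below assumes, purely for illustration, that the model ran around the clock at a constant 2,200 pages per minute. Both inputs are stated in the article as "more than", so the result is only a ballpark.

```python
PAGES_PROCESSED = 932_000_000  # pages handled in the first year (from the article)
PAGES_PER_MINUTE = 2_200       # quoted model throughput (from the article)

# Assumption: constant throughput, 24 hours a day.
minutes = PAGES_PROCESSED / PAGES_PER_MINUTE
days = minutes / (60 * 24)

print(f"About {days:.0f} days of continuous processing")
```

Under those assumptions the workload comes to roughly 294 days, which is consistent with the volume being processed within a year.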
Engaging with Google Cloud early in our product lifecycle journey helped us build our ML team in a meticulously planned manner. They provided the resources we lacked in-house. That allowed us to take our time to hire the right ML talent, rather than adding people too fast. We now have people on our team with different skill sets, ranging from subject matter experts who understand mortgage workflows to data engineers who build data pipelines by bringing data from multiple sources.
Since the launch of Pyro, we have built a library of more than 300 mortgage-specific machine learning models on Google Cloud.
Moving forward into the county fee recording estimation process
We wanted to broaden our horizons outside the documents world and build use cases and solutions that benefit a larger audience. We worked closely with our business team to identify challenges that we can solve with AI. One such area identified was county fee recording (CFR) estimation during the payoff quote process.
After the payoff funds are received, the lien release is sent for recording, which incurs a fee determined at the county level. The estimation process for this recording fee is difficult because the fee varies by county and depends on various factors, including loan-level county rules, property information, lien release page length, borrower information (such as the number of borrowers), and more. The CFR may also change over time. Since there’s no standard formula to calculate the recording fee, customers are sometimes undercharged or overcharged. Any error in CFR calculation adds to the cost of Mr. Cooper servicing the loan because the mortgage lender has to absorb the difference.
Every day, our loan servicing system generates a list of loans along with the loan information, property information, and county information. Typically, we need to calculate recording fees for anywhere from thousands of loans on weekdays to millions of loans on weekends. In the past, the business team calculated the fee manually using spreadsheet-based tools. Our solution was to create an ML pipeline using regression models. It reads the loan, property, county, and customer information, predicts the CFR, and feeds the prediction back to our loan servicing system for faster and more accurate estimates. Here’s how it works:
The loan servicing system generates a list of loan estimate requests that are automatically sent to Cloud Storage via our Secure File Transfer Protocol (SFTP) server.
A Cloud Function is triggered and invokes a Vertex AI inference pipeline, which pre-processes the input information, runs predictions against our recording fee ML regression model, post-processes the predicted results into CSV files, and stores them on Cloud Storage.
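The pre-process, predict, and post-process stages described above can be sketched in plain Python. This is a simplified illustration, not the production pipeline: the column names are hypothetical, `predict_fee` is a stand-in with made-up coefficients rather than the trained regression model, and in-memory strings replace the Cloud Storage files that the Cloud Function and Vertex AI pipeline would actually read and write.

```python
import csv
import io


def preprocess(raw_csv: str) -> list[dict]:
    """Parse the request file and normalize fields (hypothetical columns)."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        rows.append({
            "loan_id": row["loan_id"].strip(),
            "county": row["county"].strip().upper(),
            "borrower_count": int(row["borrower_count"]),
            "page_count": int(row["page_count"]),
        })
    return rows


def predict_fee(features: dict) -> float:
    """Stand-in for the regression model; the coefficients are invented."""
    return 25.0 + 10.0 * features["page_count"] + 5.0 * features["borrower_count"]


def postprocess(rows: list[dict], fees: list[float]) -> str:
    """Write one CSV line per loan with its estimated recording fee."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["loan_id", "estimated_recording_fee"])
    for row, fee in zip(rows, fees):
        writer.writerow([row["loan_id"], f"{fee:.2f}"])
    return out.getvalue()


def run_inference(raw_csv: str) -> str:
    """Chain the three stages in the order the pipeline runs them."""
    rows = preprocess(raw_csv)
    fees = [predict_fee(row) for row in rows]
    return postprocess(rows, fees)


if __name__ == "__main__":
    sample = (
        "loan_id,county,borrower_count,page_count\n"
        "A1, dallas ,2,3\n"
        "B2,Travis,1,2\n"
    )
    print(run_inference(sample))
```

Keeping each stage as its own function mirrors the pipeline's structure, so a retrained model can replace `predict_fee` without touching the parsing or CSV-writing code.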
Using historical data to fuel the future
Our initial training data for CFR is close to five million records, representing a subset of our historical data. To ensure that information remains up to date, we refresh our ML model by creating a training pipeline on Vertex AI to capture any changes in property information and county fees. We then expose the retrained model through the inference pipeline. The model currently runs at 96% accuracy.
Another key challenge we had to solve was, “How do we know how many pages the document will have before we generate it?” The solution is a regression model that looks at the past five years of historical data to identify patterns. For example, if there are two borrowers with this particular property information in a specific county, the recording fee is estimated to be between $50 and $65.
Vertex AI brings MLOps capabilities, so we don’t need to build ML pipelines from scratch. As a result, we built this ML model in less than 45 days. Since we went live in December 2021, we’ve achieved around a 66% improvement in annual dollar savings compared with previous years, when the process was run manually.
We’re now looking to extend the value of AI into our call center to further enhance the customer experience. By analyzing call transcripts, we want to group calls into cohorts based on the primary reason for each call and identify how we can resolve customers’ queries much faster.
Our journey from manual, paper processes to the digital world has been transformative. Google Cloud is our partner of choice for creating change and bringing value to our customers, and we look forward to how we can further innovate together.