How Rocket Companies modernized their data science solution on AWS
This post was written with Dian Xu and Joel Hawkins of Rocket Companies.
Rocket Companies is a Detroit-based FinTech company with a mission to “Help Everyone Home”. With the current housing shortage and affordability concerns, Rocket simplifies the homeownership process through an intuitive and AI-driven experience. This comprehensive framework streamlines every step of the homeownership journey, empowering consumers to search, purchase, and manage home financing effortlessly. Rocket integrates home search, financing, and servicing in a single environment, providing a seamless and efficient experience.
The Rocket brand is synonymous with simple, fast, and trustworthy digital solutions for complex transactions. Rocket is dedicated to helping clients realize their dream of homeownership and financial freedom. Since its inception, Rocket has grown from a single mortgage lender to a network of businesses that creates new opportunities for its clients.
Rocket takes a complicated process and uses technology to make it simpler. Applying for a mortgage can be complex and time-consuming. That’s why we use advanced technology and data analytics to streamline every step of the homeownership experience, from application to closing. By analyzing a wide range of data points, we’re able to quickly and accurately assess the risk associated with a loan, enabling us to make more informed lending decisions and get our clients the financing they need.
Our goal at Rocket is to provide a personalized experience for both our current and prospective clients. Rocket’s diverse product offerings can be customized to meet specific client needs, while our team of skilled bankers is matched with the client opportunities that best align with their skills and knowledge. Maintaining strong relationships with our large, loyal client base, and maintaining hedge positions to cover financial obligations, are key to our success. With the volume of business we do, even small improvements can have a significant impact.
In this post, we share how we modernized Rocket’s data science solution on AWS to:

- Increase the speed to delivery from eight weeks to under one hour
- Improve operational stability and support by reducing incident tickets by over 99% in 18 months
- Power 10 million automated data science and AI decisions made daily
- Provide a seamless data science development experience
Rocket’s previous data science solution was built around Apache Spark and combined the use of a legacy version of the Hadoop environment and vendor-provided Data Science Experience development tools. The Hadoop environment was hosted on Amazon Elastic Compute Cloud (Amazon EC2) servers, managed in-house by Rocket’s technology team, while the data science experience infrastructure was hosted on premises. Communication between the two systems was established through Kerberized Apache Livy (HTTPS) connections over AWS PrivateLink.
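To make the Kerberized Livy link concrete, the following is a minimal sketch of how a client might create an interactive Spark session through Apache Livy's REST API (`POST /sessions`). The endpoint URL, session parameters, and Kerberos setup shown are illustrative assumptions, not Rocket's actual configuration.

```python
import json

# Build the payload for creating an interactive Spark session through
# Livy's REST API. The values here are illustrative defaults.
def livy_session_payload(driver_memory="2g", executor_cores=2):
    return {
        "kind": "pyspark",              # language of the session
        "driverMemory": driver_memory,
        "executorCores": executor_cores,
    }

# In a Kerberized setup, a client would send this payload over HTTPS,
# e.g. with the requests-kerberos package (endpoint is hypothetical):
#   requests.post("https://livy.example.internal:8998/sessions",
#                 data=json.dumps(livy_session_payload()),
#                 headers={"Content-Type": "application/json"},
#                 auth=HTTPKerberosAuth())
payload = livy_session_payload()
print(json.dumps(payload))
```

Livy then returns a session ID that the notebook polls until the Spark session is ready, after which code snippets are submitted as statements against that session.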
Data exploration and model development were conducted using well-known machine learning (ML) tools such as Jupyter or Apache Zeppelin notebooks. Apache Hive provided a tabular interface to data stored in HDFS and integrated with Apache Spark SQL. Apache HBase offered real-time key-based access to data. Model training and scoring were performed either from Jupyter notebooks or through jobs scheduled by the Apache Oozie orchestration tool, which was part of the Hadoop implementation.
Despite the benefits of this architecture, Rocket faced challenges that limited its effectiveness.
Rocket’s legacy data science architecture is shown in the following diagram.
The diagram depicts the flow; the key components are detailed below:
At Rocket, we believe in the power of continuous improvement and constantly seek out new opportunities. One such opportunity is using data science solutions, but to do so, we must have a strong and flexible data science environment.
To address the legacy data science environment challenges, Rocket decided to migrate its ML workloads to the Amazon SageMaker AI suite. This would allow us to deliver more personalized experiences and understand our customers better. To promote the success of this migration, we collaborated with the AWS team to create automated and intelligent digital experiences that demonstrated Rocket’s understanding of its clients and kept them connected.
We implemented an AWS multi-account strategy, standing up Amazon SageMaker Studio in a build account using a network-isolated Amazon VPC. This allows us to separate development and production environments, while also improving our security stance.
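As a sketch of what a network-isolated Studio setup looks like, the following builds the request parameters for a SageMaker domain running in `VpcOnly` mode, so Studio traffic stays inside the VPC. The domain name, VPC, subnet, security group, and role ARN values are placeholders, not Rocket's actual resources.

```python
# Illustrative parameters for a network-isolated SageMaker Studio domain.
# All resource identifiers below are placeholders.
def studio_domain_request(vpc_id, subnet_ids, security_group_ids, role_arn):
    return {
        "DomainName": "build-account-studio",  # hypothetical name
        "AuthMode": "IAM",
        "AppNetworkAccessType": "VpcOnly",     # keep traffic inside the VPC
        "VpcId": vpc_id,
        "SubnetIds": subnet_ids,
        "DefaultUserSettings": {
            "ExecutionRole": role_arn,
            "SecurityGroups": security_group_ids,
        },
    }

request = studio_domain_request(
    "vpc-0123", ["subnet-a", "subnet-b"], ["sg-0123"],
    "arn:aws:iam::111122223333:role/StudioExecutionRole")
# A client in the build account would pass this to
# boto3.client("sagemaker").create_domain(**request).
print(request["AppNetworkAccessType"])
```

Running separate domains per account keeps development and production isolated while IAM and the VPC configuration enforce the security boundary.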
We moved our new work to SageMaker Studio and our legacy Hadoop workloads to Amazon EMR, connecting to the old Hadoop cluster using Livy and SageMaker notebooks to ease the transition. This gives us access to a wider range of tools and technologies, enabling us to choose the most appropriate ones for each problem we’re trying to solve.
In addition, we moved our data from HDFS to Amazon Simple Storage Service (Amazon S3), and now use Amazon Athena and AWS Lake Formation to provide proper access controls to production data. This makes it easier to access and analyze the data, and to integrate it with other systems. The team also provides secure interactive integration through Amazon Elastic Kubernetes Service (Amazon EKS), further improving the company’s security stance.
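To illustrate the S3/Athena side of this setup, here is a minimal sketch of an Athena query request against data cataloged in the data lake. The database, table, and S3 output location are hypothetical examples, not Rocket's actual names.

```python
# Illustrative Athena query request; database, table, and output
# location are placeholders.
def athena_query_request(sql, database, output_s3):
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_s3},
    }

req = athena_query_request(
    "SELECT loan_id, status FROM loans LIMIT 10",  # hypothetical table
    "analytics_db",
    "s3://example-athena-results/queries/",
)
# boto3.client("athena").start_query_execution(**req) would submit it.
# Lake Formation permissions on the underlying tables are enforced at
# query time, so only granted databases and columns are readable.
print(req["QueryExecutionContext"]["Database"])
```

Because Athena reads directly from S3 and Lake Formation governs access centrally, data scientists query production tables without needing cluster access or copies of the data.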
SageMaker AI has been instrumental in empowering our data science community with the flexibility to choose the most appropriate tools and technologies for each problem, resulting in faster development cycles and higher model accuracy. With SageMaker Studio, our data scientists can seamlessly develop, train, and deploy models without the need for additional infrastructure management.
As a result of this modernization effort, SageMaker AI enabled Rocket to scale our data science solution across Rocket Companies and integrate using a hub-and-spoke model. The ability of SageMaker AI to automatically provision and manage instances has allowed us to focus on our data science work rather than infrastructure management, increasing the number of models in production by five times and data scientists’ productivity by 80%.
Our data scientists are empowered to use the most appropriate technology for the problem at hand, and our security stance has improved. Rocket can now compartmentalize data and compute, as well as compartmentalize development and production. Additionally, we are able to provide model tracking and lineage using Amazon SageMaker Experiments and artifacts discoverable using the SageMaker model registry and Amazon SageMaker Feature Store. All the data science work has now been migrated onto SageMaker, and all the old Hadoop work has been migrated to Amazon EMR.
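As a sketch of how artifacts become discoverable through the model registry, the following builds the request for registering a trained model version as a model package. The group name, container image URI, and S3 artifact path are placeholders, not Rocket's actual values.

```python
# Illustrative request for registering a model version in the SageMaker
# model registry; all names and paths below are placeholders.
def model_package_request(group_name, image_uri, model_data_url):
    return {
        "ModelPackageGroupName": group_name,
        "ModelApprovalStatus": "PendingManualApproval",  # gate prod promotion
        "InferenceSpecification": {
            "Containers": [
                {"Image": image_uri, "ModelDataUrl": model_data_url}
            ],
            "SupportedContentTypes": ["text/csv"],
            "SupportedResponseMIMETypes": ["text/csv"],
        },
    }

req = model_package_request(
    "credit-risk-models",                                  # hypothetical group
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/xgb:1",  # hypothetical image
    "s3://example-model-artifacts/credit-risk/model.tar.gz",
)
# boto3.client("sagemaker").create_model_package(**req) would register the
# version, making it discoverable and versioned within its group.
print(req["ModelApprovalStatus"])
```

Keeping new versions in `PendingManualApproval` lets a reviewer promote a model to production deliberately rather than automatically.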
Overall, SageMaker AI has played a critical role in enabling Rocket’s modernization journey by building a more scalable and flexible ML framework, reducing operational burden, improving model accuracy, and accelerating deployment times.
The successful modernization allowed Rocket to overcome our previous limitations and better support our data science efforts. We were able to improve our security stance, make work more traceable and discoverable, and give our data scientists the flexibility to choose the most appropriate tools and technologies for each problem. This has helped us better serve our customers and drive business growth.
Rocket’s new data science solution architecture on AWS is shown in the following diagram.
The solution consists of the following components:
We’ve come a long way in modernizing our infrastructure and workloads. We started our journey supporting six business channels and 26 models in production, with dozens in development. Deployment times stretched for months and required a team of three system engineers and four ML engineers to keep everything running smoothly. Despite the support of our internal DevOps team, our issue backlog with the vendor was an unenviable 200+.
Today, we are supporting nine organizations and over 20 business channels, with a whopping 210+ models in production and many more in development. Our average deployment time has gone from months to just weeks—sometimes even down to mere days! With just one part-time ML engineer for support, our average issue backlog with the vendor is practically non-existent. We now support over 120 data scientists, ML engineers, and analytical roles. Our framework mix has expanded to include 50% SparkML models and a diverse range of other ML frameworks, such as PyTorch and scikit-learn. These advancements have given our data science community the power and flexibility to tackle even more complex and challenging projects with ease.
The following table compares some of our metrics before and after migration.
| | Before Migration | After Migration |
|---|---|---|
| Speed to Delivery | New data ingestion project took 4–8 weeks | Data-driven ingestion takes under one hour |
| Operational Stability and Supportability | Over a hundred incidents and tickets in 18 months | Fewer incidents: one per 18 months |
| Data Science | Data scientists spent 80% of their time waiting on their jobs to run | Seamless data science development experience |
| Scalability | Unable to scale | Powers 10 million automated data science and AI decisions made daily |
Throughout the journey of modernizing our data science solution, we’ve learned valuable lessons that we believe could be of great help to other organizations that are planning to undertake similar endeavors.
First, we’ve come to realize that managed services can be a game changer in optimizing your data science operations.
The isolation of development into its own account while providing read-only access to production data is a highly effective way of enabling data scientists to experiment and iterate on their models without putting your production environment at risk. This is something that we’ve achieved through the combination of SageMaker AI and Lake Formation.
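To make the read-only pattern concrete, here is a minimal sketch of a Lake Formation grant that gives a development-account role `SELECT`-only access to a production table. The role ARN, database, and table names are hypothetical, not Rocket's actual resources.

```python
# Illustrative Lake Formation grant giving a dev-account role read-only
# access to a production table; identifiers below are placeholders.
def readonly_grant(principal_arn, database, table):
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": ["SELECT"],         # read-only: no INSERT/DELETE/ALTER
        "PermissionsWithGrantOption": [],  # dev role cannot re-grant access
    }

grant = readonly_grant(
    "arn:aws:iam::111122223333:role/DataScientistDevRole",
    "prod_analytics", "loans")
# boto3.client("lakeformation").grant_permissions(**grant) would apply it.
print(grant["Permissions"])
```

Because the grant carries no write permissions and no grant option, experiments in the development account cannot modify production data or widen access.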
Another lesson we learned is the importance of training and onboarding for teams. This is particularly true for teams that are moving to a new environment like SageMaker AI. It’s crucial to understand the best practices of utilizing the resources and features of SageMaker AI, and to have a solid understanding of how to move from notebooks to jobs.
Lastly, we found that although Amazon EMR still requires some tuning and optimization, the administrative burden is much lighter compared to hosting directly on Amazon EC2. This makes Amazon EMR a more scalable and cost-effective solution for organizations that need to manage large data processing workloads.
This post provided an overview of the successful partnership between AWS and Rocket Companies. Through this collaboration, Rocket Companies migrated many ML workloads and implemented a scalable ML framework. In its ongoing work with AWS, Rocket Companies remains committed to innovation and staying at the forefront of customer satisfaction.
Don’t let legacy systems hold back your organization’s potential. Discover how AWS can assist you in modernizing your data science solution and achieving remarkable results, similar to those achieved by Rocket Companies.
Be sure to check out the previous articles in this series.