Innovating in patent search: How IPRally leverages AI with Google Kubernetes Engine and Ray

Patent-search platform provider IPRally is growing quickly, servicing global enterprises, IP law firms, and multiple national patent and trademark offices. As the company grows, so do its technology needs. It continues to train its models for greater accuracy, adds 200,000 searchable records for customer access every week, and maps new patents.

With millions of patent documents published annually, and the technical complexity of those documents increasing, it can take even the most seasoned patent professional several hours of research to resolve a case with traditional patent search tools. In 2018, Finnish firm IPRally set out to tackle this problem with a graph-based approach.

“Search engines for patents were mostly complicated boolean ones, where you needed to spend hours building a complicated query,” says Juho Kallio, CTO and co-founder of the 50-person firm. “I wanted to build something important and challenging.”

Using machine learning (ML) and natural language processing (NLP), the company has transformed the text from over 120 million global patent documents into document-level knowledge graphs embedded into a searchable vector space. Now, patent researchers can receive relevant results in seconds with AI-selected highlights of key information and explainable results.

To meet its growing technology needs, IPRally built a customized ML platform using Google Kubernetes Engine (GKE) and Ray, an open-source ML framework, balancing efficiency and performance while streamlining machine learning operations (MLOps). The company uses open-source KubeRay to deploy and manage Ray on GKE, which enables it to leverage cost-efficient NVIDIA GPU Spot instances for exploratory ML research and development. It also uses Google Cloud data building blocks, including Cloud Storage and Compute Engine persistent disks. Next on the horizon is expanding to big data solutions with Ray Data and BigQuery.

“Ray on GKE has the ability to support us in the future with any scale and any kind of distributed complex deep learning,” says Kallio.

A custom ML platform built for performance and efficiency

The IPRally engineering team’s primary focus is on R&D and how it can continue to improve its Graph AI to make technical knowledge more accessible. With just two DevOps engineers and one MLOps engineer, IPRally was able to build its own customized ML platform with GKE and Ray as key components. 

A big proponent of open source, IPRally transitioned everything to Kubernetes when their compute needs grew. However, they didn’t want to have to manage Kubernetes themselves. That led them to GKE, with its scalability, flexibility, open ecosystem, and its support for a diverse set of accelerators. All told, this provides IPRally the right balance of performance and cost, as well as easy management of compute resources and the ability to efficiently scale down capacity when they don’t need it. 

“GKE provides the scalability and performance we need for these complex training and serving needs and we get the right granularity of control over data and compute,” says Kallio. 

One particular GKE capability that Kallio highlights is container image streaming, which has significantly accelerated their start-up time.

“We have seen that container image streaming in GKE has a significant impact on expediting our application startup time. Image streaming helps us accelerate our start-up time for a training job after submission by 20%,” he shares. “And, when we are able to reuse an existing pod, we can start up in a few seconds instead of minutes.”

The next layer is Ray, which the company uses to scale the distributed, parallelized Python and Clojure applications it uses for machine learning. To more easily manage Ray, IPRally uses KubeRay, a specialized tool that simplifies Ray cluster management on Kubernetes. IPRally uses Ray for the most advanced tasks like massive preprocessing of data and exploratory deep learning in R&D. 

“Interoperability between Ray and GKE autoscaling is smooth and robust. We can combine computational resources without any constraints,” says Kallio. 

The heaviest ML loads are mainly deployed on G2 VMs featuring up to eight NVIDIA L4 Tensor Core GPUs, which deliver cutting-edge performance-per-dollar for AI inference workloads. By using them within GKE, IPRally can create nodes on demand and scale GPU resources as needed, which keeps its operational costs down. A single Terraform-provisioned Kubernetes cluster runs in each of the regions where IPRally looks for inexpensive Spot instances. GKE and Ray then step in for compute orchestration and automated scaling.

To further ease MLOps, IPRally built its own thin orchestration layer, IPRay, atop KubeRay and Ray. This layer gives data scientists a command line tool to provision a templated Ray cluster that scales efficiently up and down, and to run jobs in Ray, without needing to know Terraform. This self-service layer reduces friction and allows both engineers and data scientists to focus on their higher-value work.
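To make the idea of a templated Ray cluster more concrete, here is a minimal sketch of how a thin layer might render a KubeRay RayCluster manifest from a few parameters. It is not IPRay's actual code, which the article does not show. The function name, the group name, and the container image are hypothetical, and the manifest assumes a recent KubeRay release (the ray.io/v1 API) on a GKE cluster with Spot L4 node pools. A real deployment would likely also need resource requests, tolerations, and storage settings.

```python
import json


def render_ray_cluster(name, image, max_workers, gpus_per_worker=1):
    """Return a KubeRay RayCluster manifest as a plain dict.

    Hypothetical helper for illustration; parameter names are assumptions.
    """
    if max_workers < 0:
        raise ValueError("max_workers must be non-negative")

    worker_container = {
        "name": "ray-worker",
        "image": image,
        # GPUs are requested through the standard NVIDIA extended resource.
        "resources": {"limits": {"nvidia.com/gpu": gpus_per_worker}},
    }
    return {
        "apiVersion": "ray.io/v1",  # older KubeRay releases used ray.io/v1alpha1
        "kind": "RayCluster",
        "metadata": {"name": name},
        "spec": {
            # Lets the Ray autoscaler add and remove worker pods.
            "enableInTreeAutoscaling": True,
            "headGroupSpec": {
                "rayStartParams": {"dashboard-host": "0.0.0.0"},
                "template": {
                    "spec": {"containers": [{"name": "ray-head", "image": image}]}
                },
            },
            "workerGroupSpecs": [
                {
                    "groupName": "gpu-spot",  # hypothetical group name
                    "replicas": 0,
                    "minReplicas": 0,  # scale to zero when idle
                    "maxReplicas": max_workers,
                    "rayStartParams": {},
                    "template": {
                        "spec": {
                            # GKE node labels for Spot capacity and L4 GPUs.
                            "nodeSelector": {
                                "cloud.google.com/gke-spot": "true",
                                "cloud.google.com/gke-accelerator": "nvidia-l4",
                            },
                            "containers": [worker_container],
                        }
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # The image below is a placeholder, not a real registry path.
    manifest = render_ray_cluster("research", "example.com/ml/trainer:latest", 8)
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be piped to kubectl apply -f -, since Kubernetes accepts JSON manifests as well as YAML. Keeping minReplicas at zero is what allows such a cluster to release its GPU nodes when no jobs are running.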

Technology paves the way for strong growth

Through this selection of Google Cloud and open-source frameworks, IPRally has shown that a startup can build an enterprise-grade ML platform without spending millions of dollars. Focusing on providing a powerful MLOps and automation foundation from its earliest days has paid dividends in efficiency and the team’s ability to focus on R&D.

“Crafting a flexible ML infrastructure from the best parts has been more than worth it,” shares Jari Rosti, an ML engineer at IPRally. “Now, we’re seeing the benefits of that investment multiply as we adapt the infrastructure to the constantly evolving ideas of modern ML. That’s something other young companies can achieve as well by leveraging Google Cloud and Ray.”

Further, the company has been saving 70% of ML R&D costs by using Spot instances. These affordable instances offer the same quality VMs as on-demand instances but are subject to interruption. Because IPRally’s R&D workloads are fault-tolerant, they are a good fit for Spot instances.
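What makes a workload fault-tolerant enough for Spot capacity is usually some form of checkpointing: the job records its progress often enough that an interruption costs only the work since the last save. The article does not describe how IPRally implements this, so the following is only a generic sketch using the Python standard library. The Preempted exception, the JSON checkpoint file, and the toy "training" step are all stand-ins chosen for illustration.

```python
import json
import os
import tempfile


class Preempted(Exception):
    """Stands in for a Spot VM being reclaimed in the middle of a job."""


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def save_checkpoint(path, state):
    """Write to a temp file, then rename, so a crash never leaves a half-written file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def run(path, num_steps, preempt_at=None):
    """Run (or resume) a job of num_steps steps, checkpointing after each step."""
    state = load_checkpoint(path)
    for step in range(state["step"], num_steps):
        if preempt_at is not None and step == preempt_at:
            raise Preempted(f"interrupted before step {step}")
        state["total"] += step  # placeholder for one unit of real work
        state["step"] = step + 1
        save_checkpoint(path, state)
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        ckpt = os.path.join(workdir, "job.json")
        try:
            run(ckpt, num_steps=5, preempt_at=3)  # simulated interruption
        except Preempted as exc:
            print("preempted:", exc)
        print("resumed:", run(ckpt, num_steps=5))  # picks up at step 3
```

In a real setup the checkpoint would live on storage that outlives the VM, such as a Cloud Storage bucket or a persistent disk, rather than in a temporary directory.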

IPRally closed a €10m Series A investment round last year, and it’s forging on with ingesting and processing IP documentation from around the globe, with a focus on improving its graph neural network models and building the best AI platform for patent searching. With 3.4 million patents filed in 2022, the third consecutive year of growth, data will keep flowing and IPRally can continue helping intellectual property professionals find every relevant bit of information.

“With Ray on GKE, we’ve built an ML foundation that is a testament to how powerful Google Cloud is with AI,” says Kallio. “And now, we’re prepared to explore far more advanced deep learning and to keep growing.”