ML 18930 ap classifier
Healthcare discovery on ecommerce domains presents unique challenges that traditional product search wasn’t designed to handle. Unlike searching for books or electronics, healthcare queries involve complex relationships between symptoms, conditions, treatments, and services, requiring sophisticated understanding of medical terminology and customer intent.
This challenge became particularly relevant for Amazon as we expanded beyond traditional ecommerce into comprehensive healthcare services. Amazon now offers direct access to prescription medications through Amazon Pharmacy, primary care through One Medical, and specialized care partnerships through Health Benefits Connector. These healthcare offerings represent a significant departure from traditional Amazon.com products, presenting both exciting opportunities and unique technical challenges.
In this post, we show you how Amazon Health Services (AHS) solved discoverability challenges on Amazon.com search using AWS services such as Amazon SageMaker, Amazon Bedrock, and Amazon EMR. By combining machine learning (ML), natural language processing, and vector search capabilities, we improved our ability to connect customers with relevant healthcare offerings. This solution is now used daily for health-related search queries, helping customers find everything from prescription medications to primary care services.
At AHS, we’re on a mission to transform how people access healthcare. We strive to make healthcare more straightforward for customers to find, choose, afford, and engage with the services, products, and professionals they need to get and stay healthy.
Integrating healthcare services into the ecommerce business of Amazon presented two unique opportunities to enhance search for customers on healthcare journeys: understanding health search intent in queries and matching up customer query intent with the most relevant healthcare products and services.
The challenge in understanding health search intent lies in the relationships between symptoms (such as back pain or sore throat), conditions (such as a herniated disc or the common cold), treatments (such as physical therapy or medication), and the healthcare services Amazon offers. This requires sophisticated query understanding capabilities that can parse medical terminology and map it to common search terminology that a layperson outside of the medical field might use to search.
AHS offerings also present unique challenges for search matching. For example, a customer searching for “back pain treatment” might be looking for a variety of solutions, from over-the-counter pain relievers like Tylenol or prescription medications such as cyclobenzaprine (a muscle relaxant), to scheduling a doctor’s appointment or accessing virtual physical therapy. Existing search algorithms optimized for physical products might not match these service-based health offerings, potentially missing relevant results such as One Medical’s primary care services or Hinge Health’s virtual physical therapy program that helps reduce joint and muscle pain through personalized exercises and 1-on-1 support from dedicated therapists. This unique nature of healthcare offerings called for developing specialized approaches to connect customers with relevant services.
To address these challenges, we developed a comprehensive solution that combines ML for query understanding, vector search for product matching, and large language models (LLMs) for relevance optimization. The solution consists of three main components:
The solution is built entirely on AWS services, with Amazon SageMaker powering our ML models, Amazon Bedrock providing LLM capabilities, and Amazon EMR and Amazon Athena handling our data processing needs.
Now let’s examine the technical implementation details of our architecture, exploring how each component was engineered to address the unique challenges of healthcare search on Amazon.com.
We approached the customer search journey by recognizing its two distinct ends of the spectrum. On one end are what we call “spearfishing queries” or lower funnel searches, where customers have a clear product search intent with specific knowledge about attributes. For Amazon Health Services, these typically include searches for specific prescription medications with precise dosages and form factors, such as “atorvastatin 40 mg” or “lisinopril 20 mg.”
On the other end are broad, upper funnel queries where customers seek inspiration, information, or recommendations with general product search intent that might encompass multiple product types. Examples include searches like “back pain relief,” “acne,” or “high blood pressure.” Building upon Amazon search capabilities, we developed additional query understanding models to serve the full spectrum of healthcare searches.
For identifying spearfishing search intent, we analyzed anonymized customer search engagement data for Amazon products and trained a classification model to understand which search keywords exclusively lead to engagement with Amazon Pharmacy Amazon Standard Identification Numbers (ASINs). This process used PySpark on Amazon EMR and Athena to collect and process Amazon search data at scale. The following diagram shows this architecture.
For identifying broad health search intent, we trained a named entity recognition (NER) model to annotate search keywords at a medical terminology level. To build this capability, we used a corpus of health ontology data sources to identify concepts such as health conditions, diseases, treatments, injuries, and medications. For health concepts where we did not have enough alternate terms in our knowledge base, we used LLMs to expand our knowledge base. For example, alternate terms for the condition “acid reflux” might be “heart burn”, “GERD”, “indigestion”, etc. We gated this NER model behind health-relevant product types predicted by Amazon search query-to-product-type models. The following diagram shows the training process for the NER model.
The following image is an example of a query identification task in practice. In the example on the left, the pharmacy classifier predicts that “atorvastatin 40 mg” is a query with intent for a prescription drug and triggers a custom search experience geared towards AHS products. In the example on the right, we detect the broad “high blood pressure” symptom but don’t know the customer’s intention. So, we trigger an experience that gives them multiple options to make the search more specific.
For those interested in implementing similar medical entity recognition capabilities, Amazon Comprehend Medical offers powerful tools for detecting medical entities in text spans.
With our ability to identify health-related searches in place, we needed to build comprehensive knowledge bases for our healthcare products and services. We started with our existing offerings and collected all available product knowledge information that best described each product or service.
To enhance this foundation, we used a large language model (LLM) with a fine-tuned prompt and few-shot examples to layer in additional relevant health conditions, symptoms, and treatment-related keywords for each product or service. We did this using the Amazon Bedrock batch inference capability. This approach meant that we significantly expanded our product knowledge with medically relevant information.
The entire knowledge base was then converted into embeddings using Facebook AI Similarity Search (FAISS), and we created an index file to enable efficient similarity searches. We maintained careful mappings from each embedding back to the original knowledge base items, making sure we could perform accurate reverse lookups when needed.
This process used several AWS services, including Amazon Simple Storage Service (Amazon S3) for storage of the knowledge base and the embeddings files. Note that Amazon OpenSearch Service is also a viable option for vector database capabilities. Large-scale knowledge base embedding jobs were executed with scheduled SageMaker Notebook Jobs. Through the combination of these technologies, we built a robust foundation of healthcare product knowledge that could be efficiently searched and matched to customer queries.
The following diagram illustrates how we built the product knowledge base using Amazon catalog data, and then used that to prepare a FAISS index file.
A core component of our solution was implementing the Retrieval Augmented Generation (RAG) design pattern. The first step in this pattern was to identify a set of known keywords and Amazon products, establishing the initial ground truth for our solution.
With our product knowledge base built from Amazon catalog metadata and ASIN attributes, we were ready to support new queries from customers. When a customer search query arrived, we converted it to an embedding and used it as a search key for matching against our index. This similarity search used FAISS with matching criteria based on the threshold against the similarity score.
To verify the quality of these query-product pairs identified for health search keywords, we needed to maintain the relevance of each pair. To achieve this, we implemented a two-pronged approach to relevance labeling. We used an established scheme to tag each offering as exact, substitute, complement, or irrelevant to the keyword. Referred to as the exact, substitute, complement, irrelevant (ESCI) framework established through academic research. For more information, refer to the ESCI challenge and esci-data GitHub repository.
First, we worked with a human labeling team to establish ground truth on a substantial sample size, creating a reliable benchmark for our system’s performance using this scheme. The labeling team was given guidance based on the ESCI framework and tailored towards AHS products and services.
Second, we implemented LLM-based labeling using Amazon Bedrock and batch jobs. After matches were found in the previous step, we retrieved the top products and used them as prompt context for our generative model. We included few-shot examples of ESCI guidance as part of the prompt. This way, we conducted large-scale inference across the top health searches, connecting them to the most relevant offerings using similarity search. We performed this at scale for the query-product pairs identified as relevant to AHS and stored the outputs in Amazon S3.
The following diagram shows our query retrieval, re-ranking and ESCI labeling pipeline.
Using a mix of high-confidence human and LLM-based labels, we established a true ground truth. Through this process, we successfully identified relevant product offerings for customers using only semantic data from aggregated search keywords and product metadata.
We’re on a mission to make it more straightforward for people to find, choose, afford, and engage with the services, products, and professionals they need to get and stay healthy. Today, customers searching for health solutions on Amazon—whether for acute conditions like acne, strep throat, and fever or chronic conditions such as arthritis, high blood pressure, and diabetes—will begin to see medically vetted and relevant offerings alongside other relevant products and services available on Amazon.com.
Customers can now quickly find and choose to meet with doctors, get their prescription medications, and access other healthcare services through a familiar experience. By extending the powerful ecommerce search capabilities of Amazon to address healthcare-specific opportunities, we’ve created additional discovery pathways for relevant health services.
We’ve used semantic understanding of health queries and comprehensive product knowledge to create connections that help customers find the right healthcare solutions at the right time.
Here is a little more information about three healthcare services you can use directly through Amazon:
As we reflect on our journey to enhance healthcare discovery on Amazon, several key insights stand out that might be valuable for others working on similar challenges:
By combining these approaches, we’ve created a more intuitive and effective way for customers to discover healthcare offerings on Amazon.
If you’re looking to implement a similar solution for healthcare or search, consider the following:
In this post, we demonstrated how Amazon Health Services used AWS ML and generative AI services to solve the unique challenges of healthcare discovery on Amazon.com, illustrating how you can build sophisticated domain-specific search experiences using Amazon SageMaker, Amazon Bedrock, and Amazon EMR. We showed how to create a query understanding pipeline to identify health-related searches, build comprehensive product knowledge bases enhanced with LLM capabilities, and implement semantic matching using vector search and the ESCI relevance framework to connect customers with relevant healthcare offerings.
This scalable, AWS based approach demonstrates how ML and generative AI can transform specialized search experiences, advancing our mission to make healthcare more straightforward for customers to find, choose, afford, and engage with. We encourage you to explore how these AWS services can address similar challenges in your own healthcare or specialized search applications. For more information about implementing healthcare solutions on AWS, visit the AWS for Healthcare & Life Sciences page.
Installed VibeVoice using the wrapper this dude created. https://www.reddit.com/r/comfyui/comments/1n20407/wip2_comfyui_wrapper_for_microsofts_new_vibevoice/ Workflow is the multi-voice example one…
Data merging is the process of combining data from different sources into a unified dataset.
When working with machine learning on structured data, two algorithms often rise to the top…
This post is co-written with Julieta Rappan, Macarena Blasi, and María Candela Blanco from the…
OpenAI's new speech model, gpt-realtime, hopes that its more naturalistic voices would make enterprises use…
We explored our latest investigations into how tech is shaping education today.