
Scientific Research with Palantir

Scientific Research with Palantir: A Review of Health Research in 2022

Editor’s note: At Palantir, we support our partners in health and life sciences by providing them with the technology they need to achieve their most critical missions. In recent years, our partners across hospital systems, life science companies, and federal agencies have expanded their use of Palantir software into the impactful domain of scientific research.

In this blog post, Dr. William Kassler, Chief Medical Officer of Palantir Federal, highlights how our partners across healthcare have used our software to advance scientific discovery through academic research and peer-reviewed publications.

2022 was a year of great progress in software-powered scientific research, in large part due to the infrastructure investments made to tackle the COVID-19 pandemic. In 2022 alone, researchers used Foundry to build a more comprehensive understanding of long COVID and pediatric COVID-19, to deliver faster insights into new and effective approaches to treating cancer, and to develop novel models for the equitable distribution of therapeutics across the U.S. population.

To date, thousands of researchers, across hundreds of studies, have used Palantir Foundry to make scientific discoveries that advance our collective knowledge of medicine, health, and life sciences. By pushing science forward, our partners are uncovering new insights that will better inform doctors, clinicians, and policymakers, and ultimately improve health outcomes for individuals, families, and communities alike.

Below, we unpack how advanced software can help researchers meet their data needs and present a curated selection of articles, published in 2022 by our partners across the health space, that employed Foundry. These publications are just one of many ways in which our partners have made meaningful progress toward improving real-world health outcomes using novel software solutions, and we are proud to showcase them here.

Data Needs and Software Solutions for Health Research

Whether running clinical trials, harnessing and integrating real-world data to create data enclaves, or just managing the operational logistics of research, scientists rely on software to make their day-to-day digital tasks easier and more secure.

Specifically, there are at least four distinct capabilities that can make a software platform effective in accelerating research:

  • Seamless data integration. Integrating and harmonizing large, diverse datasets, including varied types of clinical data from EHRs and clinical trials, genomic data, radiology and pathology images, and other commonly used research data formats.
  • Data accessibility. Ensuring that vast amounts of data are both accessible and organized. Researchers have traditionally spent a significant amount of time cleaning and managing their data; advanced software can reduce the time spent on this mundane work and let researchers focus on what they do best: research.
  • Secure collaboration. Empowering secure collaboration both within and across remote and in-person teams. Such team-oriented solutions can promote large-scale collaboration across previously siloed research organizations, while still governing and protecting sensitive data through robust permission and access controls.
  • Transparent data usage. Transparently documenting how data is used and manipulated, which at this scale of collaboration is essential for knowledge management and for making research reusable and replicable.

For the last few years, partners like the National Institutes of Health (NIH) and the Centers for Disease Control and Prevention (CDC) have used Palantir Foundry as a research tool to advance the scientific evidence base for medicine and public health, because it meets (and exceeds) these needs and helps researchers generate insights securely and at speed. In addition to the capabilities that help researchers use their data more effectively, Palantir software ensures that organizations have full control over, and insight into, their data. Palantir is neither a “data broker” nor a “data aggregator”: our software enables organizations to safely and securely govern their own data, and in every instance, our customers remain the sole decision-maker over how and where their data is used.

2022: Research Insights Across Public Health and Beyond

Scientific research takes time and care, and since the onset of the COVID-19 pandemic, our partners have worked with Palantir’s software to lay the groundwork for an important body of health research published in 2022. For example, in April 2020, drawing on our years of experience helping accelerate research at NIH, we helped the National Center for Advancing Translational Sciences (NCATS) rapidly stand up the National COVID Cohort Collaborative (N3C), where Palantir Foundry serves as the backbone of the N3C Data Enclave.

Today, we are proud that the N3C Enclave offers researchers secure access to Electronic Health Record (EHR) data on 17.6M individuals from 77 medical centers, a capability that has so far enabled thousands of researchers to engage in over 400 active research projects and has resulted in over 75 publications. Many other health-related studies based on research that used our platform have also been published, including by authors at the CDC, the National Cancer Institute (NCI), the Department of Veterans Affairs (VA), and the Administration for Strategic Preparedness and Response (ASPR).

Below, we highlight a sample of the most impactful peer-reviewed articles published in 2022 that used Palantir Foundry to help drive their research.

Health & Life Sciences

Identifying who has long COVID in the USA: a machine learning approach using N3C data

  • Publication: The Lancet Digital Health
  • Authors: Pfaff, E.R., *Girvin, A.T., Bennett, T.D., Bhatia, A., Brooks, I.M., Deer, R.R., Dekermanjian, J.P., Jolley, S.E., Kahn, M.G., Kostka, K., McMurry, J.A., Moffitt, R., Walden, A., Chute, C.G., Haendel, M., The N3C Consortium
  • Summary: Long COVID is a complex, poorly defined, and heterogeneous condition that frequently goes undiagnosed. This groundbreaking paper describes a machine learning model, built in the N3C Data Enclave, that is designed to identify patients likely to have long COVID. At a time when thousands are suffering from long COVID but very little is known about the condition, this approach makes it possible to identify and better support long COVID patients, as well as to accelerate recruitment for clinical trials.

Harmonizing units and values of quantitative data elements in a very large nationally pooled electronic health record (EHR) dataset

  • Publication: Journal of the American Medical Informatics Association
  • Authors: *Bradwell, K.R., Wooldridge, J.T., Amor, B., Bennett, T.D., Anand, A., Bremer, C., Yoo, Y.J., Qian, Z., Johnson, S.G., Pfaff, E.R., *Girvin, A.T., *Manna, A., *Niehaus, E.A., Hong, S.S., Zhang, X.T., Zhu, R.L., *Bissell, M., *Qureshi, N., Saltz, J., Haendel, M.A., Chute, C.G., Lehmann, H.P., Moffitt, R.A., The N3C Consortium
  • Summary: Using large-scale EHR data for research is inherently challenging. Just like the real life it reflects, EHR data is messy: it is modeled differently from system to system, and there is no easy way to extract relevant, harmonized information. In this publication, the authors describe their foundational approach to making quantitative EHR measurement data, which would otherwise have been difficult to use, more usable for research. The method originated in the N3C Data Enclave but can be applied far more broadly.

Characteristics, Outcomes, and Severity Risk Factors Associated With SARS-CoV-2 Infection Among Children in the US National COVID Cohort Collaborative

  • Publication: JAMA Network Open
  • Authors: Martin, B., DeWitt, P.E., Russell, S., Anand, A., *Bradwell, K.R., Bremer, C., Gabriel, D., *Girvin, A.T., Hajagos, J.G., McMurry, J.A., Neumann, A.J., Pfaff, E.R., Walden, A., Wooldridge, J.T., Yoo, Y.J., Saltz, J., Gersing, K.R., Chute, C.G., Haendel, M.A., Moffitt, R., Bennett, T.D.
  • Summary: In the early days of the COVID-19 pandemic, many children experienced a significant burden of illness, despite early predictions that they would have mild cases. Using data from the N3C Data Enclave, this important study identified the risk factors that predisposed children to severe COVID-19.

Association of Radiation Therapy With Risk of Adverse Events in Patients Receiving Immunotherapy: A Pooled Analysis of Trials in the US Food and Drug Administration Database

  • Publication: JAMA Oncology
  • Authors: Anscher, M.S., Arora, S., Weinstock, C., Amatya, A., *Bandaru, P., Tang, C., *Girvin, A.T., Fiero, M.H., Tang, S., *Lubitz, R., Amiri-Kordestani, L., Theoret, M.R., Pazdur, R., Beaver, J.A.
  • Summary: These researchers brought together large-scale, patient-level data from clinical trials in the FDA’s Palantir Foundry instance to examine whether cancer patients were at higher risk of serious adverse events if they received two common cancer therapies in close succession. This retrospective analysis suggests that receiving both radiation therapy and an immune checkpoint inhibitor within 90 days of each other was not associated with an increased risk of adverse events.

Cancer genes disfavoring T cell immunity identified via integrated systems approach

  • Publication: Cell Reports
  • Authors: Kishton, R.J., Patel, S.J., Decker, A.E., Vodnala, S.K., Cam, M., Yamamoto, T.N., Patel, Y., Sukumar, M., Yu, Z., Ji, M., Henning, A.N., Gurusamy, D., Palmer, D.C., *Stefanescu, R., *Girvin, A.T., Lo, W., Pasetto, A., Malekzadeh, P., Deniger, D.C., Wood, K.C., Sanjana, N.E., Restifo, N.P.
  • Summary: In this publication, researchers worked to identify genes that reduce the efficacy of adoptive T cell therapies for cancer in order to pave the way for improved immunotherapy treatments. Using the NCI instance of Foundry, they conducted a bioinformatic analysis of public and private cancer genomic data.

Public Health

Dispensing of Oral Antiviral Drugs for Treatment of COVID-19 by Zip Code–Level Social Vulnerability — United States, December 23, 2021–May 21, 2022

  • Publication: Morbidity and Mortality Weekly Report
  • Authors: Gold, J.A.W., *Kelleher, J., *Magid, J., Jackson, B.R., Pennini, M.E., Kushner, D., Weston, E.J., Rasulnia, B., Kuwabara, S., Bennett, K., Mahon, B.E., Patel, A., Auerbach, J.
  • Summary: With the advent of oral antiviral drugs for COVID-19, Americans gained a new line of defense against the disease. However, the availability of these drugs alone does not guarantee their widespread use. Using data in a CDC instance of Foundry, this publication examines antiviral dispensing across U.S. ZIP codes by level of social vulnerability to identify a new equity-oriented strategy for antiviral access and use.

Improving efficiency of COVID-19 aggregate case and death surveillance data transmission for jurisdictions: current and future role of application programming interfaces (APIs)

  • Publication: Journal of the American Medical Informatics Association
  • Authors: Khan, D., Park, M., Lerma, S., Soroka, S., Gaughan, D., Bottichio, L., Bray, M., Fukushima, M., Bregman, B., Wiedeman, C., Duck, W., Dee, D., Gundlapalli, A., Suthar, A.
  • Summary: To gather daily COVID-19 data and understand the state of the pandemic, the CDC relied on a mix of individual case and death reports and aggregate case and death surveillance data. Over time, the jurisdictions providing aggregate data to the CDC often needed to update previously submitted data retroactively. The CDC initially handled these updates through a manual process but found APIs to be a much more efficient, automated way to receive timely data updates.

COVID-19 Vaccine Provider Access and Vaccination Coverage Among Children Aged 5–11 Years — United States, November 2021–January 2022

  • Publication: Morbidity and Mortality Weekly Report
  • Authors: Kim, C., Yee, R., Bhatkoti, R., Carranza, D., Henderson, D., Kuwabara, S., Trinidad, J.P., Radesky, S., Cohen, A., Vogt, T.M., Smith, Z., Duggar, C., Chatham-Stephens, K., Ottis, C., Rand, K., Lim, T., Jackson, A.F., Richardson, D., *Jaffe, A., *Lubitz, R., *Hayes, R., Zouela, A., Kotulich, D., Kelleher, P.N., Guo, A., Pillai, S.K., Patel, A.
  • Summary: Ensuring a safe return to school for children and their families in 2021 required a vaccination strategy that combined access and outreach, especially for the most vulnerable. This paper combined data on vaccinations and COVID-19 rates with the Social Vulnerability Index (SVI) in an HHS instance of Foundry to assess the success of the pediatric vaccination campaign and to highlight critical gaps in vaccination coverage.

What’s Next for Health Research at Palantir

Many organizations use their data and the Palantir platform to optimize operational decision-making. Yet, as shown above, a number of our health partners also use their data in research to uncover new scientific insights that can be transformative for the health of individuals, communities, and populations at large. We’re proud of the potential impact of these insights and are excited to continue this important research work with our partners.

At its core, Foundry makes accessible data that was previously out of researchers’ reach, whether because of its scale, type, or location, unlocking the possibility of countless new scientific discoveries. In 2023 and beyond, we remain committed to enabling new discoveries for the benefit of individual, community, and public health.

To learn more about our continuing work across health and life sciences, visit https://www.palantir.com/offerings/federal-health/.

*Authors affiliated with Palantir Technologies.


Scientific Research with Palantir was originally published in Palantir Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.
