Health & Life Sciences Research with Palantir: 2023 in Review

Health Research + Technology: A Turning Point

Palantir Foundry has long been instrumental in accelerating the research findings of our health and life science partners, helping achieve unprecedented insights, streamline data access, improve data usability, and facilitate advanced visualization and analysis of data sources — all while protecting the privacy and security of the backing data.

In 2023, Foundry supported over 50 peer-reviewed publications in esteemed journals, covering a diverse range of subjects, from hospital operations to oncological drugs to learning modalities. The year prior, our software supported a record number of peer-reviewed publications, which we highlighted in a prior blog post.

Our partners’ foundational investments in technical infrastructure during the peak of the COVID-19 pandemic have made this impressive quantity of publications possible.

Public and commercial health care partners have proactively scaled their investments in data sharing and research software beyond COVID response to build a more comprehensive data foundation for biomedical research. For example, the N3C Enclave — which houses the data of 21.5M patients from across almost 100 institutions — is being used daily by thousands of researchers across agencies and organizations. Given the complexity of accessing, organizing, and harnessing ever-expanding biomedical data, the demand for similar research resources continues to rise.

In this blog post, we take a closer look at some noteworthy publications from 2023 and examine what lies ahead for software-backed research.

Emerging Technology and the Acceleration of Scientific Research

New technologies are accelerating scientific output at a scale that was previously impossible. Advanced software is helping create more precise, organized, and accessible data assets, which in turn allow researchers to tackle increasingly complex scientific challenges. In particular, as a modular, interoperable, and flexible platform, Foundry has been used to support a diverse range of scientific studies with unique research functions, including AI-assisted therapeutics identification, real-world evidence generation, and more.

In 2023, the industry also saw rapid growth in interest around using Artificial Intelligence (AI), and in particular generative AI and large language models (LLMs), in the health and life science domains. Combined with other core technical advancements (e.g., around data quality and usability), AI-enabled software has more potential than ever to accelerate scientific research. As a commercial leader in AI-enabled software, Palantir has been at the forefront of finding responsible, secure, and effective ways to apply AI-enabled capabilities to support our partners across industries in achieving their most important missions.

Over the past year, Palantir software helped drive key components of our partners’ research, and we stand ready to continue working with our partners in government, industry, and civil society to tackle the most pressing challenges in health and science ahead. In the next section, we provide concrete examples of how software can help advance scientific research, highlighting some key biomedical publications powered by Foundry in 2023.

2023 Publications Powered by Palantir Foundry

In addition to a number of important cancer and COVID treatment studies, Palantir Foundry also enabled new findings in the broader field of research methodology. Below, we highlight a sample of the most impactful peer-reviewed articles published in 2023 that used Palantir Foundry to help drive their research.

Identifying new effective drug combinations for multiple myeloma

Drug combinations identified by high-throughput screening promote cell cycle transition and upregulate Smad pathways in myeloma

  • Publication: Cancer Letters
  • Authors: Peat, T.J., Gaikwad, S.M., Dubois, W., Gyabaah-Kessie, N., Zhang, S., Gorjifard, S., Phyo, Z., Andres, M., Hughitt, V.K., Simpson, R.M., Miller, M.A., Girvin, A.T., Taylor, A., Williams, D., D’Antonio, N., Zhang, Y., Rajagopalan, A., Flietner, E., Wilson, K., Zhang, X., Shinn, P., Klumpp-Thomas, C., McKnight, C., Itkin, Z., Chen, L., Kazandijian, D., Zhang, J., Michalowski, A.M., Simmons, J.K., Keats, J., Thomas, C.J., Mock, B.A.
  • Summary: Multiple myeloma (MM) is frequently resistant to drug treatment, requiring continued exploration to identify new, effective therapeutic combinations. In this study, researchers used high-throughput drug screening to identify over 1,900 compounds with activity against at least 25 of the 47 MM cell lines tested. From these compounds, 3.61 million combinations were evaluated in silico, and pairs of compounds with highly correlated activity across the 47 cell lines but different mechanisms of action were selected for further analysis. Specifically, six drug combinations were effective at 1) reducing over-expression of a key protein (MYC) that is often linked to the production of malignant cells and 2) increasing expression of the p16 protein, which can help the body suppress tumor growth. Furthermore, three of the identified drug combinations increased chances of survival and decreased the growth of cancer cells, in part by reducing activity of pathways involved in TGFβ/SMAD signaling, which regulate the cell life cycle. These preclinical findings identify potentially useful novel drug combinations for difficult-to-treat multiple myeloma.
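The pair-selection step described above (correlated activity across cell lines, different mechanisms of action) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the compound names, activity scores, mechanism labels, and the 0.8 correlation threshold are all hypothetical. As an aside, 1,900 × 1,900 is 3.61 million, which is consistent with the reported number of combinations if every ordered pairing was counted; the sketch below enumerates unordered pairs instead.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat activity profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def candidate_pairs(activity, mechanism, min_corr=0.8):
    """Return (compound_a, compound_b, r) for pairs whose activity is
    correlated across cell lines but whose mechanisms of action differ."""
    selected = []
    for a, b in combinations(sorted(activity), 2):
        if mechanism[a] == mechanism[b]:
            continue  # same mechanism: skip, we want complementary drugs
        r = pearson(activity[a], activity[b])
        if r >= min_corr:
            selected.append((a, b, round(r, 3)))
    return selected


# Hypothetical toy data: activity of 4 compounds across 5 cell lines.
activity = {
    "cmpd_A": [0.9, 0.8, 0.2, 0.1, 0.7],
    "cmpd_B": [0.8, 0.9, 0.1, 0.2, 0.6],
    "cmpd_C": [0.1, 0.2, 0.9, 0.8, 0.3],
    "cmpd_D": [0.85, 0.75, 0.25, 0.15, 0.65],
}
# Hypothetical mechanism-of-action labels.
mechanism = {
    "cmpd_A": "proteasome inhibitor",
    "cmpd_B": "HDAC inhibitor",
    "cmpd_C": "kinase inhibitor",
    "cmpd_D": "proteasome inhibitor",
}

pairs = candidate_pairs(activity, mechanism)
for a, b, r in pairs:
    print(f"{a} + {b}: r = {r}")
```

In this toy data, cmpd_A and cmpd_D track each other closely but share a mechanism, so they are skipped; only pairs that combine similar activity profiles with different mechanisms survive the filter.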

New rank-based protein classification method to improve glioblastoma treatment

RadWise: A Rank-Based Hybrid Feature Weighting and Selection Method for Proteomic Categorization of Chemoirradiation in Patients with Glioblastoma

  • Publication: Cancers
  • Authors: Tasci, E., Jagasia, S., Zhuge, Y., Sproull, M., Cooley Zgela, T., Mackey, M., Camphausen, K., Krauze, A.V.
  • Summary: Glioblastomas, the most common type of cancerous brain tumors, vary greatly, limiting the ability to analyze the biological factors that drive whether glioblastomas will respond to treatment. However, data analysis of the proteome (the entire set of proteins that can be expressed by the tumor) can 1) offer non-invasive methods of classifying glioblastomas to help inform treatment and 2) identify protein biomarkers associated with interventions to evaluate response to therapy. In this study, researchers developed and tested a novel rank-based weighting method (“RadWise”) for protein features to help ML algorithms focus on the most relevant factors that indicate post-therapy outcomes. RadWise offers a more effective pathway to identify the proteins and features that can be key targets for treatment of these aggressive, fatal tumors.
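To make the idea of rank-based feature weighting concrete, here is a generic sketch in Python. It is not the RadWise method itself, whose details are in the paper; it only illustrates the general pattern of ranking features by a relevance score and scaling them so that a downstream model emphasizes the top-ranked ones. The protein names, scores, and the linear weighting scheme are assumptions made for illustration.

```python
def rank_weights(scores):
    """Map feature relevance scores to weights in (0, 1] by rank.

    The top-scoring feature gets weight 1.0, the next (n - 1) / n, and so on.
    Ties are broken by feature name so the result is deterministic.
    """
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    n = len(ordered)
    return {name: (n - position) / n for position, name in enumerate(ordered)}


def apply_weights(sample, weights):
    """Scale each feature value in one sample by its rank-based weight."""
    return {name: value * weights[name] for name, value in sample.items()}


# Hypothetical relevance scores for four protein features.
scores = {"protein_1": 0.10, "protein_2": 0.75, "protein_3": 0.40, "protein_4": 0.05}
weights = rank_weights(scores)

# Hypothetical measurements for one patient sample.
sample = {"protein_1": 2.0, "protein_2": 4.0, "protein_3": 8.0, "protein_4": 1.0}
weighted_sample = apply_weights(sample, weights)
print(weights)
print(weighted_sample)
```

Using ranks rather than raw scores makes the weights insensitive to the scale of the relevance measure, which is one common reason to prefer a rank-based scheme.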

Identifying liver cancer subtypes likely to respond to immunotherapy

Tumor biology and immune infiltration define primary liver cancer subsets linked to overall survival after immunotherapy

  • Publication: Cell Reports Medicine
  • Authors: Budhu, A., Pehrsson, E.C., He, A., Goyal, L., Kelley, R.K., Dang, H., Xie, C., Monge, C., Tandon, M., Ma, L., Revsine, M., Kuhlman, L., Zhang, K., Baiev, I., Lamm, R., Patel, K., Kleiner, D.E., Hewitt, S.M., Tran, B., Shetty, J., Wu, X., Zhao, Y., Shen, T.W., Choudhari, S., Kriga, Y., Ylaya, K., Warner, A.C., Edmondson, E.F., Forgues, M., Greten, T.F., Wang, X.W.
  • Summary: Liver cancer is a rising cause of cancer deaths in the US. This study investigated variation in patient outcomes for a type of immunotherapy using immune checkpoint inhibitors. Researchers noted that certain molecular subtypes of cancer, defined by 1) the aggressiveness of the cancer and 2) the microenvironment of the cancer cells, were linked to higher survival rates with immune checkpoint inhibitor therapy. Identifying these molecular subtypes can help doctors determine whether a patient’s unique cancer is likely to respond to this type of intervention, allowing more targeted use of immunotherapy and improving the likelihood of success.

Applying algorithms to EHR data to infer pregnancy timing for more accurate maternal health research

Who is pregnant? defining real-world data-based pregnancy episodes in the National COVID Cohort Collaborative (N3C)

  • Publication: JAMIA, Women’s Health Special Edition
  • Authors: Jones, S., Bradwell, K.R.*, Chan, L.E., McMurry, J.A., Olson-Chen, C., Tarleton, J., Wilkins, K.J., Qin, Q., Faherty, E.G., Lau, Y.K., Xie, C., Kao, Y.H., Liebman, M.N., Ljazouli, S.*, Mariona, F., Challa, A., Li, L., Ratcliffe, S.J., Haendel, M.A., Patel, R.C., Hill, E.L.
  • Summary: There are indications that COVID-19 can cause pregnancy complications, and pregnant persons appear to be at higher risk for more severe COVID-19 infection. Analysis of electronic health record (EHR) data can help provide more insight, but due to data inconsistencies, it is often difficult to ascertain 1) pregnancy start and end dates and 2) gestational age of the baby at birth. To help, researchers adapted an existing algorithm for determining gestational age and pregnancy length that relies on diagnostic codes and delivery dates. To increase the accuracy of this algorithm, the researchers layered on their own data-driven algorithms to precisely infer pregnancy start, pregnancy end, and landmark time frames throughout a pregnancy’s progression while also addressing EHR data inconsistency. This method can be reliably used to make the foundational inference of pregnancy timing and can be applied to future pregnancy and maternity research on topics such as adverse pregnancy outcomes and maternal mortality.
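The core date arithmetic behind inferring a pregnancy start can be illustrated with a short Python sketch. This is only the simplest building block, not the published algorithm: the function name, the 40-week fallback for records without a gestational age, and the 44-week plausibility bound are assumptions chosen for illustration.

```python
from datetime import date, timedelta

# Assumption for illustration: when no gestational age is recorded,
# fall back to a typical full-term length of 40 weeks.
DEFAULT_TERM_WEEKS = 40

# Assumption for illustration: reject gestational ages beyond this bound
# as likely data-entry errors.
MAX_PLAUSIBLE_WEEKS = 44


def infer_pregnancy_start(end_date, gestational_age_weeks=None):
    """Estimate the pregnancy start date from the end date and gestational age."""
    if gestational_age_weeks is None:
        weeks = DEFAULT_TERM_WEEKS
    else:
        weeks = gestational_age_weeks
    if not 0 < weeks <= MAX_PLAUSIBLE_WEEKS:
        raise ValueError(f"implausible gestational age: {weeks} weeks")
    return end_date - timedelta(weeks=weeks)


# A delivery on 2021-10-01 at 38 weeks implies a start 266 days earlier.
start_known = infer_pregnancy_start(date(2021, 10, 1), gestational_age_weeks=38)
# With no recorded gestational age, the 40-week fallback is used.
start_default = infer_pregnancy_start(date(2021, 10, 1))
print(start_known, start_default)
```

Real EHR data rarely supplies clean inputs like these, which is why the study layers additional data-driven inference on top of this kind of calculation.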

A novel method for resolving EHR data quality issues for clinical encounters

Clinical encounter heterogeneity and methods for resolving in networked EHR data: a study from N3C and RECOVER programs

  • Publication: JAMIA
  • Authors: Leese, P., Anand, A., Girvin, A.*, Manna, A.*, Patel, S., Yoo, Y.J., Wong, R., Haendel, M., Chute, C.G., Bennett, T., Hajagos, J., Pfaff, E., Moffitt, R.
  • Summary: Clinical encounter data can be a rich resource for research, but it often varies greatly across providers, facilities, and institutions, making it difficult to analyze uniformly. This inconsistency is magnified when multisite electronic health record (EHR) data is networked together in a central database. In this study, researchers developed a novel, generalizable method for resolving clinical encounter data for analysis by combining related encounters into composite “macrovisits.” This methodology resolves EHR encounter data issues in a repeatable way, allowing researchers to more easily unlock the potential of this rich data for large-scale studies.
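The idea of combining related encounters into a composite visit can be sketched as an interval merge in Python. This is a simplified illustration under stated assumptions, not the published macrovisit algorithm: here, encounters for the same patient are grouped purely by overlapping or touching date ranges, and the function name, tuple layout, and `max_gap_days` parameter are hypothetical.

```python
from datetime import date
from itertools import groupby


def build_macrovisits(encounters, max_gap_days=0):
    """Group each patient's encounters into composite spans.

    encounters: iterable of (patient_id, start_date, end_date) tuples.
    An encounter joins the current span when it starts no more than
    max_gap_days after that span ends.
    Returns a list of (patient_id, span_start, span_end, n_encounters).
    """
    ordered = sorted(encounters, key=lambda e: (e[0], e[1], e[2]))
    macrovisits = []
    for patient_id, group in groupby(ordered, key=lambda e: e[0]):
        span_start = span_end = None
        count = 0
        for _, start, end in group:
            if span_start is not None and (start - span_end).days <= max_gap_days:
                # Overlapping or adjacent encounter: extend the current span.
                span_end = max(span_end, end)
                count += 1
            else:
                if span_start is not None:
                    macrovisits.append((patient_id, span_start, span_end, count))
                span_start, span_end, count = start, end, 1
        macrovisits.append((patient_id, span_start, span_end, count))
    return macrovisits


# Hypothetical encounter records: (patient_id, start_date, end_date).
encounters = [
    ("p1", date(2021, 1, 1), date(2021, 1, 5)),    # inpatient stay
    ("p1", date(2021, 1, 3), date(2021, 1, 4)),    # lab visit during the stay
    ("p1", date(2021, 1, 5), date(2021, 1, 8)),    # transfer on discharge day
    ("p1", date(2021, 1, 20), date(2021, 1, 20)),  # unrelated follow-up
    ("p2", date(2021, 2, 1), date(2021, 2, 2)),
]

macrovisits = build_macrovisits(encounters)
for visit in macrovisits:
    print(visit)
```

With the toy data, the three January encounters for patient p1 collapse into one span from January 1 to January 8, while the later follow-up and the other patient's visit remain separate.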

Improving transparency in phenotyping for Long COVID research and beyond

De-black-boxing health AI: demonstrating reproducible machine learning computable phenotypes using the N3C-RECOVER Long COVID model in the All of Us data repository

  • Publication: Journal of the American Medical Informatics Association
  • Authors: Pfaff, E.R., Girvin, A.T.*, Crosskey, M., Gangireddy, S., Master, H., Wei, W.Q., Kerchberger, V.E., Weiner, M., Harris, P.A., Basford, M., Lunt, C., Chute, C.G., Moffitt, R.A., Haendel, M.; N3C and RECOVER Consortia
  • Summary: Phenotyping, the process of evaluating and categorizing an organism’s characteristics, can help scientists better understand the differences between individuals and groups of individuals and identify specific traits that may be linked to certain diseases or conditions. Machine learning (ML) can help derive phenotypes from data, but these are challenging to share and reproduce due to their complexity. Researchers in this study devised and trained an ML-based phenotype to identify patients highly likely to have Long COVID, an increasingly urgent public health consideration, and showed that the method can be applied in other environments. This is a success story of how transparent technology and collaboration can make phenotyping algorithms more accessible to a broad audience of researchers in informatics, reducing duplicated work and providing them with a tool to reach insights faster, including for other diseases.

Navigating challenges for multisite real-world data (RWD) databases

Data quality considerations for evaluating COVID-19 treatments using real world data: learnings from the National COVID Cohort Collaborative (N3C)

  • Publication: BMC Medical Research Methodology
  • Authors: Sidky, H., Young, J.C., Girvin, A.T.*, Lee, E., Shao, Y.R., Hotaling, N., Michael, S., Wilkins, K.J., Setoguchi, S., Funk, M.J.; N3C Consortium
  • Summary: Working with large-scale, centralized EHR databases such as N3C for research requires specialized knowledge and careful evaluation of data quality and completeness. This study examines the process of assessing data quality in preparation for research, focusing on drug efficacy studies. Researchers identified several methods and best practices to better characterize important study elements, including exposure to treatment, baseline health comorbidities, and key outcomes of interest. As large-scale, centralized real-world databases become more prevalent, this is a useful step toward helping researchers navigate their unique data challenges more effectively while unlocking crucial applications for drug development.

What’s Next for Health Research at Palantir

While 2023 saw important progress, the new year brings with it new possibilities, as well as an urgency to apply the latest technical advancements to the most important health issues facing individuals, communities, and the public at large. For example, in 2023, the U.S. Government reaffirmed its commitment to combating systemic diseases such as cancer, and even launched a new health agency, the Advanced Research Projects Agency for Health (ARPA-H).

Furthermore, in 2024, Palantir is proud to be an industry partner in the innovative National AI Research Resource (NAIRR) pilot program, created under the auspices of the National Science Foundation (NSF) and with funding from the National Institutes of Health (NIH). As part of the NAIRR pilot, whose launch was directed by the Biden Administration’s Executive Order on Artificial Intelligence, Palantir will be working with its long-time partners at the NIH and N3C to support research in advancing safe, secure, and trustworthy AI, as well as the application of AI to challenges in healthcare.

In 2024, we’re excited to work with partners, new and old, on issues of critical importance, applying our learnings on data, tools, and research to help enable meaningful improvements in health outcomes for all.

To learn more about our continuing work across health and life sciences, visit https://www.palantir.com/offerings/federal-health/.

*Authors affiliated with Palantir Technologies


Health & Life Sciences Research with Palantir: 2023 in Review was originally published in Palantir Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.