Responsible AI at Google Research: Context in AI Research (CAIR)

Artificial intelligence (AI) and related machine learning (ML) technologies are increasingly influential in the world around us, making it imperative that we consider the potential impacts on society and individuals in all aspects of the technology that we create. To these ends, the Context in AI Research (CAIR) team develops novel AI methods in the context of the entire AI pipeline: from data to end-user feedback. The pipeline for building an AI system typically starts with data collection, followed by designing a model to run on that data, deployment of the model in the real world, and lastly, compiling and incorporation of human feedback. Originating in the health space, and now expanded to additional areas, the work of the CAIR team impacts every aspect of this pipeline. While specializing in model building, we have a particular focus on building systems with responsibility in mind, including fairness, robustness, transparency, and inclusion.


The CAIR team focuses on understanding the data on which ML systems are built. Improving the standards for the transparency of ML datasets is instrumental in our work. First, we employ documentation frameworks to elucidate dataset and model characteristics as guidance in the development of data and model documentation techniques — Datasheets for Datasets and Model Cards for Model Reporting.

For example, health datasets are highly sensitive and yet can have high impact. For this reason, we developed Healthsheets, a health-contextualized adaptation of a Datasheet. Our motivation for developing a health-specific sheet lies in the limitations of existing regulatory frameworks for AI and health. Recent research suggests that data privacy regulation and standards (e.g., HIPAA, GDPR, California Consumer Privacy Act) do not ensure ethical collection, documentation, and use of data. Healthsheets aim to fill this gap in ethical dataset analysis. The development of Healthsheets was done in collaboration with many stakeholders in relevant job roles, including clinical, legal and regulatory, bioethics, privacy, and product.

Further, we studied how Datasheets and Healthsheets could serve as diagnostic tools that surface the limitations and strengths of datasets. Our aim was to start a conversation in the community and tailor Healthsheets to dynamic healthcare scenarios over time.

To facilitate this effort, we joined the STANDING Together initiative, a consortium that aims to develop international, consensus-based standards for documentation of diversity and representation within health datasets and to provide guidance on how to mitigate risk of bias translating to harm and health inequalities. Being part of this international, interdisciplinary partnership that spans academic, clinical, regulatory, policy, industry, patient, and charitable organizations worldwide enables us to engage in the conversation about responsibility in AI for healthcare internationally. Over 250 stakeholders from across 32 countries have contributed to refining the standards.

Healthsheets and STANDING Together: towards health data documentation and standards.


When ML systems are deployed in the real world, they may fail to behave in expected ways, making poor predictions in new contexts. Such failures can occur for a myriad of reasons and can carry negative consequences, especially within the context of healthcare. Our work aims to identify situations where unexpected model behavior may be discovered, before it becomes a substantial problem, and to mitigate the unexpected and undesired consequences.

Much of the CAIR team’s modeling work focuses on identifying and mitigating when models are underspecified. We show that models that perform well on held-out data drawn from a training domain are not equally robust or fair under distribution shift because the models vary in the extent to which they rely on spurious correlations. This poses a risk to users and practitioners because it can be difficult to anticipate model instability using standard model evaluation practices. We have demonstrated that this concern arises in several domains, including computer vision, natural language processing, medical imaging, and prediction from electronic health records.

We have also shown how to use knowledge of causal mechanisms to diagnose and mitigate fairness and robustness issues in new contexts. Knowledge of causal structure allows practitioners to anticipate the generalizability of fairness properties under distribution shift in real-world medical settings. Further, investigating the capability for specific causal pathways, or “shortcuts”, to introduce bias in ML systems, we demonstrate how to identify cases where shortcut learning leads to predictions in ML systems that are unintentionally dependent on sensitive attributes (e.g., age, sex, race). We have shown how to use causal directed acyclic graphs to adapt ML systems to changing environments under complex forms of distribution shift. Our team is currently investigating how a causal interpretation of different forms of bias, including selection bias, label bias, and measurement error, motivates the design of techniques to mitigate bias during model development and evaluation.

Shortcut Learning: For some models, age may act as a shortcut in classification when using medical images.

The CAIR team focuses on developing methodology to build more inclusive models broadly. For example, we also have work on the design of participatory systems, which allows individuals to choose whether to disclose sensitive attributes, such as race, when an ML system makes predictions. We hope that our methodological research positively impacts the societal understanding of inclusivity in AI method development.


The CAIR team aims to build technology that improves the lives of all people through the use of mobile device technology. We aim to reduce suffering from health conditions, address systemic inequality, and enable transparent device-based data collection. As consumer technology, such as fitness trackers and mobile phones, become central in data collection for health, we explored the use of these technologies within the context of chronic disease, in particular, for multiple sclerosis (MS). We developed new data collection mechanisms and predictions that we hope will eventually revolutionize patient’s chronic disease management, clinical trials, medical reversals and drug development.

First, we extended the open-source FDA MyStudies platform, which is used to create clinical study apps, to make it easier for anyone to run their own studies and collect good quality data, in a trusted and safe way. Our improvements include zero-config setups, so that researchers can prototype their study in a day, cross-platform app generation through the use of Flutter and, most importantly, an emphasis on accessibility so that all patient’s voices are heard. We are excited to announce this work has now been open sourced as an extension to the original FDA-Mystudies platform. You can start setting up your own studies today!

To test this platform, we built a prototype app, which we call MS Signals, that uses surveys to interface with patients in a novel consumer setting. We collaborated with the National MS Society to recruit participants for a user experience study for the app, with the goal of reducing dropout rates and improving the platform further.

MS Signals app screenshots. Left: Study welcome screen. Right: Questionnaire.

Once data is collected, researchers could potentially use it to drive the frontier of ML research in MS. In a separate study, we established a research collaboration with the Duke Department of Neurology and demonstrated that ML models can accurately predict the incidence of high-severity symptoms within three months using continuously collected data from mobile apps. Results suggest that the trained models can be used by clinicians to evaluate the symptom trajectory of MS participants, which may inform decision making for administering interventions.

The CAIR team has been involved in the deployment of many other systems, for both internal and external use. For example, we have also partnered with Learning Ally to build a book recommendation system for children with learning disabilities, such as dyslexia. We hope that our work positively impacts future product development.

Human feedback

As ML models become ubiquitous throughout the developed world, it can be far too easy to leave voices in less developed countries behind. A priority of the CAIR team is to bridge this gap, develop deep relationships with communities, and work together to address ML-related concerns through community-driven approaches.

One of the ways we are doing this is through working with grassroots organizations for ML, such as Sisonkebiotik, an open and inclusive community of researchers, practitioners and enthusiasts at the intersection of ML and healthcare working together to build capacity and drive forward research initiatives in Africa. We worked in collaboration with the Sisonkebiotik community to detail limitations of historical top-down approaches for global health, and suggested complementary health-based methods, specifically those of grassroots participatory communities (GPCs). We jointly created a framework for ML and global health, laying out a practical roadmap towards setting up, growing and maintaining GPCs, based on common values across various GPCs such as Masakhane, Sisonkebiotik and Ro’ya.

We are engaging with open initiatives to better understand the role, perceptions and use cases of AI for health in non-western countries through human feedback, with an initial focus in Africa. Together with Ghana NLP, we have worked to detail the need to better understand algorithmic fairness and bias in health in non-western contexts. We recently launched a study to expand on this work using human feedback.

Biases along the ML pipeline and their associations with African-contextualized axes of disparities.

The CAIR team is committed to creating opportunities to hear more perspectives in AI development. We partnered with Sisonkebiotik to co-organize the Data Science for Health Workshop at Deep Learning Indaba 2023 in Ghana. Everyone’s voice is crucial to developing a better future using AI technology.


We would like to thank Negar Rostamzadeh, Stephen Pfohl, Subhrajit Roy, Diana Mincu, Chintan Ghate, Mercy Asiedu, Emily Salkey, Alexander D’Amour, Jessica Schrouff, Chirag Nagpal, Eltayeb Ahmed, Lev Proleev, Natalie Harris, Mohammad Havaei, Ben Hutchinson, Andrew Smart, Awa Dieng, Mahima Pushkarna, Sanmi Koyejo, Kerrie Kauer, Do Hee Park, Lee Hartsell, Jennifer Graves, Berk Ustun, Hailey Joren, Timnit Gebru and Margaret Mitchell for their contributions and influence, as well as our many friends and collaborators at Learning Ally, National MS Society, Duke University Hospital, STANDING Together, Sisonkebiotik, and Masakhane.