The Efficacy and Ethics of AI Must Move Beyond the Performative to the Operational

Editor’s Note: Written by Courtney Bowman, Palantir’s Global Director of Privacy and Civil Liberties Engineering, this blog highlights our belief in the need for an integrated and operationally oriented approach to artificial intelligence, one that acknowledges its limitations while placing ethics and efficacy at the heart of its use. You can read more about our principled approach to AI ethics here.

AI Beyond the Hype

Software vendors pitching AI solutions that don’t work and AI ethicists focused on distant abstractions should be confronted with a basic question: Are they addressing consequential real-world issues, or are they amusing themselves with toy problems?

The future of effective and responsible technology — including the field of artificial intelligence — will not be settled in a laboratory or behind a desk. Yet, unfortunately, some of the preeminent minds in AI have for years directed their attention to building glossy demos that will never reliably deploy. Others have focused on consumer-facing internet applications aimed at manipulating, exploiting, and monetizing the private information of citizens, all while denouncing the immorality of applying similar technical skills to projects that actually contribute to the safety and security of wider society.

From self-driving vehicles [1] to radiology [2] to predicting job success from candidate video snippets [3], there is a growing disillusionment with AI snake oil [4], alongside an increasing need to discover the credible bedrock underneath the sands of an AI hype cycle [5]. Preserving the truly valuable contributions of AI/ML from regulatory backlash [6], a painful hangover, or even an “AI Winter” will require an integrated effort to address the biggest challenges in the field: ethics and efficacy. In practice, this means examining what works, discarding what doesn’t, and recentering our moral frameworks around the contexts and whole-of-domain challenges of operationalized AI and away from vain musings about paper clips and trolleys.

We see both sides of this continued erosion of confidence as symptoms of a single confusion: the tendency to adopt lofty and often performative aspirations without working through the essential groundwork that should apply to any technology intended to be deployed in the real world.

Palantir’s approach to building technology, including operating systems that enable AI integration and AI-assisted workflows, is to start with the operational context — whether it’s a pandemic-constrained supply chain, an aircraft manufacturer’s assembly line, or a warfighter’s battlefield assessment — and to build and adapt software solutions that contextualize the challenges of the environment on their own terms and in full view of their complexities, not as a grafted abstraction. This approach drives the ways we think — not only about the efficacy of our technology, but also the ethics surrounding its use. Most critically, this approach points to an important practical convergence of AI ethics and efficacy, i.e., that the practices and approaches to building AI that works effectively in the real world also align closely with the real and applied demands of morally defensible technologies.

Why is there So Much Confusion to Begin With?

For starters, the ambiguity around what constitutes AI sets us on shaky ground. Broadly speaking, ‘artificial intelligence’ has been used to refer to a cluster of technologies ranging from statistical computing to expert systems, heuristics, and artificial neural networks. Much of what we were describing ten years ago as “big data” or “predictive analytics” has now been rebranded as AI. Automation and AI are also often used interchangeably, despite there being important categorical distinctions between the two fields (automation aims to mechanize human activities, sometimes using AI tools; AI aims to synthesize or mimic functions, problem-solving activities, and even decision-making traditionally associated with forms of human intelligence).

These definitional uncertainties at best speak to a set of family resemblances [7] across types of technologies that have been haphazardly grouped under the banner of “artificial intelligence.” This categorical confusion is further muddled by promoters of various professional stripes who seek to exploit a sense of general exuberance in order to promote their own direct interests, whether or not those interests align with real and defensible outcomes — that is, whether or not the technologies they are touting actually work.

As a fragmented collection of technologies, AI too often fails to live up to promotional hype from industry, academia, commentators, and policymakers, while spiraling ever further into a critical maelstrom characterized by an increasingly troubled landscape of concerns about algorithmic fairness, accountability, and transparency.

Has AI Ethics Lost its Way?

So much of what has been written, promulgated, and is expected in the space of AI ethics principles has unfortunately become a pro forma exercise — a ‘check-the-box’ task that is more rote compliance than actionable guidance for navigating the extremely complex ethical considerations faced by real operators and users of AI systems. ‘Ethics as theory’ provides the tools for musing about issues that — while theoretically interesting — may be practically useless.

Indeed, the volume of AI ethics principles statements has become so massive as to have generated a veritable cottage industry of AI ethics meta-studies [8]. Commentators have pointed to the inadequacy of lofty ethics principles [9] alone to address the real ethical challenges of AI and even signaled a developing crisis of legitimacy [10] around their proliferation.

Perhaps the most damning indictment of the state of AI ethics principles can be seen in how generic the varying frameworks have become. Guidance around ‘trustworthy AI’ produced by China’s Ministry of Industry and Information Technology is virtually indistinguishable from frameworks published by major consultancy firms and think-tanks [11]. Something has plainly gone amiss if a society adopting mass surveillance and AI-facilitated social credit scoring [12] as a tool for comprehensive social control can espouse virtually the same principled ethical posture as the public- and private-sector institutions of liberal democracies.

For these reasons, we have previously resisted calls to publish a Palantir statement on AI Ethics. Instead, we have tended to redirect such discussions towards what we view as critical framing questions, such as: What makes AI separate from other technologies such that it would require a distinct ethical treatment? Are there more fundamental concerns that ought to be addressed to situate AI as a practice or discipline, let alone as a domain of formal ethical treatment? What good is an articulation of abstract ethical principles that may offer no meaningful or direct translation into practice?

This is not to suggest that all approaches to AI ethics are without merit. On the contrary, we appreciate that AI ethics frameworks concerned with risks of algorithmic bias, accountability, and explainability often do genuinely aspire to address concerns that should not be minimized or ignored — indeed, they are important and have their place in responsible technology architectures.

We do, however, believe it’s important to call attention to how industry, academia, and policymakers may have succumbed to a kind of tunnel vision and the somewhat misguided view that the most critical concerns at the core of AI technology adoption and use reside in these narrower fixations, rather than in the full view of systems (that encompass algorithms as single component parts) embedded in real-world environments.

The approach to AI and technology ethics that we have adopted and publicly share is instead built upon a more expansive recognition that our technology — the software platforms we build and deploy to our customers — does not exist in a vacuum, but rather is inextricably tied to the context of its application, its operational uses, and the full data-operating environment that surrounds the much narrower AI components that many others appear fixated upon.

We assert an ethics of technology that applies to the full contexts of its use. These contexts each assert their own situated set of domain-specific demands, functional expectations, and ethical obligations. This framing compels us to put AI in its appropriate place: as a tool among other tools of varying sophistication and inexorably embedded in a world of tangible actions and consequences.

Is AI Delivering the Promised Goods?

When it comes to the question of AI efficacy, the instances of AI falling woefully short of its hyped vision are numerous and growing. In fact, many of the oft-touted successes prove, on closer inspection, to be dramatically overstated [13] or just plain fabricated [14]. Take self-driving vehicles: outside of very limited and dramatically constrained operational design domains, they have simply failed to achieve Level 5 automation [15], despite nearly a decade of assurances that the mark was just around the corner [16]. Some companies, like Argo AI, have succumbed to market pressure in the face of failed delivery and opted to shut down entirely [17].

AI built to monetize the consumer-facing internet illustrates the problem: algorithmically driven social media, relentlessly optimizing for engagement, has neglected the complex externalities it imposes on society, politics, and the world at large. AI-enabled echo chambers and misinformation amplification have contributed to seemingly unprecedented levels of political instability, division, incivility, and distrust of institutions, while the second- and third-order effects now include AI-driven content moderation deployed to combat those very harms.

That’s not to suggest that there aren’t places of marked success, where AI delivers significant utility. One prominent example is Natural Language Processing (NLP) for translation (e.g., Google Translate, DeepL). Translation applications work quite well for a host of everyday uses. But they also start to bump up against limitations that motivate supervision and close reading for high-quality translation involving, for example, the subtleties of poetry and literary prose, as well as more quotidian texts, such as social media posts implicating complex, ambiguous, and rapidly evolving modes of speech.

But here the examples of success almost prove the rule: NLP translation works on the back of decades of effort and largely as a consequence of harnessing a vast corpus of already-translated texts on the web to train against. The transition from laboratory result to practical application was anything but an overnight flip of a switch.

We are still in the early days of exuberance over large language model (LLM) tools like GPT-3, GLaM, Gopher, LLaMA, and Chinchilla, which present fascinating — or at least amusing — chatbot and text-generation capabilities. However, concerns and criticisms have rapidly surfaced, suggesting that LLMs operate more as stochastic parrots of language [18], amounting to something closer to impressive “bullshit machines” [19] than genuine intelligences that understand the world they appear to be musing about [20]. Machine-assisted projects that generate text based on a corpus of other texts may make for titillating, even unsettling, interactions [21], but in and of themselves they fundamentally lack anything approaching an understanding of the semantics — the meaning, the real-world significance — of the words they string together. This disconnect between syntax and semantics [22] becomes all the more fraught in applications where life-or-death decisions hinge on specific claims about what is true in the world.

Nevertheless, we appreciate that there are sensible pathways to legitimate and defensible applications of LLMs in a variety of settings. The broader lesson, however, is that, as with other classes of AI technologies, there are limits to their applicability, and those limits will be largely determined by the specific contexts and environments of intended operation.

What, then, is Palantir’s Approach to AI?

The core insight driving our perspective on AI is a recognition that the efficacy and ethical challenges of AI are fundamentally rooted in the socio-technical ecosystem and full operational contexts in which AI serves not as a comprehensive panacea, but rather as a tool to help facilitate and augment meaningful outcomes for people across the world. And the best way to address the ethics of AI is to reorient the discussion to that full systems context — what we have elsewhere referred to as “operational AI.”

If we start to treat artificial intelligence more as a set of tools for human use, we become better equipped to situate AI in appropriate framing contexts that recognize its critical features, constraints, liabilities, and dependencies:

  • Individual models are capable of enabling or supporting modest tasks — not magical outcomes.
  • AI capabilities are dependent on their supporting infrastructure.
  • AI is fragile and can often fail if not properly maintained.
  • AI’s resilience is subject to the conditions that surround it.
  • There are real harms that can be produced by AI’s failings, but those implications are not limited strictly to structural failings of the AI model alone; they can (and more often than not do) also occur as a result of the infrastructure surrounding it and the human decision-making that flows from it.

Internalizing this reframing of AI as embedded tools rather than as true intelligence helpfully sidesteps or dissolves several of the key conundrums that seem to produce so much of this dubious snake oil.

Palantir has over the years focused on effectively and responsibly operationalizing AI (for example, through model management capabilities for Palantir Foundry), while so much of the world has instead fixated on marketing visions of technology that rarely work as advertised and often fail with significant and still-unfolding ramifications.

Our approach has always been to do the difficult and laborious work of grounding AI as a tool in its appropriate settings, and to construct the full framework of methodological rigor needed to build and deploy it as part of an actual decision process, embedded in the full system and environment of an AI application.

What this means in practice is that our notion of operational AI encompasses several important features:

  • It treats AI models as embedded in the actual environment where they will be deployed in production, and not as laboratory projects. This means that model inputs, users, model output, and consequences are not viewed as abstractions, but rather as real-world interactions.
  • It acknowledges that an appeal to fairness metrics and data bias is both a limited and a dependent piece of the model evaluation puzzle. Limited in the sense that “fairness” is a qualitative concept that can at best be reductively translated into narrow quantitative terms. Dependent in the sense that these concepts are not universal abstractions, but rather only make sense in the specific context of their use: fairness matters against a specific historical, cultural, and institutional background; all data is biased to begin with, and the real question is which biases we want or need to include or exclude. (A short sketch following this list illustrates how reductive such metrics can be.)
  • It looks at the full lifecycle of data and model management and provides tools for continuous testing and evaluation (a companion sketch after this list gives a flavor of what this can look like in code). In product terms, this translates into:
      ◦ tracking the full provenance and lineage of all data and model branchings;
      ◦ constituting modeling efforts around a sensible ontology that translates raw data elements into context-specific concepts;
      ◦ version controlling changes to data, models, parameters, conditions, etc.;
      ◦ tracking how dynamic environmental factors modify usage and outcomes, which can be used to ensure ongoing model performance and reliability;
      ◦ performing continuous testing and evaluation, data quality, and integrity checks to bolster models against the inevitable impacts of entropy and brittleness; and
      ◦ creating a persistent and reliable audit trail for all data processing steps for later analysis, troubleshooting, oversight, and accountability.
  • It treats model and systems maintenance as a critical and enduring condition for keeping the AI running, not building a model and assuming it will run unaided or somehow learn to keep itself running in good order as the world changes around it.
  • It views end-user interactions with model outputs as a central feature of the workings of the full AI system, not just as an afterthought. This translates into user-oriented interface design that helps bring to life contextual considerations, model confidence measures, and other features that augment and support human decision-making on top of AI outputs.
  • It appreciates the need for human-oriented applications, not only because the human element provides critical contextual grounding, but also because it often serves as the moral fabric for ethical consequences of AI tool use.
  • It considers an honest accounting of the trade-offs, limits, and failings of the system as an essential deployment responsibility, not an afterthought.
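
To make the point about fairness metrics concrete, below is a minimal, illustrative sketch of one common quantitative reduction of fairness, the demographic parity gap. This is not Palantir code; the function, data, and group labels are hypothetical. The point is that the resulting number, taken on its own, cannot tell you whether it reflects a bias to be corrected or a legitimate, domain-justified difference; that judgment only comes from the historical, cultural, and institutional context of use.

```python
# Illustrative sketch only -- not Palantir code. It shows how a common
# fairness metric reduces a qualitative question to a single number whose
# meaning still depends entirely on context. All names and data are toy.
from typing import Sequence

def demographic_parity_gap(preds: Sequence[int], groups: Sequence[str],
                           group_a: str, group_b: str) -> float:
    """Difference in positive-prediction rates between two groups."""
    def rate(g: str) -> float:
        members = [p for p, grp in zip(preds, groups) if grp == g]
        return sum(members) / max(1, len(members))
    return rate(group_a) - rate(group_b)

gap = demographic_parity_gap(
    preds=[1, 1, 0, 0, 1, 0],
    groups=["a", "a", "a", "b", "b", "b"],
    group_a="a", group_b="b",
)
# A +0.33 gap could reflect historical bias the model should not reproduce,
# or a legitimate, domain-justified difference; the metric cannot say which.
print(f"Demographic parity gap: {gap:+.2f}")
```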
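
And to give a flavor of the lifecycle bullets above, here is a hedged sketch of the general pattern: a toy model registry that records versions and data lineage, writes every lifecycle step to an audit trail, and gates models on a recurring evaluation check. It is an illustration of the bookkeeping involved, not Palantir Foundry’s actual implementation; all class, field, and threshold names are assumptions made for the example.

```python
# Illustrative sketch only -- not Palantir Foundry code. It shows, in
# miniature, the bookkeeping described above: versioned models with recorded
# data lineage, a persistent audit trail, and a recurring evaluation gate.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List

@dataclass
class ModelVersion:
    name: str
    version: str
    training_data_lineage: List[str]   # dataset ids/branches used in training
    parameters: Dict[str, float]

@dataclass
class ModelRegistry:
    versions: List[ModelVersion] = field(default_factory=list)
    audit_log: List[dict] = field(default_factory=list)

    def _audit(self, event: str, detail: dict) -> None:
        # Append-only record of every lifecycle step, for later oversight.
        self.audit_log.append({
            "ts": datetime.now(timezone.utc).isoformat(),
            "event": event, **detail,
        })

    def register(self, mv: ModelVersion) -> None:
        self.versions.append(mv)
        self._audit("register", {"model": mv.name, "version": mv.version,
                                 "lineage": mv.training_data_lineage})

    def evaluate(self, mv: ModelVersion, metric: Callable[[], float],
                 threshold: float) -> bool:
        # Continuous testing-and-evaluation gate: re-run on a schedule so that
        # drift and data-quality regressions surface before users feel them.
        score = metric()
        passed = score >= threshold
        self._audit("evaluate", {"model": mv.name, "version": mv.version,
                                 "score": round(score, 3), "passed": passed})
        return passed

# Usage: register a version with its lineage, then gate it on a stubbed metric.
registry = ModelRegistry()
mv = ModelVersion("demand_forecast", "1.4.0",
                  ["orders_2023_q1@branch=clean", "weather_daily@v7"],
                  {"learning_rate": 0.05})
registry.register(mv)
ok = registry.evaluate(mv, metric=lambda: 0.91, threshold=0.85)
print(ok, len(registry.audit_log))
```

In a production platform, these in-memory lists would of course be backed by durable storage, access controls, and scheduled pipelines, but the shape of the record-keeping is the same.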

Ultimately, it is this more grounded approach to AI that can lead to technology tools that are more likely to be reliable, durable, and effective. But it also happens that the principles and features of reliable and appropriately situated operational AI systems can align closely with the demands of ethical data science and engineering practices. When you lose sight of the solid foundation of technology applications and assume you’re dealing with magic, it’s hard to firmly ground the ethics of that technology’s use.

Impacting the World

The results yielded by our approach to AI are real, consequential, and impact our lives. They are not the stuff of academic musings, but the products of years of toiling in the field with our customers, working to understand the complexities of their domains of application, grappling with the attendant legal, policy, and ethics questions surrounding their environments, and working to enable AI systems solutions that address those complexities on their own terms.

In forthcoming blog posts, we will present more details and case studies representing various domains of AI-enabled and -assisted applications of Palantir’s technologies. We believe each of these examples will further illustrate how Palantir’s reframing of AI as systems embedded in operational, contextually-dependent environments has provided a pathway for moving beyond the performative to the operational aspects of technology that works for the world that needs it.

Author

Courtney Bowman is Global Director of Privacy and Civil Liberties Engineering at Palantir Technologies. His work addresses the confluence of issues at the intersection of policy, law, technology, ethics, and social norms. In working extensively with government and commercial partners, Bowman’s team focuses on enabling Palantir to build and deploy data integration, sharing, and analysis software that respects and reinforces privacy, security, and data protection principles and community expectations.

In his role, Bowman also works with the privacy advocacy world at large to ensure that concerns related to new and emerging technologies and data sources are addressed in the ongoing design and implementation of Palantir’s software platforms. Bowman is co-author of The Architecture of Privacy, which provides a multidisciplinary framework for designing and building privacy-protective information systems. Bowman is a frequent commentator on issues surrounding AI ethics, efficacy, and operationalization. Prior to Palantir, Bowman earned degrees in Physics and Philosophy at Stanford University and worked as a quantitative and economic analyst at Google.

[1] Dembski, W.A. “Automated Driving and Other Failures of AI.” Mind Matters, 20 April 2021, https://mindmatters.ai/2021/04/automated-driving-and-other-failures-of-ai/.
[2] Roberts, M., Driggs, D., Thorpe, M. et al. Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans. Nat Mach Intell 3, 199–217 (2021), https://doi.org/10.1038/s42256-021-00307-0.
[3] Raghavan, M., Barocas, S., Kleinberg, J., & Levy, K. Mitigating Bias in Algorithmic Hiring: Evaluating Claims and Practices (June 21, 2019). ACM Conference on Fairness, Accountability, and Transparency (FAT*), 2020, https://ssrn.com/abstract=3408010 or http://dx.doi.org/10.2139/ssrn.3408010.
[4] https://www.cs.princeton.edu/~arvindn/talks/MIT-STS-AI-snakeoil.pdf
[5] https://www.gartner.com/en/information-technology/glossary/hype-cycle
[6] https://www.ftc.gov/business-guidance/blog/2023/02/keep-your-ai-claims-check?utm_source=govdelivery
[7] Wittgenstein, Ludwig (2001) [1953]. Philosophical Investigations. Blackwell Publishing. pp. § 65–71.
[8] Consider just a few examples of AI ethics meta-studies:
Jobin, A., Ienca, M. & Vayena, E. The global landscape of AI ethics guidelines. Nat Mach Intell 1, 389–399 (2019). https://doi.org/10.1038/s42256-019-0088-2; https://cyber.harvard.edu/story/2020-01/meta-analysis-shows-ai-ethics-principles-emphasize-human-rights; https://venturebeat.com/ai/ai-weekly-meta-analysis-shows-ai-ethics-principles-emphasize-human-rights/; Hickok, M. Lessons learned from AI ethics principles for future actions. AI Ethics 1, 41–47 (2021). https://doi.org/10.1007/s43681-020-00008-1; Hagendorff, T. A Virtue-Based Framework to Support Putting AI Ethics into Practice. Philosophy & Technology 35 (2022). https://doi.org/10.1007/s13347-022-00553-z; Lacroix, A. & Luccioni, A. S. Metaethical Perspectives on ‘Benchmarking’ AI Ethics. https://arxiv.org/pdf/2204.05151.pdf.
[9] Mittelstadt, B. Principles alone cannot guarantee ethical AI. Nat Mach Intell 1, 501–507 (2019). https://doi.org/10.1038/s42256-019-0114-4.
[10] Latonero, M. “AI Principle Proliferation as a Crisis of Legitimacy.” Carr Center Discussion Paper Series, 2020–011, https://carrcenter.hks.harvard.edu/files/cchr/files/mark_latonero_ai_principles_6.pdf?m=1601910899.
[11] Sheehan, M. “Beijing’s Approach to Trustworthy AI Isn’t So Dissimilar from the World’s.” MacroPolo, 18 August 2021, https://macropolo.org/beijing-approach-trustworthy-ai/?rp=e.
[12] García, L. I. (2022). The Role of AI in a Security and Population Control System: Chinese Social Credit System. Handbook of Research on Artificial Intelligence in Government Practices and Processes. https://doi.org/10.4018/978-1-7998-9609-8.ch011.
[13] Smith, G. & Funk, J. “AI has a long way to go before doctors can trust it with your life.” Quartz, 4 June 2021, https://qz.com/2016153/ai-promised-to-revolutionize-radiology-but-so-far-its-failing.
[14] Schellmann, H. & Wall, S. “We tested AI interview tools. Here’s what we found.” MIT Technology Review, 7 July 2021, https://www.technologyreview.com/2021/07/07/1027916/we-tested-ai-interview-tools/.
[15] ‘Level 5’ refers to “full automation,” in which the “System is fully responsible for driving tasks while occupants act only as passengers and do not need to be engaged. … When engaged, the system handles all driving tasks while you, now the passenger, are not needed to maneuver the vehicle. The system can operate the vehicle universally — under all conditions and on all roadways. A human driver is not needed to operate the vehicle.” See US Department of Transportation NHTSA resources for more details: https://www.nhtsa.gov/technology-innovation/automated-vehicles-safety.
[16] Levin, T. “Elon Musk has promised self-driving Teslas for years. Experts say it’s not even close.” Business Insider, 26 February 2023, https://www.businessinsider.com/elon-musk-tesla-full-self-driving-promise-experts-2023-2.
[17] Korosec, K. “Ford, VW-backed Argo AI is shutting down.” TechCrunch, 26 October 2022, https://techcrunch.com/2022/10/26/ford-vw-backed-argo-ai-is-shutting-down/.
[18] Weil, E. “You Are Not a Parrot. And a chatbot is not a human. And a linguist named Emily M. Bender is very worried what will happen when we forget this.” New York Magazine, 27 February 2023, https://nymag.com/intelligencer/article/ai-artificial-intelligence-chatbots-emily-m-bender.html.
[19] McQuillan, D. “ChatGPT: The world’s largest bullshit machine.” Transforming Society, 10 February 2023, https://www.transformingsociety.co.uk/2023/02/10/chatgpt-the-worlds-largest-bullshit-machine/.
[20] Auslender, V. “Meaningless words: Dangerous conversations with ChatGPT.” CTech, 12 December 2022, https://www.calcalistech.com/ctechnews/article/vhhk7xrni.
[21] Roose, K. “A Conversation With Bing’s Chatbot Left Me Deeply Unsettled.” New York Times, 16 February 2023, https://www.nytimes.com/2023/02/16/technology/bing-chatbot-microsoft-chatgpt.html.
[22] Searle, J., 1980, ‘Minds, Brains and Programs’, Behavioral and Brain Sciences, 3: pp. 417–57.

The Efficacy and Ethics of AI Must Move Beyond the Performative to the Operational was originally published in Palantir Blog on Medium.