Large language models present both massive opportunities and significant complexities for the insurance industry. Insurers can use AI to increase operational efficiency, improve the accuracy of underwriting decisions, enhance customer experience, and more effectively coordinate with partners. Yet in a heavily regulated industry like insurance, ensuring objectivity and the appropriate level of human oversight in policy underwriting is legally and ethically crucial, and governing AI at scale requires management and orchestration.
Insurers face increasing scrutiny about how they plan to deploy AI — scrutiny that will carry profound legal and financial implications. For example, the National Association of Insurance Commissioners Bulletin on Use of Artificial Intelligence Systems by Insurers states that insurers are expected to adopt practices, such as governance frameworks and risk management protocols, designed to ensure that the use of AI systems does not result in unfair practices. [1]
As of January 3, 2025, 21 jurisdictions have now formally adopted this bulletin. Regulatory and law enforcement agencies will carefully examine how insurers use AI to make decisions or support actions that impact consumers.
Through our extensive experience partnering with several major industry institutions grappling with these challenges, we have developed a set of best practices that we believe are crucial for enabling insurance institutions to deploy AI in production in accordance with, and in anticipation of, the most rigorous legal, regulatory, and ethical considerations.
In this blog post, which uses underwriting as an example core insurance business application, we provide an outline of these best practices, organized according to the following core themes:
1. Understandable — AI results must be interpretable and understandable.
2. Integrated — AI must be deeply integrated with existing business systems and processes.
3. Governed & Secure — AI must be securely governed and controlled at scale.
Insurers seeking to introduce AI must protect themselves by making their AI-based processes explainable, traceable, and auditable. And those that master these capabilities now will turn AI governance from a regulatory burden into a strategic weapon, seizing a decisive competitive edge.
AI within underwriting can enable underwriters to make better, faster, and more consistent decisions. In addition to helping reduce operational costs, it can help advance key underwriting KPIs, such as quote-to-bind ratios, gross/net premium, and renewal retention, and ultimately impact the loss ratio. It can enable better risk selection and vetting through improved data extraction and research (e.g., human rights considerations), facilitate fluid communications with partners, increase speed to decision, and support underwriter due diligence. And it can enhance integration across the deal lifecycle, from initial quote to closing.
More specifically, insurers can deploy AI within underwriting across the following steps:
Data Extraction: LLMs can be used to rapidly extract relevant data from submissions and their attachments by detecting relevant fields within unstructured data such as PDFs, loss runs, and broker emails (see the sketch after this list).
Research Due Diligence: LLMs can be used as a research assistant, pulling in relevant information for the underwriting decision from internal or external sources. This can mean mapping extracted data to internal historical data (e.g., linking a submission to an existing policyholder) or researching across any number of external sources (e.g., discovering that the person or company requesting insurance was recently involved in a particular news story).
Exception Alerting: LLMs can be used to generate alerts for underwriters by automatically detecting and flagging exceptions to business underwriting standards (either qualitative or quantitative). These alerts can also be used to triage or prioritize the submissions that best match the type of risk the business wants to underwrite.
Communication: LLMs can be used to facilitate faster communication by generating suggested responses to the broker or agent based on the combination of automated flags generated and prior Human-in-the-Loop interventions.
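As a concrete illustration of the extraction step, here is a minimal Python sketch. The `complete` function is a hypothetical stand-in for whatever governed model endpoint an insurer actually uses, and the field names are illustrative, not a fixed schema:

```python
import json

# Hypothetical stand-in for the insurer's governed LLM endpoint; swap in the
# actual, access-controlled model client used in production.
def complete(prompt: str) -> str:
    raise NotImplementedError("wire up your governed LLM client")

EXTRACTION_PROMPT = """\
Extract the following fields from the broker submission below and return
strict JSON with exactly these keys: insured_name, line_of_business,
effective_date, total_insured_value, prior_losses.
Use null for any field that is not present. Do not invent values.

Submission:
{submission_text}
"""

def extract_fields(submission_text: str) -> dict:
    raw = complete(EXTRACTION_PROMPT.format(submission_text=submission_text))
    fields = json.loads(raw)  # fail loudly on malformed output
    expected = {"insured_name", "line_of_business", "effective_date",
                "total_insured_value", "prior_losses"}
    missing = expected - fields.keys()
    if missing:
        raise ValueError(f"extraction is missing fields: {missing}")
    return fields
```

Validating the model's JSON against an explicit field list, rather than trusting it, is what makes the extraction auditable downstream.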
For insurance companies using AI on a large scale, it’s crucial to have a clear and detailed understanding of how their data is being used and processed. This means being able to trace the journey of data from its original sources through various AI systems and processes at the most granular level. To do so, insurers must build a full, branched data lineage showing how data has flowed from all source systems and inputs. Such a data tree empowers them to monitor how well each AI tool or process is performing, identify the impact of any changes made to AI systems, and provide complete transparency for audits and regulatory compliance. This capability is vital for every class of data the AI system touches, from raw source inputs through intermediate transformations to final outputs.
For any individual executable prompt, organizations should be able to identify exactly which inputs, prompt version, and model produced a given output.
This comprehensive monitoring becomes deeply integrated with evaluations, giving organizations a full sense of ownership over the agents running across their systems, providing insights into performance, and helping to identify where degradation may be occurring.
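One way to make such lineage concrete is to record an auditable entry for every prompt execution. The sketch below shows a minimal, hypothetical record shape; the field names are ours for illustration, not any particular platform's API:

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

@dataclass
class PromptLineageRecord:
    """One auditable node in the branched data lineage for a single prompt run."""
    prompt_name: str
    prompt_version: str
    model_id: str
    input_source_ids: list  # IDs of the upstream datasets/documents used
    input_hash: str          # hash of the fully rendered prompt
    output_hash: str         # hash of the raw model output
    run_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

def log_lineage(record: PromptLineageRecord, path: str = "lineage.jsonl") -> None:
    # Append-only log; in production this would feed the branched lineage tree.
    with open(path, "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")
```

Because every record carries the prompt version and model ID, auditors can reconstruct exactly which configuration produced any given decision.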
Furthermore, without the ability to conduct thorough analysis of data across versions, branches, and history, organizations face a high risk of introducing biases into decision-making.
Transitioning to AI-driven solutions in insurance requires a structured approach to experimentation: one that involves rigorous planning and phasing. Any AI underwriting system will typically be highly complex, consisting of hundreds of different prompts and many agents. A structured approach to testing and experimenting with AI systems allows insurers to fine-tune their tools effectively, adapt to new business areas, and ensure all important factors are considered when making changes.
Broadly speaking, a platform for structured experimentation should carry each change through a defined sequence of steps: assembling representative test datasets, running candidate prompts and configurations against them, evaluating results against established baselines, and promoting only the versions that pass.
When integrating LLMs into critical business processes like underwriting, organizations must ensure accuracy, consistency, and auditability. Proper LLM orchestration can address these concerns, allowing organizations to pursue reproducibility (i.e., getting the same output for the same input from the LLM) in scaled statistical tests, for example by pinning model versions [3] and setting sampling temperature to zero [4].
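A minimal sketch of what such orchestration might pin down, assuming a hypothetical `call_llm` client. As footnotes [3] and [4] note, pinning versions and zeroing temperature narrows variance but does not guarantee determinism, which is why the sketch also measures agreement across repeated runs:

```python
# Pinning the exact model version [3] and zeroing the sampling temperature [4]
# narrows output variance but does not guarantee determinism, so repeated runs
# are still measured for agreement.
PINNED_CONFIG = {
    "model": "vendor-model-2024-06-01",  # hypothetical pinned version string
    "temperature": 0.0,
    "seed": 1234,  # honored only where the vendor supports seeding
}

def run_repeated(prompt: str, call_llm, n: int = 20) -> dict:
    """Run the same prompt n times under a pinned config and measure agreement."""
    outputs = [call_llm(prompt, **PINNED_CONFIG) for _ in range(n)]
    most_common = max(set(outputs), key=outputs.count)
    return {
        "agreement_rate": outputs.count(most_common) / n,
        "distinct_outputs": len(set(outputs)),
    }
```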
Insurers introducing LLMs must integrate them with existing business systems to ensure that they are auditable, manageable, and impactful at scale.
Underwriting typically crosses multiple technical systems, including email servers (where submissions typically land), data lakes, transactional policy systems, underwriting workbench systems, and open source and external APIs. To provide meaningful and accurate output, any AI platform must bi-directionally integrate with such systems in a flexible and configurable way.
On the input side, such integration requires an extensible data connection framework that establishes connections with all types of source systems — structured, unstructured, or semi-structured — and with all key data transfer approaches, such as batch, micro-batch, or streaming. For example, to perform an underwriting analysis, insurers may need to integrate (1) inbound emails from brokers, including all attachments; (2) historical policy and claims information; and (3) live external sources of information.
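As an illustration, such a connection framework might be driven by declarative configuration along these lines; the connector kinds, names, and options are hypothetical, not any particular platform's API:

```python
# Illustrative, declarative connector configuration mapping to the three
# example inputs above; every name and option here is hypothetical.
SOURCE_CONNECTIONS = [
    {"name": "broker_inbox", "kind": "email", "mode": "streaming",
     "options": {"include_attachments": True}},                    # (1) inbound broker emails
    {"name": "policy_history", "kind": "jdbc", "mode": "batch",
     "options": {"table": "policies_and_claims", "schedule": "hourly"}},  # (2) historical data
    {"name": "external_news", "kind": "rest_api", "mode": "micro-batch",
     "options": {"poll_interval_seconds": 300}},                   # (3) live external sources
]
```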
On the output side, any decisions, transactions or updates undertaken by the AI must write back to systems of record. For example, if the AI system (1) extracts information from submission attachments; (2) generates alerts against underwriting standards; and (3) assigns a prioritization to the new submission, to be operationally useful, it must instantaneously write back such information to multiple systems. Specifically, any changes made to data, property values, or links should be recorded when the LLM takes an Action so that it can be reflected in all user applications.
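A minimal sketch of such a write-back, assuming each target system exposes a hypothetical `apply_action` interface. The point is that every change is recorded as a single attributed Action fanned out to all systems of record:

```python
from datetime import datetime, timezone

def write_back(submission_id: str, extraction: dict, alerts: list,
               priority: str, systems: list) -> None:
    """Record one attributed Action and fan it out to every system of record,
    so that all user applications reflect the same change."""
    action = {
        "submission_id": submission_id,
        "changes": {"extraction": extraction, "alerts": alerts,
                    "priority": priority},
        "actor": "llm-underwriting-agent",  # auditable, attributed actor
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    for system in systems:
        system.apply_action(action)  # hypothetical write-back interface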
Organizations typically first implement AI underwriting systems within a specific sub-line of business as a Proof of Concept, and then establish a roadmap for scaling LLMs across their enterprise. As they do so, they must ensure that the orchestration layer itself is built to scale along with that roadmap.
Although many believe that AI can or will soon be able to replace humans in operational workflows — and many AI products market themselves as such — we believe that human judgment remains critical. From both a quality and compliance perspective, insurers cannot solely rely on AI to decide on underwriting an insurance policy or whether to approve or reject a claim. Our work in the insurance industry leverages AI to augment, not replace, human analysis: to simplify, automate, and improve the quality of tasks such as ingesting, processing, and extracting multi-modal data, thereby empowering humans to make better decisions.
AI solutions that seek to remove the need for human reasoning present a variety of concerns for the insurance industry. LLMs can function as “black boxes,” making it difficult to audit and understand the reasoning behind their outputs. And attempting to replace humans with LLMs alone — especially in such a critical industry — would undermine consumer confidence.
Fundamentally, humans are essential for auditing and evaluating production performance. In production-level workflows, humans edit LLM extractions, allowing the capture of real-world evaluation data that can be reintroduced into the experimentation process and that serves as an indicator of LLM performance. This human intervention acts as a powerful proxy for assessing production performance, ensuring that the AI system continues to operate effectively and in alignment with compliance requirements.
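One simple way to harvest that signal is to diff the LLM's extraction against the underwriter's final version and store the result as an evaluation record. A hedged sketch, with a hypothetical field-level comparison:

```python
def capture_human_edit(run_id: str, llm_output: dict, human_output: dict) -> dict:
    """Diff an underwriter's correction against the LLM extraction and emit an
    evaluation record that can be fed back into the experimentation process."""
    corrected = {k for k in llm_output if llm_output.get(k) != human_output.get(k)}
    return {
        "run_id": run_id,
        "fields_corrected": sorted(corrected),
        "field_accuracy": 1 - len(corrected) / max(len(llm_output), 1),
        "ground_truth": human_output,  # becomes a labeled eval example
    }
```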
Within underwriting specifically, humans remain essential to the underwriting process for key reasons of both quality and compliance.
Insurance IT requirements are often rigid and unaccommodating of fast-moving innovation, and less structured workflows, such as AI augmentation, must still meet these IT demands. While building and deploying AI use cases, insurance leaders should therefore expect to follow familiar Continuous Integration/Continuous Deployment (CI/CD) workflows, as in a traditional software promotion path. In the context of introducing LLMs, CI/CD requires branching support for LLM-powered workflows: just like code, LLM prompts should be promoted across environments, from feature branches to staging to production, and teams must test each iteration of a prompt to ensure there is no degradation in performance between branches, environments, and versions. To reduce time spent on the technical development of CI/CD workflows and invest more in realizing the AI value proposition, insurers should choose a platform that supports native, granular CI/CD.
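To make the promotion path concrete, the sketch below gates each promotion on an evaluation score. The `eval_suite` and `registry` objects are hypothetical stand-ins for whatever evaluation harness and deployment registry a given platform provides:

```python
ENVIRONMENTS = ["feature", "staging", "production"]

def promote(prompt_id: str, version: str, from_env: str,
            eval_suite, baseline_score: float, registry) -> None:
    """Promote a prompt version one environment forward only if it clears
    the evaluation gate, mirroring a traditional code promotion path."""
    if from_env == ENVIRONMENTS[-1]:
        raise ValueError("already in production; nothing to promote")
    score = eval_suite.run(prompt_id, version)  # hypothetical eval harness
    if score < baseline_score:
        raise RuntimeError(
            f"{prompt_id}@{version}: score {score:.3f} is below the "
            f"baseline {baseline_score:.3f}; promotion blocked")
    next_env = ENVIRONMENTS[ENVIRONMENTS.index(from_env) + 1]
    registry.deploy(prompt_id, version, env=next_env)  # hypothetical registry
```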
As part of release cycles, the IT departments of traditional insurance companies will require the kinds of integration and functional testing common to code releases. They will also need to understand the non-determinism of AI (i.e., the fact that an LLM can give different responses to the same input) and account for this phenomenon in production testing strategies.
Several mechanisms can reduce the risk that a small prompt or logic change will have large, inadvertent consequences; one such mechanism, a statistical regression gate, is sketched below.
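Because a single run proves little under non-determinism, a regression gate samples each golden case several times and gates the release on the aggregate pass rate. A minimal sketch; the `golden_cases` structure and `run_prompt` callable are illustrative:

```python
import statistics

def regression_gate(golden_cases: list, run_prompt, n_samples: int = 5,
                    min_pass_rate: float = 0.95) -> bool:
    """Sample each golden case several times (outputs may vary run to run)
    and gate the release on the aggregate pass rate, not a single run."""
    per_case_rates = []
    for case in golden_cases:
        passes = sum(
            bool(case["check"](run_prompt(case["input"])))
            for _ in range(n_samples))
        per_case_rates.append(passes / n_samples)
    return statistics.mean(per_case_rates) >= min_pass_rate
```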
Insurers must account for several security concerns when implementing AI. Addressing each of the below concerns at a platform level is paramount to protecting an insurance company’s overarching security posture.
- Gating sources of prompt injections
- Reducing the injection surface area
- Mitigating the consequences of prompt injections
- Monitoring and alerting
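To illustrate the third point, one common mitigation is to let the model propose actions but execute only those on a narrow allow-list, with per-action argument validation. A hedged sketch with hypothetical action and argument names:

```python
# Only allow-listed actions, with validated arguments, are ever executed,
# so an injected instruction cannot trigger an arbitrary operation.
# The action and argument names here are illustrative.
ALLOWED_ACTIONS = {
    "flag_exception": {"submission_id", "standard_id", "severity"},
    "set_priority": {"submission_id", "priority"},
}

def execute_model_action(action: dict, handlers: dict) -> None:
    name = action.get("name")
    if name not in ALLOWED_ACTIONS:
        raise PermissionError(f"action {name!r} is not allow-listed")
    unexpected = set(action.get("args", {})) - ALLOWED_ACTIONS[name]
    if unexpected:
        raise ValueError(f"unexpected arguments for {name}: {unexpected}")
    handlers[name](**action["args"])  # dispatch to a vetted handler
```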
Strict access controls must be a first-class feature of any AI solution. LLMs should be harnessed within a strong, role-based access control system. The technical solution should mitigate the risks of data leakage through the following key features:
- Just-in-time gates on LLM inputs
- Constraining and scrubbing LLM-generated outputs
- Preventing sensitive data from being trained (or fine-tuned) into LLMs
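As a minimal illustration of the first two features, a just-in-time gate can filter documents through the requesting user's existing permissions and scrub each one before it enters the prompt context. The `user.can_read` and `redact` hooks below are hypothetical:

```python
def build_context(user, documents, redact) -> list:
    """Just-in-time input gate: only documents the requesting user is already
    authorized to read enter the prompt context, each scrubbed on the way in."""
    permitted = [doc for doc in documents if user.can_read(doc)]  # RBAC check (hypothetical hook)
    return [redact(doc.text) for doc in permitted]  # e.g., PII scrubbing (hypothetical hook)
```

Gating at prompt-assembly time, rather than at index time, means the model can never see more than the human asking the question.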
Any AI system is particularly vulnerable to XSS, CSRF, SSRF, privilege escalation, and remote code execution if plugins or backend functions accept unscrutinized LLM output. Preventing these vulnerabilities requires protecting against unauthenticated access, actions by users that exceed their authorization role, interception of user sessions, unauthorized interception or insertion of data, and injection attacks such as cross-site scripting (XSS). Recommended protection measures include (but are not limited to) treating all LLM output as untrusted input: validating and sanitizing it before it reaches plugins or backend functions, enforcing authentication and role-based authorization on every action it can trigger, and encoding it before rendering in user interfaces.
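A minimal sketch of that posture: LLM output is escaped before rendering and any URLs it contains are checked against an allow-list, reducing the XSS/SSRF surface. The allow-list value is, of course, illustrative:

```python
import html
import re

# The allow-list value is illustrative; in practice it would come from policy.
URL_ALLOWLIST = ("https://internal.example.com/",)

def harden_output(llm_text: str) -> str:
    """Treat LLM output as untrusted input: reject non-allow-listed URLs
    (reducing the XSS/SSRF surface) and escape the text before rendering."""
    for url in re.findall(r"https?://\S+", llm_text):
        if not url.startswith(URL_ALLOWLIST):
            raise ValueError(f"non-allow-listed URL in LLM output: {url}")
    return html.escape(llm_text)  # never eval/exec or template raw LLM output
```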
AI can fundamentally transform the insurance industry. Insurers that rush to implement AI that is not auditable or secure will expose themselves to legal and commercial risk while offering little more than a chatbot. Those that adopt systems capable of implementing the governance, security, orchestration, and reliability frameworks outlined in this post, however, will guard against those risks while gaining a significant competitive advantage.
[1] NAIC Model Bulletin: Use of Artificial Intelligence Systems by Insurers: https://content.naic.org/sites/default/files/inline-files/2023-12-4%20Model%20Bulletin_Adopted_0.pdf
[2] Thinking Outside the (Black) Box (Engineering Responsible AI, #2): https://blog.palantir.com/thinking-outside-the-black-box-24d0c87ec8a5
[3] Note: Model drift is still possible due to systems-level subtleties on the model vendor side, so this is helpful but insufficient on its own.
[4] Note: This is necessary but insufficient on its own. Even at temperature 0, and without model drift, commercial models can return results non-deterministically as a function of real-time inferencing minutiae, such as batching, distribution, and ensemble/cascade techniques (which are largely abstracted from the consumer).
[5] Reducing Hallucinations with the Ontology in Palantir AIP (Engineering Responsible AI, #1): https://blog.palantir.com/reducing-hallucinations-with-the-ontology-in-palantir-aip-288552477383
[6] From Prototype to Production (Engineering Responsible AI, #3): https://blog.palantir.com/from-prototype-to-production-engineering-responsible-ai-3-ea18818cd222
[7] Evaluating Generative AI (Engineering Responsible AI, #4): https://blog.palantir.com/evaluating-generative-ai-a-field-manual-0cdaf574a9e1