In high-stakes situations like health care—or weeknight “Jeopardy!”—it can be safer to say “I don’t know” than to answer incorrectly. Doctors, game show contestants, and standardized test-takers understand this, but most artificial intelligence applications still prefer to give a potentially wrong answer rather than admit uncertainty.
A team of AI researchers at Universitat Politècnica de València, in Spain, has found that as popular large language models (LLMs) grow larger and more sophisticated, they become less likely to admit to a user that they do not know an answer.
This paper was accepted at the Workshop on Reliable and Responsible Foundation Models (RRFMs) at ICML 2025. Uncertainty quantification plays a pivotal role when bringing large language models (LLMs) to end-users. Its primary goal is to have an LLM indicate when it is unsure about an answer it gives.…
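The goal of uncertainty quantification can be made concrete with a toy abstention rule. The sketch below is an illustrative assumption, not the Valencia team's method: it scores an answer by the geometric mean of its token probabilities and falls back to "I don't know" when the score is below a threshold. The function names and the threshold value are hypothetical.

```python
import math

# Illustrative sketch only: one simple way to turn token log-probabilities
# into an answer-or-abstain decision. The scoring rule and threshold are
# assumptions, not the method from the paper above.

def sequence_confidence(token_logprobs: list[float]) -> float:
    """Geometric-mean probability of the generated answer tokens."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def answer_or_abstain(answer: str, token_logprobs: list[float],
                      threshold: float = 0.7) -> str:
    """Return the model's answer only if its confidence clears the threshold."""
    if sequence_confidence(token_logprobs) < threshold:
        return "I don't know."
    return answer

# A confident generation passes through; a shaky one becomes an abstention.
print(answer_or_abstain("Paris", [-0.05, -0.02]))        # -> "Paris"
print(answer_or_abstain("Lisbon", [-1.2, -0.9, -1.5]))   # -> "I don't know."
```

In practice the log-probabilities would come from the model's own output scores, and the threshold would be tuned so that abstaining costs less than answering incorrectly in the target application.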
OpenAI researchers have introduced a novel method that acts as a "truth serum" for large language models (LLMs), compelling them to self-report their own misbehavior, hallucinations and policy violations. This technique, "confessions," addresses a growing concern in enterprise AI: Models can be dishonest, overstating their confidence or covering up the…
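OpenAI has not published the implementation behind "confessions," so purely to illustrate the general shape of a self-report pass, the sketch below runs a second query that asks the model to audit its own answer and return a structured admission. The `ask_model` stub, the prompt wording, and the JSON fields are all hypothetical, not OpenAI's actual technique.

```python
import json

# Hypothetical sketch of a confession-style self-report, not OpenAI's method.
# A second pass asks the model to review its own answer and return a
# structured account of uncertainty or policy concerns.

CONFESSION_PROMPT = (
    "Review the answer you just gave. Respond with JSON containing: "
    '"confident" (true/false), "possible_hallucinations" (list of claims '
    'you are unsure of), and "policy_concerns" (list, may be empty).'
)

def ask_model(prompt: str) -> str:
    """Stand-in for a real LLM call; replace with your provider's client."""
    return json.dumps({
        "confident": False,
        "possible_hallucinations": ["publication year of the cited paper"],
        "policy_concerns": [],
    })

def collect_confession(answer: str) -> dict:
    """Run the self-report pass and attach it to the original answer."""
    report = json.loads(ask_model(f"{answer}\n\n{CONFESSION_PROMPT}"))
    return {"answer": answer, "confession": report}

print(collect_confession("The paper was published in 2023."))
```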