Categories: FAANG

Do LLMs Estimate Uncertainty Well in Instruction-Following?

Large language models (LLMs) could be valuable personal AI agents across various domains, provided they can precisely follow user instructions. However, recent studies have shown significant limitations in LLMs’ instruction-following capabilities, raising concerns about their reliability in high-stakes applications. Accurately estimating LLMs’ uncertainty in adhering to instructions is critical to mitigating deployment risks. We present, to our knowledge, the first systematic evaluation of uncertainty estimation abilities of LLMs in the context of instruction-following. Our study identifies…

MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs

We introduce MIA-Bench, a new benchmark designed to evaluate multimodal large language models (MLLMs) on their ability to strictly adhere to complex instructions. Our benchmark comprises a diverse set of 400 image-prompt pairs, each crafted to challenge the models' compliance with layered instructions in generating accurate responses that satisfy specific…

February 26, 2025

In "FAANG"

MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs

July 9, 2024

In "FAANG"

Self-reflective Uncertainties: Do LLMs Know Their Internal Answer Distribution?

This paper was accepted at the Workshop on Reliable and Responsible Foundation Models (RRFMs) Workshop at ICML 2025. Uncertainty quantification plays a pivotal role when bringing large language models (LLMs) to end-users. Its primary goal is that an LLM should indicate when it is unsure about an answer it gives.…

July 10, 2025

In "FAANG"

AI Generated Robotic Content