Categories: AI/ML News

A method to interpret AI might not be so interpretable after all

As autonomous systems and artificial intelligence become increasingly common in daily life, new methods are emerging to help humans check that these systems are behaving as expected. One method, called formal specifications, uses mathematical formulas that can be translated into natural-language expressions. Some researchers claim that this method can be used to spell out decisions an AI will make in a way that is interpretable to humans.

ExpertLens: Activation Steering Features Are Highly Interpretable

This paper was accepted at the Workshop on Unifying Representations in Neural Models (UniReps) at NeurIPS 2025. Activation steering methods in large language models (LLMs) have emerged as an effective way to perform targeted updates to enhance generated language without requiring large amounts of adaptation data. We ask whether the…

November 8, 2025

In "FAANG"

The Interpretable AI playbook: What Anthropic’s research means for your enterprise LLM strategy

June 18, 2025

In "AI/ML News"

Building explainability into the components of machine-learning models

Explanation methods that help users understand and trust machine-learning models often describe how much certain features used in the model contribute to its prediction. For example, if a model predicts a patient’s risk of developing cardiac disease, a physician might want to know how strongly the patient’s heart rate data…

August 31, 2022