Language models can explain neurons in language models

We use GPT-4 to automatically write explanations for the behavior of neurons in large language models and to score those explanations. We release a dataset of these (imperfect) explanations and scores for every neuron in GPT-2.
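The scoring step can be illustrated with a minimal sketch. It assumes a correlation-based score: an explanation is judged by how well the activations a simulator model predicts from the explanation alone track the neuron's real activations over the same text. The function name `correlation_score`, the toy activation values, and the example explanation are hypothetical and are not taken from the released GPT-2 dataset.

```python
import math
from typing import Sequence


def correlation_score(real: Sequence[float], simulated: Sequence[float]) -> float:
    """Pearson correlation between a neuron's real per-token activations and
    the activations a simulator predicted from a text explanation.

    Assumption: correlation is used as the explanation score. 1.0 means the
    simulation tracks the neuron perfectly; 0.0 means no linear relationship.
    """
    if len(real) != len(simulated) or len(real) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(real)
    mean_r = sum(real) / n
    mean_s = sum(simulated) / n
    cov = sum((r - mean_r) * (s - mean_s) for r, s in zip(real, simulated))
    var_r = sum((r - mean_r) ** 2 for r in real)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    if var_r == 0 or var_s == 0:
        # A constant sequence carries no signal to correlate against.
        return 0.0
    return cov / math.sqrt(var_r * var_s)


# Toy per-token activations, made up for illustration (not from GPT-2).
real_acts = [0.0, 0.1, 2.3, 0.0, 1.9, 0.2]
# Hypothetical simulator output for an explanation such as
# "fires on references to movies"; in practice a language model would
# predict these values by reading the explanation text.
simulated_acts = [0.0, 0.0, 2.0, 0.1, 2.0, 0.0]

score = correlation_score(real_acts, simulated_acts)
print(f"explanation score: {score:.3f}")
```

Because correlation ignores the overall scale and offset of the simulated values, the simulator only has to get the relative pattern of activations right; a score near 1.0 means the explanation predicts where the neuron fires, while a score near 0.0 means it does not.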

Published by AI Generated Robotic Content

Tags: ai/ml, faang

3 years ago