
ExpertLens: Activation Steering Features Are Highly Interpretable

This paper was accepted at the Workshop on Unifying Representations in Neural Models (UniReps) at NeurIPS 2025.
Activation steering methods in large language models (LLMs) have emerged as an effective way to perform targeted updates to enhance generated language without requiring large amounts of adaptation data. We ask whether the features discovered by activation steering methods are interpretable. We identify neurons responsible for specific concepts (e.g., “cat”) using the “finding experts” method from research on activation steering and show that the ExpertLens, i.e., inspection of these…
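The excerpt doesn't spell out how "finding experts" works, but a minimal sketch of this style of expert-finding, assuming each neuron is scored by how well its per-sentence activation separates concept sentences (e.g., those mentioning "cat") from non-concept ones via average precision, might look like the following. The array names and synthetic data are illustrative only; a real run would collect activations from an LLM layer.

```python
# Sketch: rank neurons as concept "experts" by average precision.
# Assumption (not from the post): each neuron is treated as a binary
# classifier whose per-sentence activation should rank concept
# sentences above non-concept ones.
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)

n_sentences, n_neurons = 200, 512
# Stand-in for per-sentence neuron activations (e.g., max over tokens
# of one layer's MLP outputs); synthetic here for a self-contained demo.
activations = rng.normal(size=(n_sentences, n_neurons))
# 1 = sentence mentions the concept ("cat"), 0 = it does not.
labels = rng.integers(0, 2, size=n_sentences)
# Make a handful of neurons genuinely concept-selective for the demo.
activations[labels == 1, :5] += 2.0

# Score every neuron: how well does its activation alone separate
# concept sentences from the rest?
ap_per_neuron = np.array(
    [average_precision_score(labels, activations[:, j]) for j in range(n_neurons)]
)

# The top-k neurons are the concept "experts" whose inspection
# (ExpertLens-style) or manipulation (steering) would follow.
k = 10
experts = np.argsort(ap_per_neuron)[::-1][:k]
print("Top expert neurons:", experts)
print("Their AP scores:", np.round(ap_per_neuron[experts], 3))
```

In practice the same ranked set of expert neurons serves double duty: steering methods adjust their activations to influence generation, and interpretability analyses like the one in this paper inspect what they encode.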