Categories: FAANG

ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models

Large Language Models (LLMs) with billions of parameters have drastically transformed AI applications. However, their demanding computation during inference has raised significant challenges for deployment on resource-constrained devices. Despite recent trends favoring alternative activation functions such as GELU or SiLU, known for increased computation, this study strongly advocates for reinstating ReLU activation in LLMs. We demonstrate that using the ReLU activation function has a negligible impact on convergence and performance while significantly reducing computation and weight transfer…

Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models

This paper was accepted at the Efficient Natural Language and Speech Processing (ENLSP) Workshop at NeurIPS 2024. Large Language Models (LLMs) typically generate outputs token by token using a fixed compute budget, leading to inefficient resource utilization. To address this shortcoming, recent advancements in mixture of expert (MoE) models, speculative…

November 19, 2024

In "FAANG"

Reverse engineering the NTK: towards first-principles architecture design

September 2, 2022

In "AI/ML Research"

ExpertLens: Activation Steering Features Are Highly Interpretable

This paper was accepted at the Workshop on Unifying Representations in Neural Models (UniReps) at NeurIPS 2025. Activation steering methods in large language models (LLMs) have emerged as an effective way to perform targeted updates to enhance generated language without requiring large amounts of adaptation data. We ask whether the…

November 8, 2025

In "FAANG"

AI Generated Robotic Content