Categories: FAANG

Cut Your Losses in Large-Vocabulary Language Models

As language models grow ever larger, so do their vocabularies. This has shifted the memory footprint of LLMs during training disproportionately to one single layer: the cross-entropy in the loss computation. Cross-entropy builds up a logit matrix with entries for each pair of input tokens and vocabulary items and, for small models, consumes an order of magnitude more memory than the rest of the LLM combined. We propose Cut Cross-Entropy (CCE), a method that computes the cross-entropy loss without materializing the logits for all tokens into global memory. Rather, CCE only computes the logit…
AI Generated Robotic Content

Recent Posts

3 Actionable AI Recommendations for Businesses in 2026

TL;DR In 2026, the businesses that win with AI will do three things differently: redesign…

11 hours ago

Revolutionizing Construction

How Cavanagh and Palantir Are Building Construction’s OS for the 21st CenturyEditor’s Note: This blog post…

1 day ago

Building a voice-driven AWS assistant with Amazon Nova Sonic

As cloud infrastructure becomes increasingly complex, the need for intuitive and efficient management interfaces has…

1 day ago

Cloud CISO Perspectives: Our 2026 Cybersecurity Forecast report

Welcome to the first Cloud CISO Perspectives for December 2025. Today, Francis deSouza, COO and…

1 day ago

As AI Grows More Complex, Model Builders Rely on NVIDIA

Unveiling what it describes as the most capable model series yet for professional knowledge work,…

1 day ago