Cut Your Losses in Large-Vocabulary Language Models

As language models grow ever larger, so do their vocabularies. This has shifted the memory footprint of LLMs during training disproportionately to one single layer: the cross-entropy in the loss computation. Cross-entropy builds up a logit matrix with entries for each pair of input tokens and vocabulary items and, for small models, consumes an order of magnitude more memory than the rest of the LLM combined. We propose Cut Cross-Entropy (CCE), a method that computes the cross-entropy loss without materializing the logits for all tokens into global memory. Rather, CCE only computes the logit…
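To make the memory argument concrete, here is a minimal PyTorch sketch of the underlying identity, not the paper's fused kernels: the per-token loss is the log-sum-exp over all logits minus the correct-token logit, so the vocabulary can be swept in chunks and the full [tokens x vocabulary] logit matrix never needs to exist at once. The function and parameter names (`chunked_cross_entropy`, `chunk_size`) are illustrative, not from the paper's code.

```python
import torch

def chunked_cross_entropy(hidden, classifier, targets, chunk_size=8192):
    """Cross-entropy over a large vocabulary without materializing the
    full [num_tokens, vocab_size] logit matrix at once.

    hidden:     [N, D] token representations entering the classifier head
    classifier: [V, D] vocabulary projection weights
    targets:    [N]    correct token ids
    """
    # Logit of the correct token: dot each hidden state with the
    # classifier row of its target -> [N]
    correct_logit = (hidden * classifier[targets]).sum(dim=-1)

    # Running log-sum-exp over the vocabulary, accumulated chunk by chunk,
    # so only an [N, chunk_size] slice of logits is ever in memory.
    lse = torch.full((hidden.shape[0],), float("-inf"),
                     device=hidden.device, dtype=hidden.dtype)
    for start in range(0, classifier.shape[0], chunk_size):
        chunk_logits = hidden @ classifier[start:start + chunk_size].T
        lse = torch.logaddexp(lse, torch.logsumexp(chunk_logits, dim=-1))

    # CE_i = logsumexp(all logits_i) - logit_i(correct token), averaged
    return (lse - correct_logit).mean()

# Quick sanity check against the naive dense computation on a toy example.
N, D, V = 4, 8, 1000
h, W, y = torch.randn(N, D), torch.randn(V, D), torch.randint(V, (N,))
naive = torch.nn.functional.cross_entropy(h @ W.T, y)
assert torch.allclose(chunked_cross_entropy(h, W, y, chunk_size=128), naive, atol=1e-5)
```

The sketch trades one big allocation for several small ones; the method proposed in the paper goes further by keeping even the per-chunk logits out of global memory.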