Cut Your Losses in Large-Vocabulary Language Models

As language models grow ever larger, so do their vocabularies. This has shifted the memory footprint of LLMs during training disproportionately to one single layer: the cross-entropy in the loss computation. Cross-entropy builds up a logit matrix with entries for each pair of input tokens and vocabulary items and, for small models, consumes an order of magnitude more memory than the rest of the LLM combined. We propose Cut Cross-Entropy (CCE), a method that computes the cross-entropy loss without materializing the logits for all tokens into global memory. Rather, CCE only computes the logit…
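To make the memory argument concrete, here is a minimal sketch in PyTorch contrasting the naive loss, which materializes the full [num_tokens, vocab_size] logit matrix, with a chunked variant that computes the correct-token logit directly and accumulates the log-sum-exp over vocabulary slices. This is an illustration of the underlying idea, not the authors' fused CCE kernel; the function names and chunk size are assumptions for the example.

```python
import torch
import torch.nn.functional as F

def naive_cross_entropy(hidden, classifier, targets):
    # Materializes the full [N, V] logit matrix in global memory --
    # for large vocabularies this single tensor dominates training memory.
    logits = hidden @ classifier.T                      # [N, V]
    return F.cross_entropy(logits, targets)

def chunked_cross_entropy(hidden, classifier, targets, chunk_size=4096):
    # Illustrative alternative: only the correct-token logits and a running
    # log-sum-exp are kept, so at most an [N, chunk_size] logit slice exists
    # at any time instead of the full [N, V] matrix.
    correct_logit = (hidden * classifier[targets]).sum(dim=-1)           # [N]
    lse = torch.full_like(correct_logit, float("-inf"))
    for start in range(0, classifier.shape[0], chunk_size):
        chunk_logits = hidden @ classifier[start:start + chunk_size].T   # [N, chunk]
        lse = torch.logaddexp(lse, torch.logsumexp(chunk_logits, dim=-1))
    # Cross-entropy per token is logsumexp(logits) - logit[target].
    return (lse - correct_logit).mean()
```

Both functions return the same value (up to floating-point error); the difference is purely in peak activation memory during the loss computation.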