
Cut Your Losses in Large-Vocabulary Language Models

As language models grow ever larger, so do their vocabularies. This has disproportionately shifted the memory footprint of LLMs during training to a single layer: the cross-entropy in the loss computation. Cross-entropy builds up a logit matrix with an entry for each pair of input token and vocabulary item and, for small models, consumes an order of magnitude more memory than the rest of the LLM combined. We propose Cut Cross-Entropy (CCE), a method that computes the cross-entropy loss without materializing the logits for all tokens into global memory. Rather, CCE only computes the logit…
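To make the idea of "cross-entropy without the full logit matrix" concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation. It uses the identity loss_i = logsumexp_j(e_i · c_j) − e_i · c_{x_i}, where `E` (token embeddings), `C` (classifier matrix) and `targets` are hypothetical names. The correct-token logit needs only one dot product per token, and the log-sum-exp is accumulated over vocabulary chunks with a running (online) maximum. Peak memory is therefore an N × `chunk` block rather than N × V. A real implementation would presumably fuse this into a GPU kernel so that even the blocks never reach global memory; this sketch does not attempt that.

```python
import numpy as np

def naive_cross_entropy(E, C, targets):
    """Reference: materializes the full (N, V) logit matrix."""
    logits = E @ C.T                                   # (N, V)
    m = logits.max(axis=1, keepdims=True)              # stabilize exp
    lse = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
    correct = logits[np.arange(len(targets)), targets]
    return (lse - correct).mean()

def chunked_cross_entropy(E, C, targets, chunk=1024):
    """Sketch: never holds more than an (N, chunk) block of logits."""
    n = E.shape[0]
    # Logit of the correct token only: one dot product per row.
    correct = np.einsum("nd,nd->n", E, C[targets])
    # Running log-sum-exp over vocabulary chunks (online max rescaling).
    run_max = np.full(n, -np.inf)
    run_sum = np.zeros(n)
    for start in range(0, C.shape[0], chunk):
        block = E @ C[start:start + chunk].T           # (N, chunk)
        new_max = np.maximum(run_max, block.max(axis=1))
        # Rescale the old partial sum to the new max, then add this block.
        run_sum = run_sum * np.exp(run_max - new_max) \
            + np.exp(block - new_max[:, None]).sum(axis=1)
        run_max = new_max
    lse = run_max + np.log(run_sum)
    return (lse - correct).mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    E = rng.standard_normal((16, 32))                  # 16 tokens, dim 32
    C = rng.standard_normal((5000, 32))                # vocabulary of 5000
    targets = rng.integers(0, 5000, size=16)
    print(np.isclose(naive_cross_entropy(E, C, targets),
                     chunked_cross_entropy(E, C, targets)))  # expected: True
```

For a sense of scale, with hypothetical sizes of 8,192 tokens and a 256,000-entry vocabulary, the full float32 logit matrix alone is 8,192 × 256,000 × 4 bytes ≈ 8.4 GB, whereas the chunked loop above holds only 8,192 × 1,024 × 4 bytes ≈ 34 MB of logits at a time.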