Categories: FAANG

Cut Your Losses in Large-Vocabulary Language Models

As language models grow ever larger, so do their vocabularies. This has shifted the memory footprint of LLMs during training disproportionately to one single layer: the cross-entropy in the loss computation. Cross-entropy builds up a logit matrix with entries for each pair of input tokens and vocabulary items and, for small models, consumes an order of magnitude more memory than the rest of the LLM combined. We propose Cut Cross-Entropy (CCE), a method that computes the cross-entropy loss without materializing the logits for all tokens into global memory. Rather, CCE only computes the logit…
AI Generated Robotic Content
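The core idea is to compute only the correct-token logit and evaluate the log-sum-exp over the vocabulary as a running reduction, so the full (tokens x vocabulary) logit matrix is never materialized at once. A rough chunked PyTorch sketch of that idea follows; it is not the paper's implementation, and the function name, chunk_size, and tensor shapes are illustrative assumptions.

```python
import torch


def chunked_cross_entropy(hidden, classifier, targets, chunk_size=8192):
    """Mean cross-entropy over a large vocabulary without ever holding the
    full (num_tokens x vocab_size) logit matrix in memory at once.

    hidden:     (N, D) final hidden states of the N training tokens
    classifier: (V, D) unembedding / classifier weight matrix
    targets:    (N,)   gold token ids
    """
    # Logit of the correct token for each position: x_i . c_{y_i}
    correct_logit = (hidden * classifier[targets]).sum(dim=-1)            # (N,)

    # Running log-sum-exp over vocabulary chunks; at most N x chunk_size
    # logits exist at any point in time.
    lse = torch.full_like(correct_logit, float("-inf"))
    for start in range(0, classifier.size(0), chunk_size):
        chunk_logits = hidden @ classifier[start:start + chunk_size].T   # (N, chunk)
        lse = torch.logaddexp(lse, torch.logsumexp(chunk_logits, dim=-1))

    # Cross-entropy per token is logsumexp(all logits) minus the correct logit.
    return (lse - correct_logit).mean()
```

The sketch only covers the forward computation; under autograd each chunk of logits would still be saved for the backward pass, whereas CCE, as the abstract states, avoids materializing the logits for all tokens into global memory altogether.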

Recent Posts

Wan 2.2 human image generation is very good. This open model has a great future.

submitted by /u/yomasexbomb

20 hours ago

Your First Containerized Machine Learning Deployment with Docker and FastAPI

Deploying machine learning models can seem complex, but modern tools can streamline the process.

20 hours ago

Mistral-Small-3.2-24B-Instruct-2506 is now available on Amazon Bedrock Marketplace and Amazon SageMaker JumpStart

Today, we’re excited to announce that Mistral-Small-3.2-24B-Instruct-2506—a 24-billion-parameter large language model (LLM) from Mistral AI…

20 hours ago

AI vs. AI: Prophet Security raises $30M to replace human analysts with autonomous defenders

Prophet Security raises $30 million to launch a fully autonomous AI cybersecurity platform that investigates…

21 hours ago

To explore AI bias, researchers pose a question: How do you imagine a tree?

To confront bias, scientists say we must examine the ontological frameworks within large language models—and…

21 hours ago

Be honest: How realistic is my new vintage AI LoRA?

No workflow since it's still a work-in-progress LoRA. submitted by /u/I_SHOOT_FRAMES

2 days ago