TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining

This paper was accepted to the ACL 2025 main conference as an oral presentation.
This paper was accepted at the Scalable Continual Learning for Lifelong Foundation Models (SCLLFM) Workshop at NeurIPS 2024.
Large Language Models (LLMs) trained on historical web data inevitably become outdated. We investigate evaluation strategies and update methods for LLMs as new data becomes available. We introduce a web-scale dataset for time-continual pretraining of LLMs derived from 114 dumps of Common Crawl (CC) – orders of magnitude larger than previous continual language modeling benchmarks. We also…
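
To make the time-continual setup concrete, below is a minimal sketch of the general evaluation protocol the abstract describes: a model is updated on one Common Crawl dump at a time and then evaluated on held-out data from every period seen so far. This is not the paper's actual code; all names (update_model, evaluate, dumps) are illustrative placeholders under that assumption.

```python
# Minimal sketch (not the paper's code) of a time-continual pretraining loop:
# update the model on one Common Crawl dump at a time, then evaluate it on
# held-out data from every time period seen so far. All names here are
# illustrative placeholders.

from typing import Callable, Dict, List


def run_time_continual_eval(
    dumps: List[str],                                 # e.g. CC snapshot ids in chronological order
    update_model: Callable[[object, str], object],    # continual-update step on one dump
    evaluate: Callable[[object, str], float],         # held-out loss for a given period
    model: object,
) -> Dict[str, List[float]]:
    """Return, for each dump, the model's loss on all periods up to that update."""
    history: Dict[str, List[float]] = {}
    for t, dump in enumerate(dumps):
        model = update_model(model, dump)             # e.g. further pretraining on dump t
        # Evaluate on every period seen so far: older periods measure forgetting,
        # the newest period measures adaptation to fresh data.
        history[dump] = [evaluate(model, past) for past in dumps[: t + 1]]
    return history
```

Under this sketch, comparing each row's newest entry with its older entries separates adaptation to recent data from forgetting of earlier periods, which is the kind of trade-off the benchmark is designed to expose.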