TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining

This paper was accepted to the ACL 2025 main conference as an oral presentation. It was also accepted at the Scalable Continual Learning for Lifelong Foundation Models (SCLLFM) Workshop at NeurIPS 2024.
Large Language Models (LLMs) trained on historical web data inevitably become outdated. We investigate evaluation strategies and update methods for LLMs as new data becomes available. We introduce a web-scale dataset for time-continual pretraining of LLMs derived from 114 dumps of Common Crawl (CC), orders of magnitude larger than previous continual language modeling benchmarks. We also…
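To make the setup concrete, here is a minimal, purely illustrative sketch of the rolling structure that time-continual pretraining implies: a model is updated on each Common Crawl dump in chronological order and then scored on every month's held-out data, so performance on past and future dumps can be tracked over time. All names below (MonthlyDump, update_model, evaluate) are hypothetical stand-ins, not the paper's actual recipe or API.

```python
# Toy sketch of a time-continual pretraining loop over chronologically
# ordered web dumps. Hypothetical names throughout; not from the paper.

from dataclasses import dataclass, field


@dataclass
class MonthlyDump:
    """One timestamped web snapshot used as a training increment."""
    month: str                          # e.g. "2019-05"
    documents: list = field(default_factory=list)


def update_model(state: dict, dump: MonthlyDump) -> dict:
    """Stand-in for one continual update: a real run would resume from
    the previous checkpoint and train on the new dump's documents.
    Here we only record which months have been seen."""
    state.setdefault("seen", set()).add(dump.month)
    return state


def evaluate(state: dict, dump: MonthlyDump) -> float:
    """Stand-in for held-out loss on one month's data. A real benchmark
    would report perplexity on past months (forgetting) and future
    months (staleness)."""
    return 1.0 if dump.month in state.get("seen", set()) else 2.0


dumps = [MonthlyDump(m) for m in ("2019-05", "2019-06", "2019-07")]
state: dict = {}
for dump in dumps:                      # train on dumps in time order
    state = update_model(state, dump)
    scores = {d.month: evaluate(state, d) for d in dumps}
    print(f"after {dump.month}: {scores}")
```

In a real pipeline, each update_model call would be a full pretraining run resumed from the prior checkpoint, and evaluate would compute language-modeling perplexity on held-out CC data for that month; the resulting month-by-month score matrix is what lets one measure forgetting of old data versus adaptation to new data.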