Categories: FAANG

TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining

This paper was accepted to the ACL 2025 main conference as an oral presentation.
This paper was accepted at the Scalable Continual Learning for Lifelong Foundation Models (SCLLFM) Workshop at NeurIPS 2024.
Large Language Models (LLMs) trained on historical web data inevitably become outdated. We investigate evaluation strategies and update methods for LLMs as new data becomes available. We introduce a web-scale dataset for time-continual pretraining of LLMs derived from 114 dumps of Common Crawl (CC) – orders of magnitude larger than previous continual language modeling benchmarks. We also…

Projected Language Models: A Large Model Pre-Segmented Into Smaller Ones

This paper has been accepted at the Foundation Models in the Wild workshop at ICML 2024. Large language models are versatile tools but are not suitable for small inference budgets. Small models have more efficient inference but their lower capacity means that their performance can be good only if one…

July 18, 2024

In "FAANG"

Using LLMs for Late Multimodal Sensor Fusion for Activity Recognition

This paper was accepted at the Learning from Time Series for Health workshop at NeurIPS 2025. Sensor data streams provide valuable information around activities and context for downstream applications, though integrating complementary information can be challenging. We show that large language models (LLMs) can be used for late fusion for…

November 21, 2025

In "FAANG"

Language Models Improve When Pretraining Data Matches Target Tasks

Every data selection method inherently has a target. In practice, these targets often emerge implicitly through benchmark-driven iteration: researchers develop selection strategies, train models, measure benchmark performance, then refine accordingly. This raises a natural question: what happens when we make this optimization explicit? To explore this, we propose benchmark-targeted ranking…

July 19, 2025

In "FAANG"

AI Generated Robotic Content