Rethinking JEPA: Compute-Efficient Video SSL with Frozen Teachers

Video Joint Embedding Predictive Architectures (V-JEPA) learn generalizable, off-the-shelf video representations by predicting masked regions in latent space with an exponential moving average (EMA)-updated teacher. While EMA prevents representation collapse, it complicates scalable model selection and couples teacher and student architectures. We revisit masked-latent prediction and show that a frozen teacher suffices. Concretely, we (i) …
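
The core recipe the abstract describes, predicting masked latents against a teacher that is frozen rather than EMA-updated, fits in a few lines. Below is a minimal PyTorch sketch under stated assumptions: `student`, `predictor`, and `teacher` are placeholder modules, zeroing out masked tokens stands in for the usual token dropping, and the L1 latent regression is an illustrative choice, not necessarily the paper's exact loss.

```python
# Minimal sketch of masked-latent prediction with a frozen teacher.
# Illustrative only; module names, shapes, and the loss are assumptions.
import torch
import torch.nn.functional as F

def masked_latent_loss(student, predictor, teacher, video, mask):
    """video: (B, T, D) patch tokens; mask: (B, T) bool, True = masked."""
    with torch.no_grad():                      # frozen teacher: no gradients, no EMA update
        targets = teacher(video)               # (B, T, D) latent targets from the full clip
    visible = video * (~mask).unsqueeze(-1)    # hide masked tokens (simplified masking)
    preds = predictor(student(visible))        # predict latents at the masked positions
    per_token = F.l1_loss(preds, targets, reduction="none").mean(-1)  # (B, T)
    return per_token[mask].mean()              # regress only the masked tokens
```

Because the teacher never updates, there is no EMA schedule to tune for model selection, and its architecture no longer has to mirror the student's, which is the decoupling the abstract points to.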

Vxceed builds the perfect sales pitch for sales teams at scale using Amazon Bedrock

This post was co-written with Cyril Ovely from Vxceed. Consumer packaged goods (CPG) companies face a critical challenge in emerging economies: how to effectively retain revenue and grow customer loyalty at scale. Although these companies invest 15–20% of their revenue in trade promotions and retailer loyalty programs, the uptake of these programs has historically remained …

Want to get started building production-ready AI agents? Here’s where startups should start.

Startups are using agentic AI to automate complex workflows, create novel user experiences, and solve business problems that were once considered technically impossible. Still, charting the optimal path forward, especially when integrating AI agents, often presents significant technical complexity. To help startups navigate this new landscape, we’re launching our Startup technical …

New memory framework builds AI agents that can handle the real world’s unpredictability

Researchers at the University of Illinois Urbana-Champaign and Google Cloud AI Research have developed a framework that enables large language model (LLM) agents to organize their experiences into a memory bank, helping them get better at complex tasks over time. The framework, called ReasoningBank, distills “generalizable reasoning strategies” from an agent’s successful and failed attempts …
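
The excerpt describes the framework only at a high level, but the loop it implies (retrieve relevant strategies, act, then distill a new strategy from the success or failure) can be sketched. The Python below is an illustrative approximation, not the authors' code: `llm` is a placeholder callable, and the keyword-overlap retrieval stands in for whatever similarity search ReasoningBank actually uses.

```python
# Sketch of a ReasoningBank-style memory loop (assumed API, not the paper's code).
from dataclasses import dataclass

@dataclass
class MemoryItem:
    strategy: str   # distilled, generalizable reasoning strategy
    source: str     # "success" or "failure"

class MemoryBank:
    def __init__(self):
        self.items: list[MemoryItem] = []

    def retrieve(self, task: str, k: int = 3) -> list[MemoryItem]:
        # Placeholder relevance score; a real system would use embedding similarity.
        scored = sorted(self.items,
                        key=lambda m: -sum(w in m.strategy for w in task.split()))
        return scored[:k]

    def distill_and_add(self, trajectory: str, succeeded: bool, llm) -> None:
        # Distill a reusable strategy from a success *or* a failure,
        # mirroring the article's point that failed attempts also teach.
        prompt = ("Extract one generalizable reasoning strategy from this "
                  f"{'successful' if succeeded else 'failed'} attempt:\n{trajectory}")
        self.items.append(
            MemoryItem(llm(prompt), "success" if succeeded else "failure"))
```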

Stable Diffusion Models are Secretly Good at Visual In-Context Learning

Large language models (LLMs) in natural language processing (NLP) have demonstrated great potential for in-context learning (ICL): the ability to adapt to various tasks from a small set of example prompts without explicitly updating the model weights. ICL has recently been explored for computer vision tasks with promising early outcomes. These …
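
Visual ICL prompts are commonly laid out as an image grid: a support input/output pair plus the query image, with a blank cell the model must complete. The sketch below builds one such 2x2 canvas; it is a generic illustration of the prompt format, not necessarily the layout this particular paper uses.

```python
# Generic 2x2 visual in-context prompt: support pair on top, query plus a
# blank cell (to be filled in by the model) on the bottom. Illustrative only.
import numpy as np

def build_icl_canvas(support_in, support_out, query_in):
    """All inputs: (H, W, 3) uint8 arrays of the same size."""
    blank = np.zeros_like(query_in)                   # cell the model should complete
    top = np.concatenate([support_in, support_out], axis=1)
    bottom = np.concatenate([query_in, blank], axis=1)
    return np.concatenate([top, bottom], axis=0)      # (2H, 2W, 3) canvas
```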