Categories: FAANG

MixAtlas: Uncertainty-aware Data Mixture Optimization for Multimodal LLM Midtraining

This paper was accepted at the Workshop on Navigating and Addressing Data Problems for Foundation Models (NADPFM) at ICLR 2026.
Principled domain reweighting can substantially improve sample efficiency and downstream generalization; however, data-mixture optimization for multimodal pretraining remains underexplored. Current multimodal training recipes tune mixtures from only a single perspective such as data format or task type. We introduce MixAtlas, a principled framework for compute-efficient multimodal mixture optimization via systematic domain decomposition and smaller proxy models…

AI Generated Robotic Content

Published by

AI Generated Robotic Content

Tags: ai/ml, faang

1 month ago
