Categories: FAANG

Robustness in Multimodal Learning under Train-Test Modality Mismatch

Multimodal learning is defined as learning over multiple heterogeneous input modalities such as video, audio, and text. In this work, we are concerned with understanding how models behave as the type of modalities differ between training and deployment, a situation that naturally arises in many applications of multimodal learning to hardware platforms. We present a multimodal robustness framework to provide a systematic analysis of common multimodal representation learning methods. Further, we identify robustness short-comings of these approaches and propose two intervention techniques leading…

Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis

The rapid progress of foundation models and large language models (LLMs) has fueled significantly improvement in the capabilities of machine learning systems that benefit from mutlimodal input data. However, existing multimodal models are predominantly built on top of pre-trained LLMs, which can limit accurate modeling of temporal dependencies across other…

July 14, 2025

In "FAANG"

Scaling Laws for Native Multimodal Models

Building general-purpose models that can effectively perceive the world through multimodal signals has been a long-standing goal. Current approaches involve integrating separately pre-trained components, such as connecting vision encoders to LLMs and continuing multimodal training. While such approaches exhibit remarkable sample efficiency, it remains an open question whether such late-fusion…

April 17, 2025

In "FAANG"

SeMAnD: Self-Supervised Anomaly Detection in Multimodal Geospatial Datasets

*= Equal Contributors We propose a Self-supervised Anomaly Detection technique, called SeMAnD, to detect geometric anomalies in Multimodal geospatial datasets. Geospatial data comprises acquired and derived heterogeneous data modalities that we transform to semantically meaningful, image-like tensors to address the challenges of representation, alignment, and fusion of multimodal data. SeMAnD…

November 9, 2023

In "FAANG"

AI Generated Robotic Content

Next Generative AI for Insurance: How Insurers Are Using It »

Previous « Accelerating AI & Innovation: the future of banking depends on core modernization

Published by

AI Generated Robotic Content

Tags: ai/mlfaang

3 years ago

Brad Pitt casts Elliot for Achilles – an Ai acting performance experiment

I am putting most of my efforts to achieve more realistic Ai acting with natural…

1 hour ago

AI/ML News

New light-based switch could cut chip energy use and speed future AI photonics

Photonic devices are hardware systems that can process information using light instead of electricity. These…

2 hours ago

Image

Microsoft Lens First Tests: It’s Pretty Decent! – ComfyUI Native Support About to Be Merged

Model weights: https://huggingface.co/Comfy-Org/Lens PR: https://github.com/Comfy-Org/ComfyUI/pull/14077 You'll need to git the merge pull request if you're…

1 day ago