
MM-Ego: Towards Building Egocentric Multimodal LLMs

This research aims to comprehensively explore building a multimodal foundation model for egocentric video understanding. To achieve this goal, we work on three fronts. First, because QA data for egocentric video understanding is scarce, we automatically generate 7M high-quality QA samples from human-annotated data for egocentric videos in Ego4D, ranging from 30 seconds to one hour in length. This is one of the largest egocentric QA datasets. Second, we contribute a challenging egocentric QA benchmark with 629 videos and 7,026 questions to evaluate the models’ ability in recognizing and…
AI Generated Robotic Content
