Multimodal Large Language Models with Fusion Low Rank Adaptation for Device Directed Speech Detection

Although Large Language Models (LLMs) have shown promise for human-like conversations, they are primarily pre-trained on text data. Incorporating audio or video improves performance, but collecting large-scale multimodal data and pre-training multimodal LLMs is challenging. To this end, we propose a Fusion Low Rank Adaptation (FLoRA) technique that efficiently adapts a pre-trained unimodal LLM to consume new, previously unseen modalities via low rank adaptation. For device-directed speech detection, using FLoRA, the multimodal LLM achieves 22% relative reduction in equal error rate (EER) over…
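The core mechanism the abstract describes is low rank adaptation: a frozen pre-trained weight is augmented with a trainable low-rank update, so the LLM can learn to consume a new modality without full retraining. Below is a minimal, stdlib-only sketch of such an adapter layer. All names (`LoRALinear`, `matvec`, the dimensions) are illustrative assumptions, not the paper's actual FLoRA implementation.

```python
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

class LoRALinear:
    """Frozen base weight W plus a trainable low-rank update B @ A.

    Output: y = W x + (alpha / rank) * B (A x)
    B is zero-initialized, so the layer starts out identical to the
    frozen base and the adapter is learned on top of it.
    """
    def __init__(self, d_out, d_in, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        # Frozen pre-trained weight (random here, for illustration only).
        self.W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable low-rank factors: A projects down, B projects back up.
        self.A = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def __call__(self, x):
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]
```

In a fusion setting, an adapter like this could project features from a new modality (e.g. audio embeddings) into the frozen LLM's token embedding space; only the small `A` and `B` matrices are trained, which is what makes the adaptation parameter-efficient.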