
Multimodal Large Language Models with Fusion Low Rank Adaptation for Device Directed Speech Detection

Although Large Language Models (LLMs) have shown promise for human-like conversations, they are primarily pre-trained on text data. Incorporating audio or video improves performance, but collecting large-scale multimodal data and pre-training multimodal LLMs is challenging. To this end, we propose a Fusion Low Rank Adaptation (FLoRA) technique that efficiently adapts a pre-trained unimodal LLM to consume new, previously unseen modalities via low rank adaptation. For device-directed speech detection, using FLoRA, the multimodal LLM achieves a 22% relative reduction in equal error rate (EER) over…
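The low-rank adaptation idea at the core of FLoRA can be sketched as a frozen pre-trained linear layer plus a small trainable rank-r update. The snippet below is a minimal, generic LoRA layer in PyTorch for illustration only; it is not the paper's actual FLoRA implementation, and the class name, rank, and scaling hyperparameters are assumptions.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer with a trainable low-rank update: y = Wx + (alpha/r) * B A x.

    Illustrative sketch only. FLoRA fuses such adapters so a pre-trained
    unimodal LLM can consume new modalities; the exact fusion scheme is
    not described in the excerpt above.
    """

    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        # Stand-in for a pre-trained projection; its weights stay frozen.
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)
        self.base.bias.requires_grad_(False)
        # Low-rank factors: A is small random, B is zero-initialized so the
        # adapter starts as an identity perturbation (no change at step 0).
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)
```

Because only `lora_a` and `lora_b` require gradients, fine-tuning touches a tiny fraction of the parameters, which is what makes adapting a large pre-trained model to a new modality tractable without full multimodal pre-training.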
