Categories: FAANG

Multimodal Large Language Models with Fusion Low Rank Adaptation for Device Directed Speech Detection

Although Large Language Models (LLMs) have shown promise for human-like conversations, they are primarily pre-trained on text data. Incorporating audio or video improves performance, but collecting large-scale multimodal data and pre-training multimodal LLMs is challenging. To this end, we propose a Fusion Low Rank Adaptation (FLoRA) technique that efficiently adapts a pre-trained unimodal LLM to consume new, previously unseen modalities via low rank adaptation. For device-directed speech detection, using FLoRA, the multimodal LLM achieves 22% relative reduction in equal error rate (EER) over…
AI Generated Robotic Content

Recent Posts

Agentic Workflow vs. Autonomous Agent: What’s the Difference?

In this article, you will learn how to distinguish agentic workflows from autonomous agents by…

6 hours ago

Retrofit, don’t rebuild: Agentic overlays for transforming legacy enterprise services

The opinions expressed in this post are the authors’ views and not those of Cisco.…

6 hours ago

Anthropic Thinks Its Own Success Is Key to Making AI Safe

Anthropic's critics argue it's rapidly accumulating power. The company says that's what responsible AI development…

7 hours ago

Agentic AI bot helps scientists speak to robots, speeding up experiments

Researchers at the Department of Energy's Pacific Northwest National Laboratory use a slew of autonomous…

7 hours ago

Context Windows Are Not Memory: What AI Agent Developers Need to Understand

In this article, you will learn why a large context window is not the same…

1 day ago

Huntington Bank: Redacting sensitive data from 400M+ documents with AWS

When your document repository contains hundreds of millions of files accumulated over nearly a decade,…

1 day ago