Multimodal Large Language Models with Fusion Low Rank Adaptation for Device Directed Speech Detection

Although Large Language Models (LLMs) have shown promise for human-like conversations, they are primarily pre-trained on text data. Incorporating audio or video improves performance, but collecting large-scale multimodal data and pre-training multimodal LLMs is challenging. To this end, we propose a Fusion Low Rank Adaptation (FLoRA) technique that efficiently adapts a pre-trained unimodal LLM to consume new, previously unseen modalities via low rank adaptation. For device-directed speech detection, using FLoRA, the multimodal LLM achieves 22% relative reduction in equal error rate (EER) over…
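At its core, the low-rank adaptation used here follows the standard LoRA formulation: a frozen pre-trained weight matrix W is augmented with a trainable low-rank update B·A, scaled by alpha/r, so that only the small adapter matrices are trained. The sketch below illustrates that formulation in NumPy; the shapes, scaling constants, and zero-initialization of B are standard LoRA conventions, not details taken from the FLoRA paper itself.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16.0, r=4):
    """Frozen linear layer W plus a trainable low-rank update B @ A.

    W: (d_out, d_in)  frozen pre-trained weight
    A: (r, d_in)      trainable down-projection (random init)
    B: (d_out, r)     trainable up-projection (zero init)
    """
    return W @ x + (alpha / r) * (B @ (A @ x))

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 6, 4
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in))
B = np.zeros((d_out, r))   # zero init: the adapter starts as a no-op
x = rng.standard_normal(d_in)

# With B = 0 the adapted layer reproduces the frozen model exactly,
# so pre-trained behaviour is preserved before any adapter training.
assert np.allclose(W @ x, lora_forward(x, W, A, B))
```

Because B starts at zero, the multimodal adapter initially leaves the unimodal LLM unchanged, and training only the small A and B matrices is what makes adapting to a new modality parameter-efficient.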