Categories: FAANG

Multimodal Large Language Models with Fusion Low Rank Adaptation for Device Directed Speech Detection

Although Large Language Models (LLMs) have shown promise for human-like conversations, they are primarily pre-trained on text data. Incorporating audio or video improves performance, but collecting large-scale multimodal data and pre-training multimodal LLMs is challenging. To this end, we propose a Fusion Low Rank Adaptation (FLoRA) technique that efficiently adapts a pre-trained unimodal LLM to consume new, previously unseen modalities via low rank adaptation. For device-directed speech detection, using FLoRA, the multimodal LLM achieves a 22% relative reduction in equal error rate (EER) over…
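The paper's exact FLoRA architecture is not given in this excerpt, but the core idea it builds on — low-rank adaptation, where a frozen pre-trained weight is augmented with a small trainable low-rank update — can be sketched as follows. Here a hypothetical audio feature is projected into the LLM's embedding space; the base projection `W` stays frozen, and only the low-rank factors `A` and `B` (all dimensions and the `alpha` scale are illustrative assumptions, not values from the paper) would be trained:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, for illustration only
d_audio, d_model, r = 80, 512, 8

# Frozen base projection into the LLM's embedding space
W = rng.standard_normal((d_model, d_audio)) * 0.02

# Trainable low-rank factors: only r * (d_audio + d_model) new parameters
A = rng.standard_normal((r, d_audio)) * 0.02  # down-projection
B = np.zeros((d_model, r))                    # up-projection, zero-initialized so
                                              # training starts from the base model

def lora_project(x, alpha=16.0):
    """Frozen path plus scaled low-rank correction: W x + (alpha/r) * B A x."""
    return W @ x + (alpha / r) * (B @ (A @ x))

audio_frame = rng.standard_normal(d_audio)
h = lora_project(audio_frame)
assert h.shape == (d_model,)
```

With `B` zero-initialized, the adapted projection initially matches the frozen path exactly, so adaptation starts from the pre-trained model's behavior — a standard low-rank-adaptation convention, which keeps the number of trainable parameters small relative to fine-tuning `W` itself.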
