Multimodal Large Language Models with Fusion Low Rank Adaptation for Device Directed Speech Detection

Although Large Language Models (LLMs) have shown promise for human-like conversations, they are primarily pre-trained on text data. Incorporating audio or video improves performance, but collecting large-scale multimodal data and pre-training multimodal LLMs is challenging. To this end, we propose a Fusion Low Rank Adaptation (FLoRA) technique that efficiently adapts a pre-trained unimodal LLM to consume new, previously unseen modalities via low-rank adaptation. For device-directed speech detection, using FLoRA, the multimodal LLM achieves a 22% relative reduction in equal error rate (EER) over…
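The abstract names low-rank adaptation as the mechanism that lets a frozen LLM take on a new modality cheaply. The sketch below illustrates only the generic LoRA idea, not the FLoRA fusion design, whose details are not given in this excerpt. A frozen weight `W` is left untouched, and a trainable update `B @ A` of rank `r` is added, scaled by `alpha / r`. The class name `LoRALinear`, the zero initialization of `B`, and the `alpha / r` scaling follow the common LoRA recipe and are assumptions here. Plain Python lists stand in for tensors so the example stays dependency-free.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical layer: frozen weight W (d_out x d_in) plus a trainable rank-r update B @ A."""

    def __init__(self, weight: Matrix, rank: int, alpha: float = 1.0, seed: int = 0):
        self.weight = weight  # pre-trained weight, kept frozen
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        # A (rank x d_in) gets a small random init; B (d_out x rank) starts at zero,
        # so the adapted layer initially behaves exactly like the frozen one.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)             # frozen path: W x
        delta = matvec(self.B, matvec(self.A, x))  # low-rank path: B (A x)
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self) -> int:
        """Only A and B are trained; W is not counted."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


if __name__ == "__main__":
    d = 64
    rng = random.Random(1)
    W = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d)]
    layer = LoRALinear(W, rank=4, alpha=8.0)
    x = [rng.uniform(-1, 1) for _ in range(d)]
    # With B at zero, the adapted output matches the frozen layer exactly.
    assert layer.forward(x) == matvec(W, x)
    print(layer.trainable_params(), d * d)  # 512 4096
```

The design choice is the point of the technique: for a `d x d` layer, the adapter trains `2 * r * d` values instead of `d * d`, which is why adapting a large pre-trained model this way is far cheaper than full fine-tuning or multimodal pre-training from scratch.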