Categories: FAANG

Prompting Whisper for Improved Verbatim Transcription and End-to-end Miscue Detection

*Equal Contributors
Identifying mistakes (i.e., miscues) made while reading aloud is commonly approached post-hoc by comparing automatic speech recognition (ASR) transcriptions to the target reading text. However, post-hoc methods perform poorly when ASR inaccurately transcribes verbatim speech. To improve on current methods for reading error annotation, we propose a novel end-to-end architecture that incorporates the target reading text via prompting and is trained for both improved verbatim transcription and direct miscue detection. Our contributions include: first, demonstrating that…
AI Generated Robotic Content

Recent Posts

MiniMax-M2 is the new king of open source LLMs (especially for agentic tool calling)

Watch out, DeepSeek and Qwen! There's a new king of open source large language models…

46 mins ago

Elon Musk’s Grokipedia Pushes Far-Right Talking Points

The new AI-powered Wikipedia competitor falsely claims that pornography worsened the AIDS epidemic and that…

46 mins ago

Beyond electronics: Optical system performs feature extraction with unprecedented low latency

Many modern artificial intelligence (AI) applications, such as surgical robotics and real-time financial trading, depend…

46 mins ago

Chroma Radiance, Mid training but the most aesthetic model already imo

submitted by /u/Different_Fix_2217 [link] [comments]

24 hours ago

From human clicks to machine intent: Preparing the web for agentic AI

For three decades, the web has been designed with one audience in mind: People. Pages…

1 day ago

Best GoPro Camera (2025): Compact, Budget, Accessories

You’re an action hero, and you need a camera to match. We guide you through…

1 day ago