
Prompting Whisper for Improved Verbatim Transcription and End-to-end Miscue Detection

Identifying mistakes (i.e., miscues) made while reading aloud is commonly approached post-hoc by comparing automatic speech recognition (ASR) transcriptions to the target reading text. However, post-hoc methods perform poorly when ASR inaccurately transcribes verbatim speech. To improve on current methods for reading error annotation, we propose a novel end-to-end architecture that incorporates the target reading text via prompting and is trained for both improved verbatim transcription and direct miscue detection. Our contributions include: first, demonstrating that…
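
To make the post-hoc baseline concrete, here is a minimal sketch of comparing an ASR transcript to the target reading text. It is not the end-to-end architecture proposed in the paper. The function names `normalize` and `posthoc_miscues`, the label names, and the use of Python's standard `difflib` for word alignment are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

PUNCTUATION = ".,;:!?\"'"

def normalize(text):
    """Lowercase and strip surrounding punctuation so only words are compared."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]

def posthoc_miscues(target_text, asr_transcript):
    """Align transcript words against the target text and label the differences.

    Hypothetical helper: returns (label, target_words, spoken_words) tuples.
    """
    target = normalize(target_text)
    spoken = normalize(asr_transcript)
    matcher = SequenceMatcher(a=target, b=spoken, autojunk=False)
    labels = {"replace": "substitution", "delete": "omission", "insert": "insertion"}
    miscues = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # words read as written
        miscues.append((labels[tag], target[i1:i2], spoken[j1:j2]))
    return miscues

if __name__ == "__main__":
    for miscue in posthoc_miscues("The cat sat on the mat.", "the cat sit on mat"):
        print(miscue)
```

A comparison like this can only be as good as the transcript it receives: if the ASR system silently corrects "sit" back to "sat", the substitution never reaches the alignment step, which is the weakness of post-hoc methods that the abstract points to.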