Categories: FAANG

Prompting Whisper for Improved Verbatim Transcription and End-to-end Miscue Detection

*Equal Contributors
Identifying mistakes (i.e., miscues) made while reading aloud is commonly approached post-hoc by comparing automatic speech recognition (ASR) transcriptions to the target reading text. However, post-hoc methods perform poorly when ASR inaccurately transcribes verbatim speech. To improve on current methods for reading error annotation, we propose a novel end-to-end architecture that incorporates the target reading text via prompting and is trained for both improved verbatim transcription and direct miscue detection. Our contributions include: first, demonstrating that…
AI Generated Robotic Content

Recent Posts

Start Your Surround Sound Journey With $50 off This Klipsch Soundbar

This soundbar is just the beginning, with the option to add wireless bookshelf speakers or…

3 mins ago

Researchers pioneer next-generation AI semiconductors with ‘thermal constraining’ technique

A research team led by Professor Taesung Kim from the School of Mechanical Engineering at…

3 mins ago

3 Months later – Proof of concept for making comics with Krita AI and other AI tools

Some folks might remember this post I made a few short months ago where I…

23 hours ago

NASA Delays Launch of Artemis II Lunar Mission Once Again

A failure in the helium flow of the SLS rocket has prompted NASA to delay…

1 day ago

Jailbreaking the matrix: How researchers are bypassing AI guardrails to make them safer

A paper written by University of Florida Computer & Information Science & Engineering, or CISE,…

1 day ago

Turns out LTX-2 makes a very good video upscaler for WAN

I have had a lot of fun with LTX but for a lot of usecases…

2 days ago