Prompting Whisper for Improved Verbatim Transcription and End-to-end Miscue Detection

*Equal Contributors
Identifying mistakes (i.e., miscues) made while reading aloud is commonly approached post-hoc by comparing automatic speech recognition (ASR) transcriptions to the target reading text. However, post-hoc methods perform poorly when ASR inaccurately transcribes verbatim speech. To improve on current methods for reading error annotation, we propose a novel end-to-end architecture that incorporates the target reading text via prompting and is trained for both improved verbatim transcription and direct miscue detection. Our contributions include: first, demonstrating that…
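To make the post-hoc baseline concrete, here is a minimal sketch of comparing an ASR transcript against the target reading text. It illustrates only the conventional approach the abstract contrasts against, not the proposed end-to-end architecture. The function name `label_miscues`, the label names, and the whitespace tokenization are assumptions made for illustration, and the alignment uses Python's standard `difflib` rather than any tool named in the source.

```python
from difflib import SequenceMatcher


def label_miscues(target_text: str, asr_transcript: str):
    """Align an ASR transcript to the target text and label each word.

    Hypothetical helper: returns (word, label) pairs, where label is one of
    "correct", "substitution", "omission", or "insertion".
    """
    # Naive normalization and tokenization (an assumption, not from the source).
    target = target_text.lower().split()
    spoken = asr_transcript.lower().split()

    matcher = SequenceMatcher(a=target, b=spoken, autojunk=False)
    labels = []
    for op, t0, t1, s0, s1 in matcher.get_opcodes():
        if op == "equal":
            labels += [(w, "correct") for w in target[t0:t1]]
        elif op == "replace":
            # Target words that were read as something else.
            labels += [(w, "substitution") for w in target[t0:t1]]
        elif op == "delete":
            # Target words missing from the transcript.
            labels += [(w, "omission") for w in target[t0:t1]]
        elif op == "insert":
            # Spoken words with no counterpart in the target text.
            labels += [(w, "insertion") for w in spoken[s0:s1]]
    return labels


if __name__ == "__main__":
    for word, label in label_miscues("the cat sat on the mat",
                                     "the cat sit on mat"):
        print(f"{word}\t{label}")
```

Because the labels are derived entirely from the transcript, any ASR error (for example, the recognizer silently "correcting" a misread word to the expected one) propagates directly into the miscue labels, which is the weakness of post-hoc methods the abstract points to.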