Categories: FAANG

Prompting Whisper for Improved Verbatim Transcription and End-to-end Miscue Detection

*Equal Contributors
Identifying mistakes (i.e., miscues) made while reading aloud is commonly approached post-hoc by comparing automatic speech recognition (ASR) transcriptions to the target reading text. However, post-hoc methods perform poorly when ASR inaccurately transcribes verbatim speech. To improve on current methods for reading error annotation, we propose a novel end-to-end architecture that incorporates the target reading text via prompting and is trained for both improved verbatim transcription and direct miscue detection. Our contributions include: first, demonstrating that…
AI Generated Robotic Content

Recent Posts

10 Ways to Use Embeddings for Tabular ML Tasks

Embeddings — vector-based numerical representations of typically unstructured data like text — have been primarily…

1 min ago

Over-Searching in Search-Augmented Large Language Models

Search-augmented large language models (LLMs) excel at knowledge-intensive tasks by integrating external retrieval. However, they…

1 min ago

How Omada Health scaled patient care by fine-tuning Llama models on Amazon SageMaker AI

This post is co-written with Sunaina Kavi, AI/ML Product Manager at Omada Health. Omada Health,…

1 min ago

Anthropic launches Cowork, a Claude Desktop agent that works in your files — no coding required

Anthropic released Cowork on Monday, a new AI agent capability that extends the power of…

1 hour ago

New Proposed Legislation Would Let Self-Driving Cars Operate in New York State

New York governor Kathy Hochul says she will propose a new law allowing limited autonomous…

1 hour ago

From brain scans to alloys: Teaching AI to make sense of complex research data

Artificial intelligence (AI) is increasingly used to analyze medical images, materials data and scientific measurements,…

1 hour ago