Less Is More: A Unified Architecture for Device-Directed Speech Detection with Multiple Invocation Types
Suppressing unintended invocation of the device because of the speech that sounds like wake-word, or accidental button presses, is critical for a good user experience, and is referred to as False-Trigger-Mitigation (FTM). In case of multiple invocation options, the traditional approach to FTM is to use invocation-specific models, or a single model for all invocations. …