Efficient Multimodal Neural Networks for Trigger-less Voice Assistants
Voice Assistants (VAs) are rapidly adopting multimodal interactions to enhance human-computer interaction. Smartwatches now incorporate trigger-less methods of invoking VAs, such as Raise To Speak (RTS), where the user raises their watch and speaks to the VA without an explicit trigger. Current state-of-the-art RTS systems rely on heuristics and engineered Finite …