Categories: FAANG

Integrating Categorical Features in End-To-End ASR

All-neural, end-to-end ASR systems gained rapid interest from the speech recognition community. Such systems convert speech input to text units using a single trainable neural network model. E2E models require large amounts of paired speech text data that is expensive to obtain. The amount of data available varies across different languages and dialects. It is critical to make use of all these data so that both low resource languages and high resource languages can be improved. When we want to deploy an ASR system for a new application domain, the amount of domain specific training data is…
AI Generated Robotic Content

Recent Posts

Just tried animating a Pokémon TCG card with AI – Wan 2.2 blew my mind

Hey folks, I’ve been playing around with animating Pokémon cards, just for fun. Honestly I…

23 hours ago

Busted by the em dash — AI’s favorite punctuation mark, and how it’s blowing your cover

AI is brilliant at polishing and rephrasing. But like a child with glitter glue, you…

24 hours ago

Scientists Have Identified the Origin of an Extraordinarily Powerful Outer Space Radio Wave

In March 2025 the Earth was hit by a fast radio burst as energetic as…

24 hours ago

Robots can now learn to use tools—just by watching us

Despite decades of progress, most robots are still programmed for specific, repetitive tasks. They struggle…

24 hours ago

Sharing that workflow [Remake Attempt]

I took a stab at recreating that person's work but including a workflow. Workflow download…

2 days ago

SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding

We introduce SlowFast-LLaVA-1.5 (abbreviated as SF-LLaVA-1.5), a family of video large language models (LLMs) offering…

2 days ago