Affine-based Deformable Attention and Selective Fusion for Semi-dense Matching

This paper was accepted at the Image Matching: Local Features & Beyond workshop at CVPR 2024.
Identifying robust and accurate correspondences across images is a fundamental problem in computer vision that enables various downstream tasks. Recent semi-dense matching methods emphasize the effectiveness of fusing relevant cross-view information through Transformers. In this paper, we propose several improvements to this paradigm. First, we introduce affine-based local attention to model cross-view deformations. Second, we present selective fusion to merge local and global messages from…
