Affine-based Deformable Attention and Selective Fusion for Semi-dense Matching
This paper was accepted at the Image Matching: Local Features & Beyond workshop at CVPR 2024. Identifying robust and accurate correspondences across images is a fundamental problem in computer vision that enables various downstream tasks. Recent semi-dense matching methods emphasize the effectiveness of fusing relevant cross-view information through Transformers. In this paper, we propose several improvements upon this paradigm. Firstly, we introduce affine-based local attention to model cross-view deformations. Secondly, we present selective fusion to merge local and global messages from…
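The core idea of affine-based local attention, attending over a local window whose shape is deformed by an affine transform instead of a fixed square, can be illustrated with a small sketch. This is a generic illustration, not the authors' method: the names `affine_window`, `bilinear`, and `local_attention`, the per-query 2x2 matrix `A` and offset `t`, and the choice to reuse keys as values are all hypothetical assumptions made to keep the example short. Selective fusion is not modeled.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[List[float]]]  # H x W x C feature map


def affine_window(center: Tuple[float, float], A: Sequence[Sequence[float]],
                  t: Tuple[float, float], radius: int = 1) -> List[Tuple[float, float]]:
    """(2r+1)^2 sampling positions around `center`, warped by hypothetical affine A and offset t."""
    cx, cy = center
    pts = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x = cx + A[0][0] * dx + A[0][1] * dy + t[0]
            y = cy + A[1][0] * dx + A[1][1] * dy + t[1]
            pts.append((x, y))
    return pts


def bilinear(feat: Grid, x: float, y: float) -> List[float]:
    """Bilinearly sample the feature map at (x, y), clamping coordinates to the border."""
    H, W = len(feat), len(feat[0])
    x = min(max(x, 0.0), W - 1.0)
    y = min(max(y, 0.0), H - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    return [
        (1 - wx) * (1 - wy) * feat[y0][x0][c] + wx * (1 - wy) * feat[y0][x1][c]
        + (1 - wx) * wy * feat[y1][x0][c] + wx * wy * feat[y1][x1][c]
        for c in range(len(feat[0][0]))
    ]


def local_attention(query: List[float], feat: Grid, center: Tuple[float, float],
                    A: Sequence[Sequence[float]], t: Tuple[float, float],
                    radius: int = 1) -> List[float]:
    """Scaled dot-product softmax attention of one query over the affine-warped window."""
    keys = [bilinear(feat, x, y) for x, y in affine_window(center, A, t, radius)]
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Keys double as values in this sketch.
    return [sum(w * key[c] for w, key in zip(weights, keys)) for c in range(len(query))]
```

With `A` set to the identity and `t` to zero, the window reduces to an ordinary square neighborhood; a scaled or sheared `A` stretches the window to follow a local deformation between the two views.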
In embedding-matching acoustic-to-word (A2W) ASR, every word in the vocabulary is represented by a fixed-dimension embedding vector that can be added or removed independently of the rest of the system. The approach is potentially an elegant solution to the dynamic out-of-vocabulary (OOV) word problem, where speaker- and context-dependent named entities…
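The property described above, that each word is an independent row in an embedding table, can be sketched as follows. This is a minimal illustration under stated assumptions, not the system the abstract describes: the class `EmbeddingVocab`, its methods, and the use of a plain dot product as the matching score are hypothetical choices made for the example.

```python
from typing import Dict, List, Optional


class EmbeddingVocab:
    """Hypothetical word-embedding table for embedding-matching A2W decoding."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.table: Dict[str, List[float]] = {}

    def add_word(self, word: str, vec: List[float]) -> None:
        # Adding a word only inserts its own row; nothing else in the system changes.
        if len(vec) != self.dim:
            raise ValueError("embedding has the wrong dimension")
        self.table[word] = list(vec)

    def remove_word(self, word: str) -> None:
        # Removing a word likewise touches only its own row.
        self.table.pop(word, None)

    def match(self, acoustic: List[float]) -> Optional[str]:
        """Return the word whose embedding has the largest dot product with `acoustic`."""
        best, best_score = None, float("-inf")
        for word, vec in self.table.items():
            score = sum(a * b for a, b in zip(acoustic, vec))
            if score > best_score:
                best, best_score = word, score
        return best


vocab = EmbeddingVocab(dim=3)
vocab.add_word("hello", [1.0, 0.0, 0.0])
vocab.add_word("world", [0.0, 1.0, 0.0])
print(vocab.match([0.9, 0.2, 0.0]))  # prints: hello
```

A speaker-specific named entity could then be registered with a single `add_word` call at inference time and dropped again with `remove_word`, without retraining or re-exporting the acoustic model.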
Recent isotropic networks, such as ConvMixer and vision transformers, have found significant success across visual recognition tasks, matching or outperforming non-isotropic convolutional neural networks (CNNs). Isotropic architectures are particularly well-suited to cross-layer weight sharing, an effective neural network compression technique. In this paper, we perform an empirical evaluation on methods…
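Why isotropic architectures suit cross-layer weight sharing can be shown with a toy stack: because every block maps the same width to the same width, one block's parameters can be reused at every depth. The sketch below is a hypothetical illustration, not any of the methods the paper evaluates; `Block`, `build_stack`, `unique_params`, and the residual linear-plus-ReLU block standing in for a ConvMixer or ViT block are all assumptions.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """Hypothetical isotropic block: maps dim -> dim, so any block can stand in for any other."""
    weight: List[List[float]]
    bias: List[float]

    def num_params(self) -> int:
        return len(self.weight) * len(self.weight[0]) + len(self.bias)

    def __call__(self, x: List[float]) -> List[float]:
        # Residual connection around a linear map followed by ReLU.
        return [xi + max(0.0, sum(w * v for w, v in zip(row, x)) + b)
                for xi, row, b in zip(x, self.weight, self.bias)]


def make_block(dim: int, rng: random.Random) -> Block:
    weight = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]
    return Block(weight, [0.0] * dim)


def build_stack(depth: int, dim: int, share: bool, seed: int = 0) -> List[Block]:
    rng = random.Random(seed)
    if share:
        # Cross-layer sharing: one parameter set referenced at every depth.
        shared = make_block(dim, rng)
        return [shared] * depth
    return [make_block(dim, rng) for _ in range(depth)]


def unique_params(stack: List[Block]) -> int:
    # Count each distinct Block object once, so shared weights are not double-counted.
    unique = {id(b): b for b in stack}
    return sum(b.num_params() for b in unique.values())


def forward(stack: List[Block], x: List[float]) -> List[float]:
    for block in stack:
        x = block(x)
    return x
```

For a depth-8 stack of width 4, each block holds 4 x 4 + 4 = 20 parameters, so the unshared stack stores 160 while the shared stack stores 20, a reduction equal to the depth. In a non-isotropic CNN, where widths change between stages, blocks generally have different shapes and cannot be swapped this directly.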
Protein folding models have achieved groundbreaking results since the introduction of AlphaFold2, typically built by integrating domain expertise into their architectural designs and training pipelines. Nonetheless, given the success of generative models across different but related problems, it is natural to question whether these architectural designs are a…