Swin Transformer supports vision models of up to 3 billion parameters that can be trained on higher-resolution images for greater task applicability
Swin Transformer, a Transformer-based general-purpose vision architecture, has been developed further to address challenges specific to large vision models. As a result, it can be trained on higher-resolution images, which broadens the range of tasks it applies to (left), and it scales to models of up to 3 billion parameters (right). Early last year, our research team from …