Efficient Video-Text Learning with Iterative Co-tokenization
Posted by AJ Piergiovanni and Anelia Angelova, Research Scientists, Google Research, Brain Team
Video is a ubiquitous source of media content that touches on many aspects of people’s day-to-day lives. Increasingly, real-world video applications, such as video captioning, video content analysis, and video question-answering (VideoQA), rely on models that can connect video content with text …