4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities

Current multimodal and multitask foundation models like 4M or UnifiedIO show promising results, but in practice their out-of-the-box abilities to accept diverse inputs and perform diverse tasks are limited by the (usually rather small) number of modalities and tasks they are trained on. In this paper, we significantly expand upon the capabilities of 4M by training it on tens of highly diverse modalities and by performing co-training on large-scale multimodal datasets and text corpora. This includes training on several semantic and geometric modalities, feature maps from…
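The abstract is cut off above, but the core "any-to-any" recipe it refers to is to map every modality to discrete tokens with modality-specific tokenizers and to train one shared Transformer to predict the tokens of any target modality from any subset of input modalities. The sketch below is a minimal, hypothetical PyTorch illustration of that idea; the `ToyAnyToAny` class, vocabulary sizes, and sequence lengths are assumptions for illustration only and are not the 4M-21 code or API.

```python
# Minimal sketch of the "any-to-any" masked multimodal modeling idea behind 4M-style
# models. All names and sizes here are hypothetical, not the 4M-21 implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZES = {"rgb": 1024, "depth": 512, "caption": 32000}  # hypothetical per-modality tokenizer vocabularies
SEQ_LEN = {"rgb": 196, "depth": 196, "caption": 32}          # hypothetical tokens per modality
D_MODEL, MASK = 256, -1                                      # MASK marks positions the model must predict


class ToyAnyToAny(nn.Module):
    """One shared Transformer over a unified discrete-token space: any subset of
    modalities can be observed, any other subset is reconstructed from mask tokens."""

    def __init__(self):
        super().__init__()
        # Offsets place every modality's vocabulary into one shared embedding table.
        self.offsets, total = {}, 0
        for name, size in VOCAB_SIZES.items():
            self.offsets[name] = total
            total += size
        self.embed = nn.Embedding(total + 1, D_MODEL)         # +1 slot for the mask token
        self.mask_id = total
        self.modality_embed = nn.Embedding(len(VOCAB_SIZES), D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, total)

    def forward(self, tokens_by_modality):
        # Concatenate the provided modalities into one sequence of shared-vocab tokens.
        parts, mods = [], []
        for i, name in enumerate(VOCAB_SIZES):
            if name not in tokens_by_modality:
                continue
            toks = tokens_by_modality[name]
            shifted = torch.where(toks == MASK,
                                  torch.full_like(toks, self.mask_id),
                                  toks + self.offsets[name])
            parts.append(shifted)
            mods.append(torch.full_like(toks, i))
        x, m = torch.cat(parts, dim=1), torch.cat(mods, dim=1)
        h = self.encoder(self.embed(x) + self.modality_embed(m))
        return self.head(h)                                   # (batch, seq, shared vocab)


# One illustrative step: condition on one modality, reconstruct another.
model = ToyAnyToAny()
batch = {m: torch.randint(0, VOCAB_SIZES[m], (2, SEQ_LEN[m])) for m in VOCAB_SIZES}
src, tgt = "rgb", "caption"                                   # any pairing works ("any-to-any")
inputs = {src: batch[src], tgt: torch.full_like(batch[tgt], MASK)}
logits = model(inputs)
tgt_logits = logits[:, -SEQ_LEN[tgt]:, :]                     # masked modality was concatenated last here
loss = F.cross_entropy(tgt_logits.reshape(-1, tgt_logits.size(-1)),
                       (batch[tgt] + model.offsets[tgt]).reshape(-1))
print(loss.item())
```

Swapping which modalities are observed and which are masked at every training step is what lets the same network be steered at inference time toward depth from RGB, captioning, or any other input/output pairing.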