Categories: FAANG

4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities

*Equal Contributors
Current multimodal and multitask foundation models like 4M or UnifiedIO show promising results, but in practice their out-of-the-box abilities to accept diverse inputs and perform diverse tasks are limited by the (usually rather small) number of modalities and tasks they are trained on. In this paper, we significantly expand upon the capabilities of 4M by training it on tens of highly diverse modalities and by performing co-training on large-scale multimodal datasets and text corpora. This includes training on several semantic and geometric modalities, feature maps from…
AI Generated Robotic Content

Recent Posts

Unintended consequences: U.S. election results herald reckless AI development

AI accelerationists have won as a consequence of the election, potentially sidelining those advocating for…

3 mins ago

L’Oreal Professionnel AirLight Pro Review: Faster, Lighter, and Repairable

L'Oréal's first professional hair dryer combines infrared light, wind, and heat to drastically reduce your…

3 mins ago

Can “Safe AI” Companies Survive in an Unrestrained AI Landscape?

TL;DR A conversation with 4o about the potential demise of companies like Anthropic. As artificial…

23 hours ago

Large language overkill: How SLMs can beat their bigger, resource-intensive cousins

Whether a company begins with a proof-of-concept or live deployment, they should start small, test…

1 day ago

14 Best Planners: Weekly and Daily Notebooks & Accessories (2024)

Digital tools are not always superior. Here are some WIRED-tested agendas and notebooks to keep…

1 day ago

5 Tools for Visualizing Machine Learning Models

Machine learning (ML) models are built upon data.

2 days ago