
RT-2: New model translates vision and language into action

Introducing Robotic Transformer 2 (RT-2), a novel vision-language-action (VLA) model that learns from both web and robotics data and translates this knowledge into generalised instructions for robotic control, while retaining web-scale capabilities. This work builds upon Robotic Transformer 1 (RT-1), a model trained on multi-task demonstrations, which can learn combinations of tasks and objects seen in the robotic data. RT-2 shows improved generalisation, along with semantic and visual understanding that extends beyond the robotic data it was exposed to. This includes interpreting new commands and responding to user commands by performing rudimentary reasoning, such as reasoning about object categories or high-level descriptions.
