Google DeepMind’s new RT-2 system enables robots to perform novel tasks

Abstract robot AI being tested
Andriy Onufriyenko/Getty Images

As artificial intelligence advances, we look to a future with more robots and automation than ever before. They already surround us: the robot vacuum that can expertly navigate your home, the robot pet companion that entertains your furry friends, and the robot lawnmower that takes over weekend chores. We appear to be inching towards living out The Jetsons in real life. But as smart as they appear, these robots have their limitations.

Google DeepMind unveiled RT-2, the first vision-language-action (VLA) model for robot control, which effectively takes the robotics game several levels up. The system was trained on text and images from the internet, much like the large language models behind AI chatbots such as ChatGPT and Bing Chat.

Also: How researchers broke ChatGPT and what it could mean for future AI development

Our robots at home can handle the simple tasks they are programmed to perform: vacuum the floors, for example, and if the left-side sensor detects a wall, try to go around it. But traditional robotic control systems aren't programmed to handle new situations and unexpected changes, and often they can't perform more than one task at a time.
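
To make the contrast concrete, here is a minimal sketch of what that kind of hand-coded, reactive control logic looks like. It is an illustration only: the Robot class and its sensor and motor methods are hypothetical placeholders, not any real vacuum's API.

```python
import random


class Robot:
    """Hypothetical stand-in for a home robot's sensor/motor interface."""

    def left_sensor_blocked(self) -> bool:
        # Simulate occasionally detecting a wall on the left side.
        return random.random() < 0.2

    def turn_right(self) -> None:
        print("turning right to go around the obstacle")

    def move_forward(self) -> None:
        print("moving forward")


def vacuum_step(robot: Robot) -> None:
    # One fixed rule per situation the programmer anticipated;
    # anything outside these branches is simply not handled.
    if robot.left_sensor_blocked():
        robot.turn_right()
    else:
        robot.move_forward()


if __name__ == "__main__":
    bot = Robot()
    for _ in range(5):
        vacuum_step(bot)
```

Every behavior has to be anticipated and written out as an explicit rule; a situation the programmer never imagined simply falls through the cracks.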

RT-2 is designed to adapt to new situations over time, learn from multiple data sources like the web and robotics data to understand both language and visual input, and perform tasks it has never encountered nor been trained to perform.

“A visual-language model (VLM) pre-trained on web-scale data is learning from RT-1 robotics data to become RT-2, a visual-language-action (VLA) model that can control a robot,” according to Google DeepMind.

Google DeepMind

A traditional robot trained to pick up a ball may stumble when asked to pick up a cube. RT-2's more flexible approach lets a robot that trained on picking up a ball figure out how to adjust its extremities to pick up a cube, or another toy it has never seen before.

Traditional robots require time-consuming, real-world training on billions of data points, physically encountering each object and learning how to pick it up. RT-2 instead trains on a large corpus of web and robotics data and transfers that knowledge into action, performing tasks it has never experienced before.

Also: Can AI detectors save us from ChatGPT? I tried 5 online tools to find out

“RT-2’s ability to transfer information to actions shows promise for robots to more rapidly adapt to novel situations and environments,” said Vincent Vanhoucke, Google DeepMind’s head of robotics. “In testing RT-2 models in more than 6,000 robotic trials, the team found that RT-2 functioned as well as our previous model, RT-1, on tasks in its training data, or ‘seen’ tasks. And it almost doubled its performance on novel, unseen scenarios to 62% from RT-1’s 32%.”

Some examples of RT-2 at work, as published by Google DeepMind.

Google DeepMind/ZDNET

The DeepMind team adapted two existing models, Pathways Language and Image Model (PaLI-X) and Pathways Language Model Embodied (PaLM-E), to train RT-2. PaLI-X handles the model's visual processing; it was trained on massive amounts of images and visual information paired with corresponding descriptions and labels found online. With PaLI-X, RT-2 can recognize different objects, understand its surrounding scenes for context, and relate visual data to semantic descriptions.
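
PaLI-X itself isn't available as an off-the-shelf package, but the idea of relating visual data to semantic descriptions can be demonstrated with CLIP, an open vision-language model, via the Hugging Face transformers library. This is an analogy for illustration, not the pipeline DeepMind used; the image path is a hypothetical stand-in for a robot's camera frame.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("scene.jpg")  # hypothetical robot camera frame
labels = ["a ball on a table", "a cube on a table", "an empty table"]

# Score how well each candidate description matches the image.
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)
print(dict(zip(labels, probs[0].tolist())))
```

The model scores how well each candidate description matches the frame, which is the same kind of visual-semantic matching RT-2 inherits from its vision-language backbone.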

PaLM-E helps RT-2 interpret language, so it can easily understand instructions and relate them to what is around it and what it’s currently doing. 

Also: The best AI chatbots

By adapting these two models to work as the backbone for RT-2, the DeepMind team created the new VLA model, which enables a robot to understand language and visual data and then generate the appropriate actions.
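
DeepMind has described the key mechanism as representing robot actions in the same token format the model uses for text, so that acting becomes a form of language generation. Below is a simplified, hypothetical sketch of that loop: the model.generate interface and the value ranges are assumptions for illustration, though the eight-integer action string follows the kind of example DeepMind has published.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """One low-level robot command, decoded from the model's token string."""
    dx: float      # end-effector translation
    dy: float
    dz: float
    droll: float   # end-effector rotation
    dpitch: float
    dyaw: float
    gripper: float  # 0.0 = open, 1.0 = closed
    terminate: bool  # episode finished?


def detokenize(token_string: str) -> Action:
    # RT-2 is reported to emit actions as strings of integers, e.g.
    # "1 128 91 241 5 101 127 217". Here each integer indexes one of 256
    # uniform bins over a dimension's range; the ranges are illustrative
    # assumptions, not DeepMind's actual calibration.
    terminate, *bins = [int(t) for t in token_string.split()]

    def scale(b: int, lo: float, hi: float) -> float:
        return lo + (b / 255.0) * (hi - lo)

    return Action(
        dx=scale(bins[0], -0.1, 0.1),
        dy=scale(bins[1], -0.1, 0.1),
        dz=scale(bins[2], -0.1, 0.1),
        droll=scale(bins[3], -1.0, 1.0),
        dpitch=scale(bins[4], -1.0, 1.0),
        dyaw=scale(bins[5], -1.0, 1.0),
        gripper=bins[6] / 255.0,
        terminate=bool(terminate),
    )


def control_step(model, camera_image, instruction: str) -> Action:
    # To the VLA, an action is just more text: it "reads" the image and
    # instruction and "writes" a token string, which we decode into motion.
    # model.generate is a hypothetical interface for this sketch.
    token_string = model.generate(image=camera_image, prompt=instruction)
    return detokenize(token_string)
```

Decoding the token string back into motor commands is what turns the language model's "words" into physical behavior.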

RT-2 is not a robot in itself; it's a model that can control robots more efficiently than ever before. An RT-2-enabled robot can perform tasks of varying complexity using visual and language data, like organizing files alphabetically by reading the labels on the documents, sorting them, and putting them away in the correct places.

It could also handle more complex tasks. For instance, if you said, “I need to mail this package, but I’m out of stamps,” RT-2 could work out what needs to be done first, such as finding a nearby post office or merchant that sells stamps, then take the package and handle the logistics from there.

Also: What is Google Bard? Here’s everything you need to know

“Not only does RT-2 show how advances in AI are cascading rapidly into robotics, it shows enormous promise for more general-purpose robots,” Vanhoucke added. 

Let’s hope that ‘promise’ leans more towards living out The Jetsons’ plot than The Terminator’s. 
