Machines Learn Better if We Teach Them the Basics

Imagine that your neighbor calls to ask a favor: Could you please feed their pet rabbit some carrot slices? Easy enough, you’d think. You can imagine their kitchen, even if you’ve never been there — carrots in a fridge, a drawer holding various knives. It’s abstract knowledge: You don’t know what your neighbor’s carrots and knives look like exactly, but you won’t take a spoon to a cucumber.

Artificial intelligence programs can’t compete. What seems to you like an easy task is a huge undertaking for current algorithms.

An AI-trained robot can find a specified knife and carrot hiding in a familiar kitchen, but in a different kitchen it will lack the abstract skills to succeed. “They don’t generalize to new environments,” said Victor Zhong, a graduate student in computer science at the University of Washington. The machine fails because there’s simply too much to learn, and too vast a space to explore.

The problem is that these robots — and AI agents in general — don’t have a foundation of concepts to build on. They don’t know what a knife or a carrot really is, much less how to open a drawer, choose one and cut slices. This limitation is due in part to the fact that many advanced AI systems get trained with a method called reinforcement learning that’s essentially self-education through trial and error. AI agents trained with reinforcement learning can execute the job they were trained to do very well, in the environment they were trained to do it in. But change the job or the environment, and these systems will often fail.

To get around this limitation, computer scientists have begun to teach machines important concepts before setting them loose. It’s like reading a manual before using new software: You could try to explore without it, but you’ll learn far faster with it. “Humans learn through a combination of both doing and reading,” said Karthik Narasimhan, a computer scientist at Princeton University. “We want machines to do the same.”

New work from Zhong and others shows that priming a learning model in this way can supercharge learning in simulated environments, both online and in the real world with robots. And it doesn’t just make algorithms learn faster — it guides them toward skills they’d otherwise never learn. Researchers want these agents to become generalists, capable of learning anything from chess to shopping to cleaning. And as demonstrations become more practical, scientists think this approach might even change how humans can interact with robots.

“It’s been a pretty big breakthrough,” said Brian Ichter, a research scientist in robotics at Google. “It’s pretty unimaginable how far it’s come in a year and a half.”

Sparse Rewards

At first glance, machine learning has already been remarkably successful. Most models use reinforcement learning, where algorithms learn by getting rewards. They begin totally ignorant, but trial and error eventually becomes trial and triumph. Reinforcement learning agents can easily master simple games.

Consider the video game Snake, where players control a snake that grows longer as it eats digital apples. You want your snake to eat the most apples, stay within the boundaries and avoid running into its increasingly bulky body. Such clear right and wrong outcomes give a well-rewarded machine agent positive feedback, so enough attempts can take it from “noob” to high scorer.
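To make that trial-and-error loop concrete, here is a minimal tabular Q-learning sketch on a Snake-like toy grid. The environment, rewards and parameters are illustrative inventions, not taken from any system discussed in this article:

```python
import numpy as np

# Toy 5x5 grid: the agent earns +1 for reaching the "apple" cell
# and 0 otherwise -- the kind of clear reward signal described above.
GRID = 5
APPLE = (4, 4)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

Q = np.zeros((GRID, GRID, len(ACTIONS)))  # value of each action in each cell
alpha, gamma, epsilon = 0.1, 0.9, 0.2     # learning rate, discount, exploration

def step(state, a):
    dr, dc = ACTIONS[a]
    r = min(max(state[0] + dr, 0), GRID - 1)  # clamp to the grid
    c = min(max(state[1] + dc, 0), GRID - 1)
    reward = 1.0 if (r, c) == APPLE else 0.0
    return (r, c), reward

for episode in range(2000):
    state = (0, 0)
    while state != APPLE:
        # Trial and error: mostly exploit what worked, sometimes explore.
        if np.random.rand() < epsilon:
            a = np.random.randint(len(ACTIONS))
        else:
            a = int(np.argmax(Q[state[0], state[1]]))
        nxt, reward = step(state, a)
        # Q-learning update: nudge the estimate toward reward + future value.
        best_next = np.max(Q[nxt[0], nxt[1]])
        Q[state[0], state[1], a] += alpha * (reward + gamma * best_next
                                             - Q[state[0], state[1], a])
        state = nxt
```

On a small grid the reward arrives often enough for this loop to converge quickly; the trouble starts when it doesn’t.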

But suppose the rules change. Perhaps the same agent must play on a larger grid and in three dimensions. While a human player could adapt quickly, the machine can’t, because of two critical weaknesses. First, the larger space means it takes longer for the snake to stumble upon apples, and learning slows exponentially when rewards become sparse. Second, the new dimension provides a totally new experience, and reinforcement learning struggles to generalize to new challenges.

Zhong says we don’t need to accept these obstacles. “Why is it that when we want to play chess” — another game that reinforcement learning has mastered — “we train a reinforcement learning agent from scratch?” Such approaches are inefficient. The agent wanders around aimlessly until it stumbles upon a good situation, such as a checkmate, and Zhong says it requires careful human design to get the agent to know what it means for a situation to be good. “Why do we have to do this when we already have so many books on how to play chess?”

Partly, it’s because machines have struggled to understand human language and decipher images in the first place. For a robot to complete vision-based tasks like finding and slicing carrots, for example, it must know what a carrot is — the image of a thing must be “grounded” in a more fundamental understanding of what that thing is. Until recently, there was no good way of doing that, but a boom in the speed and scale of language and image processing has made the new successes possible.

New natural language processing models allow machines to essentially learn the meaning behind words and sentences — to ground them in things in the world — rather than just store a simple (and limited) meaning like a digital dictionary.
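A toy illustration of what “grounding” means in practice: words become vectors, and related meanings become nearby vectors. The embeddings below are hand-picked for the example; a real language model learns vectors like these from billions of sentences:

```python
import numpy as np

# Hand-made toy embeddings (purely illustrative; real models learn
# hundreds of dimensions from text, not three).
emb = {
    "carrot":   np.array([0.9, 0.8, 0.1]),  # food-like, sliceable
    "cucumber": np.array([0.8, 0.9, 0.2]),
    "knife":    np.array([0.1, 0.7, 0.9]),  # tool-like, cuts things
    "spoon":    np.array([0.1, 0.2, 0.8]),
}

def cosine(a, b):
    """Similarity between two concept vectors, from -1 to 1."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Unlike a dictionary lookup, meaning here comes in degrees:
print(cosine(emb["carrot"], emb["cucumber"]))  # high: both vegetables
print(cosine(emb["carrot"], emb["spoon"]))     # lower: weakly related
```

Because similarity is graded rather than all-or-nothing, an agent equipped with such vectors can treat a never-before-seen cucumber as carrot-like without being told so explicitly.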

Computer vision has seen a similar digital explosion. Around 2009, ImageNet debuted as a database of annotated images for computer vision research. Today it hosts over 14 million images of objects and places. And programs like OpenAI’s DALL·E generate new images upon command that look human-made, despite having no exact comparison to draw from.

It shows how machines only now have access to enough online data to really learn about the world, according to Anima Anandkumar, a computer scientist at the California Institute of Technology and Nvidia. And it’s a sign that they can learn concepts as we do and use them for generation. “We are in such a great moment now,” she said. “Because once we can get generation, there is so much more we can do.”

Gaming the System

Researchers like Zhong decided machines didn’t have to embark on their explorations wholly uninformed anymore. Armed with sophisticated language models, the researchers could add a pre-training step where a program learned from online information before its trials and errors.

To test the idea, he and his colleagues compared the pre-training to traditional reinforcement learning in five different game-like settings where machine agents interpreted language commands to solve problems. Each simulated environment challenged the machine agent uniquely. One asked the agent to manipulate items in a 3D kitchen; another required reading text to learn a precise sequence of actions to fight monsters. But the most complicated setting was a real game, the 35-year-old NetHack, where the goal is to navigate a sophisticated dungeon to retrieve an amulet.

For the simple settings, automated pre-training meant simply grounding the important concepts: This is a carrot, that is a monster. For NetHack, the agent trained by watching playthroughs uploaded to the internet by human players. These playthroughs didn’t even have to be that good; the agent wasn’t meant to become an expert, just a regular player. It only needed to build intuition by watching: What would a human do in a given scenario? The agent would then decide for itself which moves were successful, formulating its own carrot and stick.
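One common way to implement this kind of learning-by-watching is behavior cloning: train a policy network to predict the action a human took in each recorded situation, then use that network as the starting point for reinforcement learning. The sketch below assumes a generic log of observation-action pairs; the data, network size and training setup are placeholders, not the architecture Zhong’s team used:

```python
import torch
import torch.nn as nn

# Hypothetical log of human play: each observation is a feature vector,
# each label is the action the human chose there. Random stand-in data.
N_OBS_FEATURES, N_ACTIONS = 128, 8
observations = torch.randn(10_000, N_OBS_FEATURES)
human_actions = torch.randint(0, N_ACTIONS, (10_000,))

# A small policy network: observation in, action scores out.
policy = nn.Sequential(
    nn.Linear(N_OBS_FEATURES, 256), nn.ReLU(),
    nn.Linear(256, N_ACTIONS),
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Pre-training is plain supervised imitation: "what would a human do here?"
for epoch in range(5):
    for i in range(0, len(observations), 256):
        obs = observations[i:i + 256]
        act = human_actions[i:i + 256]
        loss = loss_fn(policy(obs), act)
        opt.zero_grad()
        loss.backward()
        opt.step()

# The cloned policy then replaces a randomly initialized agent as the
# starting point for reinforcement learning.
```

Starting from imitation rather than from scratch gives the agent sensible priors before a single reward has been collected.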

“Through pre-training, we form good priors for how to associate language descriptions with things that are happening in the world,” Zhong said. The agent would play better from the start and learn more quickly during subsequent reinforcement learning.

As a result, the pre-trained agent did outperform the traditionally trained one. “We get gains across the board in all five of these environments,” Zhong said. In the simpler settings the edge was slight, but in NetHack’s complicated dungeons the agent learned many times faster and reached a skill level the classic approach couldn’t match. “You might be getting a 10x performance because if you don’t do this, then you just don’t learn a good policy,” he said.

“These generalist agents are a big leap from what standard reinforcement learning does,” Anandkumar said.

Her team also pre-trains agents to get them to learn more quickly, achieving significant progress on the world’s bestselling video game, Minecraft. It’s known as a “sandbox” game, meaning it gives players a virtually infinite space in which to interact and create new worlds. It’s futile to program a reward function for thousands of tasks individually, so instead the team’s model (MineDojo) built its understanding of the game by watching captioned playthrough videos. No need to codify good behavior.

“We are getting automated reward functions,” Anandkumar said. “This is the first benchmark with thousands of tasks and the ability to do reinforcement learning with open-ended tasks specified through text prompts.”
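A rough sketch of how a reward function can be “automated” in this spirit: score how well the agent’s recent frames match a text prompt describing the task, and use that similarity as the reward. The two encoders below are invented stand-ins (real systems train a video-text model for this); nothing here is MineDojo’s actual code:

```python
import numpy as np

def encode_text(prompt):
    """Stand-in text encoder: a deterministic pseudo-embedding.
    A real system would use a trained video-text model."""
    rng = np.random.default_rng(sum(prompt.encode()))
    v = rng.normal(size=512)
    return v / np.linalg.norm(v)

def encode_video(frame_embeddings):
    """Stand-in video encoder: average the per-frame embeddings."""
    v = np.asarray(frame_embeddings).mean(axis=0)
    return v / np.linalg.norm(v)

def automated_reward(frame_embeddings, prompt):
    """Reward = how well recent behavior matches the task described
    in plain text, so no reward needs to be hand-coded per task."""
    return float(encode_video(frame_embeddings) @ encode_text(prompt))

# Any task can now be specified as a prompt rather than as code:
frames = np.random.normal(size=(16, 512))   # stand-in frame embeddings
r = automated_reward(frames, "chop down a tree and collect the wood")
```

The point is the interface: thousands of distinct tasks collapse into one scoring function that takes a text prompt, which is what makes open-ended, text-specified reinforcement learning feasible.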

Beyond Games

Games were a great way to show that pre-training models could work, but they’re still simplified worlds. Training robots to handle the real world, where the possibilities are practically endless, is much harder. “We asked the question: Is there something in between?” Narasimhan said. So he decided to do some online shopping.

His team created WebShop. “It’s basically like a shopping butler,” Narasimhan said. Users can say something like “Give me a Nike shoe that’s white and under $100, and I want the reviews to state that they’re very comfortable for toddlers,” and the program finds and buys the shoe.

As with Zhong’s and Anandkumar’s games, WebShop developed an intuition by training with images and text, this time from Amazon pages. “Over time, it learns to understand the language and map it to actions it has to take on the website,” Narasimhan said.

At first glance, a shopping butler may not seem that futuristic. But while a cutting-edge chatbot can link you to a desired sneaker, interactions like placing the order require a wholly different skill set. And even though your bedside Alexa or Google Home speakers can place orders, they rely on proprietary software that carries out preordained tasks. WebShop navigates the web the way people do: by reading, typing and clicking.
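Schematically, such an agent runs a simple observe-act loop: read the current page, pick an action like searching or clicking, and repeat until checkout. The environment and policy below are invented stand-ins for illustration, not WebShop’s actual interface:

```python
def choose_action(page_text, instruction):
    """A trained policy would map (page text, instruction) to an action.
    This stub just searches first, then buys the first result."""
    if "search" in page_text.lower():
        return ("search", instruction)   # type a query
    return ("click", "Buy Now")          # click an element on the page

def run_episode(env, instruction):
    page = env.reset()                   # text of the landing page
    done = False
    while not done:
        action = choose_action(page, instruction)
        page, done = env.step(action)    # reading, typing and clicking
    return page                          # final (order confirmation) page

class FakeStore:
    """Minimal stand-in environment: one search page, one result page."""
    def reset(self):
        return "Search for products:"
    def step(self, action):
        kind, arg = action
        if kind == "search":
            return "Results: white Nike sneaker, $89. [Buy Now]", False
        return f"Order placed via {arg}.", True

instruction = "white Nike shoe under $100, comfortable for toddlers"
print(run_episode(FakeStore(), instruction))
```

A real system replaces the stub policy with a model trained on product pages, but the loop itself (observe text, act, observe again) stays the same.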

“It’s a step closer toward general intelligence,” Narasimhan said.
