Now versatile and flexible, robots must adapt to increasingly complex working conditions. Machine learning and simulation can address some of these industrial challenges, explains Stéphane Doncieux, professor of computer science at Sorbonne University and deputy director of the Institute of Intelligent Systems and Robotics (Isir), in this technical brief.
Robots are increasingly versatile and could perform many tasks, including in our immediate environment, in contact with users or mere passers-by. This is made possible in particular by new adaptive and less dangerous actuators, but this theoretical capacity remains difficult to turn into concrete applications. One of the main obstacles is the variability of the tasks and environments such robots would face: the greater it is, the harder the job of the robot designers, because they must anticipate the situations that may arise. Recent learning methods can be of great help to them by automating, at least partially, the creation of appropriate behaviors.
Dealing with the unexpected is one of the key issues in robotics. It is approached in different ways, for example by designing a robust and redundant mechanical system, or by installing a controller able to follow a setpoint while rejecting certain external disturbances (such as a ground robot skidding or a drone hit by a gust of wind). In these approaches, the desired behavior of the robot is known.
In an uncontrolled environment, a robot can face situations that are hard for its designers to anticipate and in which the desired behavior is not always known. Robot vacuums, for instance, which have been on the market for more than twenty years, still sometimes end up in situations they cannot handle, despite the mechanical improvements made over the years. The human user must then intervene. The problem is that such a robot will keep getting stuck in the same situation indefinitely, forcing the user to adapt. Recent advances in machine learning can make robot behavior more adaptive while making the designers' job easier.
1. The limits of supervised learning
Machine learning is a vast field covering many methods. It consists in automatically modifying the behavior of a program on the basis of external information and of the program's past behavior, in other words the way it has reacted so far. There are several types of learning. The most widely used today is supervised learning (fig. 1). In this paradigm, the program is told how it should behave by presenting it with numerous examples of input data and the associated outputs.
In image recognition, for example, many images are supplied to the system along with an indication of the object they depict. Labels – such as “cat”, “boat”, “mountain”… – are manually associated with the images. This phase gradually modifies the program so that it finds the associated label(s). This is what deep learning does with artificial neural networks. These networks are themselves programs, made up of a large number of small elementary computing units connected to one another.
Fig. 1. A human provides the program with a set of pre-labeled data. The program trains to recognize information (images, sounds, etc.) and must output the correct label. If its answer is wrong, i.e. different from the label supplied by the supervisor, the program corrects itself automatically until it gives the correct answer.
Supervised learning therefore modifies the parameters of this neural network by taking its errors into account, so that it does not reproduce them in the future. The whole point of this method is that if the network is trained with enough data, it will also give a correct answer for similar inputs. It is this method that the GAFA companies use, and it explains why data is so important to them.
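To make the principle concrete, here is a minimal sketch of supervised learning in Python: a single artificial neuron adjusts its parameters from its errors on a handful of labeled examples. The toy data, learning rate and number of epochs are hypothetical stand-ins for the image/label pairs and deep networks described above.

```python
import math, random

# Hypothetical labeled examples: (input features, label)
data = [((0.2, 0.1), 0), ((0.9, 0.8), 1), ((0.1, 0.3), 0), ((0.8, 0.9), 1)]

w = [random.uniform(-0.1, 0.1) for _ in range(2)]  # parameters to be learned
b = 0.0
lr = 0.5  # learning rate (illustrative value)

def predict(x):
    # a single elementary computing unit: weighted sum + sigmoid
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))  # probability of label 1

for epoch in range(200):
    for x, label in data:
        p = predict(x)
        err = p - label  # error on this labeled example
        # correct the parameters so the same mistake shrinks next time
        for i in range(2):
            w[i] -= lr * err * x[i]
        b -= lr * err

# probabilities end up close to the labels 0, 1, 0, 1
print([round(predict(x), 2) for x, _ in data])
```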
This type of learning does not apply well to robotics because a robot operates in a closed loop: its perceptions determine its actions, which in turn determine its future perceptions. The slightest deviation in behavior is therefore amplified and risks placing the robot in a context too different from anything in its base of examples. This method would require building a colossal database of examples, which is generally out of reach, especially since it would be necessary to indicate precisely what the robot should do in each case.
2. Lightening the data requirement
Another method makes it possible to overcome this data constraint: reinforcement learning. It consists in searching for the behavior that maximizes a reward. Behavior is determined by a function called a policy, which associates with a given state the action to be taken to maximize the reward over a certain time horizon. In this context, the state of the system is the information it needs in order to decide.
Reinforcement learning therefore aims to build this policy for a given system. To do this, the system explores the different possibilities and uses this experience to discover, reproduce and improve reward-maximizing behaviors. It is a form of trial-and-error learning: a certain action is tried from a given state, the result is observed, and the system infers whether it was a good idea in that context. This need to explore distinguishes these learning methods from planning methods, which assume the consequences of an action are known without needing to test it. Exploration is therefore at the heart of reinforcement learning.
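The trial-and-error loop can be sketched with a very simple hypothetical example: an agent that does not know which of three actions pays off tries them, observes the rewards, and gradually prefers the best one. The reward probabilities and the exploration rate below are illustrative choices, not taken from the article.

```python
import random

# Hypothetical payoff probabilities of three candidate actions
true_reward_prob = [0.2, 0.5, 0.8]
estimates = [0.0, 0.0, 0.0]  # the agent's current estimate of each action's value
counts = [0, 0, 0]
epsilon = 0.1  # fraction of trials spent exploring at random

for step in range(2000):
    if random.random() < epsilon:
        action = random.randrange(3)              # explore: try something else
    else:
        action = estimates.index(max(estimates))  # exploit current knowledge
    reward = 1.0 if random.random() < true_reward_prob[action] else 0.0
    counts[action] += 1
    # incremental average: update the estimate from this trial's outcome
    estimates[action] += (reward - estimates[action]) / counts[action]

# the estimate of the best action converges towards ~0.8
print([round(e, 2) for e in estimates])
```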
There are several families of reinforcement learning algorithms (fig. 2), which differ in how the policy is handled and how the accumulated experience is used. A first approach seeks to associate a value with a state or with a state-action pair. This value represents the reward that can be expected by passing through it. A policy can then easily be built by choosing, for a given state, the action that leads to the greatest value. Assigning the right value to a state or to a state-action pair therefore amounts to estimating its contribution to the expected reward: it is a credit assignment problem. Take the example of a robot that must navigate a maze to find the exit (the position where it obtains the reward): the positions just before the exit are assigned a slightly lower value, those before them a value lower still, and so on. Once the value has been propagated through all the possibilities, when the robot arrives at an intersection it knows which direction to take: the one associated with the highest value, because it is the one that leads most directly to the exit.
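The maze example can be sketched as value propagation on a small grid: the exit cell holds the reward, and each sweep copies a slightly discounted value backwards, so a cell's value ends up reflecting how close it is to the exit. The grid layout, walls and discount factor below are hypothetical illustration choices.

```python
# Small hypothetical maze: 4x3 grid, two wall cells, exit at (3, 0)
GRID_W, GRID_H = 4, 3
exit_cell = (3, 0)
walls = {(1, 1), (2, 1)}
gamma = 0.9  # discount: cells further from the exit get lower values

values = {(x, y): 0.0 for x in range(GRID_W) for y in range(GRID_H)
          if (x, y) not in walls}
values[exit_cell] = 1.0  # the reward is obtained at the exit

def neighbors(cell):
    x, y = cell
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        n = (x + dx, y + dy)
        if n in values:
            yield n

# propagate the value backwards from the exit until it stabilizes
for _ in range(50):
    for cell in values:
        if cell != exit_cell:
            values[cell] = gamma * max(values[n] for n in neighbors(cell))

# at an intersection, the robot moves towards the neighbor with the highest value
start = (0, 1)
print(max(neighbors(start), key=values.get))  # (0, 0): the direction closest to the exit
```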
Fig. 2. Reinforcement learning involves testing a large number of possibilities and evaluating them against a reward to be obtained. The reward is all the greater as the action gets closer to the set objective. It is the form of learning best suited to robotics.
A second family of algorithms does not seek to learn value functions or transition models determining which state an action will lead to. These approaches represent the policy as a parameterized function of the state: the action is then computed directly from the robot's current state. The choice of the function used to represent the policy is very important, since it defines what the robot will be able to do, and different functions are used depending on the application.
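A minimal sketch of such a direct policy representation, assuming a hypothetical one-dimensional "reach the target" task: the action is computed straight from the state by a parameterized (here linear) function, and the parameters are improved by simple random search on the total reward rather than through a value function.

```python
import random

TARGET = 5.0  # hypothetical target position the robot should reach and hold

def policy(params, state):
    # the action is computed directly from the state: gain * remaining error
    return params[0] * (TARGET - state)

def evaluate(params):
    # run a short episode and accumulate the reward (staying near the target)
    state, total_reward = 0.0, 0.0
    for _ in range(20):
        state += policy(params, state)       # apply the chosen action
        total_reward -= abs(TARGET - state)  # penalty for being far from the target
    return total_reward

# simple random search over the policy parameters: keep only improvements
best = [random.uniform(-1, 1)]
best_score = evaluate(best)
for _ in range(200):
    candidate = [best[0] + random.gauss(0, 0.1)]
    score = evaluate(candidate)
    if score > best_score:
        best, best_score = candidate, score

# the gain converges towards ~1.0 and the total penalty towards ~0
print(round(best[0], 2), round(best_score, 2))
```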