
Behavior-Based Robotics

Robot Learning

Robots that can learn and adapt are excellent evidence for machine intelligence. Robots can be taught concepts, motor skills, how to use their own sensors, how to gather information, navigate, express emotion and even how to teach themselves. Learning includes a rich variety of techniques that use previous experience to enable more effective performance. Although this definition excludes robots that acquire information without using it, we are still left with a rather nebulous term. In fact, when it comes to designing adaptive behavior, the hardest problem is often the choice of learning techniques. Each task requires a different form of learning. Some industrial robots must perform the same action, the same way, every day. While such a robot may benefit from an ability to learn this action as efficiently as possible, it may never need to learn a new motion. Other robots, such as a mobile office assistant, may need to learn new information and skills every day. For each task, a designer must decide precisely what should be learned, when learning should occur, the computational means to implement learning, and how much a priori knowledge should be supplied.

What Should be Learned?

Historically, robots for industrial purposes involved little or no learning. Recently, a growing interest in unstructured environments has encouraged learning-intensive design methodologies. The emerging class of robots must be able to interact responsively with people and other robots, providing assistance and service that will increasingly affect everyday life. Although united in its need for learning, this new class of robots is anything but homogeneous. Robots may learn by adjusting parameters, building environmental models such as maps, exploiting patterns, evolving rule sets, generating entire behaviors, devising new strategies, predicting environmental changes, recognizing the strategies of opponents or exchanging knowledge with other robots. Researchers have produced robots that can even adapt their own physical structure. At the Laboratoire de Robotique de Paris, Chocron and Bidaud have produced a virtual robot that can reconfigure its own componentry for changing conditions. For instance, when the robot finds itself on treacherous, craggy surfaces it may evolve a six-legged structure, whereas on flat ground it may simply use three legs and extend wheels. (Chocron and Bidaud 1999)

To say that a robot can learn actually communicates very little about the robot or control strategy. On one extreme, robots may merely fine-tune already hard-coded behavior. On the other, there are robots that write their own programs from scratch using a randomly generated pool of binary numbers. Still others may exchange programs or portions of programs with other robots to produce coevolutionary learning. Although there are countless variations, the most interesting learning occurs when the robot can devise its own approach from the bottom up. As Lisa Meeden points out, truly adaptive robots should “determine the relevance of sensory information to tasks, synthesize this sensory information and compose behaviors without ever being programmed for such interactions.” (Meeden 1996) Some systems are even able to learn the optimal learning strategy for a given task.

When Should Learning Occur?

After deciding the extent of learning necessary, another significant question is whether learning should occur online while accomplishing a task in the real world or offline in a simulated environment. For some tasks, such as collecting rock samples from a distant planet, a robot may have plenty of time to learn new strategies for traversing the planet’s surface. The ability to adapt online may be crucial for helping the robot deal with unforeseen situations. For time-critical tasks such as military engagements, however, a robot may not have much time to learn new strategies. Such a robot may need to go through a preliminary developmental phase where important maneuvers are learned once and for all. Many researchers use a hybrid approach. The designer begins the learning process offline, where early development can be constrained safely and efficiently. Once the solution space of possible controllers has been reduced, the designer can then transfer the evolved code to a physical robot in which the strategy can be optimized. At the Navy Center for Applied Research in Artificial Intelligence, Schultz and Grefenstette have worked toward ‘continuous learning,’ where the robot is equipped with an onboard simulator that it can use to test new strategies online. Schultz and Grefenstette report that this capability can be very helpful for allowing robots to adapt to hardware failure and new domains. (Schultz 1994)

How is Learning Implemented?

Now that we have considered learning at a high level, it remains to discuss the computational means. Increasingly, researchers have adopted hybrid learning strategies which attempt to blend learning methods once viewed as distinct. Despite the fact that their boundaries can (and should) be blurred in places, we will adopt the following four classifications:

Artificial Neural Networks
A supervised, learning-with-a-trainer approach where knowledge is learned by adjusting weights between nodes of a neural network.
Reinforcement Learning
An unsupervised, learning-with-a-critic approach where mappings from percepts to actions are learned inductively through trial and error.
Evolutionary Learning
An unsupervised, learning-with-a-critic approach where controllers are derived deductively by alterations to an initial population of program code.
Learning by Imitation
A design methodology which uses a biologically inspired developmental paradigm to enable learning by emulation.

Artificial Neural Networks

Artificial neural networks (ANNs) are algorithms based very loosely on the neural phenomenon of spreading activation. Guided by reinforcement given during training episodes, ANNs can encode knowledge and skill implicitly as associative connections between nodes. Stimulation introduced to the input nodes of a neural network travels between layers of the network to produce some output on the other end. This output is evaluated by a trainer who applies reinforcement to alter the weights of synaptic connections and thereby change the way the network will respond. In this way, ANNs allow human knowledge and guidance to orchestrate the learning process. Such techniques, where high-level reinforcement is applied by a knowledgeable teacher, are often referred to as robot shaping. Usually robot shaping involves symbolic interaction between a human and a robot and may even involve a high-level tasking language. Such an interface allows humans to supply high-level assistance and allows the robot to accomplish the low-level learning necessary to achieve the goal. For example, Sharkey, Heemkerk, and Neary (1996) used a supervised learning strategy to teach goal-finding and obstacle avoidance to a Nomad 200 mobile robot. Initial functionality was provided by pre-wired behaviors. While the robot began navigating the room, a human trainer intervened with a joystick whenever a danger was imminent. During the process, an ANN controller learned to generalize obstacle avoidance and goal-seeking behavior based on the stimulation produced by the joystick control. Sharkey et al. believe their approach provides safe, reliable training much like that given by adult animals to their young.
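As a minimal sketch of this kind of supervised weight adjustment, the toy program below trains a single-layer perceptron, the simplest ANN, on an invented obstacle-avoidance task: a hypothetical trainer supplies the correct steering command for each pair of sonar readings, and the error between prediction and correction nudges the weights. The sensor layout, training set, and learning parameters are all illustrative, not drawn from the Nomad 200 experiment itself.

```python
def train_perceptron(samples, epochs=50, lr=0.1):
    """Supervised weight adjustment: a trainer supplies the correct
    action for each percept, and the error nudges the weights."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            # Predicted steering command: +1 = turn right, -1 = turn left
            out = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else -1
            err = target - out            # the trainer's correction signal
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b

# Toy training set: (left-range, right-range) -> turn away from the
# nearer obstacle, as a joystick-wielding trainer would demonstrate.
samples = [((0.2, 0.9), 1), ((0.1, 0.8), 1),    # obstacle on left
           ((0.9, 0.2), -1), ((0.8, 0.1), -1)]  # obstacle on right

w, b = train_perceptron(samples)

def predict(x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else -1
```

After training, the network steers away from whichever side reads a nearer obstacle, including for range pairs it never saw, which is the kind of generalization the trainer never explicitly programmed.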

Unlike many other learning methods, neural networks simulate the distributed, parallel nature of computation in the brain. Like the brain, ANNs allow skills and knowledge to be conditioned as implicit generalizations of repeated experience. Our understanding of such behavioral conditioning originates with Pavlov’s classic experiments, in which dogs that were given meat when a bell was rung learned to salivate even when the bell ringing was unaccompanied by food. While ANNs can effectively model such classical stimulus-response conditioning, the analogy between biological learning and ANNs breaks down for several reasons. The first is that ANNs are only rudimentary models of biological brains. We would need a neural network of neural networks before any artificial model could emulate the recursive power of the human brain. Unlike an ANN, whose network structure is usually fixed, the brain’s highly integrated, well-ordered structure emerges through competition between separately evolving collectives of neurons. Neuroscientists believe that recursive mappings between these clusters provide the brain with optimal complexity: a balance between neural segregation and coherence.

Skeptics argue that ANNs’ lack of such an architecture prohibits meta-level learning: the ability to not only generalize, but also extend acquired knowledge beyond the frontiers of experience. These critics maintain that while ANNs are a step in the right direction (away from symbolic learning), they simply encode repeated stimulation and are only as good as the training set they are given. Although ANNs have failed to model the cognitive capacities of the human cortex, they offer a truly unique and effective way to encode motor skills and low-level behavior. It may be that, like the cerebellum and other, older structures of the brain, ANNs can provide a foundation on which higher-level learning can be built. In any case, they have provided powerful insight into understanding both machine and biological learning. Whatever else we may say, we must acknowledge that they have been instrumental in moving us from a focus on linear, symbolic learning toward parallel, distributed computation.

Reinforcement Learning

Robots can also learn using an automated critic that guides development by reinforcing or ‘punishing’ actions (or intended actions). Here we use the term ‘reinforcement learning’ to refer specifically to an unsupervised, learning-with-a-critic approach where each state of the environment bestows a scalar value used to reward or punish the robot’s progress. The boon of unsupervised learning is not only increased autonomy, but also the fact that it does not require either the robot or a teacher to have explicit, a priori knowledge of how the task should be accomplished. The goal of reinforcement learning is simply to devise some mapping from perceptual states to actions that will maximize the total reward. The difficulty is that although reinforcement learning does not require the optimal outcome to already be known, it does require a critic that can effectively mete out reward. Instead of directly coding some mapping from perceptions to actions, the designer must program the critic, deciding carefully how it will assign reward. As Leslie Kaelbling points out, reinforcement learning does not remove the burden of programming from the designer, but simply changes its nature. (Kaelbling 1993) The task of programming an effective critic is far from trivial, especially since learning systems must often traverse many low-reward states in order to arrive at a truly optimal solution. A fundamental challenge for RL is how to create a robot that can guide its own learning process, choosing actions not only to receive immediate reward, but also to explore toward globally optimal solutions.
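A minimal sketch of these ideas is tabular Q-learning on an invented one-dimensional corridor: the critic pays a scalar reward only at the goal cell, so the learner must cross many zero-reward states before the value of the goal propagates back, exactly the exploration problem described above. The world, reward values, and parameters here are all hypothetical.

```python
import random

def q_learn(n_states=6, episodes=300, alpha=0.5, gamma=0.9, eps=0.2):
    """Tabular Q-learning on a 1-D corridor whose critic rewards
    only arrival at the rightmost cell."""
    random.seed(0)                               # deterministic demo run
    q = [[0.0, 0.0] for _ in range(n_states)]    # actions: 0 = left, 1 = right
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy: mostly exploit, occasionally explore.
            if random.random() < eps:
                a = random.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Standard Q-learning update toward reward plus discounted value.
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

q = q_learn()
# Greedy policy after learning: 1 means "step right," toward the goal.
policy = [0 if q[s][0] > q[s][1] else 1 for s in range(len(q) - 1)]
```

Note how the designer’s burden has shifted, as Kaelbling observes: nothing here encodes *how* to reach the goal, but the reward line is itself a small program that had to be written with care.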

Because environmental conditions must be accounted for within the reward function, reinforcement learning is easier to apply in simulation than in the real world. Nonetheless, RL is used frequently in structured, real-world environments, especially to refine motor skills such as the ability to turn a given number of degrees. While high-fidelity action is trivial within a simulated environment, the inaccuracy of real motors and sensors calls for the adaptive capacity that RL provides. Some researchers have used RL for more complex real-world learning tasks, including walking gaits for a six-legged robot (Maes and Brooks 1990), pushing behavior (Mahadevan and Connell 1991), and shooting a ball into a goal (Asada et al. 1995).

Evolutionary Learning

Evolutionary computing is a term used to indicate a group of unsupervised learning methods where behavior is evolved deductively from a randomly generated or seeded population of candidate controllers. Evolutionary methods draw their inspiration from natural processes of biological evolution. A form of reproduction called crossover is used which involves selecting fit individuals and combining them to produce successive generations of controllers. Most evolutionary strategies use syntactic or semantic constraints to limit the way individuals can be combined. To increase the probability that each generation is fitter than the previous, a critic evaluates each individual in a population and assigns it a measure of overall fitness. As with reinforcement learning, one of the fundamental problems is the need to avoid premature convergence to local maxima. To prompt more thorough exploration of the solution space, mutation is used to alter controllers in a purely random fashion. Although counter-intuitive, this random element harnesses the creative potential of chaos, an integral component in the biological recipe for life. Unfortunately, it is already computationally expensive to evaluate entire populations, and mutation only makes matters worse. Every designer using evolutionary approaches must face this tension between the need for efficiency and optimality. In search of real-time control, most designers have had to settle for a “good enough” approach.

Evolutionary computing includes a variety of computational systems including genetic algorithms (GAs), evolutionary strategies (ESs), learning classifier systems (LCSs), and genetic programming (GP). The oldest form of evolutionary computing is the genetic algorithm, conceived in the 1960s by John Holland. Low-level strings of 1s and 0s provide a gene pool of chromosomes. These strings are evaluated by a fitness function, which selects strings to be split and recombined. Other strings experience bit mutations where a single 1 or 0 is changed.
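The loop below sketches these ingredients (fitness evaluation, selection, one-point crossover, and bit mutation) on OneMax, a standard toy problem where fitness is simply the number of 1 bits, so the optimum is a string of all 1s. Population size, mutation rate, and tournament selection are illustrative choices, not prescriptions.

```python
import random

def onemax_ga(n_bits=16, pop_size=20, generations=60, p_mut=0.01, seed=1):
    """Minimal genetic algorithm: evolve a bit string of all 1s."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    fitness = sum                                # count of 1 bits

    def pick():
        # Tournament selection: fitter strings reproduce more often.
        a, b = rng.choice(pop), rng.choice(pop)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_bits)       # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Rare random bit flips keep the search from stagnating.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

best = onemax_ga()
```

Even on this trivial fitness landscape, the tension described above is visible: evaluating every individual in every generation dominates the cost, and mutation occasionally degrades good strings in exchange for broader exploration.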

Learning classifier systems use GAs to evolve rule sets that dictate behavior. A single rule may be of the form:

IF situation = danger AND mission = critical THEN perform high-risk maneuver

However, to be used by a classifier system, the rule must be encoded into a format which the GA can operate on. Consequently, the rule might look more like the following:

IF 1111 AND 0011 THEN 0000 1010, or simply 1111001100001010

Beginning with a population of rules (usually generated at random), learning classifier systems relate recurring input patterns to environmental conditions and eventually induce an appropriate mapping of perceptions to actions. Whereas GAs and LCSs use bits as genes and bit strings as chromosomes, Genetic Programming (GP) functions at a higher level, using linguistic elements or even entire programs as the basic atoms in the evolutionary process. As one might expect, different evolutionary learning techniques are appropriate for different needs. So far, evolutionary methods have been successfully used to learn behavior necessary for tasks such as navigation, maze-learning, pursuit-evasion routines, box-pushing, herding, reaching and grasping. Increasingly, researchers have begun to blend evolutionary approaches with other learning techniques so that, for instance, a genetic algorithm might generate starting values for neural networks that control a grab reflex. At still another level, genetic programming might produce test scenarios to train the robot arm.
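To make the encoding above concrete, the sketch below matches bit-string percepts against classifier conditions. The ‘#’ wildcard, a standard feature of classifier systems, lets one rule generalize over many percepts; the rules, bit layouts, and action codes here are invented for illustration.

```python
def matches(condition, percept):
    """A classifier's condition matches a percept bit string when every
    position agrees or the condition holds a '#' wildcard."""
    return all(c == p or c == '#' for c, p in zip(condition, percept))

# Hypothetical rule population: (condition, action) pairs encoded as
# bit strings, in the spirit of 1111001100001010 above.
rules = [
    ('1111####', '00001010'),   # situation = danger -> high-risk maneuver
    ('0000####', '00000001'),   # situation = safe   -> routine behavior
]

def act(percept):
    # Fire the first rule whose condition covers the current percept.
    for condition, action in rules:
        if matches(condition, percept):
            return action
    return None                 # no rule covers this percept
```

A real LCS would then credit the rules that fired when reward arrives and hand the rule population to a GA for recombination; this fragment shows only the matching step.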

Learning by Imitation

Imitative learning is more an approach than a specific computational means. Theoretically it might involve any of the means described above. Here it is characterized separately because it represents a unique methodology based on the notion that robots can be shown rather than told how to behave. In this vein, MIT has pioneered a new class of robots that can learn by emulating behavior they observe. Eventually, a human should be able to demonstrate actions and supply auditory and visual cues to help the robot correctly perceive the instructions. A useful application is robots that can take instructions from a military commander. The robot would be able to respond to natural gestures and verbal instructions to provide services that cannot otherwise be rendered without endangering human life. These services might involve delivering correspondence or scouting dangerous areas. Unfortunately, even a seemingly simple task such as filling a gas tank is far from trivial for a robot trying to learn through emulation. A robot equipped with precision actuators allowing it to reach, grip and manipulate objects is still faced with the need to decompose the task into: removing the cap, inserting the nozzle, squeezing the handle, waiting for the tank to fill, removing the nozzle, etc. Only after it has identified these subtasks can it map them to low-level behaviors it already knows such as reaching and grasping.
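A rough sketch of the decomposition step might look like the following, where an invented library maps subtasks recognized in a demonstration to primitive behaviors the robot already knows. Every name here is hypothetical, and the hard perceptual work of segmenting a demonstration into subtasks is assumed to have happened already.

```python
# Hypothetical library of low-level behaviors the robot already knows,
# keyed by the subtasks it can recognize in a demonstration.
primitives = {
    'remove_cap':     ['reach(cap)', 'grasp(cap)', 'twist(cap)'],
    'insert_nozzle':  ['reach(nozzle)', 'grasp(nozzle)', 'place(nozzle, tank)'],
    'squeeze_handle': ['reach(handle)', 'grasp(handle)', 'squeeze(handle)'],
}

def plan_from_demo(observed_subtasks):
    """Map each recognized subtask to known primitives; anything the
    robot cannot decompose is flagged for further demonstration."""
    plan, unknown = [], []
    for task in observed_subtasks:
        if task in primitives:
            plan.extend(primitives[task])
        else:
            unknown.append(task)
    return plan, unknown

plan, unknown = plan_from_demo(['remove_cap', 'squeeze_handle', 'wait_for_fill'])
```

Producing `observed_subtasks` from raw vision and audition is, of course, the part that makes imitation learning genuinely difficult.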

For imitative learning techniques to succeed, the robot must have some way of knowing which aspects of the environment it should attend to and precisely which actions it should try to reproduce. For instance, the robot should not imitate a cough or an itch when being shown how to turn a crank. Researchers have found that social cues we often take for granted, such as nodding or eye contact, are crucial in correctly guiding imitative learning. To enable accurate perception of such cues, researchers have produced robots with complex arrays of wide-angle and zoom cameras. Attempts to model human sight have gone so far as to create robot eyes that exhibit the vestibulo-ocular reflex. This crucial, yet unconscious reflex is what allows the human eye to remain focused on a target even when the head is in motion. Although these advances are promising, it will be some time before a robot can be reliably tasked through natural human interaction.

How Should Domain Knowledge be Incorporated?

According to Arkin, the need to introduce domain knowledge without over-specializing is one of the key issues for robot learning. (Arkin 1998) The right amount of “nature” (hard-coded knowledge) and “nurture” (learned knowledge) depends greatly on the breadth of different tasks required of the robot. For instance, if a robot is designed to vacuum the same floor plan every day, it makes sense to supply the robot with a map. On the other hand, if a robot needs to search a variety of burning buildings in order to extinguish fires, it may need to build a map on the fly. In general, the more knowledge is hard-coded by the designer, the more constrained a robot will be to a particular set of tasks. Researchers have met with the difficult paradox, however, that attempts to have robots learn tabula rasa have succeeded only on relatively simple tasks. As is usually the case in robotics, the designer must seek an experimentally driven compromise informed by the demands of the task.

Some critics believe that the significance of robot learning is sullied when a human begins the robot with a priori knowledge and behavior. Many believe this reduces the autonomy of the robot and contaminates the control architecture with the taint of human influence. In response, many roboticists point to the fact that biological systems do not learn tabula rasa, but rather begin with a great number of instincts and genetically encoded behaviors. While there is heated debate as to how much of our intelligence derives from nature and how much nurture, there is no doubt that human physical and mental development derives from a program encoded in our DNA.

Other critics have argued that robot learning techniques have failed to capture the most important feature of the human brain: its capacity to not only learn, but be aware of and able to direct its own learning. While it is easy to argue that all robot learning methods simply provide indirect ways of telling the robot what to do, it is important to remember that good robot learning strategies require interaction with a real and chaotic universe. Two mobile robots that start with exactly the same online learning strategy will not learn the same behavior when set loose in the world. Much the same is true of the lives of two genetically identical twins. No form of intelligence truly begins tabula rasa; rather, intelligence unfolds from a program which includes within itself the ability to adapt through interaction with a more or less chaotic world.


Page Contact Information:

Department of Energy

DOE Office of Nuclear Energy
DOE-Idaho Office