Magazine article AI Magazine

A Review of Reinforcement Learning

Article excerpt

The reinforcement learning problem is the challenge of AI in a microcosm: how can we build an agent that can plan, learn, perceive, and act in a complex world? There's a great new book on the market that lays out the conceptual and algorithmic foundations of this exciting area. Reinforcement learning pioneers Rich Sutton and Andy Barto have published Reinforcement Learning: An Introduction, providing a highly accessible starting point for interested students, researchers, and practitioners.

In the reinforcement learning framework, an agent acts in an environment whose state it can sense, and it occasionally receives some penalty or reward based on its state and action. Its learning task is to find a policy for action selection that maximizes its reward over the long haul; this task requires not only choosing actions that are associated with high reward in the current state but also thinking ahead by choosing actions that will lead the agent to more lucrative parts of the state space. Although there are many ways to attack this problem, the paradigm described in the book is to construct a value function that evaluates the "goodness" of different situations. In particular, the value of a state is the long-term reward that can be attained from that state if actions are chosen optimally. Recent research has produced a flurry of algorithms for learning value functions, theoretical insights into their power and limitations, and a series of fielded applications. The authors have done a wonderful job of boiling down disparate and complex reinforcement learning algorithms to a set of fundamental components, then showing how these components work together. The differences between dynamic programming, Monte Carlo methods, and temporal difference learning are teased apart, then tied back together in a unified way. Innovations such as backup diagrams, which decorate the book cover, help convey the power and excitement behind reinforcement learning methods to both novices and veterans like us.
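To make the notion of a state's value concrete, here is a minimal sketch of value iteration, one dynamic programming method for computing optimal values. Everything specific in it is an assumption made for illustration and does not come from the book or this review: the four-state chain environment, the reward of 1 for reaching the last state, the discount factor of 0.9, and the function names `step` and `value_iteration`.

```python
# Hypothetical environment (an assumption for illustration only):
# a chain of states 0..3, where state 3 is terminal. The agent can move
# left or right; entering state 3 yields reward 1, everything else yields 0.
GAMMA = 0.9          # discount factor, chosen arbitrarily for this sketch
N_STATES = 4
TERMINAL = 3
ACTIONS = (-1, +1)   # move left, move right


def step(state, action):
    """Deterministic transition: return (next_state, reward)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward


def value_iteration(tol=1e-10):
    """Sweep the states until the value estimates stop changing."""
    values = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s == TERMINAL:
                continue  # no further reward can be collected here
            # Value of a state = best one-step reward plus the discounted
            # value of the state that the action leads to.
            best = max(
                r + GAMMA * values[s2]
                for s2, r in (step(s, a) for a in ACTIONS)
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


if __name__ == "__main__":
    print(value_iteration())
```

In this toy chain, states closer to the rewarding terminal state end up with higher values, which is the "thinking ahead" that the value function captures: moving right from state 0 earns no immediate reward but leads toward a more lucrative part of the state space.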

The book consists of three parts, one dedicated to the problem description and two others to a range of reinforcement learning algorithms, their analysis, and related research issues.

We enthusiastically applaud the authors' decision to articulate the problem addressed in the book before talking at length about its various solutions. After all, a thorough discussion of the problem is necessary even for veterans to understand the aims and scope of reinforcement learning research, let alone for novices in the field. At 85 pages in length, however, one might wonder why the description of the reinforcement learning problem deserves (or requires?) twice as many pages as the typical journal paper. Is the reinforcement learning problem so complicated that it takes that long to describe and discuss it?

In truth, Part 1 does much more than just pose the problem. Chapter 1 contains a highly informal introduction to the broad problem domain: learning to select actions while interacting with an environment to achieve long-term goals. The example of tic-tac-toe makes concepts such as reward, value functions, and the exploration-exploitation dilemma feel natural; all of these concepts receive a more mathematical treatment later in the book. The first chapter also provides an invaluable description of the history of reinforcement learning, placing recent research efforts in context. This history is an early example of a series of detailed literature reviews, found at the end of each chapter, which could alone justify the expense of purchasing the book.

Next, the book dives into a highly restricted instance of the reinforcement learning problem: the k-armed bandit problem. This well-researched problem lacks state transitions (there is only a single state), but it otherwise possesses the typical characteristics that set reinforcement learning apart from, say, supervised learning. The placement of the problem is well chosen because it illustrates with rigor the key concepts of the algorithms yet to come: the idea of interaction with an environment, reward, value functions, and the exploration-exploitation dilemma. …
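As a rough illustration of the bandit setting described above, the following sketch runs an epsilon-greedy learner with sample-average value estimates on a k-armed bandit. It is one common way to handle the exploration-exploitation dilemma, not necessarily the treatment the book gives. The function name `run_bandit`, the Gaussian reward model with unit variance, the arm means, and the choice of epsilon = 0.1 are all assumptions made for this example.

```python
import random


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy action selection on a k-armed bandit.

    true_means: hypothetical mean reward of each arm (unknown to the learner).
    Returns the learned estimates, pull counts, and average reward per step.
    """
    rng = random.Random(seed)
    k = len(true_means)
    estimates = [0.0] * k   # the learner's value estimate for each arm
    counts = [0] * k        # how often each arm has been pulled
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: try a random arm
        else:
            # exploit: pick the arm that currently looks best
            arm = max(range(k), key=lambda a: estimates[a])

        # Assumed reward model: Gaussian noise around the arm's true mean.
        reward = rng.gauss(true_means[arm], 1.0)

        # Incremental sample-average update of the chosen arm's estimate.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward / steps


if __name__ == "__main__":
    estimates, counts, avg = run_bandit([0.0, 0.5, 2.0])
    print("estimates:", estimates)
    print("pull counts:", counts)
    print("average reward:", avg)
```

There is only one state here, so the "value function" reduces to one estimate per arm. The epsilon parameter controls the trade-off: a larger value spends more pulls gathering information about arms that look inferior, while a smaller value commits earlier to the arm that currently seems best.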
