Reinforcement Learning: Connections, Surprises, Challenges

By Barto, Andrew G. | AI Magazine, Spring 2019

The idea of implementing reinforcement learning (RL) in a computer was one of the earliest ideas about the possibility of AI. In a 1948 report, Alan Turing described a design for a pleasure-pain system:

When a configuration is reached for which the action is undetermined, a random choice for the missing data is made and the appropriate entry is made in the description, tentatively, and is applied. When a pain stimulus occurs all tentative entries are cancelled, and when a pleasure stimulus occurs they are all made permanent. (Turing [1948] 2004, 425)
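The mechanism in this passage can be read as a small table-learning procedure. What follows is a minimal sketch of that reading, not a reconstruction of Turing's actual machine: the class name `PleasurePainTable`, its method names, and the use of a dictionary keyed by configuration are all assumptions made for illustration.

```python
import random


class PleasurePainTable:
    """Toy lookup table in the spirit of the quoted passage (illustrative only)."""

    def __init__(self, actions, seed=None):
        self.actions = list(actions)
        self.permanent = {}   # configuration -> action, fixed by a pleasure stimulus
        self.tentative = {}   # configuration -> action, still awaiting a stimulus
        self.rng = random.Random(seed)

    def act(self, configuration):
        """Return the action for a configuration, guessing if it is undetermined."""
        if configuration in self.permanent:
            return self.permanent[configuration]
        if configuration not in self.tentative:
            # Undetermined entry: make a random choice and record it tentatively.
            self.tentative[configuration] = self.rng.choice(self.actions)
        return self.tentative[configuration]

    def pain(self):
        """A pain stimulus cancels all tentative entries."""
        self.tentative.clear()

    def pleasure(self):
        """A pleasure stimulus makes all tentative entries permanent."""
        self.permanent.update(self.tentative)
        self.tentative.clear()


table = PleasurePainTable(["left", "right"], seed=0)
first = table.act("junction")   # random, tentative choice
table.pain()                    # the choice is cancelled
second = table.act("junction")  # a fresh random choice (may repeat the first)
table.pleasure()                # this choice is now permanent
assert table.act("junction") == second
```

In this sketch a later pain stimulus leaves permanent entries untouched, which is one possible interpretation of the quotation rather than something the passage states outright.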

Turing did little to develop this idea, and it was not until the year of his death, 1954, that Wesley Clark and Belmont Farley simulated RL in a neural net on a digital computer (Farley and Clark 1954). In the same year, Marvin Minsky described an analog RL neural net in his Princeton PhD dissertation (Minsky 1954). There were earlier ingenious RL devices, though electromechanical rather than computer implementations, including Claude Shannon's maze-running mouse, Theseus, that used a kind of RL to find its way through a maze (Shannon 1951). In the 70 years since Turing's report, mathematical formulations of RL have appeared in fields such as psychology, economics, and control engineering. RL algorithms known as learning automata date back to the early 1960s and the work of the Russian mathematician and physicist M. L. Tsetlin (published posthumously in Tsetlin [1973]; surveyed by Narendra and Thathachar [1989]).

Despite featuring prominently in Minsky's famous "Steps" paper (Minsky 1961), and despite the extensive mathematical study of algorithms like learning automata, RL largely remained on the margin of AI until relatively recently. Today we see RL playing essential roles in some of the most impressive applications of machine learning (ML), including DeepMind's Go-playing programs (Silver, Huang, et al. 2016; Silver, Schrittwieser, et al. 2017).

Instead of recounting this history, my aim here is to focus on observations from my personal experience with RL over the most recent 40 years of this history. Though personal, these observations will be of general interest, I believe, because they are instructive about RL's place in AI and its promise for future developments. RL has continued to fascinate me for this long - even in the face of skeptics and naysayers - for two major reasons. First, the study of RL has exposed deep connections between largely separate disciplines, ranging from computer science and engineering to psychology and neuroscience. More than any one discovery, or collection of discoveries, this rich fabric of interconnected facts and ideas has improved our understanding of both the humanmade and the natural worlds. The second reason I have stuck with RL for so long is that studying it has surprised me in several interesting and instructive ways. I have had to revise preconceptions in some instances; in others, unexpected new insights - at least new to me - emerged from the results of computational explorations. Here, I attempt to convey a sense of the richness of this fabric by describing the most striking connections and surprises. Finally, I discuss some of the challenges that need to be faced in the future.

First, a bit of background. In the late 1970s I had the opportunity to work as a postdoc on a project aimed at assessing the scientific merit of a hypothesis proposed by physiologist A. Harry Klopf, a senior scientist with the Avionics Directorate of the Air Force Office of Scientific Research. Klopf hypothesized that neurons, the major components of our brains, are individually hedonists that work to maximize a local analog of pleasure while minimizing a local analog of pain (Klopf 1972, 1982). Under the direction of Michael Arbib, William Kilmer, and Nico Spinelli, professors at the University of Massachusetts, Amherst, and founders of the Cybernetics Center for Systems Neuroscience, a farsighted center focusing on the intersection of neuroscience and AI, and later joined by graduate student Richard Sutton, we explored the early history of learning in AI, including connections to theories of animal learning from psychology and theories about the neural machinery underlying learning. …
