Machine-Learning Research: Four Current Directions


Machine-learning research has been making great progress in many directions. This article summarizes four of these directions and discusses some current open problems. The four directions are (1) the improvement of classification accuracy by learning ensembles of classifiers, (2) methods for scaling up supervised learning algorithms, (3) reinforcement learning, and (4) the learning of complex stochastic models.

The last five years have seen an explosion in machine-learning research. This explosion has many causes: First, separate research communities in symbolic machine learning, computational learning theory, neural networks, statistics, and pattern recognition have discovered one another and begun to work together. Second, machine-learning techniques are being applied to new kinds of problems, including knowledge discovery in databases, language processing, robot control, and combinatorial optimization, as well as to more traditional problems such as speech recognition, face recognition, handwriting recognition, medical data analysis, and game playing.

In this article, I have selected four topics within machine learning where there has been a great deal of recent activity. The purpose of the article is to describe the results in these areas to a broader AI audience and to sketch some of the open research problems. The topic areas are (1) ensembles of classifiers, (2) methods for scaling up supervised learning algorithms, (3) reinforcement learning, and (4) the learning of complex stochastic models.

The reader should be cautioned that this article is not a comprehensive review of each of these topics. Rather, my goal is to provide a representative sample of the research in each of these four areas. In each of the areas, there are many other papers that describe relevant work. I apologize to those authors whose work I was unable to include in the article.

Ensembles of Classifiers

The first topic concerns methods for improving accuracy in supervised learning. I begin by introducing some notation. In supervised learning, a learning program is given training examples of the form $(x_1, y_1), \ldots, (x_m, y_m)$ for some unknown function $y = f(x)$. The $x_i$ values are typically vectors of the form $\langle x_{i,1}, x_{i,2}, \ldots, x_{i,n} \rangle$ whose components are discrete or real valued, such as height, weight, color, and age. These are also called the features of $x_i$. I use the notation $x_{ij}$ to refer to the $j$th feature of $x_i$. In some situations, I drop the $i$ subscript when it is implied by the context.

The $y$ values are typically drawn from a discrete set of classes $\{1, \ldots, K\}$ in the case of classification or from the real line in the case of regression. In this article, I focus primarily on classification. The training examples might be corrupted by some random noise.

Given a set $S$ of training examples, a learning algorithm outputs a classifier. The classifier is a hypothesis about the true function $f$. Given new $x$ values, it predicts the corresponding $y$ values. I denote classifiers by $h_1, \ldots, h_L$.
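To make this setup concrete, here is a minimal Python sketch. It is not from the original article: the toy data and the deliberately trivial majority-class learner are illustrative assumptions, standing in for any real learning algorithm.

```python
# A toy illustration of the supervised-learning setup: m training examples
# (x_i, y_i), where each x_i is a vector of n features x_ij. The data and
# the deliberately weak learning algorithm below are illustrative only.
import numpy as np

# m = 3 examples, n = 4 features each (height, weight, color code, age).
X = np.array([[70.0, 170.0, 1.0, 34.0],
              [65.0, 140.0, 2.0, 28.0],
              [72.0, 200.0, 1.0, 45.0]])
y = np.array([0, 1, 0])  # class labels; here K = 2 classes, coded 0 and 1

def learn(X, y):
    """A trivially simple learning algorithm: output the classifier h that
    always predicts the majority class of the training set S."""
    majority_class = np.bincount(y).argmax()
    return lambda x: majority_class

h = learn(X, y)  # h is a hypothesis about the unknown function f
print(h(np.array([68.0, 150.0, 2.0, 30.0])))  # predicted y for a new x -> 0
```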

An ensemble of classifiers is a set of classifiers whose individual decisions are combined in some way (typically by weighted or unweighted voting) to classify new examples. One of the most active areas of research in supervised learning has been the study of methods for constructing good ensembles of classifiers. The main discovery is that ensembles are often much more accurate than the individual classifiers that make them up.
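As a concrete sketch of the combination step, the following Python fragment implements voting as described above; the function and classifier names are hypothetical, not from the article.

```python
# A sketch of an ensemble that classifies a new example by weighted or
# unweighted voting over its member classifiers, as described above.
from collections import Counter

def ensemble_predict(classifiers, x, weights=None):
    """Combine the individual decisions h_1(x), ..., h_L(x) by voting;
    with weights=None this reduces to an unweighted majority vote."""
    if weights is None:
        weights = [1.0] * len(classifiers)
    votes = Counter()
    for h, w in zip(classifiers, weights):
        votes[h(x)] += w
    return votes.most_common(1)[0][0]  # class receiving the most vote weight

# Three toy classifiers that disagree on some input x.
h1 = lambda x: 0
h2 = lambda x: 1
h3 = lambda x: 1
print(ensemble_predict([h1, h2, h3], x=None))  # -> 1 (two of three vote for 1)
```

Weighted voting, obtained by passing per-classifier weights, covers schemes in which more accurate members of the ensemble are given a larger say in the final decision.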

An ensemble can be more accurate than its component classifiers only if the individual classifiers disagree with one another (Hansen and Salamon 1990). To see why, imagine that we have an ensemble of three classifiers, $h_1$, $h_2$, and $h_3$, and consider a new case $x$. If the three classifiers are identical, then when $h_1(x)$ is wrong, $h_2(x)$ and $h_3(x)$ are also wrong. However, if the errors made by the classifiers are uncorrelated, then when $h_1(x)$ is wrong, $h_2(x)$ and $h_3(x)$ might be correct, so that a majority vote correctly classifies $x$. …
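This argument can be quantified with a short sketch: if each of $L$ classifiers errs independently with probability $p$, the majority vote errs only when more than $L/2$ of them err simultaneously, which is a binomial tail probability. The code and the particular numbers below are illustrative, assuming the idealized case of fully independent errors.

```python
# Quantifying the argument above: with L independent classifiers, each wrong
# with probability p, the majority vote errs only when more than L/2 of its
# members err -- a binomial tail probability.
from math import comb

def majority_vote_error(L, p):
    """P(more than L/2 of L independent classifiers are simultaneously wrong)."""
    return sum(comb(L, k) * p**k * (1 - p) ** (L - k)
               for k in range(L // 2 + 1, L + 1))

print(majority_vote_error(3, 0.3))   # ~0.216: already below the individual 0.3
print(majority_vote_error(21, 0.3))  # ~0.026: a larger ensemble does far better
```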