In 1998, the international planning community was invited to take part in the first planning competition, hosted by the Artificial Intelligence Planning Systems Conference, to provide a new impetus for empirical evaluation and direct comparison of automatic domain-independent planning systems. This article describes the systems that competed in the event, examines the results, and considers some of the implications for the future of the field.
The International Artificial Intelligence Planning Systems Conference (AIPS-98), held at Carnegie Mellon University in Pittsburgh in June 1998, played host to the first world planning competition. Competitors were invited to come and compete on a collection of domains and associated problems, sight unseen, using whatever planning technology they wanted. Tracks were offered for STRIPS, ADL, and HTN planning, but in the event, only STRIPS and ADL were entered. Indeed, ADL saw only two competitors: IPP (Koehler et al. 1997) and SGP (Anderson and Weld 1998). IPP gave a convincingly superior performance over SGP, the only Lisp-based planner in the competition, in a single-round playoff. The STRIPS track originally attracted some 9 or 10 declarations of intent to take part. This article gives an account of the competition as seen by the competitors who actually arrived in Pittsburgh and took part in the two tracks.
Competitions as a way to evaluate and promote progress in various fields have precedents, such as the Message Understanding Competition (MUC) series, sponsored by the Defense Advanced Research Projects Agency (DARPA); the Text Retrieval Competition (TREC) series, sponsored by the National Institute of Standards and Technology and DARPA; and the Turing Competition. These competitions have stimulated work, but they also represent a serious investment of effort by the competition organizers and competitors. It is a formidable task to create a collection of tasks that is realistically within the reach of the existing technology, that represents an adequate challenge, and that points the way for the field to develop. Administrative problems add a huge overhead to this task: A common language must be developed that allows problems to be specified and results evaluated, scoring mechanisms must be determined, and the environment must be selected and competitors forewarned. Tribute should be paid to Drew McDermott for the role he played in almost single-handedly executing all these tasks, with support from the competition committee.
For the competitors, the competition represents a challenge to the robustness of their software and demands work in meeting the specifications for both input and output formats, while they continue to develop and enhance the basic functions of their systems. The development of PDDL (McDermott and AIPS 1998) as the common language for the competition problem specifications was an important step in the progress of the competition. A challenge to the competitors was to adapt to the many minor changes in this language as it steadily stabilized. Another important problem was anticipating the demands of the competition domains. All the planners that eventually competed are domain-independent planners that require little or no manual guidance in selecting run-time behavior; however, the performance of all the planners can be affected dramatically by the design of the encoding of the domain and problem specifications. The criteria by which success was to be judged were also volatile: Trade-offs between the planning time and the optimality of the plan produced were a controversial balancing act. Optimality was measured purely by the number of steps in the plan for the STRIPS track, so planners producing optimal parallel plans might well find themselves significantly outperformed by planners concentrating on sequential plan optimality. Furthermore, the selection of domains to be used in the competition was hard. All the competitors had some collection of favored domains that showcased the characteristics of their planners. …