Measuring Impacts of Condition Variables in Rule-Based Models of Space-Time Choice Behavior: Method and Empirical Illustration
Theo Arentze and Harry Timmermans, Geographical Analysis
The use of rule-based systems for modeling space-time choice has attracted increasing research interest in recent years. The potential advantage of the rule-based approach is that it can handle interactions among a large set of predictors. Decision tree induction methods are available and have been explored for deriving rules from data. However, the complexity of the structures generated by such knowledge discovery methods hampers interpretation of the rule set in behavioral terms; as a consequence, the models typically remain a black box. To solve this problem, this paper develops a method for measuring the size and direction of the impact of condition variables on the choice variable as predicted by the model. The paper illustrates the method using the location and transport-mode choice models that are part of Albatross, an activity-based model of space-time choice.
The recent planning literature has witnessed the exploration of decision tree techniques for analyzing space-time behavior. Decision tree induction methods, developed in statistics and artificial intelligence, provide a technique for deriving decision trees from observations on choice. The application of the technique to developing predictive models of space-time choice has been explored in several recent studies (Thill and Wheeler 1999; Gahegan 2000a; Arentze et al. 2000; Wets et al. 2000; Yamamoto, Kitamura, and Fujii 2002; Moons et al. 2002). In particular, we have used the technique to develop the activity-based model Albatross (Arentze and Timmermans 2000, 2002). Several empirical studies have shown that the technique is promising for such purposes in terms of internal validity, performance relative to conventional utility-based models, and spatial transferability (Arentze et al. 2001). Gahegan (2000b) stresses the potential of inductive learning, alongside scientific visualization, as a method for analyzing data with high attribute dimensionality.
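To make concrete how an induction method derives a tree from observations on choice, the following is a minimal sketch of its core step: choosing the condition variable whose split most reduces the uncertainty (entropy) of the observed choices. This is a generic, ID3/C4.5-style information-gain criterion chosen for illustration; the induction algorithms used in the cited studies may differ (CHAID, for instance, selects splits with chi-square tests). The variable names and the four observations are hypothetical.

```python
import math
from collections import Counter


def entropy(choices):
    """Shannon entropy (in bits) of the distribution of observed choices."""
    n = len(choices)
    counts = Counter(choices)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(rows, choices, var):
    """Reduction in choice entropy obtained by splitting on condition variable `var`."""
    n = len(choices)
    groups = {}
    for row, choice in zip(rows, choices):
        groups.setdefault(row[var], []).append(choice)
    # Weighted entropy remaining after the split
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(choices) - remainder


def best_split(rows, choices, variables):
    """Pick the condition variable whose split most reduces choice entropy."""
    return max(variables, key=lambda v: information_gain(rows, choices, v))


# Hypothetical observations: condition variables per person and the observed mode choice
rows = [
    {"car_available": True, "distance": "long"},
    {"car_available": True, "distance": "short"},
    {"car_available": False, "distance": "long"},
    {"car_available": False, "distance": "short"},
]
choices = ["car", "car", "bike", "walk"]

print(best_split(rows, choices, ["distance", "car_available"]))
```

Here splitting on `car_available` separates the car choices completely, so it yields the larger gain and is selected. A full induction algorithm would apply the same step recursively within each resulting subgroup until a stopping rule is met, which is how the deep, interaction-rich trees discussed next arise.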
The strength of the decision tree technique for modeling space-time behavior is its ability to represent the full complexity of interactions between condition variables. Unlike algebraic models (e.g., Bowman and Ben-Akiva 1999), the model specification does not typically rely on the commonly used linear-additive utility functions, possibly extended with some selected interaction effects. At the same time, however, this complexity hinders interpretation in terms of the impacts of condition variables on the choice variable. Whereas elasticities are relatively straightforward to derive in discrete choice models, the non-algebraic nature of decision-tree-based models precludes such an easy interpretation.
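To illustrate why elasticities are easy to obtain in the algebraic case, consider the textbook multinomial logit model with linear-in-parameters utility (an assumed specification for illustration; the text does not commit to a particular discrete choice model). With utility $V_{in} = \sum_k \beta_k x_{ink}$ for alternative $i$ and individual $n$, the choice probability and its point elasticities follow in closed form:

```latex
P_{in} = \frac{\exp(V_{in})}{\sum_{j} \exp(V_{jn})}

% Direct elasticity with respect to attribute k of alternative i
E^{P_{in}}_{x_{ink}} = \frac{\partial P_{in}}{\partial x_{ink}} \, \frac{x_{ink}}{P_{in}}
                     = \beta_k \, x_{ink} \, (1 - P_{in})

% Cross elasticity with respect to attribute k of another alternative j \neq i
E^{P_{in}}_{x_{jnk}} = \frac{\partial P_{in}}{\partial x_{jnk}} \, \frac{x_{jnk}}{P_{in}}
                     = -\beta_k \, x_{jnk} \, P_{jn}
```

Both expressions follow from $\partial P_{in} / \partial x_{ink} = \beta_k P_{in}(1 - P_{in})$ and $\partial P_{in} / \partial x_{jnk} = -\beta_k P_{in} P_{jn}$. A decision tree offers no such derivative: its prediction changes as a step function of a condition variable, and the size of each step depends on the values of all other variables along the path.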
Measuring the importance of condition variables has received some attention in statistics and artificial intelligence, mainly for two applications. First, it has been considered for identifying the most significant features among a large set of potentially relevant explanatory variables in a pre-modeling stage (e.g., Kira and Rendell 1992; Kononenko 1994). The feature selection methods that have been proposed take into account possible correlations between condition variables and try to measure the independent contribution of each condition variable to observed differences in choice, for example by using information-theoretic concepts (e.g., Piramuthu 1996). In a second application, the objective is to rank condition variables according to the contribution they make to classifying or predicting the target variable in a decision tree model (e.g., Steinberg and Colla 1997). Thus, in both applications, attention is focused on the potential or apparent significance of variables for prediction or classification. We emphasize that methods that have emerged from this area are not transferable to the present problem. Elasticity refers to a different dimension, namely the extent to which predictions are sensitive to variation over a hypothetical range of the condition variable of interest. …
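The difference between importance and sensitivity can be illustrated with a generic what-if analysis: keep every other condition variable at its observed value, set the variable of interest to each value in a hypothetical range, and recompute the predicted choice shares. The sketch that follows is an assumption-laden illustration of that idea, not the measure developed in this paper; the tree `toy_mode_tree`, the variables `distance_km` and `car_available`, and the four-person sample are all hypothetical.

```python
from collections import Counter


def toy_mode_tree(person):
    """Hypothetical hand-written decision tree predicting transport mode.

    Stands in for an induced tree; its structure and thresholds are invented.
    """
    if person["distance_km"] > 5:
        return "car" if person["car_available"] else "public_transport"
    if person["distance_km"] > 1:
        return "bike"
    return "walk"


def choice_shares(tree, sample):
    """Predicted share of each choice alternative across the sample."""
    counts = Counter(tree(p) for p in sample)
    return {alt: n / len(sample) for alt, n in counts.items()}


def sensitivity(tree, sample, var, values):
    """Predicted choice shares when `var` is set to each hypothetical value.

    All other condition variables keep their observed values, so differences
    between the results reflect only the manipulated variable.
    """
    return {v: choice_shares(tree, [{**p, var: v} for p in sample]) for v in values}


def share_change(tree, sample, var, low, high):
    """Change in each alternative's predicted share when `var` moves from low to high.

    The sign gives the direction of the impact, the magnitude its size.
    """
    before = choice_shares(tree, [{**p, var: low} for p in sample])
    after = choice_shares(tree, [{**p, var: high} for p in sample])
    alternatives = set(before) | set(after)
    return {a: after.get(a, 0.0) - before.get(a, 0.0) for a in alternatives}


# Hypothetical sample of individuals
sample = [
    {"distance_km": 0.5, "car_available": True},
    {"distance_km": 3.0, "car_available": False},
    {"distance_km": 8.0, "car_available": True},
    {"distance_km": 12.0, "car_available": False},
]

for value, shares in sensitivity(toy_mode_tree, sample, "car_available", [False, True]).items():
    print(value, shares)
print(share_change(toy_mode_tree, sample, "car_available", False, True))
```

In this toy example, making a car available leaves the walk and bike shares unchanged and shifts half of the sample from `public_transport` to `car`, because the variable only matters on the long-distance branch. An importance ranking would say that `car_available` is used by the tree; the what-if comparison says by how much, and in which direction, predictions move. A real induced tree usually stores a distribution of choices at each leaf, and summing those leaf probabilities instead of counting the single predicted class would give smoother shares.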