Case-Based Analysis: The Key Is the Data: Analog, or Case-Based, Analysis Is One of the More Powerful Trading Techniques. However, It Also Is One of the More Difficult to Systematize. Here, We Expand on Our Previous Discussion of Case-Based Reasoning and Cover the Steps to Computerizing This Strategy

Article excerpt

In "Making the case for the trade" (October 2003), we discussed the basic ideas of case-based reasoning. We covered the notion of using analogs to make trading decisions and moved on to how case-based reasoning can be used to implement analog trading patterns. The concept of case-based reasoning for trading applications is simple: Look at the current record and find similar records in the past; then, observe what happened during some period after these past records and use the observations to forecast what will happen in the future.

These forecasts can form the basis of a trading system. Even though the core idea is simple, there are many issues to consider. For example, computing the distance measure for each pattern vs. the rest of the database will make computation too slow for a commercial application with a large database. Case-based applications employ various indexing and filtering methods to speed this process.

One method that was briefly discussed in the last article is called simply "C4.5," a machine learning methodology that develops a decision tree. The leaves of this tree can index the supporting cases and be used to retrieve similar cases to which a distance calculation can be applied. Another issue to address is weighting the fields used in the distance calculation to minimize entropy for the predicted outcome.
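To make the idea of field weighting concrete, here is a minimal sketch of a weighted Euclidean distance. The function name and the example weights are our own illustrative assumptions; the sketch takes the weights as given and does not show how they would be chosen to minimize entropy.

```python
import math

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two equal-length feature vectors.

    Hypothetical helper: a larger weight makes a field count more heavily
    in the match, and a weight of 0 removes the field from the match.
    """
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("vectors and weights must have the same length")
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

# Equal weights reduce to the ordinary Euclidean distance.
equal = weighted_distance([0.0, 0.0], [3.0, 4.0], [1.0, 1.0])
# Zeroing the second weight ignores the second field entirely.
first_only = weighted_distance([0.0, 0.0], [3.0, 4.0], [1.0, 0.0])
```

Setting trailing weights to zero is also one simple way to match on a shorter pattern inside a fixed-size window.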

Most case-based applications use some variation of calculating the Euclidean distance between patterns. This calculation is the easy part. The more difficult part of case-based applications is extracting features that can be used to describe a given case in a useful way. An example would be a case-based application that, when given a song, finds similar-sounding songs.
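The distance step itself can be sketched in a few lines. The following is an illustrative brute-force example, not a production design: it compares the current record with every stored case, keeps the k closest, and averages what happened after them. The names `forecast_from_analogs` and `k`, and the use of a simple average as the forecast, are assumptions made for this sketch.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def forecast_from_analogs(current, cases, k=3):
    """Average the outcomes of the k stored cases closest to `current`.

    `cases` is a list of (features, outcome) pairs, where `outcome` is
    whatever was observed after that past record (for example, a return).
    """
    if not cases:
        raise ValueError("case database is empty")
    ranked = sorted(cases, key=lambda case: euclidean(current, case[0]))
    nearest = ranked[:k]
    return sum(outcome for _, outcome in nearest) / len(nearest)

# Toy case database: two records near the origin and one far away.
cases = [([0.0, 0.0], 1.0), ([1.0, 1.0], 3.0), ([10.0, 10.0], -5.0)]
forecast = forecast_from_analogs([0.1, 0.1], cases, k=2)
```

Because every query scans the whole database, this version shows the logic only; it is exactly the slow approach that indexing and filtering are meant to avoid.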

The research for the song identifier was completed at the University of California at Berkeley. This application used simple nearest neighbor matching, but the approach was novel in how features were extracted from each song that was compared. The study used musical structures such as frequency, tempo and amplitude taken from sampling during the song. These elements were used to create 1,248 features. These features were compared in the database to find similar songs. Analyzing market data is a similar problem requiring preprocessing and data sampling.

THE PREPROCESSING PROBLEM

Data preprocessing is a concept familiar to those who use neural networks. In developing neural networks, the attempt is to develop a process that is predictive of our desired output. In case-based reasoning, we want to develop preprocessing that is descriptive of a given window of data. Here, we'll assume that we are preprocessing for a data window of a given size. From this beginning, we can test for patterns of differing lengths based on changing the weighting when doing the distance matching.

When developing preprocessing strategies, we need to determine the types of relationships that are important to uncover in our data. For example, if we are studying intermediate patterns where only the general shape of the chart formations matters, we can base the preprocessing on the closing price. However, many patterns that we might try to uncover require the interaction between the open, high, low and close over multiple bars of data. For this reason, we need to maintain the relationships that allow us to analyze chart features such as gaps, key reversal days, inside bar days, outside bar days, etc.
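As a rough illustration of why the full open, high, low and close are needed, the sketch below flags a few of these chart features from two consecutive bars. The `Bar` type, the function name and the exact comparison rules are assumptions for this example; definitions vary among traders, and key reversal days are left out because their definition varies the most.

```python
from typing import NamedTuple

class Bar(NamedTuple):
    open: float
    high: float
    low: float
    close: float

def bar_flags(prev: Bar, cur: Bar) -> dict:
    """Flag simple two-bar chart features (one common set of definitions)."""
    return {
        # Today's entire range sits above yesterday's high.
        "gap_up": cur.low > prev.high,
        # Today's entire range sits below yesterday's low.
        "gap_down": cur.high < prev.low,
        # Today's range is contained within yesterday's range.
        "inside": cur.high <= prev.high and cur.low >= prev.low,
        # Today's range engulfs yesterday's range on both sides.
        "outside": cur.high > prev.high and cur.low < prev.low,
    }

yesterday = Bar(open=100.0, high=105.0, low=95.0, close=102.0)
flags = bar_flags(yesterday, Bar(open=106.0, high=110.0, low=106.0, close=108.0))
```

None of these flags can be computed from closing prices alone, which is why the closing-price shortcut works only for general chart shapes.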

We need to normalize these relationships. With normalization, the distance measure is "0" when the exact pattern occurs in 1996 with the market trading at 1000 or in 2003 with the market trading at 1500. We also need to add a predictive set of fields to each record that can be used for prediction once we isolate similar cases.
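One simple way to get this price-level independence is to express each value in a window relative to the window's first close. The sketch below assumes that scheme purely for illustration; it is not necessarily the normalization used in this article, and the function names and the fixed-horizon forward return are hypothetical.

```python
import math

def normalize_window(closes):
    """Express each close as a fractional change from the window's first close."""
    base = closes[0]
    return [c / base - 1.0 for c in closes]

def forward_return(closes, index, horizon):
    """Example predictive field: fractional change `horizon` bars after `index`."""
    return closes[index + horizon] / closes[index] - 1.0

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# The same shape traced out at two different price levels.
window_at_1000 = [1000.0, 1010.0, 990.0]
window_at_1500 = [1500.0, 1515.0, 1485.0]

raw_distance = euclidean(window_at_1000, window_at_1500)
normalized_distance = euclidean(
    normalize_window(window_at_1000), normalize_window(window_at_1500)
)
```

On raw prices the two windows are hundreds of points apart; after normalization they are effectively identical, so the later window can serve as an analog for the earlier one.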

In preprocessing, we first need to develop a method for the representation of a single day. We define each day of data by using its relative relationship to itself, yesterday and the day before. …