Statistical Analysis for Traders

Article excerpt

Statistics 101 was more than just an annoying stepping stone to a college degree. It gave you some tools that can be used in market analysis. This first of two parts will discuss the time-proven method of linear regression analysis and how it can provide solid, objective information on where future prices may go.

Fundamental analysis often creates some unique problems for traders. First, fundamental relationships are hard to quantify. Second, not all data are created equal. Third, it's easy to make mistakes that render your fundamental conclusions invalid.

The solution to all of these problems, for those of us without advanced studies in non-linear signal processing and the like, is to keep things simple. One of the quickest ways to get into trouble making forecasts is to employ tools you don't understand. Thankfully, there are simple, straightforward analysis methods that you can apply to widely available fundamental data to forecast commodity prices.

Regression analysis is one such tool. Regression analysis objectively estimates past fundamental relationships to determine a standard relationship that can be used going forward. This method has been used for decades by researchers in all fields, including market analysis. It is nothing new, but it's often ignored because it lacks the flair and the promises of flashier techniques.

The most important aspect of regression analysis is not the application of the method itself, but the setup. "A model relationship" (right) covers some of the math behind regression analysis, but it's more important to understand the theory, assumptions and proper variable selection than the intricate algebraic steps of determining individual variable significance.


A product's price is a function of supply and demand. Supply and demand, in turn, are functions of determinants such as production methods, weather, disposable income, tastes, etc.

This relationship may be written mathematically as:

Price = f(Supply, Demand)

Regression analysis assigns specific numeric weights to the supply and demand determinants we plug into our actual equation. These specific determinants are called independent variables. Here's an example:

Price(t) = constant + b1 × Supply(t) + b2 × Demand(t) + e

By inserting the supply and demand values for point t and solving the equation, we get an estimate for price. In this equation price is the dependent variable.
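As a minimal sketch of that arithmetic, the Python snippet below plugs one period's supply and demand values into a linear equation of the form price = constant + b1 × supply + b2 × demand. The function name `estimate_price`, the constant and both weights are made-up values for illustration only, not estimates from real market data.

```python
def estimate_price(supply, demand, constant, b1, b2):
    """Return the price estimate from a two-variable linear equation."""
    return constant + b1 * supply + b2 * demand


# Hypothetical coefficients and inputs, chosen only to show the arithmetic.
price_t = estimate_price(supply=10.0, demand=12.0, constant=5.0, b1=-0.5, b2=0.25)
print(price_t)
```

In this made-up case a negative supply weight and a positive demand weight reflect the usual expectation that more supply lowers price and more demand raises it.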

We use regression analysis to find the b1 and b2 weights and the constant figure. Using a computer (most spreadsheet software includes the tools you need to do this), we analyze past values for the dependent variable and independent variables to find these weights. These weights are also called the regression coefficients.
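To make the fitting step concrete, here is a small Python sketch that recovers the constant and the two weights from past values by ordinary least squares, solving the normal equations directly. Everything in it is an assumption for illustration: the helper names, the synthetic supply and demand figures, and the "true" weights used to generate the prices. In practice a spreadsheet or statistics package would do this work for you.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest pivot into place for stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution, from the last unknown up to the first.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_least_squares(rows, targets):
    """Return least-squares weights by solving (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic history, invented for illustration (not real market data).
supply = [10.0, 12.0, 9.0, 11.0, 13.0, 8.0]
demand = [11.0, 10.0, 12.0, 11.5, 9.5, 13.0]
# Prices are generated from known weights with no error term,
# so the fit should recover those weights.
price = [5.0 - 0.5 * s + 0.25 * d for s, d in zip(supply, demand)]

# Each row holds a 1 for the constant, then the independent variables.
rows = [[1.0, s, d] for s, d in zip(supply, demand)]
constant, b1, b2 = fit_least_squares(rows, price)
print(round(constant, 4), round(b1, 4), round(b2, 4))
```

Because the synthetic prices contain no error, the fitted values should match the generating weights closely; with real data they would only approximate the underlying relationship.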

Unfortunately, we do not have fundamental reports that perfectly quantify supply and demand determinants. Also, many of the determinants can't be quantified, such as poor worker morale affecting production or consumer tastes driving demand. Fortunately, these factors typically pale in significance compared to determinants such as carryover stocks or yield estimates, which are figures that are reliably estimated.

However, because we can't model every determinant of supply and demand, our model will include error. That is, each prediction by our regression equation will vary from the actual values by some amount: the "e" in the regression equation above. This can't be helped, but it can be minimized. The minimization process is explained in "A model relationship."
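Assuming the ordinary least squares criterion that standard linear regression uses (the sidebar's own derivation is not reproduced here), the error for each period and the quantity being minimized can be sketched as follows. The two-variable form with supply and demand is the illustrative case discussed in the text, and the symbol T for the number of past observations is an assumption of this sketch.

```latex
% Error (residual) for period t: actual price minus the model's estimate
e_t = \text{Price}_t - \left(\text{constant} + b_1\,\text{Supply}_t + b_2\,\text{Demand}_t\right)

% Least squares chooses the constant and weights that make
% the sum of squared errors over T past observations as small as possible
\min_{\text{constant},\, b_1,\, b_2} \; \sum_{t=1}^{T} e_t^{2}
```

Squaring the errors keeps positive and negative misses from canceling each other out and penalizes large misses more heavily than small ones.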

But while we accept this error, we still need to keep it in check. For regression analysis to be valid, we must maintain a few assumptions regarding the error in our model and the variables we use to build that model.


The assumptions of regression analysis must be met for our data if we are to trust our regression equation. Our model's predictions will be worthless unless the data have certain important characteristics that are required for the math behind the regression model to work. …