Picking Our Pieces

Now we get to the heart of the matter - selecting the independent variables and choosing the actual models we'll test for statistical significance.

Now that we've covered the fundamental concepts and assumptions of regression analysis, we'll get to the point: model building. We've already decided on a dependent variable - the average price of November soybeans from mid-August through expiration. Now we must choose our independent variables - the supply and demand determinants that affect the average price of November soybeans.

First, one quick note about our dependent variable: The prices used in the scatterplots here and throughout the rest of this series will be inflation-adjusted. We want the price levels associated with past fundamental conditions to be comparable with today's conditions and prices. Not adjusting for inflation would imply that the same conditions that caused $8 soybeans in 1973 dollars, for example, would cause $8 soybeans in 1997 dollars. We used the average producer price index (PPI) from August through November for each contract year to deflate that contract's average fall price.
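The deflation step can be sketched in a few lines of Python. This is a minimal illustration, not the exact procedure used for this series: the function name `deflate_price`, the choice of a base-period PPI and all of the numbers are assumptions made for the example.

```python
from statistics import mean

def deflate_price(nominal_price, ppi_aug_to_nov, base_ppi):
    """Express one contract year's average fall price in base-period dollars.

    nominal_price  -- average price of the November contract for that year
    ppi_aug_to_nov -- monthly PPI readings, August through November, same year
    base_ppi       -- PPI of the period whose dollars we want (an assumption;
                      the article does not say which base period it uses)
    """
    avg_ppi = mean(ppi_aug_to_nov)  # average PPI over the fall pricing window
    return nominal_price * base_ppi / avg_ppi

# Hypothetical figures for illustration only - not actual PPI or price data.
real_price = deflate_price(
    nominal_price=8.00,
    ppi_aug_to_nov=[50.0, 50.0, 50.0, 50.0],
    base_ppi=125.0,
)
print(real_price)  # 20.0
```

With these made-up inputs, an $8 price from a year when the PPI averaged 50 corresponds to $20 in dollars of a period when the PPI stands at 125, which is why the two $8 prices in the example above are not comparable.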

To determine the independent variable sets we'll test, first we'll list several supply and demand determinants. Then we'll choose five subsets based on the standard regression analysis assumptions and requirements covered last month.

For simplicity's sake, we'll use only data from government agencies. Most, if not all, of the data sets will be available free over the Internet or for low-cost mail delivery from the agency itself.

Variable selection

There are many potential measures of the price of soybeans that aren't mentioned here, but these should serve our purpose well - to demonstrate standard regression analysis by building a simple, statistically significant model.

For supply, we want to measure the amount of soybeans available to the market during the period that will affect the average fall price of the November contract.

Soybean yield - reported as the number of bushels harvested per acre - is one determinant of supply. Yield figures reflect not only the amount of beans planted but also the amount the U.S. Department of Agriculture (USDA) expects farmers to harvest. Soybean acreage, another determinant, is the number of acres from which the USDA expects U.S. farmers to harvest soybeans. Production, a third supply determinant, is straightforward enough: In terms of the previous two variables, it's yield times acreage, or the total number of bushels of soybeans the USDA expects farmers to harvest.
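The relationship among the three supply variables is simple arithmetic, as this short Python sketch shows. The function name and the figures are hypothetical and chosen only to make the multiplication easy to follow; they are not USDA estimates.

```python
def expected_production(yield_bu_per_acre, harvested_acres):
    """Production in bushels = yield (bushels per acre) x harvested acres."""
    return yield_bu_per_acre * harvested_acres

# Hypothetical inputs for illustration only - not actual USDA figures.
production = expected_production(yield_bu_per_acre=40.0, harvested_acres=60_000_000)
print(production)  # 2400000000.0 bushels
```

Because production is fully determined by the other two figures, the three variables carry overlapping information about supply.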

Demand is more difficult to quantify than supply because no single reported figure captures it as well. Although it is sometimes mistaken for one, consumption, often reported along with supply figures in various reports, is not a demand measure. Demand and supply converge to determine price, which partly establishes consumption. Consumption, then, is a function of the price determinants demand and supply, not the other way around.

Consumption, or usage, figures are useful when we combine them with other soybean supply statistics. One popular calculation is the ratio of ending stocks to total usage - the stocks/usage ratio. This is a good measure of tightness for a particular crop. Using the inverse, a usage/ending stocks ratio, we avoid some linearity problems and get a better fit.
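Both ratios come from the same two numbers, as the following Python sketch shows. The function names and the inputs are hypothetical, and the units are arbitrary as long as stocks and usage are measured the same way.

```python
def stocks_to_usage(ending_stocks, total_usage):
    """Stocks/usage ratio: a smaller value indicates a tighter crop."""
    return ending_stocks / total_usage

def usage_to_stocks(ending_stocks, total_usage):
    """Inverse ratio: a larger value indicates a tighter crop."""
    return total_usage / ending_stocks

# Hypothetical inputs for illustration only (say, millions of bushels).
ending_stocks, total_usage = 200.0, 2000.0
su_ratio = stocks_to_usage(ending_stocks, total_usage)
us_ratio = usage_to_stocks(ending_stocks, total_usage)
print(su_ratio, us_ratio)  # 0.1 10.0
```

The two forms rank crop years in exactly opposite order, so they describe the same tightness. What changes is the shape of the relationship with price: the inverse rises as stocks shrink, which is the property that helps with the linearity problems mentioned above.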

Other demand determinants include those mentioned in last month's article, such as the size of the market, substitute product production, complement product production and consumer tastes.

Two of the largest consumers of soybean meal are poultry and hog feeders. Protein supplements are important ingredients in the diets of these livestock. We could consider the inflation-adjusted prices of hogs and poultry just before the period of our dependent variable; if livestock feeders are getting more for their livestock, they would seek to produce more and, thus, demand more soybean meal for feed. …