Software Failure: Counting Up the Risks


When Boeing's new 777 airliner first takes to the skies in a few years, computers will control such crucial functions as setting flaps and adjusting engine speed. Electrical circuits will relay a pilot's actions to these computers, where complicated programs will interpret the signals and send out the instructions necessary for carrying out the appropriate maneuvers. Pilots will no longer fly the aircraft via direct electrical and mechanical controls, except when using an emergency backup system.

Because of the disastrous consequences of even a single fault, the software for such a computer system must be extremely reliable. A new analysis, however, demonstrates that testing complex software to estimate the probability of failure cannot establish that a given computer program actually meets such high levels of reliability.

The analysis also affirms that using multiple programs, which independently arrive at an answer to a given problem, doesn't necessarily guarantee sufficiently high reliability.
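A rough illustration of why redundancy alone may not be enough: suppose three independently written programs vote, and the system fails only when a majority of them fail on the same input. If their failures were truly independent, each with probability p per demand, the system failure probability would be 3p²(1 - p) + p³, far smaller than p. The sketch below compares that ideal with a hypothetical model in which some fraction c of demands trigger a fault shared by every version. The function names, the common-mode model and the numbers are illustrative assumptions, not figures from the NASA analysis.

```python
import math


def majority_failure_prob(p: float, n: int = 3) -> float:
    """Probability that a strict majority of n versions fail on one demand,
    assuming each fails independently with probability p."""
    k_min = n // 2 + 1
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1)
    )


def majority_failure_with_common_mode(p: float, c: float, n: int = 3) -> float:
    """Hypothetical model: with probability c a demand hits a fault shared by
    every version, so all fail together; otherwise versions fail independently."""
    return c + (1 - c) * majority_failure_prob(p, n)


p = 1e-4  # assumed per-demand failure probability of each version
print(majority_failure_prob(p))                    # about 3e-8 if independent
print(majority_failure_with_common_mode(p, 1e-6))  # about 1.03e-6
```

Even a rare shared fault sets a floor that voting cannot get below, and in practice the size of c is exactly what is hard to measure.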

"This leaves us in a terrible bind," say Ricky W. Butler and George B. Finelli of the NASA Langley Research Center in Hampton, Va., the computer scientists who performed the analysis. "We want to use digital processors in life-critical applications, but we have no feasible way of establishing that they meet their ultra-reliability requirements."

In a paper presented last week in New Orleans at the Association for Computing Machinery's conference on software for critical systems, they argue: "Without a major change in the design and verification methods used for life-critical systems, major disasters are almost certain to occur with increasing frequency."

Many military aircraft and the European-built A320 airliner already use computer-controlled "fly-by-wire" systems. Computers also play important roles in medical technology, transportation systems, industrial plants, nuclear power stations and telephone networks, realms in which a software failure can cause tragedy (SN: 2/16/91, p.104).

"I think this is ... an important paper," says David L. Parnas, a computer scientist at McMaster University in Hamilton, Ontario. "It's very convincing and provides a lot of insight."

The traditional method of determining the reliability of a light bulb or a piece of electronic equipment involves observing the frequency of failures among a sample of test specimens operated under realistic conditions for a predetermined period of time. Using these data, engineers can estimate failure probabilities of not only individual components but also entire systems.
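The arithmetic behind the testing problem can be sketched with the standard constant-failure-rate (exponential) model, which is an assumption here rather than something the article specifies. If a system runs for T hours with no failures, the one-sided upper confidence bound on its failure rate at confidence level C is -ln(1 - C)/T. Inverting that gives the failure-free test time needed to demonstrate a target rate. The target of 1e-9 failures per hour and the 99 percent confidence level below are illustrative assumptions, and the function names are hypothetical.

```python
import math


def upper_bound_failure_rate(test_hours: float, confidence: float) -> float:
    """One-sided upper confidence bound on a constant failure rate after
    test_hours of operation with zero observed failures (exponential model)."""
    return -math.log(1.0 - confidence) / test_hours


def hours_needed(target_rate: float, confidence: float) -> float:
    """Failure-free test hours required before that upper bound falls to target_rate."""
    return -math.log(1.0 - confidence) / target_rate


target = 1e-9  # assumed requirement, failures per hour
conf = 0.99    # assumed confidence level
hours = hours_needed(target, conf)
years = hours / (24 * 365.25)
print(f"{hours:.3g} hours, about {years:.3g} years of failure-free testing")
# prints: 4.61e+09 hours, about 5.25e+05 years of failure-free testing
```

Under this model, running many copies in parallel shortens the calendar time but not the total number of failure-free test hours, which is why testing alone struggles to confirm such targets.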

Unlike hardware, however, software doesn't wear out or break. "Software errors are the product of improper human reasoning," Butler says. …