Magazine article Computers in Libraries

Failure Is Always an Option

Article excerpt

Things are failing all around us. A few years ago, several large companies disappeared almost overnight in a series of scandals. This year, the mortgage industry imploded. More recently, stalwart financial and insurance corporations have been dropping like flies. When I wrote this article in mid-September, The Wall Street Journal's homepage (www.wsj.com) actually had the phrase "Don't Panic" as a prominent headline. Around the same time, a major U.S. city (Houston) had been without power for days following a devastating hurricane (Ike).

Failures happen on a smaller scale too. Recently, the all-caps word "FAIL" became a mocking web catch phrase to describe poorly conceived projects, server breakdowns, or other calamities. If you haven't seen this, do a search for "epic fail," though I'll warn you, some of what you find might be ugly. The people running the popular micro-blogging site Twitter struggle so much to keep up with traffic that the oddly reassuring image of a whale they post when the site goes down came to be known and loved as the "Fail Whale."

And in early September, colleagues from a former job of mine shut down one of their libraries. They closed a physical location, and two longtime staffers lost their positions. Restaurants and small businesses come and go all the time, websites go up and down all the time, but a library closing hits close to home. What we do or don't do when it all comes to an end says a lot about us.

Facing Failure

A friend reminded me recently about how so many things go right for most of us most days, but we still obsess over the little things that go wrong. The subway delay makes you late for your meeting, the boss doesn't approve of your work, or something goes wrong with your clothes--anything can set us off on a day when everything else goes right. The power works, your family's as healthy as the day before, you have plenty of food to eat, and there are interesting things to do for fun and for profit. For this friend, his outlook changed one day when he chose to focus more on the successes--which isn't to say that he doesn't get upset when things go wrong, but rather that when he does get upset, he remembers what went right to help him get through it.

That's one way to face failure, and it probably can work a lot of the time for the little failures we face every day. Because I spend most of my time writing software--which is to say "debugging broken software I wrote poorly the first time and fixed poorly the second and third times"--I run into system crashes, freezes, errors, and panics all day long. A lot of the time, it's my own fault--but not always.

Hardware failures are normal, so much so that there's a statistical specification on a lot of the computing hardware products we buy called "mean time between failures" or MTBF. MTBF is an average across a large population of a product (a hard drive, a screen, a music player): total operating time divided by the number of failures, as if we had many thousands of them. An MTBF of 5 years on a hard drive product says roughly that "if you ran a thousand of these, you should expect about 200 failures a year." The problem with MTBF, though, is that most of us don't buy hard drives by the thousand. We usually buy one at a time, and any one drive can fail any day. That's the beauty of statistics--an early hardware product failure can be fully within the "expected performance range," but it can still really foul up your day.
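
To put rough numbers on the MTBF idea, here is a minimal sketch in Python. It rests on an assumption the article does not state: that drives fail at a constant rate over their useful life, which is the usual textbook simplification (the exponential model). The function names and the 1,000-drive fleet are illustrative only, and real drives will not follow the model exactly.

```python
import math


def survival_fraction(years: float, mtbf_years: float) -> float:
    """Fraction of units expected to still be working after `years`.

    Assumes a constant failure rate (exponential model); this is a
    simplifying assumption, not a property of any particular product.
    """
    return math.exp(-years / mtbf_years)


def expected_fleet_failures_per_year(fleet_size: int, mtbf_years: float) -> float:
    """Expected failures per year across a fleet kept at constant size.

    Follows from the definition of MTBF: operating time divided by failures.
    """
    return fleet_size / mtbf_years


MTBF_YEARS = 5.0   # the article's example figure
FLEET_SIZE = 1000  # hypothetical fleet, for illustration

per_year = expected_fleet_failures_per_year(FLEET_SIZE, MTBF_YEARS)
print(f"Expected failures per year in a fleet of {FLEET_SIZE}: {per_year:.0f}")

for years in (1, 5):
    alive = survival_fraction(years, MTBF_YEARS)
    print(f"Chance a single drive is still working after {years} year(s): {alive:.1%}")
```

Under that assumption, a single drive has a bit less than a one-in-five chance of failing in its first year, which is why an early failure can sit comfortably inside the spec and still ruin your day.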

Two coping skills get me past these kinds of problems. One is just a variation on my friend's approach--when things go right, this kind of work can be a joy. When one or two things in a software system fail, you can often step through the system piece by piece to eliminate possible culprits by enumerating everything that is working properly. This works as a way to diagnose what failed, and it works because it can be reassuring. The other skill is to recognize that "these things happen" and to keep some emotional detachment as you analyze what went wrong to see if the failure matches any patterns of failure you've seen in the past. …
