Magazine article Information Today

Securing Your Digital Future

In the mid-1970s, three new file formats hit the information industry: Dialog, VRS, and ITSC: Orbit. Eager to hop on board with the latest technological trends, many organizations began using these standards to encode information for future use. But while users enjoyed current information with all the benefits of a modern file format, backfiles were often left unconverted in their original form. Over time, the technology used to view those files gradually disappeared, rendering them inaccessible, according to Marjorie Hlava, president of Access Innovations, Inc. and chairwoman of the NFAIS standards committee.

"In terms of the digital backfiles, the big ones are good to the mid-'70s at the moment, and then it's like this big black void," she says. "Some files have gone back further, and some are beginning to go back further, but anything prior to the '70s, you're better off looking at print."

More information was lost as computer programs were updated and improved, leaving users with no way to open files stored in certain proprietary formats. According to Bill Trippe, lead analyst for XML technologies and content strategies at Gilbane Group, Inc., proprietary formats remain problematic today, although the market's consolidation around Microsoft has reduced the total number of proprietary formats in circulation. Still, he says, such formats should be avoided, because it is unlikely that any individual or company will have the exact software and operating system needed to open them 10 to 15 years from now.

Meanwhile, data stored on physical media such as CDs, DVDs, and magnetic tape degrades over time, and some media, such as floppy disks, are virtually inaccessible by conventional means, according to Hlava. Even today, as formats improve, more data is being lost.

Follow the Format

Companies are now beginning to encode their data in the latest industry-standard formats. Hlava says most companies are currently switching from ASCII, a 7-bit character set usually stored in 8-bit bytes, to Unicode, which can be stored in 8-, 16-, or 32-bit units (the UTF-8, UTF-16, and UTF-32 encoding forms). She says some institutions that haven't made the upgrade, or haven't fully adopted Unicode, have found themselves at a disadvantage.
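To make the distinction between the two character standards concrete, here is a minimal Python sketch. The sample string and the helper name `encoded_sizes` are illustrative assumptions. It shows that ASCII, which defines only 128 characters, cannot represent an accented letter at all, while the same Unicode text can be stored in 8-, 16-, or 32-bit code units, each with a different byte cost.

```python
# Compare how one string is stored under ASCII and under Unicode's three
# encoding forms. The sample text and helper name are hypothetical.
sample = "Café"

def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` in each encoding, or None if it cannot be encoded."""
    sizes = {}
    # The "-le" (little-endian) variants are used so Python adds no byte-order mark.
    for codec in ("ascii", "utf-8", "utf-16-le", "utf-32-le"):
        try:
            sizes[codec] = len(text.encode(codec))
        except UnicodeEncodeError:
            # ASCII defines only 128 characters, so "é" has no ASCII code.
            sizes[codec] = None
    return sizes

print(encoded_sizes(sample))
# {'ascii': None, 'utf-8': 5, 'utf-16-le': 8, 'utf-32-le': 16}
```

The little-endian codec names matter only for the byte counts: the plain "utf-16" and "utf-32" codecs would prepend a byte-order mark and report slightly larger sizes.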

"The Library of Congress decided to take an 8-bit standard for Unicode, which means that the new 16-bit data - and I'd say 16-bit, at this point, is more common - they can't read," she says. "And so they may have to revisit that standard fairly soon."

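The mismatch Hlava describes can be imitated in a few lines of Python. This is a simplified illustration, not a model of the Library of Congress's actual systems; the sample name and the helper `read_as_utf8` are hypothetical. Text saved as 16-bit UTF-16 is handed to a reader that assumes an 8-bit encoding (UTF-8), and the result no longer matches the original.

```python
# Data written in one Unicode encoding form is misread by software that
# expects another. The record text and helper name are hypothetical.
record = "Dvořák"

utf16_bytes = record.encode("utf-16-le")  # 16-bit code units, little-endian, no BOM

def read_as_utf8(data: bytes) -> str:
    """Decode bytes as UTF-8, replacing any invalid sequences with U+FFFD."""
    return data.decode("utf-8", errors="replace")

garbled = read_as_utf8(utf16_bytes)
print(garbled == record)                          # False: wrong decoder
print(utf16_bytes.decode("utf-16-le") == record)  # True: matching decoder
```

The UTF-8 reader does not crash here only because of `errors="replace"`; it silently produces stray NUL and replacement characters, which is arguably worse for an archive than an outright failure.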
Many libraries are making the transition from MARC records to XML, which Hlava describes as a hybrid of SGML and HTML that addresses both formats' faults. XML documents can be encoded in either ASCII or Unicode.

"SGML was incredibly versatile, which made it incredibly complicated, and everybody implemented it differently," she says. "And then HTML came along in about '94 as a way to code up things and make it display on the web, and that was awesome and simple, but it was really format only. It didn't have anything to do with context, whereas SGML had everything to do with context. And so XML came out as kind of a hybrid so that you could put in both the format tags and the context tags."

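The difference between format tags and context tags can be sketched with Python's standard `xml.etree.ElementTree` module. The record structure and element names (`author`, `title`, `subject`, `emph`) are hypothetical, not drawn from MARC, TEI, or any other standard. Context tags identify what a piece of text is, so software can pull out a field by name, while the `emph` tag only marks how part of the title should be displayed.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: <author>, <title>, and <subject> are context tags that
# say what the content is; <emph> is a format tag that only affects display.
RECORD = """<record>
  <author>Doe, Jane</author>
  <title>An <emph>Example</emph> Title</title>
  <subject>Digital preservation</subject>
</record>"""

root = ET.fromstring(RECORD)

def field_text(element: ET.Element, tag: str) -> str:
    """Return the plain text of a child field, flattening any format tags inside it."""
    child = element.find(tag)
    return "".join(child.itertext()) if child is not None else ""

print(field_text(root, "author"))  # Doe, Jane
print(field_text(root, "title"))   # An Example Title
print([e.tag for e in root.find("title").iter("emph")])  # ['emph']
```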
Trippe says XML has not peaked yet because many companies are still switching over to the format, so it should be around for "quite some time," making it a good format for library data.

"It seems to be the preferred format for all of the information providers and everything like that, and it's pretty powerful because you can drill down to different things, so you can get to any level you want to," he says.

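Trippe's point about drilling down to any level can be illustrated with the limited XPath support in Python's `xml.etree.ElementTree`. The collection, its `id` attributes, and its element names are invented for this sketch. The same document answers coarse queries (all records), broader ones (all titles), and narrow ones (the subjects of a single record).

```python
import xml.etree.ElementTree as ET

# Hypothetical collection; element and attribute names are illustrative only.
COLLECTION = """<collection>
  <record id="r1">
    <title>First Report</title>
    <subject>Preservation</subject>
    <subject>Metadata</subject>
  </record>
  <record id="r2">
    <title>Second Report</title>
    <subject>Unicode</subject>
  </record>
</collection>"""

root = ET.fromstring(COLLECTION)

# Top level: every record in the collection.
record_ids = [r.get("id") for r in root.findall("record")]
# Across records: every title, wherever it sits in the tree.
titles = [t.text for t in root.iter("title")]
# One record: only its subjects, via an attribute predicate.
r1_subjects = [s.text for s in root.findall("record[@id='r1']/subject")]

print(record_ids)   # ['r1', 'r2']
print(titles)       # ['First Report', 'Second Report']
print(r1_subjects)  # ['Preservation', 'Metadata']
```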
For archiving purposes, a nonproprietary standard such as TEI (Text Encoding Initiative) or EAD (Encoded Archival Description) should be used, says Hlava. These formats work across various systems as a flat ASCII or rich text format file would, but TEI and EAD files retain formatting and enable metadata to be added, she says.

"A flat file has no tagging," she says. …

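To show what a tagged archival format adds over a flat file, the following Python sketch wraps a plain paragraph in a minimal TEI document. It assumes the TEI P5 namespace and the core header elements (`teiHeader`, `fileDesc`, `titleStmt`, `publicationStmt`, `sourceDesc`); the title, placeholder statements, and helper names `make_tei` and `q` are hypothetical, and the output has not been validated against the TEI schema. The point is that descriptive metadata lives in the header, separate from the body text, where a flat file would have nowhere to put it.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace

def q(tag: str) -> str:
    """Qualify a tag name with the TEI namespace (ElementTree's {uri}tag form)."""
    return f"{{{TEI_NS}}}{tag}"

def make_tei(title: str, paragraphs: list[str]) -> ET.Element:
    """Wrap plain paragraphs in a minimal TEI document whose header carries metadata."""
    tei = ET.Element(q("TEI"))
    file_desc = ET.SubElement(ET.SubElement(tei, q("teiHeader")), q("fileDesc"))
    # titleStmt, publicationStmt, and sourceDesc are the core parts of fileDesc.
    ET.SubElement(ET.SubElement(file_desc, q("titleStmt")), q("title")).text = title
    ET.SubElement(ET.SubElement(file_desc, q("publicationStmt")), q("p")).text = "Unpublished example."
    ET.SubElement(ET.SubElement(file_desc, q("sourceDesc")), q("p")).text = "Born-digital example."
    body = ET.SubElement(ET.SubElement(tei, q("text")), q("body"))
    for para in paragraphs:
        ET.SubElement(body, q("p")).text = para
    return tei

flat_text = "A flat file keeps the words but nothing about them."
doc = make_tei("Example Document", [flat_text])

print(ET.tostring(doc, encoding="unicode"))
print(doc.find("tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", NS).text)  # Example Document
```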