Progress Report: The British Library and Microsoft Digitization Partnership

By Ashling, Jim | Information Today, November 2007

Microsoft made it clear that it wasn't going to let Google tackle mass book digitization exclusively when it announced a partnership with The British Library (BL) in November 2005.

The BL/Microsoft project is designed to digitize 25 million pages of 100,000 out-of-copyright titles from the BL collection related to 19th-century literature. Access will be provided via Microsoft's Live Search Books site (http://books.live.com) and the BL's Web site (www.bl.uk). Live Search Books now includes many partners: The University of California Libraries, Cornell University Library, the University of Toronto Library, The New York Public Library, and the American Museum of Veterinary Medicine have all joined, as well as more than 50 publishers.

[ILLUSTRATION OMITTED]

Wider Access to Lesser-Known Authors

Kristian Jensen, head of British Early Printed Collections, reviewed the selection process. Unlike previous BL digitization projects, in which material had been selected on an item-by-item basis, the sheer size of this project made such selectivity impossible. Instead, the focus is on English-language material collected by the BL during the 19th century. Jensen compared the process to mass microfilming. "Nonselectivity widens access," he said.

Being less selective creates certain advantages. First, it lessens the domination of the well-known author, or the high profile enjoyed by the "already famous." The works of virtually unknown writers will be brought to the attention of scholars as easily as material by Charles Dickens. The collection is being processed according to the same classifications used at the time of original acquisition, so classes that seem unusual today, such as "19th-century female poets," become accessible as research areas. The benefits of looking at the literature as it would have been available at the time will be welcomed by educationalists, and delicate literary items will also benefit from not being overhandled in the future.

Another benefit of the selection process is that entire shelf runs can be taken for scanning at one time. After trying a couple of pilot runs to assess quality standards, Microsoft and the BL chose CCS (Content Conversion Specialists) as the scanning contractors. Richard Helle, CCS managing director, provided a tour of the digitization studio at the BL's press event in September.

The target is to scan 50,000 pages per day with a 2-year timetable for completion. However, none of the valuable material is subjected to any risk with such fast output. Helle emphasized that these "treasures" were being scanned nondestructively, and all staff involved had received careful training. Book movement pilots were run in advance to determine the amount of staff time required during the full process, from selection and retrieval to delivery, scanning, and reshelving.
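As a rough check on these figures, the arithmetic can be sketched in a few lines of Python. The page, title, and daily-target numbers come from the article; the assumptions that scanning runs every calendar day and that the 2-year timetable equals 730 days are illustrative only and are not stated by the BL or Microsoft.

```python
import math

# Figures reported in the article.
TOTAL_PAGES = 25_000_000
TOTAL_TITLES = 100_000
TARGET_PAGES_PER_DAY = 50_000

# Assumption (not from the article): a 2-year timetable of 730 calendar
# days, with scanning taking place on every one of them.
TIMETABLE_DAYS = 2 * 365


def scanning_days(total_pages: int, pages_per_day: int) -> int:
    """Whole days needed to scan total_pages at a constant daily rate."""
    return math.ceil(total_pages / pages_per_day)


def minimum_daily_rate(total_pages: int, days: int) -> int:
    """Smallest whole number of pages per day that finishes within days."""
    return math.ceil(total_pages / days)


days_at_target = scanning_days(TOTAL_PAGES, TARGET_PAGES_PER_DAY)
average_pages_per_title = TOTAL_PAGES / TOTAL_TITLES
rate_needed = minimum_daily_rate(TOTAL_PAGES, TIMETABLE_DAYS)

if __name__ == "__main__":
    print(f"Days at target rate: {days_at_target}")
    print(f"Average pages per title: {average_pages_per_title:.0f}")
    print(f"Minimum daily rate for 730 days: {rate_needed}")
```

Under those assumptions, the 50,000-page daily target would finish the 25 million pages in 500 scanning days, which leaves some slack inside a 2-year schedule for downtime or for days when the target is missed.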

Scanning and OCR Conversion

Maximum and minimum book sizes have been set for what can be scanned at present, which excludes about 20 percent of the relevant collection from digitization. Everything is tracked with bar codes, and the condition of each book is checked to ensure that it can stand up to the scanning process. Four Kirtas Technologies BookScan machines are now being used. These provide semiautomated scanning with an operator in place to ensure that all pages are turned accurately, to preview the quality of the images, and to adjust color settings that can vary with temperature. A separate scanner is used to handle books with fold-out pages.

Scanning produces high-resolution images (300 dpi) that are then transferred to a suite of 12 computers for OCR (optical character recognition) conversion. The computers, which run 24/7, are specially tuned to deal with the spelling variations and old-fashioned typefaces used in the 1800s. The process creates multiple versions, including PDFs and OCR text for display in the online services, as well as an open XML file for long-term storage and potential conversion to any new formats that may become future standards. …
