Downloading Data from the Web

By Curle, David | Online, July/August 1997

Just when you thought you had mastered the art of downloading information from online databases and using a word processor to dress it up into a nicely formatted, tidy report, along comes the World Wide Web and makes a mess of everything again.

Despite the Web's great value as a source for data of all kinds, a number of differences make downloading and editing output from the Web more difficult than working with output from traditional online databases. The main differences are discussed first. A step-by-step review of several methods for downloading Web content follows.

But first, a reminder: Data on the Web may be "free" in the sense that it costs nothing to access and use on most Web sites, but you are not entirely free to reproduce and re-use that data any way you please. Any such re-use must be consistent with applicable copyright law. (See articles by Stephanie C. Ardito and Robert Weiner in recent issues of ONLINE [1,2].)


What problems does the Web present that traditional online databases either do not present or that we have already learned to master?

Organizationally Inconsistent Collections of Documents

An online database consists of a number of records, each of which has a certain consistent structure. Even across various database producers and hosts, there are standard document types: bibliographic records, abstracts, full-text articles, company directory records, etc. That consistency makes downloading and manipulating search results predictable and relatively simple.

By contrast, on the Web you find a wild variety of document types and organizational models. A Web site can be a hierarchically organized group of documents linked from a main menu, or it can be a loosely connected collection without any discernible organizational principles. It can include its own search engine that will return documents from a large collection. A single page of HTML (HyperText Markup Language) can consist of a few lines of text or several megabytes of information. It might link to other documents located within the same Web site or on another Web site hosted on the other side of the world. No two Web sites are alike, which makes extracting information from them something of an adventure.
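
To make the point about links concrete, here is a minimal sketch in Python that collects the links on a single page and sorts them into those that stay within the same Web site and those that lead elsewhere. It is an illustration only, not a method described in this article: the function and class names, the sample page, and the example.com addresses are all hypothetical, and "same site" is assumed to mean simply "same host name."

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def classify_links(page_url, html):
    """Split a page's links into same-site and other-site lists."""
    collector = LinkCollector()
    collector.feed(html)
    page_host = urlparse(page_url).netloc
    same_site, other_site = [], []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        if urlparse(absolute).netloc == page_host:
            same_site.append(absolute)
        else:
            other_site.append(absolute)
    return same_site, other_site


# Hypothetical page, used only for illustration.
SAMPLE_HTML = """
<html><body>
<a href="reports/q1.html">First quarter</a>
<a href="/index.html">Home</a>
<a href="http://other.example.org/data.html">Elsewhere</a>
</body></html>
"""

same, other = classify_links("http://www.example.com/docs/menu.html", SAMPLE_HTML)
```

Even this small sample mixes a relative link, a site-root link, and a link to another host, which is why each address is resolved against the page's own URL before the comparison is made.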

Multiplicity of Document Structures

The typical online database contains documents with strictly defined data elements such as a title, bibliographic elements, indexing terms, abstract, and text. Text is normally displayed flush left in a single, 80-character-wide stream. A downloaded file of such documents provides predictable raw material for further processing, editing, and printing.
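
The predictability described above is what makes post-processing easy. As a rough sketch, assuming a hypothetical record layout (a two-letter field tag, a colon, then the value, with indented lines continuing the previous field), a few lines of Python are enough to turn a downloaded record into named fields. Real hosts each use their own tags and layouts, and the abstract text in the sample is placeholder wording.

```python
def parse_record(text):
    """Turn one field-tagged record into a dict of field -> value.

    Assumes a hypothetical layout: a two-letter tag, a colon, then the
    value; lines that start with whitespace continue the previous field.
    """
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0].isspace() and current is not None:
            # Continuation line: append to the field being built.
            fields[current] += " " + line.strip()
        elif len(line) > 2 and line[2] == ":":
            current = line[:2]
            fields[current] = line[3:].strip()
    return fields


# Hypothetical sample record; the AB text is a placeholder.
SAMPLE_RECORD = """\
TI: Downloading Data from the Web
AU: Curle, David
SO: Online, July/August 1997
AB: Placeholder abstract text that wraps
    onto a second line.
"""

record = parse_record(SAMPLE_RECORD)
```

Because every record follows the same layout, the same routine works on the first record and the thousandth. That assumption does not hold for arbitrary Web pages.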

Each separate document in a Web site, on the other hand, consists of a specific file in HTML format. The HTML standard allows the author of each Web page to decide the size of each document, the graphical layout of the document, and organizational elements such as columns, tables, and headings. In short, if everyone is a publisher on the Web, so too is everyone an editor, layout coordinator, and art director, for better or for worse.

Navigational Differences

In a traditional online search session, the user constructs a search that results in retrieval of one or more records from a finite database. Depending on the host system, the searcher usually has a command available that will display all of those records in a single stream of text data that can be captured and sent to a single capture file.

On the Web, however, you need to "visit" each document separately and then download the documents to disk one at a time; you can't simply enter a command to download an entire Web site or all of the documents that your Web search engine has retrieved. "Searching the Web" is really a misnomer; "Searching and Browsing the Web" is a more accurate description of the process. A large percentage of the data you encounter in a Web "search" might be irrelevant.
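
The one-document-at-a-time pattern can be sketched in Python as a simple loop: one request per address, one file per document. This is an illustrative sketch rather than a method from this article. The function names, file-naming scheme, and example.com addresses are assumptions, and the demonstration substitutes a stand-in fetch function so that it runs without a network connection.

```python
import os
import tempfile
from urllib.request import urlopen


def fetch_url(url):
    """Retrieve one document over the network and return its bytes."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def download_each(urls, target_dir, fetch=fetch_url):
    """Visit each URL separately and save each document to its own file."""
    os.makedirs(target_dir, exist_ok=True)
    saved = []
    for number, url in enumerate(urls, start=1):
        data = fetch(url)  # one request per document
        path = os.path.join(target_dir, f"doc{number:03d}.html")
        with open(path, "wb") as out:
            out.write(data)
        saved.append(path)
    return saved


# Offline demonstration with a stand-in fetch function, so nothing here
# touches the network. The URLs are hypothetical.
def fake_fetch(url):
    return f"<html><body>Contents of {url}</body></html>".encode("utf-8")


demo_dir = tempfile.mkdtemp()
saved_files = download_each(
    ["http://www.example.com/a.html", "http://www.example.com/b.html"],
    demo_dir,
    fetch=fake_fetch,
)
```

Note that the loop still needs a list of addresses to start from; deciding which documents are worth visiting remains the searcher's job.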
