A New Era of Search Engines: Not Just Web Pages Anymore

By Ran Hock | Online, September-October 2002

Web search engines are entering a new era. With traditional abstracting and indexing services, a rigorous search requires knowing each service's editorial policies: What kinds of documents, time frames, degree of selectivity, and exceptions should be expected? Does the service cover journal articles, technical reports, or newspapers? Does it go back to 1987 or 1995? Does it have cover-to-cover indexing, full text, and/or abstracts? Are letters to the editor included or not?

Searchers tend not to apply the same rigor to Web search engines, assuming that searches retrieve only Web sites and Web pages. That is changing. Web search engines no longer search just plain old ordinary Web sites. Information professionals should apply the same evaluation techniques to a Web search that they have long applied to traditional online databases. You need to think in terms of what kinds of "documents" you should expect to get. Although Web pages still predominate, consider the varying types of content now searchable on the Web.

The serious searcher should indulge in some traditional thinking about what is retrievable and expand the list of categories included as retrievable "document types." In addition to Web pages, searches can be done for news articles, PDF (and other) file types, images, audio files, video files, and some other odds and ends. With a heightened awareness along these lines, you will have a better idea of what you can get through a particular tool (engine), which engine(s) to use for a particular question, and, overall, what is possible.

To understand content coverage, keep two things in mind:

(1) In some cases, various kinds of documents will be retrieved automatically in a regular search using the search engine's main query box(es). In other cases, you will have to specify a separate database to retrieve them.

(2) Not only does the variety of content matter, but so does the searchability of that content. Can you specify that you want only PDF files, for example, and can you specify particular characteristics of these various document types? With images, for example, can you specify file format, coloration, and file size?

PLAIN OLD WEB PAGES

Think of the "Web pages" category of retrievable documents as those documents written in HTML, a distinct document type from the searchability perspective. When you enter a term in the main search box of a search engine, you will retrieve pages that contain your search term somewhere in their text, or at least contained it when the engine last crawled them.

Rather obviously, major Web search engines, which I define as AllTheWeb, AltaVista, Google, HotBot, Lycos, Teoma, and WiseNut, primarily search and retrieve Web pages. (The engines listed are a bit arbitrary but reflect a combination of size and popularity within the professional searcher community.) The next obvious question is: how many Web pages? Roughly, the numbers are as follows, in order of size:

Engine      Web pages indexed (approx.)
Google      2.1 billion
AllTheWeb   2.1 billion
WiseNut     1.6 billion
AltaVista   0.9 billion
Lycos       0.6 billion
HotBot      0.5 billion
Teoma       0.15 billion

On the "very good news" front, it is a relief that billions, rather than millions, are now the most convenient unit for talking about Web page content. For a more precise analysis of size, be sure to take a look at Greg Notess' evaluation of search engine database sizes [searchengineshowdown.com]. Greg does an excellent job of analyzing size, taking into account factors such as duplication and dead links.

TIME FRAMES

Information professionals look at both ends of a time frame when evaluating sources: how far back in time a tool goes and how current its content is. For Web pages, the historical part is easy to answer. Web pages in the search engines can go back only as far as the Web itself, generally to somewhere in the early to mid-1990s. …
