Indexing and Abstracting on the World Wide Web: An Examination of Six Web Databases
Nicholson, Scott, Information Technology and Libraries
Web databases, commonly known as search engines or Web directories, are currently the most useful way to search the Internet. In this article, the author draws from library literature to develop a series of questions that can be used to analyze these Web searching tools. Six popular Web databases are analyzed using this method. Using this analysis, the author creates three categories for Web databases and explores the most appropriate searches to perform with each. The work concludes with a proposal for the ideal Web database.
The Internet provides a link to many valuable information sources with no centralized database for organization and searching. Many individual Web databases and their attached search engines accessible through the World Wide Web compete to provide subject and keyword access to information available through the Internet. These databases are created by both humans and automated computer programs called "spiders" or "robots." As there is no standard (such as an AACR2R variant) for description of Web pages, each engine provides access in a unique way to a different database. This article will examine the methods used to collect information about the information resources, the indexing used, and the abstracting done as of February 25, 1997, in these six Web databases:
Alta Vista: http://www.altavista.digital.com
Open Text: http://index.opentext.net
Magellan: http://www.mckinley.com
To evaluate these databases from the viewpoint of an indexer/abstracter, three aspects will be examined: collection methods, indexing, and abstracting. The following questions, selected from Auster (1986), Conhaim (1996), Courtois, Baer, and Stark (1995), Katz (1992), Lancaster (1991), Venditto (1996), and Winship (1995), will be examined for each database:
* How are sites selected (human/automation)?
* What selection criteria are used? What types of Internet resources are analyzed?
* What is the scope of searching the Internet for sites?
* How long does it take a site to be included?
* How often are the entries updated?
* How large is the database and how fast is it growing?
* Which parts of the site are indexed? Are these parts appropriate surrogates for the work?
* Is a controlled vocabulary used? Is it available to end-users?
* How is the keyword indexing accomplished?
* How can users search the indexed terms?
* What is included in a displayed citation?
* Can the user discern where the citation came from?
* How valuable is the displayed citation in assisting a user to predict usefulness?
* Are there descriptions, abstracts, or reviews presented for the site? How are they created?
* For what type of searching is this database suited?
* For what type of searcher is the search engine created?
* How could the database/search engine be improved?
* How can an author assist the database service in accurate indexing and abstracting?
Lycos, the "Catalog of the Internet," is one of the oldest search engines on the Web. It was started at the Center for Machine Translation at Carnegie Mellon University in 1994 (Mauldin and Leavitt, 1994). Lycos is one of the most popular Web databases and was the first search engine available from the Netscape Net Search button (Notess, 1995). It currently shares that honor with WebCrawler, Excite, Yahoo, and Infoseek. Besides the Web database, Lycos provides access to a subject directory, the top 5 percent of the Web, and information on cities, stocks, individuals, and companies.
Upon request by a user, Lycos sends out a spider that navigates the site, recording information in the database. …
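To make the idea of a spider that "navigates the site, recording information in the database" concrete, the following is a minimal sketch in Python. It is an illustration only and not a description of how Lycos itself was built: the names `PageRecorder`, `crawl_site`, and `fetch`, the choice to record only page titles, and the in-memory `PAGES` sample are all assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class PageRecorder(HTMLParser):
    """Hypothetical helper: collects the title and link targets of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl_site(start_url, fetch):
    """Visit every page reachable from start_url on the same host.

    `fetch` is a caller-supplied function (an assumption of this sketch)
    that maps a URL to its HTML text, or None if the page is unavailable.
    Returns a dict mapping each visited URL to its recorded title.
    """
    host = urlparse(start_url).netloc
    database = {}
    seen = {start_url}
    queue = [start_url]
    while queue:
        url = queue.pop(0)
        html = fetch(url)
        if html is None:
            continue
        parser = PageRecorder()
        parser.feed(html)
        database[url] = parser.title.strip()
        for href in parser.links:
            # Resolve relative links and drop any "#fragment" part.
            target = urldefrag(urljoin(url, href)).url
            # Stay on the submitted site; skip pages already queued.
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return database


# Sample two-page site held in memory so the sketch runs without a network.
PAGES = {
    "http://example.org/": (
        '<title>Home</title><a href="about.html">About</a>'
        '<a href="http://other.example/">Elsewhere</a>'
    ),
    "http://example.org/about.html": '<title>About</title><a href="/">Home</a>',
}

database = crawl_site("http://example.org/", PAGES.get)
```

Here the "database" is just a dictionary of titles keyed by URL. A real Web database would record far more, and which parts of a page it records is exactly what the indexing questions above are meant to uncover.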