Google Scholar

Google Scholar. Google Inc., 1600 Amphitheatre Parkway, Mountain View, CA 94043; 650.253.0000; fax, 650.253.0001; http://scholar.google.com; free Website.

Nothing quite prepared the library world for the introduction of Google Scholar in November 2004. In mere weeks, Google's astonishing brand recognition and promotional machine propelled Google Scholar into the public's consciousness. Librarians, particularly medical and science librarians, have been talking and writing about it ever since. Who would have thought that a research database could create such a buzz?

What exactly is Google Scholar? The parent company has been typically coy with explanatory information on the product since its launch. Even now, much remains unknown about its source content, indexing, or relevance algorithms.

Google Scholar is a subset of the larger Google search index, consisting of full-text journal articles, technical reports, preprints, theses, books, and other documents, including selected Web pages that are deemed to be "scholarly." Although Google Scholar covers a great range of topical areas, it appears to be strongest in the sciences, particularly medicine, and secondarily in the social sciences. The company claims to have full-text content from all major publishers except Elsevier and the American Chemical Society, as well as from hosting services such as HighWire and Ingenta.

Much of Google Scholar's index derives from a crawl of full-text journal content provided by both commercial and open access publishers. Specialized bibliographic databases like OCLC's Open WorldCat and the National Library of Medicine's PubMed are also crawled. Since 2003, Google has entered into numerous individual agreements with publishers to index full-text content not otherwise accessible via the open Web. Although Google does not divulge the number or names of publishers that have entered into crawling or indexing agreements with the company, it is easy to see why publishers would be eager to boost their content's visibility through a powerhouse like Google.

Like the larger Google search engine index, Google Scholar is fast and easy to search. It retrieves document or page matches based on the keywords searched and then ranks the results using a closely guarded relevance algorithm. Because so much of Google Scholar's index comes from licensed commercial journal content, most users will discover that clicking on a link in the search results often reveals only an abstract, not full text, accompanied by a pay-per-view option. Institutions can configure OpenURL link resolvers, such as SFX, to authenticate their users and give them access to full-text content available through institutional subscriptions.
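To make the link-resolver mechanism concrete, the sketch below shows roughly what an OpenURL looks like: the citation's metadata is packed into a query string and appended to the institution's resolver address, and the resolver then decides which subscribed copy to offer. This is only an illustration under stated assumptions. The resolver address, the `build_openurl` helper, and the citation are all hypothetical; the parameter names follow the OpenURL 1.0 (Z39.88-2004) key/encoded-value format for journal articles.

```python
from urllib.parse import urlencode

# Hypothetical resolver address; every institution runs its own.
RESOLVER_BASE = "https://resolver.example.edu/openurl"


def build_openurl(citation, base=RESOLVER_BASE):
    """Return an OpenURL 1.0 (Z39.88-2004) link for a journal article citation."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
    }
    # Copy only the citation fields that are actually present.
    for key in ("atitle", "jtitle", "issn", "volume", "issue", "spage", "date"):
        if citation.get(key):
            params["rft." + key] = citation[key]
    return base + "?" + urlencode(params)


# Made-up citation, used purely for illustration.
example = {
    "atitle": "An example article",
    "jtitle": "Journal of Examples",
    "issn": "1234-5678",
    "volume": "12",
    "spage": "34",
    "date": "2005",
}
link = build_openurl(example)
print(link)
```

Because the link carries only descriptive metadata, the same citation can resolve to different full-text sources at different institutions, depending on what each one licenses.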

The inadequacies of Google Scholar have already been well documented in reviews [1,2]. These reviews focused on three major weaknesses of the tool: a lack of sufficient advanced search features, a lack of transparency about the database's content, and uneven coverage. Henderson's review of Google Scholar demonstrated its significant limitations for clinician use [3]. Tests conducted by Jacso showed that Google Scholar typically crawled only a subset of the full available content of individual journals or databases [4]. In February 2005, Vine discovered that Google Scholar was almost a full year behind in indexing PubMed records and concluded that "no serious researcher interested in current medical information or practice excellence should rely on Google Scholar for up to date information" [5].

With a simple, basic search interface and only minimal advanced search features, Google Scholar lacks almost every important feature of MEDLINE. It does not map to Medical Subject Headings (MeSH); does not permit nested Boolean searching; lacks essential features like explosions, subheadings, or publication-type limits; and offers searchers no ability to benefit from the extraordinary indexing that the National Library of Medicine provides. …
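For readers less familiar with the MEDLINE features listed above, the following sketch assembles a query of the kind they make possible, written in PubMed's field-tag syntax: `[mh]` for a MeSH heading (exploded by default), `[mh:noexp]` to switch explosion off, a heading/subheading pair, and `[pt]` for a publication-type limit, all combined with nested Boolean operators. The helper functions `mesh`, `any_of`, and `all_of` are hypothetical names invented for this illustration, and the topic is arbitrary.

```python
def mesh(term, subheading=None, explode=True):
    """Format one MeSH heading using PubMed field tags."""
    heading = f"{term}/{subheading}" if subheading else term
    tag = "mh" if explode else "mh:noexp"  # noexp turns off explosion
    return f'"{heading}"[{tag}]'


def any_of(*clauses):
    """Join clauses with OR inside one parenthesized group."""
    return "(" + " OR ".join(clauses) + ")"


def all_of(*clauses):
    """Join clauses with AND inside one parenthesized group."""
    return "(" + " AND ".join(clauses) + ")"


# Nested Boolean: (heading with subheading OR unexploded heading) AND type limit.
query = all_of(
    any_of(
        mesh("Asthma", subheading="drug therapy"),
        mesh("Bronchitis", explode=False),
    ),
    '"Randomized Controlled Trial"[pt]',
)
print(query)
```

The query string printed by this sketch is meant for PubMed's search box; it relies on the controlled vocabulary and field tags that the paragraph above identifies as missing from Google Scholar.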