Information science is an interdisciplinary field that encompasses the study of the production, organization, storage, retrieval, dissemination and use of information. Research may focus on the information user, the systems that provide access to information, or the interface between the two. Over the past fifty years a number of sub-fields have emerged within information science. Two primary areas of study within the discipline are information retrieval (IR) and informetrics. The two specialties have developed from different traditions but share common areas of interest. In this paper, the author provides a nontechnical overview of information retrieval and informetrics for the non-specialist, with a focus on how the intersection of these two areas applies to IR system design and evaluation.
What is information retrieval?
Information retrieval is a selective process by which desired information is extracted from a store of information called a database (Meadow, 1992). Traditionally, IR systems have been used to locate text-based information, either the full text of documents or document surrogates that summarize the contents of documents located outside of the database (e.g. bibliographic records). In recent years, information retrieval has broadened to include multimedia formats such as images, sound and video. IR system usage has also broadened during this time. Previously, information professionals were the primary users of IR systems, searching systems available through vendors such as DIALOG and EBSCO Information Services. The wider availability of online public access catalogues in libraries, CD-ROM database systems, and, most recently, web search engines has made IR systems much more accessible to end users.
The process of interactive information retrieval involves a dialogue between the searcher and the IR system. The searcher initially submits a query to the IR system. Queries consist of one or more search terms and operators that define the parameters for records to be retrieved. The query terms are compared to an index of terms within the database using the operators (e.g. and, or, not) specified in the query. A list of records matching the query criteria is presented to the searcher for perusal. Based on the searcher's inspection of the records retrieved, the query may be reformulated. The process is then repeated.
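The matching step described above can be illustrated with a minimal sketch. It assumes a toy in-memory database and a two-term query with a single operator; the names RECORDS, build_index and search are hypothetical and are not drawn from any particular IR system.

```python
from collections import defaultdict

# Hypothetical toy database: record id -> text of a document surrogate.
RECORDS = {
    1: "information retrieval systems and databases",
    2: "citation analysis in informetrics",
    3: "evaluation of information retrieval systems",
    4: "database management systems",
}


def build_index(records):
    """Map each term to the set of record ids in which it occurs."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for term in text.lower().split():
            index[term].add(record_id)
    return index


def search(index, term_a, operator, term_b):
    """Evaluate a two-term query such as 'retrieval and systems'."""
    a = index.get(term_a.lower(), set())
    b = index.get(term_b.lower(), set())
    if operator == "and":
        return a & b   # records containing both terms
    if operator == "or":
        return a | b   # records containing either term
    if operator == "not":
        return a - b   # records containing term_a but not term_b
    raise ValueError(f"unsupported operator: {operator}")


index = build_index(RECORDS)
first_pass = search(index, "systems", "or", "retrieval")    # broad query
reformulated = search(index, "systems", "not", "retrieval")  # narrowed query
```

Here the first query retrieves records 1, 3 and 4; after inspecting them, the searcher might reformulate the query with the not operator, which leaves only record 4. An operational system would also handle nested queries, truncation and field restrictions, none of which are attempted in this sketch.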
On the surface, IR systems may resemble commonly used database management systems (DBMS). Although it is possible to develop an IR system using certain DBMS software, physical and philosophical differences distinguish these two types of systems. For example, the concept of relevance is central to information retrieval but does not play a role in DBMS interactions. Due to the ambiguities of language, some retrieved items may not be relevant to the searcher's information need, despite having matched the query parameters. This is the challenge of IR: ensuring the timely retrieval of relevant items while not retrieving items that are non-relevant to the searcher's information need.
Numerous conceptual models have been developed for IR systems. Many of today's IR systems incorporate a Boolean approach where retrieval is based on an exact or partial match to a query. Many bibliographic database systems accessible within libraries or through database vendors such as DIALOG use this method. Also popular are probabilistic systems that take into account the likelihood of relevance based on the frequency of occurrence of search terms within documents, allowing retrieved items to be presented in rank order based on calculated relevance. Most World Wide Web search engines and other full-text IR systems rely on this approach. Still other systems rely on a vector space model, where potential relevance is determined by the proximity of documents to queries, represented as vectors in a multi-dimensional space (Salton & McGill, 1983).
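The vector space idea can be made concrete with a small sketch. It rests on two assumptions not stated above: documents and queries are represented by raw term-frequency vectors, and proximity is measured by the cosine of the angle between them. The names DOCS, to_vector, cosine and rank are hypothetical, and operational systems typically apply more elaborate term weighting.

```python
import math
from collections import Counter

# Hypothetical toy collection: document id -> text.
DOCS = {
    "d1": "information retrieval systems",
    "d2": "retrieval of images and sound",
    "d3": "database management systems",
}


def to_vector(text):
    """Represent text as a sparse term-frequency vector."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, documents):
    """Return (score, doc_id) pairs, best match first, dropping zero scores."""
    q = to_vector(query)
    scored = [(cosine(q, to_vector(text)), doc_id)
              for doc_id, text in documents.items()]
    scored = [(score, doc_id) for score, doc_id in scored if score > 0]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


results = rank("information retrieval", DOCS)
```

For the query "information retrieval", d1 is ranked ahead of d2 because it shares both query terms, while d3 shares none and is not retrieved. Unlike a Boolean match, the result is an ordered list in which partial matches still appear, only lower down.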
Information retrieval remains a key research area within information science. …