Academic journal article Journal of Digital Information Management

Optimization of Boolean Queries in Information Retrieval Systems Using Genetic Algorithms-Genetic Programming and Fuzzy Logic

Article excerpt

ABSTRACT: This paper proposes two information retrieval (IR) system models: the Boolean IR model and the extended Boolean (fuzzy) IR model. The models differ in that the first uses Boolean queries and the second uses fuzzy weighted queries. The paper also proposes a way to optimize the user query in both models using genetic programming and fuzzy logic. It further proposes using a larger set of Boolean operators (AND, OR, XOR, OF, and NOT) instead of the standard ones (AND, OR, and NOT), and assigning weights to Boolean operators and to terms in the fuzzy model.

Categories and Subject Descriptors

H.3.3 [Information Search and Retrieval]; J.3 [Life and Medical Sciences]: Biology and genetics

General Terms

Boolean operators, Fuzzy model, Information retrieval

Keywords: Boolean Query, Information Retrieval, Genetic Algorithms, Genetic Programming, Fuzzy Logic, Term Weights, and Boolean Operator Weights.

1. Introduction

One of the most pressing issues arising from today's explosive growth of the Internet is the so-called resource discovery problem [3]: how to find information of interest among the vast and growing amount of information available. One of the most important uses of the public network is finding suitable information in response to a user query. In this paper, we discuss the use of weights for Boolean operators and terms, and the use of a larger set of Boolean operators, to optimize the user query. We work on two IR models (Boolean and fuzzy) and optimize the user query using one of the evolutionary algorithms, genetic programming, together with fuzzy logic.

A genetic algorithm was implemented in both models, whereas fuzzy logic was used only in the fuzzy (extended Boolean) model. For both models, the harmonic mean was used to measure IR performance. The harmonic mean combines the precision and recall measures into a single value, so that both are taken into account at once when improving IR performance.
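As an illustration of the performance measure described above, the following is a minimal Python sketch of precision, recall, and their harmonic mean. It assumes the balanced form 2PR / (P + R); the excerpt does not state whether the authors weight precision and recall differently. The function names and the document identifiers are hypothetical and serve only as an example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved list against a relevant list."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def harmonic_mean(precision, recall):
    """Balanced harmonic mean of precision and recall (assumed form)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing relevant is retrieved
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 4 documents retrieved, 3 relevant, 2 in common.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5"]
p, r = precision_recall(retrieved, relevant)
f = harmonic_mean(p, r)
print(f"precision={p:.3f} recall={r:.3f} harmonic mean={f:.3f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a query scores well only when precision and recall are both reasonably high, which is what makes it usable as a single fitness value.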

2. Motivation

Because web search techniques are so widely used, particularly in academia, search processes need to be understood. Many web users are not well trained in Boolean algebra and face "the problem of learning the correct interpretations of Boolean operators and their rules of precedence" [1]. Hence, the motivation of the current work is to produce two IR models that optimize the user query. The optimized query should retrieve the most relevant documents together with fewer documents that are not relevant to the user's search. Deploying the Boolean operators (AND, OR, XOR, OF, and NOT) with the harmonic mean measure improves IR performance.

3. Related work

The information retrieval literature is extensive, and within it genetic algorithms appear to offer more promise than other approaches. Masaharu et al. [17] propose an IR interface that employs a small number of query terms and concept categories with Boolean expressions; they use only the words that appear in the original query to reformulate the Boolean query, and their work is confined to two Boolean operators. Cordon et al. [18] represent the query as a parse tree with a maximum of 20 nodes; they use only the AND, OR, and NOT Boolean operators, and their testing is limited to a small set of 400 documents. Kraft et al. [16] apply genetic programming to optimize user search queries, investigate whether precision or recall is the more efficient objective function, and present experiments over a non-fuzzy collection of documents. Cordon et al. [19] propose the use of multi-objective evolutionary algorithms (EAs) and compare several EA-oriented approaches for optimizing persistent search queries. Among these studies, the works of Kraft and Cordon are the most promising.
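To make the parse-tree representation of a Boolean query concrete, the following is a minimal Python sketch of a query tree evaluated against the term sets of documents. It is not taken from any of the cited works. The class names, the `matches` and `count_nodes` helpers, and the sample documents are hypothetical. The OF operator is left out because the excerpt does not define its semantics.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Term:
    word: str  # leaf node: a single query term


@dataclass(frozen=True)
class Not:
    child: "Query"  # unary operator node


@dataclass(frozen=True)
class BinOp:
    op: str  # one of "AND", "OR", "XOR" (OF is omitted: semantics not given)
    left: "Query"
    right: "Query"


Query = Union[Term, Not, BinOp]


def matches(query, doc_terms):
    """Evaluate the query tree against the set of terms in one document."""
    if isinstance(query, Term):
        return query.word in doc_terms
    if isinstance(query, Not):
        return not matches(query.child, doc_terms)
    left = matches(query.left, doc_terms)
    right = matches(query.right, doc_terms)
    if query.op == "AND":
        return left and right
    if query.op == "OR":
        return left or right
    if query.op == "XOR":
        return left != right
    raise ValueError(f"unsupported operator: {query.op}")


def count_nodes(query):
    """Count tree nodes, e.g. to enforce a size limit on candidate queries."""
    if isinstance(query, Term):
        return 1
    if isinstance(query, Not):
        return 1 + count_nodes(query.child)
    return 1 + count_nodes(query.left) + count_nodes(query.right)


# Hypothetical query: genetic AND (fuzzy XOR boolean)
query = BinOp("AND", Term("genetic"), BinOp("XOR", Term("fuzzy"), Term("boolean")))
docs = {
    "d1": {"genetic", "fuzzy", "retrieval"},
    "d2": {"genetic", "fuzzy", "boolean"},
    "d3": {"fuzzy", "retrieval"},
}
retrieved = [doc_id for doc_id, terms in docs.items() if matches(query, terms)]
print(retrieved, count_nodes(query))
```

A tree of this kind is the usual unit that genetic programming manipulates: crossover swaps subtrees between two queries, and mutation replaces an operator or a term.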

4. Information Retrieval

IR is the process of extracting useful information from databases of text documents (document collections) via word or term searches and other techniques. …
