Academic journal article Journal of Digital Information Management

Fusion User Browsing Behavior and User Interests Autonomous Control of Personalized Search Results Sort

Article excerpt

1. Introduction

For Internet information retrieval, personalizing the retrieval results according to the user's interests is one of the main approaches to improving recall and precision.

Personalized retrieval is a technology that collects information about a user's interests, finds the results that match the user's requirements, and places them first on the results page. This method can improve search efficiency and accuracy, but it inevitably risks leaking the user's private information. How to satisfy the different requests of different users, reflect individual interests accurately, and provide suitable results is one of the key problems at present.

Based on the simplified ODP hierarchical structure and the user's search records, we establish a model that can express and classify the user's interests and autonomously control the degree to which those interests are disclosed. This paper addresses the issues above and discusses the construction of the user interest model, the evaluation of search result pages, and the sorting of search results.

2. User interest model based on basic structure of ODP

2.1 Establishing model of user's interest

One of the key problems of personalized retrieval is how to describe a user's actual interests. Referring to the simplified ODP tree structure, we organize the catalogue levels according to each user's personalized requirements, that is, the user's interest branches and their weights, and attach the user's interest percentage to each tree node. The result is the user interest model [1-4].

The user interest model is a tree structure, denoted TP. Each node of the tree is defined as:

node = {Keyword, Item, Weight, Value, Children}

The notation is as follows:

Keyword is the keyword of the node.

Item is the set of describing words of the node,

Item = {k_1, k_2, ..., k_n}.

Weight is the set of weights of the describing words in Item,

Weight = {KW_1, KW_2, ..., KW_n},

where KW_i is the weight of the i-th describing word k_i in the node's Item.

Value is the keyword score of the node.

Children holds the information on the node's child nodes.

The root node is a special node,

node-Top = {null, null, null, Value, Children}.
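The node definition above can be sketched as a small data structure. This is a minimal illustration, not the authors' implementation: the class name InterestNode, the choice to store Item and Weight together as one word-to-weight mapping, and all sample keywords and numbers are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class InterestNode:
    """One node of the interest tree TP (hypothetical representation)."""
    keyword: Optional[str]                # Keyword; None for the root
    weights: Optional[Dict[str, float]]   # Item and Weight merged: k_i -> KW_i; None for the root
    value: float = 0.0                    # Value, the keyword score of the node
    children: List["InterestNode"] = field(default_factory=list)  # Children

    @property
    def item(self) -> Optional[Set[str]]:
        """The describing word set Item, derived from the weight mapping."""
        return None if self.weights is None else set(self.weights)


# Made-up sample data, for illustration only.
sports = InterestNode("Sports", {"football": 0.5, "tennis": 0.25}, value=0.75)
arts = InterestNode("Arts", {"painting": 0.125}, value=0.25)

# The root corresponds to node-Top = {null, null, null, Value, Children}.
root = InterestNode(None, None, value=sports.value + arts.value,
                    children=[sports, arts])
```

Keeping Item and Weight in one mapping guarantees that every describing word has exactly one weight; two parallel lists would work equally well.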

For any node n other than the root node in the tree structure model, the Item of the node is a subset of the Item of its parent node; the parent node's Item will generally also contain describing words of its own beyond the union of its child nodes' Item sets. The weight of each element in the parent node's Item is the sum of the weights of the same element in its child nodes. The Value of each node is the sum of the Values of its child nodes, i.e.,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]
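The two sum rules stated in the preceding paragraph can be written compactly as follows. This is a sketch in assumed notation, not a recovery of the original expression: C(n) for the set of child nodes of n and KW_k(n) for the weight of describing word k at node n are symbols introduced here for illustration.

```latex
% Assumed notation (not from the source):
%   C(n)    = set of child nodes of a non-leaf node n
%   KW_k(n) = weight of describing word k at node n,
%             taken as 0 when k is not in the Item of n
\begin{align}
  KW_k(n)           &= \sum_{c \in C(n)} KW_k(c), \\
  \mathrm{Value}(n) &= \sum_{c \in C(n)} \mathrm{Value}(c).
\end{align}
```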

2.2 The calculation of user interest degree

The initial representation of the user's interest is empty. When the user first searches web content, the key terms and the corresponding ODP classification catalogues are designated manually. In the personalized search process, a result list is produced each time the user searches for a keyword. The user interest model is then updated by recording the pages the user opens and extracting their features.

Every node in the model is regarded as a topic in which the user is interested. Item is the description of the node and Value is the percentage of the node. Whether a page matches the user's interest, and how closely, depends on whether the page matches the description of a node in the model and on how important that node is to the user [3,4].

Using the TF*IDF method [5] and ICTCLAS [6], a Chinese lexical analyzer developed by the Institute of Computing Technology, Chinese Academy of Sciences, the characteristic vector of a text page is composed of the characteristic words extracted from the page and their percentages. …
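As a rough illustration of the TF*IDF weighting mentioned above, the sketch below builds one term-to-weight vector per page. It assumes the pages have already been segmented into words (for Chinese text, by an analyzer such as ICTCLAS, which is not called here) and uses the common textbook form tf × log(N/df); the excerpt does not state which TF*IDF variant or normalization the authors use, so the function name tf_idf_vectors and the sample pages are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists (already segmented text).
    Weight = (term count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Made-up, pre-segmented sample pages.
pages = [
    ["search", "ranking", "search"],
    ["search", "privacy"],
    ["music", "ranking"],
]
vectors = tf_idf_vectors(pages)
```

With this form, a word that occurs in every page gets weight 0, so only words that distinguish a page from the rest contribute to its characteristic vector.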
