Academic journal article Journal of Digital Information Management

A Recommendation Model Based on Site Semantics and Usage Mining

Article excerpt

1. Introduction

Millions of people access the web daily for various reasons: to find information, perform financial transactions, communicate with others, and so on. Given the explosive growth of online data and the diversity of goals that may be pursued over the web, it is not surprising that web traffic has gained a high monetary value in recent years. To tap into this accelerating market, web site operators strive to improve the usability and user retention of their sites by customizing them to the needs of their users. Web site customization is the process of modifying the information or services provided by a web site so as to meet user needs.

Adjusting the content or structure of web data to specific needs has been an active field of research for several years. Some operators attempt to improve their sites by analyzing web usage data. Most of these efforts [5], [26], [27] apply data mining techniques to extract useful patterns and rules that describe the users' navigational behavior, so that humans can then make decisions about site restructuring. However, usage-based site customization can be problematic either when there is not enough data to extract patterns related to certain categories, or when the site contents change and new pages are added that are not yet included in the web log [20]. To overcome such difficulties, researchers have proposed exploiting information about the content [11], [21] and/or the structure [9] of web sites. In particular, they propose combining site usage and content knowledge in order to dynamically modify web sites. Mining web logs to discover knowledge about user interests has also been addressed in the context of recommendation engines [12], [29].

What most existing site customization approaches have in common is that they model the user interest as a set of topics weighted by their degree of preference. Although this method is successful for building general user profiles, it is inadequate for deciphering the specific goals that users pursue in their site visits. To make the distinction between user interests and interaction goals clear, consider the following situation. User A, an engineer, is interested in programming and visits a number of sites to find information about her subject of interest. In each of her site visits, though, the user intends to obtain a different type of information, such as downloading sample source code or reading a document for in-depth background on programming languages. Evidently, if we were to provide that user with customized site views, we would need not only to recommend pages about programming, but also to suggest pages that contain the exact type of information about programming that satisfies the user's goal.
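The limitation described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a topic-weighted profile is computed as the share of page views per topic; the function name, topic labels, and visit data are hypothetical and are not taken from the approaches cited here.

```python
from collections import Counter


def topic_profile(visited_topics):
    """Map each topic to its share of the page views (illustrative weighting)."""
    counts = Counter(visited_topics)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {topic: n / total for topic, n in counts.items()}


# Hypothetical visits by User A: one topic label per page view.
# In the first visit she wants to download sample code; in the second
# she wants to read background material. The topic labels alone do not
# record that difference.
download_visit = ["programming", "programming", "programming", "hardware"]
reading_visit = ["programming", "hardware", "programming", "programming"]

profile_download = topic_profile(download_visit)
profile_reading = topic_profile(reading_visit)

# Both visits yield the same topic weights, so the profile cannot
# distinguish the two interaction goals.
same_profile = profile_download == profile_reading
```

Under these assumptions the two visits produce identical profiles, which is why a representation based only on weighted topics captures interests but not goals.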

In this paper, we extend previous work on site customization and introduce a novel recommendation model that combines the sites' usage patterns and semantics in new ways so as to derive knowledge about both the users' site interests and their interaction goals. Our model explores a built-in subject hierarchy for the semantic annotation of the sites' content as well as for the identification of the user interests in their site navigations. Moreover, it relies on the association between the sites' usage and structural data to detect the user goals that correspond to particular interests, and it builds recommendations that aim at providing users with customized site views. The contribution of our work lies in the following.

* We introduce a novel approach for the automatic identification of the user interests and goals in their site visits, as these are exemplified in the user's navigational patterns. To compute the user interests in site visits, our approach relies on a subject hierarchy and employs a number of heuristics for estimating both short-term and long-term interests. …
