Discovering Interesting Association Rules in the Web Log Usage Data

By Dimitrijevic, Maja; Bosnjak, Zita | Interdisciplinary Journal of Information, Knowledge and Management, Annual 2010


Introduction

Due to the immense volume of Internet usage and web browsing in recent years, log files generated by web servers contain enormous amounts of web usage data that is potentially valuable for understanding the behaviour of website visitors. This knowledge can be applied in various ways, such as enhancing the effectiveness of websites through user personalization or developing directed web marketing campaigns (Anand, Mulvenna, & Chevalier, 2004; Cooley, Mobasher, & Srivastava, 1997).

Data mining methods, which are designed for the automatic extraction of potentially interesting information from very large databases, are used to extract knowledge from web usage log files. One popular data mining method that has been used for this purpose is association rule mining (Kosala & Blockeel, 2000).

Originally, association rule mining algorithms were applied for the analysis of transactional databases (Agrawal, Imielinski, & Swami, 1993).

An association rule is defined as follows:

Let $I = \{i_1, \ldots, i_n\}$ be a set of items, and $T = \{t_1, \ldots, t_m\}$ a set of transactions, where each transaction $t_i$ consists of a subset of items in $I$. An association rule is then an implication of the form:

$X \rightarrow Y, \quad X \subset I, \quad Y \subset I, \quad X \cap Y = \emptyset$

An item set X has support s in T if s% of the transactions in T contain X.

An item set X is frequent if its support is higher than the user specified minimum support.

The rule X → Y holds in T with confidence c if c% of the transactions in T that contain X also contain Y.
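The two definitions above can be illustrated with a minimal Python sketch. The toy transactions and page names are hypothetical, chosen only for illustration, and both functions return percentages to match the wording of the definitions.

```python
# Hypothetical toy data: each transaction is a set of items (here, page names).
transactions = [
    {"home", "courses", "contact"},
    {"home", "courses"},
    {"home", "staff"},
    {"courses", "contact"},
]


def support(itemset, transactions):
    """Percentage of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return 100.0 * hits / len(transactions)


def confidence(x, y, transactions):
    """Percentage of the transactions containing X that also contain Y."""
    x, y = set(x), set(y)
    with_x = [t for t in transactions if x <= t]
    if not with_x:
        raise ValueError("the antecedent X does not occur in any transaction")
    both = sum(1 for t in with_x if y <= t)
    return 100.0 * both / len(with_x)


print(support({"home", "courses"}, transactions))       # 2 of 4 transactions: 50.0
print(confidence({"home"}, {"courses"}, transactions))  # 2 of the 3 with "home"
```

In this toy data the item set {home, courses} appears in two of the four transactions, so its support is 50%, and two of the three transactions containing "home" also contain "courses", so the rule home → courses has a confidence of about 66.7%.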

The problem of mining association rules is to generate all association rules that consist of frequent item sets and have confidence greater than the user-specified minimum confidence.
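As a rough illustration of this problem statement, the following Python sketch enumerates rules by brute force. It is not the algorithm used in the cited work, and it becomes impractical beyond a handful of items; practical algorithms such as Apriori prune the search space. The function name `mine_rules`, the toy transactions, and the thresholds (given as fractions rather than percentages) are assumptions made for the example.

```python
from itertools import combinations


def mine_rules(transactions, min_support, min_confidence):
    """Return (X, Y, support, confidence) for every rule X -> Y whose item
    set is frequent and whose confidence exceeds `min_confidence`.
    Both thresholds are fractions in [0, 1]."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def supp(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Step 1: collect every frequent item set together with its support.
    frequent = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            s = supp(candidate)
            if s > min_support:
                frequent[candidate] = s

    # Step 2: split each frequent set Z into X -> Y with Y = Z \ X.
    # Every subset of a frequent set is itself frequent, so the lookup
    # frequent[x] cannot fail.
    rules = []
    for z, s_z in frequent.items():
        for k in range(1, len(z)):
            for x in combinations(sorted(z), k):
                x = frozenset(x)
                conf = s_z / frequent[x]
                if conf > min_confidence:
                    rules.append((set(x), set(z - x), s_z, conf))
    return rules


# Hypothetical toy data and thresholds.
transactions = [
    {"home", "courses", "contact"},
    {"home", "courses"},
    {"home", "staff"},
    {"courses", "contact"},
]
rules = mine_rules(transactions, min_support=0.4, min_confidence=0.6)
for x, y, s, c in rules:
    print(sorted(x), "->", sorted(y), f"support={s:.2f}", f"confidence={c:.2f}")
```

Even on four transactions the sketch returns four rules, which hints at how quickly the rule set grows on real data.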

While association rule mining algorithms are complete, in that they find all rules that satisfy the defined constraints, they often produce a large set of rules in which it is difficult to find those that are truly interesting to the user. Various methods have been proposed to deal with this issue.

For example, a query language called "Mine Rule", originally developed for querying inductive databases, can be applied to mining the set of generated association rules (Meo, Luca Lanzi, Matera, Careggi, & Esposito, 2004). Furthermore, various methods have been proposed to prune the set of generated rules and discard irrelevant rules (Jaroszewicz & Simovici, 2002; Liu, Hsu, & Ma, 1999). Another area of research focuses on association rule 'interestingness measures', which help identify the rules in the generated set that give the most useful information to the user (Tan, Kumar, & Srivastava, 2004). Some of the proposed association rule interestingness measures are all-confidence (Omiecinski, 2003), collective strength (Aggarwal & Yu, 1998), and conviction and lift (Brin, Motwani, Ullman, & Tsur, 1997).
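Three of the measures named above can be sketched in Python using their commonly cited textbook definitions, which may differ in detail from the forms used in the cited papers: lift is supp(X ∪ Y) / (supp(X) · supp(Y)), conviction is (1 − supp(Y)) / (1 − conf(X → Y)), and all-confidence is the support of an item set divided by the largest support of any single item in it. Collective strength is omitted because its definition is more involved. The toy transactions are hypothetical.

```python
import math


def supp(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def lift(x, y, transactions):
    """Values above 1 suggest X and Y co-occur more often than if independent."""
    joint = supp(set(x) | set(y), transactions)
    return joint / (supp(x, transactions) * supp(y, transactions))


def conviction(x, y, transactions):
    """Infinite when the rule X -> Y holds in every transaction containing X."""
    conf = supp(set(x) | set(y), transactions) / supp(x, transactions)
    if conf == 1.0:
        return math.inf
    return (1.0 - supp(y, transactions)) / (1.0 - conf)


def all_confidence(itemset, transactions):
    """Smallest confidence of any rule formed from the items of `itemset`."""
    top = max(supp({item}, transactions) for item in itemset)
    return supp(itemset, transactions) / top


# Hypothetical toy data.
transactions = [
    {"home", "courses", "contact"},
    {"home", "courses"},
    {"home", "staff"},
    {"courses", "contact"},
]
print(lift({"home"}, {"courses"}, transactions))
print(conviction({"home"}, {"courses"}, transactions))
print(all_confidence({"home", "courses"}, transactions))
```

In this toy data the rule home → courses has a confidence of about 0.67 but a lift below 1, which shows how a measure other than confidence can change the assessment of a rule.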

When applying association rule mining to web usage data, a web resource of a particular website is usually considered an item, while a website visitor session is considered a transaction of items. Here, a website visitor session is the set of web resources that a visitor requested during a single visit to the website (Anand et al., 2004).
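One way to turn log records into such transactions is sketched below in Python. The source does not say how sessions are delimited, so the 30-minute inactivity timeout, the `(visitor_id, timestamp, resource)` record layout, and the function name `build_sessions` are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def build_sessions(requests, timeout=timedelta(minutes=30)):
    """Group (visitor_id, timestamp, resource) records into sessions.

    A new session starts when the visitor changes or when the gap since
    that visitor's previous request exceeds `timeout` (an assumed heuristic).
    Each session is returned as a set of requested resources.
    """
    sessions = []
    prev_visitor, prev_ts = None, None
    for visitor, ts, resource in sorted(requests, key=lambda r: (r[0], r[1])):
        if visitor != prev_visitor or ts - prev_ts > timeout:
            sessions.append(set())
        sessions[-1].add(resource)
        prev_visitor, prev_ts = visitor, ts
    return sessions


# Hypothetical log records, already parsed from a server log.
requests = [
    ("10.0.0.1", datetime(2010, 1, 1, 9, 0), "/home"),
    ("10.0.0.1", datetime(2010, 1, 1, 9, 5), "/courses"),
    ("10.0.0.1", datetime(2010, 1, 1, 11, 0), "/home"),
    ("10.0.0.2", datetime(2010, 1, 1, 9, 1), "/home"),
    ("10.0.0.2", datetime(2010, 1, 1, 9, 2), "/staff"),
]
for session in build_sessions(requests):
    print(sorted(session))
```

The resulting list of sets has the same shape as the transaction set T in the definition of an association rule, so it can be passed directly to a rule mining routine.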

Although various interestingness measures and rule pruning methods have been applied to association rule mining of web usage data, extracting useful information from the set of generated association rules remains a difficult task (Geng & Hamilton, 2006; Huang, 2007).

Web usage data differs from market basket data in that it contains a large number of tightly correlated items (web resources or web pages) due to the link structure of a website. Web pages that are tightly linked together often occur in the same transaction, which is why the generated set of association rules contains a high number of so-called "hard" association rules that have very high confidence, but are not truly interesting to the user. …
