Naive Bayes Classification of Public Health Data with Greedy Feature Selection

By Hickey, Stephanie J. | Communications of the IIMA, April 2013

INTRODUCTION

Public health issues feature prominently in popular awareness, in political debate, and in the data mining literature. Data mining, the work of discovering patterns in data, has the potential to influence public health in myriad ways, from personalized, genetic medicine to studies of environmental health and epidemiology, with many applications in between. Classifying new data based on previously observed patterns holds promise for applying specific advances to public health more generally. Classification algorithms that combine Bayes' Theorem with prevalence statistics, known as naive Bayes classifiers, aim to accomplish this with readily available data.
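As a point of reference, the standard textbook form of the naive Bayes decision rule can be written as below. The notation is an assumption introduced here for illustration and is not taken from the article: c denotes a candidate class of the target attribute, x_1, ..., x_n are the observed attribute values, and P(c) is the class prior, which plays the role of a prevalence statistic.

```latex
% Bayes' Theorem applied to a class c and attribute values x_1, ..., x_n
P(c \mid x_1, \dots, x_n) = \frac{P(c)\, P(x_1, \dots, x_n \mid c)}{P(x_1, \dots, x_n)}

% "Naive" assumption: attributes are conditionally independent given the class
P(x_1, \dots, x_n \mid c) = \prod_{i=1}^{n} P(x_i \mid c)

% Resulting decision rule (the denominator does not depend on c)
\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

Each factor can be estimated from simple frequency counts, which is why such classifiers can work with readily available data.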

For this study, we applied a naive Bayes classifier with greedy feature selection to a robust public health dataset, with the objective of efficiently identifying the n attributes that best predict a selected target attribute without searching the input space exhaustively. For example, is length of hospital stay affected by insurance type, by region, by type of hospital, or by something else? Do diagnoses and procedures drive outcomes (discharge status), or does something else?
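The general idea of greedy (forward) feature selection around a naive Bayes classifier can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the attribute names (`insurance`, `region`, `hospital`, `stay`), the toy records, the use of held-out accuracy as the selection score, and the Laplace smoothing constant are all hypothetical choices made for the example.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, target, features, alpha=1.0):
    """Collect the counts needed by a categorical naive Bayes model."""
    class_counts = Counter(r[target] for r in rows)
    value_counts = defaultdict(Counter)  # (feature, class) -> value -> count
    values = defaultdict(set)            # feature -> distinct values seen
    for r in rows:
        for f in features:
            value_counts[(f, r[target])][r[f]] += 1
            values[f].add(r[f])
    return class_counts, value_counts, values, alpha


def predict_nb(model, row, features):
    """Return the class with the highest log prior plus log likelihoods."""
    class_counts, value_counts, values, alpha = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, n_c in sorted(class_counts.items()):  # sorted: deterministic ties
        score = math.log(n_c / total)            # class prior ("prevalence")
        for f in features:
            k = len(values[f]) + 1               # one extra slot for unseen values
            count = value_counts[(f, c)][row[f]]
            score += math.log((count + alpha) / (n_c + alpha * k))
        if score > best_score:
            best, best_score = c, score
    return best


def accuracy(train, test, target, features):
    """Fraction of held-out rows whose target is predicted correctly."""
    model = train_nb(train, target, features)
    hits = sum(predict_nb(model, r, features) == r[target] for r in test)
    return hits / len(test)


def greedy_select(train, test, target, candidates, n):
    """Add, one at a time, the attribute that most improves the score.

    At most n passes over the remaining candidates are made, instead of
    scoring every possible subset of size n.
    """
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < n:
        scored = [(accuracy(train, test, target, selected + [f]), f)
                  for f in remaining]
        _, best_f = max(scored, key=lambda s: s[0])
        selected.append(best_f)
        remaining.remove(best_f)
    return selected


def rec(insurance, region, hospital, stay):
    """Build one hypothetical patient record."""
    return {"insurance": insurance, "region": region,
            "hospital": hospital, "stay": stay}


# Hypothetical toy records, invented purely for illustration.
train = [
    rec("private", "north", "teaching", "short"),
    rec("private", "south", "community", "short"),
    rec("private", "north", "community", "short"),
    rec("private", "south", "teaching", "short"),
    rec("public", "north", "teaching", "long"),
    rec("public", "south", "community", "long"),
    rec("public", "north", "teaching", "long"),
    rec("public", "south", "teaching", "long"),
]
test = [
    rec("private", "north", "community", "short"),
    rec("private", "south", "teaching", "short"),
    rec("public", "north", "teaching", "long"),
    rec("public", "south", "community", "long"),
]

print(greedy_select(train, test, "stay", ["region", "insurance", "hospital"], 1))
```

On this toy data the first attribute chosen is `insurance`, because it alone separates the two classes of `stay`. With m candidate attributes, the greedy search fits on the order of n times m models rather than one per subset, which is the sense in which it avoids an exhaustive search.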

This study may contribute toward applying data mining approaches to public health data, specifically, to predicting attributes that represent a measure of treatment outcome or a proxy for cost, for patients receiving health care services in U.S. hospitals, based on readily accessible patient data.

PUBLIC HEALTH CARE IN THE U.S.

The U.S. health care system has had no shortage of attention recently. According to the World Health Organization, health care spending amounted to $7,146 per capita and 15.2% of the gross domestic product in 2008, the highest of any nation. In the organization's World Health Report 2000, its most recent survey of population health and health systems financing, however, the U.S. ranked 38th. As recently as 2010, 49.9 million residents had neither public nor private insurance to help defray the cost of health care (1). The debate surrounding the Patient Protection and Affordable Care Act and the Health Care and Education Reconciliation Act of 2010, designed to extend insurance options to more residents and curtail further increases in health care spending, was a major issue in the 2012 elections. Yet, despite the attention, apparent tradeoffs persist among the cost of health care, both to individuals and institutions, the quality of care received by most patients, and the efficiency of the system as a whole.

The recent explosion in data available for analysis is as evident in health care as anywhere else. Private and public insurers, health care providers (particularly hospitals, physician groups, and laboratories), and government agencies are able to generate far more digital information than ever before. This data presents an opportunity; clues to the varied challenges faced by the health care system may lie within it. The insights gained from effectively mining public health data have implications for several types of stakeholders in the current health care system: planning implications for hospital administrators, treatment protocol implications for physician groups, and public health implications for legislators, government agencies, and think tanks.

LITERATURE REVIEW

Not surprisingly, a great deal of data mining analysis is being done in the public health domain, particularly predictive data mining in clinical medicine (Bellazzi & Zupan, 2008), and the potential influence of such work is broad and compelling (Kulikowski, 2002). Further, data mining in the public health domain presents unique challenges (Cios & Moore, 2002): heterogeneity of medical data; ethical, legal, and social constraints on use of that data; statistical approaches that address heterogeneity and these constraints; and the special status of medicine as a revered and scrutinized field responsible for life-and-death decisions that may affect all of us. …
