Using the Google Search Engine to Detect Word-for-Word Plagiarism in Master's Theses: A Preliminary Study


The effectiveness and efficiency of the Google search engine for detecting potential occurrences of word-for-word plagiarism in master's theses were investigated. A total of 210 electronic master's theses from a sample of 260 completed in 2003 were examined. Undocumented phrases from each thesis were searched as exact phrases against the World Wide Web using the Google search engine, for up to 10 minutes per thesis. Matches--or potential occurrences of plagiarism--were found in 27.14% of the theses searched (57 of 210). Matches were found on or before the first numbered page in 16 of the 57 theses containing suspect passages. The average time for finding a match was 3.8 minutes. The results show that the Google search engine can be used to effectively and efficiently detect potential occurrences of plagiarism in some master's theses. The method described in the study could be used by thesis advisors and other faculty as an alternative to anti-plagiarism software packages. Further investigation is needed to determine whether Google's effectiveness is consistent across varied academic disciplines. Comparative studies of Google and anti-plagiarism software and services are needed as well.
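The search step and the reported proportions can be illustrated with a short sketch. It is not the authors' procedure as implemented; the excerpt says only that phrases were searched with Google, so automating the step is an assumption. The function names `exact_phrase_url` and `percent` and the sample phrase are hypothetical. The sketch only builds a search URL that wraps a phrase in double quotes (Google's exact-phrase syntax) and recomputes the percentages from the counts given in the abstract.

```python
from urllib.parse import urlencode


def exact_phrase_url(phrase: str) -> str:
    """Build a Google search URL that queries `phrase` as an exact phrase.

    Hypothetical helper: the study does not say how queries were entered.
    """
    # Drop embedded double quotes and collapse runs of whitespace so the
    # whole passage sits inside a single pair of quotation marks.
    cleaned = " ".join(phrase.replace('"', " ").split())
    return "https://www.google.com/search?" + urlencode({"q": f'"{cleaned}"'})


def percent(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


if __name__ == "__main__":
    # Illustrative phrase only; it is not taken from any thesis in the study.
    print(exact_phrase_url("a suspect passage copied from a thesis draft"))
    # Counts reported in the abstract: 57 of 210 theses produced a match.
    print(percent(57, 210))  # 27.14
    # 16 of those 57 matched on or before the first numbered page.
    print(percent(16, 57))   # 28.07
```

The first figure reproduces the 27.14% stated in the abstract. The second, about 28% of the theses with matches, is derived here from the reported counts and is not stated in the source.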

**********

The purpose of this research was to explore Google's potential for detecting occurrences of word-for-word (1) plagiarism in master's theses. The authors sought answers to these questions:

1. Is Google an effective tool for detecting plagiarism in master's theses?

2. Is Google an efficient tool for detecting plagiarism in master's theses?

The first question relates to the nature of graduate research and the types of resources on the World Wide Web. Graduate-level research in most academic disciplines requires extensive use of professional journals and monographs. Some of these materials are not available in electronic formats; those that are distributed electronically are often subscription-based and not freely available on the World Wide Web. Hence, it was unknown whether Google searches would retrieve sources of plagiarized material in master's theses (this research was conducted prior to the release of Google Scholar in November 2004). The second question stems from the authors' interest in determining whether Google might provide a relatively fast--10 minutes or less--mechanism for thesis advisors interested in checking suspect passages of a thesis draft.

The process of using search engines and periodical databases to detect plagiarism in student papers has been described by others (e.g. Ryan, 2000; Lathrop & Foss, 2000; Marshall, 1998). However, most published material on this topic is anecdotal and focuses on plagiarism in high school and undergraduate student papers. A literature search produced no studies on the effectiveness of Google or other search engines for detecting plagiarism in master's theses.

Some universities are investing in anti-plagiarism software and services such as Turn-It-In to combat academic dishonesty. Plagiarism detection services typically require student papers to be submitted to professors in electronic format. Professors then submit the papers to the software company, which runs them against its own database of online resources. The professor then receives reports from the company detailing which papers appear to contain plagiarism. While plagiarism detection software and services offer many benefits, they are not free. Moreover, some institutions are reluctant to use plagiarism detection software and services due to concerns about students' intellectual property and privacy rights--particularly since some companies add the content of submitted papers to their databases. This practice raises concerns, even though companies such as Turn-It-In pledge to protect the content of submitted papers and do not make it available to customers (http://www.turnitin.com/static/legal/legal_document.html).

The consequences of plagiarism for students and institutions, the increased availability of graduate theses, and the need for alternatives to commercial plagiarism detection software prompted this investigation. …