Exploring State-of-the-Art Software for Forensic Authorship Identification

Article excerpt

ABSTRACT

Back in the 1990s, Malcolm Coulthard announced the beginnings of an emerging discipline, forensic linguistics, resulting from the interface of language, crime and the law. Today the courts are calling on language experts more than ever to help in certain types of cases, such as authorship identification, plagiarism, legal interpreting and translation, statement analysis, and voice identification. The application of new technologies to the analysis of questioned texts has greatly facilitated the work of the language scientist as expert witness in the legal setting, and has contributed to the successful analysis and interpretation of style by providing statistical and measurable data. This article aims to present linguists and researchers in forensic linguistics with an exploration of the strengths, limitations and challenges of state-of-the-art software for forensic authorship identification.

KEYWORDS: Forensic linguistics, language, crime, law, software for forensic authorship identification.

I. THE AIM OF THIS DISCUSSION

Over the last decade it has become evident that linguists can be of service to the law. Courts, especially in Common Law countries, are increasingly calling on language experts to help in certain types of cases, such as authorship identification, voice identification, plagiarism, legal interpreting and translation, and statement analysis.

Undoubtedly, the application of new technologies to the analysis of questioned texts has significantly facilitated the work of the language scientist as expert witness in the legal setting, both by enhancing the scientific reliability of descriptive linguistic analysis with measurable data and by reducing the time required for the observation, description, analysis, and counting of the data.

The aim of this discussion is to present language experts and researchers with an exploration of the strengths, limitations, and challenges of state-of-the-art software for forensic authorship identification. To this end, the article is divided into two main parts:

Part one will be devoted to forensic linguistics as an up-and-coming discipline within the field of applied linguistics. It will provide the reader with the background needed to understand forensic language researchers' recent, healthy interest in new techniques and methods. Such methods may help the language expert explain linguistic findings in statistical terms and meet the standard of scientific reliability that the judiciary now demands of linguistic evidence, especially in Common Law countries.

Part one will be further divided into three sections. Firstly, we will offer the reader a brief overview of authorship identification and the birth of forensic linguistics. Secondly, we will look at stylistic analysis as an approach to forensic authorship identification. Lastly, we will consider the problems faced by the language scientist as expert witness in the legal setting since the new standard for admitting expert scientific testimony in US federal trials under the Federal Rules of Evidence came into force (Daubert v. Merrell Dow Pharmaceuticals, No. 92-102, 509 U.S. 579, 1993). A consideration of a major challenge to forensic linguistics, as seen in the latest developments of the discipline, will bring part one to an end.

Part two will concentrate on new advances in software for quantitative data analysis used in forensic authorship identification by examining a selected sample of state-of-the-art tools.

Finally, the concluding remarks section will bring together the most relevant conclusions on the role played by software for quantitative analysis in forensic authorship identification, and will offer suggestions for further development in response to the main challenges in this field.

PART ONE

II. FORENSIC AUTHORSHIP IDENTIFICATION AND THE BIRTH OF FORENSIC LINGUISTICS

The emergence of forensic linguistics as a discipline is closely related to two prominent cases of disputed authorship in police statements in the UK. …