Magazine article European English Messenger

Quality Measurements of Error Annotation: Ensuring Validity through Reliability

Article excerpt

Major obstacles to achieving high levels of reliability (and, by extension, validity) in the error annotation of learner corpora include the difficulty of defining errors in general, the lack of an error taxonomy sufficiently applicable to corpus annotation, the insufficiency of any set linguistic norm as a background for tagging, and the lack of well-defined measurements of annotation quality. The paper first looks at the theoretical issues behind the definition of an error. It then expands the discussion by focusing on a more practically applicable account of errors aimed at error annotation. It goes on to offer a more robust error taxonomy, which could help address issues of interpretability inherent in linguistic categorization and could ensure more consistency. Finally, the paper suggests an alternative definition of an error applicable to corpus annotation, based on inter-annotator agreement and intended to serve as the primary indicator of validity.

1. Introduction and background

Error annotation of learner corpora (1) is problematic in several respects: in defining errors in practical terms, in the error classification chosen as the basis for a tagset, in training annotators to apply a set linguistic norm, and in assessing the quality (or 'correctness') of annotation. Assessing the quality of annotation is essential for gauging the validity of the linguistic information we wish to extract from the corpus. Since validity cannot be assessed directly, owing to the lack of 'ground truth' in human (linguistic) judgment, the only thing we can assess is the reliability of annotation, taken as an indicator of the validity of the tags assigned (Plaban, Pabitra and Anupam 2000). Reliability is mirrored in the consistency of annotation (Brants 2000), both within the work of each individual annotator (intra-annotator agreement) and across several annotators compared with one another (inter-annotator agreement). High levels of consistency signify high quality of data processing. It is hence important to ensure high levels of tagging consistency, which lead to high reliability of annotation and thereby signal that the information in the corpus is valid.
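To make the notion of inter-annotator agreement more tangible, the following is a minimal sketch of how consistency between two annotators might be quantified. It is an illustration only: the excerpt does not name a specific agreement statistic, so the use of raw percentage agreement and Cohen's kappa is an assumption, and the tag labels (GRAM, LEX, OK) and function names are hypothetical.

```python
from collections import Counter


def observed_agreement(tags_a, tags_b):
    """Proportion of items to which both annotators assigned the same tag."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("Both annotators must tag the same non-empty item list.")
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return matches / len(tags_a)


def cohens_kappa(tags_a, tags_b):
    """Agreement between two annotators, corrected for chance agreement."""
    n = len(tags_a)
    p_o = observed_agreement(tags_a, tags_b)
    freq_a = Counter(tags_a)
    freq_b = Counter(tags_b)
    # Chance agreement: probability that both annotators pick the same tag
    # if each tagged at random according to their own tag frequencies.
    p_e = sum(freq_a[tag] * freq_b[tag] for tag in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used a single identical tag throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical tags assigned by two annotators to the same eight tokens.
annotator_1 = ["GRAM", "LEX", "OK", "OK", "GRAM", "OK", "LEX", "OK"]
annotator_2 = ["GRAM", "OK", "OK", "OK", "GRAM", "OK", "LEX", "LEX"]

print(f"observed agreement: {observed_agreement(annotator_1, annotator_2):.2f}")
print(f"Cohen's kappa:      {cohens_kappa(annotator_1, annotator_2):.2f}")
```

The same functions could be applied to two tagging passes by a single annotator to obtain a rough measure of intra-annotator agreement.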

Starting with the issue of accounting for errors, there is a general consensus that errors constitute failures in language competence (Corder 1971; Lennon 1991; Lengo 1995). For any practical application in error analysis, such as corpus annotation, this view of errors is of limited use. Firstly, following such a theoretical premise in practice would lead us to the conclusion (already reached by other authors, such as James (1998:79)) that errors cannot be produced by 'native' speakers; only mistakes would be possible. (2) Secondly, language competence is, in broad terms, accessible only indirectly through language performance and needs to be observed accordingly. For error annotation (and applied linguistics in general), it is clear that we need a more practice-oriented definition of an error. In other words, we need something more tangible that annotators can actually hold on to while tagging.

A common approach to solving this problem is to define errors against the background of a set linguistic norm. The opaque (and often unjustly negatively connoted) term 'error' is then (much more accurately) referred to as a 'non-norm adequate form'. In practice, the linguistic norm can be tied to a standard variety of a language, its grammatical description, dictionaries, and, in the case of learner corpus annotation, to the training of annotators. This is a more applicable take on errors, since a common linguistic background is set for all annotators. It can clearly indicate to annotators what acceptable, norm-adequate performance is, at least up to a certain hierarchical level of language description. It makes it simpler for them to determine what is not adequate and ideally produces more agreement between their categorizations.

2. Error annotation in practice

The next stumbling block on the road towards higher consistency of learner corpus annotation is the choice of a taxonomy of non-norm adequate features to be used as a tagset. …
