An Outline of the Survey's ICE Parsing Scheme
Another form of annotation applied to ICE-GB (the British component of the ICE corpus) was parsing, in which our aim was to analyse each utterance in the corpus according to its form, or category, and its function, and according to the relationships between its component parts. ICE-GB was initially parsed using the TOSCA automatic parser, which provided one or more analyses to be checked and selected manually. Most utterances were parsed in this way, but for those that could not be parsed by the TOSCA parser, a new parser was created at the Survey of English Usage (cf. Chapter 11). The Survey parser offers one analysis for each utterance, either a complete parse or a partial parse. ICETREE, a manual tree editor, was developed at the Survey to handle the analyses produced by the Survey parser. It is used to check and correct parses and to complete partial parses.
Manual tree editing has given us the opportunity to introduce new parsing terms that enable us to complete the analysis of problematic constructions, or of constructions hitherto not catered for. The new scheme also applies to the Survey parser. The Survey parsing scheme is based on the TOSCA system, but differs from it in many respects. What follows is a general overview of the Survey parsing scheme, which includes many of the new terms introduced into the ICE parsing hierarchy.
Each word in the corpus is given a word-class tag to show what category of word it belongs to. Each word also performs a function. Groups of words are likewise assigned a category, and they also perform functions. Every group of words, from the largest (a whole sentence, say) to the smallest (an individual word), performs a function and is described by a category. The process of parsing is the progressive division of an utterance into smaller and smaller word groupings, and the identification of the function each grouping performs and of its category. The result of such analysis is displayed as a labelled tree, as in Fig. 10.1.
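The idea that every grouping carries both a function and a category can be sketched as a small data structure. The following is an illustrative sketch only, not the representation used by ICETREE or by either parser: the class name `Node`, its fields, the plain-English labels, and the example utterance are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One grouping in a parse tree: a function label plus a category label."""
    function: str                  # what the grouping does (hypothetical label)
    category: str                  # what kind of grouping it is (hypothetical label)
    word: Optional[str] = None     # set only on leaf nodes (individual words)
    children: List["Node"] = field(default_factory=list)

    def leaves(self) -> List[str]:
        """Return the words covered by this node, in left-to-right order."""
        if not self.children:
            return [self.word] if self.word is not None else []
        words: List[str] = []
        for child in self.children:
            words.extend(child.leaves())
        return words

    def render(self, depth: int = 0) -> List[str]:
        """Return one indented 'function,category' line per node."""
        label = "  " * depth + self.function + "," + self.category
        if self.word is not None:
            label += " [" + self.word + "]"
        lines = [label]
        for child in self.children:
            lines.extend(child.render(depth + 1))
        return lines


# A made-up three-word utterance, divided into ever smaller groupings.
tree = Node("sentence", "clause", children=[
    Node("subject", "noun phrase", children=[
        Node("determiner", "article", word="the"),
        Node("head", "noun", word="parser"),
    ]),
    Node("verbal", "verb phrase", children=[
        Node("main verb", "verb", word="failed"),
    ]),
])

print("\n".join(tree.render()))
```

Each node pairs a function with a category, and only the leaves hold words, so the largest grouping (the whole utterance) and the smallest (a single word) are described in the same way.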