The Survey Parser: Design and Development
ALEX CHENGYU FANG
Automatic parsing aims to decompose a sentence into its syntactic constituents, so that the relations between words and groups of words are made explicit. It is deemed an essential aid to understanding the meaning of a sentence. It is also the first step towards the reverse process, whereby a natural language sentence is constructed automatically from a specification of abstract meanings. The best example of the application of parsing is multilingual machine translation: a sentence in Language A is first parsed to help arrive at its semantics, which are then formalized and finally expressed as a corresponding sentence in Language B. Success in parsing would represent a major breakthrough in natural language processing. Nearly all the major universities around the world host research teams working on different approaches to parsing. Britain alone has research teams in this area at Cambridge, Edinburgh, Leeds, Nottingham, Sheffield, Sussex, and York.
Despite the efforts of the past 50 years or so, however, 'the state of the art in parsing general English by computer is but primitive' (Black et al., 1993: 2). Between 1990 and 1992, three experiments were carried out on eleven rule-based parsers, which achieved a success rate of only 33 per cent on naturally occurring sentences (cf. Black et al., 1993). The increasingly popular stochastic approach (Fujisaki, 1984; Garside and Leech, 1985; Atwell, 1988; Briscoe and Carroll, 1991; Fujisaki et al., 1991; Magerman, 1994), despite its advantages over the rule-based approach, suffers from incorrect analyses, especially in the attachment of constituent structures (cf. Briscoe and Carroll, 1991). SPATTER, a probabilistic parser, achieved a 78 per cent crossing-brackets score,¹ and yet only about 35 per cent of its parses exactly matched the human annotations for the same sentences (Magerman, 1994: v). Some systems try to remedy these parsing problems through man-machine interaction, but this usually proves too costly. The TOSCA Parser developed at the University of Nijmegen, the Netherlands, for instance, requires considerable manual pre-editing of the input text in order to reduce parsing times and ambiguities.
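
The gap between a crossing-brackets score and an exact-match rate can be illustrated with a small sketch. It is not taken from Magerman (1994) or from any of the systems cited; the function names, the span encoding, and the example sentence are assumptions made for illustration, and the precise measure reported for SPATTER may be defined differently. Constituents are represented as (start, end) word offsets, and two constituents are taken to cross when they overlap without one containing the other.

```python
from typing import Set, Tuple

# A constituent is a (start, end) pair of word offsets, end exclusive.
# This encoding is an assumption made for the sketch.
Span = Tuple[int, int]


def crosses(a: Span, b: Span) -> bool:
    """True if the two spans overlap without one containing the other."""
    return a[0] < b[0] < a[1] < b[1] or b[0] < a[0] < b[1] < a[1]


def non_crossing_rate(candidate: Set[Span], gold: Set[Span]) -> float:
    """Share of candidate constituents that cross no gold constituent."""
    ok = [c for c in candidate if not any(crosses(c, g) for g in gold)]
    return len(ok) / len(candidate)


def exact_match(candidate: Set[Span], gold: Set[Span]) -> bool:
    """True only if the candidate has exactly the gold constituents."""
    return candidate == gold


# "I saw the man with the telescope" (word offsets 0..6).
# Hypothetical human annotation: the PP (4, 7) attaches to the verb phrase.
gold = {(0, 7), (1, 7), (2, 4), (4, 7), (5, 7)}
# Hypothetical parser output: the PP attaches to the noun phrase instead,
# which adds one extra constituent (2, 7), "the man with the telescope".
candidate = gold | {(2, 7)}

print(non_crossing_rate(candidate, gold))  # no bracket crosses the annotation
print(exact_match(candidate, gold))        # yet the parse is not identical
```

In this sketch the attachment error leaves every bracket free of crossings while the parse still fails to match the annotation exactly, which suggests why a parser can score well on the first measure and poorly on the second.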