6 Principles in post-editing

Many of the errors in automatic tagging were found in areas where a grammarian has difficulties in drawing a borderline, e.g. between participles and adjectives for -ed forms; between nouns, adjectives, and participles for -ing forms; between conjunction and preposition for as; conjunction and adverb for so; etc. The main difficulty in post-editing was to achieve a reasonable degree of consistency in such cases. Some general principles in post-editing have been:

  1. to keep a tag assigned by the automatic tagging programs unless there are good reasons against it (the 'follow-the-tagger principle');
  2. in cases where a change is necessary, to use classification criteria which can be applied as simply and consistently as possible (the 'consistency principle');
  3. in cases of doubt, to give each word its most 'normal' tag, e.g. NNS rather than NN for means (the 'normalcy principle').
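
As a rough illustration only, the three principles can be read as a decision order for a single word occurrence. The sketch below is not part of the tagging programs or of the actual post-editing procedure; the function `post_edit`, the table `NORMAL_TAG`, and the `doubtful` flag are hypothetical names introduced here. Only the entry for means (NNS rather than NN) comes from the text above.

```python
from typing import Optional

# Hypothetical lookup of each word's most 'normal' tag.
# Only the 'means' entry is taken from the text; a real table is an assumption.
NORMAL_TAG = {"means": "NNS"}


def post_edit(word: str,
              tagger_tag: str,
              editor_tag: Optional[str] = None,
              doubtful: bool = False) -> str:
    """Return the tag to keep for one occurrence of a word (illustrative sketch)."""
    # Normalcy principle: in cases of doubt, prefer the word's most 'normal' tag,
    # falling back to the automatic tag when no 'normal' tag is recorded.
    if doubtful:
        return NORMAL_TAG.get(word.lower(), tagger_tag)
    # Follow-the-tagger principle: keep the automatic tag unless the post-editor
    # has a good reason to change it (modelled here as a supplied editor_tag).
    if editor_tag is None:
        return tagger_tag
    # Consistency principle: every change goes through this one function, so the
    # same criteria are applied to every occurrence.
    return editor_tag


print(post_edit("means", "NN", doubtful=True))       # doubtful case
print(post_edit("walked", "VBD"))                    # no reason to change
print(post_edit("walked", "VBD", editor_tag="VBN"))  # justified change
```

The ordering of the checks is a design choice of this sketch: doubt is resolved first, so an uncertain case is never settled by an ad hoc change.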

While an attempt has been made to find a classification which is linguistically justifiable, this has not always been possible. For one thing, it would have meant tackling grammatical problems which are still awaiting a solution. A particular problem has been that we have chosen to draw a borderline and assign a single tag to each occurrence of a word, though we know that gradience and fuzzy borderlines are characteristic of language (cf. Johansson 1985). In the sections below we shall draw attention to some problematic areas.[8]