Re: Corpora: Question about a Brown Corpus tag

From: E S Atwell (eric@comp.leeds.ac.uk)
Date: Thu Aug 17 2000 - 14:31:40 MET DST


    David,
    I'd like to help BUT this is my last day before leaving for a conference
    and vacation, so I don't have time to investigate in detail, but...

    I was involved in the LOB Corpus tagging project in 1981-3; we started from
    the Brown corpus, which had originally been tagged using the TAGGIT program
    and then manually proofread and corrected. I don't think we had access to a
    proofreader's guide defining the Brown tagset; we just had the "corpus
    evidence" of which tags actually appeared with which words. Tagging of
    WH-tags was not as clear-cut as eg sing v plural nouns, and we decided to
    change some boundaries/definitions in the new LOB tagset. Some tag
    definitions in Brown were clearly decided by what TAGGIT found computable;
    I *guess* linguistic inconsistencies in tagging some words may be down to
    drawing boundaries on grounds of computational tractability rather than
    purely linguistic reasons (or, to be more fair, when two or more
    conflicting linguistic criteria were available (eg form v function),
    computational tractability was a deciding factor).

    We have tried taking some other text samples (teenager conversations, BBC
    radio broadcasts, software manuals), re-tagging these with Brown tagset
    (and several other tagsets as well), and getting these proofread by
    experts in the original tagset. See
    http://www.scs.leeds.ac.uk/amalgam/amalgam/corpus/tagged_prf.html
    for links to each of the samples tagged in 8 different tagsets.
    A description of the Brown tagset, in terms of which tags actually appear
    with which words, is given in
    http://www.scs.leeds.ac.uk/amalgam/tagsets/brown.html

    I note that the Brown corpus training set included:

    WPS WH-pronoun, nominative
    that who whoever whosoever what whatsoever

    WDT WH-determiner
    which what whatever whichever whichever-the-hell
     
    Furthermore, "which" does not appear with any tag other than WDT;
    "who" appears with WPS, WPO;
    "that" appears with WPS, CS, DT.

    It appears that the designers of the Brown tagset decided not to try to
    distinguish between determiner and pronoun functions of "which", I guess
    because the type of English constraint grammar rules used in TAGGIT would
    not have been able to correctly disambiguate between these 2 tags in
    sufficient cases.
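
    To make the "corpus evidence" view concrete, here is a minimal Python
    sketch that turns the tag listings quoted above into a word-to-tags
    lookup. It only covers the handful of entries mentioned in this message,
    not the full Brown tagset, and the names TAG_WORDS, EXTRA_TAGS and
    tags_by_word are made up for illustration.

```python
from collections import defaultdict

# Tag -> words, copied from the two listings quoted in this message only.
TAG_WORDS = {
    "WPS": ["that", "who", "whoever", "whosoever", "what", "whatsoever"],
    "WDT": ["which", "what", "whatever", "whichever", "whichever-the-hell"],
}

# Further word -> tag observations stated in the message.
EXTRA_TAGS = {
    "who": ["WPS", "WPO"],
    "that": ["WPS", "CS", "DT"],
}


def tags_by_word(tag_words, extra_tags):
    """Invert the tag -> words listing into a word -> set-of-tags index."""
    index = defaultdict(set)
    for tag, words in tag_words.items():
        for word in words:
            index[word].add(tag)
    for word, tags in extra_tags.items():
        index[word].update(tags)
    return dict(index)


INDEX = tags_by_word(TAG_WORDS, EXTRA_TAGS)

print(sorted(INDEX["which"]))  # ['WDT']
print(sorted(INDEX["that"]))   # ['CS', 'DT', 'WPS']
```

    On this evidence "which" is the only one of the three relative-clause
    words with a single possible tag, which is the asymmetry David noticed.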

    So, if you want to be consistent with Brown, you simply tag ALL cases of
    "which" as WDT, even when introducing relative clauses.
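
    If your corpus is already tagged, that rule is easy to apply
    mechanically. The sketch below assumes the word/TAG notation used in
    David's examples; parse_tagged and normalize_which are hypothetical
    helper names, and the sample sentence is deliberately mis-tagged to show
    the effect.

```python
def parse_tagged(text):
    """Split whitespace-separated items into (word, tag) pairs.

    Items written as word/TAG yield (word, TAG); untagged items yield
    (item, None).
    """
    pairs = []
    for item in text.split():
        word, sep, tag = item.rpartition("/")
        pairs.append((word, tag) if sep else (item, None))
    return pairs


def normalize_which(tagged_tokens, target_tag="WDT"):
    """Return a copy in which every 'which' (any case) carries target_tag."""
    return [
        (word, target_tag) if word.lower() == "which" else (word, tag)
        for word, tag in tagged_tokens
    ]


# Made-up input: "which" has been tagged WPS here, contrary to Brown practice.
sample = parse_tagged("The book which/WPS is on the table is mine.")
fixed = normalize_which(sample)

print([pair for pair in fixed if pair[1]])  # [('which', 'WDT')]
```

    Only "which" is touched; "who" and "that" keep whatever tags they already
    have, since Brown does distinguish their uses.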

    Eric Atwell.

    PS if you want to compare your Brown-tagged corpus with another, feel free
    to re-use our multi-tagged corpus!

    -- 
    Eric Atwell, Distributed Multimedia Systems MSc Tutor & SOCRATES Tutor
    School of Computing, University of Leeds, LEEDS LS2 9JT
    TEL: (44)113-2335430  FAX: (44)113-2335468
    WWW: http://www.comp.leeds.ac.uk/eric  EMAIL: eric@comp.leeds.ac.uk
    

    On Wed, 16 Aug 2000, David Campbell wrote:

    > This is a pretty specific question about POS tagging in the Brown Corpus:
    > In the sentences:
    >
    > Which/WDT child broke the glass?
    > I do not know which/WDT way to go.
    >
    > 'which' is acting as a determiner and takes the 'Wh' determiner tag WDT.
    > Similarly in the sentences:
    >
    > That/DT child broke the glass.
    > I want to go that/DT way.
    >
    > the word 'that' is acting as a determiner and tagged DT. However, both
    > 'which' and 'that' along with 'who' commonly introduce relative clauses
    > (other words, or no word at all, can do this too, but this occurs less
    > frequently) such as in the sentences:
    >
    > The child who/WPS broke the glass is in the corner.
    > The map that/WPS has the red cover will help.
    > The book which/WDT is on the table is mine.
    >
    > Here's my problem. 'Who' and 'That' are tagged by Brown as 'Wh' pronouns
    > (WPS) when introducing relative clauses, but 'which' retains its
    > determiner tag WDT. I am at a loss as to why. I've looked at
    > documentation for the tag sets but found nothing to explain this. The
    > original Penn Treebank had the 'Wh' determiner tagged WDT for all
    > instances of 'which' and 'what' as well as instances of 'that' such as
    > above. But this was changed and now 'that' is tagged as a determiner in
    > one sense and a pronoun in another.
    >
    > Can anyone offer a reasonable explanation for this? I'm currently tagging
    > my own corpus and would like to compare it to some text which has
    > previously been marked up with the Brown set. Therefore, I'd like my
    > tagging to be consistent with what's been done previously. But this case
    > really bugs me and I was hoping someone might have some insight on why
    > things are tagged the way they are here.
    >
    > Thanks
    > David A. Campbell
    >
    > To make a prairie
    >
    > To make a prairie it takes a clover
    > and one bee,--
    > One clover, and a bee,
    > And revery.
    > The revery alone will do
    > If bees are few.
    > EMILY DICKINSON



    This archive was generated by hypermail 2b29 : Thu Aug 17 2000 - 14:34:33 MET DST