RE: Corpora: Question about a Brown Corpus tag

From: Mark Lewellen (lewellen@erols.com)
Date: Fri Sep 15 2000 - 18:21:17 MET DST


    In response to:
    > An alternative to underspecification of POS information is to develop a
    > POS tagger that records multiple POS in ambiguous contexts (ideally with
    > probabilities attached to each POS choice)....
    > Could anyone point out projects that have developed such POS taggers, or
    > submit opinions as to their viability?

    Miles Osborne wrote:

    > Check out:
    >
    > http://www.cs.brown.edu/people/ec/papers/tagforpar.ps
    >
    > from the abstract:
    > >
    > We consider what tagging models are most appropriate as front ends for
    > probabilistic context-free-grammar parsers. In particular we ask if using
    > a tagger that returns more than one tag, a ``multiple tagger,'' improves
    > parsing performance. Our conclusion is somewhat surprising: single tag
    > Markov-model taggers are quite adequate for the task. First of all,
    > parsing accuracy, as measured by the correct assignment of parts of speech
    > to words, does not increase significantly when parsers select the tags
    > themselves. In addition, the work required to parse a sentence goes up
    > with increasing tag ambiguity, though not as much as one might expect.
    > Thus, for the moment, single taggers are the best taggers.
    > >

    I downloaded this article, which argues that a parser should _not_ make
    use of the probabilities from a tagger that returns multiple tags with
    their probabilities. This is counter-intuitive to me, but here is a
    summary of the argument (apologies for rendering the symbols in forms
    suitable for e-mail):
    1) We want to maximize p( parse_tree | word_string ).
    2) For a context-free grammar, 1) is equivalent to maximizing the product
        of the probabilities of the rules used in the parse (i.e., max
        product p(rules) ).
    3) Since we are maximizing p( parse_tree | word_string ), the rules have
        words as their terminal symbols, so some of the rules are 'lexical
        rules'.
    4) The probability of a lexical rule p( tag -> word ) is p( word | tag ).
    5) The 'multiple' tagger returns p( tag | word ). This is not the
        information p( word | tag ) that we require. Using p( tag | word )
        here is analogous to the problem of using p( tag | word ) instead of
        p( word | tag ) in some early HMM taggers (see the sketch below).
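
    To make the gap between 4) and 5) concrete, here is a small Python sketch
    (mine, not from the paper; all corpus counts are invented) showing that
    p( word | tag ) and p( tag | word ) can even rank the candidate tags
    differently:

        from collections import Counter

        # invented tagged-corpus counts: (tag, word) -> frequency
        counts = Counter({
            ('NN',  'saw'):   5,    # "the saw" (noun)
            ('VBD', 'saw'):   6,    # "he saw"  (verb, past tense)
            ('NN',  'dog'):   5,
            ('VBD', 'ran'): 989,
        })

        tag_total, word_total = Counter(), Counter()
        for (tag, word), n in counts.items():
            tag_total[tag] += n
            word_total[word] += n

        def p_word_given_tag(word, tag):
            # the lexical-rule probability a PCFG needs (step 4)
            return counts[(tag, word)] / tag_total[tag]

        def p_tag_given_word(word, tag):
            # what the 'multiple' tagger returns (step 5)
            return counts[(tag, word)] / word_total[word]

        for tag in ('NN', 'VBD'):
            print('%-3s  p(saw|tag) = %.3f   p(tag|saw) = %.3f'
                  % (tag, p_word_given_tag('saw', tag),
                          p_tag_given_word('saw', tag)))

    With these counts the tagger prefers VBD ( p( VBD | saw ) = 0.545 ), but
    the lexical rules prefer NN ( p( saw | NN ) = 0.500 vs.
    p( saw | VBD ) = 0.006 ), because VBD is a far more frequent tag overall.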

    While I fully understand the logic of this argument, it is nevertheless
    desirable to exploit the information that a 'multiple' tagger provides.
    Perhaps Bayes' rule could be applied, so that we could use
    ( p( word ) x p( tag | word ) ) / p( tag ) instead of p( word | tag ).
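
    A minimal sketch of that conversion (again mine; the priors below are the
    ones implied by the invented counts in the earlier sketch):

        # Bayes' rule: p(word|tag) = p(tag|word) * p(word) / p(tag)

        tagger_output = {'NN': 0.455, 'VBD': 0.545}   # p(tag | word='saw')
        p_word = 0.0109                               # unigram p('saw')
        p_tag  = {'NN': 0.00995, 'VBD': 0.990}        # corpus tag priors

        def lexical_rule_prob(p_tag_given_word, p_word, p_tag):
            # invert the tagger's distribution into the p(word|tag)
            # values that score the PCFG's lexical rules tag -> word
            return {t: p * p_word / p_tag[t]
                    for t, p in p_tag_given_word.items()}

        print(lexical_rule_prob(tagger_output, p_word, p_tag))
        # -> {'NN': 0.498..., 'VBD': 0.006...}, recovering the
        #    relative-frequency estimates of p( word | tag ) above

    (Here the numbers invert exactly because they come from the same toy
    counts; with a real contextual tagger, p( tag | word ), p( word ), and
    p( tag ) would be estimated separately.)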

    Are there any agreements/disagreements with the above argument, or any
    other comments on the application of 'multiple' POS taggers as front
    ends to parsers?
    Thanks-

    Mark Lewellen


