Re: Corpora: Question about a Brown Corpus tag

From: Tylman Ule (ule@sfs.nphil.uni-tuebingen.de)
Date: Thu Sep 14 2000 - 16:22:06 MET DST


    Mark Lewellen wrote:
    >
    > An alternative to underspecification of POS information is to develop a
    > POS tagger that records multiple POS in ambiguous contexts (ideally with
    > probabilities attached to each POS choice). An advantage to this approach
    > is that POS-ambiguity information is not 'hard-coded' in advance by the
    > tag set, but is rather determined by sentence context, and may be extended
    > to other ambiguities (such as N vs. V).
    >
    > Could anyone point out projects that have developed such POS taggers, or
    > submit opinions as to their viability? One difficulty I notice is that a
    > typical tagger using an HMM with the Viterbi algorithm determines a most
    > likely _sequence_, which would make it difficult to establish probabilities
    > of multiple POS tags for a given word.

    You may either use a tagger that also outputs alternative tags deemed
    less probable, or combine a number of taggers to reach a similar
    solution via a voting scheme, or, of course, do both.
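On the Viterbi point: Viterbi returns only the single best tag sequence, but the forward-backward algorithm over the same HMM gives, for every word, the posterior probability of each tag given the whole sentence, i.e. a ranked list of alternative tags. The following is a minimal sketch, not taken from any particular tagger; the toy tag set (Brown-style VB, IN, CS, NN), the sentence and all probabilities are invented for illustration.

```python
# Forward-backward sketch: per-token tag posteriors P(tag_i | words) for an HMM.
# The toy tag set, words and probabilities are invented for illustration only.
from typing import Dict, List

Dist = Dict[str, float]


def forward_backward(words: List[str], tags: List[str], start: Dist,
                     trans: Dict[str, Dist], emit: Dict[str, Dist]) -> List[Dist]:
    """Return one {tag: posterior} dict per word.

    No rescaling or log-space arithmetic is done, so this only suits short
    sentences; it also assumes the sentence has nonzero probability.
    """
    n = len(words)
    # alpha[i][t] = P(words[0..i], tag_i = t)
    alpha: List[Dist] = [{} for _ in range(n)]
    for t in tags:
        alpha[0][t] = start[t] * emit[t].get(words[0], 0.0)
    for i in range(1, n):
        for t in tags:
            alpha[i][t] = emit[t].get(words[i], 0.0) * sum(
                alpha[i - 1][s] * trans[s][t] for s in tags)
    # beta[i][t] = P(words[i+1..n-1] | tag_i = t)
    beta: List[Dist] = [{} for _ in range(n)]
    for t in tags:
        beta[n - 1][t] = 1.0
    for i in range(n - 2, -1, -1):
        for t in tags:
            beta[i][t] = sum(trans[t][u] * emit[u].get(words[i + 1], 0.0)
                             * beta[i + 1][u] for u in tags)
    total = sum(alpha[n - 1][t] for t in tags)  # P(words)
    return [{t: alpha[i][t] * beta[i][t] / total for t in tags} for i in range(n)]


TAGS = ["VB", "IN", "CS", "NN"]
START = {"VB": 0.5, "IN": 0.1, "CS": 0.1, "NN": 0.3}
TRANS = {
    "VB": {"VB": 0.1, "IN": 0.4, "CS": 0.2, "NN": 0.3},
    "IN": {"VB": 0.1, "IN": 0.05, "CS": 0.05, "NN": 0.8},
    "CS": {"VB": 0.2, "IN": 0.1, "CS": 0.05, "NN": 0.65},
    "NN": {"VB": 0.4, "IN": 0.3, "CS": 0.2, "NN": 0.1},
}
EMIT = {
    "VB": {"wait": 1.0},
    "IN": {"until": 1.0},
    "CS": {"until": 1.0},
    "NN": {"tomorrow": 1.0},
}

words = ["wait", "until", "tomorrow"]
posteriors = forward_backward(words, TAGS, START, TRANS, EMIT)
for word, dist in zip(words, posteriors):
    ranked = sorted(dist.items(), key=lambda tp: tp[1], reverse=True)
    print(word, [(t, round(p, 3)) for t, p in ranked if p > 0])
```

In this toy model "until" gets roughly 0.71 for IN and 0.29 for CS. A real implementation would work in log space or rescale alpha and beta at each position to avoid underflow on long sentences.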

    For information on system combination via voting, see e.g.:

    @InProceedings{vanHalteren1998,
      author    = {Hans van Halteren and Jakub Zavrel and Walter Daelemans},
      title     = {Improving Data Driven Wordclass Tagging by System Combination},
      booktitle = {Proceedings of COLING-ACL '98},
      year      = 1998,
      month     = aug,
      address   = {Montreal, Canada},
      publisher = {ACL},
      url       = {ftp://ilk.kub.nl/pub/papers/coling98.ps.gz},
    }
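The simplest combination scheme is plain majority voting per token. The sketch below assumes every tagger tokenizes identically and uses the same tag set; the tie-breaking rule is an arbitrary choice made for this sketch, and the paper above discusses more elaborate schemes than this one.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(tag_sequences: Sequence[Sequence[str]]) -> List[str]:
    """Combine several taggers' outputs for one sentence by per-token majority.

    Ties go to the tag proposed by the earliest tagger in the list (an
    arbitrary choice made for this sketch).
    """
    if len({len(seq) for seq in tag_sequences}) != 1:
        raise ValueError("need at least one tagger, all tagging the same tokens")
    combined: List[str] = []
    for votes in zip(*tag_sequences):
        counts = Counter(votes)
        top = max(counts.values())
        # first tag, in tagger order, that reaches the top vote count
        combined.append(next(t for t in votes if counts[t] == top))
    return combined


# Three hypothetical taggers on "wait until the morning comes" (Brown tags).
tagger_outputs = [
    ["VB", "CS", "AT", "NN", "VBZ"],
    ["VB", "IN", "AT", "NN", "VBZ"],
    ["VB", "CS", "AT", "NN", "NNS"],
]
print(majority_vote(tagger_outputs))
```

Each tagger's individual error (IN for "until", NNS for "comes") is outvoted by the other two.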

    I know of at least one tagger that outputs alternative tags within its
    search beam, namely Thorsten Brants' TnT tagger
    (http://www.coli.uni-sb.de/~thorsten/tnt).
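Reporting alternatives within a beam can be illustrated with a ratio threshold: keep every tag whose probability is within a factor theta of the best tag for that word. This is a generic sketch; the function name, the default theta and the example distribution are hypothetical, and it is not meant to reproduce TnT's actual implementation or options.

```python
from typing import Dict, List, Tuple


def alternatives(dist: Dict[str, float], theta: float = 10.0) -> List[Tuple[str, float]]:
    """Return (tag, prob) pairs whose prob is at least best/theta, best first."""
    if theta < 1.0:
        raise ValueError("theta must be >= 1")
    if not dist:
        return []
    best = max(dist.values())
    kept = [(t, p) for t, p in dist.items() if p * theta >= best]
    return sorted(kept, key=lambda tp: tp[1], reverse=True)


until_dist = {"IN": 0.71, "CS": 0.29, "RB": 0.001}  # invented probabilities
print(alternatives(until_dist))           # wide beam
print(alternatives(until_dist, theta=2))  # narrow beam
```

A larger theta keeps more alternatives (higher recall of the correct tag, more ambiguity passed on); theta = 1 reduces to picking only the best tag.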

    As for the third solution, I am currently investigating it, and the
    results so far look quite promising.
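One simple way to do both, assumed here purely for illustration and not necessarily the approach mentioned above, is to average the per-token tag distributions of several probabilistic taggers, optionally weighted (e.g. by each tagger's accuracy on held-out data). All names and numbers below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence

Dist = Dict[str, float]


def combine_distributions(per_tagger: Sequence[Sequence[Dist]],
                          weights: Optional[Sequence[float]] = None) -> List[Dist]:
    """Weighted average, token by token, of the tag distributions of several taggers."""
    if not per_tagger:
        raise ValueError("need at least one tagger")
    if weights is None:
        weights = [1.0] * len(per_tagger)
    if len(weights) != len(per_tagger):
        raise ValueError("need exactly one weight per tagger")
    n_tokens = len(per_tagger[0])
    if any(len(dists) != n_tokens for dists in per_tagger):
        raise ValueError("all taggers must tag the same tokens")
    total_w = sum(weights)
    combined: List[Dist] = []
    for i in range(n_tokens):
        acc: Dict[str, float] = defaultdict(float)
        for dists, w in zip(per_tagger, weights):
            for tag, p in dists[i].items():
                acc[tag] += w * p / total_w
        combined.append(dict(acc))
    return combined


# Two hypothetical taggers on "wait until".
tagger_a = [{"VB": 1.0}, {"IN": 0.7, "CS": 0.3}]
tagger_b = [{"VB": 0.9, "NN": 0.1}, {"CS": 0.6, "IN": 0.4}]
print(combine_distributions([tagger_a, tagger_b]))
print(combine_distributions([tagger_a, tagger_b], weights=[3.0, 1.0]))
```

The averaged distribution can then be pruned to a short list of alternatives, e.g. by keeping only tags within some factor of the best one.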

    Best,
    Tylman

    > Mark Lewellen
    >
    > > > on 17 Aug 2000 Eric S Atwell wrote:
    > > >
    > > > > Some tag definitions in Brown were clearly
    > > > > decided by what TAGGIT found computable;
    > > > > I *guess* linguistic inconsistencies in tagging
    > > > > some words may be down to drawing boundaries on
    > > > > grounds of computational tractability rather than
    > > > > purely linguistic reasons
    > > >
    > > > on 17 Aug 2000 Andrew Harley wrote:
    > > >
    > > > > This explains how so many taggers can claim 95% or higher
    > > > > success rates!
    > > >
    > > > > I also know taggers that tagged IN as "preposition
    > > > > or conjunction" on the same grounds.
    > > > ------------------------
    > >
    > > This is a reasonable decision, because you cannot resolve this ambiguity
    > > on the basis of the immediate context (which most taggers use). It is
    > > thus better to keep the POS information underspecified and resolve the
    > > ambiguity when you do the parse. Otherwise, your parser has to
    > > work with unreliable information.
    > >
    > > > So what could be the linguistic reasons that Eric was mentioning? For me
    > > > (with a rather limited linguistic background) the "traditional" criteria
    > > > for POS determination look quite arbitrary, or let's say heuristic.
    > > >
    > > > I cannot, for instance, see any advantage of separating "until" in:
    > > > * until tomorrow (preposition)
    > > > * until the morning comes (subordinating conjunction)
    > >
    > > I agree that you can (or even should) also leave this underspecified
    > > until you do a full parse. However, at some point you have to make a
    > > decision, because you have to annotate both clauses and prepositional
    > > phrases. Now, 'until' (when it is a connector) gives you a good cue to
    > > where the clause starts.
    > >
    > > > while not separating "and" in:
    > > > * you and me (coordinating conjunction)
    > > > * I go and see (coordinating conjunction)
    > >
    > > As 'and' coordinates constituents of the same kind, you can analyse
    > > sentences like:
    > >
    > > 'I came and see.' as: [CL [NP [N I]] [VP [V came] [CO and] [V see]]]
    > > (my ad-hoc annotation ;-))
    > >
    > > The use of 'and' does not affect the 'global' structure of the clause.
    > > However, this is clearly different for 'until', as it introduces a
    > > prepositional phrase in the one case and a clause in the other.
    > >
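The "underspecify now, resolve later" strategy discussed above can be sketched as a second pass over tagged text. Everything here is hypothetical: the combined tag IN|CS, the choice of VBD/VBZ as the only finite-verb tags, and the crude rule (treat 'until' as CS if a finite verb follows before the next comma or full stop, otherwise IN). A real system would take this decision from the clause structure found by the parser; the heuristic only shows where the deferred decision plugs in.

```python
# Second-pass resolution of a hypothetical underspecified tag "IN|CS".
from typing import List, Tuple

Token = Tuple[str, str]  # (word, tag)

UNDERSPEC = "IN|CS"                # hypothetical "preposition or conjunction" tag
FINITE_VERB_TAGS = {"VBD", "VBZ"}  # assumed: Brown past tense, 3rd sg. present
BOUNDARY_TAGS = {",", "."}         # crude clause-boundary markers


def resolve(tagged: List[Token]) -> List[Token]:
    """Replace each UNDERSPEC tag by CS or IN using a crude lookahead rule."""
    out: List[Token] = []
    for i, (word, tag) in enumerate(tagged):
        if tag != UNDERSPEC:
            out.append((word, tag))
            continue
        resolved = "IN"  # default: preposition
        for _, later in tagged[i + 1:]:
            if later in BOUNDARY_TAGS:
                break  # no finite verb before the boundary: stay IN
            if later in FINITE_VERB_TAGS:
                resolved = "CS"  # a finite verb follows: treat as clause opener
                break
        out.append((word, resolved))
    return out


print(resolve([("until", UNDERSPEC), ("tomorrow", "NR"), (".", ".")]))
print(resolve([("until", UNDERSPEC), ("the", "AT"), ("morning", "NN"),
               ("comes", "VBZ"), (".", ".")]))
```

The rule misfires on e.g. "until tomorrow he waits", where a finite verb follows without an intervening comma, which is precisely why the decision is better left to the parser.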

    --
    Tylman Ule,  Tel. 07071/29-78490, Fax 07071/551335
    	Seminar für Sprachwissenschaft, Universität Tübingen
            Kleine Wilhelmstraße 113, 72074 Tübingen
    


