Re: Corpora: Collaborative effort

From: COMP staff (csrluk@comp.polyu.edu.hk)
Date: Tue Jun 13 2000 - 04:47:51 MET DST

    > From krovetz@research.nj.nec.com Tue Jun 13 10:03:46 2000
    > Date: Mon, 12 Jun 2000 21:59:37 -0400
    > From: Bob Krovetz <krovetz@research.nj.nec.com>
    > To: corpora@hd.uib.no
    > Subject: Re: Corpora: Collaborative effort
    >
    > Robert Luk wrote:
    >
    > >Consider that one has 6 sense tags and the other also has 6 sense tags for the same
    > >word in a sentence, assuming that they use the same set of sense tags
    > >(although not likely). The likelihood that the two tagging
    > >algorithms agreed by chance (independently) is 6 x 1/6 x 1/6. So, the
    > >above seems to be true if there are 2 sense tags for the word:
    > >
    > > 2 x 1/2 x 1/2.
    > >
    > >Is this correct?
    >
    > In the case of Semcor and DSO, the sense inventory was the same (WordNet).
    > The rate of agreement I mentioned was the agreement we would get by
    > tagging all instances with the most frequent sense for the word in the corpus.

    I was referring to agreement "by chance". I just wondered how you arrived at 0.56 for the
    chance agreement between tags assigned by 2 different tagging algorithms.
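    To make the arithmetic in question concrete, here is a minimal sketch. It assumes the two
    taggers choose senses independently, and that chance agreement is the sum over senses of
    the product of the two taggers' probabilities for that sense. The function names and the
    skewed 3/4 vs 1/4 distribution are illustrative assumptions, not figures from Semcor or DSO.

```python
from fractions import Fraction


def chance_agreement(p, q):
    """Probability that two independent taggers pick the same sense.

    p and q are the per-sense probabilities of tagger 1 and tagger 2,
    listed in the same sense order.
    """
    if len(p) != len(q):
        raise ValueError("both taggers must use the same sense inventory")
    return sum(a * b for a, b in zip(p, q))


def uniform(k):
    """Uniform distribution over k senses."""
    return [Fraction(1, k)] * k


# Uniform case from the thread: k x 1/k x 1/k = 1/k
print(chance_agreement(uniform(6), uniform(6)))  # 1/6
print(chance_agreement(uniform(2), uniform(2)))  # 1/2

# Hypothetical skewed case: both taggers favour the first of 2 senses.
skewed = [Fraction(3, 4), Fraction(1, 4)]
print(chance_agreement(skewed, skewed))  # 5/8
```

    Under these assumptions the uniform case always gives 1/k, so a chance figure above 1/k
    can only come from a non-uniform sense distribution.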

    > I don't see why you say it is not likely that they will use the same set of
    > sense tags.

    Then the two tag sets are the same.

    > How can we make meaningful comparisons between word-sense
    > tagging systems without using the same word sense inventory? That was
    > the purpose of the SENSEVAL competition.

    I agree if the sense tags have completely different meanings. However, the
    differences between tags may be shades of meaning rather than a crisp
    decision that they are or are not the same. We can still "compare"
    them if we think of senses as graded: either by comparing the contexts in which the word is
    used, or by having a human map the tags of one set onto the other. It is like merging 2 dictionaries.
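    As a rough illustration of such a graded comparison, the sketch below scores two taggings
    against a hand-made table of similarities between tags from two inventories. Everything in
    it is hypothetical: the tag names, the similarity values, and the choice of averaging the
    similarities are assumptions made for the example, not an established measure.

```python
# Hypothetical hand-made mapping: similarity in [0, 1] between a tag from
# inventory A and a tag from inventory B. All tag names are invented.
SIMILARITY = {
    ("bank_A1", "bank_B1"): 1.0,  # same sense
    ("bank_A1", "bank_B2"): 0.5,  # a shade of the same sense
    ("bank_A2", "bank_B3"): 1.0,  # same sense
}


def graded_agreement(tags_a, tags_b, similarity):
    """Mean similarity between paired tags; unmapped pairs count as 0."""
    if len(tags_a) != len(tags_b):
        raise ValueError("both taggings must cover the same word instances")
    if not tags_a:
        raise ValueError("need at least one tagged instance")
    total = sum(similarity.get((a, b), 0.0) for a, b in zip(tags_a, tags_b))
    return total / len(tags_a)


# Four instances of the same word, tagged once with each inventory.
tags_a = ["bank_A1", "bank_A1", "bank_A2", "bank_A2"]
tags_b = ["bank_B1", "bank_B2", "bank_B3", "bank_B1"]

score = graded_agreement(tags_a, tags_b, SIMILARITY)
print(score)  # 0.625
```

    With crisp matching only, the partially similar pair would count as a plain disagreement;
    the graded table gives it partial credit instead.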

    Some algorithms may work "WITH" a particular tag set and
    may not work well with another one. Consider a system that uses a set of handcrafted
    rules to assign tags or to improve tag assignment. If these rules use the
    tags of some words to decide how to tag other words, then we cannot automatically abstract them
    to work with another tag set.

    Best,

    Robert Luk


