Re: Corpora: Collaborative effort

From: Wible (dwible@mail.tku.edu.tw)
Date: Tue Jun 13 2000 - 11:08:21 MET DST


    ----- Original Message -----
    From: Robert Luk (COMP staff) <csrluk@comp.polyu.edu.hk>
    To: <krovetz@research.nj.nec.com>
    Cc: <corpora@hd.uib.no>
    Sent: Tuesday, June 13, 2000 9:43 AM
    Subject: Re: Corpora: Collaborative effort

    First, I thought this project was described not as a sense tagging project
    but as something like the reverse: you give me a sense and I (one of the
    collaborators) offer some sentences that illustrate that sense. The
    concerns expressed below about agreement rates among taggers seem relevant
    to sense tagging proper, but I don't quite see their relevance to the
    collaborative work suggested for this project. Maybe I'm missing something.

    Even so, I have a thought about inter-tagger agreement when it comes to
    sense tagging. I'm new to semantic tagging so please forgive me if my
    thoughts are either old news or misguided.

    Let's say there are 19 senses for the verb 'run'. It seems to me misleading
    to calculate inter-tagger agreement by assuming that any instance where two
    taggers each select a different sense from these 19 constitutes an absence
    of agreement between these two taggers. These 19 or whatever senses are
    certainly not discrete, autonomous senses as independent from one another as
    pearls on a string. For example, it may be that I can't even detect a
    difference between, say, sense 7 and sense 8 or feel that the distinction is
    a matter of splitting hairs (this is how I feel about certain sense
    distinctions in WordNet, for example). In cases like these, the fact that
    human tagger A chooses sense 7 and tagger B chooses sense 8 for a particular
    token of 'run' is a very unimportant case of inter-tagger disagreement
    compared to a case where A opts for sense 7 and B for sense 13, where 13 is
    clearly different from 7.

    I don't know what sort of factors were taken into consideration in
    calculating agreement rates in the cases mentioned below where figures
    approach chance. Certainly if agreement calculations ignore such semantic
    clusterings among senses, we have to wonder about the value of such figures.
    I realize that taggers also may not agree on the clusterings themselves, but
    that is a different issue. To the extent that we reject the
    pearls-on-a-string view of senses, we must admit that rater agreement
    calculations based on that view fall short of what they are intended to
    measure. Does anyone know if there has been any empirical
    research done to uncover such clusterings of senses by asking raters to
    judge similarity of senses?
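
    Just to make concrete the kind of calculation I have in mind, here is a
    rough sketch in Python. The sense numbers and the clustering below are
    invented purely for illustration; the point is only the difference between
    counting a 7-versus-8 choice as a full disagreement and giving credit
    whenever two choices fall within the same cluster of senses.

        # Hypothetical sketch: strict "pearls-on-a-string" agreement versus a
        # cluster-aware measure that credits choices from the same sense cluster.

        def exact_agreement(tags_a, tags_b):
            # Fraction of tokens where both taggers chose the identical sense.
            matches = sum(1 for a, b in zip(tags_a, tags_b) if a == b)
            return matches / len(tags_a)

        def cluster_agreement(tags_a, tags_b, cluster_of):
            # Fraction of tokens where both choices fall in the same sense cluster.
            matches = sum(1 for a, b in zip(tags_a, tags_b)
                          if cluster_of[a] == cluster_of[b])
            return matches / len(tags_a)

        # Invented clustering of three senses of 'run': 7 and 8 are
        # near-duplicates, 13 is clearly distinct.
        cluster_of = {7: "motion", 8: "motion", 13: "manage"}

        tagger_a = [7, 7, 13]
        tagger_b = [8, 7, 13]

        print(exact_agreement(tagger_a, tagger_b))                # ~0.67: 7 vs 8 counted as disagreement
        print(cluster_agreement(tagger_a, tagger_b, cluster_of))  # 1.0: 7 vs 8 counted as agreement

    Of course one would still have to decide where the clusters come from,
    which is exactly the empirical question above.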

    Best,

    David Wible

    >
    > > Jeremy Clear wrote:
    > >
    > > >... That's the crucial thing -- you spend no significant
    > > >time agonizing over the task; you just quickly pick some concordance
    > > >lines and send them in. Sure, not everyone will agree 100% that the
    > > >lines you've picked exactly match the sense I posted (first because
    > > >the sense I posted was just an arbitrary definition taken from one
    > > >dictionary which is clearly inadequate to define and delimit precisely
    > > >a semantic range; and second, because no-one is going to validate or
    > >
    > > Philip Resnik wrote:
    > >
    > > >I agree -- especially since tolerance of noise is necessary even when
    > > >working with purportedly "quality controlled" data. And one can
    > > >always post-process to clean things up if quality becomes an issue
    > >
    > > Krovetz wrote:
    > >
    > > I don't mean to put a damper on this idea, but we should expect that
    > > the agreement rate will be far from 100%. Also, the tolerance of noise
    > > will depend on the amount of noise. I did a comparison between the
    > > tagging of the Brown files in Semcor and the tagging done by DSO.
    > > I found that the agreement rate was 56%. This is exactly the rate of
    > > agreement we would find by chance. So the amount of post-processing
    > > could be quite a bit of work!
    >
    > Consider that one has 6 sense tags and the other also has 6 sense tags
    > for the same word in a sentence, assuming that they use the same set of
    > sense tags
    > (although not likely). The likelihood that the two tagging
    > algorithms agreed by chance (independently) is 6 x 1/6 x 1/6. So, the
    > above seems to be true if there are 2 sense tags for the word:
    >
    > 2 x 1/2 x 1/2.
    >
    > Is this correct?
    >
    > For information, we did some work in measuring the agreement of sense
    > tagging between HUMANS, which is about 80% for both recall and precision
    > (or 0.8 x 0.8 = 0.64 ~ 0.56). However, this is for Chinese over a small
    > sample.
    >
    > Best,
    >
    > Robert Luk
    >
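
    P.S. To spell out the chance-agreement arithmetic in the quoted message:
    if two taggers choose independently and uniformly among the n senses of a
    word, the probability that they pick the same sense by chance is
    n x (1/n) x (1/n) = 1/n. A small sketch under that (admittedly
    unrealistic) uniform assumption:

        # Chance agreement for two taggers choosing independently and uniformly
        # among n senses: n * (1/n) * (1/n) = 1/n.
        def chance_agreement(n_senses):
            return n_senses * (1.0 / n_senses) * (1.0 / n_senses)

        print(chance_agreement(6))  # ~0.17 for a six-sense word
        print(chance_agreement(2))  # 0.5 for a two-sense word, in the region of the 56% figure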


