Re: Corpora: Collaborative effort

From: COMP staff (csrluk@comp.polyu.edu.hk)
Date: Tue Jun 13 2000 - 03:43:27 MET DST

Next message: Bob Krovetz: "Re: Corpora: Collaborative effort"

Previous message: Bob Krovetz: "Re: Corpora: Collaborative effort"
Maybe in reply to: Jem Clear: "Corpora: Collaborative effort"
Next in thread: Wible: "Re: Corpora: Collaborative effort"
Next in thread: Bob Krovetz: "Re: Corpora: Collaborative effort"
Reply: Wible: "Re: Corpora: Collaborative effort"

> Jeremy Clear wrote:
>
> >... That's the crucial thing -- you spend no significant
> >time agonizing over the task; you just quickly pick some concordance
> >lines and send them in. Sure, not everyone will agree 100% that the
> >lines you've picked exactly match the sense I posted (first because
> >the sense I posted was just an arbitrary definition taken from one
> >dictionary which is clearly inadequate to define and delimit precisely
> >a semantic range; and second, because no-one is going to validate or
>
> Philip Resnik wrote:
>
> >I agree -- especially since tolerance of noise is necessary even when
> >working with purportedly "quality controlled" data. And one can
> >always post-process to clean things up if quality becomes an issue
>
> Krovetz
>
> I don't mean to put a damper on this idea, but we should expect that
> the agreement rate will be far from 100%. Also, the tolerance of noise
> will depend on the amount of noise. I did a comparison between the
> tagging of the Brown files in Semcor and the tagging done by DSO.
> I found that the agreement rate was 56%. This is exactly the rate of
> agreement we would find by chance. So the amount of post-processing
> could be quite a bit of work!

Suppose each tagger chooses among 6 sense tags for the same word in a
sentence, assuming that both use the same set of sense tags (which is
unlikely). If the two taggers choose independently and uniformly, the
probability that they agree by chance is 6 x 1/6 x 1/6 = 1/6, or about
17%. So a chance agreement rate near 56% seems to hold if the word has
only about 2 sense tags:

2 x 1/2 x 1/2 = 1/2.

Is this correct?
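
As a quick check of the arithmetic, here is a minimal Python sketch. It
assumes the two taggers pick tags independently from the same tag set; the
function names chance_agreement and uniform are only illustrative, and the
skewed distribution at the end is a made-up example, not data from Semcor
or DSO.

```python
from fractions import Fraction

def chance_agreement(dist_a, dist_b):
    """Probability that two independent taggers pick the same tag.

    Each argument maps a tag to the probability that the tagger picks it.
    """
    return sum(p * dist_b.get(tag, 0) for tag, p in dist_a.items())

def uniform(k):
    """Uniform distribution over k sense tags (an assumption)."""
    return {tag: Fraction(1, k) for tag in range(k)}

# k tags, uniform choice: k x 1/k x 1/k = 1/k
print(chance_agreement(uniform(6), uniform(6)))  # 1/6
print(chance_agreement(uniform(2), uniform(2)))  # 1/2

# Hypothetical skewed case: both taggers pick sense 0 three times out of four.
skewed = {0: Fraction(3, 4), 1: Fraction(1, 4)}
print(chance_agreement(skewed, skewed))  # 5/8
```

Note that if the taggers favour one sense, chance agreement rises above
1/k, so the uniform figure is only a lower bound for a given number of tags.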

For information, we did some work measuring the agreement of sense
tagging between human taggers, which is about 80% for both recall and
precision (or 0.8 x 0.8 = 0.64, roughly comparable to 0.56). However,
this was for Chinese and over a small sample.

Best,

Robert Luk


This archive was generated by hypermail 2b29 : Tue Jun 13 2000 - 03:42:30 MET DST