Re: Corpora: Collaborative effort

From: Patrick Ruch (ruch@dim.hcuge.ch)
Date: Tue Jun 13 2000 - 18:29:56 MET DST

    Hi,

    > I don't mean to put a damper on this idea, but we should expect that
    > the agreement rate will be far from 100%. Also, the tolerance of noise
    > will depend on the amount of noise. I did a comparison between the
    > tagging of the Brown files in Semcor and the tagging done by DSO.
    > I found that the agreement rate was 56%. This is exactly the rate of
    > agreement we would find by chance. So the amount of post-processing
    > could be quite a bit of work!

    I am involved in a project where we are tagging medical text using a subset
    of UMLS. Although words are usually less ambiguous in such a "narrow"
    domain than in unrestricted text, we are also facing some agreement
    problems. Could you send me some references on the topic?

    Best,
    Patrick Ruch
    _________________________________________
    Patrick Ruch
    University Hospital of Geneva
    Medical Informatics Division
    CH-1211 Geneva 14
    tel.: (+41 22) 372 61 64
    fax: (+41 22) 372 48 55


