Corpora: Cross Document Coreference

From: Daniel Winchester (d.winchester@cs.bham.ac.uk)
Date: Wed May 17 2000 - 17:22:12 MET DST

    Dear All,

    I have recently undertaken an NLP PhD with the working title of
    'Cross-Document Coreference' in the computer science department of the
    University of Birmingham. To get to the point, I am using the term
    cross-document coreference to denote multiple, and often variant,
    references to the same entity from different texts. This usage follows
    from the handful of papers from the NLP community that outline systems
    designed to disambiguate such references (e.g. the work of Breck
    Baldwin and Amit Bagga).

    Thus, in different documents, 'Clinton', 'William Clinton', 'William
    Jefferson Clinton', etc., when referring to the president, could all be
    said to 'corefer', but 'Bill Clinton', the New York policeman, or
    'Clinton', the town in Arizona, would not.

    I am aware that this 'coreference' is profoundly different from that
    found within documents, and that the terminology itself is
    problematic. Coreference within a discourse/text relies on
    relationships that are intended to allow the reader to resolve any
    ambiguity; this is obviously not the case for references in unrelated
    texts to the same entity. Nevertheless, for the time being I will use
    the term cross-document coreference.

    I am hoping for some help on the following:

    1. Are there any corpora available that are marked for cross-document
    coreference?

           I know that this is unlikely, but any corpus in which all
    references to the same entity are linked in some way would be very
    useful.

    2. Does anyone know if this sort of work is being done or has been done
    elsewhere under a different name or in a different discipline?

            It seems the sort of task that Information Retrieval (IR) would
    be interested in, but, to date, I have found no equivalent work.
            I'm basically after any suggestions that people might have for
    where this is already being looked at, for other newsgroups that I
    should post a query on, or for alternative disciplines and terminology
    that might be relevant.

    Hope that you will be able to help.

    Kind Regards

    Daniel Winchester

    Research Student
    Computer Science Dept
    University of Birmingham


