Re: Corpora: Multi-document summarisation data

From: Christopher Cieri (ccieri@ldc.upenn.edu)
Date: Fri May 26 2000 - 01:22:30 MET DST

    Tassos,

    The corpora we have which are most like what you need are the Topic Detection
    and Tracking corpora. TDT-2 contains tens of thousands of stories from English
    (and Mandarin) broadcast news and radio. LDC annotators defined 100 news
    topics from stories selected at random from the corpus and then annotated each
    of the stories for relevance to each of the 100 topics. The corpora contain
    both the stories and the relevance table. For each topic, LDC also developed
    topic definitions like the one that follows. These are written primarily to
    guide the annotators, so I can't say how useful they'll be for your work. For
    more information on the TDT-2 corpus, see:
    http://www.ldc.upenn.edu/Projects/TDT2/

        90. Unwed Fathers' Law
        Seminal Event:
        WHAT: CA adopts law legalizing the use of paternity forms for unwed
        fathers
        WHERE: USA
        WHEN: January 1997
        TOPIC EXPLICATION:
        In 1997, a law was passed in California that allows licensed hospitals
        to provide a "declaration of paternity" form for the unwed parents of
        a newborn to sign. This document makes unwed fathers legally
        responsible for the child. Signing the declaration form is voluntary.
        The declaration entitles children to the same rights and privileges as
        children born to married parents, and makes unwed fathers easier to
        track down should they become "deadbeat dads" (decline responsibility
        and leave the child and mother without financial support). The
        document provides the child with legal access to parental medical
        records, and the noncustodial parents' medical benefits. Several
        states have adopted the use of this document. Stories discussing cases
        invoking the use of this law, the developing partnership with clinics,
        county welfare offices, local vital records offices and courts, and
        related stories discussing the effects this may have nationally are on
        topic. The implementation of this law by other states as well as
        California are on topic if they specifically mention the CA law or the
        "declaration of paternity" form.
        RELATED RULE OF INTERPRETATION # 9
        Related Article: CNN19980626.2130.0558

    Chris

    Tassos Tombros wrote:

    > Hello everybody.
    >
    > I am looking for a document collection for the purposes of multi-document
    > summarisation. What I am looking for is clusters of related documents and
    > a corresponding human-written summary for each of the clusters.
    >
    > Any help would be greatly appreciated.
    >
    > Thanks,
    >
    > Tassos
    >
    > --------------------------------------------------------------------------
    > Tassos Tombros, F082 Tel : +44 (0)141 330 4971
    > Department of Computing Science Fax : +44 (0)141 330 4913
    > University of Glasgow e-mail:tombrosa@dcs.gla.ac.uk
    > Glasgow G12 8RZ, UK http://www.dcs.gla.ac.uk/~tombrosa/

    --
    Christopher Cieri
    Executive Director, Linguistic Data Consortium
    3615 Market Street, Philadelphia, PA 19104-2608 USA
    phone: 215-573-5489, fax: 215-573-2175
    mailto:Christopher.Cieri@ldc.upenn.edu
    http://www.ldc.upenn.edu
    



