Linda,
LDC's Topic Detection and Tracking (TDT) corpora categorize tens of
thousands of newswire, radio and television stories according to the
news topics they discuss. The TDT-PILOT and TDT-2 corpora have already
been released. The catalog pages are, respectively:
http://www.ldc.upenn.edu/Catalog/LDC98T25.html
http://www.ldc.upenn.edu/Catalog/LDC99T37.html
The TDT-3 corpus will be released in 2000.
Note, however, that TDT defines "topic" more narrowly than the
examples you gave do. Rather than send bandwidth-consuming details here, I
give a simple example below and encourage interested readers to visit
the project's WWW pages at:
http://www.ldc.upenn.edu/Projects/TDT
Example TDT topic
***************
83. World AIDS Conference
Seminal Event:
WHAT: 12th World AIDS Conference
WHERE: Geneva, Switzerland
WHEN: 28 June 1998
TOPIC EXPLICATION:
The 12th World AIDS Conference opened in Geneva and was attended by
international speakers concerned with the continuing spread of the
AIDS epidemic. Stories on topic may cover reports on panel
discussions, preparations for the conference, concluding proposals,
and suggestions and possible actions towards international
legislation to address the continuing spread of the virus. Reports
solely on medical advances in the fight against AIDS that have no
link to the conference are not on topic.
RELATED RULE OF INTERPRETATION # 11
Related Article: NYT19980628.0108   More examples: Yes, Brief.
Best wishes,
Chris
--
Christopher Cieri
Executive Director, Linguistic Data Consortium
3615 Market Street, Philadelphia, PA 19104-2608 USA
phone: 215-573-5489, fax: 215-573-2175
mailto:Christopher.Cieri@ldc.upenn.edu
http://www.ldc.upenn.edu
This archive was generated by hypermail 2b29 : Tue May 02 2000 - 20:15:24 MET DST