Sam,
In case you have not already done this, you might have a look at LDC's
Catalog (http://www.ldc.upenn.edu/Catalog). We have 168 corpora
available at the moment and add about 20 per year. Most of our English
text corpora focus on news since news text is relatively easy to acquire
in large volume and covers a variety of topics. LDC also does data
collection and annotation for specific projects or sponsors provided
that we retain the right to share the data with our research
communities.
Best wishes,
Chris
-- 
Christopher Cieri
Executive Director, Linguistic Data Consortium
3615 Market Street, Philadelphia, PA 19104-2608 USA
phone: 215-573-5489, fax: 215-573-2175
mailto:Christopher.Cieri@ldc.upenn.edu
http://www.ldc.upenn.edu

Sam Chiles wrote:
> Hello all,
> I am new to the world of corpora and have recently been
> recruited to locate sources of corpora for a new library in
> development by Microsoft. They are currently licensing English
> language text data covering any subject to use for linguistic
> software, such as grammar checkers. Could anyone give me a few
> pointers toward any type of corpora that could be available for use by
> Microsoft?
> Thank you,
> Sam
>
> Sam Chiles
> E-mail sam.chiles@virgin.net
This archive was generated by hypermail 2b29 : Thu Jul 27 2000 - 19:42:11 MET DST