RE: Corpora: Part of Speech Tagging<unknown-words>

Christopher Tribble (ctribble@sri.lanka.net)
Fri, 5 Nov 1999 14:25:45 +0530

Next message: Catherine Pilière: "Corpora: CFP ESSLLI-2000 Student Session"
Previous message: Priscilla Rasmussen: "Corpora: ANLP-NAACL2000 Submission Notification Form Now Available"

Re: Vasuprada Kandrakota's request for sources of free corpus data

If you subscribe to the UK's Guardian International you can build a good
newspaper corpus for free by registering for the email edition. You'll be
sent the full text of the following sections each week:
international-news, us-news, uk-news, features, culture, and sport. With
around 50,000 words an issue you will soon accumulate a useful set of texts
(already blocked into quite useful thematic groups).

Bestest

Chris Tribble

--
		Dr Christopher Tribble
Sri Lanka	21 Wijerama Mawatha, Colombo 7
		TEL  +94 75 332 309
UK		122, Queen Alexandra Mansions, Judd Street
		London WC1 H 9DQ
		TEL +44 171 833 4271
UK Mailing	c/o FCO (Sri Lanka)
		The British Council, Sri Lanka
		King Charles Street, London SW1A 2AH
E-mail		ctribble@sri.lanka.net
Home Page	http://ourworld.compuserve.com/homepages/Christopher_Tribble

> -----Original Message-----
> From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no]On
> Behalf Of VASUPRADA KANDRAKONTA(98MCMT04)
> Sent: Friday, November 05, 1999 8:00 PM
> To: corpus list
> Subject: Corpora: Part of Speech Tagging<unknown-words>
>
>
> Hi everybody,
> I'm doing a project in POS tagging. For this I'm using statistical
> methods. I've built a Hidden Markov Model using the SUSANNE corpus and am
> using the Viterbi algorithm to find the best tag sequence. But I have a
> problem of sparse data. Can anyone tell me what should be done with
> unknown words (words not found in the corpus)? One method is to use
> features like word endings and initial capital letters. But what about the
> state transition matrix?
> If anyone knows any literature on the net about this, please let me know.
>
> I'm planning to upgrade my system using a larger corpus. The
> corpus I'm using right now has 1,30,000 words. Can anyone tell me
> where I can get a downloadable corpus (free of cost)?
>
> Thank you,
> Vasuprada Kandrakota
> Dept. of Computer Science,
> University of Hyderabad,
> Hyderabad-INDIA 500 046
>
>
>
>
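
The word-ending and capitalisation features mentioned in the quoted message
can be illustrated with a minimal sketch. It is not taken from either poster's
system: the function names, the toy tags, the maximum suffix length of 3 and
the add-alpha smoothing are all assumptions made for illustration. In a
standard first-order HMM tagger the transition matrix is defined over tags
only, so an unseen word leaves it unchanged; only the emission term for that
word needs an estimate, which is what the sketch approximates.

```python
from collections import Counter, defaultdict


def train_feature_counts(tagged_words, max_suffix=3):
    """Count tags per word-final suffix and per initial-capital flag.

    tagged_words: iterable of (word, tag) pairs from a tagged corpus.
    All names here are hypothetical, not from the original message.
    """
    suffix_tags = defaultdict(Counter)
    cap_tags = defaultdict(Counter)
    tag_counts = Counter()
    for word, tag in tagged_words:
        tag_counts[tag] += 1
        cap_tags[word[:1].isupper()][tag] += 1
        for n in range(1, min(max_suffix, len(word)) + 1):
            suffix_tags[word[-n:].lower()][tag] += 1
    return suffix_tags, cap_tags, tag_counts


def unknown_tag_distribution(word, model, max_suffix=3, alpha=1.0):
    """Estimate P(tag | suffix, capitalisation) for a word not in the corpus.

    Assumption: the two features are combined by a simple product and
    renormalised; counts are add-alpha smoothed.
    """
    suffix_tags, cap_tags, tag_counts = model
    # Use the longest suffix seen in training; otherwise fall back to
    # the overall tag counts.
    counts = tag_counts
    for n in range(min(max_suffix, len(word)), 0, -1):
        suffix = word[-n:].lower()
        if suffix in suffix_tags:
            counts = suffix_tags[suffix]
            break
    cap = cap_tags.get(word[:1].isupper(), Counter())
    n_tags = len(tag_counts)
    scores = {}
    for tag in tag_counts:
        p_suffix = (counts[tag] + alpha) / (sum(counts.values()) + alpha * n_tags)
        p_cap = (cap[tag] + alpha) / (sum(cap.values()) + alpha * n_tags)
        scores[tag] = p_suffix * p_cap
    total = sum(scores.values())
    return {tag: score / total for tag, score in scores.items()}


# Toy training data (invented for this example, not from SUSANNE).
toy_corpus = [
    ("running", "VBG"), ("walking", "VBG"),
    ("London", "NNP"), ("Paris", "NNP"),
    ("dog", "NN"), ("cat", "NN"),
]
model = train_feature_counts(toy_corpus)
dist = unknown_tag_distribution("jumping", model)
best_tag = max(dist, key=dist.get)
```

Because Viterbi compares tags for a fixed word, dividing each value in `dist`
by the relative frequency of its tag gives a score proportional to
P(word | tag), which can stand in for the missing emission probability.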
