Re: Corpora: web-search

From: Ken Litkowski (ken@clres.com)
Date: Fri May 05 2000 - 20:45:17 MET DST

Next message: Bill Mann: "Re: Corpora: Measuring Text Reuse"

Previous message: Ken Litkowski: "Re: Corpora: web-search"
In reply to: Lou Burnard: "Re: Corpora: web-search"
Next in thread: James L. Fidelholtz: "Re: Corpora: web-search"

I would hope that this tool may be useful to lexicographers as you have
configured it. In addition to the suggestions already made, might I
propose an output option in a format like that used in Senseval, since
many in the computational linguistics community have used that format
for word-sense disambiguation studies.

The format would be a line with an identifier and then up to three
sentences of the source text, with the last sentence containing the
bracketed target word. The output need not be all-inclusive: use a
simple sentence-splitter and see if you can generate a set of sentences
for each hit. If not, just discard that particular corpus instance. This
would provide great training data for word-sense disambiguation.
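
To make the idea concrete, here is a rough sketch in Python of what such
an output option might do. It is only an illustration under assumptions:
the function names, the tab between identifier and text, the square
brackets around the target, and the regex-based splitter are all
hypothetical choices, not the actual Senseval specification or anything
in the existing tool.

```python
import re

# Naive sentence splitter (an assumption): break on whitespace that
# follows '.', '!' or '?'. Abbreviations will be split incorrectly.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Return the non-empty sentences found in text."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def format_instance(instance_id, text, target, max_sentences=3):
    """Build one 'identifier<TAB>context' line, or return None.

    The context holds up to max_sentences sentences, the last of which
    contains the bracketed target word. None means the instance should
    be discarded because no sentence with the target was found.
    """
    word = re.compile(r"\b%s\b" % re.escape(target), re.IGNORECASE)
    sentences = split_sentences(text)
    for i, sentence in enumerate(sentences):
        match = word.search(sentence)
        if match is None:
            continue
        # Bracket the first occurrence of the target in this sentence.
        marked = (sentence[:match.start()] + "[" + match.group(0) + "]"
                  + sentence[match.end():])
        # Keep at most (max_sentences - 1) preceding sentences.
        start = max(0, i - (max_sentences - 1))
        return "%s\t%s" % (instance_id, " ".join(sentences[start:i] + [marked]))
    return None
```

A call such as format_instance("bank.001", snippet, "bank") would give
one line per usable hit, and any hit that comes back as None is simply
dropped.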

Ken

-- 
Ken Litkowski                     TEL.: 301-482-0237
CL Research                       EMAIL: ken@clres.com
9208 Gue Road
Damascus, MD 20872-1025 USA       Home Page: http://www.clres.com


This archive was generated by hypermail 2b29 : Fri May 05 2000 - 21:53:09 MET DST