I would hope that this tool may be useful to lexicographers as you have
configured it. Might I suggest, in addition to the suggestions already
made, that an output option include a format like that used in Senseval,
since there are many in the computational linguistics community who have
used that format for word-sense disambiguation studies.
The format would be a line with an identifier and then up to three
sentences of the source text, with the last sentence containing the
bracketed target word. Coverage need not be exhaustive: run a simple
sentence-splitter over each corpus instance and, if it fails to yield a
usable set of sentences, just discard that instance. The result would
make good training data for word-sense disambiguation.
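
A minimal sketch of the procedure described above, in Python. Everything
here is an assumption for illustration: the function names, the
identifier "bank.001", the single space between identifier and context,
and the naive punctuation-based splitter. It is not the official Senseval
layout, only the shape outlined in this message.

```python
import re

# Naive splitter (an assumption, not a real tokenizer): a sentence ends
# at '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentence_spans(text):
    """Return (start, end) character offsets of each sentence in text."""
    spans = []
    start = 0
    for match in _SENTENCE_END.finditer(text):
        spans.append((start, match.start()))
        start = match.end()
    if start < len(text):
        spans.append((start, len(text)))
    return spans


def format_instance(instance_id, text, target_start, target_end,
                    max_sentences=3):
    """Build one output line: identifier, then up to max_sentences
    sentences, the last of which holds the bracketed target word.

    Returns None when the target does not fall inside a single sentence,
    so the caller can discard that corpus instance.
    """
    if not 0 <= target_start < target_end <= len(text):
        return None
    spans = sentence_spans(text)
    for i, (start, end) in enumerate(spans):
        if start <= target_start and target_end <= end:
            # Keep at most (max_sentences - 1) sentences before the target.
            first = max(0, i - (max_sentences - 1))
            context = [text[a:b] for a, b in spans[first:i]]
            context.append(
                text[start:target_start]
                + "[" + text[target_start:target_end] + "]"
                + text[target_end:end]
            )
            return instance_id + " " + " ".join(context)
    return None  # splitter cut through the target: discard


if __name__ == "__main__":
    sample = "The river was high. We walked along the bank. It was muddy."
    pos = sample.index("bank")
    print(format_instance("bank.001", sample, pos, pos + len("bank")))
    # prints: bank.001 The river was high. We walked along the [bank].
```

Instances where the splitter breaks inside the target (for example after
an abbreviation such as "Mr.") come back as None and are simply dropped,
which matches the "discard the instance" rule above.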
Ken
--
Ken Litkowski                     TEL.: 301-482-0237
CL Research                       EMAIL: ken@clres.com
9208 Gue Road
Damascus, MD 20872-1025 USA       Home Page: http://www.clres.com