G'Day,
Lou Burnard <lou.burnard@computing-services.oxford.ac.uk> writes:
> Can anyone point me to any annotated language corpora which are freely
> available under something like the GNU Public Licence? All the ones I
> have thought of so far seem to be available only under some kind of
> complicated licensing scheme which precludes (e.g.) commercial
> exploitation, unrestricted copying, etc. And cost money.
OPUS <http://logos.uio.no/opus/> sounds ideal. It includes many
European (and even non-European) texts, is freely available (under the
GPL or similar licenses), and is even POS-tagged and marked up in XML.
>
> I'd like to have a corpus of a reasonable size (1 million+ words) in any
> European language (tho English or French are preferable) with some
> kind of word-level annotation, which I can hack about, use in teaching,
> and put on a freely-distributable CD, without worrying about copyright
> lawyers. There *must* be some somewhere!
OPUS is already distributed on the Knorpora CD
<http://sslmit.unibo.it/%7ebaroni/welcome_to_knorpora.html>, a
modified version of the Knoppix 3.3 Live CD for students of
corpus-based computational linguistics.
> It doesn't even have to be in XML -- though it will be when I've
> finished with it.
-- Francis Bond <www.kecl.ntt.co.jp/icl/mtg/members/bond/> NTT Communication Science Laboratories | Machine Translation Research Group
This archive was generated by hypermail 2b29 : Mon Jan 24 2005 - 16:49:22 MET