Dear Linguists and Lawyers,
I have the same "problem" with a large (tagged) monitor corpus of
texts from French-language online forums:
- these messages are publicly available in the sense that everybody
can read and reuse them
- each newsgroup server stores and uses its own copies of them
- search engines use and exploit cached copies of them
- ...
So,
- Is it illegal to store these messages, in anonymized form, in a
database?
- Is it illegal to exploit this corpus for research purposes (i.e. to
carry out linguistic studies and to develop NLP tools using
corpus-based machine learning methods)?
- Is it illegal to illustrate scientific articles with examples from
this corpus?
Do I need to ask each author for permission to store and use his or her
messages? What if I mention the source and the author? What about
copyright?
Moreover,
- What if I want to make my corpus publicly available to researchers?
- What if NLP tools developed from this corpus are to be integrated
into commercial products?
Thank you in advance for your help...
References, pointers and suggestions are welcome, especially on the
legal aspects in France...
Nicolas Torzec
--
Nicolas Torzec
PhD Student in NLP processing
--

delucca@nilc.icmc.usp.br wrote:
>
> Dear Linguists and Lawyers,
>
> I am troubled with Legal aspects of corpora compiling. I am in
> doubt if is an illegal procedure storage webpages (or part of them)
> in a database (see at http://www.dictionarium.com/project.htm),
> not available to public, and display its contents as short collocations
> less than 100 characters by time by search method.
>
> On the other hand, the Internet search engines uses cached (temporary ?)
> copies of the sites and display a short of the web pages.
>
> My procedure is wrong? Which the Legal difference? I need ask permission
> for each website to storage its pages? If I mention the source and the author
> I will be protecting the copyrights?
>
>
> I look forward to hearing from you.
>
> Yours Sincerely,
>
> J. L. De Lucca
>
> -------------------------------------------------
> This mail sent through IMP: http://horde.org/imp/
--
Nicolas TORZEC
ENSSAT / Université de Rennes 1
6, rue de Kerampont
22300 Lannion
Email: nicolas.torzec@enssat.fr
Tel: 02.96.46.27.30
Fax: 02.96.37.01.99
Web: http://www.enssat.fr
--
This archive was generated by hypermail 2b29 : Tue Jun 17 2003 - 13:08:07 MET DST