Re: Corpora: Plagiarism detection

From: Anoop Sarkar (anoop@unagi.cis.upenn.edu)
Date: Mon May 08 2000 - 17:56:31 MET DST

Next message: Tom Vanallemeersch: "Re: Corpora: Plagiarism detection"

Previous message: mark butler: "Corpora: JOB: Postdoctoral fellow in text analysis or multi-agent systems"
In reply to: Paul Clough: "Corpora: Plagiarism detection"
Next in thread: Tom Vanallemeersch: "Re: Corpora: Plagiarism detection"

> Does anyone know of any plagiarism detection projects currently
> going on? I know of Malcolm Coulthard and Copycatch, but are there any other
> projects? Also, I would like to do some statistical work on plagiarised
> work, but does anyone know where I can find any data?

The following reference, and the references cited in it, might be helpful.

"Syntactic Clustering of the Web" by A. Z. Broder, S. C. Glassman, M. S.
Manasse, G. Zweig, Proc. of WWW6, available at
http://decweb.ethz.ch/WWW6/Technical/Paper205/Paper205.html

They use document fingerprinting to cluster syntactically similar documents.
Nevin Heintze has used the same technique to find similar documents on the
web; see http://www.cs.cmu.edu/afs/cs/user/nch/www/koala/main.html
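To make the fingerprinting idea concrete, here is a minimal sketch, not code from either project. Broder et al. represent each document by its set of contiguous w-word "shingles" and measure resemblance as |S(A) & S(B)| / |S(A) | S(B)|, estimated from small per-document sketches. Assumptions (all mine): a crude lowercase word tokenizer, w=4, and the common min-hash variant with k seeded blake2b hashes, which may differ in detail from the sketch construction in the paper. The helper names (shingles, resemblance, sketch, estimated_resemblance) are hypothetical.

```python
import hashlib
import re


def shingles(text, w=4):
    """Return the set of contiguous w-word shingles of a document."""
    # Assumed tokenizer: lowercase alphanumeric words, punctuation dropped.
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < w:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def resemblance(a, b, w=4):
    """Exact resemblance: |S(A) & S(B)| / |S(A) | S(B)|."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def _hash(shingle, seed):
    # Stable 64-bit hash of a shingle under a given seed.
    # Assumption: blake2b stands in for whatever fingerprint function a real system uses.
    data = (str(seed) + "\x00" + " ".join(shingle)).encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def sketch(text, k=128, w=4):
    """Min-hash sketch: for each of k seeded hash functions, keep the minimum value."""
    s = shingles(text, w)
    if not s:
        raise ValueError("document contains no words")
    return [min(_hash(sh, seed) for sh in s) for seed in range(k)]


def estimated_resemblance(sk_a, sk_b):
    """Fraction of sketch positions where the two documents agree."""
    return sum(x == y for x, y in zip(sk_a, sk_b)) / len(sk_a)
```

For example, "the quick brown fox jumps over the lazy dog" and the same sentence ending in "cat" share 5 of 7 distinct 4-word shingles, so resemblance(...) returns 5/7 (about 0.714), and estimated_resemblance on their sketches should land near that value. The point of the sketch is that only k small integers per document need to be stored and compared, which is what makes clustering a web-sized collection feasible.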

-Anoop


This archive was generated by hypermail 2b29 : Wed May 10 2000 - 11:45:36 MET DST