Re: Corpora: Plagiarism detection

From: Anoop Sarkar (anoop@unagi.cis.upenn.edu)
Date: Mon May 08 2000 - 17:56:31 MET DST


    > Does anyone know of any current plagiarism detection projects currently
    > going on? I know of Malcolm Coulthard and Copycatch, but are there any other
    > projects? Also, I would like to do some statistical work on plagiarised
    > work, but does anyone know where I can find any data?

    The following paper, and the references it cites, might be helpful.

    "Syntactic Clustering of the Web" by A. Z. Broder, S. C. Glassman, M. S.
    Manasse, and G. Zweig, in Proc. of WWW6, available at
    http://decweb.ethz.ch/WWW6/Technical/Paper205/Paper205.html

    They use document fingerprinting to cluster syntactically similar documents.
    Nevin Heintze has used the same technique to find similar documents on the
    web; see http://www.cs.cmu.edu/afs/cs/user/nch/www/koala/main.html
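
    For a rough idea of what fingerprinting means here, the following is a
    minimal sketch and not code from either project. It assumes word-level
    "shingles" (overlapping windows of w words), hashes each one to a short
    fingerprint, and scores two documents by the Jaccard overlap of their
    fingerprint sets. The function names, the window size w=4, and the use of
    SHA-1 are illustrative assumptions only.

```python
import hashlib


def shingles(text, w=4):
    """Return the set of hashed w-word shingles (contiguous word windows).

    Hypothetical helper: w=4 and SHA-1 are illustrative choices.
    """
    words = text.lower().split()
    if len(words) < w:
        # Too short for a full window: treat the whole text as one shingle.
        windows = [" ".join(words)] if words else []
    else:
        windows = [" ".join(words[i:i + w]) for i in range(len(words) - w + 1)]
    # Hash each shingle down to a fixed-size fingerprint.
    return {hashlib.sha1(s.encode("utf-8")).hexdigest()[:16] for s in windows}


def resemblance(a, b, w=4):
    """Jaccard overlap of the two documents' fingerprint sets, in [0, 1]."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0  # two empty documents are trivially identical
    return len(sa & sb) / len(sa | sb)
```

    Pairs of documents whose score exceeds some chosen threshold could then be
    treated as candidates for the same cluster, or as possible cases of
    copying. A real system would likely keep only a sample of the fingerprints
    per document so that the comparison scales to large collections.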

    -Anoop


