Jean-Gabriel Ganascia - Academic Year 2011-2012


Plagiarism Detection

Bibliography

Here is a short bibliography on plagiarism detection. It can be used both for the IREC evaluation and for the second-year master's internship. For more details, contact Jean-Gabriel.Ganascia@lip6.fr.

Internship Subject

This internship subject can be chosen by either Erasmus Mundus DMKM master's students or IAD master's students.


Many plagiarism detection tools exist today, e.g. Ephorus, Plagiarism Scanner, Safe Assign, Turnitin (the most famous one), Urkund, noplagiat.com, Compilatio.net, Pomtotron, plagiarismdetect.com, Plagiarism Detector, Eve2, CopyCatch, Plagium, Plagiarism Checker, See Sources, Copyscape, Plagiserve, etc. There is even an annual international competition on plagiarism detection in which these tools are compared and evaluated.
Most of these tools were developed for ethical purposes, that is, to detect textual plagiarism, which is regarded as an offence because it amounts to a fraudulent appropriation of someone else's work. They rely on different techniques (e.g. fingerprinting, substring matching, bag-of-words analysis, stylometry, etc.). However, plagiarism detection can also be used for literary analysis, to detect the reciprocal influence of writers. It can also help to understand the notion of style and, more specifically, of style imitation, which is characteristic of writers' exercises. As an example, the pastiche, an exercise through which famous writers like Marcel Proust trained, can be viewed as an extreme case of plagiarism.
We are working in collaboration with the Sorbonne to apply these techniques to detect reused texts in Balzac's masterpiece, "La comédie humaine". Some of these reused texts were written by Balzac himself; others come from the literature and journals of his time. Balzac's texts and some of the contemporary writings will be digitized by the Sorbonne team. The resulting corpus is large, but considerably smaller than the body of text available on the web. The goal of this internship is to adapt the above-mentioned plagiarism detection techniques to detect literary text reuse in classics and, more specifically, in Balzac's "La comédie humaine".
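To make one of the techniques named above more concrete, here is a minimal sketch of fingerprinting, assuming that a fingerprint is simply the set of hashed word k-grams of a text and that two texts are compared with the Jaccard coefficient. The function names, the choice of SHA-256 and the default k = 3 are illustrative assumptions, not the method of any of the tools listed above.

```python
import hashlib
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def fingerprints(text, k=3):
    """Return the set of hashed word k-grams of a text.

    Hypothetical helper: k and the hash function are arbitrary choices.
    """
    words = tokenize(text)
    grams = (" ".join(words[i:i + k]) for i in range(len(words) - k + 1))
    return {hashlib.sha256(g.encode("utf-8")).hexdigest() for g in grams}


def jaccard(a, b):
    """Jaccard similarity between two fingerprint sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

A high Jaccard score between two passages suggests shared wording, but the threshold at which a pair deserves a closer look would have to be tuned on the actual corpus.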

The students' work during the internship will be divided into three steps:
1- Bibliographic Study: discuss the relevance of the different plagiarism detection techniques to the detection of text reuse in literary masterpieces.
2- Design of a Specific Approach: the second step of this study will consist in designing and implementing a specific approach to the above-mentioned problem. This approach will first be based on substring detection. Then, based on a linguistic model of plagiarism, it will attempt to detect the texts reused by Balzac.
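As an illustration of the substring detection mentioned in step 2, the sketch below finds word sequences shared by two texts using `difflib.SequenceMatcher` from the Python standard library. It is only a starting point under stated assumptions: the function names and the `min_words` threshold are hypothetical, and `get_matching_blocks` only reports matches that appear in the same order in both texts, so reordered passages would require a different method.

```python
import re
from difflib import SequenceMatcher


def words(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def shared_passages(source, target, min_words=4):
    """Return word sequences of at least min_words found in both texts.

    Hypothetical helper: min_words is an arbitrary threshold meant to
    filter out short, accidental matches.
    """
    a, b = words(source), words(target)
    # autojunk=False disables the heuristic that ignores frequent tokens.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        " ".join(a[m.a:m.a + m.size])
        for m in matcher.get_matching_blocks()
        if m.size >= min_words
    ]
```

For instance, `shared_passages(text_a, text_b)` could be called on two passages of the corpus; matching on word tokens rather than characters makes the comparison insensitive to punctuation and case, which may or may not be desirable for literary texts.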

Remark: a scholarship may be available for at least one student working on this topic.

Practical Details

Plagiarism (last edited 2011-11-08 06:09:58 by GustaveGanascia)