Stylometry – Following up on our previous blog post The Haunting Presence of Ghostwriting in Academia, we want to highlight existing technical methods that help identify ghostwriting. Even though a comprehensive, viable solution is still under construction, we can take a look at a couple of scientific approaches and software solutions.
As mentioned in the previous blog post, we at PlagScan despise ghostwriting in academia, which is likely to produce inattentive students who pay for their grades. If too many students rely on ghostwriting services, the quality of academic degrees will decline significantly, and values such as critical thinking, individual effort, and independent achievement will vanish.
A manual approach to detecting ghostwriting draws on forensic linguistics: an expert evaluates whether a text matches the author’s previous work. Depending on the circumstances, this can be easy or very difficult. Experts can do amazing work here, and software currently cannot match their performance. This expertise is in particular demand in legal cases (e.g. see http://www.forensiclinguistics.net/consult.html).
The general theoretical approach is called stylometry, or authorship attribution. In a technical scenario, an algorithm analyzes a person’s writing style and determines whether he or she actually wrote a given text. Different stylistic factors play a role in determining authorship. For instance, a student of advanced academic standing probably uses more sophisticated language than a high school student. Beyond education level, differences in style can point to differences in age, gender, location, etc. There are two ways this can help in identifying ghostwriting.
- You have multiple documents from one person and check their homogeneity. If they all carry the same stylistic traces, it is very likely that they were all written by the same person.
- You only have one document from a person, but you have a profile of that person (age, gender, location, etc.). E.g. if a high schooler submits a paper containing vocabulary or lengthy sentences typical of a senior college student, the algorithm would generate an alert because the style deviates from the profile.
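To make the first scenario more concrete, here is a minimal Python sketch of a homogeneity check. It is only an illustration under our own assumptions, not how any existing tool works: the short `FUNCTION_WORDS` list, the helper names `style_vector` and `homogeneity`, and the choice of cosine similarity are all hypothetical simplifications. Real stylometric systems use far richer feature sets.

```python
import math
import re
from collections import Counter

# Small illustrative list of English function words.
# This is an assumption for the sketch, not a standard stylometric set.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is",
                  "it", "for", "with", "as", "but", "on", "not"]


def style_vector(text):
    """Return the relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def homogeneity(documents):
    """Return the lowest pairwise similarity among the documents.

    A value close to 1.0 suggests a consistent style across all documents;
    a low value means at least one pair of documents differs noticeably.
    """
    vectors = [style_vector(d) for d in documents]
    similarities = [
        cosine_similarity(vectors[i], vectors[j])
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
    ]
    return min(similarities) if similarities else 1.0
```

Calling `homogeneity([essay_1, essay_2, essay_3])` on a student's submissions would return a single score. Where to draw the line between "same author" and "suspicious" is exactly the hard part, and it would have to be calibrated on real data.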
Software such as JGAAP (Java Graphical Authorship Attribution Program), AICBT, or Signature provides options to compare the stylometry of two texts. This can be helpful, but it is not yet enough for a more extensive investigation of authorship.
Stylometric analysis would be a groundbreaking add-on for any plagiarism detection software. However, implementing such a technology is difficult and always requires very specific input data. For scenario two above, a gold standard for a specific profile could help indicate how well a style fits that profile. In the optimal case of scenario one, however, the software would need to create a “style standard” for each individual person and generate an alert as soon as that person’s text differs from his or her previous style.
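As a rough idea of what such a “style standard” could look like, the following Python sketch stores the mean and standard deviation of a few simple features from a person's earlier texts and flags a new text whose features lie far outside that range. The three features, the function names, and the threshold of 2.0 standard deviations are our own assumptions for illustration, not an established method.

```python
import re
import statistics


def features(text):
    """Compute three simple style features for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    n_words = len(words) or 1  # avoid division by zero on empty input
    return {
        "avg_sentence_length": len(words) / (len(sentences) or 1),
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "type_token_ratio": len({w.lower() for w in words}) / n_words,
    }


def build_style_standard(samples):
    """Return {feature: (mean, standard deviation)} over earlier samples."""
    if len(samples) < 2:
        raise ValueError("at least two work samples are needed")
    rows = [features(s) for s in samples]
    standard = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        standard[name] = (statistics.mean(values), statistics.stdev(values))
    return standard


def deviations(standard, text, threshold=2.0):
    """Return features of the text that deviate strongly from the standard.

    The result maps each flagged feature to its z-score. Features with
    zero spread in the samples are skipped, since no z-score can be computed.
    """
    observed = features(text)
    flagged = {}
    for name, (mean, spread) in standard.items():
        if spread == 0:
            continue
        z = (observed[name] - mean) / spread
        if abs(z) > threshold:
            flagged[name] = z
    return flagged
```

A non-empty result from `deviations(standard, new_text)` would be the “alert”. With only two or three samples per person, the standard deviation is a very shaky estimate, which illustrates why this approach needs so much input data.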
Applying this hurdle to the example of a university: collecting at least two work samples from each student would take a lot of effort, with no guarantee that students actually submitted their own work. One option would be to require a few in-class writing assignments. These would have to be woven into the curriculum or may require sacrificing a significant amount of class time.
Even though a ghostwriting detection algorithm is very unlikely to be implemented in the near future, we will keep observing current technologies and work on stylometry ourselves, so that we can upgrade PlagScan as soon as science has taken the necessary leap and possible solutions become feasible.