The Socio-Linguistic Variant and What Author Metrics Doesn’t Want to Investigate
Real papers against real ghostwriters
PlagScan conducted an experiment with a corpus of student writing collected from a single classroom outside the US and UK. The selected papers were all responses to the same prompt. Into this set of documents we introduced a professionally ghostwritten document, composed by ghostwriter R in response to the same prompt. R was given the same academic articles to read before writing the response paper and the same writing specifications as the students. We then uploaded the student papers and R’s document to PlagScan and analyzed them with Author Metrics.
Putting the theoretical discussion to the test
According to earlier theoretical discussion and experimental results in stylometry, genre and topic interference can make it harder to differentiate between authors. This is because the shared word and phrase usage that accompanies writing on the same topic, as well as the writing conventions of particular genres, make it difficult to identify a strong authorial signature for author verification. In our experiment we nevertheless observed that R was a high outlier on practically every measure that Author Metrics looks at. Because we controlled for topic and genre, what varied in this experiment was the educational and socialization background of the professional ghostwriter. As a control, we removed R’s assignment from the set of student documents and ran Author Metrics again; this time no student submissions were flagged red.
The role of ‘Lexical Originality’
To illustrate the system, we will look at one measure in Author Metrics. Lexical Originality compares the number of unique words in a document with that of the other documents uploaded to PlagScan and selected for comparison. We expect a group of students in the same classroom, given the same educational materials and writing on the same topic, to show low variance with regard to Lexical Originality. Without a ghostwritten document in the set, this is indeed what we observed. With the ghostwritten document included, we noticed that R used very different language to express their thoughts. Even though R was given the same resources, R’s diction is one giveaway that R is either exceptionally advanced or not an academic peer of the other students. Of course, this measure alone is not enough, so Author Metrics looks at an ensemble of measures. In doing so, we noted that R’s writing looks quantitatively very different from that of the other students, which raised a red flag in Author Metrics and called for further inspection.
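To make the idea concrete, here is a minimal sketch of how a unique-word measure could be compared across a peer group. It is not the Author Metrics implementation, whose formula is not described here: the function names, the leave-one-out scoring, the threshold of 3.0, and the word counts in the usage note are all assumptions chosen for illustration.

```python
import re
from statistics import mean, stdev


def unique_word_count(text):
    """Count distinct lowercase word tokens in a text (hypothetical helper)."""
    return len(set(re.findall(r"[a-z']+", text.lower())))


def leave_one_out_scores(counts):
    """Score each document against the mean and spread of the OTHER documents.

    Assumption: this leave-one-out z-score stands in for whatever comparison
    Author Metrics actually performs. Needs at least three documents, because
    the sample standard deviation requires two or more peers.
    """
    scores = []
    for i, value in enumerate(counts):
        peers = counts[:i] + counts[i + 1:]
        spread = stdev(peers)
        # If all peers are identical the spread is zero; report 0.0 instead of dividing.
        scores.append((value - mean(peers)) / spread if spread else 0.0)
    return scores


def flag_outliers(counts, threshold=3.0):
    """Return the indices of documents whose score exceeds the assumed threshold."""
    return [i for i, score in enumerate(leave_one_out_scores(counts))
            if abs(score) > threshold]
```

With made-up counts such as `[100, 104, 98, 102, 101, 160]`, `flag_outliers` returns only the index of the last document. Dropping that document and rerunning on the remaining five returns an empty list, which mirrors the shape of the control run described above. Each document is scored against its peers only, so that a single extreme value does not inflate the spread it is judged against. A real system would combine several such measures rather than rely on one.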
Limitations to the Peer Group Similarity Hypothesis
We have performed similar experiments that yielded comparable results, and this research at PlagScan supports the Peer Group Similarity Hypothesis. The hypothesis states that the classroom is a speech and writing community in which the language behavior observed in peers’ academic writing shows only normal variation. We are, however, fully aware that further experiments which account for linguistically diverse education (multicultural classrooms) may refine or refute our hypothesis. For example, cultural or socioeconomic differences within the classroom may influence student performance across various metrics. Scores across linguistic metrics aside, these social differences call for new pedagogical approaches that recognize that classrooms are becoming less homogeneous.
What Author Metrics will not cover
Multicultural classrooms already challenge existing approaches to teaching anti-plagiarism effectively. Meeting this challenge requires instructors to be conscious of the various backgrounds at play in the classroom and to incorporate them into the learning experience. No doubt, the multicultural perspective is the future of education, and it is one that PlagScan supports as a safeguard of Academic Integrity and a thought leader in Educational Technology. However, research that moves away from ensuring integrity and improving the learning experience, and towards identifying different groups of students in the classroom in order to reveal their personal characteristics, may drift into Author Profiling, which is something that PlagScan does not plan to do.
We would love to hear your opinion on the Peer Group Similarity Hypothesis! Either comment below or reach out to us through firstname.lastname@example.org
In case you’ve missed part I, find it here: