Updated Quotation Detection

Today we want to introduce an update to one of PlagScan’s key features:

The automated detection of quoted text passages in user documents.

It is important to examine quotations differently from the rest of the text. Quotations enable the writer to use other people’s words and ideas without plagiarizing. When writers quote passages correctly, they can draw conclusions and create something new, based on the existing work. With PlagScan, we want to support this important skill in students, scientists, and writers.

PlagScan uses color codes to let users know whether a match is a quotation or potential plagiarism. Our software highlights quotations in green and potential plagiarism in red. Quotations that are not found in any source are not highlighted. PlagScan also provides a link to the source of the passage, which helps the user decide whether the matched quotation should be considered plagiarism or a correct citation.

To develop an algorithm that identifies quotations automatically, one needs to know how different languages mark them. In German, a standard quotation is enclosed in an opening („ or ‚) and a closing (“ or ‘) quotation mark. French text usually uses angle quotation marks, also known as guillemets: («…») or (‹…›). Spanish writers put either double angle quotation marks («…») or inverted commas (“…”) around quotations. English documents use single (‘…’) or double (“…”) quotation marks.
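To make the overlap between these conventions concrete, here is a minimal sketch in Python that stores the pairs listed above in a lookup table. The names QUOTE_PAIRS and roles are our own illustrative choices, not part of PlagScan's code, and the table covers only the four languages mentioned here.

```python
# Opening/closing quotation mark pairs per language, taken from the
# paragraph above. Table name and layout are illustrative assumptions.
QUOTE_PAIRS = {
    "de": [("„", "“"), ("‚", "‘")],
    "fr": [("«", "»"), ("‹", "›")],
    "es": [("«", "»"), ("“", "”")],
    "en": [("‘", "’"), ("“", "”")],
}


def roles(char):
    """Return the roles ('open', 'close') a character can play in any listed language."""
    found = set()
    for pairs in QUOTE_PAIRS.values():
        for opener, closer in pairs:
            if char == opener:
                found.add("open")
            if char == closer:
                found.add("close")
    return found
```

Looking up the character “ with roles returns both "open" and "close": it closes a German quotation but opens an English one. This is exactly the kind of ambiguity that a per-language lookup cannot resolve on its own.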

Further variations of quotation marks make it challenging to identify quotes automatically. Quotation marks such as (“…”) and (‘…’) look similar at the beginning and at the end of a quote, and the single quotation mark is also commonly used as an apostrophe.

We recently deployed a language detection mechanism to recognize all sorts of quotation mark variations across different languages. It also correctly identifies the start and end positions of a quote according to the language. However, after monitoring the performance of the system, we noticed that many documents contain text passages in more than one language. Furthermore, many writers do not apply their citation style consistently. We therefore decided to improve our algorithm and develop language-independent heuristics to detect quotations.

Our algorithm processes every symbol in the text that might be a quotation mark. We analyze punctuation marks and other characters that appear at the beginning or end of a section and might be quotation marks. Based on certain characteristics, the algorithm assigns each character a probability that it belongs to a quotation. It also determines whether the character is more likely an opening or a closing quotation mark. Next, the software searches for the best-matching pair of quotation marks and re-evaluates the probability. In the end, the results contain only detected quotations that the user can be sure of.
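The general idea can be illustrated with a small Python sketch. This is not PlagScan's production algorithm: the candidate set, the scores (0.9 and 0.1), the threshold, and the function names score and find_quotes are all assumptions made for illustration. It also pairs each likely opener with the nearest likely closer, which is a much simpler rule than a real "best match" search with re-evaluation.

```python
# Characters that might be quotation marks (illustrative, not exhaustive).
CANDIDATES = set("\"'„“”‚‘’«»‹›")


def score(text, i):
    """Return (p_open, p_close) for the candidate character at index i.

    Purely context-based, so it does not depend on the language.
    The numeric values are made-up placeholders.
    """
    prev = text[i - 1] if i > 0 else " "
    nxt = text[i + 1] if i + 1 < len(text) else " "
    # A mark between two letters or digits is treated as an apostrophe (don't).
    if prev.isalnum() and nxt.isalnum():
        return 0.0, 0.0
    before_is_boundary = prev.isspace() or prev in "([{"
    after_is_boundary = nxt.isspace() or nxt in ".,;:!?)]}"
    p_open = 0.9 if before_is_boundary and not nxt.isspace() else 0.1
    p_close = 0.9 if after_is_boundary and not prev.isspace() else 0.1
    return p_open, p_close


def find_quotes(text, threshold=0.5):
    """Pair each likely opener with the nearest later likely closer."""
    marks = [(i,) + score(text, i) for i, ch in enumerate(text) if ch in CANDIDATES]
    quotes = []
    k = 0
    while k < len(marks):
        start, p_open, _ = marks[k]
        if p_open >= threshold:
            for m in range(k + 1, len(marks)):
                end, _, p_close = marks[m]
                if p_close >= threshold:
                    quotes.append(text[start + 1:end])
                    k = m  # continue scanning after the closing mark
                    break
            # If no closer was found, the opener is simply skipped.
        k += 1
    return quotes
```

Called on a sentence that mixes styles, such as 'She wrote «bonjour» and “hello”.', find_quotes returns the two quoted words, and an opening mark without a closing partner yields no quotation at all. The sketch has obvious limits: a trailing possessive apostrophe (students' work) looks like a closing mark to it, which is one reason a real system needs richer characteristics and a re-evaluation step.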

PlagScan is able to detect quotes within a text written in multiple languages, containing various quotation marks, or even erroneous or mistyped quotation characters. The software differentiates between a quotation mark and an apostrophe, and takes other punctuation marks into account to locate sentence segments. In our tests, the algorithm works reliably: it considers every candidate quotation character and finds the best-matching pair of quotation marks. For instance, if there is an opening quotation mark at the beginning of the text but no closing one, the algorithm will not output the whole text as a quote. We have tested the algorithm in various languages so that our users from all over the world can benefit from this feature.

You can test our new quotation detection system yourself – log into your PlagScan account and upload a document with citations!

Have fun using PlagScan and stay tuned for upcoming updates.

Your PlagScan Developer Team

