Updated Quotation Detection


Today we want to introduce an update to one of PlagScan’s key features: the automated detection of quoted text passages in user documents.

It is important to examine quotations differently from the rest of the text. Quotations enable the writer to use other people’s words and ideas without plagiarizing. When writers quote passages correctly, they can draw conclusions and create something new, based on the existing work. With PlagScan, we want to support this important skill in students, scientists, and writers.

PlagScan uses color codes to show users whether a match is a quotation or potential plagiarism. Our software highlights quotations in green and potential plagiarism in red. Quotations that are not found in any source are not highlighted. PlagScan also provides a link to the source of the passage, which helps the user decide whether a matched quotation should be considered plagiarism or a correct citation.
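The highlighting rules described above can be written down as a small decision function. This is only an illustrative sketch: the function name and its two boolean parameters are hypothetical and are not part of PlagScan's software or API.

```python
from typing import Optional


def highlight_color(found_in_source: bool, is_quotation: bool) -> Optional[str]:
    """Pick a highlight color for a passage (hypothetical helper).

    found_in_source: the passage matched text in some source.
    is_quotation:    the passage was detected as a quotation.
    """
    if not found_in_source:
        # Passages without a source match are not highlighted,
        # whether or not they are quoted.
        return None
    # A matched passage is green when quoted, red otherwise.
    return "green" if is_quotation else "red"
```

A quoted passage with a source match would come out green, an unquoted match red, and a quotation that appears in no source would get no highlight at all.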

To develop an algorithm that identifies quotes automatically, one needs to know how different languages mark them. In German, a standard quotation is marked with an opening („ or ‚) and closing (“ or ‘) quotation mark. French text usually uses angle quotation marks (guillemets): («…») or (‹…›). Spanish writers put double angle quotation marks («…») or inverted commas (“…”) around quotations. English documents use single (‘…’) and double (“…”) quotation marks.
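To make the language-specific conventions concrete, here is a minimal sketch of a lookup table with the pairs listed above, plus a naive search that relies on it. The names `QUOTE_PAIRS` and `find_quotes` and the language codes are assumptions made for this example; this is not PlagScan's code, and it only works when the language is known and the writer uses the marks consistently.

```python
from typing import Dict, List, Tuple

# Opening/closing quotation mark pairs per language, as listed in the text.
# The two-letter language codes are an assumption of this sketch.
QUOTE_PAIRS: Dict[str, List[Tuple[str, str]]] = {
    "de": [("„", "“"), ("‚", "‘")],
    "fr": [("«", "»"), ("‹", "›")],
    "es": [("«", "»"), ("“", "”")],
    "en": [("“", "”"), ("‘", "’")],
}


def find_quotes(text: str, lang: str) -> List[str]:
    """Return passages enclosed by one of the known pairs for `lang`."""
    found = []
    for opener, closer in QUOTE_PAIRS[lang]:
        start = text.find(opener)
        while start != -1:
            end = text.find(closer, start + 1)
            if end == -1:
                break  # opener without a closer: stop looking for this pair
            found.append(text[start + 1:end])
            start = text.find(opener, end + 1)
    return found
```

For example, `find_quotes("Er sagte: „Hallo Welt“.", "de")` returns `["Hallo Welt"]`. A table-driven search like this breaks down as soon as a document mixes languages or quotation styles.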

Further variations of quotation marks make it challenging to identify quotes automatically. Marks such as (“…”) and (‘…’) look similar at the beginning and at the end of a quote, and the single mark (’) is also commonly used as an apostrophe.
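One simple way to tell an apostrophe from a quotation mark is to look at the neighboring characters. The sketch below assumes that a mark with a letter directly on both sides sits inside a word. This heuristic and the name `is_apostrophe` are illustrative assumptions, not a description of PlagScan's rules.

```python
def is_apostrophe(text: str, i: int) -> bool:
    """Guess whether the character at index i is an apostrophe.

    Assumed heuristic: a single mark with a letter directly before
    and after it is part of a word (as in "don’t").
    """
    if text[i] not in "'’":
        return False
    before = text[i - 1] if i > 0 else " "
    after = text[i + 1] if i + 1 < len(text) else " "
    return before.isalpha() and after.isalpha()
```

In the sentence `He said ‘don’t go’ twice.` only the mark inside `don’t` is classified as an apostrophe. The heuristic misses trailing apostrophes, such as the plural possessive in `students’ work`, which is one reason a single rule is not enough.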

We recently deployed a language detection mechanism to recognize all sorts of quotation mark variations across different languages. It also correctly identifies the start and end positions of a quote according to the language. However, after monitoring the performance of the system, we noticed that many documents contain text passages in more than one language. Furthermore, many writers are not consistent in their citation style. We therefore decided to improve our algorithm and develop language-independent heuristics to detect quotations.

Our algorithm processes every symbol in the text that might be a quotation mark. We analyze punctuation signs and other characters that appear at the beginning or end of a section and might indicate a possible quotation mark. Based on certain characteristics, the algorithm assigns each character a probability of belonging to a quotation. It also determines whether the character is more likely to be an opening or a closing quotation mark. Next, the software searches for the best-matching pair of quotation marks and re-evaluates the probability. In the end, the results contain the detected quotations that the user can rely on.
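The general idea can be sketched in a few lines. The code below is a simplified illustration under our own assumptions, not PlagScan's actual algorithm: the character set, the scores, the threshold, and the names `score` and `detect_quotes` are all invented for this example, and the re-evaluation step is left out. Each candidate mark gets an opening and a closing score from its neighbors, and every likely closing mark is paired with the nearest preceding likely opening mark, regardless of which kind of quotation character was typed.

```python
from typing import List, Tuple

# Characters that might be quotation marks (assumed set for this sketch).
QUOTE_CHARS = set("\"'„“”‚‘’«»‹›")


def score(text: str, i: int) -> Tuple[float, float]:
    """Return (opening, closing) scores in [0, 1] for the mark at index i."""
    before = text[i - 1] if i > 0 else " "
    after = text[i + 1] if i + 1 < len(text) else " "
    if before.isalnum() and after.isalnum():
        return 0.0, 0.0  # inside a word: most likely an apostrophe
    opening = 0.0
    closing = 0.0
    if before.isspace() or before in "([":
        opening += 0.5
    if after.isalnum():
        opening += 0.5
    if before.isalnum() or before in ".,!?":
        closing += 0.5
    if after.isspace() or after in ".,;:!?)]":
        closing += 0.5
    return opening, closing


def detect_quotes(text: str, threshold: float = 0.75) -> List[str]:
    """Pair each likely closer with the nearest preceding likely opener."""
    quotes = []
    last_open = None
    for i, ch in enumerate(text):
        if ch not in QUOTE_CHARS:
            continue
        opening, closing = score(text, i)
        if opening >= threshold:
            # A newer opener replaces an older one that never got closed.
            last_open = i
        elif closing >= threshold and last_open is not None:
            quotes.append(text[last_open + 1:i])
            last_open = None
    return quotes
```

Because the pairing depends on context rather than on the language, a mixed sentence such as `She wrote „Zeit ist Geld“ and «c’est la vie».` yields both quotations, and the apostrophe in `c’est` is ignored. An opening mark that never finds a partner is simply dropped instead of swallowing the rest of the text. Nested quotations are not handled in this sketch.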

PlagScan is able to detect quotes within a text written in multiple languages, containing various quotation marks, or even erroneous or mistyped quotation characters. The software differentiates between a quotation mark and an apostrophe, and takes other punctuation signs into account to locate sentence segments. The algorithm is designed to work reliably: it considers every candidate quotation character and looks for the best-matching pair of quotation marks. For instance, if there is an opening quotation mark at the beginning of the text but no closing one, the algorithm will not output the whole text as a quote. We have tested the algorithm in various languages so that our users from all over the world can benefit from this feature.

You can test our new quotation detection system yourself – log into your PlagScan account and upload a document with citations!

Have fun using PlagScan and stay tuned for upcoming updates.

Your PlagScan Developer Team


6 thoughts on “Updated Quotation Detection”

    • Cornelia Jurka Post author

      Hi irfan,

      Good question! I have a simple answer:

      Blue color shows where text possibly has been altered.

      Imagine someone taking a whole line of text from Wikipedia.
      The whole line is then shown in red. If someone replaced single words (e.g. to hide plagiarism), they would show up in blue in between the red text.

      Hope that makes it clearer. If you have further questions, don’t hesitate to contact us.

  • Mark

    How does PlagScan handle quotations where the writer does not use quotation marks but instead uses keywords like “according to” or “says”?

    example:
    The end justifies the means according to Machiavelli.
    or
    Einstein says that Imagination is more important than knowledge.

    • Cornelia Jurka Post author

      Dear Mark,

      Thanks for asking. In your settings, click on ‘Show advanced settings’. There you can exclude complete documents (e.g. with famous phrases by Einstein) from the plagiarism analysis with the Whitelist – URL + option. Read more about this under ‘Feature 4’ in this blog post.
      Otherwise, after the analysis is complete, you can mark false positives as quotations, reducing the PlagLevel.

      Let me know if you have further questions!

    • Cornelia Jurka Post author

      Dear Priya,

      Red means PlagScan has found an exact match in another source.
      Blue indicates any rewritten text (for example, within a red mark).
      Green means the text has been quoted.

      In a report, this information is also briefly displayed in a legend. You will find it under the list of matches.

      Hope that helps!