ChatGPT and AI Detection Tools: The Challenge of False Positives

February 5, 2024
Anna E. Bullock, Susan Stone and Kristina Supler

AI tools such as ChatGPT offer an enticing promise to ease many challenges of the writing process. Depending on the tool, AI can create drafts and can automate and smooth the revision process. It can check spelling, serve as a thesaurus, and summarize, clarify, and simplify complex concepts. However, many schools do not allow AI to be used as a writing tool at all. Students presented with the new world of AI tools face the challenge of navigating its many potential benefits while complying with their institutions’ various AI policies.

Students are tasked with adhering to each of their professors’ unique syllabi, student codes of conduct, and academic integrity policies governing the use of AI tools in different contexts. Some policies ban AI tools outright, while others specify the contexts in which AI may and may not be used. Plagiarism detectors such as Turnitin, which have been around for over a decade, are increasingly equipped with so-called “AI detectors” and are being added to the automated review of student assignments prior to grading. These “detectors” claim to analyze a piece of writing and spit out a percentage indicating how much of it is AI-generated versus human-written text, but how accurate are they?

Turnitin, one of the more common plagiarism detection tools used in academic settings, has recently acknowledged the potential for false positives from its AI writing detection tool. This revelation has stirred concerns about the reliability of Turnitin’s and other AI detectors’ algorithms, prompting educational institutions to reconsider their reliance on this technology when subjecting students to academic integrity and plagiarism allegations. Beyond reliability concerns, recent research indicates that students for whom English is a second language are flagged for AI plagiarism at higher rates than native English speakers. These issues have shaped higher education institutions’ AI policies and their processes for addressing potential student conduct violations involving AI tools.

AI Detection Tools and “False Positives”

Turnitin has acknowledged the potential for false positives in connection with its AI writing detection tool but has not divulged precisely how the tool operates. However, two key aspects of AI detection have been revealed by Annie Chechitelli, Turnitin’s Chief Product Officer. Chechitelli wrote that AI detection tools are more reliable when presented with longer written submissions; Turnitin recently increased its minimum word count for AI detection from 150 to 300 words. In addition, false positives are more likely when AI-generated language is indicated to comprise less than 20% of a submission. In other words, if AI is used sparingly, Turnitin is more likely to generate a false positive and implicate a student in a potential academic integrity investigation.
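
To put those two figures in context, here is a minimal, hypothetical sketch of how an institution might gate a detector’s output before acting on it. The function name and structure are our own illustration, not Turnitin’s actual logic; only the 300-word minimum and the 20% threshold come from the figures discussed above.

```python
# Hypothetical triage of an AI-detection score. The thresholds reflect the
# publicly discussed figures; everything else here is an assumption.

def triage_ai_score(word_count: int, ai_share: float) -> str:
    """Classify a detector result as unscored, low-confidence, or review-worthy."""
    if word_count < 300:
        return "unscored"          # below Turnitin's stated minimum length
    if ai_share < 0.20:
        return "low-confidence"    # small AI shares are the most false-positive-prone
    return "needs-human-review"    # even a high score is information, not proof

print(triage_ai_score(word_count=250, ai_share=0.35))  # unscored
print(triage_ai_score(word_count=800, ai_share=0.10))  # low-confidence
print(triage_ai_score(word_count=800, ai_share=0.45))  # needs-human-review
```

Notably, even in this sketch the highest-confidence outcome is a referral for human review, which mirrors the institutional guidance discussed below.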

In response to the challenges posed by false positives and potential bias in AI detection software, some educational institutions have disabled Turnitin’s AI detection tool since the start of the 2023 academic year. For example, Vanderbilt University decided to “disable Turnitin’s AI detection tool for the foreseeable future[,]” citing questions about how the tool works and the potential for bias against students for whom English is a second language. Bias issues are particularly common in AI technologies, which rely on large banks of human-generated text to work. Where the samples used to build a tool come mostly from one group, in this case native English speakers, non-native English writing may be flagged because of coincidental differences in sentence structure and word choice. The University of Pittsburgh has taken a similar stance, and other universities, including Michigan State University, Northwestern University, and the University of Texas, have also disabled Turnitin’s AI detection tool.

Institutions that Continue to Use AI Detection Are Encouraged to Rely on More than One Indication of AI Misuse

The Center for Teaching Excellence at the University of Kansas has provided guidance on the careful use of AI detectors in combination with an instructor’s knowledge of the student’s writing. An AI detector, according to the guidance, is a “tool [that] provides information, not an indictment” of students’ work. This framing acknowledges the common pitfalls of AI detection tools. The remaining guidance offers students and educators a more human approach to AI detection and encourages a process of working with students to determine whether a student has inappropriately used AI in submitted work:

  • Make comparisons. Does the flagged work differ in style, tone, spelling, flow, complexity, development of argument, or use of sources and citations from the student’s previous work? We often detect potential plagiarism by making those sorts of comparisons, and AI-created work raises suspicion for the same reasons.
  • Try another tool. Submit the work to another AI detector and see whether you get similar results. That won’t provide absolute proof, especially if the detectors are trained on the same language model. It will provide additional information, though.
  • Talk with the student. Students don’t see the scores from the AI detection tool, so meet with them about the work you are questioning and show them the Turnitin data. Explain that the detector suggests use of AI software to create the written work and point out the flagged elements in the writing. Make sure the student understands why that is a problem. If the work is substantially different from the student’s previous work, point out the key differences.
  • Offer a second chance. The use of AI and AI detectors is so new that instructors should consider giving students a chance to redo the work. If you suspect the original was created with AI, you might offer the resubmission at no penalty or for a reduced grade. If it seems absolutely clear that the student did submit AI-generated text and did no original work, give the assignment a zero or a substantial reduction in grade.
  • If all else fails … If you are convinced a student has misused artificial intelligence and has refused to change their behavior, you can file an academic misconduct report. Remember, though, that the data you are basing this on has many flaws. It is far better to err on the side of caution than to devote substantial time and emotional energy to an academic misconduct claim that may not hold up.

The guidance acknowledges the potential that the academic misconduct claim may nonetheless “not hold up,” even where an instructor is convinced that a student has inappropriately relied upon AI.

Avoid Academic Integrity Investigations Using Insight into AI Detection Methods

It is difficult to find sources that describe how detection tools identify AI-generated content and distinguish it from human writing. Jack Caulfield of Scribbr has explained that most AI detection models assess perplexity, a measurement of the “unpredictability” of a sequence of language in text. Lower perplexity is considered evidence of AI generation because AI tends to make the most “obvious” or most common language choices compared with human writers. Burstiness, the variation in sentence structure and length, is another factor: AI models tend to produce less varied sentence lengths and structures than typical human writing, so a particularly low burstiness level indicates that a text is likely AI-generated.
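
To make perplexity and burstiness concrete, the following sketch computes rough stand-ins for both. Real detectors score perplexity against a large language model; here a toy unigram model built from the text itself stands in so the example runs on its own, and the function names are illustrative rather than any vendor’s actual code.

```python
import math
import re
from collections import Counter

def perplexity(text: str) -> float:
    """Toy unigram perplexity: the average 'surprise' per word, exponentiated."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words)
    # Average negative log-probability per word under the unigram model.
    log_prob = sum(math.log(counts[w] / total) for w in words)
    return math.exp(-log_prob / total)

def burstiness(text: str) -> float:
    """Standard deviation of sentence lengths; low values mean uniform sentences."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    mean = sum(lengths) / len(lengths)
    return math.sqrt(sum((n - mean) ** 2 for n in lengths) / len(lengths))

sample = ("The cat sat on the mat. The dog sat on the mat. "
          "Afterward, despite the rain, both animals wandered outside together.")
print(f"perplexity = {perplexity(sample):.1f}, burstiness = {burstiness(sample):.1f}")
```

In this toy setup, repetitive word choice drives perplexity down and uniform sentence lengths drive burstiness down, the combination detectors treat as a signature of AI-generated text.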

Both brevity and high-level summary in a student’s submission may lead to improper flags by AI detection tools. Students should be specific and clear in their writing to help avoid any suspicion of improper AI use, and should be mindful of submission length, taking care to add detail when summarizing larger works.

The AI Debate Continues

The disclosure of elevated false positive rates in AI writing detection tools has prompted institutions to reassess their reliance on this new technology. Given concerns about the accuracy of AI detection tools, educators and institutions must strike a balance between preventing plagiarism and avoiding false accusations. As the technology evolves, the debate around the role of AI in academia will continue to rage. KJK attorneys are here for students as they face academic integrity allegations, including allegations of AI misuse and plagiarism, and KJK has successfully defended students suspected of inappropriate reliance on AI in connection with their work.

For more information, or to discuss further, please contact Student & Athlete Defense attorneys Susan Stone (SCS@kjk.com; 216.736.7220), Kristina Supler (KWS@kjk.com; 216.736.7217), or Anna Bullock (AEB@kjk.com; 216.736.7223).