No evidence to support plagiarism allegations against computer science course
Several students on Fizz, an anonymous campus chat app, claimed that computer science professor Arman Cohan copied slides and homework for his course on Natural Language Processing from a similar course at Stanford. However, the News found no evidence to support those claims.
Zoe Berg, Senior Photographer
The News found no evidence to support a student-circulated claim that Professor Arman Cohan copied a Stanford University course’s curriculum for his course on Natural Language Processing.
A post on the anonymous campus chat app “Fizz” claimed that Cohan “copied the entire Stanford CS224N curriculum,” including lectures and homework, without attributing credit. The post received 1,600 upvotes as of Monday, and four follow-up posts from other users also received over 1,000 upvotes.
“No, I did not base my syllabus on Stanford’s cs224n or any other existing courses from other places,” Cohan told the News. “I’d like to emphasize that NLP is a pretty well-established course with several available textbooks, and any overlap in course content with another university stems from the fact that the subject matter we teach is well-established and has a common core of knowledge.”
The News reviewed lecture slides from Stanford’s CS224N course, called Natural Language Processing with Deep Learning, offered in winter 2023 and found no notable similarities between the Stanford materials and Cohan’s. All language and visual material included in Cohan’s slides appeared original.
None of the users who posted plagiarism allegations on Fizz responded to the News’ requests for comment in time for publication.
When approached by the News on Sunday, Professor Christopher Manning, who taught Stanford’s CS224N in winter 2023, compared Cohan’s slides with the Stanford materials.
“Nothing in the lectures particularly seems copied from cs224n,” he wrote. “They seem quite distinct.”
Only one homework assignment has been released so far for Cohan’s course. The assignment contains three parts, the first two of which are distinct from any material in the Stanford course. The third part, which asks students to reimplement a word2vec algorithm — a natural language processing technique published in 2013 that involves obtaining vector representations of words — is similar to the process required in Stanford’s second assignment.
But Manning said he is “not fussed” by any similarities in homework.
“There is some overlap in the assignment, which may be what they’re picking up on,” he wrote. “Part 3 has people implementing word2vec, which is what we have students do in assignment 2.” He explained, however, that both courses are asking students to reimplement the same word2vec algorithm from 2013, so neither was “original.”
Cohan said that the implementation of word2vec is a precursor to understanding neural networks, and is standard for instruction of the subject.
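The word2vec technique both courses assign dates to a widely cited 2013 publication and is standard teaching material. As a rough illustration of what “obtaining vector representations of words” involves, here is a minimal skip-gram sketch trained on a toy corpus; the corpus, hyperparameters and variable names are illustrative assumptions, not taken from either course’s actual assignment:

```python
# Minimal sketch of the skip-gram word2vec idea (2013): learn one vector per
# word so that words appearing in similar contexts end up with similar vectors.
# Toy corpus and hyperparameters are illustrative only.
import numpy as np

corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D, window, lr = len(vocab), 8, 2, 0.05

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, D))   # center-word ("input") vectors
W_out = rng.normal(scale=0.1, size=(V, D))  # context-word ("output") vectors

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cos(a, b):
    # Cosine similarity between two word vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

for epoch in range(200):
    for pos, word in enumerate(corpus):
        c = idx[word]
        for off in range(-window, window + 1):
            j = pos + off
            if off == 0 or j < 0 or j >= len(corpus):
                continue
            o = idx[corpus[j]]
            # Forward pass: predict the context word o from center word c.
            probs = softmax(W_out @ W_in[c])
            # Backward pass: cross-entropy gradient, then SGD updates.
            grad = probs.copy()
            grad[o] -= 1.0
            grad_in = W_out.T @ grad          # compute before W_out changes
            W_out -= lr * np.outer(grad, W_in[c])
            W_in[c] -= lr * grad_in
```

After training, words used in similar contexts (here, “cat” and “dog”) tend to receive nearby vectors, which is the property the assignments ask students to reproduce.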
The three suggested readings for Cohan’s class are also listed in the Stanford syllabus as suggested reference texts. One of them, “Speech and Language Processing,” is co-written by a Stanford professor. At least two of the three readings are also suggested in the syllabi for natural language processing courses offered at Princeton and Berkeley.
A separate post claimed that Yale’s website for the course was a copy of Stanford’s. Each website includes a homepage and sections for the course schedule and assignments. But the Princeton and Berkeley course sites have a similar structure as well.
Another user claimed they had written to their dean about the matter and encouraged other users to do so. Cohan said that the News’s inquiry was the first he had heard of the allegations, and that he was “shocked and surprised” when he heard of the Fizz posts.
Yilun Zhao GRD ’29, a teaching assistant in Cohan’s course, said that he could not identify any significant content overlap between Cohan’s course and Stanford’s version after two hours of review.
“While there are some shared topics, such as discussions on neural networks, I believe this overlap is to be expected in NLP courses, which often cover a core curriculum recognized across the field,” Zhao said.
Zhao noted that if anything, Cohan’s slides bear more resemblance to those of former Yale professor Dragomir Radev, who taught the course from 2017 to 2023. But he said that this “seems reasonable,” as it is common for different faculty members to use similar course material when teaching the same courses in different terms.
Kejian Shi GRD ’24, another teaching assistant for the course, said that Cohan and his teaching assistants spent “days in preparing their first homework assignment,” and that the course was specifically designed for Yale students. He also noted that natural language processing, like physics or biology, has a common sequence of introductory topics that are likely to be taught in any course.
Cohan also highlighted some key differences between his course and Stanford’s CS224N. His course includes topics in the field such as the impact of pretraining data, details of evaluation, retrieval-based language models, self-alignment and interpretability, none of which are covered in Stanford’s version.
His course also covers earlier natural language processing methods, such as Naive Bayes classification, whereas the Stanford course focuses only on methods involving neural networks.
“Many courses in our curriculum are standard, introductory offerings, widely available throughout the university landscape,” Jeffrey Brock, the Dean of Yale’s School of Engineering and Applied Science wrote. “Professor Cohan is a recognized world expert in the area of Natural Language Processing and I am confident that he has chosen carefully the materials for this introductory course, which sits squarely within his field of research expertise.”
The Yale Department of Computer Science was founded in 1969.