Time to Evaluate

Are Course Evals Due for a Rethink?
Photo by Elise Wilcox.

“Our current system is an embarrassment and must be improved.”

These were the words of Charles Bailyn, then chairman of Yale College’s Teaching and Learning Committee, writing in a November 2002 Yale Daily News column. The embarrassing “system” to which Bailyn referred was Yale College’s student course evaluation program, and the News column was his position paper in support of reform.

In 2002 and prior, at the last meeting of each class, a professor would distribute paper surveys (“Yale College Course Improvement Forms,” they were called) asking students for their opinions of the course. Invariably, though, the majority of the students’ responses would be hurried and haphazard: scribbles and snap judgments made by college kids eager to finish their semester. It was all so bad, Bailyn said, that an “otherwise routine” accreditation visit had ended in “pointed” criticism of the student course evaluation program.

Faced with criticism of the evaluations and their ineffectiveness, Bailyn’s committee launched a search for alternatives in the 1997–’98 school year. And by 2002, they had found a solution: an online course evaluation program that would ask students to review their class experiences before they could access their semester grades. The system would give students more time to reflect and assess. Response rates would increase, testing showed, as would the quality of the written comments. As an added bonus, with easier distribution, students would be able to review course evaluations from prior years before finalizing their schedules.

At a faculty meeting in November 2002, the online course evaluation plan passed with unanimous approval by a faculty optimistic and confident about its returns. Evaluation response rates reached 86 percent in the first semester, and Bailyn was back in the News, lauding its early success: “I’m very pleased that students took this as seriously as they did,” he said.

 

We’re now over 10 years removed from that faculty meeting, and though we are still using that same online course evaluation system, professors are no longer optimistic or confident about the returns. Because students are still required to face the forms before viewing their grades, the response rate remains as high as anyone could have promised (95 percent this past fall), but professors have found that students are actually writing less online than they did on the old forms.

“Frankly, I wish that students took a little more time with the written comments,” history professor John Gaddis said. “It seems like part of the focus of these online evaluations is numerical — having students rate the courses as ‘Excellent’ or ‘Very Good’ — but I don’t really have any use for those numbers. In the older system, if you wanted to make a comment, there was really no option but to write out your thoughts or suggestions; now, when you read the written comments, it just says, ‘Best course ever!’ or ‘Awesome!’”

Professor Holly Rushmeier, chair of the Department of Computer Science, goes so far as to say that, in her experience, “the quantitative elements actually distract from the comments.”

And just how brief are these student comments? Of nine students interviewed for this article, just one reported taking more than 10 minutes to complete the eight-question surveys. Five reported spending two minutes or less on each.

A random sampling of 250 student evaluations suggests most Yalies share a similar approach. Eight of the nine interviewed students reported spending the most time on the question, “How would you summarize [course name] for a fellow student? Would you recommend [course name] to another student? Why or why not?” But of the 250 sampled responses to that question, fewer than one-third contained more than 50 words, and just 19 contained more than 100 words.

It’s no surprise, then, that many professors no longer agree with Bailyn’s 2003 pleasure in “how seriously” the students took the change. As English professor Lawrence Manley said, “Students just don’t put enough time into the forms for them to be helpful.”

 

It may be that the course suppliers — the professors — aren’t finding the online evaluations informative, but, as it turns out, the consumers — Yale students — are. All nine students interviewed characterized the evaluations as at least somewhat helpful to them, and all reported consulting evaluations when making decisions during shopping period.

But does student satisfaction with the course evaluations mean that they’re working? In other words, if students are happily using the ratings and comments to craft ideal schedules — a feature that wasn’t even possible when paper evals were submitted to professors in manila envelopes — is the new online system doing its job, after all?

According to Gabriel Olszewski, the University registrar, to answer that question means to clarify the intended audience of the evaluations. The Registrar’s Office, Olszewski says, administers the system and distributes the evaluations, but doesn’t tell students who’s reading them.

“The students that I’ve talked to,” Olszewski said, “have their own interests at heart — to give information that would help other students make better decisions about courses. That might be a very different perspective on the evaluations than a faculty member has, or a very different perspective than a department has.” And maybe it is this difference, he suggests, that lies behind some of the problems with the current evaluation system.

Dean of Yale College Mary Miller, for one, acknowledges this idea, but maintains that a primary service of course evaluations is still to teachers, and to the placement of teaching at the center of the undergraduate experience.

“The idea of the course evaluations as they were [10 years ago] was to have students give their feedback on what we could do to improve the courses — what the best parts were, what the worst parts were, other ideas,” Miller said. “It’s become less explicit with the online evaluations … [but] the deans and departments are all activists for the importance of our undergraduate teaching, and for everyone being a part of that.”

Bailyn expressed something similar in an interview with the News in 2006, stating that the most important role of evaluations was to give students a way to communicate directly with their instructors.

But has this basic goal of course evaluations — so obviously the purpose of the old paper forms — been forgotten by the students?

Professor Murray Biggs, who has been critical of the system since its introduction, thinks so: “The course evaluations are now really for the benefit — or not — of the students, and it’s the students who should comment on their value.”

Maybe the words of one such student, Tess McCann ’15, best capture the disconnect: “Does anyone else — other than the students — read the course evals? Seriously — I don’t even know.”

 

For teachers and departments, the course evaluation system is supposed to be used in service of improving students’ experiences in courses. One way to gauge the success of the system, then, is to look at whether courses are actually improving from year to year, as measured by students’ evaluations.

Course evaluations ask students to report their overall assessment of a course on a five-point scale (where 1=poor, 2=below average, 3=good, 4=very good, and 5=excellent). With a large enough sample size, improvement of a course can be tracked by its change in average score over successive years. This is an unscientific process, but it can give a rough idea of whether or not a particular course is, in the students’ eyes, getting better each time it’s offered.

Evidence suggests that most courses are not. Of a random sampling of 30 courses, none had average student ratings improve by more than 3 percent between the 2009 and 2011 offerings of the course. This isn’t conclusive of anything, but it does suggest that student opinions of classes aren’t improving with repeated offerings of the course — at least not on a university-wide scale.

These 30 courses are just a sampling, but work done by brothers Harry Yu ’14 and Peter Xu ’14 can give us a better macro-picture. Yu and Xu, developers of a new statistics-based “blue-booking” site called YalePlus Bluebook, have analyzed the totality of the available evaluations. Their data shows that the average assessment rating of Yale courses has shown no trend over the past three years — increasing and decreasing only in negligible amounts. Not only are repeat courses not improving from year to year; the quality of the curriculum, as a whole, isn’t either.

One partial explanation would seem to come from a 2007 letter from Bailyn to the News. He argued that many introductory courses would “receive bad evaluations despite outstanding teaching performances.” But in fact, evaluations reveal that the same content can be met with wildly different reactions when taught by two different professors. If a teacher gets better, a course can, with the same content, get higher student marks. But, with students providing limited specific feedback and few suggestions on their evaluations, courses aren’t measurably improving.

 

Looking to get the specific feedback that the online system isn’t giving them, some professors are turning to alternative forms of course evaluation.

Physics professors Richard Casten and Sidney Cahn have developed one innovative approach. They don’t just rely on the scarce suggestions from the online evaluations: rather, they supplement them with conversational evaluations at the mid-term.

“On the forms, lots of students write too little,” Casten said. “More helpful for us is what we do about halfway through the semester, when we ask for two or three students from each section of the lab to meet with us after class and give us their impressions of the course — of what’s good, what’s bad, what can be improved.”

Casten says that he tells the students that their feedback should be candid, and that nothing they say will count against their grade in the course. In fact, he tells them, they should focus more on what’s gone wrong than what’s gone right. Students make “tons of suggestions,” he says, such as “more demos, shorter labs and a better idea of what might be put on the exam.” After hearing their students’ suggestions at the first of these meetings, Casten and Cahn decided to implement many of them.

The result? The numerical student assessments for their physics labs have improved by almost 10 percent, more than double the improvement of any of the 30 courses sampled in the random study.

Paul Wasserman ’14, too, has experience with classes that have had alternative forms of student course evaluation. Wasserman recalls feeling in one class that the professor “wasn’t doing enough to draw upon the readings.” So when, in the middle of the course, that professor asked the students what could be improved, Wasserman made a concrete, implementable suggestion. “For the rest of the semester, the class drew more on the readings: it was a change that really helped — made it feel like it was worthwhile to read the books.” It’s unclear to Wasserman, though, whether he would have made that suggestion on a course evaluation form: “I probably spend two minutes — maybe three — on one of those evaluations,” he said.

But are there also ways that the course evaluations could be improved even within the framework of our current online model?

One proposal would be the elimination of what Olszewski calls “grade-shielding,” the policy of hiding grades until students have either completed or specifically “declined to complete” the evaluations.

Grade-shielding has its disadvantages — namely that, as Manley notes, “most students are just eager to see their grades, and so don’t spend much time individualizing their responses or being specific.” But there are also ostensible benefits of grade-shielding: the policy, Olszewski says, was conceived of as a way to raise low response rates. And on that metric, it’s done its job: response rates have stayed consistently high.

Is that wholly a good thing, though? The concerns about response quality have been detailed above; they bear out the predictions of Biggs, whose 2002 letter to the News noted that, with fewer returns, “the students who respond are those with something to say and wanting to say it.”

And, in some cases, the quantity of responses can be counterproductive. “For the big lecture courses, there are so many responses that it’s hard to read them all,” Gaddis said.

Another idea might be to amend or add to the questions that the current forms are asking. At Stanford University, course evaluations ask students to rate their teachers on a scale of 1 to 5 on dozens of very specific criteria. Then, when teachers receive their evaluations, they can see just where they rank in categories such as “setting clear objectives for the course” and “explaining clearly how students would be evaluated.” After all the data is mined and forwarded to faculty, Stanford’s Center for Teaching and Learning releases a bulletin offering advice for teachers looking to implement teaching changes in response to their evaluations.

Stanford’s system doesn’t require longer comments or more effort on the students’ end. Instead, it focuses on setting out smart, numerically scored metrics that are more useful than the reductive “overall assessment of the course.” Yale’s questions, by comparison, seem vague and underdeveloped.

After 10 years, we can say that our current system isn’t “an embarrassment,” as Charles Bailyn called the old one: students are using the online course evaluations to their own benefit, aggregating reviews to help with course selection. But the written comments are still hurried, still haphazard: the snap judgments, once made by college kids eager to finish their semester, are now made by college kids eager to see their grades. If we want feedback that helps teachers as much as students, then Yale’s course evaluation system may still need to “be improved.”
