TEACHING EVALUATIONS: BIASED BEYOND MEASURE

April 8, 2016
Philip B. Stark, Associate Dean for the Division of Mathematical and Physical Sciences, Discusses How Student Evaluations of Teaching Are Biased Against Female Instructors
BERKELEY, CA, April 8, 2016 – Student evaluations of teaching (SET) are widely used in academic personnel decisions as a measure of teaching effectiveness. On April 11th at Berkeley, Philip B. Stark will present findings from joint work with graduate student Kellie Ottoboni and research fellow Anne Boring of OFCE-Sciences Po indicating that SET are biased against female instructors by an amount that is large and statistically significant. The gender bias can be large enough to cause more-effective instructors to receive lower SET than less-effective instructors. Stark will argue that relying on SET in employment decisions can work against women’s career advancement in academia.
The work consisted of in-depth statistical analysis of data from two studies, one at a French university and one at a U.S. university, and revealed that SET are more sensitive to students’ gender bias and grade expectations than to teaching effectiveness. In the French data, male students tend to rate male instructors higher than they rate female instructors; in the U.S. data, female students tend to rate (perceived) male instructors higher than they rate female instructors.
The findings are based on two datasets: 23,001 SET of 379 instructors by 4,423 students in six mandatory first-year courses in a five-year natural experiment at a French university, and 43 SET for four sections of an online course in a randomized, controlled, blind experiment at a U.S. university.
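The paper’s statistical analysis relied on nonparametric permutation tests, which ask whether the observed gap in ratings between instructor groups is larger than gaps that arise by chance when group labels are randomly reshuffled. A minimal sketch of such a two-sample permutation test, using hypothetical ratings on a 1–5 scale rather than the study’s actual data:

```python
import random

def permutation_test(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sample permutation test for a difference in mean ratings.

    Returns the observed difference (mean_a - mean_b) and an
    approximate two-sided p-value.
    """
    rng = random.Random(seed)
    observed = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(n_perm):
        # Reshuffle the pooled ratings and recompute the gap under
        # the null hypothesis that group labels are exchangeable.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a)
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (count + 1) / (n_perm + 1)

# Hypothetical ratings (illustrative only, not the study's data).
male_instructor_ratings = [4, 5, 4, 4, 5, 3, 4, 5, 4, 4]
female_instructor_ratings = [3, 4, 3, 4, 3, 3, 4, 2, 3, 4]
diff, p = permutation_test(male_instructor_ratings, female_instructor_ratings)
```

The test makes no distributional assumptions about the ratings, which is one reason permutation methods suit ordinal SET data better than t-tests.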
In his talk, Stark will contend that the bias varies by discipline and by student gender, among other factors (class size, course level, format, physical characteristics of the classroom, and instructor ethnicity), and that it is not possible to adjust for the bias because it depends on so many factors.
“SET appear to measure student satisfaction and grade expectations more than they measure instructor effectiveness. While student satisfaction may contribute to teaching effectiveness, it is not itself teaching effectiveness. Students may be satisfied or dissatisfied with courses for reasons unrelated to learning outcomes and not in the instructor’s control (e.g., the instructor’s gender),” Stark says.
Stark notes that SET could be replaced with peer evaluation or a faculty review of the materials instructors use in their classes, such as syllabi and handouts.
In their paper published in ScienceOpen in January 2016, authors Stark, Boring, and Ottoboni conclude that:
In the US, SET have two primary uses: instructional improvement and personnel decisions, including hiring, firing, and promoting instructors. We recommend caution in the first use, and discontinuing the second use, given the strong student biases that influence SET.

Overall, SET disadvantage female instructors. There is no evidence that this is the exception rather than the rule. Hence, the onus should be on universities that rely on SET for employment decisions to provide convincing affirmative evidence that such reliance does not have disparate impact on women, underrepresented minorities, or other protected groups. Absent such specific evidence, SET should not be used for personnel decisions.
Carol Christ, Director of the Center for Studies in Higher Education, will moderate the discussion.
Sponsored by the Center for Studies in Higher Education and the Social Science Matrix, UC Berkeley. The event takes place on the 8th floor of Barrows Hall on Monday, April 11th, from 4:00 to 5:30 p.m. Reception to follow.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++  
Philip B. Stark is the Associate Dean for the Division of Mathematical and Physical Sciences at UC Berkeley. Stark's research centers on inference (inverse) problems, especially confidence procedures tailored for specific goals. Applications include the Big Bang, causal inference, the U.S. census, climate modeling, earthquake prediction, election auditing, food web models, the geomagnetic field, geriatric hearing loss, information retrieval, Internet content filters, nonparametrics (confidence sets for function and probability density estimates with constraints), risk assessment, the seismic structure of the Sun and Earth, spectroscopy, spectrum estimation, and uncertainty quantification for computational models of complex systems. In 2015, he received the Leamer-Rosenthal Prize for Transparency in Social Science. Stark is a former Chair of the Department of Statistics and a former Director of the Statistical Computing Facility at UC Berkeley.
Carol Christ is the Director of the Center for Studies in Higher Education, UC Berkeley; former President of Smith College; and former Executive Vice Chancellor and Provost of UC Berkeley.
Center for Studies in Higher Education (CSHE) was established in 1956 and was the first research institute in the United States devoted to the study of systems, institutions, and processes of higher education.  The Center’s mission is to produce and support multi-disciplinary scholarly perspectives on strategic issues in higher education, to conduct relevant policy research, to promote the development of a community of scholars and policymakers engaged in policy-oriented discussion, and to serve the public as a resource on higher education.  http://cshe.berkeley.edu