From a sample of nearly 1500 reviewer comment sets on manuscripts in the fields of Ecology and Evolution and Behavioural Medicine, we observed that approximately one in eight comment sets contained unprofessional comments. Previously, Silbiger and Stubler [3] observed that 58% of authors surveyed self-reported having received unprofessional comments in a review at some point in their career. A lack of professional comportment thus appears to have a substantial impact on the experience of peer review. It is difficult to describe peer review as collegial given the observed prevalence of unprofessional comments, and it is hard to imagine that such a high level of demeaning behaviour would be tolerated in a professional workplace without corrective intervention.
The author in case study one received the most unprofessional comments, nearly double that of the next highest case study (Table 1). This elevated rate is largely a product of four clusters of unprofessional comments associated with four manuscripts. Such clusters highlight the role that editors could play in improving professional comportment: when a reviewer made unprofessional comments, we qualitatively observed that their subsequent comment sets often contained similar content. Removing clusters of unprofessional comments would have substantially lowered the incidence of such comments in all case studies; specifically, removing these four clusters would have reduced the rate of unprofessional comments received in case study one to the overall average (12%). Journal-level policies that enable editors to ask reviewers to revise or remove unprofessional comments could therefore help lower their incidence. We appreciate that some journals require editors to forward uncensored comments to authors. In such cases, a note from the editorial board indicating that particular comments do not represent the opinions of the editor or the editorial team would be a welcome addition.
Only 2% of assessed comment sets included an accusation of questionable research practices. Care must be taken not to over-interpret this result, as only manuscripts that were eventually published were assessed. It is possible that some accusations correctly identified misconduct and led to justified rejection, with the manuscript never being published; such papers would not have been captured in our analysis. Nonetheless, accusations of this kind can carry far-reaching ramifications for a researcher's career, and in every instance in our dataset the accusation of questionable research practices was the result of miscommunication or differences of opinion about research methodology. While it is important that genuine concerns about questionable research practices are communicated to editors, reviewers should proceed cautiously.
We employed five criteria to evaluate IIUCs in reviewer comment sets. Overall, two in five comment sets contained at least one IIUC. We observed that 19% of reviews were superficial, providing little useful guidance to the authors. These reviews failed to evaluate strengths and weaknesses and/or provided no details regarding fatal flaws. Such reviews are unlikely to improve a manuscript, and the lack of detail makes it difficult to assess any of the reviewer's claims. A further 22% of case-study comment sets contained inaccurate statements about information clearly presented in the manuscript, such as admonishing an author for not reporting the sample size when it was clearly stated (the proportion of inaccurate statements could not be rated for reviews published on Publons). Comments of this nature may indicate that reviewers did not evaluate the manuscript in detail.
Unsupported arguments from authority (claims made without citations or sufficient detail to evaluate them) appeared in 27% of comment sets. Common forms were vague comments on experimental designs or statistical analyses. These comments often stated that the design or analyses were "wrong" or "inappropriate" for answering the experimental questions, without providing citations or any explanation of why. In the Ecology and Evolution reviewer comment sets, another common expression of this was to state that sampling units and/or data were not independent, without detailing why this was the case or why such dependence was problematic for the study under evaluation. Such comments resulted in manuscripts being rejected for vague, and in some cases arguably incorrect, reasons. We suggest that reviewers explain their criticisms, ideally providing citations to support their position. Where citations are not available, sufficient detail should be provided for authors to evaluate the critique and prepare a reasoned response.
In 19% of comment sets, reviewers stated that critical literature was missing but provided no guidance on what that literature was. Such comments are difficult for authors to address, and the lack of detail makes it difficult to assess the validity of the reviewer's concern. Finally, 14% of comment sets included attacks on common methods supported by a preponderance of evidence. This suggests that reviewers may review outside their areas of expertise, or may not consult the references provided to familiarize themselves with the methods. Comments identified as contradicting well-supported methods did not include instances where reviewers asked for nuanced justification of methods (e.g. "I am wondering why 'x' was used instead of 'y'"); instead, we counted instances where reviewers treated common methods as strikes against the manuscript. For instance, in one Ecology and Evolution case, a reviewer strongly critiqued the use of negative binomial regression to analyze overdispersed count data, an established method of analysis. This case highlights that not all reviewers will have the expertise required to evaluate all statistical analyses. More broadly, reviewers may not always be qualified to comment on every section of a manuscript, a limitation that could be noted in their reviewer comments.
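To illustrate the statistical point behind this example, the minimal sketch below uses simulated data (it is not the case study's analysis, which we did not re-run; the variable names and the statsmodels workflow are our own illustrative choices). It shows the standard reasoning: a Poisson model assumes the variance equals the mean, overdispersion is flagged when the Pearson chi-square per residual degree of freedom is well above 1, and a negative binomial model accommodates the extra variance.

```python
# Illustrative only: simulated overdispersed counts, not data from any case study.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
mu = np.exp(0.5 + 0.8 * x)                  # true conditional mean
k = 2.0                                     # negative binomial "size" parameter
y = rng.negative_binomial(k, k / (k + mu))  # counts with variance mu + mu**2/k > mu

X = sm.add_constant(x)

# Poisson GLM assumes variance == mean; a Pearson chi-square per residual
# degree of freedom far above 1 signals overdispersion.
poisson_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print("Poisson dispersion:", poisson_fit.pearson_chi2 / poisson_fit.df_resid)

# Negative binomial GLM adds a quadratic variance term (alpha = 1/k here),
# the standard remedy for overdispersed count data.
nb_fit = sm.GLM(y, X, family=sm.families.NegativeBinomial(alpha=1.0 / k)).fit()
print("NB dispersion:", nb_fit.pearson_chi2 / nb_fit.df_resid)
```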
The prevalence of unprofessional comments and IIUCs varied by subject area, within case studies, and between Publons comments and case studies. Variation between case studies illustrates that individual experiences with peer review can vary greatly, and compassion should be extended to those for whom the process is more negative. In almost all cases, the incidence of low-quality reviews and abusive comments was higher in case studies than in Publons comments. This difference is unsurprising, given that uploading reviews to Publons is optional and likely prone to selection bias; further, not all reviews are uploaded for a given manuscript. As such, reviewer comments from Publons are not diagnostic for a single manuscript; however, when assessed alongside the case studies, they offer insight into the general nature of reviewer comments. In addition, while all evaluated manuscripts were eventually published, the reviews on Publons came only from comment sets that led to publication in that journal, whereas the case-study comment sets also included reviews of rejected manuscripts that were eventually published elsewhere. For these reasons, differences between Publons and the case studies must be interpreted with caution. Finally, while differences by subject area were noted, caution is warranted when drawing such contrasts given that only two subject areas were evaluated.
Based on our results, we suggest some solutions to improve the experience of peer review. First, reviewers should comment only on the technical merit of the submitted manuscript, never on the author. We posit that it is never appropriate to comment on the gender, sex, age, or race of the author, and a reviewer should never assume that an author is, or is not, a native English speaker: such comments can be offensive and are often incorrect. If editorial issues are identified, they can be pointed out without reference to the personal characteristics of the author. Second, when issues are identified, reviewers must be specific in their criticism and provide references to support their points, and/or enough detail for authors to evaluate and implement suggested changes. As scientists, we should not make claims without supporting them; we maintain that reviewers should be held to the same evidentiary standard as authors and must support their criticisms. Providing citations and/or detail regarding identified issues or missing literature enables editors and authors to assess the validity of the concern, prepare a measured response, or properly implement suggested changes. Third, reviewers should review only articles that they have the time and expertise to review thoroughly; when sections of a manuscript fall outside the reviewer's area of expertise, this should be stated. Our findings also underscore the role of editors in mitigating unprofessional comments: as noted above, when a reviewer made unprofessional comments, their subsequent comment sets often contained similar content. Editors must be vigilant and, where their journal allows, screen such comments immediately. Finally, a variety of tools have been created to assess the quality of peer reviews; see Superchi et al. [12] for a detailed review. Our coding structure offers one such method for evaluating reviewer behaviour.
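As a minimal sketch of how such a coding structure could be represented for reuse, the Python example below encodes the categories described in this study as a simple data structure. All class, field, and identifier names here are our own illustrative inventions, not part of any published assessment tool.

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class IIUC(Enum):
    """The five IIUC criteria described in the text (names are illustrative)."""
    SUPERFICIAL = auto()                 # no strengths/weaknesses, no detail on fatal flaws
    INACCURATE_STATEMENT = auto()        # contradicts information clearly in the manuscript
    UNSUPPORTED_AUTHORITY = auto()       # claims lacking citations or evaluable detail
    VAGUE_MISSING_LITERATURE = auto()    # "key literature missing" with no guidance given
    ATTACKS_ESTABLISHED_METHOD = auto()  # critiques a method supported by strong evidence

@dataclass
class CodedCommentSet:
    """Coded assessment of a single reviewer comment set."""
    comment_set_id: str
    unprofessional: bool = False                  # demeaning or personal comments present
    accuses_questionable_practices: bool = False  # accusation of questionable research practices
    iiucs: set[IIUC] = field(default_factory=set)

# Example usage (hypothetical identifier):
coded = CodedCommentSet("MS-001_reviewer-2", unprofessional=True,
                        iiucs={IIUC.SUPERFICIAL, IIUC.UNSUPPORTED_AUTHORITY})
print(coded.unprofessional, len(coded.iiucs))
```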
Another potential option to improve peer review is wholesale systemic change, with peer review adopting an alternative model. Several alternative models have been suggested, including "as-is" review (the paper is assessed on its initial merit, with no changes suggested), "double-blind" review (reviewer and author identities are redacted), and "total transparency" (all reviewer comments and author responses are made public). Others have suggested the use of reviewer training [1, 2, 13]. Unfortunately, when alternative models have been assessed, they have not had measurable success in improving the peer-review process [14, 15, 16, 17]. This is not surprising, and we argue that no model of peer review can succeed unless those within the system behave in a way that upholds the system's integrity.
Beaumont [7] and Gerwing and Rash [8] contend that a peer-review code of conduct is required to promote good reviewer behaviour while minimizing harmful behaviours, and Gerwing and Rash [8] provide an example of what such a code could entail. Scientific codes of conduct already exist in some fields, such as for professional engineers or biologists; unfortunately, they do not extend to peer review. While some journals offer guidelines on reviewer behaviour, this is far from the norm, and such guidelines lack the rigour of an accepted professional code of conduct [8]. Based on the findings of our investigation, we endorse the adoption of a peer-review code of conduct. If an explicit code of conduct were available to guide reviewer behaviour, and against which to judge conduct, editors would not be required to make judgement calls as often. If issues were detected in reviewer comments, editors could request that reviewers provide feedback that conforms to the code, assisting editors in what is admittedly a difficult job (to say nothing of finding reviewers in the first place). Finally, peer-reviewer training could be designed around such codes of conduct to provide a universal standard.