A round-up of the designs in PDN's library that reduce norm-breaking behavior

Platforms run on explicit and implicit rules and norms to guide users and minimize harms. Those rules can range from platform-wide bans on illegal content to niche subgroup rules that, for example, require all comments end in question marks (see: reddit/ama).
To ensure users follow norms, platforms and online groups use a variety of approaches to either reduce rule-breaking before it occurs or decrease its chance of recurring. Behind those approaches is the assumption that a subset of users are not intent on causing harm but break rules for other reasons - lack of awareness, getting caught up in the heat of the moment, etc. - and so can be steered toward healthier engagement.
The PDN library team has reviewed over a dozen studies that test the effectiveness of interventions aimed at reducing rule-breaking. We summarize those interventions and their evidence below. (More info on how our ratings work.) As new research becomes public, we'll do our best to update these assessments.
Rating: Validated
What it is: A message displayed at the top of a forum, or shown to users as they join, that informs or reminds them of community rules and outlines how those rules are enforced. Reminders can also be targeted at users after their content has been removed for breaking community rules.
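To make this concrete, here is a minimal Python sketch of how a platform might decide when to surface such a reminder. The rule text, the user fields, and the trigger conditions (new members and recently moderated users) are illustrative assumptions, not details from the studies cited below.

```python
from dataclasses import dataclass

# Illustrative copy only; each community would supply its own rules and enforcement note.
COMMUNITY_RULES = (
    "Welcome! Please review our rules before posting:\n"
    "1. Stay on topic.\n"
    "2. No personal attacks.\n"
    "Posts that break these rules are removed; repeat violations can lead to a ban."
)

@dataclass
class User:
    days_since_joining: int
    recent_removals: int  # posts removed for rule-breaking in the last 30 days (hypothetical field)

def should_show_norm_reminder(user: User) -> bool:
    """Show the reminder to newcomers and to users whose content was recently removed."""
    return user.days_since_joining <= 7 or user.recent_removals > 0

if __name__ == "__main__":
    newcomer = User(days_since_joining=1, recent_removals=0)
    if should_show_norm_reminder(newcomer):
        print(COMMUNITY_RULES)
```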
Evidence it works: Two field studies (on Reddit and Nextdoor) that displayed norms on individual posts or as users joined groups, respectively, demonstrated that reminding users of norms led to a proportionately lower rate of rule-breaking posts. A field experiment on Facebook likewise showed that reminding users of norms after their content was removed increased their subsequent adherence to rules.
Limitations: In a feature that doubles as a bug, one study shows that norm reminders encourage more posts from newcomers, which, paradoxically, can increase the absolute number of rule-breaking posts even as their share declines.
Rating: Validated
What it is: When a comment violates a forum's rules, it is deleted either by human moderators or automatically. While removing rule-breaking content is primarily aimed at protecting other users (from, e.g., hate speech), secondary goals are to deter individuals from future rule-breaking and to reduce rule-breaking behavior among other users.
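As a rough illustration of this flow - hiding the comment from other users and notifying its author - here is a minimal Python sketch. The data structures and message wording are hypothetical and far simpler than any real moderation pipeline.

```python
from dataclasses import dataclass

@dataclass
class Comment:
    author: str
    text: str
    removed: bool = False

def visible_thread(comments: list[Comment]) -> list[str]:
    """Other users never see removed comments; a placeholder keeps the thread readable."""
    return [c.text if not c.removed else "[removed by moderators]" for c in comments]

def remove_comment(comment: Comment, notifications: list[str], rule: str) -> None:
    """Hide the comment and notify its author (the studies cited below also offered an appeal)."""
    comment.removed = True
    notifications.append(f"Your comment was removed for breaking the rule: '{rule}'.")

if __name__ == "__main__":
    thread = [Comment("alice", "Interesting point."), Comment("bob", "<a comment containing hate speech>")]
    notifications: list[str] = []
    remove_comment(thread[1], notifications, "No hate speech")
    print(visible_thread(thread))
    print(notifications)
```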
Evidence it works: Two quasi-experiments, conducted on Facebook and Reddit, demonstrated that users who were notified about the deletion of their rule-breaking comment were less likely to break a rule in subsequent weeks. The findings from one of these studies also suggest that publicly removing rule-breaking comments positively influences other users, decreasing the likelihood that they, too, will break rules.
Limitations: We identified no substantial limitations. However, it's worth noting that in both studies, the rule-breaker received an explanation for the comment's removal and had the option to appeal the decision. This suggests that the notification process and the opportunity for appeal might be crucial factors in the intervention's effectiveness.
Rating: Convincing
What it is: This intervention gives users real-time feedback when they are drafting a post that appears to violate a forum's rules, providing an explanation of the infraction and a chance to course-correct before posting.
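A minimal sketch of what a pre-submission rule check could look like, assuming simple hand-written rules. The example rules (including a question-mark rule borrowed from the AMA example earlier in this piece) are invented for illustration and are far cruder than the system tested in the Reddit experiment.

```python
import re

# Illustrative rules only; a real deployment would encode each community's own rules.
DRAFT_RULES = [
    ("Comments must end with a question mark.",
     lambda text: text.rstrip().endswith("?")),
    ("No links in comments.",
     lambda text: not re.search(r"https?://", text)),
]

def check_draft(text: str) -> list[str]:
    """Return the explanation for every rule the draft currently violates."""
    return [explanation for explanation, passes in DRAFT_RULES if not passes(text)]

if __name__ == "__main__":
    draft = "Tell us more about your research at http://example.com"
    for explanation in check_draft(draft):
        print(f"Before you post: {explanation}")
```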
Evidence it works: A field experiment on Reddit showed that users who were randomly assigned to receive "post guidance" not only produced fewer rule-breaking posts but also produced more high-quality posts (i.e., posts that received more positive engagement).
Limitations: As with Reminder of Norms, while Post Guidance reduces the proportion of rule-breaking posts and even increases the total number of quality posts, it does not reduce moderators' workload: they still need to manually remove the same number of rule-breaking posts. This may be because users acting in bad faith learn to evade automated moderation.
Rating: Convincing
What it is: Before a comment is posted, an automatic classifier assigns it a "toxicity score." If the comment receives a high score, the user is prompted to revise it before posting. Note: Preliminary Flagging is similar to Post Guidance in that both give users feedback before they post. We include Preliminary Flagging as a distinct intervention, however, because - as it has been tested so far - it focuses specifically on reducing toxicity.
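Here is a minimal sketch of the score-and-prompt flow, assuming some toxicity classifier is available. The keyword counter below is only a stand-in for a trained model, and the 0.8 threshold is arbitrary; neither reflects the systems used in the studies cited below.

```python
TOXICITY_THRESHOLD = 0.8  # arbitrary cut-off, for illustration only

def score_toxicity(text: str) -> float:
    """Placeholder scorer; real systems use a trained classifier, not keyword counts."""
    toxic_words = {"idiot", "stupid", "trash"}
    words = text.lower().split()
    if not words:
        return 0.0
    return min(1.0, sum(w.strip(".,!?") in toxic_words for w in words) / len(words) * 5)

def submit_comment(text: str, already_warned: bool = False) -> str:
    """Prompt the author to reconsider a high-scoring comment once before it is posted."""
    if score_toxicity(text) >= TOXICITY_THRESHOLD and not already_warned:
        return "prompt_revision"  # show an "Are you sure you want to post this?" dialog
    return "posted"

if __name__ == "__main__":
    print(submit_comment("You are an idiot"))            # prompt_revision
    print(submit_comment("I disagree with this take"))   # posted
```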
Evidence that it works: An experiment on Twitter and a study on OpenWeb both found that users who received toxicity prompts posted fewer offensive comments. Researchers in both studies also observed secondary positive effects, including an increase in civil and thoughtful comments and a decrease in toxic replies.
Limitations: This intervention may backfire for a small subset of users who intend to cause harm. Despite the overall positive impact, for example, researchers observed that a small percentage of users revised their posts to include even more harmful language after being warned about toxicity.
Rating: Likely
What it is: When a comment is removed, the user who broke the rule is provided with an explanation.
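A minimal sketch of how such an explanation might be composed, with an optional longer form since the evidence below suggests longer explanations may work better; the wording, fields, and appeal mechanism are hypothetical.

```python
def removal_explanation(rule: str, excerpt: str, detailed: bool = True) -> str:
    """Compose the message sent to a user whose comment was removed (wording is illustrative)."""
    short = f"Your comment was removed for breaking the rule: '{rule}'."
    if not detailed:
        return short
    return (
        short
        + f'\nThe part that triggered removal: "{excerpt}"'
        + "\nYou can repost once it complies, or reply to this message to appeal."
    )

if __name__ == "__main__":
    print(removal_explanation("No personal attacks", "you people are clueless"))
```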
Evidence that it works: One observational study and one quasi-experiment have tested the effectiveness of removal explanations. The observational study suggests that users who receive any explanation - from a bot or a human moderator - are less likely to produce another rule-breaking post, and that longer explanations are more effective. The quasi-experiment tested comment removal combined with a removal explanation and suggested that this combined approach is effective in reducing future rule-breaking comments.
Limitations: Neither study used an experimental design that isolates the effect of the removal explanation itself. Such a design would be needed to further solidify the evidence for removal explanations alone.
Rating: Tentative
What it is: This intervention involves a classifier that informs users whether a thread they are about to reply in is likely to become tense. It also provides them with real-time feedback on whether their proposed contribution would increase or decrease the tension in that thread. Note: This intervention shares the real-time element and toxicity focus of Preliminary Flagging, but gives feedback as users are crafting their posts, as opposed to after they have tried to submit a post.
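To illustrate the mechanism, here is a minimal Python sketch that compares a thread's estimated tension with and without a draft reply. The marker-based scorer is a toy placeholder, not the classifier from the study described below.

```python
TENSION_MARKERS = ("!!", "ridiculous", "nonsense", "you people")  # toy heuristic only

def tension_score(comments: list[str]) -> float:
    """Toy stand-in for a trained classifier: fraction of comments containing tense markers."""
    if not comments:
        return 0.0
    flagged = sum(any(m in c.lower() for m in TENSION_MARKERS) for c in comments)
    return flagged / len(comments)

def draft_feedback(thread: list[str], draft: str) -> str:
    """Tell the author whether their draft would raise or lower the thread's estimated tension."""
    before = tension_score(thread)
    after = tension_score(thread + [draft])
    if after > before:
        return f"This thread already looks tense ({before:.0%}); your reply may add to it."
    return "Your reply looks likely to keep things calm."

if __name__ == "__main__":
    thread = ["This is ridiculous!!", "No, you're missing the point."]
    print(draft_feedback(thread, "You people never listen!!"))
    print(draft_feedback(thread, "Let me try to restate the disagreement calmly."))
```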
Evidence that it works: An initial study tested this intervention as a browser extension for Reddit users. Researchers found that users spent more time crafting their comments in response to tense threads when using the intervention, and were more likely to revise a comment when alerted that it would increase tension.
Limitations: The evidence for this intervention is based on a within-subject experiment with a small sample of self-selected participants, so there is currently no evidence that it would be effective for people who are not voluntarily engaging with it. To have greater confidence in its effectiveness, it would be ideal to see a study with a between-subjects design and a larger sample size.
The Prosocial Design Network researches and promotes prosocial design: evidence-based design practices that bring out the best in human nature online. Learn more at prosocialdesign.org.
A donation of as little as $1 helps keep our research free to the public.