A mockup of a prompt telling a user that "Your Comment Is Likely To Be Hurtful To Others" with an option to either edit the post or post it anyway.

Preliminary Flagging Before Posting

Reduce online harassment

Our confidence rating

Convincing

What It Is

An AI-powered intervention that prompts users to revise their comment before posting if the comment receives a high toxicity score. The intervention is most often powered by Jigsaw's Perspective API, which rates a comment's toxicity.
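For illustration, the sketch below shows how a platform might call the Perspective API to score a draft comment and decide whether to show a "reconsider before posting" prompt. The endpoint and request shape follow Perspective's public documentation; the 0.8 threshold, the placeholder API key, and the should_prompt_user helper are illustrative assumptions, not part of the intervention described here.

import requests

# Sketch: score a draft comment with Jigsaw's Perspective API and decide
# whether to show the "reconsider before posting" prompt.
# The 0.8 threshold is an assumed, illustrative cutoff, not a recommended value.

PERSPECTIVE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"
API_KEY = "YOUR_API_KEY"          # placeholder; issued via Google Cloud
TOXICITY_THRESHOLD = 0.8          # assumed cutoff for triggering the prompt


def should_prompt_user(comment_text: str) -> bool:
    """Return True if the draft comment scores above the toxicity threshold."""
    response = requests.post(
        PERSPECTIVE_URL,
        params={"key": API_KEY},
        json={
            "comment": {"text": comment_text},
            "languages": ["en"],
            "requestedAttributes": {"TOXICITY": {}},
        },
        timeout=10,
    )
    response.raise_for_status()
    score = response.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
    return score >= TOXICITY_THRESHOLD


if __name__ == "__main__":
    draft = "You are an idiot and everyone knows it."
    if should_prompt_user(draft):
        print("Your comment is likely to be hurtful to others. Edit or post anyway?")
    else:
        print("Comment posted.")

In practice the threshold, the attribute set (e.g., TOXICITY vs. SEVERE_TOXICITY), and the wording of the prompt are design choices each platform would tune for itself.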

Civic Signal Being Amplified

Welcome: Ensure user safety

When To Use It

Interactive

What Is Its Intended Impact

This intervention reduces the number of toxic comments posted. It is particularly geared towards "road rage" style comments, in which an otherwise genuine user lashes out in the heat of the moment.

Evidence That It Works

A private study by OpenWeb (2020) on their own platform found that about half of users either revised their comment or chose not to post it when prompted that it might be inflammatory. OpenWeb also observed a 12.5% increase in civil and thoughtful comments posted overall. The authors reported that the intervention led to healthier conversations and more opportunities for well-intentioned users to participate, which in turn boosted loyalty and overall community health.

In a randomized controlled experiment on Twitter (Katsaros et al., 2021), users who received the intervention posted 6% fewer offensive Tweets than non-prompted users in the control group. The decrease in offensive content was due not only to the deletion and revision of prompted Tweets, but also to a drop in recidivism and in the number of offensive replies to the prompted Tweets. The authors concluded that interventions allowing users to reconsider their comments can be an effective mechanism for reducing offensive content online.

Why It Matters

Most edits made in response to the prompt were done in good faith. This suggests that users are generally well intentioned and open to positive change, and often only need to be made mindful of their comment's potential harmfulness in moments when better judgment gives way to impulse.

Special Considerations

While most edits made in response to the prompt were done in good faith, the intervention can also provoke backlash and attempts to circumvent it. In one study (Katsaros et al., 2021), for example, users in 3% of prompted cases edited their posts to include even more slurs, attacks, or profanity than they had originally intended to post.

And because any API that rates the toxicity of comments is built and trained by humans, it will naturally carry the perspectives and biases of its creators.

Examples

This intervention entry currently lacks photographic evidence (screencaps, &c.)

Citations

OpenWeb tests the impact of “nudges” in online discussions

Ido Goldberg, Guy Simon, Kusuma Thimmaiah
September 21, 2020

"Did You Suspect the Post Would be Removed?": Understanding User Reactions to Content Removals on Reddit

Shagun Jhaver, Darren Scott Appling, Eric Gilbert, Amy Bruckman
Proceedings of the ACM on Human-Computer Interaction
November 7, 2019
doi:10.1145/3359294

Reconsidering Tweets: Intervening During Tweet Creation Decreases Offensive Content

Matthew Katsaros, Kathy Yang, Lauren Fratamico (Yale Law School; Twitter, Inc.)
International AAAI Conference on Web and Social Media (ICWSM 2022)
December 1, 2021
arXiv:2112.00773v1
