[Mockup: a prompt telling a user "Your Comment Is Likely To Be Hurtful To Others," with options to either edit the post or post it anyway.]

Preliminary Flagging Before Posting

Reduces online harassment

Our confidence rating: Convincing



What It Is

This is an AI-powered intervention, most often powered by Jigsaw's Perspective API, which rates a comment's toxicity.

Typically, any comment receiving a high toxicity score triggers a message suggesting that the author revise their post before it goes live.
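The shape of such a check can be sketched as follows. The endpoint and payload follow the Perspective API's public documentation, but the 0.8 cutoff is an illustrative assumption: platforms tune their own thresholds, and the entry does not specify one.

```python
# Sketch of a pre-posting toxicity check built on Jigsaw's Perspective API.
# The endpoint and payload shape follow Perspective's public docs; the 0.8
# threshold is an illustrative assumption, not a documented default.

ANALYZE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"

def build_request(comment_text: str) -> dict:
    """JSON body for a comments:analyze call requesting a TOXICITY score."""
    return {
        "comment": {"text": comment_text},
        "requestedAttributes": {"TOXICITY": {}},
    }

def extract_score(response_body: dict) -> float:
    """Pull the summary TOXICITY probability (0.0-1.0) out of a response."""
    return response_body["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

def should_prompt(score: float, threshold: float = 0.8) -> bool:
    """True if the comment scores high enough to warrant the nudge."""
    return score >= threshold
```

A real integration would POST `build_request(...)` to `ANALYZE_URL` with an API key and feed the parsed response through `extract_score` before deciding whether to show the prompt.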

When To Use It

This can be used in conjunction with any comment section or platform that primarily relies on short-text posts.

It should appear to the user after they have pressed the 'Submit' button, and before the post goes live.
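The timing described above, intercepting the post after "Submit" but before publication, can be sketched as a single handler. All names here are hypothetical; `score_fn` stands in for a call to a toxicity-scoring service such as the Perspective API.

```python
# Illustrative submit-flow for the nudge: fire after "Submit," before the
# post goes live. Every name here is hypothetical; score_fn stands in for
# a toxicity-scoring service call.

def handle_submit(text, score_fn, publish, show_prompt, threshold=0.8):
    """Intercept a just-submitted comment and nudge the author if needed.

    Returns "published" or "held" so the caller knows whether the post
    went live immediately or is waiting on the author's decision.
    """
    if score_fn(text) >= threshold:
        # High toxicity: hold the post and show the prompt. The prompt
        # offers "edit" or "post anyway"; the comment is published only
        # if the author explicitly chooses to post it anyway.
        show_prompt(text)
        return "held"
    publish(text)
    return "published"
```

The key design point is that the intervention never blocks outright: the author retains the final choice, and low-scoring comments pass through with no added friction.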

What Is Its Intended Impact

This intervention reduces the number of toxic comments posted. It is particularly geared towards "road rage" style comments, in which an otherwise well-intentioned user posts impulsively in the heat of the moment.

How We Know It Works

A private study by OpenWeb (2020), conducted on the platform it sells as a service, found that about half of users either revised their comment or decided not to post it when prompted that it might be inflammatory.

"The overall positive outcome of this experiment reinforces our belief that quality spaces and civility cues drive a better, healthier conversation experience." write the study's authors. "A gentle nudge can steer the conversation in the right direction and provide an opportunity for users with good intentions to participate. The feature provides more transparency and education throughout the user engagement journey boosting loyalty and overall community health."

In a separate survey of 907 Reddit users, "although roughly a fifth (18%) of the participants accepted that their post removal was appropriate... over a third (37%) of the participants did not understand why their post was removed, and further, 29% of the participants expressed some level of frustration about the removal." The study suggests that "users receiving explanations for removal are more likely to perceive the removal as fair and post again in the future."

And finally, in a randomized controlled experiment conducted on Twitter, researchers from Yale Law School and Twitter evaluated "a new intervention that aims to encourage participants to reconsider their offensive content [with a prompt to] users, who are about to post harmful content, with an opportunity to pause and reconsider their Tweet."

Their research found that users prompted with this intervention "posted 6% fewer offensive Tweets than non-prompted users in our control. This decrease in the creation of offensive content can be attributed not just to the deletion and revision of prompted Tweets — we also observed a decrease in both the number of offensive Tweets that prompted users create in the future and the number of offensive replies to prompted Tweets." They concluded that interventions allowing users to reconsider their comments can be an effective mechanism for reducing offensive content online.

Why It Matters

These findings suggest that, while a minority of users edited their posts to trick the system or redirected their anger at the prompt itself, most edits made in response to this prompt are done in good faith.

"When it comes to moderation technologies there is no one size fits all. We believe this data analysis has also helped us understand and detect online trolls faster and better. If a user is repeatedly ignoring nudges and trying to trick the system, it warrants stronger tools such as auto suspension."
Ido Goldberg, et al.

Special Considerations

While most edits in response to this prompt are done in good faith, there can be backlash and attempts to circumvent the intervention.

In one study, in roughly 0.4% of cases in which this nudge was used, users edited their posts to add even more slurs, attacks, or profanity compared to what they originally intended to post.

And, as any API that rates the toxicity of comments is built and trained by people, it will naturally carry the perspectives and biases of its creators.

Examples

This intervention entry currently lacks photographic evidence (screencaps, &c.)

Citations

OpenWeb tests the impact of "nudges" in online discussions

Ido Goldberg, Guy Simon, Kusuma Thimmaiah
September 21, 2020

"Did You Suspect the Post Would be Removed?": Understanding User Reactions to Content Removals on Reddit

Shagun Jhaver, Darren Scott Appling, Eric Gilbert, Amy Bruckman
November 7, 2019
DOI: 10.1145/3359294
Proceedings of the ACM on Human-Computer Interaction

Reconsidering Tweets: Intervening During Tweet Creation Decreases Offensive Content

Matthew Katsaros, Kathy Yang, Lauren Fratamico (Yale Law School, Twitter Inc.)
December 1, 2021
arXiv: 2112.00773
International AAAI Conference on Web and Social Media (ICWSM 2022)

Social media governance: can social media companies motivate voluntary rule following behavior among their users?

Tom Tyler, Matt Katsaros, Tracey Meares & Sudhir Venkatesh
December 27, 2019
DOI: 10.1007/s11292-019-09392-z
Journal of Experimental Criminology

