An expanded modal box, which conveys the concept of listening to a recording of someone else's voice.

Give audio option for user input

Humanizes users in others' eyes



Empower Connection



What It Is

A button that allows the users to respond to content with a recording of their own voice.

What Is Its Intended Impact

Reduces the likelihood that people show or report hostility when they discuss a topic via voice memos.

When To Use It

If technical constraints allow, place this as an option anywhere that someone can otherwise respond to content with text (e.g., comment sections and replies to social media posts).

Why It Might Work

Schroeder et al. (2017) ran four experiments testing the prediction that a person’s speech, beyond conveying a person’s thoughts, also “conveys their mental capacity, such that hearing a person explain his or her beliefs makes the person seem more mentally capable—and therefore seem to possess more uniquely human mental traits—than reading the same content” (ibid).

The first three experiments, “involving polarizing attitudinal issues and political opinions,” found “this effect to emerge when people are perceived as relatively mindless, such as when they disagree with the evaluator’s own beliefs” (ibid).

A fourth experiment found that “paralinguistic cues in the voice” (e.g., tone, echoing) serve as indicators of mental capacities.

Why It Matters

In an interview with Casey Newton of Platformer, Kayvon Beykpour, Twitter’s head of product, had this to say about audio:

“Our mechanics incentivize very short-form, high-brevity conversation, which is amazing and powerful and has led to all the impact that Twitter has had in the world. But it’s a very specific type of discourse, right? It’s very difficult to have long, deep, thoughtful conversations.

“Audio is interesting for us because the format lends itself to a different kind of behavior. When you can hear someone’s voice, you can empathize with them in a way that is just more difficult to do when you’re in an asynchronous environment. … We think audio is powerful, because that empathy is real and raw in a way that you can’t achieve over text in the same way.”

According to Schroeder et al. (2017), modern technology allows for predominantly text-based interactions over vast distances, but text may not be optimal for cultivating mutual appreciation and understanding of the minds of others. Voice, however, allows us to better build a theory of mind about others, the absence of which fuels dehumanization.

Special Considerations

The television spot opens with a white man making calls to a rental office. He gives a different name and uses a different voice for each call, reflecting various races and ethnicities. Each time, he is told that the apartment is not for rent. Yet when he uses a Caucasian-sounding name and accent, he is assured that the apartment is, for some reason, available.

The ad, titled "Accents", appeared on television stations across the United States. It points to a very real problem with voice-only interaction: a speaker's voice can, unfortunately, still be a vector for discrimination.

Another drawback is that voice-only risks being inaccessible to the deaf and hard of hearing. However, given advancements in real-time closed captioning, this latter drawback is potentially solvable.

Modern technology is rapidly changing the media through which people interact, enabling interactions between people around the globe and across ideological divides who might otherwise never interact. These interactions, however, are increasingly taking place over text-based media that may not be optimally designed to achieve a user’s goals. Individuals should choose the context of their interactions wisely. If mutual appreciation and understanding of the mind of another person is the goal of social interaction, then it may be best for the person’s voice to be heard.

Schroeder et al., 2017


This entry is currently being researched & evaluated!

You could help us improve this entry by contributing facts, media, screenshots, and other evidence of its existence in the wild.

Our Confidence Rating


A grade of Inference is for proposed interventions that lack research of their own, but that could work by way of analogous studies, expert opinions, or first principles.

While this is, technically, the lowest evidentiary grade that we can afford an intervention while still including it in the library, this is not meant to discourage. On the contrary! This grade is very much an invitation to explore and experiment with it further.

In The Wild

This intervention has precedent: it exists, or has at one time or another existed, in the wild.


Contextual Research

The Humanizing Voice: Speech Reveals, and Text Conceals, a More Thoughtful Mind in the Midst of Disagreement


Juliana Schroeder, Michael Kardas, Nicholas Epley

Date of Publication

2017

Publication Status

Peer Reviewed

Haas School of Business, University of California, Berkeley; Booth School of Business, The University of Chicago

Journal Name

Psychological Science (published by the Association for Psychological Science)

Entry Type

Research Article or Manuscript


APA Citation

Schroeder, J., Kardas, M., & Epley, N. (2017). The Humanizing Voice: Speech Reveals, and Text Conceals, a More Thoughtful Mind in the Midst of Disagreement. Psychological Science, 28(12), 1745–1762.
