Content Moderation is a Dead End.
Moderation does important work, but it cannot solve technology's broader problems. Subjective Measurement, Designing for Well-Being, and Algorithmic Value Alignment are more promising approaches.
Last week, I attended the Trust & Safety Research Conference, excited that it was one of the first times I could engage with the world as a representative of USC's Neely Center and the Psychology of Technology Institute rather than as a Meta employee. One reason I left was that I wanted to be part of the wider conversation about how to improve technology's impact on society, a conversation I thought was often operating from some incorrect assumptions. At most companies, the people who work on teams with "trust," "safety," or "integrity" in the title are separate from the people primarily responsible for designing and building the company's core products. Products are built to support business objectives, other teams then write policies about the content and behaviors that should not be allowed, and content moderation procedures are built to implement those policies. A lot of valuable and important work gets done, but if we expect this process to meaningfully change technology's impact on society, it is a dead end.
DALL-E image of "a drawing of a man with a computer driving a car toward a cliff"
Let me explain with an example. One of the many policy-based projects I worked on at Meta was Engagement Bait, defined as "a tactic that urges people to interact with Facebook posts through likes, shares, comments, and other actions in order to artificially boost engagement and get greater reach." Accordingly, "Posts and Pages that use this tactic will be demoted." To do this, "models are built off of certain guidelines" trained using "hundreds of thousands of posts" that "teams at Facebook have reviewed and categorized." The examples provided are obvious (e.g. a post saying "comment 'Yes' if you love rock as much as I do"), but the problem is that there will always be far subtler ways to get people to engage with something artificially. For example, psychology researchers have a long history of studying negativity bias, which has been shown to operate across a wide array of domains and to lead to increased online engagement. So instead of explicitly asking for engagement, which would clearly violate the policy, publishers learn to implicitly bait users into engaging with their content through tactics like fear and outrage. Politicians, many of whom have sophisticated software to tell them what does well, learn that "If there's no blood, it is likely to only be seen by our social bubble" and can learn the contours of any content moderation policy to maximize distribution and escape sanctions, without being explicitly told.
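To make the mechanics concrete, here is a minimal sketch of the kind of classifier such guidelines produce. The training examples, labels, and model choice are all invented for illustration (this is not Meta's actual system); the point is only that explicit asks are easy to catch with surface cues, while implicit fear- or outrage-bait looks like ordinary content to the same model:

```python
# Minimal sketch of a policy-based "engagement bait" classifier.
# Training data, labels, and model choice are invented for illustration;
# they do not reflect Meta's actual models or guidelines.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hand-labeled examples in the spirit of the published guidelines:
# 1 = explicit engagement bait, 0 = not bait.
posts = [
    ("Comment 'Yes' if you love rock as much as I do", 1),
    ("Like this post if you agree!", 1),
    ("Share if you want your friends to see this", 1),
    ("Tag a friend who needs to hear this today", 1),
    ("Here are the tour dates for our new album", 0),
    ("Our city council meets Thursday to discuss the budget", 0),
    ("New study finds moderate exercise improves sleep", 0),
    ("Photos from this weekend's charity run", 0),
]
texts = [t for t, _ in posts]
labels = [y for _, y in posts]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

# The classifier learns surface cues ("comment", "like this", "share if"),
# so an implicit-bait post that provokes outrage without asking for anything
# scores like ordinary content.
implicit_bait = "They are coming for your children and nobody will report it"
explicit_bait = "Comment below if you're angry about this!"
for text in (implicit_bait, explicit_bait):
    print(text, "->", model.predict_proba([text])[0][1])
```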
The same issue recurs with regard to most (though not all) content policy issues, especially when they intersect with more subjective, psychological experiences. What makes something "hate speech" or "incitement to violence" is not whether it conforms to a policy, but whether it generates hate or violence in people. Research by the Dangerous Speech Project shows that this effect depends as much on the speaker, the audience, and the historical context as on the specific words used, echoing a long history of social psychological research on the power of situations to shape attitudes and behavior. It is often more effective to misinform people implicitly by taking a true event (e.g. an adverse reaction to a vaccine, a rare instance of voter fraud, or a crime committed by a member of a minority group) and suggesting it is representative of a broader pattern than it is to make a verifiably false claim. The relative ineffectiveness of a content-moderation-based approach is magnified by the need to create guidelines that scale, so that thousands of contract workers can reliably make the same judgments quickly. If situation and context matter, how well could these systems possibly work? And if we then use this data to train machine learning models, should we be surprised when those models learn simple heuristics that lead to over-enforcement rather than replicating complex judgment?
Not all judgments of content rely on context as much (e.g. child exploitation or drug sales), and there will always be valuable and important work to be done by policy and content moderation processes. However, for the more pervasive effects of technology that people worry about, making new policies and moderating content based on those policies will not work. At best, it will make a small dent for the most obvious examples with an acceptable cost of over-enforcement. At worst, it will lull society into thinking that problems are solved because we have defined a category of content and successfully driven its prevalence down.
It was heartening to realize that many of my colleagues at the Trust & Safety Research Conference agreed with me. In one panel on the Responsibility of Trust & Safety, panelists called content moderation "a band-aid on a deeper wound" and discussed how "there are cases [that are obvious], but there are so many more cases that are in that gray area." One audience question referred to the current system as "a whack-a-mole reduce harm approach." The keynote at TrustCon, a conference held just before this one, argued for explicitly expanding the Trust & Safety paradigm toward health promotion. The keynote at this conference featured a graphic with the word "moderation" crossed out.
Yet I still worry that we are not being definitive enough about the limits of content moderation and the need to pivot more intentionally toward alternative approaches if we want to make more sustained progress. There remain many efforts to misapply policy and moderation frameworks to problems (e.g. misinformation) that cannot reasonably be addressed within this paradigm. We need to explicitly seek alternatives.
What are promising alternative approaches?
Leaning into Subjective Measurement
If we embrace the idea that there is no natural, objective category of content that constitutes misinformation, harassment, etc., then society can start to take ownership of measuring platform impact, rather than waiting for platforms to provide a hypothetical data set that would solve our understanding gap. The ground truth needed for understanding harassment online is closer to the subjective experience of a sample of the population than it is to the percentage of content that conforms to a policy definition. Of course, we could drive harassment to zero simply by prohibiting people from interacting with each other altogether, so useful measurement also has to take into account related positive experiences, such as how often people feel supported by or learn from others. We also need to be able to understand how platforms relate to each other and to baselines (e.g. offline experiences) over time, so that we can better contextualize progress. Creating better systematic measures is a core opportunity that we are focused on delivering at the Psychology of Technology Institute and USC's Neely Center, and we plan to announce our more specific plans shortly.
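As a concrete illustration of what I mean by subjective measurement, here is a minimal sketch that summarizes hypothetical survey responses by platform and over time, tracking a negative experience (harassment) alongside a positive one (feeling supported), with an offline baseline for context. The survey items, platforms, and numbers are invented placeholders, not our actual instrument:

```python
# Minimal sketch of platform comparison based on user surveys rather than
# policy-violation prevalence. The survey items, platforms, and numbers
# are hypothetical illustrations of the kind of measure described above.
import pandas as pd

responses = pd.DataFrame([
    # platform, survey wave, "were you harassed this week?", "did you feel supported or learn something?"
    {"platform": "A", "wave": "2022-Q3", "harassed": 1, "supported": 0},
    {"platform": "A", "wave": "2022-Q3", "harassed": 0, "supported": 1},
    {"platform": "B", "wave": "2022-Q3", "harassed": 0, "supported": 1},
    {"platform": "B", "wave": "2022-Q3", "harassed": 0, "supported": 0},
    {"platform": "offline", "wave": "2022-Q3", "harassed": 0, "supported": 1},  # baseline
])

# Share of sampled users reporting each experience, by platform and wave.
# Tracking the positive side alongside the negative avoids "solving"
# harassment by simply suppressing interaction.
summary = (
    responses
    .groupby(["platform", "wave"])[["harassed", "supported"]]
    .mean()
    .rename(columns={"harassed": "pct_harassed", "supported": "pct_supported"})
)
print(summary)
```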
Designing for Well-Being
One of the best talks I saw at the Trust & Safety Research Conference described a study published in the Journal of Online Trust & Safety, in which the researchers partnered with Nextdoor to figure out how to encourage better conversations. They explicitly leveraged psychological research on group identity to hypothesize that moving people to smaller groups would encourage more civil conversations. In a second intervention, they leveraged research on prescriptive norms to generate positively framed norms that they expected would also improve civility. To test their hypotheses they did have to define what counted as a better or worse conversation, but rather than relying on any one definition as an enforcement lever, they used three separate operationalizations (including the freely available Moral Foundations Dictionary) to understand whether their design change was positive or negative. Moving people to smaller group discussions did indeed increase civility (measured using both human labels against a definition and text analysis) and reduced user reports, a complementary behavioral signal. Prescriptive guidelines had mixed results for human-labeled civility, but positive results for civility as measured by text analysis, and also led to fewer reports.
By using their definitions as a measure rather than an enforcement lever, they put far less stress on those definitions, since concerns like over-enforcement and incomplete recall matter much less. Nobody's speech was limited by the changes, and we can reasonably assume that the changes would positively affect related but unmeasured definitions of civility as well. In our future work, we are hopeful that our network can facilitate more such collaborations between designers, technologists, and researchers, perhaps in collaboration with organizations such as Yale's Justice Collaboratory (which led this work), New_ Public, or the Prosocial Design Network.
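For readers who want a concrete sense of the "definitions as measures" idea, here is a minimal sketch in the same spirit. The tiny word lists are stand-ins for a real lexicon like the Moral Foundations Dictionary, and the comment samples are invented; the point is only that a crude text measure can compare conditions without ever being used to sanction anyone:

```python
# Minimal sketch of using a text-based definition as a *measure* of a design
# change rather than an enforcement lever. The word lists stand in for a real
# lexicon such as the Moral Foundations Dictionary; the comments are invented.
import re

CARE_TERMS = {"thanks", "help", "helped", "welcome", "appreciate"}
HARM_TERMS = {"idiot", "liar", "liars", "disgusting", "hate"}

def civility_score(comment: str) -> int:
    """Crude lexicon score: +1 per care-related term, -1 per harm-related term."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    return sum(t in CARE_TERMS for t in tokens) - sum(t in HARM_TERMS for t in tokens)

def mean_civility(comments: list[str]) -> float:
    return sum(civility_score(c) for c in comments) / len(comments)

# Comments sampled before and after a hypothetical change (e.g. smaller groups).
before = ["You people are liars", "This is disgusting", "Thanks for sharing"]
after = ["Thanks, this really helped", "Appreciate the heads up", "Welcome to the group"]

print("before:", mean_civility(before))
print("after: ", mean_civility(after))
# In the actual study, several such operationalizations (human labels, text
# analysis, user reports) were compared, so no single definition bears the
# weight that an enforcement threshold would.
```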
Algorithmic Value Alignment
As part of the Facebook Papers, one of the internal notes I wrote was leaked, so I can conveniently quote myself:
Rather than optimizing for engagement and then trying to remove bad experiences, we should optimize more precisely for good experiences.
This is not a problem limited to Facebook or even to social media. Figuring out how to make automated systems responsive to our values, rather than blindly optimizing a seemingly reasonable objective function, is a pervasive question, often referred to as value alignment. It underlies questions about whether optimizing for GDP or stock price is actually good for society, and it has become even more critical as AI-driven systems create myriad optimization opportunities that all have the potential to go wrong (e.g. balancing safety vs. efficiency for self-driving cars, cost vs. health benefit for managing health resources, worker productivity vs. well-being for the gig economy).
In the social media world, some progress is being made if you look behind the headlines. A more charitable reading of the evolution of Facebook's Meaningful Social Interactions metric suggests that the company meaningfully improved News Feed's objective function based on internal research, even before any outside pressure came to bear. Many platforms have been experimenting with mechanisms that let users tell them when a piece of content they engaged with was not actually a good experience, and the new "show more" and "show less" buttons that Facebook recently announced are likely a promising tool for giving users content that aligns with their values, rather than merely what they engage with. Public tests and interventions have also opened the door to a conversation about whether civic and political content are areas that require a different, value-aligned objective function.
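To make the objective-function point concrete, here is a minimal sketch of what a value-aligned ranking score could look like relative to a pure engagement score. The signal names and weights are hypothetical illustrations, not Facebook's actual Meaningful Social Interactions formula or any platform's real ranking system:

```python
# Minimal sketch of a "value-aligned" ranking objective: instead of scoring
# items purely on predicted engagement, blend in signals of whether users say
# the content was actually worth their time ("show more"/"show less" feedback,
# survey responses). Feature names and weights are hypothetical.
from dataclasses import dataclass

@dataclass
class Candidate:
    item_id: str
    p_engage: float       # predicted probability of a like/comment/share
    p_show_more: float    # predicted probability the user taps "show more"
    p_show_less: float    # predicted probability the user taps "show less"
    p_worth_time: float   # predicted survey response: "was this worth your time?"

def engagement_score(c: Candidate) -> float:
    # The status-quo objective: rank purely on predicted engagement.
    return c.p_engage

def value_aligned_score(c: Candidate) -> float:
    # Engagement still counts, but stated value carries weight and
    # explicit negative feedback is penalized rather than ignored.
    return 0.3 * c.p_engage + 0.4 * c.p_worth_time + 0.3 * (c.p_show_more - c.p_show_less)

candidates = [
    Candidate("outrage_post", p_engage=0.9, p_show_more=0.10, p_show_less=0.60, p_worth_time=0.2),
    Candidate("useful_post",  p_engage=0.4, p_show_more=0.50, p_show_less=0.05, p_worth_time=0.7),
]

print(max(candidates, key=engagement_score).item_id)     # outrage_post wins
print(max(candidates, key=value_aligned_score).item_id)  # useful_post wins
```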
At USC's Neely Center, we are collaborating across business schools in the Psychology of Technology Institute network to build a course that helps tomorrow's leaders create more value-aligned AI-driven systems. To date, much of this conversation has occurred within technical communities, and the conversation on value alignment needs to be broader. The problem is clearly unsolved, but we will never get there unless we start to approach it as a cumulative science, leveraging what we can learn from examples like those above and teaching it to our next generation of leaders, rather than focusing on narratives of good vs. bad actors.
Dead ends can be valuable.
The work that has been and is being done on content moderation is immensely valuable. It has mitigated many harms and will continue to do so. Yet the most valuable part of that work may be in helping us wrestle with these issues and realize when we need to change course. I did not hold my current opinion about how to approach engagement bait when I started my time at Facebook, and I think an increasing number of people are realizing the limits of content moderation and starting to work on the next chapter of "trust and safety" work. We are excited to be part of that future. Please do get in touch with me if you'd like to be involved in any of the above workstreams and build it with us.