How Healthy Are Social Media Platforms?
Social media communication has become more prevalent than ever before. People have turned to these platforms to advocate, share information, and connect at a time when in-person interactions are limited due to COVID-19 mitigation measures. With the shift to virtual communication, ensuring reliability and accuracy has become increasingly important.
Social media acts as a gatekeeper to information and points of view. People tend to curate their feeds in a way that aligns with their currently held beliefs. The positive feedback loop of surrounding oneself with people who have similar opinions reinforces those opinions, and can lead many users into echo chambers. Beliefs are amplified through repetition, and there is little to no exposure to other opinions.
We saw this over the past year, as sensitive information related to politics and public health was widely shared across social channels, and platforms like Facebook, Twitter, and TikTok, among others, faced scrutiny over how they handled it.
The Iffy Quotient
School of Information senior Hamza Baccouche joined us to discuss the health of social media platforms and the challenges of adequately moderating content.
Along with a team at the Center for Social Media Responsibility (CSMR), Baccouche has developed the Iffy Quotient: a measurement of how much “iffy” content has been amplified on Facebook and Twitter. The term “iffy” is an admittedly imprecise label for sites that frequently publish misinformation. The sites are classified using NewsGuard and Media Bias/Fact Check, and those classifications are updated on an ongoing basis. The web-based dashboard has been running since 2016, allowing comparisons over time and between platforms. The project was prompted by the polarization following the 2016 presidential election, and its relevance has continued past the 2020 presidential election.
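To make the idea concrete, here is a minimal sketch of how an “iffy” share could be computed: the fraction of a platform’s most-amplified URLs that point to flagged domains. This is an illustration only, not CSMR’s actual methodology, and the domain list and function names are invented.

```python
from urllib.parse import urlparse

# Hypothetical set of domains flagged as "iffy" by site raters such as
# NewsGuard or Media Bias/Fact Check (these example domains are invented).
IFFY_DOMAINS = {"example-misinfo.com", "another-iffy-site.net"}

def iffy_share(popular_urls):
    """Fraction of a platform's most-amplified URLs that point to flagged domains."""
    if not popular_urls:
        return 0.0
    flagged = sum(
        1 for url in popular_urls
        if urlparse(url).netloc.removeprefix("www.") in IFFY_DOMAINS
    )
    return flagged / len(popular_urls)

# Example: one of three popular links comes from a flagged domain.
print(iffy_share([
    "https://www.example-misinfo.com/story",
    "https://reputable-news.example.org/article",
    "https://another-outlet.example.com/post",
]))
```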
Baccouche and his team sought to take the analysis one step further by developing machine learning models that, once trained, can assess posts and categorize commentary as “civil” or “uncivil.” These determinations are based on factors such as hostility, personal attacks, tone, and topic. Over the last two years, they have built, trained, and deployed these models to assign civility scores to various social platforms.
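As a rough sketch of what assigning a platform-level score might look like (not the team’s actual pipeline), per-post predictions from a trained civility classifier could be averaged into a single number. The classifier interface assumed here is scikit-learn’s predict_proba, and the class column ordering is an assumption.

```python
def platform_civility_score(classifier, posts):
    """Average per-post probability of being "civil" into one platform score.

    Assumes `classifier` is a trained model exposing predict_proba(texts),
    returning one row of class probabilities per post, with the "civil"
    class in column 1 (an assumption for this sketch).
    """
    if not posts:
        return None
    probabilities = classifier.predict_proba(posts)
    civil_probabilities = [row[1] for row in probabilities]
    return sum(civil_probabilities) / len(civil_probabilities)
```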
The first step to solving a problem is admitting that there is a problem. The ability to explicitly quantify how civil or fact-based social platforms are is a “huge step forward in being able to correct the trajectory.”
Understanding Machine Learning
CSMR used natural language processing to analyze text posts. The first step is training the model to perform the desired task; in this case, recognizing the civility of a post. Baccouche, working as a data labeler, used posts from political subreddits to train the model. Posts were manually labeled as civil or uncivil based on their tone and on whether the person or the idea was being attacked. The goal was to teach the model to identify language patterns and define them in a way that generalizes. The team wrote a rulebook for what distinguishes healthy disagreement from incivility, and learning where to draw that line was one of their biggest challenges.
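To show what that training step can look like in code, here is a minimal, hypothetical sketch using TF-IDF features and logistic regression in scikit-learn. The tiny labeled examples are invented for illustration; CSMR’s actual model architecture, features, and labeling rulebook are not described in this post.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny invented training set: 1 = civil, 0 = uncivil. In practice the labels
# came from manually annotated posts in political subreddits.
texts = [
    "I see your point, but the data suggests otherwise.",
    "That's an interesting argument; here is why I disagree.",
    "Only an idiot would believe something this stupid.",
    "You people are pathetic and your opinions are worthless.",
]
labels = [1, 1, 0, 0]

# Bag-of-words-style pipeline: TF-IDF features feeding a linear classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

# Score a new post as civil (1) or uncivil (0).
print(model.predict(["Your argument is garbage and so are you."]))
```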
Ensuring Model Relevance
Training a machine is a long, iterative process. It can take months, even years. All the while, relevant issues are changing, language patterns are adapting, and new trends are emerging on social media. “How do you learn to identify conflict and misinformation in social media posts for something that your machine learning model has never seen?” Baccouche and other researchers at CSMR grapple with this question as they try to keep the machine learning models up to date. For example, not everyone was aware of COVID-19 one year ago, let alone four years ago when this model was trained. It would be impossible for the model to recognize and flag COVID-19 misinformation without being trained on what information is factual and what is not. The team at CSMR aims to “deploy that to the public and to social media companies as soon as possible.” Yet there is no substitute for time: pushing a model out faster could mean sacrificing accuracy. Given the danger of COVID-19 misinformation circulating online, it is crucial to maintain the integrity of these models.
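One simple, hypothetical way to notice that a deployed text model is falling behind the language it encounters is to measure how much of the incoming vocabulary it has never seen, then retrain past some threshold. This is an illustration, not how CSMR monitors its models; the `vectorizer` is assumed to be a fitted scikit-learn TfidfVectorizer, and the 20% threshold is invented.

```python
# Crude drift check: what fraction of tokens in new posts did the trained
# vectorizer never see? Terms like "COVID-19" would all count as unseen
# for a model trained years earlier.
def out_of_vocabulary_rate(vectorizer, new_posts):
    known_terms = set(vectorizer.get_feature_names_out())
    tokenize = vectorizer.build_analyzer()
    tokens = [token for post in new_posts for token in tokenize(post)]
    if not tokens:
        return 0.0
    unseen = sum(1 for token in tokens if token not in known_terms)
    return unseen / len(tokens)

# Hypothetical policy: flag the model for retraining once 20% of incoming
# tokens are out of vocabulary.
RETRAIN_THRESHOLD = 0.20

def needs_retraining(vectorizer, new_posts):
    return out_of_vocabulary_rate(vectorizer, new_posts) > RETRAIN_THRESHOLD
```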
Content Moderation
Is there truly any ideal way to moderate content on social media platforms? This is a complex problem that doesn’t seem to have a clear solution. The sheer volume of content on platforms like Facebook and Twitter makes it impossible to moderate manually. It is much easier to flag questionable content from users who have a large following, but that means users without significant followings aren’t being held to the same standards. This is where machine learning models come into play. Baccouche recommends a filtration system that automatically categorizes most content as either within guidelines or not, then pushes the remaining content to manual review.
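A minimal sketch of that kind of filtration, with an invented confidence threshold and a placeholder classifier: content the model is confident about is handled automatically, and only the uncertain remainder goes to human reviewers.

```python
def triage(classifier, posts, auto_threshold=0.9):
    """Split posts into auto-approved, auto-flagged, and manual-review queues.

    Assumes `classifier` exposes predict_proba with columns
    [p_violates_guidelines, p_within_guidelines]; the 0.9 confidence
    threshold is an example value, not a recommendation.
    """
    approved, flagged, manual_review = [], [], []
    for post, probs in zip(posts, classifier.predict_proba(posts)):
        p_violation, p_ok = probs[0], probs[1]
        if p_ok >= auto_threshold:
            approved.append(post)        # confidently within guidelines
        elif p_violation >= auto_threshold:
            flagged.append(post)         # confidently violating guidelines
        else:
            manual_review.append(post)   # uncertain: send to a human reviewer
    return approved, flagged, manual_review
```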
However, it gets tricky when considering how to train such a system. Training a model to recognize violence and pornography requires humans to feed it those types of images in the first place. Who would volunteer to do this kind of soul-crushing work? Do companies let some disturbing images slide because the machines weren’t able to catch them, or do they put people in the difficult position of having to look at those images every day?
On the flip side, if a platform relies solely on manual review, it is relying on its users to flag problematic content. And no matter how well intentioned those users are, humans still have to determine whether each flagged post falls within the platform’s guidelines.
Interview conducted and post written by UM Social Intern Keara Kotten