Deep Learning Methods for Moderating Harmful Viral Content


Content Moderation Challenges in the Aftermath of Buffalo

The racially motivated shooting in a Buffalo supermarket – live-streamed by the perpetrator and shared across social media – is tragic on many levels. Above all else, lives were lost and families are forever broken as a result of this horrific attack. Making matters worse, copies of the violent recording are spreading on major social platforms, amplifying extremist messages and providing a blueprint for future attacks.

Unfortunately, this is not a new problem: extremist videos and other graphic content have been widely shared for shock value in the past, with little regard for the negative impacts. And bad actors are more sophisticated than ever, uploading altered or manipulated versions to thwart moderation systems.

As the world grapples with broader questions of racism and violence, we’ve been working with our partners behind the scenes to help control the spread of this and other harmful video content in their online communities.  This post covers the concerns these partners have raised with legacy moderation approaches, and how newer technology can be more effective in keeping communities safe. 

Conventional Moderation and Copy Detection Approaches

Historically, platforms relied on a combination of user reporting and human moderation to identify and react to harmful content. Once flagged content reaches a human moderator, enforcement is usually quick and highly accurate.

But this approach does not scale for platforms with millions (or billions) of users.  It can take hours to identify and act on an issue, especially in the aftermath of a major news event when post activity is highest.  And it isn’t always the case that users will catch bad content quickly: when the Christchurch massacre was live streamed in 2019, it was not reported until 12 minutes after the stream ended, allowing the full video to spread widely across the web.

More recently, platforms have found success using cryptographic hashes of the original video to automatically compare against newly posted videos. These filters can quickly and proactively screen high volumes of content, but are generally limited to detecting exact copies of the same video. Hashing checks often miss content when the file format, resolution, or codec changes. And even the most advanced "perceptual" hashing comparisons – which preprocess image data in order to consider more abstract features – can be defeated by adversarial augmentations.
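To make the distinction concrete, here is a minimal sketch of why a cryptographic hash breaks under even trivial pixel changes while a perceptual hash tolerates them. The average-hash scheme below is a toy stand-in for real perceptual hashing (production systems use libraries like pHash or PDQ), operating on an 8x8 grayscale grid represented as plain lists.

```python
import hashlib

def exact_hash(data: bytes) -> str:
    """Cryptographic hash: any byte change (re-encode, resize, brighten)
    produces a completely different digest."""
    return hashlib.sha256(data).hexdigest()

def average_hash(pixels):
    """Toy perceptual hash over an 8x8 grayscale grid (values 0-255):
    each bit records whether a pixel is above the frame's mean brightness."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]

def hamming(a, b):
    """Number of differing bits; a small distance means perceptually similar."""
    return sum(x != y for x, y in zip(a, b))

# An original 8x8 frame and a lightly brightened copy (+10 per pixel).
original = [[(r * 8 + c) * 4 % 256 for c in range(8)] for r in range(8)]
brightened = [[min(p + 10, 255) for p in row] for row in original]

# The cryptographic hashes of the raw pixels no longer match...
print(exact_hash(bytes(sum(original, []))) ==
      exact_hash(bytes(sum(brightened, []))))  # False
# ...but the perceptual hashes remain close (small Hamming distance),
# since brightening shifts all pixels relative to the mean together.
print(hamming(average_hash(original), average_hash(brightened)))
```

This is also why perceptual hashes remain brittle: an adversarial crop, rotation, or overlay changes which pixels sit above the mean, pushing the Hamming distance past any practical matching threshold.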

Deep Learning To Advance Video Moderation and Contain Viral Content

Deep learning models can close the moderation capability gap for platforms in multiple ways. 

First, visual classifier models can proactively monitor live or prerecorded video for indicators of violence. These model predictions enable platforms to shut down or remove content in real time, preventing the publishing and distribution of videos that break policies in the first place. The visual classifiers can look for combinations of factors, such as someone holding a gun, bodily injury, blood, and other object or scene information to create automated and nuanced enforcement mechanisms. Specialized training techniques can also accurately teach visual classifiers to identify the difference between real violence and photorealistic violence depicted in video games, so that something like a first-person shooter game walkthrough is not mistaken for a real violent event.

In addition to screening with visual classifiers, platforms can harness new types of similarity models to stop reposts of videos confirmed to be harmful, even if those videos are adversarially altered or manipulated. If modified versions somehow bypass visual classification filters, these models can catch them based on visual similarity to the original version.

In these cases, self-supervised training techniques expose the models to a range of image augmentation and manipulation methods, enabling them to accurately assess human perceptual similarity between image-based content. These visual similarity models can detect duplicates and close copies of the original image or video, including more heavily modified versions that would otherwise go undetected by hashing comparisons.
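The core of this self-supervised setup is generating "positive pairs": two independently augmented views of the same source image that the model learns to embed close together. The sketch below illustrates pair generation only, using simple list-based grids and a hypothetical augmentation set; a real pipeline would use an image library and a far richer set of manipulations, plus a contrastive loss during training.

```python
import random

def hflip(img):
    """Mirror each row (horizontal flip)."""
    return [row[::-1] for row in img]

def brightness(img, delta):
    """Shift all pixel values, clamped to the [0, 255] range."""
    return [[max(0, min(255, p + delta)) for p in row] for row in img]

def crop(img, margin):
    """Drop `margin` pixels from every border (a real pipeline would resize back)."""
    return [row[margin:-margin] for row in img[margin:-margin]]

# Illustrative augmentation set; production systems use many more manipulations.
AUGMENTATIONS = [
    hflip,
    lambda im: brightness(im, random.randint(-30, 30)),
    lambda im: crop(im, 1),
]

def make_positive_pair(img):
    """Two independently augmented views of the same source image.
    During self-supervised training, the model is pushed to embed these
    views close together while pushing views of *different* images apart."""
    view_a = random.choice(AUGMENTATIONS)(img)
    view_b = random.choice(AUGMENTATIONS)(img)
    return view_a, view_b

image = [[(r + c) * 16 % 256 for c in range(8)] for r in range(8)]
view_a, view_b = make_positive_pair(image)
```

Because the model only ever learns "do these two views come from the same source?", it generalizes to manipulations it was never explicitly trained on.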

Unlike visual classifiers, these models do not look for specific visual subject matter in their analysis.  Instead, they quantify visual similarity on a spectrum based on overlap between abstract structural features. This means there’s no need to produce training data to optimize the model for every possible scenario or type of harmful content; detecting copies and modified versions of known content simply requires that the model accurately assess whether images or video come from the same source.

How it works: Deep Learning Models in Automated Content Moderation Systems

Using predictions from these deep learning models as a real-time signal offers a powerful way to proactively screen video content at scale. These model results can inform automated enforcement decisions or triage potentially harmful videos for human review. 

Advanced visual classification models can accurately distinguish between real and photorealistic animated weapons. Here are results from video frames containing both animated and real guns. 

To flag real graphic violence, automated moderation logic could combine confidence scores for classes like actively held weapons, blood, and corpses, while excluding more benign images like these examples.
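A minimal sketch of what that combination logic might look like is below. The class names and thresholds are illustrative assumptions, not any specific product's schema; the key idea is layering an "animated content" exclusion on top of the violence indicators so game footage is suppressed.

```python
# Hypothetical classifier output: class name -> confidence score in [0, 1].
VIOLENCE_CLASSES = {"gun_in_hand", "blood", "corpse"}
VIOLENCE_THRESHOLD = 0.9    # illustrative; tuned per platform in practice
ANIMATED_THRESHOLD = 0.8

def should_flag(scores: dict) -> bool:
    """Flag a frame when any violence indicator is confidently present,
    unless the frame is confidently classified as animated/game content."""
    if scores.get("animated", 0.0) >= ANIMATED_THRESHOLD:
        return False  # photorealistic game footage: suppress enforcement
    return any(scores.get(c, 0.0) >= VIOLENCE_THRESHOLD
               for c in VIOLENCE_CLASSES)

# Real gun in live footage -> flagged.
print(should_flag({"gun_in_hand": 0.97, "blood": 0.12, "animated": 0.03}))  # True
# First-person shooter walkthrough -> suppressed.
print(should_flag({"gun_in_hand": 0.95, "animated": 0.92}))                 # False
```

In production, scores like these would typically be aggregated over a window of frames rather than acted on frame-by-frame, to reduce the impact of single-frame misclassifications.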

As a second line of defense, platforms need to be able to detect reposts or modified versions of known harmful videos and stop them from spreading. To do this, platforms use predictions from pre-trained visual similarity models in the same way they use hash comparisons today. With an original version stored as a reference, automated moderation systems can perform a frame-wise comparison with any newly posted videos, flagging or removing new content that scores above a certain similarity threshold.
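The screening step itself reduces to comparing embedding vectors. The sketch below uses raw pixel vectors as a stand-in for the output of a trained visual similarity model, and cosine similarity with an assumed 0.9 threshold; both choices are illustrative rather than a description of any particular system.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def screen_video(query_frames, reference_frames, threshold=0.9):
    """Frame-wise comparison: flag the upload if any query frame embedding
    scores above `threshold` similarity against any reference frame embedding.
    Returns (flagged, best_score)."""
    best = max(cosine(q, r)
               for q in query_frames
               for r in reference_frames)
    return best >= threshold, best

# A reference frame and a lightly brightened repost of the same frame,
# with pixel vectors standing in for model embeddings.
reference = [float((i * 7) % 256) for i in range(64)]
repost = [min(p + 15.0, 255.0) for p in reference]

flagged, score = screen_video([repost], [reference], threshold=0.9)
print(flagged)  # True: the repost still scores above the threshold
```

Scanning every query frame against every reference frame is quadratic; at platform scale, systems typically use approximate nearest-neighbor indexes over the reference embeddings instead of the brute-force loop shown here.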

Visual similarity model results on a pair of example frames. The query image has been lightly modified with a black-and-white filter, horizontal flip, and overlay text. The model returns a similarity score of >0.95.
Visual similarity model results on a second pair of example frames. This time, the query image is heavily augmented, with the original cropped, rotated, and photoshopped onto a billboard in a city photo. The model returns a similarity score of >0.8.

In these examples, visual similarity models accurately predict that frame(s) in the query video are derived from the original reference, even under heavy augmentation. By screening new uploads against video content known to be graphic, violent, or otherwise harmful, these moderation systems can replace incomplete tools like hashing and audio comparison to more comprehensively solve the harmful content detection problem.

Final Thoughts: How Hive Can Help

No amount of technology can undo the harm caused by violent extremism in Buffalo or elsewhere.  We can, however, use new technology to mitigate the immediate and future harms of allowing hate-based violence to be spread in our online communities. 

Hive is proud to support the world’s largest and most diverse platforms in fulfilling their obligation to keep online communities safe, vibrant, and hopeful. We will continue to contribute towards state-of-the-art moderation solutions, and can answer questions or offer guidance to Trust & Safety teams who share our mission at