
Hive Adds Hate Model to Fully-Automated Content Moderation Suite

Social media platforms play an increasingly pivotal role in both spreading and combating hate speech and discrimination. Now integrated into Hive’s content moderation suite, Hive’s hate model enables more proactive and comprehensive moderation of hate speech in both visual and textual content.

Year over year, our content moderation suite has emerged as the preeminent AI-powered solution both for helping platforms protect their environments from harmful content and for dramatically reducing human moderators’ exposure to sensitive content. Hive’s content moderation models have consistently and significantly outperformed comparable models, and we are proud to currently work with more than 30 of the world’s largest and fastest-growing social networks and digital video platforms.

Today we are excited to officially integrate our hate model into our content moderation product suite, helping our current and future clients combat racism and hate speech online. We believe that blending our best-in-class models with the significant scale of our clients’ platforms can result in real step-change impact.

Detecting hate speech is a uniquely difficult and rapidly evolving challenge. Context and subtle nuances vary widely across cultures, languages, and regions. Additionally, hate speech itself isn’t always explicit, so models must be able to recognize subtleties quickly and proactively. Hive is committed to taking on that challenge, and over the past several months we have partnered with several of our clients to prepare our hate model for today’s launch.

How We Help

Hate speech can occur both visually and textually, with a large share of it appearing in photos and videos. Powered by our distributed global workforce of more than 2 million registered contributors, Hive’s hate model is trained on more than 25 million human judgments and supports both visual classification models and text moderation models.

Our visual classification models classify entire images into different categories by assigning a confidence score to each class. These models can be multi-headed, where each group of mutually exclusive model classes belongs to a single model head. Within our hate model, examples of heads include Nazi symbols, KKK symbols, and other terrorist or white supremacist propaganda. Results from our model are actioned according to each platform’s rules. Many posts are automatically actioned as safe or restricted; others are routed for manual review of edge cases where a symbol may be present but not in a prohibited use. Our visual hate models typically achieve recall above 98% with a false positive rate below 0.1%. View our full documentation here.
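To make the multi-head layout and rule-based actioning more concrete, here is a minimal Python sketch. The head names, class names, and the 0.9 and 0.5 thresholds are hypothetical assumptions chosen for illustration, not Hive’s actual schema or any platform’s real policy. It only shows the general pattern: a softmax within each head over its mutually exclusive classes, followed by threshold-based routing to safe, restricted, or manual review.

```python
import math
from typing import Dict, List

# Hypothetical head layout: each head holds mutually exclusive classes.
# These names are illustrative assumptions, not Hive's actual schema.
HEADS: Dict[str, List[str]] = {
    "nazi": ["no_nazi", "yes_nazi"],
    "kkk": ["no_kkk", "yes_kkk"],
    "supremacist_propaganda": ["no_propaganda", "yes_propaganda"],
}


def softmax(logits: List[float]) -> List[float]:
    """Turn one head's raw logits into confidence scores that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_heads(logits_by_head: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Normalize each head independently, since classes only compete within a head."""
    return {
        head: dict(zip(HEADS[head], softmax(logits)))
        for head, logits in logits_by_head.items()
    }


def route(scores: Dict[str, Dict[str, float]],
          restrict_at: float = 0.9, review_at: float = 0.5) -> str:
    """Apply a hypothetical platform rule to the highest positive-class score."""
    top = max(
        (p for head in scores.values() for cls, p in head.items() if cls.startswith("yes_")),
        default=0.0,
    )
    if top >= restrict_at:
        return "restrict"       # confident enough to action automatically
    if top >= review_at:
        return "manual_review"  # edge case, e.g. a symbol shown in a non-prohibited use
    return "safe"


scores = score_heads({"nazi": [2.0, 0.0], "kkk": [0.0, 4.0]})
print(route(scores))  # restrict
```

Here the KKK head’s positive score is about 0.98, so the post is restricted automatically; a post whose highest positive score fell between the two thresholds would be routed to a human instead.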

Our text content moderation model is a multi-head classifier that now includes hate speech. This model automatically detects “hateful language”, defined with input from our clients as any language, expression, writing, or speech that expresses or incites violence against, attacks, degrades, or insults a particular group or an individual in a particular group. These groups are based on protected attributes such as race, ethnicity, national origin, gender, sex, sexual orientation, disability, and religion. Hateful language includes but is not limited to hate speech, hateful ideology, racial or ethnic slurs, and racism. View our full documentation here.

We are also breaking ground on the particularly challenging problem of understanding the relationship between visual and textual content, and we expect to add multimodal capabilities over the coming weeks. Multimodal learning allows our models to understand how text and visual content relate when they appear together. This type of learning is important for understanding the meaning of language in the context in which it is used. Accurate multimodal systems can avoid flagging cases where the visual content on its own may be considered hateful, but accompanying counterspeech text (where individuals speak out against the hateful content) negates the hateful signal in the visual content. Similarly, multimodal systems can help flag cases where the visual and textual content are not hateful independently but are hateful in the context of one another, such as hateful memes. Over time, we expect this capability to further reduce the need for human review of edge cases.
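The post does not describe how Hive’s multimodal models are built, so the following is only a generic illustration of one common fusion pattern, not Hive’s method. It assumes each post has already been encoded into an image embedding and a text embedding of equal size by separate encoders (not shown); `fuse` and `hate_logit` are hypothetical helpers. The product and difference terms are what let a downstream classifier respond to the combination of the two modalities rather than to each one separately.

```python
from typing import List, Sequence


def fuse(image_emb: Sequence[float], text_emb: Sequence[float]) -> List[float]:
    """Build joint features from an image embedding and a text embedding.

    The concatenated embeddings keep each modality's own signal, while the
    elementwise product and absolute difference describe how the two relate.
    """
    if len(image_emb) != len(text_emb):
        raise ValueError("embeddings must have the same dimension")
    product = [a * b for a, b in zip(image_emb, text_emb)]
    difference = [abs(a - b) for a, b in zip(image_emb, text_emb)]
    return list(image_emb) + list(text_emb) + product + difference


def hate_logit(features: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Linear head over the fused features; real weights would be learned from labeled pairs."""
    if len(features) != len(weights):
        raise ValueError("one weight per feature is required")
    return sum(w * f for w, f in zip(weights, features)) + bias


features = fuse([1.0, 2.0], [3.0, -1.0])
print(features)  # [1.0, 2.0, 3.0, -1.0, 3.0, -2.0, 2.0, 3.0]
```

A score computed from each modality in isolation, such as taking the larger of an image score and a text score, cannot express "hateful only together" or "hateful image, neutralized by its caption"; interaction features like these are one way to give a classifier that capacity.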

What’s Next?

Today’s release is a milestone we are proud of, but merely the first step in a multi-year commitment to helping platforms filter hate speech from their environments. We will continue to expand and enhance model classification with further input from additional moderation clients and industry groups.

How Hive is helping social platforms and BPOs manage emergent content moderation needs during the COVID-19 pandemic

Social platforms face significant PR and revenue risks during the coronavirus crisis, challenged to maintain safe environments in the face of constrained human content moderation and insufficient in-house AI; Hive is using AI and its distributed workforce of 2 million contributors to help

SAN FRANCISCO, CA (March 23, 2020) – The extraordinary measures taken worldwide to limit the spread of the coronavirus disease have disrupted the global economy, as businesses across industries scramble to adapt to a reality few were prepared for. In many cases, companies have stalled operations; notable examples include airlines, movie theaters, theme parks, and restaurants.

The disruption facing consumer technology companies like Google, Facebook, Twitter, and others is different. Engagement on social media platforms is unaffected, if not boosted, by the outbreak. However, beneath these engagement trends lie significant public relations and revenue risks if content moderation cannot keep up with the volume of user-generated content uploads.

Hive, a San Francisco-based AI company, has emerged as a leader in helping platforms navigate the disruption through a combination of data labeling services at scale and production-ready automated content moderation models.

Hive operates the world’s largest distributed workforce of humans labeling data, now more than 2 million contributors from more than 100 countries, and has been able to step in to support emergent content moderation data labeling needs as the contract workforces of business process outsourcers (BPOs) have been forced to go on hiatus because they cannot work from home. Further, Hive’s suite of automated content moderation models has consistently and significantly outperformed comparable models from top public clouds and is being used by more than 15 leading platforms to reduce the volume of content requiring human review.

Context for the Disruption

It is no secret that major social platforms employ tens of thousands of human content moderators to police uploaded content. These massive investments are made to maintain a brand-safe environment and protect billions of dollars of ad revenue from marketers who are quick to act when things go wrong.

Most of this moderation is done by contract workers, often sourced through outsourcing firms like Cognizant and Accenture. Work-from-home mandates spurred by COVID-19 have disrupted this model, as most of these moderators are not allowed to work from home. Platforms have suggested that they will use automated tools to help fill the gap during the disruption, but they have also acknowledged that this is likely to reduce effectiveness and slow response times.

How Hive is Helping

Hive has emerged in a unique position to meet emergent needs from social media platforms.

As BPOs have been forced to stand down onsite content moderation services, significant demand for data labeling has arisen. Hive has been able to meet these needs on short notice, mobilizing the world’s largest distributed workforce of humans labeling data, now more than 2 million contributors sourced from more than 100 countries. Hive’s workforce is paid to complete data labeling tasks through a consensus-driven workflow that yields high-quality ground truth data.
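The release does not spell out Hive’s consensus rules, so the sketch below assumes a simple one: collect at least three independent judgments per item and accept the majority label only when at least two thirds of contributors agree; otherwise the item needs more judgments. The `consensus_label` helper and both thresholds are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_label(votes: Sequence[str], min_votes: int = 3,
                    min_agreement: float = 2 / 3) -> Optional[str]:
    """Return the agreed label, or None if the item needs more judgments."""
    if len(votes) < min_votes:
        return None  # too few independent judgments so far
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) >= min_agreement:
        return label
    return None  # contributors disagree too much; route for extra votes


print(consensus_label(["nsfw", "nsfw", "clean"]))        # nsfw
print(consensus_label(["nsfw", "clean", "suggestive"]))  # None
```

A production workflow could also weight each contributor by their historical accuracy; this sketch treats every vote equally.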

“As more people worldwide stay close to home during the crisis and face unemployment or furloughs, our global workforce has seen significant daily growth and unprecedented capacity,” says Kevin Guo, Co-Founder and CEO of Hive.

Among data labeling service providers, Hive brings differentiated expertise to content moderation use cases. To date, Hive’s workforce has produced more than 80 million human annotations for “not safe for work” (NSFW) content and more than 40 million human annotations for violent content (e.g. guns, knives, blood). These preexisting job designs and the workforce’s familiarity with them have enabled new clients signed this week to get started with negligible setup.

Platforms are also relying on Hive to reduce the volume of content required for human review through use of Hive’s automated content moderation product suite. Hive’s models – which span visual, audio, and text solutions – have consistently and significantly outperformed comparable models from top public clouds, and are currently helping to power content moderation solutions for more than fifteen of the top social platforms.

Guo adds, “We have ample capacity for labeling and model deployment and are prepared to support the industry in helping to keep digital environments safe for consumers and brands as we all navigate the disruption caused by COVID-19.”

For press inquiries, contact Kevin Guo, Co-Founder and CEO, at kevin.guo@thehive.ai.

Updated Best-in-Class Automated Content Moderation Model

Improved content moderation suite with additional subclasses; now performs better than human moderators

The gold standard for content moderation has always been human moderators. Facebook alone reportedly employs more than 15,000 human moderators. There are critical problems with this manual approach – namely cost, effectiveness, and scalability. Headlines over recent months and years are scattered with high-profile quality issues – and, increasingly, press has covered significant mental health issues affecting full-time content moderators (View article from The Verge).

Here at Hive, we believe AI can transform industries and business processes. Content moderation is a perfect example: there is an obligation on platforms to do this better, and we believe Hive’s role is to power the ecosystem in better addressing the challenge.

We are excited to announce the general release of our enhanced content moderation product suite, featuring significantly improved NSFW and violence detections. Our NSFW model now achieves 97% accuracy and our violence model achieves 95% accuracy, considerably better than typical outsourced moderators (~80%), and even better than an individual Hive annotator (~93%).

Deep learning models are only as good as the data they are trained on, and Hive operates the world’s largest distributed workforce of humans labeling data – now nearly 2 million contributors globally (our data labeling platform is described in further detail in an earlier article).

In our new release, we have more than tripled the training data, built from a diverse set of user-generated content sourced from the largest content platforms in the world. Our NSFW model is now trained on more than 80 million human annotations and our violence model on more than 40 million human annotations.

Model Design

We were selective in our construction of the training dataset, and strategically added the most impactful training examples. For instance, we utilized active learning to select training images where the existing model results were the most uncertain. Deep learning models produce a confidence score on input images which ranges from 0 (very confident the image is not in the class) to 1.0 (very confident the image is in the class). By focusing our labeling efforts on those images in the middle range (0.4 – 0.6), we were able to improve model performance specifically on edge cases.
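The selection step described above is a form of uncertainty sampling. A minimal sketch, assuming the current model’s scores are available as a mapping from hypothetical image IDs to confidence values, keeps only images in the 0.4 to 0.6 band and, when the labeling budget is limited, prioritizes those closest to 0.5. The `select_uncertain` helper and its `budget` parameter are illustrative names, not part of any Hive API.

```python
from typing import Dict, List


def select_uncertain(scores: Dict[str, float], low: float = 0.4, high: float = 0.6,
                     budget: int = 1000) -> List[str]:
    """Return up to `budget` image IDs whose confidence lies in [low, high].

    Images nearest 0.5 (the model's least certain predictions) come first;
    ties are broken by image ID so the result is deterministic.
    """
    band = [(abs(s - 0.5), image_id) for image_id, s in scores.items() if low <= s <= high]
    band.sort()
    return [image_id for _, image_id in band[:budget]]


scores = {"a": 0.05, "b": 0.45, "c": 0.52, "d": 0.97, "e": 0.60}
print(select_uncertain(scores, budget=2))  # ['c', 'b']
```

Images the model already scores near 0 or 1 ("a" and "d" here) are skipped, since an extra human label on them would teach the model little.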

As part of this release, we also focused on reducing ambiguity in the ‘suggestive’ class of our NSFW model. We conducted a large manual inspection of images where Hive annotators tended to disagree or, even more crucially, where our model results disagreed with consensus Hive annotations. When examining images in certain ground truth sets, we noticed that up to 25% of disagreements between model predictions and human labels were due to erroneous labels, with the model prediction being accurate. Fixing these ground truth images was critical for improving model accuracy. For instance, in the NSFW model, we discovered that moderators disagreed on niche cases, such as which class leggings, contextually implied intercourse, or sheer clothing fell into. By carefully defining class boundaries and relabeling data accordingly, we were able to teach the model these distinctions, improving accuracy by as much as 20%.
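An audit like this starts from a list of disagreements worth re-inspecting. One simple way to build that list, sketched here under assumed data shapes (a hypothetical mapping from item ID to the model’s top class and its confidence, plus the stored consensus label), is to flag items where a confident prediction contradicts the label. The 0.9 cutoff is an illustrative assumption, and flagged items still go to careful manual review rather than being relabeled automatically.

```python
from typing import Dict, List, Tuple


def audit_candidates(predictions: Dict[str, Tuple[str, float]],
                     labels: Dict[str, str],
                     min_confidence: float = 0.9) -> List[str]:
    """List items where a confident model prediction disagrees with the stored label."""
    flagged = []
    for item_id, (predicted, confidence) in predictions.items():
        label = labels.get(item_id)
        if label is not None and predicted != label and confidence >= min_confidence:
            flagged.append(item_id)
    return sorted(flagged)  # stable order for reviewer queues


predictions = {"x": ("suggestive", 0.95), "y": ("clean", 0.97), "z": ("suggestive", 0.60)}
labels = {"x": "clean", "y": "clean", "z": "clean"}
print(audit_candidates(predictions, labels))  # ['x']
```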

Classified as clean:

Figure 1.1 - Updated examples of images classified as clean

Classified as suggestive:

Figure 1.2 - Updated examples of images classified as suggestive

For our violence model, client feedback showed that the knife and gun classes included instances of these weapons that wouldn’t be considered cause for alarm. For example, the model would flag guns in video games and knives used in cooking. Notably, companies like Facebook have publicly acknowledged the challenge of differentiating between animated and real guns (View article on TechCrunch). In this release, the model distinguishes between culinary and violent knives, and between animated and real guns, through the introduction of two new classes, so that weapon alerts are real and actionable.

Hive can now distinguish between animated guns and real guns:

Figure 2 – Examples of animated guns

Culinary knives like those in the following examples are no longer considered violent:

Figure 3 - Examples of culinary knives

Model Performance

The improvement of our new models compared to our old models is significant.

Our NSFW model was the first and most mature model we built, but after increasing training annotations from 58M to 80M, the model still improved dramatically. At 95% recall, our new model’s error rate is 2%, while our old model’s error rate was 4.2% – a decrease of more than 50%.

Our new violence model was trained on over 40M annotations – a more than 100% increase over the previous training set size of 16M annotations. Performance also improved significantly across all classes. At 90% recall, our new model’s error rate decreased from 27% to 10% (a 63% decrease) for guns, 23% to 10% (a 57% decrease) for knives, and 34% to 20% (a 41% decrease) for blood.
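These comparisons report error rates at a fixed recall without spelling out the computation. The sketch below assumes error rate means 1 − precision (the reading that matches the public-cloud comparison later in this post) and finds the highest score threshold that reaches the target recall on a labeled test set. `error_rate_at_recall` and `relative_decrease` are hypothetical helpers, not Hive tooling.

```python
from typing import List, Tuple


def error_rate_at_recall(scores: List[float], labels: List[int],
                         target_recall: float) -> Tuple[float, float]:
    """Return (threshold, 1 - precision) at the highest threshold meeting target_recall."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # A threshold cannot split tied scores, so consume the whole tie group.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        if tp / total_pos >= target_recall:
            return threshold, fp / (tp + fp)
    raise ValueError("target recall is unreachable")


def relative_decrease(old: float, new: float) -> float:
    """Fractional reduction from old to new, e.g. 0.27 -> 0.10 is about 0.63."""
    return (old - new) / old


print(error_rate_at_recall([0.9, 0.8, 0.7, 0.6, 0.5], [1, 1, 0, 1, 0], 0.9))  # (0.6, 0.25)
print(round(relative_decrease(0.27, 0.10), 2))  # 0.63
```

Applied to the guns figures above, `relative_decrease(0.27, 0.10)` reproduces the reported 63% reduction.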

Over the past year, we’ve conducted numerous head-to-head comparisons vs. other market solutions, using both our held-out test sets as well as evaluations using data from some of our largest clients. In all of these studies, Hive’s models came out well ahead of all the other models tested.

Figures 6 and 7 show data from a recent study conducted with one of our most prominent clients, Reddit. For this study, Hive processed 15,000 randomly selected images through our new model, as well as through the models of the top three public cloud players: Amazon Rekognition, Microsoft Azure, and Google Cloud’s Vision API.

At 90% recall, Hive’s precision is 99%, while the public clouds range between 68% and 78%. Treating error rate as 1 − precision, Hive’s error rate is 1% versus 22% to 32% for the public clouds, so our relative error rate is between 22x and 32x lower!

The outperformance of our violence model is similarly significant.

For guns, at 90% recall, Hive’s precision is 90%, while the public clouds achieve about 8%. This implies that our relative error rate is about 9.2x lower!

For knives, at 90% recall, Hive’s precision is 89%, while the public clouds achieve about 13%. This implies that our relative error rate is about 7.9x lower!

For blood, at 90% recall, Hive’s precision is 80%, while the public clouds range between 4% and 8%. This implies that our relative error rate is between 4.6x and 4.8x lower!
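The relative figures above follow from treating error rate as 1 − precision at the same recall. This short check, using only the precision values reported in the study, reproduces them; `relative_error_ratio` is a helper name introduced here for illustration.

```python
def relative_error_ratio(their_precision: float, our_precision: float) -> float:
    """How many times lower our error rate (1 - precision) is at the same recall."""
    return (1 - their_precision) / (1 - our_precision)


# Precision at 90% recall: (Hive, [public cloud values]) as reported in the study.
comparisons = {
    "nsfw": (0.99, [0.68, 0.78]),
    "guns": (0.90, [0.08]),
    "knives": (0.89, [0.13]),
    "blood": (0.80, [0.04, 0.08]),
}

for name, (ours, theirs) in comparisons.items():
    ratios = [round(relative_error_ratio(p, ours), 1) for p in theirs]
    print(name, ratios)
# nsfw [32.0, 22.0]
# guns [9.2]
# knives [7.9]
# blood [4.8, 4.6]
```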

Final Thoughts

This latest model release raises the bar on what is possible from automated content moderation solutions. Solutions like this will considerably reduce the costs of protecting digital environments and limit the need for harmful human moderation jobs across the world. Over the next few months, stay tuned for similar model releases in other relevant moderation classes such as drugs, hate speech and symbols, and propaganda.

For press or other inquiries, please contact Kevin Guo, Co-Founder and CEO (kevin.guo@thehive.ai).