
Why We Worked with Parler to Implement Effective Content Moderation

Earlier today, The Washington Post published a feature detailing Hive’s work with social network Parler, and the role our content moderation solutions have played in protecting Parler’s community from harmful content and, as a result, in earning the app’s reinstatement to Apple’s App Store.

We are proud of this very public endorsement of the quality of our content moderation solutions. We also know that, with such a high-profile client use case, there may be questions beyond what the article itself could address about why we decided to work with Parler and what role we play in their solution. Detailed answers to those questions follow below.

Why did Hive decide to work with Parler?

We believe that every company should have access to best-in-class content moderation capabilities to create a safe environment for their users. While other vendors terminated their relationships with Parler earlier this year out of concern that their services were enabling a toxic environment, we believe our work addresses the core challenge Parler faced and creates a safe community in which Parler’s users can engage.

As outlined in our recent Series D funding announcement, our founders’ precursor to Hive was a consumer app business that itself confronted the challenge of moderating content at scale as the platform quickly grew. The lack of available enterprise-grade, pre-trained AI models to support this content moderation use case (and others) eventually inspired an ambitious repositioning of the company around building a portfolio of cloud-based enterprise AI solutions.

Our founders were not alone. Content moderation has since emerged as a key area of growth in Hive’s business, now powering automated content moderation solutions for more than 75 platforms globally, including prominent dating services, video chat applications, verification services, and more. A December 2020 WIRED article detailed the impact of our work with iconic random chat platform Chatroulette.

When Parler approached us for help in implementing a content moderation solution for their community, we did not take the decision lightly. However, after discussion, we agreed that we built this product precisely to democratize access to best-in-class content moderation technology. From our founders’ personal experience, we know it is not feasible for most companies to build effective moderation solutions internally, and we therefore believe we have a responsibility to help any and all companies keep their communities safe from harmful content.

What is Hive’s role in content moderation relative to Parler (or Hive’s other moderation clients)?

Hive provides automated content moderation across video, image, text, and audio, spanning more than 40 classes (i.e., granular definitions of potentially harmful content classifications such as male nudity, gun in hand, or illegal injectables).

Our standard API provides a confidence score for every content submission against all of our 40+ model classes. In Parler’s case, model-flagged instances of hate speech or incitement in text are additionally reviewed by members of Hive’s distributed workforce of more than 2.5 million registered contributors (additional details below).

Our clients map our responses to their individual content policies: which categories they look to identify, how sensitive content is treated (i.e., blocked or filtered), and the tradeoff between recall (the percentage of total instances our model identifies) and precision (the percentage of those identifications that are correct). Hive partners with clients during onboarding as well as on an ongoing basis to provide guidance on setting class-specific thresholds based on client objectives and the desired tradeoffs between recall and precision.
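To make that mapping concrete, here is a minimal sketch of how a client might translate per-class confidence scores into policy actions using class-specific thresholds. The class names, thresholds, and response shape below are illustrative assumptions, not Hive’s actual API schema.

```python
# Hypothetical client-side policy mapping; class names, thresholds, and the
# response shape are illustrative only, not Hive's actual API schema.

# Example model response: one confidence score per class for a single submission.
scores = {"general_nsfw": 0.97, "gun_in_hand": 0.12, "hate_speech": 0.55}

# Class-specific thresholds chosen by the client to balance recall vs. precision.
POLICY = {
    "general_nsfw": {"threshold": 0.90, "action": "filter"},   # place behind a filter
    "gun_in_hand":  {"threshold": 0.80, "action": "block"},    # remove from platform
    "hate_speech":  {"threshold": 0.50, "action": "review"},   # route to human review
}

def decide(scores):
    """Return the list of (class, action) pairs triggered by one submission."""
    triggered = [
        (cls, rule["action"])
        for cls, rule in POLICY.items()
        if scores.get(cls, 0.0) >= rule["threshold"]
    ]
    return triggered or [("none", "allow")]

print(decide(scores))  # [('general_nsfw', 'filter'), ('hate_speech', 'review')]
```

Lowering a class threshold raises recall for that class at the cost of precision, which is exactly the tradeoff described above.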

It is then the responsibility of companies like Apple to determine whether the way our clients choose to implement our technology is sufficient for distribution in their app stores; in Parler’s case, Apple has now determined that it is.

What percentage of content is moderated, and how fast?

100% of posts on Parler are processed through Hive’s models at the point of upload, with automated responses returned in under 1 second.

Parler uses Hive’s visual moderation model to identify nudity, violence, and gore. Any harmful content identified is immediately placed behind a sensitive content filter at the point of upload (notifying users of sensitive content before they view).

Parler also uses Hive’s text moderation model to identify hate speech and incitement. Any potentially harmful content is routed for manual review. Posts deemed safe by Hive’s models are immediately posted to the site, whereas flagged posts are not displayed until model results are validated by a consensus of human workers; it typically takes 1-3 minutes for a flagged post to be validated. Posts containing incitement are blocked from appearing on the platform; posts containing hate speech are placed behind a sensitive content filter. Human review is completed by thousands of workers within Hive’s distributed workforce of more than 2.5 million registered contributors who have opted into the Parler jobs and are specifically trained and qualified to complete them.
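The automated portion of that flow can be summarized in a short sketch. This is a simplified illustration under stated assumptions (placeholder class names and a single threshold), not Parler’s or Hive’s production code.

```python
# Simplified illustration of the publish / filter / hold-for-review flow described
# above; class names and the single threshold are placeholders, not production code.

def moderate_post(visual_scores, text_scores, threshold=0.9):
    decision = {"visible": True, "filtered": False, "needs_human_review": False}

    # Visual model: nudity, violence, and gore go behind a sensitive-content filter.
    if any(visual_scores.get(c, 0.0) >= threshold for c in ("nsfw", "violence", "gore")):
        decision["filtered"] = True

    # Text model: potential hate speech or incitement is held until a consensus of
    # human reviewers validates the model result (typically within 1-3 minutes).
    if any(text_scores.get(c, 0.0) >= threshold for c in ("hate_speech", "incitement")):
        decision["visible"] = False
        decision["needs_human_review"] = True

    return decision

print(moderate_post({"nsfw": 0.95}, {"incitement": 0.2}))
# {'visible': True, 'filtered': True, 'needs_human_review': False}
```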

In addition to the automated workflow, any user-reported content is automatically routed to Hive’s distributed workforce for additional review and Parler independently maintains a separate jury of internal moderators that handle appeals and other reviews.

This process is illustrated in the graphic below.

How effective is Hive’s moderation of content for Parler, and how does that compare to moderation solutions in place on other social networks?

We have run ongoing tests since launch to evaluate the effectiveness of our models specific to Parler’s content. While we believe that these benchmarks demonstrate best-in-class moderation, there will always be some level of false negatives. However, the models continue to learn from their mistakes, which will further improve the accuracy over time.

Within visual moderation, our tests suggest the incidence rate of adult nudity and sexual activity content not placed behind a sensitive content filter is less than 1 in 10,000 posts. In Facebook’s Q4 2020 Transparency Report (which, separately, we think is a great step forward for the industry and something all platforms should publish), the prevalence of adult nudity and sexual activity content on Facebook was reported as roughly 3 to 4 views per 10,000 views. These numbers are broadly comparable if we assume that posts containing sensitive content are viewed, on average, about as often as other posts.

Within text moderation, our tests suggest the incidence rate of hate speech (defined as text hateful towards another person or group based on protected attributes, such as religion, nationality, race, sexual orientation, gender, etc.) not placed behind a sensitive content filter was roughly 2 in 10,000 posts. In Q4 2020, Facebook reported the prevalence of hate speech at 7 to 8 views per 10,000 views on their platform.

Our incidence rate of incitement (defined as text that incites or promotes acts of violence) not removed from the platform was roughly 1 in 10,000 posts. This category is not reported by Facebook for the purposes of benchmarking.

Does Hive’s solution prevent the spread of misinformation?

Hive’s scope of work with Parler does not currently include the identification of misinformation or manipulated media (i.e., deepfakes).

We hope the details above are helpful in understanding how we work with social networking sites such as Parler and the role we play in keeping their environments (and others) safe from harmful content.

Learn more at https://thehive.ai/ and follow us on LinkedIn.

Press with additional questions? Please contact press@thehive.ai to request an interview or additional statements.

Note: All data specific to Parler above was shared with explicit permission from Parler.



Hive Adds Hate Model to Fully-Automated Content Moderation Suite

Social media platforms increasingly play a pivotal role in both spreading and combating hate speech and discrimination today. Now integrated into Hive’s content moderation suite, Hive’s hate model enables more proactive and comprehensive visual and textual moderation of hate speech online.

Year over year, our content moderation suite has emerged as the preeminent AI-powered solution for both helping platforms keep their environments protected from harmful content and dramatically reducing the exposure of human moderators to sensitive content. Hive’s content moderation models have consistently and significantly outperformed comparable models, and we are proud to currently work with more than 30 of the world’s largest and fastest-growing social networks and digital video platforms.

Today we are excited to officially integrate our hate model into our content moderation product suite, helping our current and future clients combat racism and hate speech online. We believe that blending our best-in-class models with the significant scale of our clients’ platforms can result in real step-change impact.

Detecting hate speech is a unique and rapidly evolving challenge. Context and subtle nuances vary widely across cultures, languages, and regions. Additionally, hate speech itself isn’t always explicit. Models must be able to recognize subtleties quickly and proactively. Hive is committed to taking on that challenge, and over the past months we have partnered with several of our clients to ready our hate model for today’s launch.

How We Help

Hate speech can occur both visually and textually, with a large percentage occurring in photos and videos. Powered by our distributed global workforce of more than 2 million registered contributors, Hive’s hate model is trained on more than 25 million human judgments and supports both visual classification models and text moderation models.

Our visual classification models classify entire images into different categories by assigning a confidence score for each class. These models can be multi-headed, where each group of mutually exclusive model classes belongs to a single model head. Within our hate model, example heads include Nazi symbols, KKK symbols, and other terrorist or white supremacist propaganda. Results from our model are actioned according to platform rules: many posts are automatically actioned as safe or restricted, while others are routed for manual review of edge cases where a symbol may be present but not in a prohibited use. Our visual hate models typically achieve >98% recall at a <0.1% false positive rate. View our full documentation here.
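As a rough illustration of how a multi-headed response can be consumed, the sketch below groups mutually exclusive classes under each head and flags any head whose positive class clears a threshold. Head and class names are examples only and do not mirror Hive’s exact schema.

```python
# Illustrative multi-headed response; head and class names are examples only and
# do not mirror Hive's exact schema. Scores within a head sum to 1 because the
# classes in each head are mutually exclusive.

response = {
    "nazi_symbol": {"present": 0.02, "absent": 0.98},
    "kkk_symbol":  {"present": 0.91, "absent": 0.09},
}

def flagged_heads(response, threshold=0.90):
    """Return the heads whose 'present' class meets or exceeds the threshold."""
    return [head for head, classes in response.items()
            if classes.get("present", 0.0) >= threshold]

print(flagged_heads(response))  # ['kkk_symbol']
```

Flagged heads can then be auto-actioned or routed for manual review of edge cases, as described above.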

Our text content moderation model is a multi-head classifier that will now include hate speech. This model automatically detects “hateful language” – defined, with input from our clients, as any language, expression, writing, or speech that expresses / incites violence against, attacks, degrades, or insults a particular group or an individual in a particular group. These specific groups are based on protected attributes such as race, ethnicity, national origin, gender, sex, sexual orientation, disability, and religion. Hateful language includes but is not limited to hate speech, hateful ideology, racial / ethnic slurs, and racism. View our full documentation here.

We are also breaking ground on the particularly challenging problem of multimodal relationships between visual and textual content, and expect to add multimodal capabilities over the coming weeks. Multimodal learning allows our models to understand the relationship between text and visual content appearing in the same setting. This type of learning is important for better understanding the meaning of language and the context in which it is used. Accurate multimodal systems can avoid flagging cases where the visual content on its own may be considered hateful, but the presence of counterspeech text (where individuals speak out against the hateful content) negates the hateful signal in the visual content. Similarly, multimodal systems can help flag cases where the visual and textual content are not independently hateful but become hateful in the context of one another, such as hateful memes. Over time, we expect this capability to further reduce the need for human review of edge cases.
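For readers curious what multimodal fusion can look like in practice, here is a minimal late-fusion sketch in PyTorch. It is purely illustrative of the general technique, not Hive’s architecture, and the embedding dimensions are arbitrary assumptions.

```python
# Minimal late-fusion sketch of a multimodal classifier (illustrative of the general
# technique only, not Hive's architecture). Assumes precomputed image and text embeddings.
import torch
import torch.nn as nn

class MultimodalClassifier(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=768, hidden=512, num_classes=2):
        super().__init__()
        self.fusion = nn.Sequential(
            nn.Linear(img_dim + txt_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),  # e.g., hateful vs. not hateful
        )

    def forward(self, img_emb, txt_emb):
        # Concatenating both modalities lets the classifier learn interactions, e.g. a
        # benign-looking image and benign text that are hateful only in combination.
        return self.fusion(torch.cat([img_emb, txt_emb], dim=-1))

model = MultimodalClassifier()
logits = model(torch.randn(4, 2048), torch.randn(4, 768))
print(logits.shape)  # torch.Size([4, 2])
```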

What’s Next?

Today’s release is a milestone we are proud of, but merely the first step in a multi-year commitment to helping platforms filter hate speech from their environments. We will continue to expand and enhance model classification with further input from additional moderation clients and industry groups.


How Hive is helping social platforms and BPOs manage emergent content moderation needs during the COVID-19 pandemic

Social platforms face significant PR and revenue risks during the coronavirus crisis, challenged to maintain safe environments in the face of constrained human content moderation and insufficient in-house AI; Hive is using AI and its distributed workforce of 2 million contributors to help

SAN FRANCISCO, CA (March 23, 2020) – The extraordinary measures taken worldwide to limit the spread of the coronavirus have disrupted the global economy, as businesses across industries scramble to adapt to a reality few were prepared for. In many cases, companies have stalled operations entirely, with notable examples including airlines, movie theaters, theme parks, and restaurants.

The disruption facing consumer technology companies like Google, Facebook, and Twitter is different. Engagement on social media platforms is unaffected, if not boosted, by the outbreak. Beneath those usage trends, however, lie significant public relations and revenue risks if content moderation cannot keep up with the volume of user-generated content uploads.

Hive, a San Francisco-based AI company, has emerged as a leader in helping platforms navigate the disruption through a combination of data labeling services at scale and production-ready automated content moderation models.

Hive operates the world’s largest distributed workforce of humans labeling data – now more than 2 million contributors from more than 100 countries – and has been able to step in to support emergent content moderation data labeling needs as the contract workforces of business process outsourcers (BPOs) have been forced to go on hiatus, given their inability to work from home. Further, Hive’s suite of automated content moderation models has consistently and significantly outperformed comparable models from top public clouds, and is being used by more than 15 leading platforms to reduce the volume of content requiring human review.

Context for the Disruption

It is no secret that major social platforms employ tens of thousands of human content moderators to police uploaded content. These massive investments are made to maintain a brand-safe environment and to protect billions of dollars of ad revenue from marketers, who are quick to act when things go wrong.

Most of this moderation is done by contract workers, often secured through outsourced labor from firms like Cognizant and Accenture. Work from home mandates spurred by COVID-19 have disrupted this model, as most of the moderators are not allowed to work from home. Platforms have suggested that they will use automated tools to help fill the gap during the disruption, but they have also acknowledged that this is likely to reduce effectiveness and to result in slower response times than normal.

How Hive is Helping

Hive is in a unique position to meet emergent needs from social media platforms.

As BPOs have been forced to stand down onsite content moderation services, significant demand for data labeling has arisen. Hive has been able to meet these needs on short notice, mobilizing the world’s largest distributed workforce of humans labeling data, now more than 2 million contributors sourced from more than 100 countries. Hive’s workforce is paid to complete data labeling tasks through a consensus-driven workflow that yields high quality ground truth data.

“As more people worldwide stay close to home during the crisis and face unemployment or furloughs, our global workforce has seen significant daily growth and unprecedented capacity,” says Kevin Guo, Co-Founder and CEO of Hive.

Among data labeling service providers, Hive brings differentiated expertise to content moderation use cases. To date, Hive’s workforce has produced more than 80 million human annotations for “not safe for work” (NSFW) content and more than 40 million human annotations for violent content (e.g., guns, knives, blood). Those preexisting job designs and that workforce familiarity have enabled negligible job setup for new clients signed this week alone.

Platforms are also relying on Hive to reduce the volume of content required for human review through use of Hive’s automated content moderation product suite. Hive’s models – which span visual, audio, and text solutions – have consistently and significantly outperformed comparable models from top public clouds, and are currently helping to power content moderation solutions for more than fifteen of the top social platforms.

Guo adds, “We have ample capacity for labeling and model deployment and are prepared to support the industry in helping to keep digital environments safe for consumers and brands as we all navigate the disruption caused by COVID-19.”

For press inquiries, contact Kevin Guo, Co-Founder and CEO, at kevin.guo@thehive.ai.


Updated Best-in-Class Automated Content Moderation Model

Improved content moderation suite with additional subclasses; now performs better than human moderators

The gold standard for content moderation has always been human moderators. Facebook alone reportedly employs more than 15,000 human moderators. But there are critical problems with this manual approach – namely cost, effectiveness, and scalability. Recent months and years have seen a string of high-profile quality failures, and the press has increasingly covered the significant mental health issues affecting full-time content moderators (View article from The Verge).

Here at Hive, we believe AI can transform industries and business processes. Content moderation is a perfect example: there is an obligation on platforms to do this better, and we believe Hive’s role is to power the ecosystem in better addressing the challenge.

We are excited to announce the general release of our enhanced content moderation product suite, featuring significantly improved NSFW and violence detections. Our NSFW model now achieves 97% accuracy and our violence model achieves 95% accuracy, considerably better than typical outsourced moderators (~80%), and even better than an individual Hive annotator (~93%).

Deep learning models are only as good as the data they are trained on, and Hive operates the world’s largest distributed workforce of humans labeling data – now nearly 2 million contributors globally (our data labeling platform is described in further detail in an earlier article).

In our new release, we have more than tripled the training data, built from a diverse set of user-generated content sourced from the largest content platforms in the world. Our NSFW model is now trained on more than 80 million human annotations and our violence model on more than 40 million human annotations.

Model Design

We were selective in constructing the training dataset, strategically adding the most impactful training examples. For instance, we used active learning to select training images where the existing model’s results were most uncertain. Deep learning models produce a confidence score for each input image, ranging from 0 (very confident the image is not in the class) to 1.0 (very confident the image is in the class). By focusing our labeling efforts on images in the middle range (0.4 – 0.6), we were able to improve model performance specifically on edge cases.
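A minimal sketch of that selection step, assuming the current model’s per-image confidence scores are already available:

```python
# Active-learning selection sketch: keep images whose model confidence falls in the
# uncertain band so that labeling effort is concentrated on edge cases.

def select_uncertain(predictions, low=0.4, high=0.6):
    """predictions: iterable of (image_id, confidence) pairs from the current model."""
    return [image_id for image_id, confidence in predictions if low <= confidence <= high]

preds = [("img_001", 0.03), ("img_002", 0.47), ("img_003", 0.92), ("img_004", 0.58)]
print(select_uncertain(preds))  # ['img_002', 'img_004']
```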

As part of this release, we also focused on reducing ambiguity in the ‘suggestive’ class of the NSFW model. We conducted a large manual inspection of images where Hive annotators tended to disagree or, even more crucially, where our model results disagreed with the consensus Hive annotations. When examining images in certain ground truth sets, we noticed that up to 25% of disagreements between model predictions and human labels were due to erroneous labels, with the model prediction being correct. Fixing these ground truth images was critical for improving model accuracy. For instance, in the NSFW model, we discovered that moderators disagreed on niche cases, such as which class leggings, contextually implied intercourse, or sheer clothing fell into. By carefully defining boundaries and relabeling data accordingly, we were able to teach the model the distinction between these classes, improving accuracy by as much as 20%.
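That disagreement mining can be expressed very simply. The sketch below flags examples where a binary model prediction contradicts the consensus label so they can be queued for manual re-inspection; the data layout is hypothetical.

```python
# Hypothetical sketch of surfacing model/label disagreements for manual re-inspection.

def find_disagreements(examples, threshold=0.5):
    """examples: dicts with 'id', 'consensus_label' (0 or 1), and 'model_score'
    (the model's confidence that the example belongs to the positive class)."""
    suspects = []
    for ex in examples:
        predicted_positive = ex["model_score"] >= threshold
        labeled_positive = ex["consensus_label"] == 1
        if predicted_positive != labeled_positive:
            suspects.append(ex["id"])  # queue for relabeling review
    return suspects
```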

Classified as clean:

Figure 1.1 – Updated examples of images classified as clean

Classified as suggestive:

Figure 1.2 – Updated examples of images classified as suggestive

For our violence model, we noticed from client feedback that the knife and gun classes included instances of these weapons that wouldn’t be considered cause for alarm. For example, we would flag the presence of guns in video games and the presence of knives in cooking content. It’s important to note that companies like Facebook have publicly acknowledged the challenge of differentiating between animated and real guns (View article on TechCrunch). In this release, we introduced two new classes so the model now distinguishes culinary knives from threatening knives and animated guns from real guns, providing genuinely actionable alerts on weapons.

Hive can now distinguish between animated guns and real guns:

Figure 2 – Examples of animated guns

The following knife image is no longer considered violent:

Figure 3 – Examples of culinary knives

Model Performance

The improvement of our new models compared to our old models is significant.

Our NSFW model was the first and most mature model we built, but after increasing training annotations from 58M to 80M, the model still improved dramatically. At 95% recall, our new model’s error rate is 2%, while our old model’s error rate was 4.2% – a decrease of more than 50%.

Our new violence model was trained on over 40M annotations – a more than 100% increase over the previous training set size of 16M annotations. Performance also improved significantly across all classes. At 90% recall, our new model’s error rate decreased from 27% to 10% (a 63% decrease) for guns, 23% to 10% (a 57% decrease) for knives, and 34% to 20% (a 41% decrease) for blood.

Over the past year, we’ve conducted numerous head-to-head comparisons vs. other market solutions, using both our held-out test sets as well as evaluations using data from some of our largest clients. In all of these studies, Hive’s models came out well ahead of all the other models tested.

Figures 6 and 7 show data in a recent study conducted with one of our most prominent clients, Reddit. For this study, Hive processed 15,000 randomly selected images through our new model, as well as the top three public cloud players: Amazon Rekognition, Microsoft Azure, and Google Cloud’s Vision API.

At recall 90%, Hive precision is 99%; public clouds range between 68 and 78%. This implies that our relative error rate is between 22x and 32x lower!
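For readers following the arithmetic: at a fixed recall, a precision of p implies a false-positive error rate of 1 − p among flagged items, and the multiples quoted here and below are ratios of those error rates. A quick sketch:

```python
# How the "22x to 32x lower" figure follows from the precision numbers above:
# at a fixed recall, error rate among flagged items = 1 - precision.

hive_precision = 0.99
cloud_precisions = (0.78, 0.68)

hive_error = 1 - hive_precision                    # 0.01
cloud_errors = [1 - p for p in cloud_precisions]   # 0.22, 0.32

print([round(e / hive_error) for e in cloud_errors])  # [22, 32]
```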

The outperformance of our violence model is similarly significant.

For guns, at recall 90%, Hive precision is 90%; public clouds achieve about 8%. This implies that our relative error rate is about 9.2x lower!

For knives, at recall 90%, Hive precision is 89%; public clouds achieve about 13%. This implies that our relative error rate is about 7.9x lower!

For blood, at recall 90%, Hive precision is 80%; public clouds range between 4 and 8%. This implies that our relative error rate is between 4.6x and 4.8x lower!

Final Thoughts

This latest model release raises the bar on what is possible from automated content moderation solutions. Solutions like this will considerably reduce the costs of protecting digital environments and limit the need for harmful human moderation jobs across the world. Over the next few months, stay tuned for similar model releases in other relevant moderation classes such as drugs, hate speech and symbols, and propaganda.

For press or other inquiries, please contact Kevin Guo, Co-Founder and CEO (kevin.guo@thehive.ai).



The Effect of Dirty Data on Deep Learning Systems

Introduction

Better training data can significantly boost the performance of a deep learning model, especially when deployed in production. In this blog post, we illustrate the impact of dirty data and why correct labeling is important for increasing model accuracy.

Background

An adversarial attack fools an image classifier by adding an imperceptible amount of noise to an image. One possible defense is simply to train machine learning models on adversarial examples; we can collect hard examples through hard-example mining and add them to the dataset. Another interesting architecture to explore is the generative adversarial network (GAN), which generally consists of two parts: a generator that produces fake examples in order to fool the discriminator, and a discriminator that distinguishes real examples from fake ones.
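As a concrete illustration of the kind of attack described above, the fast gradient sign method (FGSM) perturbs an image in the direction that most increases the model’s loss. The sketch below is a generic PyTorch version for illustration only, not code used in this study.

```python
# Minimal FGSM adversarial-example sketch (generic illustration, not code from this
# study). Assumes a differentiable PyTorch classifier and image tensors in [0, 1].
import torch
import torch.nn.functional as F

def fgsm_attack(model, images, labels, epsilon=0.01):
    """images: batch (N, C, H, W); labels: class indices (N,).
    Returns the batch perturbed by a small step that increases the classification loss."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    adversarial = images + epsilon * images.grad.sign()  # imperceptible signed-gradient step
    return adversarial.clamp(0, 1).detach()
```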

Another possible type of attack, data poisoning, can happen at training time. The attacker identifies weak parts of a machine learning pipeline and modifies the training data to confuse the model. Even slight perturbations to the training data and labels can result in worse performance. There are several methods to defend against such data poisoning attacks; for example, it is possible to separate clean training examples from poisoned ones so that the outliers are deleted from the dataset.

In this blog post, we investigate the impact of data poisoning (dirty data) using a simulation method: random labeling noise. We show that with the same model architecture and dataset size, better data labeling yields a large increase in accuracy.

Data

We experiment with the CIFAR-100 dataset, which has 100 classes and 600 32×32 coloured images per class.

We use the following steps to preprocess the images in the dataset (a code sketch follows the list):

  • Pad each image to 36×36, then randomly crop to 32×32 patch
  • Apply random flip horizontally
  • Distort image brightness and contrast randomly
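A sketch of this preprocessing pipeline written with torchvision; the original training framework is not specified in this post, so the library choice and jitter strengths are assumptions.

```python
# Sketch of the preprocessing above using torchvision; the library choice and the
# brightness/contrast jitter strengths are assumptions, not the exact values used.
import torchvision.transforms as T

train_transform = T.Compose([
    T.Pad(2),                                      # 32x32 -> 36x36
    T.RandomCrop(32),                              # random 32x32 patch
    T.RandomHorizontalFlip(),                      # random horizontal flip
    T.ColorJitter(brightness=0.4, contrast=0.4),   # random brightness/contrast distortion
    T.ToTensor(),
])
```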

The dataset is randomly split into 50k training images and 10k evaluation images. Random labeling is the substitution of training labels with random labels drawn from the marginal distribution of the data labels. We add different amounts of random labeling noise to the training data by shuffling a certain fraction of labels within each class; the images to be relabeled are chosen at random from each class, so the generated dataset remains balanced. Note that evaluation labels are not changed.
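One simple way to implement that label-shuffling scheme is sketched below; this is illustrative, and the exact sampling used in the experiments may differ.

```python
# Illustrative label-noise injection: for each class, relabel a fixed fraction of its
# training examples with labels drawn from the other classes.
import random

def poison_labels(labels, num_classes=100, noise_fraction=0.2, seed=0):
    """labels: list of ints in [0, num_classes). Returns a noisy copy."""
    rng = random.Random(seed)
    noisy = list(labels)
    for c in range(num_classes):
        idxs = [i for i, y in enumerate(labels) if y == c]
        rng.shuffle(idxs)
        for i in idxs[: int(noise_fraction * len(idxs))]:
            noisy[i] = rng.choice([k for k in range(num_classes) if k != c])
    return noisy
```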

We test the model on four different datasets: one clean and three noisy.

  • Clean: No random noise. We assume that all labels in the original CIFAR-100 dataset are correct. Named as ‘no_noise’.
  • Noisy: 20% random labeling noise. Named as ‘noise_20’.
  • Noisy: 40% random labeling noise. Named as ‘noise_40’.
  • Noisy: 60% random labeling noise. Named as ‘noise_60’.

Note that we chose aggressive data poisoning because the production models we build are robust to small amounts of random noise. The random labeling scheme allows us to simulate the effect of dirty data (data poisoning) in a real-world scenario.

Model

We investigate the impact of dirty data on one of the most popular model architectures, ResNet-152. Normally it is a good idea to fine-tune from pre-trained checkpoints to reach better accuracy in fewer training steps. In this post, however, the model is trained from scratch, because we want a general idea of how noisy data affects training and final results without any prior knowledge gained from pretraining.

We optimize the model with the SGD (stochastic gradient descent) optimizer and cosine learning rate decay.
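A sketch of this training setup in PyTorch follows; the hyperparameter values are illustrative assumptions, and a recent torchvision is assumed for the model constructor.

```python
# Sketch of the training setup: ResNet-152 from scratch on CIFAR-100, SGD with cosine
# learning-rate decay. Hyperparameter values are illustrative assumptions.
import torch
import torchvision

model = torchvision.models.resnet152(weights=None, num_classes=100)  # no pretraining
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)  # 200 epochs
criterion = torch.nn.CrossEntropyLoss()

def train_one_epoch(loader, device="cpu"):
    model.train()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # advance the cosine learning-rate schedule once per epoch
```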

Results

Quantitative results:

Accuracy

Cleaner datasets consistently perform better on the validation set. The model trained on the original CIFAR-100 dataset gives us 0.65 accuracy, and using top-5 predictions boosts the accuracy to 0.87. Testing accuracy decreases as more noise is added: each time we add 20% more random noise to the training data, testing accuracy drops by roughly 10%. Note that even with 60% random labeling noise, our model still manages to reach 0.24 accuracy on the validation set. The variance of the training data, the preprocessing methods, and the regularization terms help increase the robustness of the model, so even when learning from a very noisy dataset the model is still able to learn some useful features, although overall performance significantly degrades.
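For reference, the top-1 and top-5 accuracy figures above can be computed directly from model logits; a minimal sketch:

```python
# Minimal top-k accuracy sketch (top-1 and top-5 as reported above).
import torch

def topk_accuracy(logits, targets, k=5):
    """Fraction of examples whose true class is among the k highest-scoring predictions."""
    topk = logits.topk(k, dim=1).indices               # (N, k) predicted class ids
    correct = (topk == targets.unsqueeze(1)).any(dim=1)
    return correct.float().mean().item()
```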

Qualitative results:

Figures: learning curves, training losses, and precision-recall curves for each dataset.


Conclusion

In this post we investigated the impact of data poisoning attacks on performance, using image classification as an example task and random labeling as the simulation method. We showed that a popular model (ResNet-152) is somewhat robust to data poisoning, but performance still degrades significantly after poisoning. High-quality labeling is thus crucial to modern deep learning systems.