
How Hive is helping social platforms and BPOs manage emergent content moderation needs during the COVID-19 pandemic

Social platforms face significant PR and revenue risks during the coronavirus crisis, challenged to maintain safe environments in the face of constrained human content moderation and insufficient in-house AI; Hive is using AI and its distributed workforce of 2 million contributors to help

SAN FRANCISCO, CA (March 23, 2020) – The extraordinary measures taken worldwide to limit the spread of the coronavirus disease have disrupted the global economy, as businesses across industries scramble to adapt to a reality few were prepared for. In many cases, companies have stalled operations – with notable examples including airlines, movie theaters, theme parks, and restaurants among others.

The disruption facing consumer technology companies like Google, Facebook, Twitter, and others is different. Engagement on social media platforms is unaffected, if not boosted, by the outbreak. However, underneath user trends are significant public relations and revenue risks if content moderation cannot keep up with the volume of user-generated content uploads.

Hive, a San Francisco-based AI company, has emerged as a leader in helping platforms navigate the disruption through a combination of data labeling services at scale and production-ready automated content moderation models.

Hive operates the world’s largest distributed workforce of humans labeling data, now more than 2 million contributors from more than 100 countries, and has been able to step in to support emergent content moderation data labeling needs as the contract workforces of business process outsourcers (BPOs) have been forced to go on hiatus because their moderators cannot work from home. Further, Hive’s suite of automated content moderation models has consistently and significantly outperformed comparable models from top public clouds, and is being used by more than 15 leading platforms to reduce the volume of content required for human review.

Context for the Disruption

It is no secret that major social platforms employ tens of thousands of human content moderators to police uploaded content. These massive investments are made to maintain a brand-safe environment and protect billions of dollars of ad revenue from marketers who are quick to act when things go wrong.

Most of this moderation is done by contract workers, often secured through outsourced labor from firms like Cognizant and Accenture. Work-from-home mandates spurred by COVID-19 have disrupted this model, as most of the moderators are not allowed to work from home. Platforms have suggested that they will use automated tools to help fill the gap during the disruption, but they have also acknowledged that this is likely to reduce effectiveness and to result in slower response times than normal.

How Hive is Helping

Hive has emerged in a unique position to meet emergent needs from social media platforms.

As BPOs have been forced to stand down onsite content moderation services, significant demand for data labeling has arisen. Hive has been able to meet these needs on short notice, mobilizing the world’s largest distributed workforce of humans labeling data, now more than 2 million contributors sourced from more than 100 countries. Hive’s workforce is paid to complete data labeling tasks through a consensus-driven workflow that yields high quality ground truth data.

“As more people worldwide stay close to home during the crisis and face unemployment or furloughs, our global workforce has seen significant daily growth and unprecedented capacity,” says Kevin Guo, Co-Founder and CEO of Hive.

Among data labeling service providers, Hive brings differentiated expertise to content moderation use cases. To date, Hive’s workforce has produced more than 80 million human annotations for “not safe for work” (NSFW) content and more than 40 million human annotations for violent content (e.g. guns, knives, blood). Those preexisting job designs and the workforce’s familiarity with them have kept job setup negligible for the new clients already signed this week.

Platforms are also relying on Hive to reduce the volume of content required for human review through use of Hive’s automated content moderation product suite. Hive’s models – which span visual, audio, and text solutions – have consistently and significantly outperformed comparable models from top public clouds, and are currently helping to power content moderation solutions for more than fifteen of the top social platforms.

Guo adds, “We have ample capacity for labeling and model deployment and are prepared to support the industry in helping to keep digital environments safe for consumers and brands as we all navigate the disruption caused by COVID-19.”

For press inquiries, contact Kevin Guo, Co-Founder and CEO, at kevin.guo@thehive.ai.


Hive Named to Fast Company’s Annual List of the World’s Most Innovative Companies for 2020

Hive has been named to Fast Company’s prestigious annual list of the World’s Most Innovative Companies for 2020

SAN FRANCISCO, CA (March 10, 2020) – Hive has been named to Fast Company’s prestigious annual list of the World’s Most Innovative Companies for 2020.

The list honors the businesses making the most profound impact on both industry and culture, showcasing a variety of ways to thrive in today’s fast-changing world. This year’s MIC list features 434 businesses from 39 countries.

“It’s an honor to be featured in Fast Company’s list of the Most Innovative Companies for 2020,” said Kevin Guo, Co-Founder and CEO of Hive. “This recognition follows a year of step-change growth in Hive’s business and team, and symbolizes our progress in powering practical AI solutions for enterprise customers across industries.”

Hive is a full-stack AI company specialized in computer vision and deep learning, serving clients across industries with data labeling, model licensing, and subscription data products. During 2019, Hive grew to more than 100 clients, including 10 companies with market capitalizations exceeding $100 billion.

At the core of Hive’s business, the company operates the world’s largest distributed workforce of humans labeling data – now boasting nearly 2 million registered contributors globally. Hive’s workforce hand-labeled more than 1.3 billion pieces of training data in 2019, inputs to a consensus-driven workflow that powers deep learning models with unparalleled accuracy compared to similar offerings from the largest public cloud providers.

The company’s core models serve use cases including automated content moderation, logo and object detection, optical character recognition, voice transcription, and context classification. Across its models, Hive processed nearly 20 billion API calls in 2019.

The company also operates Mensio, a media analytics platform developed in partnership with Bain & Company that integrates Hive’s proprietary TV content metadata on commercial airings and camera-visible sponsorship placements with third-party viewership and outcome datasets. Mensio is currently in use by leading TV network owners, brands, and agencies for competitive intelligence, media planning, and optimization.

Fast Company’s editors and writers sought out the most groundbreaking businesses on the planet and across myriad industries. They also judged nominations received through their application process.

The World’s Most Innovative Companies is Fast Company’s signature franchise and one of its most highly anticipated editorial efforts of the year. It provides both a snapshot and a road map for the future of innovation across the most dynamic sectors of the economy.

“At a time of increasing global volatility, this year’s list showcases the resilience and optimism of businesses across the world. These companies are applying creativity to solve challenges within their industries and far beyond,” said Fast Company senior editor Amy Farley, who oversaw the issue with deputy editor David Lidsky.

Fast Company’s Most Innovative Companies issue (March/April 2020) is now available online at fastcompany.com/most-innovative-companies/2020, as well as in app form via iTunes and on newsstands beginning March 17, 2020. The hashtag is #FCMostInnovative.

About Hive

Hive is an AI company specialized in computer vision and deep learning, focused on powering innovators across industries with practical AI solutions and data labeling, grounded in the world’s highest quality visual and audio metadata. For more information, visit thehive.ai.

About Fast Company

Fast Company is the only media brand fully dedicated to the vital intersection of business, innovation, and design, engaging the most influential leaders, companies, and thinkers on the future of business. Since 2011, Fast Company has received some of the most prestigious editorial and design accolades, including the American Society of Magazine Editors (ASME) National Magazine Award for “Magazine of the Year,” Adweek’s Hot List for “Hottest Business Publication,” and six gold medals and 10 silver medals from the Society of Publication Designers. The editor-in-chief is Stephanie Mehta and the publisher is Amanda Smith. Headquartered in New York City, Fast Company is published by Mansueto Ventures LLC, along with our sister publication Inc., and can be found online at www.fastcompany.com.


Updated Best-in-Class Automated Content Moderation Model

Improved content moderation suite with additional subclasses; now performs better than human moderators

The gold standard for content moderation has always been human moderators. Facebook alone reportedly employs more than 15,000 human moderators. There are critical problems with this manual approach, namely cost, effectiveness, and scalability. Headlines in recent months and years have been littered with high-profile quality issues, and the press has increasingly covered significant mental health issues affecting full-time content moderators (View article from The Verge).

Here at Hive, we believe AI can transform industries and business processes. Content moderation is a perfect example: there is an obligation on platforms to do this better, and we believe Hive’s role is to power the ecosystem in better addressing the challenge.

We are excited to announce the general release of our enhanced content moderation product suite, featuring significantly improved NSFW and violence detections. Our NSFW model now achieves 97% accuracy and our violence model achieves 95% accuracy, considerably better than typical outsourced moderators (~80%), and even better than an individual Hive annotator (~93%).

Deep learning models are only as good as the data they are trained on, and Hive operates the world’s largest distributed workforce of humans labeling data – now nearly 2 million contributors globally (our data labeling platform is described in further detail in an earlier article).

In our new release, we have more than tripled the training data, built off of a diverse set of user-generated content sourced from the largest content platforms in the world. Our NSFW model is now trained on more than 80 million human annotations and our violence model is trained on more than 40 million human annotations.

Model Design

We were selective in our construction of the training dataset, and strategically added the most impactful training examples. For instance, we utilized active learning to select training images where the existing model results were the most uncertain. Deep learning models produce a confidence score on input images which ranges from 0 (very confident the image is not in the class) to 1.0 (very confident the image is in the class). By focusing our labeling efforts on those images in the middle range (0.4 – 0.6), we were able to improve model performance specifically on edge cases.
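The selection step described above can be sketched roughly as follows. This is a minimal illustration of uncertainty-band sampling, not Hive’s production pipeline: the function name, the image IDs, and the scores are all hypothetical, and only the 0.4 to 0.6 band comes from the text.

```python
from typing import Dict, List


def select_uncertain(scores: Dict[str, float],
                     low: float = 0.4,
                     high: float = 0.6) -> List[str]:
    """Return IDs whose confidence score falls inside the uncertain band [low, high]."""
    picked = [image_id for image_id, score in scores.items() if low <= score <= high]
    # Put the most uncertain images (closest to 0.5) at the front of the labeling queue.
    return sorted(picked, key=lambda image_id: abs(scores[image_id] - 0.5))


# Hypothetical confidence scores from an existing model on unlabeled images.
scores = {"img_a": 0.02, "img_b": 0.55, "img_c": 0.97, "img_d": 0.41, "img_e": 0.50}
queue = select_uncertain(scores)
print(queue)
```

Images scored near 0 or 1 are skipped because the existing model is already confident about them; only the images in the middle band are sent for human labeling.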

As part of this release, we also focused on lessening ambiguity in our ‘suggestive’ class in the NSFW model. We conducted a large manual inspection of images where Hive annotators tended to disagree with one another or, even more crucially, where our model results disagreed with the consensus of Hive annotations. When examining images in certain ground truth sets, we noticed that up to 25% of disagreements between model predictions and human labels were due to erroneous labels, with the model prediction being accurate. Fixing these ground truth images was critical for improving model accuracy. For instance, in the NSFW model, we discovered that moderators disagreed on niche cases, such as which class leggings, contextually implied intercourse, or sheer clothing fell into. By carefully defining boundaries and relabeling data accordingly, we were able to teach the model the distinction between these classes, improving accuracy by as much as 20%.
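One way to surface images for this kind of manual inspection is sketched below. It is an assumption-laden illustration rather than a description of Hive’s actual workflow: the majority-vote rule, the 0.6 agreement threshold, and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def consensus_label(votes: List[str], min_agreement: float = 0.6) -> Optional[str]:
    """Return the majority label if enough annotators agree, else None."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None


def needs_review(votes: List[str], model_label: str) -> bool:
    """Flag an image when annotators disagree or the model contradicts their consensus."""
    consensus = consensus_label(votes)
    return consensus is None or consensus != model_label


# Hypothetical (annotator votes, model prediction) pairs.
examples = [
    (["suggestive", "suggestive", "clean"], "clean"),  # model contradicts consensus
    (["clean", "suggestive"], "clean"),                # annotators split, no consensus
    (["clean", "clean", "clean"], "clean"),            # everyone agrees
]
flags = [needs_review(votes, model_label) for votes, model_label in examples]
print(flags)
```

Flagged images would then go to a human reviewer, who decides whether the label or the model prediction was wrong.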

Classified as clean:

Figure 1.1 – Updated examples of images classified as clean

Classified as suggestive:

Figure 1.2 – Updated examples of images classified as suggestive

For our violence model, we noticed from client feedback that the knife and gun classes included instances of these weapons that wouldn’t be considered cause for alarm. For example, we would flag the presence of guns in video games and the presence of knives in cooking. It’s important to note that companies like Facebook have publicly acknowledged the challenge of differentiating between animated and real guns (View article on TechCrunch). In this release, the model distinguishes between culinary knives and violent knives, and between animated guns and real guns, through two new classes that provide real, actionable alerts on weapons.

Hive can now distinguish between animated guns and real guns:

Figure 2 – Examples of animated guns

Culinary knives such as the following are no longer classified as violent:

Figure 3 – Examples of culinary knives

Model Performance

The improvement of our new models compared to our old models is significant.

Our NSFW model was the first and most mature model we built, but after increasing training annotations from 58M to 80M, the model still improved dramatically. At 95% recall, our new model’s error rate is 2%, while our old model’s error rate was 4.2% – a decrease of more than 50%.

Our new violence model was trained on over 40M annotations – a more than 100% increase over the previous training set size of 16M annotations. Performance also improved significantly across all classes. At 90% recall, our new model’s error rate decreased from 27% to 10% (a 63% decrease) for guns, 23% to 10% (a 57% decrease) for knives, and 34% to 20% (a 41% decrease) for blood.
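The percentage decreases quoted above follow from the old and new error rates. The short check below reproduces them; the helper name is ours, and the inputs are the error rates stated in the text.

```python
def relative_decrease(old: float, new: float) -> float:
    """Percentage by which the error rate fell, relative to the old error rate."""
    return (old - new) / old * 100


# (old error rate %, new error rate %) as quoted in the text.
error_rates = {
    "nsfw": (4.2, 2.0),   # at 95% recall
    "guns": (27, 10),     # at 90% recall
    "knives": (23, 10),   # at 90% recall
    "blood": (34, 20),    # at 90% recall
}
decreases = {name: round(relative_decrease(old, new))
             for name, (old, new) in error_rates.items()}
print(decreases)
```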

Over the past year, we’ve conducted numerous head-to-head comparisons vs. other market solutions, using both our held-out test sets as well as evaluations using data from some of our largest clients. In all of these studies, Hive’s models came out well ahead of all the other models tested.

Figures 6 and 7 show data in a recent study conducted with one of our most prominent clients, Reddit. For this study, Hive processed 15,000 randomly selected images through our new model, as well as through models from the top three public cloud players: Amazon Rekognition, Microsoft Azure, and Google Cloud’s Vision API.

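At 90% recall, Hive’s precision is 99%; public clouds range between 68% and 78%. Taking the error rate as 100% minus precision (1% for Hive versus 22% to 32% for the public clouds), this implies that our relative error rate is between 22x and 32x lower!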

The outperformance of our violence model is similarly significant.

For guns, at recall 90%, Hive precision is 90%; public clouds achieve about 8%. This implies that our relative error rate is about 9.2x lower!

For knives, at recall 90%, Hive precision is 89%; public clouds achieve about 13%. This implies that our relative error rate is about 7.9x lower!

For blood, at recall 90%, Hive precision is 80%; public clouds range between 4 and 8%. This implies that our relative error rate is between 4.6x and 4.8x lower!
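The “x lower” multiples above can be reproduced by treating the error rate as 100% minus precision and dividing the public cloud error by Hive’s. The sketch below does that with the precision figures quoted in the text; the function name is our own.

```python
def error_ratio(hive_precision_pct: float, other_precision_pct: float) -> float:
    """How many times lower Hive's error rate is, with error = 100 - precision (in %)."""
    return (100 - other_precision_pct) / (100 - hive_precision_pct)


# (Hive precision %, public cloud precision %) at 90% recall, as quoted in the text.
comparisons = {
    "nsfw_best_cloud": (99, 78),
    "nsfw_worst_cloud": (99, 68),
    "guns": (90, 8),
    "knives": (89, 13),
    "blood_best_cloud": (80, 8),
    "blood_worst_cloud": (80, 4),
}
ratios = {name: round(error_ratio(hive, other), 1)
          for name, (hive, other) in comparisons.items()}
print(ratios)
```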

Final Thoughts

This latest model release raises the bar on what is possible from automated content moderation solutions. Solutions like this will considerably reduce the costs of protecting digital environments and limit the need for harmful human moderation jobs across the world. Over the next few months, stay tuned for similar model releases in other relevant moderation classes such as drugs, hate speech and symbols, and propaganda.

For press or other inquiries, please contact Kevin Guo, Co-Founder and CEO (kevin.guo@thehive.ai).