
OCR Moderation with Hive: New Approaches to Online Content Moderation

Hive
April 8, 2022 | September 15, 2022

Recently, image-based content featuring embedded text (such as memes, captioned images and GIFs, and screenshots of text) has exploded in popularity across many social platforms. These types of content can present unique challenges for automated moderation tools. Not only does embedded text need to be detected and ordered accurately, it must also be analyzed with contextual awareness and attention to semantic nuance.

Emojis have historically been another obstacle for automated moderation. Thanks to native support across many devices and platforms, these characters have evolved into a new online lexicon for accentuating or replacing text. Many emojis have also developed connotations that are well-understood by humans but not directly related to the image itself, which can make it difficult for automated solutions to identify harmful or inappropriate text content.

To help platforms tackle these challenges, Hive offers optical character recognition (OCR)-based moderation as part of our content moderation suite. Our OCR models are optimized for the types of digitally-generated content that commonly appears on social platforms, enabling robust AI moderation on content forms that are widespread yet overlooked by other solutions. Our OCR moderation API combines competitive text detection and transcription capabilities with our best-in-class text moderation model (including emoji support) into a single response, making it easy for platforms to take real-time enforcement actions across these popular content formats. 

OCR Model for Text Recognition

Effective OCR moderation starts with training for accurate text detection and extraction. Hive’s OCR model is trained on a large, proprietary set of examples that optimizes for how text commonly appears within user-generated digital content. Hive has the largest distributed workforce for data labeling in the world, and we leaned on this capability to provide tens of millions of human annotations on these examples to build our model’s understanding. 

We recently conducted a head-to-head comparison of our OCR model against top public cloud solutions using a custom evaluation dataset sourced from social platforms. We were particularly interested in test examples featuring digitally-generated text, such as memes and captioned images, to capture how content commonly appears on social platforms, and we selected evaluation data accordingly.

Figure: OCR model performance for Hive and public cloud solutions, shown as precision-recall curves for end-to-end text detection and transcription. At 90% recall, Hive's OCR model achieved about 98% precision, while other models ranged from about 88% to 97% precision.
Precision vs. recall for text detection and transcription with an edit distance tolerance of two characters. The evaluation set consisted of ~2,000 images containing Latin text, and data was chosen to account for various input constraints of public cloud models. The dotted line represents our chosen point of comparison at 90% recall.

In this evaluation, we looked at end-to-end text recognition, which includes both text detection and text transcription. Here, Hive’s OCR model outperformed or was competitive with other models on both exact transcription and transcription allowing character-level errors. At 90% recall, Hive’s OCR model achieved a precision of 98%, while public cloud models ranged from ~88% to 97%, implying a similar or lower end-to-end error rate.
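To make the metric concrete, here is a minimal sketch of how precision and recall could be computed for transcriptions when small character-level errors are tolerated. It assumes that the "edit distance of two characters" in the figure caption means a prediction counts as correct when it is within two single-character edits of a ground-truth string. The function names and the greedy matching strategy are illustrative assumptions, not Hive's evaluation code, and the sketch ignores bounding-box matching and the confidence-threshold sweep that would be needed to trace a full curve.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def precision_recall(predicted, ground_truth, max_edits=2):
    """Greedily match each predicted string to one unused ground-truth string.

    A prediction is a true positive if some still-unmatched ground-truth
    string lies within `max_edits` edits of it (hypothetical tolerance).
    """
    unmatched = list(ground_truth)
    true_positives = 0
    for pred in predicted:
        for truth in unmatched:
            if edit_distance(pred, truth) <= max_edits:
                unmatched.remove(truth)  # each ground-truth string matches once
                true_positives += 1
                break
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Two of three predictions are close enough to a ground-truth string,
# and two of four ground-truth strings are recovered.
p, r = precision_recall(["hello", "wrld", "spam"],
                        ["hello", "world", "meme", "text"])
```

Setting `max_edits=0` gives the exact-transcription variant of the same measurement.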

OCR Moderation: Language Support

We recognize that many platforms’ moderation needs extend beyond English-speaking users. Hive’s OCR model supports text recognition and transcription for many widely spoken languages with comparable performance, many of which are also supported by our text moderation solutions. Here’s an overview of our current language support:

| Language | OCR Support? | Text Moderation Support? |
| --- | --- | --- |
| English | Yes | Yes (Model) |
| Spanish | Yes | Yes (Model) |
| French | Yes | Yes (Model) |
| German | Yes | Yes (Model) |
| Mandarin | Yes | Yes (Pattern Match) |
| Russian | Yes | Yes (Pattern Match) |
| Portuguese | Yes | Yes (Model) |
| Arabic | Yes | Yes (Model) |
| Korean | Yes | Yes (Pattern Match) |
| Japanese | Yes | Yes (Pattern Match) |
| Hindi | Yes | Yes (Model) |
| Italian | Yes | Yes (Pattern Match) |

Moderation of Detected Text

Hive’s OCR moderation solution goes beyond producing a transcript: we then apply our best-in-class text moderation model to understand the meaning of that text in context (including any detected emojis). Our backend automatically feeds text detected in an image to our text moderation model as input, making our model classifications on image-based text accessible with a single API call. Our text model is generally robust to misspellings and character substitutions, enabling high classification accuracy on text extracted via OCR even if errors occur in transcription.

Hive’s text moderation model can classify extracted text across several sensitive or inappropriate categories, including sexuality, threats or descriptions of violence, bullying, and racism. 

Another critical use case is moderating spam and doxxing: OCR moderation will quickly and accurately flag images containing emails, phone numbers, addresses, and other personally identifiable information. Finally, our text moderation model can also identify promotions such as soliciting services, asking for shares and follows, soliciting donations, or links to external sites. This gives platforms new tools to curate user experience and remove junk content.
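As a rough illustration of what flagging personal information in transcribed text involves, the sketch below scans an OCR transcript for email addresses and North American-style phone numbers using regular expressions. This is a deliberately simple pattern-matching stand-in, not Hive's detection method; the patterns, the `find_pii` name, and the output shape are all assumptions made for the example, and real-world formats vary far more widely.

```python
import re

# Simplified patterns (assumptions for illustration only).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(
    r"(?<!\d)(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)"
)


def find_pii(text: str) -> dict:
    """Return email- and phone-like substrings found in a transcript."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }


transcript = "DM me at jane.doe@example.com or call (555) 123-4567"
hits = find_pii(transcript)
flagged = bool(hits["emails"] or hits["phones"])
```

A platform could treat any non-empty result as a signal to hold the image for review rather than as a final decision.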

We understand that verbal communication is rarely black and white – context and linguistic nuance can have profound effects on how meaning and intent of words are perceived. To help navigate these gray areas, our text model responses supplement classifications with a score from benign (score = 0) to severe (score = 3), which can be used to adapt any necessary moderation actions to platforms’ individual needs and sensitivities. You can read more about our text models in previous blog posts or in our documentation.
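One way a platform might act on these severity scores is to map the highest score across classes to an enforcement action. The sketch below assumes a hypothetical, simplified response shape (a dictionary from class name to an integer score from 0 to 3); the actual API response format is described in Hive's documentation and may differ. The thresholds and action names are placeholders that each platform would tune to its own policies.

```python
from typing import Mapping


def choose_action(scores: Mapping[str, int],
                  review_at: int = 2,
                  remove_at: int = 3) -> str:
    """Pick an action from per-class severity scores (0 = benign, 3 = severe).

    `scores` is an assumed, simplified shape such as {"bullying": 2}.
    """
    worst = max(scores.values(), default=0)
    if worst >= remove_at:
        return "remove"
    if worst >= review_at:
        return "review"
    return "allow"


# A bullying score of 2 goes to human review under the default thresholds,
# while a stricter platform could lower `remove_at` to act automatically.
default_action = choose_action({"sexual": 0, "bullying": 2})
strict_action = choose_action({"sexual": 0, "bullying": 2}, remove_at=2)
```

Keeping the thresholds as parameters lets a platform apply different sensitivities to different communities without changing the classification step.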

Our currently supported moderation classes in each language are as follows:

| Language | Classes |
| --- | --- |
| English | Sexual, Hate, Violence, Bullying |
| Spanish | Sexual, Hate |
| Portuguese | Sexual, Hate |
| French | Sexual |
| German | Sexual |
| Hindi | Sexual |
| Arabic | Sexual |
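For platforms that route content by language, the coverage listed in this post can be captured in a small lookup so downstream code knows which class scores to expect. The sketch below encodes only what the two tables in this post state; the data structure, the lowercase class labels, and the `moderation_coverage` helper are hypothetical conveniences, and current coverage should be confirmed against Hive's documentation.

```python
# Model-based classes per language, as listed in this post.
MODEL_CLASSES = {
    "English": ["sexual", "hate", "violence", "bullying"],
    "Spanish": ["sexual", "hate"],
    "Portuguese": ["sexual", "hate"],
    "French": ["sexual"],
    "German": ["sexual"],
    "Hindi": ["sexual"],
    "Arabic": ["sexual"],
}

# Languages listed with pattern-match text moderation only.
PATTERN_MATCH_ONLY = {"Mandarin", "Russian", "Korean", "Japanese", "Italian"}


def moderation_coverage(language: str):
    """Return (coverage kind, model classes) for a language name."""
    if language in MODEL_CLASSES:
        return "model", MODEL_CLASSES[language]
    if language in PATTERN_MATCH_ONLY:
        return "pattern_match", []
    return "unsupported", []


kind, classes = moderation_coverage("Spanish")
```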

Emoji Classification for Text Moderation

Emoji recognition is a unique feature of Hive’s OCR moderation model that opens up new possibilities for identifying harmful or harassing text-based content. Emojis can be particularly useful in moderation contexts because they can subtly (or not-so-subtly) alter how accompanying text is interpreted by the reader. Text that is otherwise innocuous can easily become inappropriate when accompanied by a particular emoji, and vice versa.

Hive OCR is able to detect and classify any emojis supported by Apple, Samsung, or Google devices. Our OCR model currently achieves a weighted accuracy of over 97% when classifying emojis. This enables our text moderation model to account for contextual meaning and connotations of emojis used in input text. 

To get a sense of our model’s understanding, let’s take a look at some examples of how use of emojis (or inclusion of text around emojis) changes our model predictions to align with human understanding. Each of these examples is from a real classification task submitted to our latest model release. 

Comparison of Hive OCR moderation model output in the "bullying" class for two text inputs: "you're a person" (bullying score 0) and "you're a [garbage emoji] person" (bullying score 2)

Here’s a basic example of how adding an emoji changes our model’s response from clean to sensitive. Our models understand not only the verbal concept represented by the emoji, but also what the emoji means semantically based on where it is located in the text. In this case, the bullying connotation of the “garbage” or “trash” emoji would be completely missed by an analysis of the text alone.

Our model is similarly sensitive to changes in semantic meaning caused by substitutions of emojis for text.

Comparison of Hive OCR moderation model output in the "sexual" class for two text inputs: "lemme see that eggplant!" (sexual score 0) and "lemme see that [eggplant emoji]!" (sexual score 3)

In this case, our model catches the sexual connotation added by the eggplant emoji in place of the word “eggplant.” Again, the text alone without an emoji – “lemme see that !” – is completely clean.

In addition to understanding how emojis can alter the meaning of text, our model is also sensitive to how text can change implications of emojis themselves.

Comparison of Hive OCR moderation model output in the "sexual" class for two text inputs: emoji alone (sexual score 0) and "hey hotty" with emoji (sexual score 2)

Here, adding the phrase “hey hotty” transforms an emoji usually used innocuously into a message with suggestive intent, and our model prediction changes accordingly.  

Finally, Hive’s OCR and text moderation models are trained to differentiate between each skin tone option for emojis in the “People” category and understand their implications in the context of accompanying text. We are currently exploring how the ability to differentiate between lighter and darker skin tones can enable new tools to identify hateful, racist, or exclusionary text content.

OCR Moderation: Final Thoughts

User preferences for online communication are constantly evolving in both medium and content, which can make it challenging for platforms to keep up with abusive users. Hive prides itself on identifying blind spots in existing moderation tools and developing robust AI solutions using high-quality training data tailored to these use cases. We hope that this post has showcased what’s possible with our OCR moderation capabilities and given some insight into our future directions.

Feel free to contact sales@thehive.ai if you are interested in adding OCR capabilities to your moderation suite, and please stay tuned as we announce new features and updates!

AI Models
Demo Hub Documentation Data Labeling
Applications
Moderation Dashboard Context-Based Ad Targeting Ad Intelligence Sponsorship Intelligence
Platform Solutions
For NFT Platforms For Marketplaces For Dating Apps For Online Communities
Media Solutions
For Brands For Agencies For Publishers For Teams and Leagues
Company
About Us Careers Blog LinkedIn
Contact Us
support@thehive.ai sales@thehive.ai press@thehive.ai
© Copyright 2023
Status Dashboard Terms of Service Privacy Policy Ethics Policy
