Expanding Hive’s CSAM Detection Suite with Text Classification, Powered by Thorn

Hive | July 21, 2025 (updated July 31, 2025)

Contents

- Our Commitment to Online Safety
- Expanding Our Thorn Partnership
- How the Classifier Works
- Proactively Combating CSAM at Scale

We are excited to announce that Hive’s partnership with Thorn is expanding to include a new CSE Text Classifier API. Offering advanced AI-powered text detection capabilities, this API helps trust and safety teams proactively combat text-based child sexual exploitation at scale.

Our Commitment to Online Safety

Making the internet safer for everyone is at the core of Hive’s mission. Our innovative approach to content moderation and platform integrity has propelled us to become a leading voice in Trust and Safety. Over the last several years, we’ve greatly expanded our content moderation product suite. While our content moderation tools reduce human exposure to harmful content across many categories, preventing online child sexual abuse requires specialized expertise and technology.

Last year, we announced our partnership with Thorn, an innovative nonprofit that transforms how children are protected from sexual abuse and exploitation in the digital age. Our enterprise-grade, cloud-based APIs allow us to serve Thorn’s proprietary technology to customers at a large scale.

Expanding Our Thorn Partnership

Under our Thorn partnership, we previously released our CSAM Detection API. This API runs two detection technologies, hash matching and an AI classifier, to detect both known and novel child sexual abuse material (CSAM) across image and video inputs.

Today, we’re expanding this partnership with the CSE (Child Sexual Exploitation) Text Classifier API, which has been highly requested by many of our current Hive customers. This classifier complements our CSAM detection suite by filling a critical content gap for use cases such as detecting text-based child sexual exploitation across user messaging and conversations. With this release, Hive and Thorn can provide customers with even broader detection coverage across text, image, and video.

How the Classifier Works

The CSE Text Classifier API detects suspected child exploitation in both English and Spanish. Each text sequence submitted is tokenized before being passed into the text classifier. The classifier then returns the text sequence’s scores for each label. There are seven possible labels:

- CSA (Child Sexual Abuse) Discussion: A broad category encompassing text fantasizing about or expressing outrage toward the subject, as well as text discussing sexually harming children in an offline or online setting.
- Child Access: Text discussing sexually harming children in an offline or online setting.
- CSAM: Text related to users talking about, producing, asking for, transacting in, and sharing child sexual abuse material.
- Has Minor: Text where a minor is unambiguously being referenced.
- Self-Generated Content: Text where users are talking about producing self-generated content, offering to share their self-generated content with others, or generally talking about self-generated images and/or videos.
- Sextortion: Text related to sextortion, where a perpetrator threatens to spread a victim’s intimate imagery in order to extort additional actions from them. This encompasses messages where an offender is sextorting another user, users talking about being sextorted, and users reporting sextortion either for themselves or on behalf of others.
- Not Pertinent: The text sequence does not flag any of the above labels.

If any of these labels receives a score above its internally set threshold, all scores are returned in the pertinent_labels section.
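Below is a sketch of what a pertinent sample response might look like and how it could be consumed. The endpoint, request fields, and label keys shown here are illustrative assumptions rather than Hive’s documented schema (only the pertinent_labels section is described above), so refer to the API reference for the exact format.

```python
import requests

# Hypothetical endpoint and field names, for illustration only; the exact URL,
# request format, and response schema come from Hive's API documentation.
CSE_TEXT_CLASSIFIER_URL = "https://api.thehive.ai/cse-text-classifier"  # placeholder
API_KEY = "YOUR_API_KEY"

# Labels your moderation policy treats as actionable (label keys are assumed).
ACTIONABLE_LABELS = {"csam", "child_access", "sextortion"}


def classify_text(text: str) -> dict:
    """Submit one text sequence and return the parsed JSON response."""
    resp = requests.post(
        CSE_TEXT_CLASSIFIER_URL,
        headers={"Authorization": f"Token {API_KEY}"},
        json={"text_data": text},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()


def actionable_scores(result: dict) -> dict:
    """Return scores for policy-relevant labels when the text is pertinent.

    A pertinent response is assumed to look roughly like:
        {"pertinent_labels": {"csam": 0.98, "has_minor": 0.95, ...}}
    An empty dict means no label crossed its internal threshold.
    """
    scores = result.get("pertinent_labels") or {}
    return {label: score for label, score in scores.items() if label in ACTIONABLE_LABELS}
```

Because the thresholds are set internally, a simple integration pattern is to treat any non-empty pertinent_labels section as a signal for escalation and then apply your own policy mapping on top of the returned scores.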
A given text sequence might receive high scores across multiple labels. In these cases, it may be helpful to combine the label definitions to better understand the situation at hand and determine which cases are actionable under your moderation team’s specific policies. For instance, text sequences scoring high on both CSAM and Child Access may come from individuals potentially abusing children offline and producing CSAM.

Proactively Combating CSAM at Scale

Safeguarding platforms from CSAM demands scalable solutions. We’re excited to expand our partnership and power more of Thorn’s advanced technology through our enterprise-grade APIs, helping more platforms proactively and comprehensively combat CSAM and CSE text. If you have further questions or would like to learn more, please reach out to sales@thehive.ai or contact us here.

Model Explainability With Text Moderation

Hive | December 2, 2024 (updated February 21, 2025)

Contents

- The Need For Explainability
- A Supplement to Our Text Moderation Model
- Comprehensive Language Support

Hive is excited to announce that we are releasing a new API: Text Moderation Explanations! This API helps customers understand why our Text Moderation model assigns particular scores to text strings.

The Need For Explainability

Hive’s Text Moderation API scans a text string or message, interprets it, and returns a score from 0 to 3 that maps to a severity level, across a number of top-level classes and dozens of languages. Today, hundreds of customers send billions of text strings each month through this API to protect their online communities.

A top feature request has been explanations for why our model assigns the scores it does, especially for foreign languages. While some moderation scores may be clear, edge cases can leave ambiguity about why a string was scored the way it was. This is where our new Text Moderation Explanations API comes in, delivering additional context and visibility into moderation results in a scalable way. With Text Moderation Explanations, human moderators can quickly interpret results and use the additional information to take appropriate action.

A Supplement to Our Text Moderation Model

Our Text Moderation classes are ordered by severity, ranging from level 3 (most severe) to level 0 (benign). These classes correspond to the possible scores Text Moderation can give a text string. For example, if a text string falls under the “sexual” head and contains sexually explicit language, it would be given a score of 3.

The Text Moderation Explanations API takes in three inputs: a text string, its class label (“sexual”, “bullying”, “hate”, or “violence”), and the score it was assigned (3, 2, 1, or 0). The output is a text string that explains why the original input text was given that score relative to its class. Note that Explanations is only supported for select multilevel heads, corresponding to the class labels listed above.
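As a rough sketch of how this fits together (the endpoint, request fields, and response key below are assumptions for illustration, not the documented schema), a call to the Explanations API might pass the original text along with the class label and severity score that Text Moderation assigned:

```python
import requests

# Hypothetical endpoint and field names, shown for illustration only; consult
# the Text Moderation Explanations documentation for the exact request format.
EXPLANATIONS_URL = "https://api.thehive.ai/text-moderation-explanations"  # placeholder
API_KEY = "YOUR_API_KEY"


def explain_score(text: str, class_label: str, score: int) -> str:
    """Ask why Text Moderation gave `text` the given score for `class_label`.

    `class_label` is one of "sexual", "bullying", "hate", or "violence";
    `score` is the severity level from 3 (most severe) down to 0 (benign).
    """
    resp = requests.post(
        EXPLANATIONS_URL,
        headers={"Authorization": f"Token {API_KEY}"},
        json={"text_data": text, "class": class_label, "score": score},
        timeout=30,
    )
    resp.raise_for_status()
    # The response is assumed to carry a single explanation string.
    return resp.json()["explanation"]


# Example: surface the explanation next to the flagged message in a review queue.
# reason = explain_score(flagged_text, "sexual", 3)
```

The returned explanation can then be shown alongside the severity score, so moderators see both the result and the reasoning behind it.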
To develop the Explanations model, we used a supervised fine-tuning process: labeled data, annotated internally at Hive by native speakers, was used to fine-tune the original model for this specialized task. This process allows us to support a number of languages beyond English.

Comprehensive Language Support

We have built our Text Moderation Explanations API with broad initial language support. Language support addresses the crucial issue of understanding why a text string in a non-native language was scored a certain way. We currently support eight languages and four top-level classes for Text Moderation Explanations.

Text Moderation Explanations are now included at no additional cost as part of our Moderation Dashboard product. Additionally, customers can access the Text Moderation Explanations model through an API (refer to the documentation). In future releases, we anticipate adding further language and top-level class support.

If you’re interested in learning more or gaining test access to the Text Moderation Explanations model, please reach out to our sales team (sales@thehive.ai) or contact us here for further questions.