Content Moderation – Blog & Insights

Model Explainability With Text Moderation

Hive

December 2, 2024February 21, 2025

The Need For Explainability
A Supplement to Our Text Moderation Model
Comprehensive Language Support

Hive is excited to announce that we are releasing a new API: Text Moderation Explanations! This API helps customers understand why our Text Moderation model assigns text strings particular scores.

The Need For Explainability

Hive’s Text Moderation API scans a text-string or message, interprets it, and returns to our users a score from 0-3 mapping to a severity level across a number of top level classes and dozens of languages. Today, hundreds of customers send billions of text strings each month through this API to protect their online communities.

A top feature request has been explanations for why our model assigns the scores it does, especially for foreign languages. While some moderation scores may be clear, there also may be ambiguity around edge cases for why a string was scored the way it was.

This is where our new Text Moderation Explanations API comes in—delivering additional context and visibility into moderation results in a scalable way. With Text Moderation Explanations, human moderators can quickly interpret results and utilize the additional information to take appropriate action.

A Supplement to Our Text Moderation Model

Our Text Moderation classes are ordered by severity, ranging from level 3 (most severe) to level 0 (benign). These classes correspond to the possible scores Text Moderation can give a text string. For example: If a text string falls under the “sexual” head and contains sexually explicit language, it would be given a score of 3.

The Text Moderation Explanations API takes in three inputs: a text string, its class label (either “sexual”, “bullying”, “hate”, or “violence”), and the score it was assigned (either 3, 2, 1, or 0). The output is a text string that explains why the original input text was given that score relative to its class. It should be noted that Explanations is only supported for select multilevel heads (corresponding to the class labels listed previously).

To develop the Explanations model, we used a supervised fine-tuning process. We used labeled data—which we internally labeled at Hive using native speakers—to fine-tune the original model for this specialized process. This process allows us to support a number of languages apart from English.

Comprehensive Language Support

We have built our Text Moderation Explanation API with broad initial language support. Language support solves the crucial issue of understanding why a text string (in one’s non-native language) was scored a certain way.

We currently support eight different languages for Text Moderation Explanations and four top level classes:

Text Moderation Explanations are now included at no additional cost as part of our Moderation Dashboard product, as shown below:

Additionally, customers can also access the Text Moderation Explanations model through an API (refer to the documentation).

In future releases, we anticipate adding further language and top level class support. If you’re interested in learning more or gaining test access to the Text Moderation Explanations model, please reach out to our sales team (sales@thehive.ai) or contact us here for further questions.

BACK TO ALL BLOGS

Expanding Our CSAM Detection API

Hive

November 21, 2024February 21, 2025

Our Commitment to Child Internet Safety
How the Classifier Works
Building a Safer Internet

We are excited to announce that Hive is now offering Thorn’s predictive technology through our CSAM detection API! This API now enables customers to identify novel cases of child sexual abuse material (CSAM) in addition to detecting known CSAM using hash-based matching.

Our Commitment to Child Internet Safety

At Hive, making the internet safer is core to our mission. While our content moderation tools help reduce human exposure to harmful content across many categories, addressing CSAM requires specialized expertise and technology.

That’s why we’re expanding our existing partnership with Thorn, an innovative nonprofit that builds technology to defend children from sexual abuse and exploitation in the digital age.

Until now, our integration with Thorn focused on hash-matching technology to detect known CSAM. The new CSAM detection API builds on this foundation by adding advanced machine learning capabilities that can identify previously unidentified CSAM.

By combining Thorn’s industry-leading CSAM detection technology with Hive’s comprehensive content moderation suite, we provide platforms with robust protection against both known and newly created CSAM.

How the Classifier Works

The classifier works by first generating embeddings of the uploaded media. An embedding is a list of computer-generated scores between 0 and 1. After generating the embeddings, Hive permanently deletes all of the original media. We then use the classifier to determine whether the content is CSAM based on the embeddings. This process ensures that we do not retain any CSAM on our servers.

The classifier returns a score between 0 and 1 that predicts whether a video or image is CSAM. The response object will have the same general structure for both image and video inputs. Please note that Hive will return both results together: probability scores from the classifier and any match results from hash matching against the aggregated hash database.

For a detailed guide on how to use Hive’s CSAM detection API, refer to the documentation.

Building a Safer Internet

Protecting platforms from CSAM demands scalable solutions. The problem is complex; but our integration with Thorn’s advanced technology provides an efficient way to detect and stop CSAM, helping to safeguard children and build a safer internet for all.

If you have any further questions or would like to learn more, please reach out to sales@thehive.ai or contact us here.

BACK TO ALL BLOGS

Ars Technica

Hive

November 21, 2024January 23, 2025

BACK TO ALL BLOGS

Semafor

Hive

October 30, 2024June 24, 2025

BACK TO ALL BLOGS

“Clear Winner”: Study Shows Hive’s AI-Generated Image Detection API is Best-in-Class

Hive

September 10, 2024July 29, 2025

Navigating an Increasingly Generative World
Structuring the Study
Evaluation Methods and Findings
Final Thoughts Moving Forward

Navigating an Increasingly Generative World

To the untrained eye, distinguishing human-created art from AI-generated content can be difficult. Hive’s commitment to providing customers with API-accessible solutions for challenging problems led to the creation of our AI-Generated Image and Video Detection API, which classifies images as human-created or AI-generated. Our model was evaluated in an independent study conducted by Anna Yoo Jeong Ha and Josephine Passananti from the University of Chicago, which sought to determine who was more effective at classifying images as AI-generated: humans or automated detectors.

Ha and Passananti’s study addresses a growing problem within the generative AI space: As generative AI models become more advanced, the boundary between human-created art and AI-generated images has become increasingly indistinguishable. With such powerful tools being accessible to the general public, various legal and ethical concerns have been raised regarding the misuse of said technology.

Such concerns are pertinent to address because the misuse of generative AI models negatively impacts both society at large and the AI models themselves. Bad actors have used AI-generated images for harmful purposes, such as spreading misinformation, committing fraud, or scamming individuals and organizations. As only human-created art is eligible for copyright, businesses may attempt to bypass the law by passing off AI-generated images as human-created. Moreover, multiple studies (on both generative image and text models) have shown evidence that AI models will deteriorate if their training data solely consists of AI-generated content—which is where Hive’s classifier comes in handy.

The study’s results show that Hive’s model outperforms both its automated peers and highly-trained human experts in differentiating between human-created art versus AI-generated images across most scenarios. This post examines the study’s methodologies and findings, in addition to highlighting our model’s consistent performance across various inputs.

Structuring the Study

In the experiment, researchers evaluated the performance of five automated detectors (three of which are commercially available, including Hive’s model) and humans against a dataset containing both human-created and AI-generated images across various art styles. Humans were categorized into three subgroups: non-artists, professional artists, and expert artists. Expert artists are the only subgroup with prior experience in identifying AI-generated images.

The dataset consists of four different image groups: human-created art, AI-generated images, “hybrid images” which combine generative AI and human effort, and perturbed versions of human-created art. A perturbation is defined as a minor change to the model input aimed at detecting vulnerabilities in the model’s structure. Four perturbation methods are used in the study: JPEG compression, Gaussian noise, CLIP-based Adversarial Perturbation (which performs perturbations at the pixel level), and Glaze (a tool used to protect human artists from mimicry by introducing imperceptible perturbations on the artwork).

After evaluating the model on unperturbed imagery, the researchers proceeded to more advanced scenarios with perturbed imagery.

Evaluation Methods and Findings

The researchers evaluated the automated detectors on four metrics: overall accuracy (ratio of training data classified correctly to the entire dataset), false positive rate (ratio of human-created art misclassified as AI-generated), false negative rate (ratio of AI-generated images misclassified as human-created), and AI detection success rate (ratio of AI-generated images correctly classified as AI-generated to the total amount of AI-generated images).

Among automated detectors, Hive’s model emerged as the “clear winner” (Ha and Passananti 2024, 6). Not only does it boast a near-perfect 98.03% accuracy rate, but it also has a 0% false positive rate (i.e., it never misclassifies human art) and a low 3.17% false negative rate (i.e., it rarely misclassifies AI-generated images). According to the authors, this could be attributed to Hive’s rich collection of generative AI datasets, with high quantities of diverse training data compared to its competitors.

Additionally, Hive’s model proved to be resistant against most perturbation methods, but faced some challenges classifying AI-generated images processed with Glaze. However, it should be noted that Glaze’s primary purpose is as a protection tool for human artwork. Glazing AI-generated images is a non-traditional use case with minimal training data available as a result. Thus, Hive’s model’s performance with Glazed AI-generated images has little bearing on its overall quality.

Final Thoughts Moving Forward

When it comes to automated detectors and humans alike, Hive’s model is unparalleled. Even compared to human expert artists, Hive’s model classifies images with higher levels of confidence and accuracy.

While the study considers the model’s potential areas for improvement, it is important to note that the study was published in February 2024. In the months following the study’s publication, Hive’s model has vastly improved and continues to expand its capabilities, with 12+ model architectures added since.

If you’d like to learn more about Hive’s AI-Generated Image and Video Detection API, a demo of the service can be accessed here, with additional documentation provided here. However, don’t just trust us, test us: reach out to sales@thehive.ai or contact us here, and our team can share API keys and credentials for your new endpoints.

BACK TO ALL BLOGS

Matching Against CSAM: Hive’s Innovative Integration with Thorn’s Safer Match

Hive's Innovative Integration with Thorn's Safer Match

Hive

July 8, 2024February 14, 2025

We are excited to announce that Hive’s Partnership with Thorn is now live! Our current and prospective customers can now easily integrate Thorn’s Safer Match, a CSAM (child sexual abuse material) detection solution, using Hive’s APIs.

The Danger of CSAM

The threat of CSAM involves the production, distribution, and possession of explicit images and videos depicting minors. Every platform with an upload button or messaging capabilities is at risk of hosting child sexual abuse material (CSAM). In fact, in 2023 alone, there were over 104 million reports of potential CSAM reported to the National Center of Missing and Exploited Children.

The current state-of-the-art approach is to use an encrypting function to “hash” the content and then “match” it against a database aggregating 57+ million verified CSAM hashes. If the content hash matches against the database, then the content can be flagged as CSAM.

How the Integration Works

When presented with visual content, we first hash it, then match it against known instances of CSAM.

Hashing: We take the submitted image or video, and convert it into one or more hashes.
Deletion: We then immediately delete the submitted content ensuring nothing stays on Hive’s servers.
Matching: We match the hashes against the CSAM database and return whether the hashes match or not to you.

Hive’s partnership with Thorn allows our customers to easily incorporate Thorn’s Safer Match into their detection toolset. Safer Match provides programmatic identification of known CSAM with cryptographic and perceptual hash matching for images and for videos, through proprietary scene-sensitive video hashing (SSVH).

How you can use this API today:

First, talk to your Hive sales rep, and get an API key and credentials for your new endpoint.

Image

For an image, simply send the image to us, and we will hash it using MD5 and Safer encryption algorithms. Once the image is hashed, we return the results in our output JSON.

Video

You can also send videos into the API. We use MD5 hashes and Safer’s proprietary perceptual hashing for videos as well. However, they have different use cases. MD5 will return exact match videos and will only indicate whether the whole video is a known CSAM video.

Additionally, Safer will hash different scenes within the video and will flag those which are known to be violating. Safer scenes are demarcated by a start and end timestamp as shown in the response below.

Note: For the Safer SSVH, videos are sampled at 1FPS.

For more information, you can reference our documents.

Teaming Up For a Safer Internet

CSAM is one of the most pervasive and harmful issues on the internet today. Legal requirements make this problem even harder to tackle, and previous technical solutions required significant integration efforts. But, together with Thorn’s proactive technology, we can respond to this challenge and help make the internet a safer place for everyone.

AI Models

Solutions

Docs

Company

Blog

Pricing

Demo

Category: Content Moderation

Model Explainability With Text Moderation

Hive

Contents

The Need For Explainability

A Supplement to Our Text Moderation Model

Comprehensive Language Support

Expanding Our CSAM Detection API

Hive

Contents

Our Commitment to Child Internet Safety

How the Classifier Works

Building a Safer Internet

Ars Technica

Hive

Semafor

Hive

“Clear Winner”: Study Shows Hive’s AI-Generated Image Detection API is Best-in-Class

Hive

Contents

Navigating an Increasingly Generative World

Structuring the Study

Evaluation Methods and Findings

Final Thoughts Moving Forward

Matching Against CSAM: Hive’s Innovative Integration with Thorn’s Safer Match

Hive

The Danger of CSAM

How the Integration Works

How you can use this API today:

Image

Video

Teaming Up For a Safer Internet