Streamline CSAM Reports with Moderation Dashboard’s NCMEC Integration

Hive is excited to announce that we have integrated the National Center for Missing & Exploited Children’s (NCMEC) CyberTipline into Moderation Dashboard, streamlining the process of submitting child sexual abuse material (CSAM) reports. This feature is now available to all Moderation Dashboard customers with valid NCMEC credentials.

Ensuring Child Safety Online

The National Center for Missing & Exploited Children is a non-profit organization dedicated to protecting children from all forms of exploitation and abuse. All electronic communication service providers are required under U.S. federal law to report any known CSAM on their platforms to NCMEC’s CyberTipline—a centralized system for receiving and processing CSAM reports. These reports are later shared with law enforcement and relevant service providers so they can take further action.

Hive’s commitment to online safety has been unwavering throughout our endeavors and partnerships. We built this integration to help automate the reporting process, simplify our customers’ workflows, and ensure that their platforms can comply with applicable law.

Integration Workflow

Below is a step-by-step sample integration workflow, starting from when a user uploads an image to the platform and ending with the subsequent actions a moderator can take. For a more detailed guide to the reporting process, refer to our documentation.

  1. A user uploads an image to the platform.
  2. The image is processed by Hive’s proprietary CSAM Detection API, powered by Thorn—a leading nonprofit that builds technology to defend children from sexual abuse. To learn more, read our earlier blog posts on the Thorn partnership.
  3. If CSAM is likely present in the image, the image surfaces as a link in the CSAM Review Feed. Clicking the link opens the media in a new browser tab for the moderator to review; Moderation Dashboard never displays CSAM content directly within the Review Feed.
  4. From the Review Feed, the moderator can take two actions:
    • Perform an enforcement action (e.g., banning the user or deleting the post). A webhook containing the moderator’s chosen enforcement action, along with the post and user metadata, is then sent to the customer’s server so the content can be taken down.
    • Submit the automatically generated report to NCMEC by clicking the “Submit” button within the Review Feed. After the report is submitted, the system creates an internal log to track it, recording the submission date and time as well as the response from NCMEC.
“Report to NCMEC” button within Review Feed
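To make the enforcement step above concrete, here is a minimal sketch of a server-side handler for the enforcement webhook. The payload field names (“action”, “post”, “user”) and the action values are illustrative assumptions for this sketch, not Hive’s documented webhook schema.

```python
import json

def handle_enforcement_webhook(raw_body: str) -> dict:
    """Apply the moderator's chosen enforcement action on the platform side.

    Field names below are assumptions for illustration only.
    """
    event = json.loads(raw_body)
    action = event["action"]        # e.g. "delete_post" or "ban_user"
    post_id = event["post"]["id"]
    user_id = event["user"]["id"]

    if action == "delete_post":
        # Replace with your platform's actual takedown logic.
        return {"deleted_post": post_id}
    if action == "ban_user":
        # Replace with your platform's actual ban logic.
        return {"banned_user": user_id}
    return {"unhandled_action": action}
```

In a real integration, this handler would sit behind the HTTPS endpoint registered to receive Moderation Dashboard webhooks and would call into the platform’s own moderation backend.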

NCMEC Report Contents

Customers can pre-fill information fields that are constant across reports. These fields will be automatically populated for each report, reducing effort on the customer’s end. To provide our customers with full transparency, the report sent to NCMEC includes: the moderator’s information, the company’s information, the potential CSAM content, and the incident date and time.

Moderator information fields
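The pre-fill behavior described above can be sketched as a simple merge of constant fields with per-incident details. The field names and values here are illustrative assumptions, not the actual CyberTipline report schema.

```python
# Fields that stay constant across reports, entered once by the customer.
# Names and values are hypothetical, for illustration only.
PREFILLED_FIELDS = {
    "reporter_name": "Jane Moderator",
    "company_name": "ExampleCorp",
    "company_email": "trust@example.com",
}

def build_ncmec_report(content_url: str, incident_time: str) -> dict:
    """Combine constant pre-filled fields with per-incident details."""
    report = dict(PREFILLED_FIELDS)
    report["content_url"] = content_url
    report["incident_time"] = incident_time
    return report
```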

If you’re interested in learning more about what we do, please reach out to our sales team (sales@thehive.ai) or contact us here for further questions.

Model Explainability With Text Moderation

Hive is excited to announce that we are releasing a new API: Text Moderation Explanations! This API helps customers understand why our Text Moderation model assigns text strings particular scores.

The Need For Explainability

Hive’s Text Moderation API scans a text string or message, interprets it, and returns a score from 0–3 mapping to a severity level across a number of top-level classes and dozens of languages. Today, hundreds of customers send billions of text strings each month through this API to protect their online communities.

A top feature request has been explanations of why our model assigns the scores it does, especially for foreign-language text. While many moderation scores are self-explanatory, edge cases can leave ambiguity about why a string was scored the way it was.

This is where our new Text Moderation Explanations API comes in, delivering additional context and visibility into moderation results in a scalable way. With Text Moderation Explanations, human moderators can quickly interpret results and use the additional information to take appropriate action.

A Supplement to Our Text Moderation Model

Our Text Moderation classes are ordered by severity, ranging from level 3 (most severe) to level 0 (benign). These classes correspond to the possible scores Text Moderation can give a text string. For example, if a text string falls under the “sexual” head and contains sexually explicit language, it would be given a score of 3.

The Text Moderation Explanations API takes three inputs: a text string, its class label (“sexual”, “bullying”, “hate”, or “violence”), and the score it was assigned (3, 2, 1, or 0). The output is a text string explaining why the original input text was given that score relative to its class. Note that Explanations is supported only for select multilevel heads, corresponding to the class labels listed above.
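The three inputs above can be assembled into a request body as sketched below. The JSON field names (`text_data`, `class_label`, `score`) are illustrative assumptions; consult Hive’s API documentation for the real request schema and endpoint.

```python
import json

# Class labels currently supported by Explanations, per the text above.
VALID_CLASSES = {"sexual", "bullying", "hate", "violence"}

def build_explanation_request(text: str, class_label: str, score: int) -> str:
    """Assemble the three Explanations inputs as a JSON request body.

    Field names are hypothetical, for illustration only.
    """
    if class_label not in VALID_CLASSES:
        raise ValueError(f"unsupported class label: {class_label}")
    if score not in (0, 1, 2, 3):
        raise ValueError(f"score must be 0-3, got {score}")
    return json.dumps({
        "text_data": text,
        "class_label": class_label,
        "score": score,
    })
```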

To develop the Explanations model, we used supervised fine-tuning: data labeled internally at Hive by native speakers was used to fine-tune the base model for this specialized task. This approach allows us to support a number of languages beyond English.

Comprehensive Language Support

We have built our Text Moderation Explanations API with broad initial language support, addressing the crucial problem of understanding why a text string in a language the moderator does not speak was scored a certain way.

We currently support eight different languages and four top-level classes for Text Moderation Explanations.

Text Moderation Explanations are now included at no additional cost as part of our Moderation Dashboard product.

Customers can also access the Text Moderation Explanations model directly through an API (refer to the documentation).

In future releases, we anticipate adding support for more languages and top-level classes. If you’re interested in learning more or gaining test access to the Text Moderation Explanations model, please reach out to our sales team (sales@thehive.ai) or contact us here for further questions.