BACK TO ALL BLOGS

Hive to be Lead Sponsor of Trust & Safety Summit 2025

We are thrilled to announce that Hive is the lead sponsor of the Trust & Safety Summit 2025.

As Europe’s premier Trust & Safety conference, this summit is designed to empower T&S leaders to tackle operational and regulation challenges, providing them with both actionable insights and future-focused strategies. The summit will be held Tuesday, March 25th and Wednesday, March 26th at the Hilton London Syon Park, UK.

The 2-day event will explore themes such as regulatory preparedness, scaling trust and safety solutions, and best practices for effective content moderation. An incredible selection of programming will include expert-led panels, interactive workshops and networking events.

Hive’s CEO Kevin Guo will deliver the keynote presentation on “The Next Frontier of Content Moderation”, covering topics like multi-modal LLMs and detecting AI generated content. Additionally, Hive will host two panels during the event: 

  • Hyperscaling Trust & Safety: Navigating Growth While Maintaining Integrity. Hive will be discussing best practices for scaling trust & safety systems for online platforms experiencing hypergrowth.
  • Harnessing AI to Detect Unknown CSAM: Innovations, Challenges, and the Path Forward. Hive will be joined by partners Thorn and IWF to discuss recent advancements in CSAM detection solutions.

As the lead sponsor of the T&S Summit 2025, we are furthering our commitment to making the internet a safer place. Today, Hive’s comprehensive moderation stack empowers Trust & Safety teams of all sizes to scale their moderation workflows with both pre-trained and customizable AI models, flexible LLM-based moderation, and a moderation dashboard for streamlined enforcement of policies. 

We look forward to welcoming you to the Trust & Safety Summit 2025. If you’re interested in attending the conference, please reach out to your Hive account manager or sales@thehive.ai. Prospective conference attendees can also find more details and ticket information here. For a detailed breakdown of summit programming, download the agenda here

To learn more about what we do at Hive, please reach out to our sales team or contact us here for further questions.

BACK TO ALL BLOGS

Model Explainability With Text Moderation

Contents

Hive is excited to announce that we are releasing a new API: Text Moderation Explanations! This API helps customers understand why our Text Moderation model assigns text strings particular scores.

The Need For Explainability

Hive’s Text Moderation API scans a text-string or message, interprets it, and returns to our users a score from 0-3 mapping to a severity level across a number of top level classes and dozens of languages. Today, hundreds of customers send billions of text strings each month through this API to protect their online communities.

A top feature request has been explanations for why our model assigns the scores it does, especially for foreign languages. While some moderation scores may be clear, there also may be ambiguity around edge cases for why a string was scored the way it was.

This is where our new Text Moderation Explanations API comes in—delivering additional context and visibility into moderation results in a scalable way. With Text Moderation Explanations, human moderators can quickly interpret results and utilize the additional information to take appropriate action.

A Supplement to Our Text Moderation Model

Our Text Moderation classes are ordered by severity, ranging from level 3 (most severe) to level 0 (benign). These classes correspond to the possible scores Text Moderation can give a text string. For example: If a text string falls under the “sexual” head and contains sexually explicit language, it would be given a score of 3.

The Text Moderation Explanations API takes in three inputs: a text string, its class label (either “sexual”, “bullying”, “hate”, or “violence”), and the score it was assigned (either 3, 2, 1, or 0). The output is a text string that explains why the original input text was given that score relative to its class. It should be noted that Explanations is only supported for select multilevel heads (corresponding to the class labels listed previously).

To develop the Explanations model, we used a supervised fine-tuning process. We used labeled data—which we internally labeled at Hive using native speakers—to fine-tune the original model for this specialized process. This process allows us to support a number of languages apart from English.

Comprehensive Language Support

We have built our Text Moderation Explanation API with broad initial language support. Language support solves the crucial issue of understanding why a text string (in one’s non-native language) was scored a certain way.

We currently support eight different languages for Text Moderation Explanations and four top level classes:

Text Moderation Explanations are now included at no additional cost as part of our Moderation Dashboard product, as shown below:

Additionally, customers can also access the Text Moderation Explanations model through an API (refer to the documentation).

In future releases, we anticipate adding further language and top level class support. If you’re interested in learning more or gaining test access to the Text Moderation Explanations model, please reach out to our sales team (sales@thehive.ai) or contact us here for further questions.