Why Watermarks Are No Longer The Sole Trusted Source To Detect AI-Generated Content

Hive | January 12, 2026 | January 13, 2026

As AI-generated content becomes smarter and more realistic, the question of how to tell what is real becomes increasingly difficult to answer. Watermarks are often treated as a telltale sign that clearly identifies AI-generated media, giving us confidence in what we are seeing. In reality, they are far less reliable than they appear (or fail to appear).

First, you need to know what a watermark is

Watermarks are labels used to show that a piece of content was generated by AI. The goal is to give some indication of where the content came from, especially as AI-generated images, videos, audio, and text become more common online. There are three types of watermarks: visible, invisible, and metadata.

Visible Watermarks

Visible watermarks are logos or text placed directly on an image or video to indicate that it was created by a specific AI generator. They are easy to understand, but they are also very easy to remove. A visible watermark can be cropped out or removed by running the media through another AI generator. Once it is gone, it is very hard for an individual to tell whether the content is fake.

Invisible Watermarks

Some companies use invisible watermarks, such as Google’s SynthID, that are embedded into the pixels of an image or video when it is synthesized, following a specific mathematical pattern. These watermarks are more resilient than visible ones, but they still have limits. Only the company that created the AI-generated content can reliably detect its watermark, which means an image or video has to be checked against multiple systems to figure out where it came from. Edits can also weaken an invisible watermark. Adding emojis, text overlays, heavy filters, or regenerating the media through another model can damage or remove it.
Overall, even when detection works, it only confirms that an image or video was produced by a specific company’s generator when checked with that company’s own system, not that it is AI-generated in a broader sense.

Metadata Watermarks

Metadata-based watermarks do not change how content looks. Instead, they add extra information to the file that explains where it came from and how it was made. One example is C2PA, which adds a record to an image or video that shows its creation and editing history, such as which tools were used and whether AI played a role. This record is designed to stay with the media as it is shared, so people can check its background.

The downside of metadata-based watermarks is that they are easy to lose. The added information can be removed on purpose, and sharing an image or video through messaging apps, social platforms, or simple file conversions can remove it by accident.

There is no single watermarking system used across the industry; different companies use their own methods. For that reason, watermarks should not be treated as a reliable indicator of AI-generated content. They are better understood as an additional layer of detection, one that may not survive once a piece of content is shared or edited.

So if not watermarks, then what actually works

If watermarks are inconsistent, how can AI-generated content be detected at all? The answer is not another kind of label. Detecting AI-generated content today requires systems that analyze the output itself rather than checking for markings that are not always present.

Every generator is built on a specific architecture, and its generation process leaves behind patterns in the output, even when the image or video looks realistic to the human eye. These patterns are tied to how the model generates media, not to any watermark that was added afterward.
Content produced by the same model tends to share subtle characteristics that appear consistently across many outputs. This is what detection systems are designed to identify and why they are so important.

How AI detection systems work

AI detection systems are trained by comparing large sets of real images or videos with AI-generated ones. By doing this at scale, they learn which signals tend to show up in synthetic content and which do not, even when media looks convincing to a human viewer.

At Hive, we take this approach by training our detection systems to analyze images and videos directly. Instead of looking for a single obvious tell, our models learn many subtle signals across content developed by a wide range of AI models. Training on this mix of real and synthetic content allows our systems to recognize AI-generated media from new or unfamiliar generators, even before we have explicitly trained on those specific models.

Because this approach is based on how content is generated rather than on labels or markings, it holds up in real-world use. Detection still works when watermarks are missing, metadata has been stripped, content has been edited or re-uploaded, or the source model is unknown or proprietary. As more open-source and custom generators are used to create and share content online, many of which include no markings at all, systems that can identify AI-generated images or videos without relying on labels become necessary.

What this means for platforms

Misinformation, CSAM, political deepfakes, claims fraud, violence, and other harmful content are already being generated with AI at a rapid pace. Bad actors can create this content and remove watermarks, making it appear real before spreading it online. This is where detection systems matter. They help prevent harmful AI-generated content from spreading and allow misuse or abuse to be flagged to the right teams for review and action.
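To make the training idea from "How AI detection systems work" a little more concrete, here is a deliberately tiny sketch of learning from labeled real and synthetic examples. It is not how Hive's models work: production detectors are trained on the media itself at far larger scale. The two-number feature vectors, their values, and the function names `centroid`, `train`, and `predict` are all invented for illustration, and the nearest-centroid rule stands in for a real learning algorithm.

```python
from math import dist
from statistics import fmean


def centroid(samples):
    """Average each feature column across a list of feature vectors."""
    return [fmean(column) for column in zip(*samples)]


def train(real_samples, synthetic_samples):
    """'Training' here just stores one average feature vector per class."""
    return {
        "real": centroid(real_samples),
        "ai_generated": centroid(synthetic_samples),
    }


def predict(model, features):
    """Return the label whose centroid is closest to the given features."""
    return min(model, key=lambda label: dist(model[label], features))


# Invented feature vectors (hypothetical measurements, not real data).
real_samples = [[0.9, 0.1], [0.8, 0.2], [1.0, 0.15]]
synthetic_samples = [[0.2, 0.7], [0.3, 0.9], [0.25, 0.8]]

model = train(real_samples, synthetic_samples)
print(predict(model, [0.28, 0.75]))
print(predict(model, [0.85, 0.2]))
```

Nothing in this sketch reads a watermark or metadata field: the decision comes entirely from properties of the content, which is the point the sections above make.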
Where this leaves us

AI-generated content is now a normal part of the media landscape. For that reason, detection systems need to go beyond watermarks if platforms are going to meaningfully protect users, support online safety efforts, and enforce their policies consistently.

Detection is an ongoing process that requires regular training as new generators appear and existing models evolve. Systems must be able to continuously respond to changes in how content is produced.

If you want to see how this works across images, video, audio, and text, you can explore our AI content detection platforms and tools to better understand what is possible today. Visit our demo: https://hivedetect.ai/
Introducing New Free X Bot To Analyze and Detect AI-Generated Content For All Users

Hive | November 11, 2025 | January 12, 2026

Hive is excited to announce the launch of a new bot on X that uses our industry-leading AI models to analyze content and share results in real time, completely free for users.

How it works

Anyone on X can simply tag @hive_ai and ask whether a post, image, video, or audio clip is AI-generated. There’s plenty of flexibility in how you phrase your question. The bot understands a wide range of prompts, such as:

- Is this AI-generated?
- Is this video genuine? The audio sounds AI generated.
- Is this real or is this another AI-generated photo?

Hive’s detection models will automatically analyze the media and reply in real time, providing the results directly in-thread. In the reply, Hive provides confidence scores for whether the input is likely to be AI-generated or a deepfake. Videos and audio files will also return frame-by-frame analysis. Finally, Hive identifies probabilities for which generative engines likely created the content (such as Sora2, GPT, etc.).

Accessible AI detection for X Users

As AI-generated and deepfake content becomes harder to distinguish from reality, tools like this bot are essential for restoring trust and transparency online. Every day, manipulated media spreads across social platforms, making it easy for misinformation to take hold. By making detection accessible to everyone, we’re helping rebuild confidence in the content we see and share. Beyond that, this launch marks an important step in bringing Hive’s enterprise-grade detection technology to everyday users.

Best-in-Class Technology

Today, Hive’s industry-leading AI-generated and deepfake content detection technology is trusted across both the public and private sectors.
In 2024, an independent research study named Hive the “clear winner,” finding that our AI-generated image and video detection model outperformed competing models as well as human expert analysis. Our technology was also selected among 36 competing solutions for a Department of War contract to support the U.S. Intelligence Community for deepfake detection of video, image, and audio content. More recently, the Department of Homeland Security’s Cyber Crimes Center has deployed Hive’s AI-Generated and Deepfake Detection technology to support its investigations.

With this bot, we’re giving all users the power to verify what’s real. Try it out by tagging @hive_ai on X.

Learn More

You can upload individual media files to check for AI-generated and deepfake content at https://hivedetect.ai. Learn more about our enterprise AI models here.
Expanding Hive’s CSAM Detection Suite with Text Classification, Powered by Thorn

Hive | July 21, 2025 | November 11, 2025

Contents

- Our Commitment to Online Safety
- Expanding Our Thorn Partnership
- How the Classifier Works
- Proactively Combating CSAM at Scale

We are excited to announce that Hive’s partnership with Thorn is expanding to include a new CSE Text Classifier API. Offering advanced AI-powered text detection capabilities, this API helps trust and safety teams proactively combat text-based child sexual exploitation at scale.

Our Commitment to Online Safety

Making the internet safer for everyone is at the core of Hive’s mission. Our innovative approach to content moderation and platform integrity has propelled us to become a leading voice in Trust and Safety. Over the last several years, we’ve greatly expanded our content moderation product suite.

While our content moderation tools reduce human exposure to harmful content across many categories, preventing online child sexual abuse requires specialized expertise and technology. Last year, we announced our partnership with Thorn, an innovative nonprofit that transforms how children are protected from sexual abuse and exploitation in the digital age. Our enterprise-grade, cloud-based APIs allow us to serve Thorn’s proprietary technology to customers at a large scale.

Expanding Our Thorn Partnership

Under our Thorn partnership, we previously released our CSAM Detection API. This API runs two detection technologies (hash matching and an AI classifier) to detect both known and novel child sexual abuse material (CSAM) across image and video inputs.

Today, we’re expanding this partnership with the CSE (Child Sexual Exploitation) Text Classifier API, which has been highly requested by many of our current Hive customers. This classifier complements our CSAM detection suite by filling a critical content gap for use cases such as detecting text-based child sexual exploitation across user messaging and conversations.
With this release, Hive and Thorn can provide customers with even broader detection coverage across text, image, and video.

How The Classifier Works

The CSE Text Classifier API detects suspected child exploitation in both English and Spanish. Each text sequence submitted is tokenized before being passed into the text classifier. The classifier then returns the text sequence’s scores for each label.

There are seven possible labels:

- CSA (Child Sexual Abuse) Discussion: This is a broad category, encompassing text fantasizing about or expressing outrage toward the subject, as well as text discussing sexually harming children in an offline or online setting.
- Child Access: Text discussing sexually harming children in an offline or online setting.
- CSAM: Text related to users talking about, producing, asking for, transacting in, and sharing child sexual abuse material.
- Has Minor: Text where a minor is unambiguously being referenced.
- Self-Generated Content: Text where users are talking about producing self-generated content, offering to share their self-generated content with others, or generally talking about self-generated images and/or videos.
- Sextortion: Text related to sextortion, which is where a perpetrator threatens to spread a victim’s intimate imagery in order to extort additional actions from them. This encompasses messages where an offender is sextorting another user, users talking about being sextorted, as well as users reporting sextortion either for themselves or on behalf of others.
- Not Pertinent: The text sequence does not flag any of the above labels.

If any of these labels receives a score above its internally set threshold, all scores will be returned in the pertinent_labels section. Below is an example of a pertinent sample response.

A given text sequence might receive high scores across multiple labels.
In these cases, it may be helpful to combine the label definitions to better understand the situation at hand and determine what cases are actionable with regard to your moderation team’s specific policies. For instance, text sequences scoring high on both CSAM and Child Access may be from individuals potentially abusing children offline and producing CSAM.

Proactively Combating CSAM at Scale

Safeguarding platforms from CSAM demands scalable solutions. We’re excited to expand our partnership and power more of Thorn’s advanced technology through our enterprise-grade APIs, helping more platforms proactively and comprehensively combat CSAM and CSE text.

If you have further questions or would like to learn more, please reach out to sales@thehive.ai or contact us here.
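As a rough illustration of the score handling described under "How The Classifier Works", the sketch below shows how a moderation team might read scores and combine labels on its own side. Only the `pertinent_labels` name comes from this post. The response shape, the lowercase label keys, the `REVIEW_THRESHOLD` value, and both function names are assumptions made for the example, so check the API documentation for the real schema.

```python
# Assumed client-side cutoff for human review. This is NOT the API's
# internally set threshold, which the post does not disclose.
REVIEW_THRESHOLD = 0.8


def high_scoring_labels(response, threshold=REVIEW_THRESHOLD):
    """Return the labels in a (hypothetical) response scoring at or above threshold."""
    scores = response.get("pertinent_labels") or {}
    return sorted(label for label, score in scores.items() if score >= threshold)


def suggests_offline_abuse_and_production(response, threshold=REVIEW_THRESHOLD):
    """Flag the combination called out in the post: high CSAM and Child Access."""
    labels = set(high_scoring_labels(response, threshold))
    return {"csam", "child_access"} <= labels


# Hypothetical response; key names and score values are invented.
sample_response = {
    "pertinent_labels": {
        "csam": 0.93,
        "child_access": 0.88,
        "has_minor": 0.41,
        "sextortion": 0.02,
    }
}

print(high_scoring_labels(sample_response))
print(suggests_offline_abuse_and_production(sample_response))
```

Keeping the combination rules in small named functions like this makes it easier to line them up with a team's written moderation policy, which is ultimately what decides which cases are actionable.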
Model Explainability With Text Moderation

Hive | December 2, 2024 | February 21, 2025

Contents

- The Need For Explainability
- A Supplement to Our Text Moderation Model
- Comprehensive Language Support

Hive is excited to announce that we are releasing a new API: Text Moderation Explanations! This API helps customers understand why our Text Moderation model assigns text strings particular scores.

The Need For Explainability

Hive’s Text Moderation API scans a text string or message, interprets it, and returns a score from 0 to 3 that maps to a severity level, across a number of top level classes and dozens of languages. Today, hundreds of customers send billions of text strings each month through this API to protect their online communities.

A top feature request has been explanations for why our model assigns the scores it does, especially for foreign languages. While some moderation scores may be clear, edge cases can leave it ambiguous why a string was scored the way it was.

This is where our new Text Moderation Explanations API comes in, delivering additional context and visibility into moderation results in a scalable way. With Text Moderation Explanations, human moderators can quickly interpret results and use the additional information to take appropriate action.

A Supplement to Our Text Moderation Model

Our Text Moderation classes are ordered by severity, ranging from level 3 (most severe) to level 0 (benign). These levels correspond to the possible scores Text Moderation can give a text string. For example, if a text string falls under the “sexual” head and contains sexually explicit language, it would be given a score of 3.

The Text Moderation Explanations API takes in three inputs: a text string, its class label (either “sexual”, “bullying”, “hate”, or “violence”), and the score it was assigned (either 3, 2, 1, or 0).
The output is a text string that explains why the original input text was given that score relative to its class. Note that Explanations is only supported for select multilevel heads (corresponding to the class labels listed previously).

To develop the Explanations model, we used a supervised fine-tuning process. We used data labeled internally at Hive by native speakers to fine-tune the original model for this specialized process. This process allows us to support a number of languages apart from English.

Comprehensive Language Support

We have built our Text Moderation Explanations API with broad initial language support. Language support solves the crucial issue of understanding why a text string in a language other than one’s own was scored a certain way.

We currently support eight different languages for Text Moderation Explanations and four top level classes (sexual, bullying, hate, and violence).

Text Moderation Explanations are now included at no additional cost as part of our Moderation Dashboard product.

Customers can also access the Text Moderation Explanations model through an API (refer to the documentation).

In future releases, we anticipate adding further language and top level class support. If you’re interested in learning more or gaining test access to the Text Moderation Explanations model, please reach out to our sales team (sales@thehive.ai) or contact us here for further questions.
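For readers wiring this into their own tooling, the three inputs described above (text string, class label, and score) can be validated before any request is sent. The sketch below is only an illustration: the field names `text`, `class_label`, and `score`, the `ExplanationRequest` class, and the `build_payload` helper are hypothetical and are not taken from the API documentation. Only the allowed class labels and the 0 to 3 score range come from this post.

```python
from dataclasses import asdict, dataclass

# Allowed values as stated in the post.
SUPPORTED_CLASSES = {"sexual", "bullying", "hate", "violence"}
VALID_SCORES = {0, 1, 2, 3}


@dataclass(frozen=True)
class ExplanationRequest:
    """Hypothetical container for the three inputs the post describes."""

    text: str
    class_label: str
    score: int

    def __post_init__(self):
        # Reject inputs the post says the Explanations model does not cover.
        if not self.text.strip():
            raise ValueError("text must not be empty")
        if self.class_label not in SUPPORTED_CLASSES:
            raise ValueError(f"unsupported class label: {self.class_label!r}")
        if self.score not in VALID_SCORES:
            raise ValueError(f"score must be 0, 1, 2, or 3, got {self.score!r}")


def build_payload(text, class_label, score):
    """Validate the inputs and return them as a plain dict (assumed field names)."""
    return asdict(ExplanationRequest(text, class_label, score))


print(build_payload("example message", "hate", 2))
```

Validating on the client side catches unsupported class labels early, which matters here because Explanations covers only select heads rather than every Text Moderation class.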