Web Search: Visual Comparisons To Web Content Using Deep Learning

What is Web Search?

Earlier this year, Hive launched a pair of API products built on new deep learning models that analyze visual similarity between images: NFT Search and Custom Search. To help platforms authenticate visual content, these APIs search unstructured datasets for similar images to uncover relationships within a larger content ecosystem.

These launches marked the start of a larger effort by Hive to make broader, more relational content understanding available to enterprise customers. Now, we’re unveiling the most ambitious of our Intelligent Search services yet: Web Search, for visual comparisons to content on the open web. 

Web Search deploys our best-in-class image similarity model across stored media from billions of crawled web pages, retrieves visually similar versions of a query image, and returns these matches via API.  The Web Search API enables automated checks against web images in a variety of use-cases, including: 

  • Detecting misuse of image assets in copyright enforcement contexts
  • Enforcing paywalls on premium content by identifying unauthorized reposts and shares of protected media
  • Verifying originality of user-generated content like profile and marketplace photos

In this announcement, we’ll take a closer look at the two pillars of Hive’s visual search engine – our image similarity model and web index – and preview how the Web Search API works. 

Building a Visual Comparison Engine: Hive’s Similarity Model and Search Index

The backbone of Web Search is Hive’s visual similarity model, a deep vision model that conducts pair-wise visual comparisons between images.  Unlike typical fingerprinting algorithms, our models assess visual similarity based on high-level feature alignment to mimic (and surpass) human perceptual comparison.  To build this, we used contrastive learning on image sets including substantial augmentations and negative examples, training a robust visual similarity model without relying on supervisory labels to teach specific definitions.
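To make the idea of contrastive training more concrete, here is a minimal sketch of an InfoNCE-style contrastive loss for a single query embedding. This is an illustration only, not Hive's training code: the loss form, the temperature value, the function names, and the toy 2-D vectors are all assumptions. The loss is low when an augmented copy sits close to its source in feature space and negative examples sit far away.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor (assumed form, for illustration).

    `positive` is the embedding of an augmented copy of the anchor image;
    `negatives` are embeddings of unrelated images.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

# Toy 2-D embeddings (hypothetical values).
anchor = [1.0, 0.0]
augmented_copy = [0.9, 0.1]
unrelated = [[0.0, 1.0], [-1.0, 0.2]]

# Well-trained case: the augmented copy is the closest vector to the anchor.
loss_good = contrastive_loss(anchor, augmented_copy, unrelated)

# Poorly-trained case: an unrelated image is treated as the positive.
loss_bad = contrastive_loss(anchor, [0.0, 1.0], [augmented_copy, [-1.0, 0.2]])

print(f"good embedding loss: {loss_good:.5f}")
print(f"bad embedding loss:  {loss_bad:.5f}")
```

Minimizing a loss of this kind pulls edited versions of an image toward the original and pushes unrelated images away, without any class labels.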

The resulting model considers both duplicates of an image and modified versions as similar – including overlay elements, filters and edits, and adversarial modifications. For a pair of images, the model returns a normalized score between 0 and 1 correlated with the output of its contrastive loss function (i.e., based on similarity of feature vectors). A pair-wise similarity score of 1 indicates an exact visual match between images, while lower scores reflect the extent of any visual differences. 
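As a rough sketch of how a pair-wise score in this range could be derived from feature vectors, the example below computes cosine similarity and clamps it to [0, 1]. The clamping step, the function names, and the toy vectors are assumptions made for illustration; the exact normalization used by Hive's model is not described here.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def similarity_score(a, b):
    """Clamp cosine similarity to [0, 1] (an assumed normalization)."""
    return max(0.0, min(1.0, cosine_similarity(a, b)))

# Hypothetical feature vectors for an original image, an edited copy,
# and an unrelated image.
original = [0.8, 0.1, 0.6]
edited = [0.7, 0.3, 0.5]
unrelated = [-0.2, 0.9, -0.1]

print(f"original vs. itself:   {similarity_score(original, original):.2f}")
print(f"original vs. edited:   {similarity_score(original, edited):.2f}")
print(f"original vs. unrelated: {similarity_score(original, unrelated):.2f}")
```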

A robust image comparison model is a necessary part of a visual search engine, but not entirely sufficient. For Web Search to be broadly useful, we also needed a comprehensive reference database of images to compare against. To do this, Hive built and deployed a custom web crawler to continuously retrieve image content on public pages. Since we began crawling, we’ve grown this dataset to tens of billions of images, and it continues to expand as our crawler encounters new web pages and freshly posted content. To enable more detailed search results, we also index URL and domain information, alt text, and other image metadata that can be returned alongside matches in the API response.

Putting it Together: Web Search API and Visual Search Examples

Given a query image, the Web Search API uses the similarity model to compare against all reference images in our web index and returns all matches above a threshold similarity score.  For each match, the API response specifies: 

  • A direct link (URL) to the matching image
  • A backlink to the domain where the matching image was found
  • A similarity score between the query image and the match
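To show how a client might consume these three fields, here is a short sketch that parses a response and sorts matches by score. The JSON layout and the field names (`matches`, `image_url`, `backlink`, `similarity`) are hypothetical placeholders rather than the documented Web Search schema, and the sample URLs are made up.

```python
import json
from typing import List, NamedTuple

class Match(NamedTuple):
    image_url: str     # direct link to the matching image
    backlink: str      # where the matching image was found
    similarity: float  # pair-wise similarity score

# Hypothetical response body; the real API schema may differ.
SAMPLE_RESPONSE = """
{
  "matches": [
    {"image_url": "https://example.org/posts/edited.jpg",
     "backlink": "https://example.org/posts",
     "similarity": 0.70},
    {"image_url": "https://example.com/listing/photo1.jpg",
     "backlink": "https://example.com/listing",
     "similarity": 0.99}
  ]
}
"""

def parse_matches(raw: str) -> List[Match]:
    """Parse the (assumed) response body and sort matches by score, highest first."""
    payload = json.loads(raw)
    matches = [
        Match(m["image_url"], m["backlink"], float(m["similarity"]))
        for m in payload.get("matches", [])
    ]
    return sorted(matches, key=lambda m: m.similarity, reverse=True)

matches = parse_matches(SAMPLE_RESPONSE)
for match in matches:
    print(f"{match.similarity:.2f}  {match.image_url}  (found on {match.backlink})")
```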

Here are responses from a few example searches that show the versatility of the Web Search API:

Web Search is well-suited to help marketplaces automatically identify potential scam listings that use images taken from the open web. For example, we queried the left image from a suspiciously cheap rental ad that looked a little too good to be true. Web Search uncovered photos from a real listing for the unit on the realtor’s website. The two photos are almost identical except for slightly lower resolution in the scam image; our similarity model predicts accordingly with a similarity score of 0.99. 

Let’s look at another example, this time with more visually impactful differences: 

Here, the query image incorporates the original but uses a significant digital overlay. Still, our similarity model identifies the source image as a match with a similarity score of 0.7. The ability to recognize edited photos enables Web Search to help social and dating platforms identify impersonation attempts (“catfishing”) that use web photos on their profile, even if those photos have been noticeably modified. 

Here’s a similar example where the query image is “clean” and the matching image is modified with a text overlay:

In this case, the matching image reuses the original with text stylized as a magazine cover, and our model correctly identifies the edited version. With similar queries, Web Search can help platforms track down misuses of stock photos and copyrighted images, or reposts of premium (paywall) content to other websites. 

In their own searches, platforms can tune the similarity score threshold to match their definitions and intended use-cases: a high threshold targets duplicates and close copies, while a lower threshold broadens the search to visually related images.
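A simple way to apply this in practice is to sort matches into tiers by score. The sketch below is illustrative only: the threshold values (0.9 and 0.6), the function name, and the sample URLs are assumptions, and each platform would choose cutoffs that fit its own policy.

```python
from typing import Dict, List, Tuple

def bucket_matches(
    matches: List[Tuple[str, float]],
    duplicate_threshold: float = 0.9,  # assumed cutoff for near-exact copies
    related_threshold: float = 0.6,    # assumed cutoff for edited or related images
) -> Dict[str, List[str]]:
    """Split (url, score) pairs into duplicate and related tiers; drop the rest."""
    buckets: Dict[str, List[str]] = {"duplicates": [], "related": []}
    for url, score in matches:
        if score >= duplicate_threshold:
            buckets["duplicates"].append(url)
        elif score >= related_threshold:
            buckets["related"].append(url)
    return buckets

# Hypothetical search results as (image URL, similarity score) pairs.
results = [
    ("https://example.com/a.jpg", 0.99),
    ("https://example.com/b.jpg", 0.70),
    ("https://example.com/c.jpg", 0.41),
]

buckets = bucket_matches(results)
print(buckets)
```

Raising `related_threshold` narrows the results toward close copies, while lowering it surfaces more heavily edited versions at the cost of more unrelated matches.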

Final Thoughts: Future Directions for Web Search

Hive’s Visual Search APIs offer enterprise customers new insight into how their visual content is used and where it comes from with on-demand searches on their own content (Custom Search), blockchains (NFT Search), and, now, the open web (Web Search). The capabilities of our image similarity model and other content tagging models raise the bar on what’s possible in the search space.

In building these datasets, we’re also thinking about ways to unlock other actionable insights within our search indexes. As a next step, we’ll be broadening our web index to include videos, GIFs, and audio data. From there, we plan to support using our targeted content tagging models – logo detectors, OCR, scene classification, and more – to enable open web searches across content modalities, content-targeted ad placements, and other use-cases in the future. 

To learn more about Web Search or our other visual search APIs, you can contact us here or reach out to our sales team directly. 

Updates to Hive’s Best-in-Class Visual Moderation Model

Hive’s visual classifier is a cornerstone of our content moderation suite. Our visual moderation API has consistently been the best solution on the market for moderating key types of image-based content, and some of the world’s largest content platforms continue to trust Hive’s Visual Moderation model for effective automated enforcement on NSFW images, violence, hate, and more. 

As content moderation needs have evolved and grown, our visual classifier has also expanded to include 75 moderation classes across 31 different model heads. This is usually an iterative process – as our partners continue to send high volumes of content for analysis, we uncover ways to refine our classification schemes and identify new, useful types of content.

Recently, we’ve worked on broadening our visual model by defining new classes with input from our customers. And today, we’re shipping the general release of our latest visual moderation model, including three new classes to bolster our existing model capabilities:

  • Undressed to target racier suggestive images that may not be explicit enough to label NSFW
  • Gambling to capture betting in casinos or on games and sporting events
  • Confederate to capture imagery of the Confederate flag and graphics based on its design

All Hive customers can now upgrade to our new model to access predictions in these new classes at no additional cost. In this post, we’ll take a closer look at how these classes can be used and our process behind creating them.

New Visual Moderation Classes For Greater Content Understanding

Deep learning classifiers are most effective when given training data that illustrates a clear definition of what does and does not belong in the class. For this release, we used our distributed data labeling workforce – with over 5 million contributors – to efficiently source instructive labels on millions of training images relevant to our class definitions. 

Below, we’ll take a closer look at some visual examples to illustrate our ground truth definitions for each new class. 

Undressed

In previous versions, Hive’s visual classifier separated adult content into two umbrella classes: “NSFW,” which includes nudity and other explicit sexual classes, and “Suggestive,” which captures milder classes that might still be considered inappropriate. 

Our “Suggestive” class is a bit broad by design, and some customers have expressed interest in a simple way to identify the racier cases without also flagging more benign images (e.g., swimwear in beach photos). So, for this release, we trained a new class to refine this distinction: undressed.

We wanted this class to capture images where a subject is clearly nude, even if their privates aren’t visible due to their pose, are temporarily covered by their hands or an object, or are occluded by digital overlays like emojis, scribbles, or shapes. To construct our training set, we added new annotations to existing training images for our NSFW and Suggestive classes and sourced additional targeted examples. Overall, this gave us a labeled set of 2.6M images to teach this ground truth to our new classifier. 

Here’s a mild example to help illustrate the difference between our undressed and NSFW definitions (you can find a full definition for undressed and other relevant classes in our documentation): 

Confidence scores for unedited version (left): undressed 1.00; general_nsfw 1.00; general_suggestive 0.00. Confidence scores for edited version (right): undressed 0.99; general_nsfw 0.35; general_suggestive 0.61

The first image showing explicit nudity is classified as both undressed and NSFW with maximum confidence. When we add a simple overlay over relevant parts of the image, however, the NSFW score drops far below threshold confidence while the undressed score remains very high. 

Platforms can use undressed to flag both nudity and more obviously suggestive images in a single class. For content policies where milder images are allowed but undressed-type images are not, we expect this class to significantly reduce any need for human moderator review to enforce this distinction. 
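As an illustration of such a policy, the sketch below applies a single confidence threshold to the example scores quoted above. The class names come from those scores; the 0.9 threshold, the function name, and the flat dictionary format are assumptions rather than the actual API response shape.

```python
from typing import Dict, Iterable

# Confidence scores from the example images above.
UNEDITED = {"undressed": 1.00, "general_nsfw": 1.00, "general_suggestive": 0.00}
EDITED = {"undressed": 0.99, "general_nsfw": 0.35, "general_suggestive": 0.61}

def moderate(
    scores: Dict[str, float],
    blocked_classes: Iterable[str],
    threshold: float = 0.9,  # assumed confidence threshold
) -> str:
    """Return "block" if any blocked class meets the threshold, else "allow"."""
    flagged = [c for c in blocked_classes if scores.get(c, 0.0) >= threshold]
    return "block" if flagged else "allow"

# A policy that only checks general_nsfw misses the edited image...
print(moderate(EDITED, ["general_nsfw"]))
# ...while adding undressed to the blocked classes catches it.
print(moderate(EDITED, ["general_nsfw", "undressed"]))
```

Because `general_suggestive` is left out of the blocked classes, milder images such as swimwear photos would still be allowed under this policy.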

Gambling

Gambling was another type of content that frequently came up in customer feedback. This was a new undertaking for Hive, and building our ground truth and training set for this class was an interesting exercise in definitions and evaluating context in images. 

Technically, gambling involves a wager staked on an uncertain outcome with the intent of winning a prize. For practical purposes, though, we decided to consider evidence of betting as the key factor. Certain behavior – like playing a slot machine or buying a lottery ticket – is always gambling since it requires a bet. But cards, dice, and competitive games don’t necessarily involve betting. We found the most accurate approach to be requiring visible money, chips or other tokens in these cases in order to flag an image as gambling. Similarly, we don’t consider photos at races or sporting events to be gambling unless receipts from a betting exchange or website are also shown.  
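The labeling rule described above can be written out as a small decision function. This is only a sketch of the ground truth definition, not of how the classifier works internally (the model learns the distinction from labeled images). The activity names and parameters are hypothetical.

```python
# Activities that always count as gambling because they require a bet.
ALWAYS_GAMBLING = {"slot_machine", "lottery_ticket"}
# Activities that count only when money, chips, or other tokens are visible.
NEEDS_VISIBLE_STAKES = {"cards", "dice", "competitive_game"}
# Events that count only when a betting receipt is also shown.
NEEDS_BETTING_RECEIPT = {"race", "sporting_event"}

def is_gambling(
    activity: str,
    stakes_visible: bool = False,
    betting_receipt_visible: bool = False,
) -> bool:
    """Apply the ground truth definition to a described image (illustrative)."""
    if activity in ALWAYS_GAMBLING:
        return True
    if activity in NEEDS_VISIBLE_STAKES:
        return stakes_visible
    if activity in NEEDS_BETTING_RECEIPT:
        return betting_receipt_visible
    return False

print(is_gambling("slot_machine"))                 # always gambling
print(is_gambling("cards"))                        # no visible stakes
print(is_gambling("cards", stakes_visible=True))   # chips or money on the table
print(is_gambling("race", betting_receipt_visible=True))
```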

To train our new class on this ground truth definition, we sourced and labeled a custom set of over 1.1M images. The new visual classifier can now distinguish between gambling activity and similar non-gambling behavior, even if the images are visually similar:

For more detailed information, you can see a full description of our gambling class here. Platforms that wish to moderate or identify gambling can access predictions from this model head by default after upgrading to this model release. 

Confederate Symbolism

Separately, many of our customers also expressed interest in more complete monitoring of visual hate and white nationalism, especially Confederate symbolism. For this release, we sourced and labeled over 1M images to train a new class for identifying imagery of the commonly used version of the Confederate flag. 

In addition to identifying photos of the flag itself, this new model head will also capture the Confederate “stars and bars” shown in graphics, tattoos, clothing, and the like. We also trained the model to ignore visually similar flags and historical variants that are not easily recognizable:

Along with our other hate classes, customers can now use predictions from our Confederate class to keep their online environments safe.

Improvements to Established Visual Moderation Classes

Beyond these new classes, we also focused on improving the model’s handling of niche edge cases in existing model heads. For example, we used active learning and additional training examples to address biases we occasionally found in our NSFW and Gun classifiers, where the model sometimes incorrectly identified studio microphones as guns or mistook acne creams for other, less safe-for-work liquids.

Final Thoughts

This release delivers our most comprehensive and capable Visual Moderation model yet to help platforms develop proactive, cost-effective protection for their online communities. As moderation needs become more sophisticated, we’ll continue to incorporate feedback from our partners and refine our content moderation models to keep up. Stay tuned for our next release with additional classes and improvements later this year.

If you have any questions about this release, please get in touch at support@thehive.ai or api@thehive.ai. You can also find a video tutorial for upgrading to the latest model configuration here. For more information on Visual Moderation more generally, feel free to reach out to sales@thehive.ai or check out our documentation.