{"id":78,"date":"2022-06-22T14:19:00","date_gmt":"2022-06-22T14:19:00","guid":{"rendered":"https:\/\/thehive.ai\/blog\/?p=78"},"modified":"2025-03-05T06:05:04","modified_gmt":"2025-03-05T06:05:04","slug":"web-search-visual-comparisons-to-web-content-using-deep-learning","status":"publish","type":"post","link":"https:\/\/thehive.ai\/blog\/web-search-visual-comparisons-to-web-content-using-deep-learning","title":{"rendered":"Web Search: Visual Comparisons To Web Content Using Deep Learning"},"content":{"rendered":"\n<h5 class=\"at-a-glance-heading\">Contents<\/h5>\n\n\n\n<ul class=\"at-a-glance\"><li><a href=\"#anchor1\">What is Web Search?<\/a><\/li><li><a href=\"#anchor2\">Building a Visual Comparison Engine: Hive\u2019s Similarity Model and Search Index<\/a><\/li><li><a href=\"#anchor3\">How it Works: Web Search API and Visual Search Examples<\/a><\/li><li><a href=\"#anchor4\">Final Thoughts: Future Directions for Web Search<\/a><\/li><\/ul>\n\n\n\n<h2 id=\"anchor1\">What is Web Search?<\/h2>\n\n\n\n<p>Earlier this year, Hive launched a pair of API products built on new deep learning models that analyze visual similarity between images:&nbsp;<a href=\"https:\/\/thehive.ai\/blog\/find-duplicated-and-modified-nft-images-with-new-nft-search-apis\" target=\"_blank\" rel=\"noreferrer noopener\">NFT Search<\/a>&nbsp;and&nbsp;<a href=\"https:\/\/thehive.ai\/blog\/searching-custom-image-libraries-with-new-image-similarity-models\" target=\"_blank\" rel=\"noreferrer noopener\">Custom Search<\/a>. To help platforms authenticate visual content, these APIs search unstructured datasets for similar images to uncover relationships within a larger content ecosystem.<\/p>\n\n\n\n<p>These launches marked the start of a larger effort by Hive to make broader, more relational content understanding available to enterprise customers. 
Now, we\u2019re unveiling the most ambitious of our Intelligent Search services yet:&nbsp;<strong>Web Search,&nbsp;<\/strong>for visual comparisons to content on the open web.&nbsp;<\/p>\n\n\n\n<p>Web Search deploys our best-in-class image similarity model across stored media from billions of crawled web pages, retrieves visually similar versions of a query image, and returns these matches via API.&nbsp; The Web Search API enables automated checks against web images in a variety of use-cases, including:&nbsp;<\/p>\n\n\n\n<ul><li>Detecting misuse of image assets in copyright enforcement contexts<\/li><li>Enforcing paywalls on premium content by identifying unauthorized reposts and shares of protected media<\/li><li>Verifying originality of user-generated content like profile and marketplace photos<\/li><\/ul>\n\n\n\n<p>In this announcement, we\u2019ll take a closer look at the two pillars of Hive\u2019s visual search engine \u2013 our image similarity model and web index \u2013 and preview how the Web Search API works.&nbsp;<\/p>\n\n\n\n<h2 id=\"anchor2\"><strong>Building a Visual Comparison Engine: Hive\u2019s Similarity Model and Search Index<\/strong><\/h2>\n\n\n\n<p>The backbone of Web Search is Hive\u2019s visual similarity model, a deep vision model that conducts pair-wise visual comparisons between images.&nbsp; Unlike typical fingerprinting algorithms, our models assess visual similarity based on high-level feature alignment to mimic (and surpass) human perceptual comparison.&nbsp; To build this, we used contrastive learning on image sets including substantial augmentations and negative examples, training a robust visual similarity model without relying on supervisory labels to teach specific definitions.<\/p>\n\n\n\n<p>The resulting model considers both duplicates of an image and modified versions as similar \u2013 including overlay elements, filters and edits, and adversarial modifications. 
For a pair of images, the model returns a normalized score between 0 and 1 correlated with the output of its contrastive loss function (i.e., based on similarity of feature vectors). A pair-wise similarity score of 1 indicates an exact visual match between images, while lower scores reflect the extent of any visual differences.&nbsp;<\/p>\n\n\n\n<p>A robust image comparison model is a necessary part of a visual search engine, but not a sufficient one on its own. For Web Search to be broadly useful, we also needed a comprehensive reference database of images to compare against. To do this, Hive built and deployed a custom web crawler to continuously retrieve image content from public pages. Since we began crawling, we\u2019ve grown this dataset to tens of billions of images, and it continues to expand as our crawler encounters new web pages and freshly posted content. To enable more detailed search results, we also index URL and domain information, alt text, and other image metadata that can be returned alongside matches in the API response.<\/p>\n\n\n\n<h2 id=\"anchor3\"><strong>How it Works: Web Search API and Visual Search Examples<\/strong><\/h2>\n\n\n\n<p>Given a query image, the Web Search API uses the similarity model to compare against all reference images in our web index and returns all matches above a threshold similarity score.&nbsp; For each match, the API response specifies:&nbsp;<\/p>\n\n\n\n<ul><li>A direct link (URL) to the matching image<\/li><li>A backlink to the domain where the matching image was found<\/li><li>A similarity score between the query image and the match<\/li><\/ul>\n\n\n\n<p>Here are responses from a few example searches that show the versatility of the Web Search API:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" width=\"1024\" height=\"427\" src=\"https:\/\/staticblog.thehive.ai\/uploads\/2024\/07\/Graph-1-1024x427.jpg\" alt=\"\" class=\"wp-image-225\" 
srcset=\"https:\/\/staticblog.thehive.ai\/uploads\/2024\/07\/Graph-1-1024x427.jpg 1024w, https:\/\/staticblog.thehive.ai\/uploads\/2024\/07\/Graph-1-300x125.jpg 300w, https:\/\/staticblog.thehive.ai\/uploads\/2024\/07\/Graph-1-768x320.jpg 768w, https:\/\/staticblog.thehive.ai\/uploads\/2024\/07\/Graph-1-1536x640.jpg 1536w, https:\/\/staticblog.thehive.ai\/uploads\/2024\/07\/Graph-1.jpg 1920w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Web Search is well-suited to help marketplaces automatically identify potential scam listings that use images taken from the open web. For example, we queried the left image from a suspiciously cheap rental ad that looked a little too good to be true. Web Search uncovered photos from a real listing for the unit on the realtor\u2019s website. The two photos are almost identical except for slightly lower resolution in the scam image; our similarity model predicts accordingly with a similarity score of 0.99.&nbsp;<\/p>\n\n\n\n<p>Let\u2019s look at another example, this time with more visually impactful differences:&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" width=\"1024\" height=\"427\" src=\"https:\/\/staticblog.thehive.ai\/uploads\/2024\/07\/Graphic-2-1024x427.jpg\" alt=\"\" class=\"wp-image-226\" srcset=\"https:\/\/staticblog.thehive.ai\/uploads\/2024\/07\/Graphic-2-1024x427.jpg 1024w, https:\/\/staticblog.thehive.ai\/uploads\/2024\/07\/Graphic-2-300x125.jpg 300w, https:\/\/staticblog.thehive.ai\/uploads\/2024\/07\/Graphic-2-768x320.jpg 768w, https:\/\/staticblog.thehive.ai\/uploads\/2024\/07\/Graphic-2-1536x640.jpg 1536w, https:\/\/staticblog.thehive.ai\/uploads\/2024\/07\/Graphic-2.jpg 1920w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Here, the query image incorporates the original but uses a significant digital overlay. Still, our similarity model identifies the source image as a match with a similarity score of 0.7. 
The ability to recognize edited photos enables Web Search to help social and dating platforms identify impersonation attempts (\u201ccatfishing\u201d) that use photos taken from the web in their profiles, even if those photos have been noticeably modified.&nbsp;<\/p>\n\n\n\n<p>Here\u2019s a similar example where the query image is \u201cclean\u201d and the matching image is modified with a text overlay:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" width=\"1024\" height=\"464\" src=\"https:\/\/staticblog.thehive.ai\/uploads\/2024\/07\/Graphic-3-1024x464.jpg\" alt=\"\" class=\"wp-image-227\" srcset=\"https:\/\/staticblog.thehive.ai\/uploads\/2024\/07\/Graphic-3-1024x464.jpg 1024w, https:\/\/staticblog.thehive.ai\/uploads\/2024\/07\/Graphic-3-300x136.jpg 300w, https:\/\/staticblog.thehive.ai\/uploads\/2024\/07\/Graphic-3-768x348.jpg 768w, https:\/\/staticblog.thehive.ai\/uploads\/2024\/07\/Graphic-3-1536x696.jpg 1536w, https:\/\/staticblog.thehive.ai\/uploads\/2024\/07\/Graphic-3.jpg 1920w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>In this case, the matching image reuses the original with text stylized as a magazine cover, and our model correctly identifies the edited version. 
With similar queries, Web Search can help platforms track down misuses of stock photos and copyrighted images, or reposts of premium (paywall) content to other websites.&nbsp;<\/p>\n\n\n\n<p>In their own searches, platforms can tune the similarity score threshold to match their definitions and intended use-cases: a high threshold targets duplicates and close copies, while a lower threshold broadens results to visually related images.<\/p>\n\n\n\n<h2 id=\"anchor4\"><strong>Final Thoughts: Future Directions for Web Search<\/strong><\/h2>\n\n\n\n<p>Hive\u2019s Visual Search APIs offer enterprise customers new insight into how their visual content is used and where it comes from through on-demand searches of their own content (Custom Search), blockchains (NFT Search), and, now, the open web (Web Search). The capabilities of our image similarity model and other content tagging models raise the bar on what\u2019s possible in the search space.<\/p>\n\n\n\n<p>In building these datasets, we\u2019re also thinking about ways to unlock other actionable insights within our search indexes. As a next step, we\u2019ll be broadening our web index to include videos, GIFs, and audio data. From there, we plan to apply our targeted content tagging models \u2013 logo detectors, OCR, scene classification, and more \u2013 to enable open web searches across content modalities, content-targeted ad placements, and other use-cases.&nbsp;<\/p>\n\n\n\n<p>To learn more about Web Search or our other visual search APIs, you can contact us&nbsp;<a href=\"https:\/\/thehive.ai\/contact-us?source=blog\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>&nbsp;or reach out to our&nbsp;<a href=\"mailto:sales@thehive.ai\" target=\"_blank\" rel=\"noreferrer noopener\">sales team<\/a>&nbsp;directly.&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hive announces a first-of-its-kind API that checks images against web sources. 
Web Search harnesses deep learning for human-level visual similarity analysis.<\/p>\n","protected":false},"author":1,"featured_media":216,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"kia_subtitle":""},"categories":[8,3],"tags":[],"_links":{"self":[{"href":"https:\/\/thehive.ai\/blog\/wp-json\/wp\/v2\/posts\/78"}],"collection":[{"href":"https:\/\/thehive.ai\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/thehive.ai\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/thehive.ai\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/thehive.ai\/blog\/wp-json\/wp\/v2\/comments?post=78"}],"version-history":[{"count":5,"href":"https:\/\/thehive.ai\/blog\/wp-json\/wp\/v2\/posts\/78\/revisions"}],"predecessor-version":[{"id":417,"href":"https:\/\/thehive.ai\/blog\/wp-json\/wp\/v2\/posts\/78\/revisions\/417"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/thehive.ai\/blog\/wp-json\/wp\/v2\/media\/216"}],"wp:attachment":[{"href":"https:\/\/thehive.ai\/blog\/wp-json\/wp\/v2\/media?parent=78"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/thehive.ai\/blog\/wp-json\/wp\/v2\/categories?post=78"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/thehive.ai\/blog\/wp-json\/wp\/v2\/tags?post=78"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}