Find Duplicated and Modified NFT Images with New NFT Search APIs

Why We Built the NFT Search API

Artists, technologists, and collectors have recently shown growing interest in non-fungible tokens (NFTs) as digital collectibles. With this surge in popularity, however, the red-hot NFT space has also become a prime target for plagiarism, copycats, and other types of fraud.

While built-in blockchain consensus mechanisms are highly effective at validating the creation, transaction, and ownership of NFTs, the “smart contracts” behind these tokens typically cannot store the files they represent. Instead, the token simply points to a metadata file, which in turn contains a public link to the image asset. So while the token on the blockchain is itself unique, the underlying image may not be.
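
To illustrate this indirection, here is a minimal Python sketch that follows a token's metadata to its image link. The `resolve_ipfs` and `image_url_from_metadata` helpers, the example tokens, and the choice of the public `https://ipfs.io/ipfs/` gateway are assumptions made for illustration; they are not part of Hive's product.

```python
import json

# Assumed public gateway; any IPFS HTTP gateway could be substituted.
IPFS_GATEWAY = "https://ipfs.io/ipfs/"


def resolve_ipfs(uri: str, gateway: str = IPFS_GATEWAY) -> str:
    """Turn an ipfs:// URI into an HTTP URL; leave other URLs untouched."""
    prefix = "ipfs://"
    if uri.startswith(prefix):
        return gateway + uri[len(prefix):]
    return uri


def image_url_from_metadata(metadata_json: str) -> str:
    """Extract the image link from a token's metadata document."""
    metadata = json.loads(metadata_json)
    return resolve_ipfs(metadata["image"])


# Two hypothetical tokens with different metadata that point at the same image.
token_a = json.dumps({"name": "Original #1", "image": "ipfs://QmExampleCid/1.png"})
token_b = json.dumps({"name": "Copycat #77", "image": "ipfs://QmExampleCid/1.png"})

same_image = image_url_from_metadata(token_a) == image_url_from_metadata(token_b)
print(image_url_from_metadata(token_a))
print(same_image)
```

Both tokens are distinct on-chain, yet nothing in the metadata layout prevents them from referencing the same asset.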

Additionally, current blockchain technology has no way of understanding image content or the relationships between images. Hashing checks and other conventional methods cannot address the subjective and more complicated problem of human perceptual similarity between images.
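
A short Python sketch shows why cryptographic hashing alone cannot catch modified copies. The byte strings below are stand-ins for image data (an assumption for the demo): flipping a single bit produces an unrelated SHA-256 digest, so a hash comparison reports "different" even when two images would look identical to a person.

```python
import hashlib

original = bytes(range(256)) * 4      # stand-in for raw image bytes
tweaked = bytearray(original)
tweaked[0] ^= 1                       # flip a single bit in the first byte
modified = bytes(tweaked)

hash_original = hashlib.sha256(original).hexdigest()
hash_modified = hashlib.sha256(modified).hexdigest()

# Count how many of the 64 hex characters still agree, position by position.
agreeing = sum(a == b for a, b in zip(hash_original, hash_modified))

print(hash_original == hash_modified)
print(f"{agreeing} of {len(hash_original)} hex characters agree")
```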

Due to these technical limitations, the same decentralization that empowers creators to sell their work independently also enables bad actors to create copycat tokens with unlicensed or modified image assets. At a minimum, this puts less sophisticated NFT buyers at risk, as they may be unable to tell the difference between original and stolen art; beyond this, widespread duplication also undermines the value proposition of original tokens as unique collectibles.

To help solve this problem, we are excited to offer NFT Search, a new API product built on a searchable index of major blockchain image assets and using Hive’s robust image similarity model.  

NFT Search makes an otherwise opaque dataset easily accessible, allowing marketplaces and other stakeholders to search existing NFT image assets for matches to query images, accurately identifying duplicates and modified copies. NFT Search has the potential to provide much-needed confidence across the NFT ecosystem to help accelerate growth and stability in the market.  

This post explains how our model works and the new API that makes this functionality accessible.

How Our Models Assess Similarity Between NFT Images

Hive’s NFT Search model is a deep vision image similarity model optimized for the types of digital art used in NFTs. To build this model, we used contrastive learning and other self-supervised techniques to expose the model to a wide range of possible image augmentations. We then fine-tuned our notion of image similarity to account for a characteristic feature of NFTs: small, algorithmically generated trait differences between images intended to be unique tokens.

The resulting model is targeted toward exact visual matches, but also resilient to manual manipulations and computer-generated variants that would bypass conventional hashing checks. 
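
For readers unfamiliar with contrastive learning, the sketch below computes the InfoNCE loss, one common contrastive objective. The post does not say which objective Hive uses, so this is an assumption chosen for illustration, and the toy two-dimensional embeddings are made up. The loss is small when an image's embedding sits close to that of its augmented copy (the positive) and far from those of unrelated images (the negatives).

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchor, positive, negatives, temperature: float = 0.1) -> float:
    """InfoNCE loss for one anchor: negative log-softmax of the positive pair."""
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, n) / temperature for n in negatives]
    top = max(logits)  # subtract the max for numerical stability
    log_sum = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_sum - logits[0]


anchor = [1.0, 0.0]                 # embedding of an image
augmented = [0.8, 0.6]              # embedding of its augmented copy
unrelated = [[0.0, 1.0], [-1.0, 0.0]]

# Positive close to the anchor: low loss.
loss_good = info_nce(anchor, augmented, unrelated)
# Positive far from the anchor while a negative is close: high loss.
loss_bad = info_nce(anchor, [0.0, 1.0], [[0.8, 0.6], [-1.0, 0.0]])

print(loss_good < loss_bad)
```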

To quantify visual similarity between a query image and existing NFT image assets, the model returns similarity scores normalized between 0 and 1 for each identified match. For a matching NFT image, a similarity score of 1.0 indicates that the query image is an exact duplicate of the matching image. Lower scores indicate that the query image has been modified or is otherwise visually distinct in some way. 
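
As a rough illustration of how a consumer might read these scores, the following Python sketch maps a score to a coarse label. The `describe_match` function and both cutoffs are hypothetical; the right thresholds depend on each platform's tolerance for false positives.

```python
def describe_match(score: float, near_exact: float = 0.999, modified: float = 0.7) -> str:
    """Map a similarity score in [0, 1] to a coarse label.

    Both cutoffs are hypothetical defaults chosen for illustration.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("similarity scores are normalized to [0, 1]")
    if score >= near_exact:
        return "exact duplicate"
    if score >= modified:
        return "likely modified copy"
    return "visually distinct"


for score in (1.0, 0.94, 0.71, 0.4):
    print(score, describe_match(score))
```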

Building a Robust NFT Index for Broad Similarity Searches

Building a robust image comparison model was a necessary first step, but to make an NFT search solution useful we also needed to construct a near-complete set of existing NFT images as a reference set for broad comparisons. To do this, Hive crawls and indexes NFT images referenced on the Ethereum and Polygon blockchains in real time, with support for additional blockchains in development. We also store identifying metadata from the associated tokens – including token IDs and URLs, contract addresses, and descriptors – to create a searchable “fingerprint” of each blockchain that enables comprehensive visual comparisons.

Putting it all together: Example NFT Searches and Model Predictions

At a high level: when receiving a query image, our NFT model compares the query image against each existing NFT image in this dataset. The NFT Search API then returns a list of any identified matches, including links to the matching images and token metadata. 
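
The comparison step can be sketched in Python as a brute-force scan over an index. This is a simplification under stated assumptions: the `IndexedImage` record, the three-dimensional embeddings, the 0.7 cutoff, and the use of cosine similarity as the score are all hypothetical stand-ins, not a description of Hive's implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class IndexedImage:
    # Hypothetical index record; fields echo the token metadata described above.
    blockchain: str
    contract_address: str
    token_id: str
    embedding: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query: list[float], index: list[IndexedImage], min_score: float = 0.7):
    """Compare the query against every record and return matches, best first."""
    scored = [(cosine_similarity(query, item.embedding), item) for item in index]
    matches = [(score, item) for score, item in scored if score >= min_score]
    return sorted(matches, key=lambda pair: pair[0], reverse=True)


index = [
    IndexedImage("ethereum", "0xAAA", "1", [1.0, 0.0, 0.0]),
    IndexedImage("polygon", "0xBBB", "9", [0.9, 0.1, 0.0]),
    IndexedImage("ethereum", "0xCCC", "5", [0.0, 1.0, 0.0]),
]

results = search([1.0, 0.0, 0.0], index)
for score, item in results:
    print(f"{score:.2f} {item.blockchain} {item.contract_address} #{item.token_id}")
```

Because every record lives in one index regardless of chain, a single query can surface matches on both Ethereum and Polygon.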

To get a sense of NFT Search’s capabilities and how our scores align with human perceptual similarity, here’s a look at a few copycat tokens the model identified in recent searches: 

Side-by-side comparison of the BAYC 4819 NFT art with a visually identical duplicate found by Hive's NFT Search API. The matching NFT was identified on the Polygon blockchain. Hive's NFT Search model returned a similarity score of 1, correctly indicating an exact visual match.

This is an example of an exact duplicate (similarity score 1.00): a copy of one of the popular Bored Ape Yacht Club artworks minted on the Polygon blockchain. Because NFT Search compares the query image to Hive’s entire NFT dataset, it is able to identify matching images across multiple blockchains and token standards.

Things get more interesting when we look for manually or programmatically manipulated variants at lower similarity scores. Take a look at the results from the search on another Bored Ape token, number 320: 

Side-by-side comparison of the original BAYC 320 NFT art with four variants identified by Hive's NFT Search API. Matches include both basic manual manipulations (e.g., rotations) and more complex computer-manipulated copies. Similarity scores returned by Hive's NFT model ranged from 0.97 down to 0.71 for more heavily modified variants.

This search returned many matches, including several exact matches on both the Ethereum and Polygon blockchains. Here’s a look at other, non-exact matches it found:

  • Variant 1: A basic variant where the original Bored Ape 320 image is mirrored horizontally. This simple manipulation has little impact on the model’s similarity prediction. 
  • Variant 2 – “BAPP 320”: An example of a computer-manipulated copy on the Ethereum blockchain. The token metadata describes the augmented duplicate as an “AI-pixelated NFT” that is “inspired by the original BAYC collection.” Despite visual differences, the resulting image is structurally quite similar to the original, and our NFT model predicted accordingly (score = 0.94). 
  • Variant 3 – “VAYC 5228”: A slight variant located on the Ethereum blockchain. The matching image has a combination of Bored Ape art traits that does not exist in the original collection, but since many traits match, the NFT model still returns a relatively high similarity score (0.85).  
  • Variant 4 – These Apes Don’t Exist #274: Another computer-manipulated variant, but this one results in a new combination of Bored Ape traits and visible changes to the background. The token metadata describes these as “AI-generated apes with hyper color blended visual traits imagined by a neural network.” Due to these clear visual and feature differences, this match yielded a lower similarity score (0.71).

NFT Search API: Response Object and Match Descriptions

Platforms can integrate the NFT Search API into their workflows to automatically submit queries when tokens are minted, listed for sale, or sold, and receive model prediction results in near-real time.
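
One way such an integration might look is sketched below in Python. Everything here is hypothetical: `review_listing`, the 0.9 threshold, and `fake_search` (a canned stand-in for a real API call) are illustration only, and the sketch assumes each match carries a similarity score and a contract address. The hook flags high-scoring matches that belong to a different contract than the token being listed.

```python
from typing import Callable

# A search function takes an image reference and returns a list of match dicts.
SearchFn = Callable[[str], list[dict]]


def review_listing(image_ref: str, contract_address: str,
                   search: SearchFn, threshold: float = 0.9) -> list[dict]:
    """Return matches from other contracts that score at or above the threshold."""
    return [
        m for m in search(image_ref)
        if m["similarity_score"] >= threshold
        and m["contract_address"].lower() != contract_address.lower()
    ]


def fake_search(image_ref: str) -> list[dict]:
    # Stand-in for a real API call; returns canned data for the demo.
    return [
        {"contract_address": "0xAAA", "token_id": "320", "similarity_score": 1.0},
        {"contract_address": "0xBBB", "token_id": "320", "similarity_score": 0.94},
        {"contract_address": "0xCCC", "token_id": "274", "similarity_score": 0.71},
    ]


# Called when a token from contract 0xaaa is minted, listed, or sold.
flagged = review_listing("ipfs://example/320.png", "0xaaa", fake_search)
print([m["contract_address"] for m in flagged])
```

Passing the search function in as a parameter keeps the review logic testable without network access.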

The NFT Search API will return a full JSON response listing any NFTs that match the query image. For each match, the response object includes:

  • A link (URL or IPFS address) to the matching NFT image
  • A similarity score
  • The token URL
  • Any descriptive token metadata hosted at the token URL (e.g., traits and other descriptors)
  • The unique contract address and token ID pair

To make the details of the API response more concrete, here’s the response object for the “BAPP 320” match shown above: 

"matches": [
    ...
    {
        "url": "ipfs://QmY6RZ29zJ7Fzis6Mynr4Kyyw6JpvvAPRzoh3TxNxfangt/320.jpg",
        "token_id": "320",
        "contract_address": "0x1846e4EBc170BDe7A189d53606A72d4D004d614D",
        "token_url": "ipfs://Qmc4onW4qT8zRaQzX8eun85seSD8ebTQjWzj4jASR1V9wN/320.json",
        "image_hash": "ce237c121a4bd258fe106f8965f42b1028e951fbffc23bf599eef5d20719da6a",
        "blockchain": "ethereum", // currently, this will be either "ethereum" or "Polygon"
        "metadata": {
            "name": "Pixel Ape #320",
            "description": "**PUBLIC MINTING IS LIVE NOW: https://bapp.club.** *Become a BAPP member for only .09 ETH.* The BAPP is a set of 10,000 Bored Ape NFTs inspired by the original BAYC collection. Each colorful, AI-pixelated NFT is a one-of-a-kind collectible that lives on the Ethereum blockchain. Your Pixel Bored Ape also serves as your Club membership card, granting you access to exclusive benefits for Club members.",
            "image": "ipfs://QmY6RZ29zJ7Fzis6Mynr4Kyyw6JpvvAPRzoh3TxNxfangt/320.jpg",
            "attributes": [
                {
                    // list of traits for NFT art, if applicable
                }
            ]
        },
        "similarity_score": 0.9463750907624477
    },
    ...
]
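
A consumer of this response might parse it along the following lines. This Python sketch assumes the `matches` array sits at the top level of a JSON object and that the real payload contains no `//` comments (the ones shown here are annotations). The `Match` class and `parse_matches` helper are hypothetical names, and the second sample entry is made up for the demo.

```python
import json
from dataclasses import dataclass


@dataclass
class Match:
    blockchain: str
    contract_address: str
    token_id: str
    image_url: str
    similarity_score: float


def parse_matches(response_text: str) -> list[Match]:
    """Convert the "matches" array into Match records, highest score first."""
    payload = json.loads(response_text)
    matches = [
        Match(
            blockchain=m["blockchain"],
            contract_address=m["contract_address"],
            token_id=m["token_id"],
            image_url=m["url"],
            similarity_score=float(m["similarity_score"]),
        )
        for m in payload.get("matches", [])
    ]
    return sorted(matches, key=lambda m: m.similarity_score, reverse=True)


# Trimmed sample: one made-up entry plus values reused from the example above.
sample = json.dumps({"matches": [
    {"url": "ipfs://example/1.png", "token_id": "1",
     "contract_address": "0xEXAMPLE", "blockchain": "ethereum",
     "similarity_score": 0.71},
    {"url": "ipfs://QmY6RZ29zJ7Fzis6Mynr4Kyyw6JpvvAPRzoh3TxNxfangt/320.jpg",
     "token_id": "320",
     "contract_address": "0x1846e4EBc170BDe7A189d53606A72d4D004d614D",
     "blockchain": "ethereum", "similarity_score": 0.9463750907624477},
]})

best = parse_matches(sample)[0]
print(best.token_id, round(best.similarity_score, 2))
```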

Aside from identifying metadata, the response object also includes a SHA256 hash of the NFT image currently hosted at the image URL. The hash value (and/or a hash of the query image) can be used to confirm an exact match, or to verify that the NFT image hosted at the URL has not been modified or altered at a later time. 
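
The hash check described above can be sketched in a few lines of Python. The byte string stands in for a downloaded image and `image_matches_hash` is a hypothetical helper; in practice the expected digest would come from the `image_hash` field of a match.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def image_matches_hash(image_bytes: bytes, expected_hex: str) -> bool:
    """True when the bytes hash to the expected SHA-256 digest."""
    return sha256_hex(image_bytes) == expected_hex.lower()


image_bytes = b"stand-in for downloaded image bytes"
reported_hash = sha256_hex(image_bytes)  # pretend this came from "image_hash"

unchanged = image_matches_hash(image_bytes, reported_hash)
altered = image_matches_hash(image_bytes + b"!", reported_hash)
print(unchanged, altered)
```

A mismatch means the file at the URL differs byte-for-byte from what was indexed, though it says nothing about how visually different the two are; that is what the similarity score is for.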

Final Thoughts

Authenticating NFTs is an important step forward in increasing trust between marketplaces, collectors, and creators who are driving the growth in this new digital ecosystem. We also recognize that identifying duplicates and altered copies within blockchains is just one part of a broader problem, and we’re currently hard at work on complementary authentication solutions that will expand our comparison scope from blockchains to the open web.

If you’d like to learn more about NFT Search and other solutions we’re building in this space, please feel free to reach out to sales@thehive.ai or contact us here.