Contents
- Navigating an Increasingly Generative World
- Structuring the Study
- Evaluation Methods and Findings
- Final Thoughts Moving Forward
Navigating an Increasingly Generative World
To the untrained eye, distinguishing human-created art from AI-generated content can be difficult. Hive’s commitment to providing customers with API-accessible solutions for challenging problems led to the creation of our AI-Generated Image and Video Detection API, which classifies images as human-created or AI-generated. Our model was evaluated in an independent study conducted by Anna Yoo Jeong Ha and Josephine Passananti from the University of Chicago, which sought to determine who was more effective at classifying images as AI-generated: humans or automated detectors.
Ha and Passananti’s study addresses a growing problem within the generative AI space: as generative AI models become more advanced, the boundary between human-created art and AI-generated images has become increasingly blurred. With such powerful tools accessible to the general public, various legal and ethical concerns have been raised about their misuse.
Such concerns are pertinent to address because the misuse of generative AI models negatively impacts both society at large and the AI models themselves. Bad actors have used AI-generated images for harmful purposes, such as spreading misinformation, committing fraud, or scamming individuals and organizations. As only human-created art is eligible for copyright, businesses may attempt to bypass the law by passing off AI-generated images as human-created. Moreover, multiple studies (on both generative image and text models) have shown evidence that AI models will deteriorate if their training data solely consists of AI-generated content—which is where Hive’s classifier comes in handy.
The study’s results show that Hive’s model outperforms both its automated peers and highly trained human experts in differentiating between human-created art and AI-generated images across most scenarios. This post examines the study’s methodologies and findings and highlights our model’s consistent performance across varied inputs.
Structuring the Study
In the experiment, researchers evaluated the performance of five automated detectors (three of which are commercially available, including Hive’s model) and humans against a dataset containing both human-created and AI-generated images across various art styles. Humans were categorized into three subgroups: non-artists, professional artists, and expert artists. Expert artists are the only subgroup with prior experience in identifying AI-generated images.
The dataset consists of four image groups: human-created art, AI-generated images, “hybrid images” that combine generative AI with human effort, and perturbed versions of human-created art. A perturbation is a minor modification to a model’s input designed to probe or exploit weaknesses in the model. Four perturbation methods are used in the study: JPEG compression, Gaussian noise, CLIP-based Adversarial Perturbation (which operates at the pixel level), and Glaze (a tool that protects human artists from style mimicry by adding imperceptible perturbations to their artwork).
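Two of these perturbations, JPEG compression and Gaussian noise, are straightforward to reproduce. Below is a minimal sketch assuming Pillow and NumPy are available; the quality setting and noise level are illustrative choices, not the study’s actual parameters:

```python
import io

import numpy as np
from PIL import Image


def jpeg_compress(img: Image.Image, quality: int = 30) -> Image.Image:
    """Re-encode the image as JPEG at the given quality level."""
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf)


def add_gaussian_noise(img: Image.Image, sigma: float = 10.0) -> Image.Image:
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    arr = np.asarray(img.convert("RGB"), dtype=np.float64)
    noisy = arr + np.random.normal(0.0, sigma, arr.shape)
    return Image.fromarray(np.clip(noisy, 0, 255).astype(np.uint8))


# Example: perturb a plain gray test image
original = Image.new("RGB", (64, 64), color=(128, 128, 128))
compressed = jpeg_compress(original, quality=30)
noisy = add_gaussian_noise(original, sigma=10.0)
```

Both transformations leave the image visually similar to a human while shifting the pixel statistics a detector sees, which is what makes them useful robustness probes.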
After evaluating the model on unperturbed imagery, the researchers proceeded to more advanced scenarios with perturbed imagery.
Evaluation Methods and Findings
The researchers evaluated the automated detectors on four metrics: overall accuracy (the ratio of correctly classified images to the entire dataset), false positive rate (the ratio of human-created art misclassified as AI-generated), false negative rate (the ratio of AI-generated images misclassified as human-created), and AI detection success rate (the ratio of AI-generated images correctly classified as AI-generated to the total number of AI-generated images).
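These four metrics are simple ratios over a labeled evaluation set. As a minimal sketch (the label strings and function name are ours, not the study’s):

```python
def detector_metrics(labels: list[str], preds: list[str]) -> dict[str, float]:
    """Compute the four study metrics from ground-truth labels and
    detector predictions, each given as "human" or "ai" per image."""
    n = len(labels)
    human_total = labels.count("human")
    ai_total = labels.count("ai")
    correct = sum(l == p for l, p in zip(labels, preds))
    # False positive: human art flagged as AI-generated
    fp = sum(l == "human" and p == "ai" for l, p in zip(labels, preds))
    # False negative: AI-generated image passed off as human-created
    fn = sum(l == "ai" and p == "human" for l, p in zip(labels, preds))
    return {
        "accuracy": correct / n,
        "false_positive_rate": fp / human_total,
        "false_negative_rate": fn / ai_total,
        "ai_detection_success_rate": (ai_total - fn) / ai_total,
    }


# Toy example: five images, one AI-generated image missed
metrics = detector_metrics(
    ["human", "human", "ai", "ai", "ai"],
    ["human", "human", "ai", "human", "ai"],
)
print(metrics["accuracy"])  # 0.8
```

Note that the AI detection success rate is the complement of the false negative rate, since every AI-generated image is either caught or missed.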
Among automated detectors, Hive’s model emerged as the “clear winner” (Ha and Passananti 2024, 6). Not only does it boast a near-perfect 98.03% accuracy rate, but it also has a 0% false positive rate (i.e., it never misclassifies human art) and a low 3.17% false negative rate (i.e., it rarely misclassifies AI-generated images). According to the authors, this could be attributed to Hive’s rich collection of generative AI datasets, with high quantities of diverse training data compared to its competitors.
Additionally, Hive’s model proved resistant to most perturbation methods, though it faced some challenges classifying AI-generated images processed with Glaze. It should be noted, however, that Glaze is designed primarily to protect human artwork. Glazing AI-generated images is a non-traditional use case, and minimal training data exists for it as a result. Thus, Hive’s model’s performance on Glazed AI-generated images has little bearing on its overall quality.
Final Thoughts Moving Forward
Among automated detectors and humans alike, Hive’s model is unparalleled. Even compared to expert artists, it classifies images with higher confidence and accuracy.
While the study considers the model’s potential areas for improvement, it is important to note that the study was published in February 2024. In the months following the study’s publication, Hive’s model has vastly improved and continues to expand its capabilities, with 12+ model architectures added since.
If you’d like to learn more about Hive’s AI-Generated Image and Video Detection API, a demo of the service can be accessed here, with additional documentation provided here. However, don’t just trust us, test us: reach out to sales@thehive.ai or contact us here, and our team can share API keys and credentials for your new endpoints.
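For a rough sense of what integrating a classification endpoint like this could look like, here is a hypothetical Python sketch. The endpoint URL, authentication header, and response fields below are assumptions made for illustration, not Hive’s documented API; consult the official documentation for the real interface.

```python
import json
import urllib.request

# Hypothetical endpoint -- the real URL, auth scheme, and response
# schema come from Hive's API documentation, not this sketch.
API_URL = "https://api.example.com/v1/classify"


def classify_image(image_url: str, api_key: str) -> dict:
    """Submit an image URL for human-vs-AI classification."""
    payload = json.dumps({"url": image_url}).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def is_ai_generated(result: dict, threshold: float = 0.5) -> bool:
    """Interpret a response of the assumed shape: a score in [0, 1],
    where higher means more likely AI-generated."""
    return result["ai_generated_score"] >= threshold


# Interpreting a sample response of the assumed shape:
sample = {"ai_generated_score": 0.97}
print(is_ai_generated(sample))  # True
```

A threshold-based decision like `is_ai_generated` lets each integrator trade off false positives against false negatives for their own use case.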