Back to Our Roots: Hive Data in Academia

Hive was started by two PhD students at Stanford who were frustrated with the difficulty of generating quality datasets for machine learning research. We found that the solutions on the market were either too inaccurate or too expensive for a typical research study. Since those early days, we’ve built one of the world’s largest marketplaces of human labor.

In keeping with our academic roots, we always intended Hive Data to be the perfect partner for academic labs. The best way to showcase Hive Data’s impact on academia is through a real case study. A machine learning researcher, whom we’ll call Professor X, had an urgent conference submission deadline coming up. He still had quite a lot of work remaining, much of which required labeling a large corpus of videos. He had tried many other services without success and was urgently searching for a solution that could address all of his needs. Here are the constraints he was under, and how we solved them:

1. Professor X didn’t want to pay a large upfront cost.

Given his limited budget and inability to risk project failure, Professor X needed a provider with a competitive price point and the flexibility his project required. Other services on the market generally had fixed costs that ran upwards of hundreds of thousands of dollars in the first year. Even if he could afford a single engagement, he didn’t have the budget to try a different service if the first one wasn’t up to par. Hive doesn’t impose any upfront fees, which made us a low-risk option.

2. Professor X needed to make sure the data output quality would be high enough to publish research.

While he did find some services whose rates were competitive with Hive’s, he quickly noted that all of them suffered from poor data quality, especially for video labeling tasks. This rendered the data unusable for his research project. Hive, on the other hand, offered a complex system of audits and a worker consensus model to ensure high data accuracy. Because the tasks passed through several rounds of worker auditing, Hive was able to offer the high-quality data that Professor X needed.

3. Professor X needed a fast turnaround on his results.

As we mentioned, Professor X was on a tight deadline to submit his paper for publication. Most other services have inflexible, week-long timelines for returning datasets. Hive, however, offered a much faster turnaround time. Thanks to our large global workforce, we were able to scale up to finish jobs as quickly as Professor X needed. He was able to get his job finished in less than a day, whereas other providers had quoted him as long as a month!

4. Professor X was searching for a service provider that could offer technical insight during the process.

Part of Hive Data’s value proposition to its customers is our own expertise in building machine learning models, in addition to supplying the quality data needed to do so. We’d seen projects similar to the one Professor X was dealing with, and we understood the problems he would face in generating this dataset. Even before getting started, we helped Professor X optimize his project by structuring his tasks in a way that improved his results and helped him build an effective model from his data.

Because Hive addressed these needs, Professor X was able to submit his paper and get it published on time. He continues to use Hive Data to power his AI research today.

Hive Data has already been used by top-tier university research labs all over the world, including at Stanford, MIT, Cornell, and Simon Fraser University. We’ve seen projects ranging from labeling datasets for vehicle detection in autonomous driving to object recognition for robotic arms and pedestrian identification from security cameras. The number of research verticals we cater to is constantly growing, as we pride ourselves on rapid engineering cycles that let us release data labeling capabilities as soon as we see a need emerging.

If you’re an academic researcher and you’re curious about how we can partner with you, contact me at research@thehive.ai. We’re excited to support your research!