Hive has been named to Fast Company’s prestigious annual list of the World’s Most Innovative Companies for 2020
Hive
SAN FRANCISCO, CA (March 10, 2020) – Hive has been named to Fast Company’s prestigious annual list of the World’s Most Innovative Companies for 2020.
The list honors the businesses making the most profound impact on both industry and culture, showcasing a variety of ways to thrive in today’s fast-changing world. This year’s MIC list features 434 businesses from 39 countries.
“It’s an honor to be featured in Fast Company’s list of the Most Innovative Companies for 2020,” said Kevin Guo, Co-Founder and CEO of Hive. “This recognition follows a year of step-change growth in Hive’s business and team, and symbolizes our progress in powering practical AI solutions for enterprise customers across industries.”
Hive is a full-stack AI company specialized in computer vision and deep learning, serving clients across industries with data labeling, model licensing, and subscription data products. During 2019, Hive grew to more than 100 clients, including 10 companies with market capitalizations exceeding $100 billion.
At the core of Hive’s business, the company operates the world’s largest distributed workforce of humans labeling data – now boasting nearly 2 million registered contributors globally. Hive’s workforce hand-labeled more than 1.3 billion pieces of training data in 2019, inputs to a consensus-driven workflow that powers deep learning models with accuracy unmatched by comparable offerings from the largest public cloud providers.
The company’s core models serve use cases including automated content moderation, logo and object detection, optical character recognition, voice transcription, and context classification. Across its models, Hive processed nearly 20 billion API calls in 2019.
The company also operates Mensio, a media analytics platform developed in partnership with Bain & Company that integrates Hive’s proprietary TV content metadata on commercial airings and camera-visible sponsorship placements with third-party viewership and outcome datasets. Mensio is currently in use by leading TV network owners, brands, and agencies for competitive intelligence, media planning, and optimization.
Fast Company’s editors and writers sought out the most groundbreaking businesses on the planet and across myriad industries. They also judged nominations received through their application process.
The World’s Most Innovative Companies is Fast Company’s signature franchise and one of its most highly anticipated editorial efforts of the year. It provides both a snapshot and a road map for the future of innovation across the most dynamic sectors of the economy.
“At a time of increasing global volatility, this year’s list showcases the resilience and optimism of businesses across the world. These companies are applying creativity to solve challenges within their industries and far beyond,” said Fast Company senior editor Amy Farley, who oversaw the issue with deputy editor David Lidsky.
Fast Company’s Most Innovative Companies issue (March/April 2020) is now available online at fastcompany.com/most-innovative-companies/2020, as well as in app form via iTunes and on newsstands beginning March 17, 2020. The hashtag is #FCMostInnovative.
About Hive
Hive is an AI company specialized in computer vision and deep learning, focused on powering innovators across industries with practical AI solutions and data labeling, grounded in the world’s highest quality visual and audio metadata. For more information, visit thehive.ai.
About Fast Company:
Fast Company is the only media brand fully dedicated to the vital intersection of business, innovation, and design, engaging the most influential leaders, companies, and thinkers on the future of business. Since 2011, Fast Company has received some of the most prestigious editorial and design accolades, including the American Society of Magazine Editors (ASME) National Magazine Award for “Magazine of the Year,” Adweek’s Hot List for “Hottest Business Publication,” and six gold medals and 10 silver medals from the Society of Publication Designers. The editor-in-chief is Stephanie Mehta and the publisher is Amanda Smith. Headquartered in New York City, Fast Company is published by Mansueto Ventures LLC, along with our sister publication Inc., and can be found online at www.fastcompany.com.
Improved content moderation suite with additional subclasses; now performs better than human moderators
Hive
The gold standard for content moderation has always been human moderators. Facebook alone reportedly employs more than 15,000 of them. But there are critical problems with this manual approach – namely cost, effectiveness, and scalability. Recent months and years have brought headlines about high-profile quality failures – and, increasingly, press coverage of the significant mental health toll on full-time content moderators (view article from The Verge).
Here at Hive, we believe AI can transform industries and business processes. Content moderation is a perfect example: there is an obligation on platforms to do this better, and we believe Hive’s role is to power the ecosystem in better addressing the challenge.
We are excited to announce the general release of our enhanced content moderation product suite, featuring significantly improved NSFW and violence detections. Our NSFW model now achieves 97% accuracy and our violence model achieves 95% accuracy, considerably better than typical outsourced moderators (~80%), and even better than an individual Hive annotator (~93%).
Deep learning models are only as good as the data they are trained on, and Hive operates the world’s largest distributed workforce of humans labeling data – now nearly 2 million contributors globally (our data labeling platform is described in further detail in an earlier article).
In our new release, we have more than tripled the training data, built from a diverse set of user-generated content sourced from the largest content platforms in the world. Our NSFW model is now trained on more than 80 million human annotations and our violence model on more than 40 million human annotations.
Model Design
We were selective in our construction of the training dataset, and strategically added the most impactful training examples. For instance, we utilized active learning to select training images where the existing model results were the most uncertain. Deep learning models produce a confidence score on input images which ranges from 0 (very confident the image is not in the class) to 1.0 (very confident the image is in the class). By focusing our labeling efforts on those images in the middle range (0.4 – 0.6), we were able to improve model performance specifically on edge cases.
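To make this concrete, here is a minimal Python sketch of confidence-based selection; `score_image` is a hypothetical helper (not part of Hive’s API) that returns the model’s class confidence for an image, and the sketch is illustrative only, not Hive’s production pipeline:

```python
# Minimal sketch of confidence-based active learning selection.
# `score_image` is a hypothetical helper that returns the model's
# confidence that an image belongs to the class.

def select_uncertain(image_paths, score_image, low=0.4, high=0.6):
    """Return images whose confidence falls in the uncertain band,
    ordered so the most ambiguous examples are labeled first."""
    uncertain = []
    for path in image_paths:
        confidence = score_image(path)  # 0.0 (not in class) to 1.0 (in class)
        if low <= confidence <= high:
            uncertain.append((path, confidence))
    uncertain.sort(key=lambda item: abs(item[1] - 0.5))  # closest to 0.5 first
    return [path for path, _ in uncertain]
```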
As part of this release, we also focused on reducing ambiguity in the ‘suggestive’ class of our NSFW model. We conducted a large manual inspection of images where Hive annotators tended to disagree or, even more crucially, where our model results disagreed with consensus Hive annotations. When examining images in certain ground truth sets, we noticed that up to 25% of disagreements between model predictions and human labels were due to erroneous labels, with the model prediction being accurate. Fixing these ground truth images was critical for improving model accuracy. For instance, in the NSFW model, we discovered that moderators disagreed on niche cases, such as which class leggings, contextually implied intercourse, or sheer clothing fell into. By carefully defining boundaries and relabeling data accordingly, we were able to teach the model the distinction between these classes, improving accuracy by as much as 20%.
Classified as clean:
Classified as suggestive:
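The ground-truth audit described above can likewise be sketched in a few lines. The record fields below are assumptions for illustration, not Hive’s internal schema; the idea is simply to surface examples where a confident model disagrees with the consensus human label as candidates for relabeling:

```python
# Minimal sketch of a label audit: flag examples where a confident
# model prediction contradicts the consensus human label.

def flag_for_review(records, confidence_threshold=0.9):
    """Each record is assumed to look like:
    {"id": ..., "human_label": ..., "model_label": ..., "confidence": ...}
    Returns the ids most likely to carry erroneous ground-truth labels."""
    flagged = []
    for rec in records:
        if (rec["model_label"] != rec["human_label"]
                and rec["confidence"] >= confidence_threshold):
            flagged.append(rec["id"])
    return flagged
```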
For our violence model, we noticed from client feedback that the knife and gun classes included instances of these weapons that wouldn’t be considered cause for alarm. For example, we would flag the presence of guns during video games and the presence of knives during cooking. Notably, companies like Facebook have publicly acknowledged the challenge of differentiating between animated and real guns (view article on TechCrunch). In this release, the model now distinguishes between culinary knives and violent knives, and between animated guns and real guns, through two new classes that provide real, actionable alerts on weapons.
Hive can now distinguish between animated guns and real guns:
The following knife picture is no longer considered violent:
Model Performance
The improvement of our new models compared to our old models is significant.
Our NSFW model was the first and most mature model we built, but after increasing training annotations from 58M to 80M, the model still improved dramatically. At 95% recall, our new model’s error rate is 2%, while our old model’s error rate was 4.2% – a decrease of more than 50%.
Our new violence model was trained on over 40M annotations – a more than 100% increase over the previous training set size of 16M annotations. Performance also improved significantly across all classes. At 90% recall, our new model’s error rate decreased from 27% to 10% (a 63% decrease) for guns, 23% to 10% (a 57% decrease) for knives, and 34% to 20% (a 41% decrease) for blood.
Over the past year, we’ve conducted numerous head-to-head comparisons vs. other market solutions, using both our held-out test sets as well as evaluations using data from some of our largest clients. In all of these studies, Hive’s models came out well ahead of all the other models tested.
Figures 6 and 7 show data from a recent study conducted with one of our most prominent clients, Reddit. For this study, Hive processed 15,000 randomly selected images through our new model as well as through the top three public cloud players: Amazon Rekognition, Microsoft Azure, and Google Cloud’s Vision API.
At recall 90%, Hive precision is 99%; public clouds range between 68 and 78%. This implies that our relative error rate is between 22x and 32x lower!
The outperformance of our violence model is similarly significant.
For guns, at recall 90%, Hive precision is 90%; public clouds achieve about 8%. This implies that our relative error rate is about 9.2x lower!
For knives, at recall 90%, Hive precision is 89%; public clouds achieve about 13%. This implies that our relative error rate is about 7.9x lower!
For blood, at recall 90%, Hive precision is 80%; public clouds range between 4 and 8%. This implies that our relative error rate is between 4.6x and 4.8x lower!
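For readers checking the math: at a fixed recall, the (false positive) error rate is 1 minus precision, and the relative factor is the ratio of the two error rates. A quick sketch:

```python
# At a fixed recall, error rate = 1 - precision; the relative factor
# is the ratio of the competitor's error rate to Hive's.

def relative_error_rate(hive_precision, other_precision):
    return (1 - other_precision) / (1 - hive_precision)

# e.g., blood at 90% recall: Hive at 80% precision vs. a cloud at 8%
print(round(relative_error_rate(0.80, 0.08), 1))  # -> 4.6
```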
Final Thoughts
This latest model release raises the bar on what is possible from automated content moderation solutions. Solutions like this will considerably reduce the costs of protecting digital environments and limit the need for harmful human moderation jobs across the world. Over the next few months, stay tuned for similar model releases in other relevant moderation classes such as drugs, hate speech and symbols, and propaganda.
For press or inquiries, please contact Kevin Guo, Co-Founder and CEO (kevin.guo@thehive.ai)
Next-day analysis highlights trends in measured exposure during Hollywood’s biggest night
Hive
Hollywood’s biggest night, the Academy Awards, wrapped up this year’s awards season in style. Red carpet fashion started the night and Parasite stole headlines after the Korean-language film claimed four awards including Best Picture.
While awards are permanent markers of achievement, exposure is a broader prize shared by winners, nominees, performers, and presenters. Hive’s Celebrity Model, used by agencies to measure endorsement value and by media companies to enrich metadata in their video libraries, measured the screen time earned by the stars during the 2020 Oscars.
Bong Joon Ho – who took the stage as a winner four times for Parasite – earned the most time on screen during last night’s telecast of the Academy Awards, according to data from Hive’s Celebrity Model (see Figure 1). The remainder of the top 10 was made up of winners (Joaquin Phoenix, Brad Pitt, Laura Dern), presenters (Steve Martin, Chris Rock, Kristen Wiig, Maya Rudolph), and some celebs wearing multiple hats (Elton John as a winner and performer; Taika Waititi as a winner and presenter).
Much was said leading up to the event about another host-less award show short on diversity. A diverse mix of presenters and performers aimed to compensate for nominees that skewed white and male. However, while these themes stole headlines leading up to the event and were scattered across acceptance speeches during the night, most of what was said during the show was relatively consistent year-over-year (see Figure 2).
Hive’s Speech-to-Text model, with commercial uses including transcription of audio and monitoring of brand mentions in TV, radio, and digital video, was used to track mentions and keywords from within the Oscars’ telecast. Insights from what was said across award presentations and acceptance speeches included:
Thanks were given more than 120 times and love was expressed more than 55 times – mostly to thematic groups including The Academy, parents, partners, children, and God, as well as casts and crew
Statements on diversity and inclusion – spanning gender, race, and sexual orientation – were sprinkled throughout the night and were material in aggregate
Women, plural, were referenced as a group more than 3 times as often as men, most notably differentiated by messages of strength and unity (“all women are superheroes”)
The presence of Black Panther and BlacKkKlansman in the 2019 Oscars drove more significant conversation on race during last year’s telecast; such conversation was less frequent this year, although still present across multiple speeches (e.g., Matthew A. Cherry and Karen Rupert Toliver), award presentations (e.g., Chris Rock and Steve Martin), and performances (e.g., Janelle Monáe)
References to current events were scattered across the awards show, reflecting topics that impacted society over the past year including climate change and the environment, politics, and the death of Kobe Bryant
For the second year in a row, Netflix earned the highest count of mentions by award recipients among media companies – even with just 2 of its 24 nominations resulting in wins
About our models:
Hive’s Celebrity Model is trained to identify more than 80,000 public figures in media content and uniquely leverages Hive’s distributed workforce of more than 1.5 million registered contributors to efficiently optimize the precision and recall of low confidence results. Commercial uses of the model include measurement of endorsement value by agencies and enrichment of metadata in media companies’ video libraries.
Hive’s Speech-to-Text Model parses and transcribes speech data from video and audio content, and can be accessed via an API or on device. The model is trained by tens of thousands of hours of paired audio and speech data. Commercial uses of the model include transcription of audio and monitoring of brand mentions in TV, radio, and digital video.
Kevin Guo is the cofounder and CEO of Hive and is based in San Francisco. Dan Calpin is President of Hive Media and a Senior Advisor with Bain & Company based in Los Angeles; he was a founding partner of Bain Media Lab. Laura Beaudin is a Bain partner in San Francisco and leads Bain’s Global Marketing Excellence practice. Andre James is a Bain partner in Los Angeles and leads Bain’s Global Media & Entertainment practice; he was a founding partner of Bain Media Lab.
Super Bowl Sunday is more than a sporting event. Here are the highlights from next-day analysis of the commercials and sponsorships within TV advertising’s biggest event.
Hive
At a Glance:
Next-day analysis using Mensio, an AI-powered TV advertising and sponsorship analytics platform developed in partnership between Bain & Company and Hive, highlights insights from the commercials and sponsorships within TV advertising’s biggest event.
League and broadcast sponsors again captured significant time on screen, with 9 brands achieving more than 1 minute of total screen time outside of commercials.
Analysis of engagement with Super Bowl ads, using data from TVision, shows 2.6X higher eyes-on-screen attention during ads in the Super Bowl than ads during the NFL regular season, and 2.0X higher eyes-on-screen attention with the game itself and the sponsorship placements visible within it.
Commercial minutes were led by advertisers also present in last year’s game – 22 companies representing 52% of national airtime in this year’s Super Bowl. Increased share of voice came from consumer goods advertisers, whereas financial services & insurance companies opted for a smaller advertising presence during the game.
Advertisers increased the share of commercials featuring celebrities and greater diversity.
Since winning their respective conference championships two weeks ago, the San Francisco 49ers and Kansas City Chiefs were heads down planning their schemes to achieve an on-field advantage in yesterday’s big game. For many months prior, brands and agencies were drawing up their own plays to break through on game day with memorable and viral creative.
What did we learn? For the second year, Bain Media Lab and Hive have partnered to analyze marketing within and around the Super Bowl using Mensio, an AI-powered TV advertising and sponsorship analytics platform developed in partnership between Bain and Hive.
The research relied on analysis of Mensio’s creative library, powered by metadata created using Hive’s proprietary computer vision models and Hive’s consensus-driven data labeling platform which leverages a distributed workforce of more than 1.5 million registered contributors.
Sponsors capture significant Super Bowl screen time
While Super Bowl ads may lead water cooler conversations this week, official league and broadcast sponsors achieved significant time on screen during yesterday’s Super Bowl through camera-visible signage, product placement and digital billboards in the telecast. Using Hive’s proprietary logo detection model, trained to automatically detect exposure for more than 4,000 brands with more than 200 million individual pieces of human-labeled training data, Bain Media Lab measured the quantity and quality of logo placements within the TV broadcast of the game and halftime show. While sponsorship placements don’t offer the sight-and-sound of traditional ad units, brands and their agencies are increasingly applying more quantitative rigor to understand the level and value of exposure that these activations deliver across platforms.
Consistent with last year’s Super Bowl, the 3 most exposed brands were Nike, Bose, and Pepsi. Nike, the NFL’s uniform and on-field apparel supplier, logged more than 45 minutes of cumulative Super Bowl screen time with swooshes visible on uniforms, cleats and other sideline apparel. Bose, the league’s official headset provider, and Pepsi, which again sponsored the game’s halftime show, each totaled more than 3 minutes of cumulative screen time (see Figure 1).
Among sponsors, Gatorade’s camera-visible exposure grew the most year-over-year, tallying 3 minutes and 12 seconds of exposure in Super Bowl LIV spread across bottles, cups, coolers and towels, surging from 1 minute and 20 seconds of time on screen during last year’s big game. In total, eleven brands surpassed 30 seconds of cumulative brand exposure within the Super Bowl LIV telecast (not including the pre-game show and excluding league, team and network brands).
Among the top brands, Hard Rock, Amazon, and Pepsi achieved the highest average Brand Prominence Score, a proprietary measure of the size, clarity, centrality, and share of voice for a given exposure. Hard Rock, which holds stadium naming rights, earned its prominence through in-stadium signage whereas exposure for Amazon and Pepsi was highlighted by recurrent digital overlays on the telecast.
Brand Prominence Score is a proprietary metric that reflects the size, clarity and location on the screen, as well as the presence of other brands or objects, measured every second
…But Are People Really Watching? (Yes, They Really Are)
The reported price of a 30-second Super Bowl spot in this year’s game rose to as much as $5.6 million, powered by continued demand from advertisers. While Super Bowl advertisements are objectively differentiated in their ability to reach a uniquely large live audience, many marketers have also long contended that Super Bowl ads reach a more engaged audience. In collaboration with TVision, a company focused on measuring how viewers engage with television content, we confirmed this hypothesis by applying computer vision technology to viewing behaviors during this NFL season and yesterday’s finale.
Compared to 2019 regular season NFL games, yesterday’s Super Bowl delivered a dramatically more engaged audience. The game itself, and the sponsorship exposure within it, delivered 2.0X more eyes-on-screen attention as a percentage of total viewership. Even more significant, commercials achieved 2.6X more eyes-on-screen attention than commercials during NFL regular season games (see Figure 2).
Our analysis highlighted two other interesting trends specific to this year’s Super Bowl commercials:
Dedicated Advertisers Lead an Evolving Mix
Super Bowl advertisements have become annual traditions for some companies – 22 advertisers representing 52% of national airtime in this year’s Super Bowl were also present during last year’s game, where they commanded 72% of national airtime. These included stalwarts like Anheuser-Busch, which led all advertisers in airtime in both Super Bowl LIII and Super Bowl LIV, this year spread across spots for Budweiser, Bud Light, and Michelob Ultra (see Figure 3).
Super Bowl LIV also brought its share of new advertisers, with 48% of airtime coming from 25 advertisers not present during Super Bowl LIII. Some brands were returning to the Super Bowl, such as The Hershey Company, which bought its first Super Bowl ad since 2008 to amplify awareness for the newly rebranded Reese’s Take5 bar. For others, this year marked a first Super Bowl commercial, including Facebook, which promoted Facebook Groups. Other newcomers, ahead of the upcoming 2020 election, were the campaigns for President Trump and former New York Mayor Michael Bloomberg.
The net effect was a different mix of advertisers than the regular season and playoffs. Notably, consumer goods companies claimed 33% of airtime in Super Bowl LIV, compared to just 8% during the entirety of this year’s NFL regular season and playoffs. The category’s Super Bowl presence was led by multiple spots from Anheuser-Busch, Procter & Gamble, and PepsiCo. Financial services and insurance shrank from 17% of airtime in the rest of the season to only 7% of Super Bowl LIV airtime, a result of several top advertisers placing ads in the pregame show or sitting out the game altogether (see Figure 4).
Advertisers Add Celebrities, Greater Diversity
What brands choose to say on TV’s largest stage is often reflective of trends and inflection points in our culture and society.
Sometimes, this is explicit – with Super Bowl ads introducing us to the cars we will be driving, the movies we will be watching, and the food and drinks we will be consuming in the years ahead. 40% of this year’s Super Bowl ads introduced new products, roughly constant year-over-year.
More nuanced is the study of trends in casting, based on analysis of creative metadata generated through a combination of Hive’s computer vision models as well as Hive’s consensus-driven data labeling platform which leverages a distributed workforce of more than 1.5 million registered contributors.
Cast analysis shows advertisers continuing to support the zeitgeist surrounding gender equality and diversity & inclusion. Women were present in 90% of spots this year, up from 74% last year. Similarly, 82% of spots this year included people from more diverse backgrounds, compared to 64% of spots last year (see Figure 5).
The most significant casting change this year was a surge in spots featuring actors, musicians, and other public figures, who appeared in 65% of Super Bowl LIV ads compared to just 36% of spots last year.
A resurgent NFL season is now complete. The bounce back in viewership versus the 2018 season yielded a sigh of relief for the league and its broadcast partners, and further validated the continued role of the NFL in the TV advertising landscape. However, the Super Bowl is not the only tentpole TV advertising event this month. Next Sunday, brands will be on stage again, this time targeting the premium audience watching The Oscars on ABC.
Dan Calpin is President of Hive Media and a Senior Advisor with Bain & Company based in Los Angeles; he was a founding partner of Bain Media Lab. Laura Beaudin is a Bain partner in San Francisco and leads Bain’s Global Marketing Excellence practice. Andre James is a Bain partner in Los Angeles and leads Bain’s Global Media & Entertainment practice; he was a founding partner of Bain Media Lab. Sharona Sankar-King is a partner with Bain & Company based in New York and a senior leader in Bain’s Customer Strategy & Marketing practice.
Hive is an AI company specialized in computer vision and deep learning, focused on powering innovators across industries with practical AI solutions and data labeling. For more information, visit thehive.ai.
TVision is a TV performance metrics company focused on measuring how viewers engage with television content. For more information, visit www.tvisioninsights.com.
Note: Published Bain Media Lab research relies solely on third-party data sources and is independent of any data or input from clients of Bain & Company.
Dan Calpin, President of Hive Media, shares an overview of Hive and our media business at the 2019 Plug and Play Fall Innovation Summit in Sunnyvale, CA.
LOS ANGELES – April 30, 2019 – Bain & Company announced today the formation of Bain Media Lab, a business that will feature a portfolio of digital products and related services that combine breakthrough technologies with powerful datasets. Hive, a full-stack deep learning company based in San Francisco, will be the launch partner for Bain Media Lab.
Bain Media Lab is a new venture incubated in the Bain Innovation Exchange, a business unit that leverages Bain’s network of venture capitalists, startups, and tech leaders to help clients innovate through the ecosystem, as well as support Bain in creating new ventures.
“We are excited to introduce Bain Media Lab and to announce our partnership with Hive,” said Elizabeth Spaulding, the co-lead of Bain & Company’s Global Digital practice. “Today’s milestone launch exemplifies our strategy to deepen select Bain Innovation Exchange relationships through the formation of new businesses like Bain Media Lab, which will pair Bain’s expertise with best-in-class innovation to create disruptive solutions. It will also be a powerful vehicle to dramatically accelerate the visibility and growth of innovative technology companies like Hive.”
In partnership, Bain Media Lab and Hive have developed Mensio, an artificial intelligence-powered analytics platform focused on bringing “digital-like” measurement, intelligence, and attribution to traditional television advertising and sponsorships.
Mensio addresses a pain point shared by marketers and media companies – the lack of recent and granular data on the performance of traditional television advertising and sponsorships. As digital marketing has continued to grow its share of advertising dollars, marketers have become accustomed to seeing real-time campaign performance data with granular measurement of audience reach and outcomes. This dynamic has added pressure on television network owners to source comparable data to defend their share of marketers’ advertising budgets.
“Our partnership with Hive is the result of an extensive evaluation of the landscape and our resulting conviction that together we can uniquely create truly differentiated solutions,” said Dan Calpin, who leads Bain Media Lab. “Our launch product, Mensio, unlocks the speed and granularity of data for TV advertising and sponsorships that marketers have come to expect from their digital ad spend. Mensio arms marketers and their agencies to transition from post-mortem analysis of TV ad spend to real-time optimization, and gives network owners long-elusive data that can help them recast the narrative on advertising.”
“We are excited to partner with Bain & Company as the launch partner of Bain Media Lab,” said Kevin Guo, co-founder and CEO of Hive. “In jointly developing Mensio, we have blended the distinctive competencies of our two firms into a seamlessly integrated go-to-market offering. Hive’s ambition is to leverage artificial intelligence in practical applications to transform industries, and Mensio is our flagship product in the media space.”
Subscribers to the Mensio platform access a self-service, cloud-based dashboard that provides point-and-click reporting. Two tiers of the dashboard product are available: one for the buyers of TV advertising (marketers and their agencies) and one for the sellers (TV network owners). Selected features available in the Mensio dashboard and from related services include:
Reach: Measurement of exposure to a brand’s TV advertisements for a given population, ranging from total population to specific behavior-defined segments like frequent guests at quick service restaurants
Frequency: Reporting on the distribution of frequency for a given population (e.g., what percent of households were exposed to more than 20 TV ads for a given brand over the course of a month)
Attribution: Evaluation of the impact of exposure to TV advertising and sponsorships on a broad set of outcomes, including online activity, store visitation, and purchases as well as qualitative brand metrics
Competitive intelligence for brands: Insight into a brand’s relative share of voice versus peers, as well as the mix of networks, programs, genres, dayparts, and ad formats used by a given brand relative to its competitive set
Competitive intelligence for TV network owners: Insights into trends in spending by industry vertical and brand, as well as relative share of a given TV network owner vs. competitors
Sponsorship measurement and return on investment: Measurement of the volume, quality, and equivalent media value of sponsorship placements and earned media, with the ability to link to outcomes
The Mensio product suite uses Hive’s computer vision models – trained using data labeled by Hive’s distributed global workforce of over 1 million people – to enrich recorded television content with metadata including the identification of commercials and sponsorship placements as well as contextual elements like beach scenes. Second-by-second viewership of that content is derived using data from nearly 20 million U.S. households, inclusive of cable and satellite set-top boxes as well as Smart TVs, that is then scaled nationally and can be matched in a privacy-safe environment to a range of outcome behaviors. Outcome datasets enable household-level viewership of content to be matched to online activity (including search and website visits), retail store visits, and purchases (including retail purchases as well as several data sets specific to certain industries such as automotive and consumer packaged goods).
Mensio is currently in beta in the U.S. with a growing number of clients across industries. It will begin to expand into other geographies over the next year. For more information, visit: www.bainmedialab.com/mensio.
Bain & Company and Hive are additionally collaborating on other related products and services for television network owners addressing programming optimization and content tagging use cases.
Editor’s note: To arrange an interview with Mrs. Spaulding or Mr. Calpin, contact Dan Pinkney at dan.pinkney@bain.com or +1 646 562 8102. To arrange an interview with Mr. Guo, contact Kristy Yang at press@thehive.ai or +1 415 562 6943.
About Hive
Hive is a full-stack deep learning company based in San Francisco that focuses on solving visual intelligence challenges. Today, Hive works with many of the world’s biggest companies in media, retail, security, and autonomous driving in building best–in-class computer vision models. Through its flagship enterprise platform, Hive Media, the company is aiming to build the world’s largest database of structured media content. Hive has raised over $50M from a number of well-known venture investors and strategic partners, including General Catalyst, 8VC, and Founders Fund. For more information visit: www.thehive.ai. Follow us on Twitter @hive_ai.
About Bain & Company
Bain & Company is the management consulting firm that the world’s business leaders come to when they want results. Bain advises clients on private equity, mergers and acquisitions, operations excellence, consumer products and retail, marketing, digital transformation and strategy, technology, and advanced analytics, developing practical insights that clients act on and transferring skills that make change stick. The firm aligns its incentives with clients by linking its fees to their results. Bain clients have outperformed the stock market 4 to 1. Founded in 1973, Bain has 57 offices in 36 countries, and its deep expertise and client roster cross every industry and economic sector. For more information visit: www.bain.com. Follow us on Twitter @BainAlerts.
Better training data can significantly boost the performance of a deep learning model, especially when it is deployed in production. In this blog post, we will illustrate the impact of dirty data and explain why correct labeling is important for increasing model accuracy.
Background
An adversarial attack fools an image classifier by adding an imperceptible amount of noise to an image. One possible defense is simply to train machine learning models on adversarial examples: we can collect hard examples through hard-example mining and add them to the dataset. Another interesting model architecture to explore is the generative adversarial network (GAN), which generally consists of two parts: a generator that produces fake examples in order to fool the discriminator, and a discriminator that distinguishes real examples from fake ones.
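As a point of reference, a minimal PyTorch sketch of that two-part architecture is below; the layer sizes are illustrative assumptions, not a recommended design:

```python
# Minimal GAN skeleton: a generator that produces fake examples and a
# discriminator that scores how likely an input is to be real.
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, noise_dim=100, out_dim=32 * 32 * 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim, 256), nn.ReLU(),
            nn.Linear(256, out_dim), nn.Tanh(),  # fake image pixels in [-1, 1]
        )

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    def __init__(self, in_dim=32 * 32 * 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1), nn.Sigmoid(),  # probability the input is real
        )

    def forward(self, x):
        return self.net(x)
```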
Another possible type of attack, data poisoning, happens at training time. The attacker identifies weak parts of a machine learning architecture and potentially modifies the training data to confuse the model. Even slight perturbations to the training data and labels can result in worse performance. There are several methods to defend against such data poisoning attacks. For example, it is possible to separate clean training examples from poisoned ones, so that the outliers are deleted from the dataset.
In this blog post, we investigate the impact of data poisoning (dirty data) using a simulation method: random labeling noise. We will show that with the same model architecture and dataset size, we can achieve a huge accuracy increase with better data labeling.
Data
We experiment with the CIFAR-100 dataset, which has 100 classes and 600 32×32 color images per class.
We use the following steps to preprocess the images in the dataset (a minimal code sketch follows the list):
Pad each image to 36×36, then randomly crop a 32×32 patch
Apply a random horizontal flip
Randomly distort image brightness and contrast
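A minimal sketch of this pipeline using torchvision transforms; the distortion magnitudes are assumptions, since the post does not specify them:

```python
# Pad to 36x36, random-crop back to 32x32, random horizontal flip,
# then random brightness/contrast distortion.
import torchvision.transforms as T

train_transform = T.Compose([
    T.Pad(2),                                      # 32x32 -> 36x36
    T.RandomCrop(32),                              # random 32x32 patch
    T.RandomHorizontalFlip(),
    T.ColorJitter(brightness=0.4, contrast=0.4),   # assumed magnitudes
    T.ToTensor(),
])
```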
The dataset is randomly split into 50k training images and 10k evaluation images. Random labeling is the substitution of training data labels with random labels drawn from the marginal distribution of the data labels. Different amounts of random labeling noise are added to the training data: we simply shuffle a certain fraction of the labels for each class, with the affected images chosen randomly from within each class. Because of this randomness, the generated dataset remains balanced. Note that evaluation labels are not changed. A minimal sketch of this corruption follows.
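The helper below assumes integer class labels and is illustrative rather than the exact code we used:

```python
# Corrupt a fixed fraction of labels within each class. Because every
# class loses the same fraction, the dataset stays balanced. The new
# label is drawn uniformly (the marginal for a balanced dataset), so it
# may occasionally re-draw the original label.
import random
from collections import defaultdict

def corrupt_labels(labels, noise_fraction, num_classes=100, seed=0):
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    noisy = list(labels)
    for indices in by_class.values():
        rng.shuffle(indices)                         # pick victims at random
        n_corrupt = int(noise_fraction * len(indices))
        for idx in indices[:n_corrupt]:
            noisy[idx] = rng.randrange(num_classes)  # random replacement label
    return noisy
```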
We test the model with 4 different datasets, 1 clean and 3 noisy ones.
Clean: No random noise. We assume that all labeling is correct for CIFAR-100 dataset. Named as ‘no_noise’.
Noisy: 20% random labeling noise. Named as ‘noise_20’.
Noisy: 40% random labeling noise. Named as ‘noise_40’.
Noisy: 60% random labeling noise. Named as ‘noise_60’.
Note that we chose aggressive data poisoning levels because the production models we build are robust to small amounts of random noise. The random labeling scheme allows us to simulate the effect of dirty data (data poisoning) in real-world scenarios.
Model
We investigate the impact of dirty data on one of the most popular model architectures, ResNet-152. Normally it is a good idea to fine-tune from pre-trained checkpoints to get better accuracy with fewer training steps. In this blog the model is trained from scratch, because we want a general sense of how noisy data affects training and final results without any prior knowledge gained from pretraining.
We optimize the model with SGD (stochastic gradient descent) optimizer with cosine learning rate decay.
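A minimal PyTorch sketch of this setup; the learning rate, momentum, weight decay, and epoch count are assumptions for illustration, not our exact hyperparameters:

```python
# ResNet-152 from scratch with SGD and cosine learning rate decay.
import torch
from torchvision.models import resnet152

model = resnet152(num_classes=100)   # no pretrained weights
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=200)            # anneal over 200 epochs

for epoch in range(200):
    # ... one epoch of training on the (possibly noisy) dataset ...
    scheduler.step()
```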
Results
Quantitative results:
Accuracy
Cleaner datasets consistently perform better on the validation set. The model trained on the original CIFAR-100 dataset gives us 0.65 accuracy, and using top-5 predictions boosts the accuracy to 0.87. Testing accuracy decreases as more noise is added: each time we add 20% more random noise to the training data, testing accuracy drops by about 10%. Note that even with 60% random labeling noise, our model still manages 0.24 accuracy on the validation set. The variance of the training data, the preprocessing methods, and the regularization terms all help increase the robustness of the model. So even when learning from a very noisy dataset, the model is still able to learn useful features, although overall performance significantly degrades.
Conclusion
In this post we investigate the impact of data poisoning attacks on performance, using image classification as an example task and random labeling as the simulation method. We show that a popular model (ResNet) is somewhat robust to data poisoning, but that performance still degrades significantly after poisoning. High-quality labeling is thus crucial to modern deep learning systems.
We’ve reached the end of the road. 1-seed Virginia were crowned NCAA Champions with their overtime win against 3-seed Texas Tech. Hive wrapped up the tournament with analysis on the Final Four and NCAA Championship game. Here’s how March Madness played out for brands this year.
Hive
At a Glance:
Hive analyzed the NCAA Championship game to assess logo distribution by Brand Prominence*, earned media exposure, and viewership trends across games.
AT&T’s sponsored logos on digital overlays and during the halftime shows won the most screen time across all of March Madness.
Apparel sponsors placed their own March Madness bets by choosing which teams to sponsor as gear providers. Of the 68 teams in the field, Nike backed 59%, followed by Under Armour with 25% and Adidas with 16%. Under Armour’s sponsorship bets paid off by the championship as Texas Tech went head to head with Nike-backed Virginia.
The NCAA Championship game viewership fell slightly from last year but still reached nearly twice as many households as the Duke vs. Virginia Tech game (the highest viewed non-finals game in the tournament).
Another March Madness, another CBS ‘One Shining Moment’ montage. Texas Tech and Virginia beat out Michigan State and Auburn in the Final Four to face each other in the championship game. Neither school had ever made it this far in NCAA history, and their game was the first time in 40 years that two first-time participants went head to head. Hive used its best-in-class computer vision models in conjunction with viewership data (powered by 605, an independent TV data and analytics company) from more than 10M households to analyze brand logo distribution and viewership levels.
Brands
Hive recapped all the rounds and mapped out logo placements and earned media for all sponsors. AT&T remained consistent throughout the entire tournament and scored 50% more airtime than Nike, which had the second-highest amount of screen time.
One brand’s March Madness bets paid off big time this year. The tournament started with Nike sponsoring 40 teams, followed by Under Armour with 17 and Adidas with 11. Adidas got the boot in the earlier rounds, but Under Armour edged its way into the NCAA Championship game backing the Texas Tech Red Raiders as they went head to head with the Nike-backed Virginia Cavaliers. Under Armour went from sponsoring 25% of teams in the First Four to nearly 50% by the finals, earning it the same amount of screen time as competitor Nike. Figure 2 shows the fight from the beginning of the tournament to the very last game. These sponsors gave brackets a whole new meaning.
Games
Two defensive-minded teams faced each other in the championship this year. Texas Tech was unranked when their season started and soon found themselves in their first ever National Championship game. After being the first 1-seed to lose to a 16-seed in NCAA history last year, Virginia proved everyone wrong this year and also made their National Championship debut. The game itself got off to a slow start before we saw Virginia take a 10-point lead, fall to a 3-point deficit, then tie the game 68-68 to force overtime. Texas Tech fought hard, but at the end of the day, Virginia had the last say.
As the biggest night of the year for college basketball, the NCAA Championship game reached 12% of American households with a peak of 15% – almost double the viewership of the Duke vs. Virginia Tech game, the highest-performing non-finals game of the tournament.
Conclusion
March Madness is a huge opportunity for brands. We’ve learned which brands performed the best, which elements drove viewership, and which retained it. We also learned that you don’t need a Zion to go to the Final Four, but it helps to have a star player to hike up viewership levels. Hive is the premier solution for brands looking to improve their real-time reach and ROI measurement for commercials, earned media, and sponsorships during events like March Madness.
Kevin Guo is Co-founder and CEO of Hive, a full-stack deep learning company based in San Francisco building an AI-powered dashboard for media analytics. For inquiries, please contact him at kevin.guo@thehive.ai.
Viewership data was powered by 605, an independent TV data and analytics company.
*A Brand Prominence Score is defined as the combination of a logo’s clarity, size, and location on-screen, in addition to the presence of other brands or objects on screen.
March Madness lived up to its name last weekend in this year’s Sweet 16 and Elite Eight. The road to the Final Four has been exhilarating for some and heartbreaking for others. Only four teams remain in the NCAA tournament, and Hive followed the journeys of the teams, viewers, and advertisers. Here’s how everyone’s stories unfolded in the next two chapters of March Madness.
Hive
At a Glance:
Hive analyzed the Sweet 16 and Elite Eight to assess logo distribution by brand prominence, earned media exposure, and viewership trends across games.
Buffalo Wild Wings capitalized on its overtime commercial spots, with the highest average household reach (5.7% of households) on its placements.
AT&T’s logo placements showed consistency, maintaining its spot for the most screen time with a majority of logos scoring above average on Brand Prominence.*
The highest average household viewership occurred during the Sweet 16 where Duke vs. Virginia Tech had a 7.6% average household reach and a peak of 9.2%. Second place went to Duke vs. Michigan State in the Elite Eight with a 6.8% average household reach and a peak of 10.5% – the highest of any game.
Hive assessed how the point gap in the last minutes of the games drove increased viewership and found a strong correlation, with the closest games seeing up to a 250% bump in viewership in the last minutes.
Texas Tech, Virginia, Auburn, and Michigan State fought tough battles and earned themselves spots in the Final Four. Over the course of four days, six teams upset their opponents, three games went into overtime, and two teams found out that they would make their Final Four debuts. Hive used its best-in-class computer vision models in conjunction with viewership data (powered by 605, an independent TV data and analytics company) from more than 10M households to analyze brand logo distribution and sources of viewership fluctuation.
Brand winners
Official NCAA Corporate Partner Buffalo Wild Wings snagged the most effective commercial spots in the Sweet 16 and Elite Eight, earning the top household reach per commercial airing. Its overtime-specific commercial was set to air only during overtime games, which paid off big time in these two rounds. With Purdue vs. Tennessee in the Sweet 16 and Purdue vs. Virginia and Auburn vs. Kentucky in the Elite Eight all going into overtime, the brand’s ad slots earned it a number of extra opportunities to get in front of fans with relevant content. Google earned the second-highest household reach per commercial airing, followed by Apple.
The Hive Logo Model also scanned every second of the 12 games this week for logo placements and earned media. AT&T’s digital overlays and halftime sponsorship earned the most airtime again this week. Their logos were not only frequently on screen, but also quite prominent, with a majority scoring more than 20 on Brand Prominence.* Apparel and gear sponsors Nike, Under Armour, and Spalding all received lots of screen time, but their logos were low prominence, usually appearing in the action on jerseys, shoes, or hoops. Courtside sponsors Lowe’s, Buick, Infiniti, and Coca-Cola were all consistently mid-scoring, with a few very strong placements when the camera caught the logo in the background of a close-up.
Top games
The Sweet 16 and Elite Eight once again proved that the Zion effect is real. The two games with the highest average viewership over the course of the two rounds were both Duke games. In the Sweet 16, fans held their breath as the Blue Devils narrowly escaped Virginia Tech 75-73. The game raked in the largest audience of the tournament yet. However, the Zion show came to an end after Michigan State shocked Duke in the Elite Eight. The final score read 68-67, a bracket-busting win for Michigan State.
No. 3 seed Purdue put up a fight in this year’s tournament with two overtime games. Figure 4 shows a graph of their battle against No. 2 seed Tennessee in the Sweet 16, overlaid with the Florida State vs. Gonzaga game that started a few minutes before. The CBS game retained steady viewership as it approached halftime, while the TBS game started just in time for the other game’s viewers to switch over. Viewers flipped back to CBS during Purdue vs. Tennessee’s halftime show, but did not return to TBS when it ended. This may be attributed to the fact that barely five minutes into the second half, Purdue took an 18-point lead over Tennessee. However, Tennessee began to make a comeback, and viewership spiked to 7% as they forced OT. Purdue prevailed, securing their spot in the Elite Eight for the first time since 2000.
Interestingly, viewers in this round overwhelmingly followed the action on both channels. The loss in viewership during halftime on the CBS show was almost perfectly mirrored with a bump in viewership on the TBS game. When the CBS game returned, most switched back until Tennessee started to come back from their double-digit deficit, stealing a majority of the viewership as the CBS game tailed off in the last few minutes.
Purdue’s Elite Eight performance drew an even bigger crowd than the last round. An average of 6% of American households watched the Boilermakers play 1-seed Virginia in arguably one of the most exciting games of the entire tournament. Within the last two minutes of regulation, Carsen Edwards gave Purdue the lead, impressing America with his tenth three-pointer of the game. With only six seconds remaining, all of the stars aligned as UVA’s Ty Jerome perfectly missed his second free throw, commencing the play that allowed Mamadi Diakite to tie the game and force OT. Ultimately, Virginia edged out Purdue 80-75, preventing what would have been Purdue’s first Final Four appearance since 1980.
Two teams, however, anticipated their Final Four debuts. After defeating Kansas and North Carolina, 5-seed Auburn beat 2-seed Kentucky in the Elite Eight, proving that the Tigers can hang with the blue bloods. Texas Tech will also be showing up to the Final Four for the first time in program history after upsetting No. 1 Gonzaga. This game had the highest average household viewership on TBS during these two rounds.
Given the last-minute shifts in viewership during the Purdue vs. Tennessee nail-biter, Hive decided to analyze how the point gap in the last 10 minutes of a game drives viewership. As would be expected, increases in game viewership in the last ten minutes were strongly driven by how close the score was. As the average point differential near the end of the game decreased, viewership grew substantially, with the closest games seeing up to a 250% bump. Auburn vs. North Carolina was an exception, seeing viewership rise 100% during the last ten minutes despite a double-digit point gap. This was likely due to its interest relative to the competing game, LSU vs. Michigan, which had a similarly wide point gap but in favor of the higher-seeded team. Auburn’s upset coupled with Chuma Okeke’s unfortunate injury increased attention to the game despite Auburn’s substantial lead.
Conclusion
Heading into the Final Four, all but one of the 1-seeds have packed their bags and gone home. If your bracket wasn’t busted before, it most likely is now. We’ve almost reached the end of the road, but there is still more madness to come. Next week, we’ll find out who will cut down the nets in Minneapolis and which team and brand will be crowned NCAA Champions.
Kevin Guo is Co-founder and CEO of Hive, a full-stack deep learning company based in San Francisco building an AI-powered dashboard for media analytics. For inquiries, please contact him at kevin.guo@thehive.ai.
Viewership data was powered by 605, an independent TV data and analytics company.
*A Brand Prominence Score is defined as the combination of a logo’s clarity, size, and location on-screen, in addition to the presence of other brands or objects on screen.
March Madness is one of the most popular sports showcases in the nation, and Hive is following it all month long. Here’s how the First and Second Round advertisers did and which games elicited the most commotion.
Hive
At a Glance:
March Madness is the most anticipated tournament of the year for college basketball fans, bracket-holders, and advertisers alike.
Hive analyzed the First and Second Rounds to assess viewership trends across games, earned media exposure, and sponsorship winners.
The highest average viewership occurred during the Round of 32 where Auburn vs. Kansas had a 0.8% average household reach. However, First Round Cinderella story UC Irvine’s Second Round game against Oregon achieved the highest peak viewership, reaching 1.1% of households.
Although UC Irvine’s fairy tale ending wasn’t meant to be, the potential of another upset in the Second Round bumped their viewership 108% from the last round. The victor of that game, Oregon, was the only double-digit seed to survive through to the Sweet 16.
AT&T optimized their sponsorship spot and earned the most screen time while maintaining a high Brand Prominence Score.
Progressive had the most effective airings with over 2% average reach for their spots in total but GMC and AT&T generated the most viewership with over 100 airings and an average household reach of just under 2%.
March Madness is a live TV experience like no other. Millions of brackets are filled out every year, and unlike other one-time sporting events such as Super Bowl Sunday, March Madness is an extended series of games with an all-day TV schedule. This results in more data points and in turn, more opportunities to assess patterns and trends. The elongated showcase gives marketers some madness of their own – TV advertisers and sponsors receive a unique chance to hold their audience’s attention and craft a story.
In the First and Second Rounds, Hive used its best-in-class computer vision models in conjunction with viewership data (powered by 605, an independent TV data and analytics company) from more than 10M households to analyze which brands made an appearance, which games viewers were watching, and which advertisers optimized their NCAA sponsorship real-estate.
Brands that stole the show
Because of the diversity in viewership of the tournament, brands have the opportunity to market content to both fans and non-fans. Acquiring March Madness real estate means unlocking millions of impressions from one of the largest audiences in the nation. Hive is able to measure earned media and sponsorship exposure by using computer vision AI to identify brand logos in content during regular programming, creating a holistic “media fingerprint.” This visual contextual metadata is overlaid with the most robust viewership dataset available, giving brands an unparalleled level of data on their earned media and sponsorship. Hive Media helps brands understand how a dollar of advertising spend may translate to real-life consumer purchases.
Hive’s AI models capture every second that a brand’s logo is on screen and assign that logo a Brand Prominence Score.*
AT&T won big in the first two rounds of the tournament. As an Official NCAA Corporate Champion and halftime sponsor, it had prominent digital overlays, one of the highest Brand Prominence Scores, and the most time on screen with almost six hours. Its earned media was equivalent to 260 30-second spots. In second place was Capital One, another Corporate Champion, with 3 hours and 22 minutes, followed by Nike with 3 hours and 20 minutes of screen time.
Apparel and gear sponsors such as Nike, Under Armour, and Spalding earned a significant amount of minutes on screen because they appeared on locations such as jerseys and backboards. However, as a result of their locations, the logos appeared with much less prominence.
In addition to on-site logos, Hive also tracked the top brands by commercial airings and average household reach. Progressive, Coca-Cola, State Farm and Apple all earned higher than 2% average household reach with a selective placement strategy; GMC and AT&T, however, were the big winners in terms of volume, each earning almost 2% average reach with significantly more airings.
Top games
Here are the most viewed games on broadcast networks.
And here are the top five cable games from each round.
In the First Round, the Florida vs. Nevada game on TNT had the highest average viewership levels on cable TV. Figure 5 shows the household reach of Florida vs. Nevada on TNT alongside that of St. Mary’s vs. Villanova on TBS, games that aired at the same time. Household tune-in remained steady in the first half, with dips in viewership during commercial breaks.
By halftime, Florida had secured a healthy lead and viewership dropped as viewers switched over to the TBS game. However, viewership recovered strongly during the second half as Florida began to squander a double-digit lead just as the TBS game went to halftime. Viewers were retained even as the TBS game returned, with viewership continuing to rise until Florida narrowly edged out a win over Nevada. The head-to-head comparison illustrates the powerful correlation between viewership and excitement during live games.
March Madness always has the entire nation buzzing about Duke, a national brand with millions of people following the blue blood powerhouse. This year, fan loyalty, coupled with the intrigue of Zion Williamson, unsurprisingly earned the team one of the largest overall audiences of the first two rounds. To top it off, the Blue Devils narrowly escaped what would have been the biggest upset of the tournament with a one-point victory.
Despite CBS’s broadcast games driving the most viewership, many households switched to cable to follow the Round of 64’s most exciting underdog story, UC Irvine.
After the First Round, it seemed as if Cinderella had moved to sunny California as UC Irvine ran away with its first tournament victory in program history. Although Irvine’s average viewership did not make the top 5 in the First Round, audience tune-in continued to soar throughout the game. After this exciting upset, UCI went head to head with Oregon and reached a peak of 1.1% of U.S. households, making it one of the top five most-viewed games of the Second Round. The dream of a fairy tale ending came to an end as Oregon defeated Irvine to become the only double-digit seed to secure a spot in the Sweet 16.
Conclusion
The first two rounds may be over, but we’ve only just begun the games. With no true Cinderella run this year, March Madness continues on with only the top programs. Keep an eye out to learn how this week’s bracket busters will affect audience retention in the Sweet 16 and Elite Eight as Hive continues to track viewership and advertising trends.
Kevin Guo is Co-founder and CEO of Hive, a full-stack deep learning company based in San Francisco building an AI-powered dashboard for media analytics. For inquiries, please contact him at kevin.guo@thehive.ai.
Viewership data was powered by 605, an independent TV data and analytics company.
*A Brand Prominence Score is defined as the combination of a logo’s clarity, size, and location on-screen, in addition to the presence of other brands or objects on screen.