One Shining Moment for Brands

We’ve reached the end of the road. 1-seed Virginia were crowned NCAA Champions with their overtime win against 3-seed Texas Tech. Hive wrapped up the tournament with analysis on the Final Four and NCAA Championship game. Here’s how March Madness played out for brands this year.

Hive | April 17, 2019 | July 4, 2024

At a Glance:
- Hive analyzed the NCAA Championship game to assess logo distribution by Brand Prominence*, earned media exposure, and viewership trends across games.
- AT&T’s sponsored logos on digital overlays and during the halftime shows won the most screen time across all of March Madness.
- Apparel sponsors placed their own March Madness bets by choosing which of the teams to sponsor as gear providers. Of the 68 teams, Nike backed 59%, followed by Under Armour with 25% and Adidas with 16%. Under Armour’s sponsorship bets paid off by the championship as Texas Tech went head to head with Nike-backed Virginia.
- The NCAA Championship game viewership fell slightly from last year but still reached nearly twice as many households as the Duke vs. Virginia Tech game (the highest viewed non-finals game in the tournament).

Another March Madness, another CBS ‘One Shining Moment’ montage. Texas Tech and Virginia beat out Michigan State and Auburn in the Final Four to face each other in the championship game. Neither school had made it this far before, and their game was the first championship between two first-time participants in 40 years. Hive used its best-in-class computer vision models in conjunction with viewership data (powered by 605, an independent TV data and analytics company) from more than 10M households to analyze brand logo distribution and viewership levels.

Brands

Figure 1. Results from the Hive Logo Model

Hive recapped all the rounds and mapped out logo placements and earned media for all sponsors.
AT&T remained consistent throughout the tournament and scored 50% more airtime than Nike, which had the second-highest amount of screen time.

Figure 2. Results from the Hive Logo Model

One brand’s March Madness bets paid off big time this year. The tournament started with Nike sponsoring 40 teams, followed by Under Armour with 17 and Adidas with 11. Adidas got the boot in the earlier rounds, but Under Armour edged its way into the NCAA Championship game, backing the Texas Tech Red Raiders as they went head to head with the Nike-backed Virginia Cavaliers. Under Armour went from sponsoring 25% of teams in the First Four to nearly 50% by the finals, earning it the same amount of screen time as competitor Nike. Figure 2 shows their fight from the beginning of the tournament to the very last game. These sponsors gave brackets a whole new meaning.

Games

Two defensive-minded teams faced each other in the championship this year. Texas Tech was unranked when their season started and soon found themselves in their first ever National Championship game. After being the first 1-seed to lose to a 16-seed in NCAA history last year, Virginia proved everyone wrong this year and also made their National Championship debut. The game itself got off to a slow start before we saw Virginia take a 10-point lead, fall to a 3-point deficit, then tie the game 68-68 to force overtime. Texas Tech fought hard, but at the end of the day, Virginia had the last say.

Figure 3. Viewership data powered by 605

As the biggest night of the year for college basketball, the NCAA Championship game reached 12% of American households with a peak of 15%, almost double the viewership of the Duke vs. Virginia Tech game, the highest performing non-finals game of the tournament.

Conclusion

March Madness is a huge opportunity for brands. We’ve learned which brands performed the best, what elements drove viewership, and what aspects retained viewership.
We also learned that you don’t need a Zion to go to the Final Four but it helps to have a star player to hike up viewership levels. Hive is the premier solution for brands looking to improve their real-time reach and ROI measurement for commercials, earned media, and sponsorships during events like March Madness. Kevin Guo is Co-founder and CEO of Hive, a full-stack deep learning company based in San Francisco building an AI-powered dashboard for media analytics. For inquiries, please contact him at kevin.guo@thehive.ai. Viewership data was powered by 605, an independent TV data and analytics company. *A Brand Prominence Score is defined as the combination of a logo’s clarity, size, and location on-screen, in addition to the presence of other brands or objects on screen.
Survive and Advance: Winners of the Sweet 16 and Elite Eight

March Madness lived up to its name last weekend in this year’s Sweet 16 and Elite Eight. The road to the Final Four has been exhilarating for some and heartbreaking for others. Only four teams remain in the NCAA tournament, and Hive followed the journeys of the teams, viewers, and advertisers. Here’s how everyone’s stories unfolded in the next two chapters of March Madness.

Hive | April 8, 2019 | July 4, 2024

At a Glance:
- Hive analyzed the Sweet 16 and Elite Eight to assess logo distribution by brand prominence, earned media exposure, and viewership trends across games.
- Buffalo Wild Wings capitalized on its overtime commercial spots, with the highest average household reach (5.7% of households) on its placements.
- AT&T’s logo placements showed consistency, maintaining its spot for the most screen time with a majority of logos scoring above average on Brand Prominence.*
- The highest average household viewership occurred during the Sweet 16, where Duke vs. Virginia Tech had a 7.6% average household reach and a peak of 9.2%. Second place went to Duke vs. Michigan State in the Elite Eight with a 6.8% average household reach and a peak of 10.5%, the highest of any game.
- Hive assessed how the point gap in the last minutes of the games drove increased viewership and found a strong correlation, with the closest games seeing up to a 200% bump in viewership in the last minutes.

Texas Tech, Virginia, Auburn, and Michigan State fought tough battles and earned themselves spots in the Final Four. Over the course of four days, six teams upset their opponents, three games went into overtime, and two teams found out that they would make their Final Four debuts. Hive used its best-in-class computer vision models in conjunction with viewership data (powered by 605, an independent TV data and analytics company) from more than 10M households to analyze brand logo distribution and sources of viewership fluctuation.
Brand winners

Figure 1. Viewership data powered by 605

Official NCAA Corporate Partner Buffalo Wild Wings snagged the most effective commercial spots in the Sweet 16 and Elite Eight, earning the top household reach per commercial airing. Their overtime-specific commercial was created and set to air only during overtime games, which paid off big time in these two rounds. With Purdue vs. Tennessee in the Sweet 16 and Purdue vs. Virginia and Auburn vs. Kentucky in the Elite Eight all going into overtime, the brand’s ad slots earned them a number of extra opportunities to get in front of fans with relevant content. Google had the second-highest household reach per commercial airing, followed by Apple.

Figure 2. Results from the Hive Logo Model

The Hive Logo Model also scanned every second of the 12 games this week for logo placements and earned media. AT&T’s digital overlays and halftime sponsorship earned the most airtime again this week. Their logos were not only frequently on screen, but they were also quite prominent, with a majority of logos scoring more than 20 on Brand Prominence.* Apparel and gear sponsors Nike, Under Armour, and Spalding all received lots of screen time, but their logos were low prominence, usually appearing in the action on jerseys, shoes, or hoops. Courtside sponsors Lowe’s, Buick, Infiniti, and Coca-Cola were all consistently mid-scoring, with a few very strong placements when the camera caught the logo in the background of a close-up.

Top games

Figure 3. Viewership data powered by 605

The Sweet 16 and Elite Eight once again proved that the Zion effect is real. The two games with the highest average viewership across the two rounds were both Duke games. In the Sweet 16, fans held their breath as the Blue Devils narrowly escaped Virginia Tech 75-73. The game itself raked in the largest audience size in the tournament yet.
However, the Zion show came to an end after Michigan State shocked Duke in the Elite Eight. The final score read 68-67, a bracket-busting win for Michigan State.

Figure 4. Viewership data powered by 605

No. 3 seed Purdue put up a fight in this year’s tournament with two overtime games. Figure 4 shows a graph of their battle against No. 2 seed Tennessee in the Sweet 16 overlaid with the Florida State vs. Gonzaga game that started a few minutes before. The CBS game retained steady viewership as it approached halftime while the TBS game started just in time for the other game’s viewers to switch over. They flipped back to CBS during Purdue vs. Tennessee’s halftime show, but they did not return when it ended. This may be attributed to the fact that barely five minutes into the second half, Purdue took an 18-point lead over Tennessee. However, Tennessee began to make a comeback and viewership spiked to 7% as they forced OT. Purdue prevailed, securing their spot in the Elite Eight for the first time since 2000.

Interestingly, viewers in this round overwhelmingly followed the action on both channels. The loss in viewership during halftime on the CBS show was almost perfectly mirrored by a bump in viewership on the TBS game. When the CBS game returned, most switched back until Tennessee started to come back from their double-digit deficit, stealing a majority of the viewership as the CBS game tailed off in the last few minutes.

Purdue’s Elite Eight performance drew an even bigger crowd than the last round. An average of 6% of American households watched them play 1-seed Virginia, arguably one of the most exciting games in the entire tournament. Within the last two minutes of regulation, Carsen Edwards gave Purdue the lead, impressing America with his tenth three-pointer of the game.
With only six seconds remaining, all of the stars were aligned as UVA’s Ty Jerome perfectly missed his second free throw, commencing the play that allowed Mamadi Diakite to tie up the game and force OT. Ultimately, Virginia edged out Purdue 80-75, preventing what could have been their first ever Final Four appearance.

Two teams, however, anticipated their Final Four debuts. After defeating Kansas and North Carolina, 5-seed Auburn beat 2-seed Kentucky in the Elite Eight, proving that the Tigers can hang with the blue bloods. Texas Tech will also be showing up to the Final Four for the first time in program history after upsetting No. 1 Gonzaga. This game had the highest average household viewership on TBS during these two rounds.

Figure 5. Viewership data powered by 605

Given the last-minute shifts in viewership during the Purdue vs. Tennessee nail-biter, Hive decided to analyze how the point gap in the last 10 minutes of the game drives viewership. As would be expected, increases in game viewership in the last ten minutes were strongly driven by how close the scores were. As the average point differential near the end of the game decreased, the viewership grew substantially, with the closest games seeing up to a 250% bump. Auburn vs. North Carolina was an exception to this, seeing viewership rise 100% during the last ten minutes despite a double-digit point gap. This was likely due to its interest relative to the competing game, LSU vs. Michigan, which had a similarly wide point gap but in favor of the higher seeded team. Auburn’s upset coupled with Chuma Okeke’s unfortunate injury increased attention to the game despite Auburn’s substantial lead.

Conclusion

Heading into the Final Four, all but one 1-seed team have packed their bags and gone home. If your bracket wasn’t busted before, it most likely is now. We’ve almost reached the end of the road, but there is still more madness to come.
Next week, we’ll find out who will cut the nets in Minneapolis and which team and brand will be crowned NCAA Champions.

*A Brand Prominence Score is defined as the combination of a logo’s clarity, size, and location on-screen, in addition to the presence of other brands or objects on screen.
Brand Madness: The Road to the Final Four

March Madness is one of the most popular sports showcases in the nation and Hive is following it all month long. Here’s how the First and Second Round advertisers did and which games elicited the most commotion.

Hive | March 28, 2019 | July 5, 2024

At a Glance:
- March Madness is the most anticipated tournament of the year for college basketball fans, bracket-holders, and advertisers alike.
- Hive analyzed the First and Second Rounds to assess viewership trends across games, earned media exposure, and sponsorship winners.
- The highest average viewership occurred during the Round of 32, where Auburn vs. Kansas had a 0.8% average household reach. However, First Round Cinderella story UC Irvine’s Second Round game against Oregon achieved the highest peak viewership, reaching 1.1% of households.
- Although UC Irvine’s fairy tale ending wasn’t meant to be, the potential of another upset in the Second Round bumped their viewership 108% from the last round. The victor of that game, Oregon, was the only double-digit seed to survive through to the Sweet 16.
- AT&T optimized their sponsorship spot and earned the most screen time while maintaining a high Brand Prominence Score.
- Progressive had the most effective airings with over 2% average reach for their spots in total, but GMC and AT&T generated the most viewership with over 100 airings and an average household reach of just under 2%.

March Madness is a live TV experience like no other. Millions of brackets are filled out every year, and unlike other one-time sporting events such as Super Bowl Sunday, March Madness is an extended series of games with an all-day TV schedule. This results in more data points and, in turn, more opportunities to assess patterns and trends. The elongated showcase gives marketers some madness of their own: TV advertisers and sponsors receive a unique chance to hold their audience’s attention and craft a story.
In the First and Second Rounds, Hive used its best-in-class computer vision models in conjunction with viewership data (powered by 605, an independent TV data and analytics company) from more than 10M households to analyze which brands made an appearance, which games viewers were watching, and which advertisers optimized their NCAA sponsorship real estate.

Brands that stole the show

Because of the diversity in viewership of the tournament, brands have the opportunity to market content to both fans and non-fans. Acquiring March Madness real estate means unlocking millions of impressions from one of the largest audiences in the nation.

Hive is able to measure earned media and sponsorship exposure by using computer vision AI to identify brand logos in content during regular programming, creating a holistic “media fingerprint.” This visual contextual metadata is overlaid with the most robust viewership dataset available to give brands an unparalleled level of data on their earned media and sponsorship. Hive Media helps brands understand how a dollar of advertising spend may translate to real-life consumer purchases.

Hive’s AI models capture every second that a brand’s logo is on screen and assign that logo a Brand Prominence Score.* AT&T won big in the first two rounds of the tournament. As an Official NCAA Corporate Champion and halftime sponsor, it had prominent digital overlays, one of the highest brand prominences, and the most seconds on screen with almost six hours. Its earned media was equivalent to 260 30-second spots. In second place was Capital One, another Corporate Champion, with 3 hours and 22 minutes, followed by Nike with 3 hours and 20 minutes of screen time.

Figure 1. Results from Hive Logo Model

Apparel and gear sponsors such as Nike, Under Armour, and Spalding earned a significant number of minutes on screen because they appeared on locations such as jerseys and backboards.
However, as a result of their locations, the logos appeared with much less prominence.

In addition to on-site logos, Hive also tracked the top brands by commercial airings and average household reach. Progressive, Coca-Cola, State Farm, and Apple all earned higher than 2% average household reach with a selective placement strategy; however, GMC and AT&T were the big winners in terms of volume, as each earned almost 2% average reach with significantly more airings.

Figure 2. Results from Hive Commercial AI Model, viewership powered by 605

Top games

Here are the most viewed games on broadcast networks.

Figure 3. Viewership data powered by 605

And here are the top five cable games from each round.

Figure 5. Viewership data powered by 605

Figure 6. Viewership data powered by 605

In the First Round, the Florida vs. Nevada game on TNT had the highest average viewership levels on cable TV. Figure 5 shows the household reach of Florida vs. Nevada on TNT alongside that of St. Mary’s vs. Villanova on TBS, which were broadcast at the same time. Household tune-in remained steady in the first half, with dips in viewership during commercial breaks. By halftime, Florida had secured a healthy lead and viewership dropped as viewers switched over to the TBS game. However, viewership recovered strongly during the second half as Florida began to squander a double-digit lead just as the TBS game went to halftime. Viewers were retained even as the TBS game returned, with viewership continuing to rise until Florida narrowly edged out a win over Nevada. The head-to-head comparison illustrates powerful correlations between viewership and excitement during live games.

Figure 7. Viewership data powered by 605

March Madness always has the entire nation buzzing about Duke, a national brand with millions of people following the blue blood powerhouse.
This year, fan loyalty, coupled with the intrigue of Zion Williamson, unsurprisingly earned the team one of the largest overall audiences of the first two rounds. To top it off, the Blue Devils narrowly escaped what would have been the biggest upset of the tournament with a one-point victory.

Despite CBS’s broadcast games driving the most viewership, many households switched to cable to follow the Round of 64’s most exciting underdog story, UC Irvine. After the First Round, it seemed as if Cinderella had moved to sunny California as UC Irvine ran away with its first tournament victory in NCAA history. Although Irvine’s average viewership level did not make the top five in the First Round, audience tune-in continued to soar throughout the game. After this exciting upset, UCI reached a peak of 1.1% of U.S. households in the Second Round, making the top five most-viewed games of that round as they went head to head with Oregon. The dream of a fairy tale ending came to an end as Oregon defeated Irvine to become the only double-digit seed to secure a spot in the Sweet 16.

Figure 8. Viewership data powered by 605

Conclusion

The first two rounds may be over, but we’ve only just begun the games. With no true Cinderella run this year, March Madness continues on with only the top programs. Keep an eye out to learn how this week’s bracket busters will affect audience retention in the Sweet 16 and Elite Eight as Hive continues to track viewership and advertising trends.

*A Brand Prominence Score is defined as the combination of a logo’s clarity, size, and location on-screen, in addition to the presence of other brands or objects on screen.
Spark on Mesos Part 2: The Great Disk Leak

Hive | March 8, 2019 | July 5, 2024

After ramping up our usage of Spark, we found that our Mesos agents were running out of disk space. It was happening rapidly on some of our agents with small disks.

The issue turned out to be that Spark was leaving behind binaries and jars in both driver and executor directories. Each uncompressed Spark binary directory contains 248MB. For a small pipeline with one driver and one executor, the leftover files add up to 957MB. At our level of usage, this was 100GB of dead weight added every day.

I looked into ways to at least avoid storing the compressed Spark binaries, since Spark only really needs the uncompressed version. It turns out that Spark uses the Mesos fetcher to copy and extract files. By enabling caching on the Mesos fetcher, Mesos will store only one cached copy of the compressed Spark binaries, then extract it directly into each sandbox directory.

In the Spark documentation, it looks like this should be solved by setting the spark.mesos.fetcherCache.enable option to true: “If set to true, all URIs (example: spark.executor.uri, spark.mesos.uris) will be cached by the Mesos Fetcher Cache.”

Adding this to our Spark application confs, we found that the cache option was turned on for the executor, but not for the driver. This brought our disk leak down to 740MB per Spark application.

Reading through the Spark code, I found that the driver’s fetch configuration is defined by the MesosClusterScheduler, whereas the executor’s is defined by the MesosCoarseGrainedSchedulerBackend. There were two oddities about the MesosClusterScheduler:
- It reads options from the dispatcher’s configuration instead of the submitted application’s configuration.
- It uses the spark.mesos.fetchCache.enable option instead of spark.mesos.fetcherCache.enable.

So bizarre! Finding no documentation for either of these issues online, I filed two bugs.
By now, my PRs to fix them have been merged in, and should show up in upcoming releases. In the meantime, I implemented a workaround by adding the spark.mesos.fetchCache.enable=true option to the dispatcher. Now the driver also uses caching, reducing the disk leak to 523MB per Spark application.

Finally, I took advantage of Spark’s shutdown hook functionality to manually clean up the driver’s uberjar and uncompressed Spark binaries:

    import java.io.File
    import org.apache.commons.io.FileUtils
    import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd}

    // Shutdown hook to clean the driver's Spark binaries after the application finishes.
    // `sparkSession` is the application's active SparkSession.
    sys.env.get("MESOS_SANDBOX").foreach { sandboxDirectory =>
      sparkSession.sparkContext.addSparkListener(new SparkListener {
        override def onApplicationEnd(applicationEnd: SparkListenerApplicationEnd): Unit = {
          // listFiles() returns null if the sandbox path is not a readable directory
          val sandboxItems = Option(new File(sandboxDirectory).listFiles()).getOrElse(Array.empty[File])
          // Match the uncompressed Spark directory and the application uberjar
          val regexes = Array(
            """^spark-\d+\.\d+\.\d+-bin""".r,
            """^hive-spark_.*\.jar""".r
          )
          sandboxItems
            .filter(item => regexes.exists(_.findFirstIn(item.getName).isDefined))
            .foreach(item => FileUtils.forceDelete(item))
        }
      })
    }

This reduced the disk leak to just 248MB per application. This still isn’t perfect, but I don’t think there will be a way to delete the uncompressed Spark binaries from your Mesos executor sandbox directories until Spark adds more complete Mesos functionality. For now, it’s a 74% reduction in the disk leak.

Last, and perhaps most importantly, we reduced the time to live for our completed Mesos frameworks and sandboxes from one month to one day. This effectively cut our equilibrium disk usage by 97%. Our Mesos agents’ disk usage now stays at a healthy level.
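The per-application figures quoted in this post fit a simple per-file breakdown. The following is a minimal sketch of that arithmetic, not a measurement: only the 248MB uncompressed size and the four totals (957, 740, 523, 248) come from the post, while the compressed archive size (217MB) and uberjar size (27MB) are inferred here from the differences between those totals. The 30-day month used for the time-to-live estimate is likewise an assumption.

```scala
// Sizes in MB. Only `uncompressed` and the four stage totals are stated in the post;
// `compressed` and `uberjar` are inferred from the differences between those totals.
object DiskLeak {
  val uncompressed = 248 // stated: one uncompressed Spark binary directory
  val compressed   = 217 // inferred: 957 - 740 (one compressed archive no longer copied)
  val uberjar      = 27  // inferred: 523 - 248 - 248

  // Leftover files per application (one driver, one executor) at each stage.
  val baseline       = 2 * compressed + 2 * uncompressed + uberjar // no caching
  val executorCached = baseline - compressed                        // executor uses fetcher cache
  val driverCached   = executorCached - compressed                  // driver uses fetcher cache too
  val afterHook      = driverCached - uncompressed - uberjar        // driver files deleted on exit

  // Percentage reduction from `before` to `after`, rounded to the nearest integer.
  def reductionPct(before: Double, after: Double): Long =
    math.round(100.0 * (before - after) / before)

  def main(args: Array[String]): Unit = {
    println(s"baseline=$baseline executorCached=$executorCached " +
      s"driverCached=$driverCached afterHook=$afterHook")
    println(s"per-application reduction: ${reductionPct(baseline, afterHook)}%")
    // Assumes a 30-day month: retaining 1 day of sandboxes instead of 30.
    println(s"retention reduction: ${reductionPct(30, 1)}%")
  }
}
```

Under these assumptions the stages come out to 957, 740, 523, and 248MB, and the two reductions round to 74% and 97%, matching the numbers reported above.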
Spark on Mesos Part 1: Setting Up

Hive | February 12, 2019 | July 5, 2024

At Hive, we’ve created a data platform including Apache Spark applications that use Mesos as a resource manager. While both Spark and Mesos are popular frameworks used by many top tech companies, using the two together is a relatively new idea with incomplete documentation.

Why choose Mesos as Spark’s resource manager?

Spark needs a resource manager to tell it what machines are available and how much CPU and memory each one has. It uses this information and then requests that the resource manager add tasks for the executors it needs. There are currently four resource manager options: standalone, YARN, Kubernetes, and Mesos. The next table should make clear why we chose to use Mesos.

We wanted our Spark applications to use the same in-house pool of resources that other, non-Hadoop workloads do, so only Kubernetes and Mesos were options for us. There are great posts out there contrasting the two, but for us the deciding factor was that we already use Mesos. Spark applications can share resources with your other frameworks in Mesos.

Learnings from Spark on Mesos

Spark’s guide on running Spark on Mesos is the best place to start setting this up. However, we ran into a few notable quirks it does not mention.

A word on Spark versions

While Spark has technically provided support for Mesos since version 1.0, it wasn’t very functional until recently. We strongly recommend using Spark 2.4.0 or later. Even in Spark 2.3.2, there were some pretty major bugs:
- The obsolete MESOS_DIRECTORY environment variable was used instead of MESOS_SANDBOX, which caused an error during sparkSession.stop in certain applications.
- spark-submit to Mesos would not properly escape your application’s configurations.
  To run the equivalent of spark-submit --master local[4] --conf1 "a b c" --class package.Main my.jar on Mesos, you would need to run spark-submit --master mesos://url --conf1 "a\ b\ c" --deploy-mode cluster --class package.Main my.jar. Spark 2.4.0 still has this issue for arguments.
- Basically everything under the version 2.4.0 Jira page with Mesos in the name.

Still, Spark’s claim that it “does not require any special patches of Mesos” is usually wrong on one count: accessing jars and Spark binaries in S3. To access these, your Mesos agents will need Hadoop libraries. Otherwise, you will only be able to access files stored locally on the agents or accessible by HTTP. In order to use S3 links or HDFS links, one must configure every Mesos agent with a local path to the Hadoop client. This allows the Mesos Fetcher to successfully grab the Spark binaries and begin executing the job.

Spark Mesos dispatcher

The Spark dispatcher is a very simple Mesos Framework for running Spark jobs inside a Mesos cluster. The dispatcher does not manage the resource allocation or the application lifecycle of the jobs. Instead, for each new job it receives, it launches a Spark Driver within the cluster. The Driver itself is also a Mesos Framework with its own UI and is given the responsibility of provisioning resources and executing its specific job. The dispatcher is solely responsible for launching and keeping track of Spark Drivers.

Figure: How a Spark Driver runs jobs in any clustered configuration.

While setup for the dispatcher is as simple as running the provided startup script, one operational challenge to consider is the placement of your dispatcher. The two pragmatic locations for us were running the dispatcher on a separate instance outside the cluster, or as an application inside the Marathon Mesos framework. Both had their trade-offs, but we decided to run the dispatcher on a small dedicated instance as it was an easy way to have a persistent endpoint for the service.
One small concern worth mentioning is the lack of HA for the dispatcher. While Spark Drivers continue to run when the dispatcher is down and state recovery is available with Apache ZooKeeper, multiple dispatchers cannot be coordinated together. If HA is an important feature, it may be worthwhile to run the service on Marathon and set up some form of service discovery so you can have a persistent endpoint for the dispatcher.

Dependency management

There are at least three ways to manage dependencies for your Spark repo:
1. Copying dependency jars to the Spark driver yourself and specifying spark.driver.extraClassPath and spark.executor.extraClassPath.
2. Specifying spark.jars.packages and optionally spark.jars.repositories.
3. Creating an uberjar that includes both your code and all necessary dependencies’ jars.

Option 1 gives you total control over which jars you use and where they come from, in case there are some items in the dependency tree you know you don’t need. This can save some application startup time, but is very tedious. Option 2 streamlines option 1 by listing the required jars only once and pulling from the list of repositories automatically, but loses the very fine control by pulling the full dependency tree of each dependency. Option 3 gives back that very fine control, and is the simplest, but duplicates the dependencies in every uberjar you make.

Overall, we found option 3 most appealing. Compared to option 2, it saved 5 seconds of startup time on every Spark application and removed the worry that the Maven repository would become unavailable. Better automating option 1 might be the ideal solution, but for now, it isn’t worth our effort.

What next?

Together with Spark’s guide to running on Mesos, this should address many of the hiccups you’ll encounter. But join us next time as we tackle one more: the great disk leak.
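To make the difference between dependency options 2 and 3 concrete, here is a minimal sketch of how the two might look as spark-submit arguments in cluster mode against a dispatcher. It is only an illustration: the dispatcher host, the download URLs, the Maven coordinate, and the main class are all hypothetical placeholders, and the snippet just assembles and prints a command line rather than submitting anything. The flags and configuration keys themselves (--master, --deploy-mode, --class, --conf, spark.executor.uri, spark.jars.packages) are standard Spark options.

```scala
// Sketch only: host names, URLs, Maven coordinates, and class names are hypothetical.
object SubmitSketch {
  // Option 3 from the post: the uberjar already carries all dependencies,
  // so only the location of the Spark binaries needs to be configured.
  val uberjarConf: Seq[(String, String)] = Seq(
    "spark.executor.uri" -> "http://files.example.internal/spark-2.4.0-bin-hadoop2.7.tgz"
  )

  // Option 2: additionally ask Spark to resolve Maven coordinates at startup.
  val packagesConf: Seq[(String, String)] = uberjarConf ++ Seq(
    "spark.jars.packages" -> "org.example:some-lib_2.11:1.0.0"
  )

  // Assemble spark-submit arguments for cluster mode via the Mesos dispatcher.
  def submitArgs(conf: Seq[(String, String)], mainClass: String, jar: String): Seq[String] = {
    val base = Seq(
      "spark-submit",
      "--master", "mesos://dispatcher.example.internal:7077",
      "--deploy-mode", "cluster",
      "--class", mainClass
    )
    val confArgs = conf.flatMap { case (key, value) => Seq("--conf", s"$key=$value") }
    (base ++ confArgs) :+ jar
  }

  def main(args: Array[String]): Unit = {
    val jar = "http://files.example.internal/my-uberjar.jar"
    println(submitArgs(uberjarConf, "package.Main", jar).mkString(" "))
    println(submitArgs(packagesConf, "package.Main", jar).mkString(" "))
  }
}
```

With option 3 the submit command needs no dependency configuration at all, which reflects the trade-off described above: the uberjar is larger, but nothing has to be resolved from a repository at startup.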