Hive Media: Revolutionizing the Way We Understand On-Screen Content and Viewership

Hive is a full-stack deep learning platform focused on solving visual intelligence problems. While we are working with companies in sectors ranging from autonomous driving to facial recognition, our flagship enterprise product is Hive Media.

As the name suggests, Hive Media is our complete enterprise solution utilizing deep learning to revolutionize traditional media analytics. However, it is far more than a simple collection of neural net models. What we’ve built with Hive Media is an end-to-end solution, beginning with data ingestion and extending all the way to real-time, device-level viewership metrics.

The Vision

Imagine you could watch 100 different channels at the same time and remember every key element of what was on screen: which brand was shown, which actor was present, which commercial was playing, and so on. Now suppose you could remember this forever and query it instantly. The resulting dataset would be enormously valuable precisely because no human could ever assemble it. This, however, is exactly what we set out to achieve with Hive Media. Essentially, we wanted to build a system that could “watch” all of broadcast television the way a human would and then store this information in an easily accessible manner.

Data Ingestion

The first step in our pipeline is accessing TV streams. Today, we are processing 400 channels in the US, with 300 more in Europe to come later this year. See Figure 1 for a graphical display of our present and planned TV coverage.

Geographical distribution of current and planned TV markets Hive is analyzing, colored according to number of channels ingested
Figure 1

We are recording every second of every channel, totaling 10,000+ hours of footage per day! We expect this number to be well over 30,000 hours a day by next year. All major channels are covered, as well as a wide range of local affiliates on the network side. As you can imagine, this is a lot of data, and we are storing all of it in our own datacenters around the world. Ultimately, we are aiming to build the world’s largest repository of linear broadcast data.

Deep Learning Models

Having this much data is only useful if you can understand it. This is where our deep learning models come into play. Using Hive Data, we’ve built some of the world’s largest celebrity, logo, and brand databases. Models trained on these databases, among several others, are applied to every second of our recorded footage, and the resulting tags are stored in a database optimized for easy retrieval. This means a query such as “How many times did a Nike logo appear on NBC in the month of September?” (previously impossible) can now be answered in a matter of seconds! Unlike some other products on the market, our models don’t rely on any metadata associated with the programming; tags are generated purely from the video content. This is extremely powerful, because it means our system can handle a large variety of content without our having to constantly hard-code parameters.
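To make the example query concrete, here is a minimal sketch of how such a count could be answered once tags are stored with a channel and an air time. The table layout, column names, index, and sample rows (including the year) are all assumptions made up for illustration; the post does not describe Hive's actual schema or storage engine, and SQLite stands in here only because it ships with Python.

```python
import sqlite3


def count_tag_occurrences(conn, tag, channel, start, end):
    """Count detections of `tag` on `channel` with start <= aired_at < end.

    Timestamps are ISO 8601 strings, which sort correctly as plain text.
    """
    row = conn.execute(
        "SELECT COUNT(*) FROM tags "
        "WHERE tag = ? AND channel = ? AND aired_at >= ? AND aired_at < ?",
        (tag, channel, start, end),
    ).fetchone()
    return row[0]


# Hypothetical tag store: one row per detection produced by the models.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (channel TEXT, tag TEXT, aired_at TEXT)")
# An index on the filter columns keeps lookups fast as the table grows.
conn.execute("CREATE INDEX idx_tags ON tags (tag, channel, aired_at)")

# Made-up sample detections, not real broadcast data.
conn.executemany(
    "INSERT INTO tags VALUES (?, ?, ?)",
    [
        ("NBC", "Nike", "2017-09-03T20:15:02"),
        ("NBC", "Nike", "2017-09-17T13:40:55"),
        ("NBC", "Adidas", "2017-09-17T13:41:10"),
        ("FOX", "Nike", "2017-09-10T16:05:30"),
        ("NBC", "Nike", "2017-10-01T00:00:00"),  # outside September
    ],
)

nike_on_nbc = count_tag_occurrences(
    conn, "Nike", "NBC", "2017-09-01T00:00:00", "2017-10-01T00:00:00"
)
print(nike_on_nbc)
```

The half-open time range (start inclusive, end exclusive) is a design choice that lets adjacent months be queried without double counting a detection that falls exactly on a boundary.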

User Viewership

The final piece of the puzzle is understanding how our tags affect viewership, the holy grail of media analytics. Everything we’ve described up until now generates what I call “cause.” To measure “effect,” we are working with device partners who give us real-time data on viewers. Today, we have access to millions of devices that send us real viewership data, which we overlay on our tags to understand how on-screen content affects viewership. This means that every query we run tells us not only what aired, but also how it affected the viewership bottom line.

The easiest way to understand this system is to see some queries executed. In Figure 2, we show an example query for Chevrolet vs. Toyota commercials on NBC over a one-week period. You can see the tags our system found in the bottom right. The bottom left shows a video player with the content corresponding to the selected tag, which serves mainly as video evidence that the tag is correct. What’s powerful about Hive Media is that we can now analyze viewership data at each of these tag occurrences to understand their effect on viewership. One important viewership measure, shown in Figure 2, is tune-out: the percentage of viewers who change the channel within a given time interval. This is often the strongest indicator of whether viewers are enjoying the content shown on screen. Interestingly enough, Chevy commercials generate almost twice as much tune-out as their Toyota counterparts in this case.

Hive Media analytics dashboard showing viewership and tune-out timelines for a particular channel, aligned with play times for both Chevrolet and Toyota commercials
Figure 2
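The tune-out measure described above can be sketched in a few lines. This is a hedged illustration under stated assumptions, not Hive's actual computation: it assumes device data arrives as viewing sessions of the form (device_id, start, end) for a single channel, and it counts a device as tuned out if it was watching when the tagged content started and its session ended by the time the content finished. The function name, session format, and sample numbers are all hypothetical.

```python
def tune_out_rate(sessions, t0, t1):
    """Percentage of devices watching at t0 whose session ends by t1.

    sessions: iterable of (device_id, start, end) tuples, in seconds,
              all for the same channel (assumed format).
    """
    # Devices tuned in at the moment the tagged content starts.
    watching = {dev for dev, start, end in sessions if start <= t0 < end}
    if not watching:
        return 0.0
    # Of those, devices that left the channel during the interval.
    left = {dev for dev, start, end in sessions if start <= t0 < end <= t1}
    return 100.0 * len(left) / len(watching)


# Made-up sessions for five devices on one channel.
sessions = [
    ("d1", 0, 200),
    ("d2", 0, 45),
    ("d3", 0, 50),
    ("d4", 10, 130),
    ("d5", 100, 200),
]

# Hypothetical tag occurrences: (start, end) of two different commercials.
tag_occurrences = {"ad_a": (30, 60), "ad_b": (120, 150)}

rates = {
    name: tune_out_rate(sessions, t0, t1)
    for name, (t0, t1) in tag_occurrences.items()
}
for name, rate in rates.items():
    print(f"{name}: {rate:.1f}% tune-out")
```

Dividing by the audience present at the start of each occurrence, rather than by all devices, keeps the rates comparable between commercials that aired to audiences of different sizes.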

Let’s take another example query that looks for Nike logos, as shown in Figure 3. The highlighted tag is a snippet of content that shows a Nike logo prominently placed in the center of the screen, even though this isn’t a true Nike commercial. Instead, this is Simone Biles, a Nike athlete, being featured in a Mattress Firm / Foster Kids commercial. As part of any Nike athlete contract, Simone is obliged to wear Nike clothing whenever she appears on TV, Nike commercial or not. Nike would probably be highly interested in knowing how many similar logo placements occurred for Simone, as well as for all of its other sponsored athletes.

Hive Media analytics dashboard showing a timeline comparison of logo exposure between Adidas and Nike over a weeklong period on FOX
Figure 3

Today, we are only beginning our journey toward understanding the wealth of data we have at our disposal. Hive Media is pioneering a new way of thinking about media content, and we are eager to help both broadcasters and advertisers optimize their content to better retain viewers and inform advertising decisions.