Brand Madness: The Road to the Final Four

March Madness is one of the most popular sports showcases in the nation, and Hive is following it all month long. Here's how the First and Second Round advertisers did and which games elicited the most commotion.

Hive | March 28, 2019

At a Glance:

- March Madness is the most anticipated tournament of the year for college basketball fans, bracket-holders, and advertisers alike.
- Hive analyzed the First and Second Rounds to assess viewership trends across games, earned media exposure, and sponsorship winners.
- The highest average viewership occurred during the Round of 32, where Auburn vs. Kansas had a 0.8% average household reach. However, First Round Cinderella story UC Irvine's Second Round game against Oregon achieved the highest peak viewership, reaching 1.1% of households.
- Although UC Irvine's fairy tale ending wasn't meant to be, the potential of another upset in the Second Round bumped its viewership 108% from the previous round. The victor of that game, Oregon, was the only double-digit seed to survive through to the Sweet 16.
- AT&T optimized its sponsorship spot and earned the most screen time while maintaining a high Brand Prominence Score.
- Progressive had the most effective airings, with over 2% average reach for its spots in total, but GMC and AT&T generated the most viewership, with over 100 airings and an average household reach of just under 2%.

March Madness is a live TV experience like no other. Millions of brackets are filled out every year, and unlike other one-time sporting events such as Super Bowl Sunday, March Madness is an extended series of games with an all-day TV schedule. This results in more data points and, in turn, more opportunities to assess patterns and trends. The elongated showcase gives marketers some madness of their own: TV advertisers and sponsors receive a unique chance to hold their audience's attention and craft a story.

In the First and Second Rounds, Hive used its best-in-class computer vision models in conjunction with viewership data (powered by 605, an independent TV data and analytics company) from more than 10M households to analyze which brands made an appearance, which games viewers were watching, and which advertisers optimized their NCAA sponsorship real estate.

Brands that stole the show

Because of the diversity in viewership of the tournament, brands have the opportunity to market content to both fans and non-fans. Acquiring March Madness real estate means unlocking millions of impressions from one of the largest audiences in the nation.

Hive measures earned media and sponsorship exposure by using computer vision AI to identify brand logos in content during regular programming, creating a holistic "media fingerprint." This visual contextual metadata is overlaid with the most robust viewership dataset available to give brands an unparalleled level of data on their earned media and sponsorship. Hive Media helps brands understand how a dollar of advertising spend may translate to real-life consumer purchases.

Hive's AI models capture every second that a brand's logo is on screen and assign that logo a Brand Prominence Score.*

AT&T won big in the first two rounds of the tournament. As an Official NCAA Corporate Champion and halftime sponsor, it had prominent digital overlays, one of the highest Brand Prominence Scores, and the most seconds on screen with almost six hours. Its earned media was equivalent to 260 30-second spots.
In second place was Capital One, another Corporate Champion, with 3 hours and 22 minutes, followed by Nike with 3 hours and 20 minutes of screen time.

Figure 1. Results from Hive Logo Model

Apparel and gear sponsors such as Nike, Under Armour, and Spalding earned a significant number of minutes on screen because they appeared in locations such as jerseys and backboards. However, as a result of those locations, their logos appeared with much less prominence.

In addition to on-site logos, Hive also tracked the top brands by commercial airings and average household reach. Progressive, Coca-Cola, State Farm, and Apple all earned higher than 2% average household reach with a selective placement strategy; however, GMC and AT&T were the big winners in terms of volume, as each earned almost 2% average reach with significantly more airings.

Figure 2. Results from Hive Commercial AI Model, viewership powered by 605

Top games

Here are the most viewed games on broadcast networks.

Figure 3. Viewership data powered by 605

And here are the top five cable games from each round.

Figure 5. Viewership data powered by 605

Figure 6. Viewership data powered by 605

In the First Round, the Florida vs. Nevada game on TNT had the highest average viewership levels on cable TV. Figure 7 shows the household reach of Florida vs. Nevada on TNT alongside that of St. Mary's vs. Villanova on TBS, games that aired at the same time. Household tune-in remained steady in the first half, with dips in viewership during commercial breaks. By halftime, Florida had secured a healthy lead and viewership dropped as viewers switched over to the TBS game. However, viewership recovered strongly during the second half as Florida began to squander a double-digit lead just as the TBS game went to halftime. Viewers were retained even as the TBS game returned, with viewership continuing to rise until Florida narrowly edged out a win over Nevada. The head-to-head comparison illustrates a powerful correlation between viewership and excitement during live games.

Figure 7. Viewership data powered by 605

March Madness always has the entire nation buzzing about Duke, a national brand with millions of people following the blue blood powerhouse. This year, fan loyalty, coupled with the intrigue of Zion Williamson, unsurprisingly earned the team one of the largest overall audiences of the first two rounds. To top it off, the Blue Devils narrowly escaped what would have been the biggest upset of the tournament with a one-point victory.

Despite CBS's broadcast games driving the most viewership, many households switched to cable to follow the Round of 64's most exciting underdog story, UC Irvine. After the First Round, it seemed as if Cinderella had moved to sunny California as UC Irvine ran away with its first NCAA Tournament victory in school history. Although Irvine's average viewership did not make the top five in the First Round, audience tune-in continued to soar throughout the game. After this exciting upset, UC Irvine's Second Round matchup against Oregon reached a peak of 1.1% of U.S. households, making it one of the five most-viewed games of the round. The dream of a fairy tale ending came to an end as Oregon defeated Irvine to become the only double-digit seed to secure a spot in the Sweet 16.

Figure 8. Viewership data powered by 605

Conclusion

The first two rounds may be over, but we've only just begun the games. With no true Cinderella run this year, March Madness continues on with only the top programs.
Keep an eye out to learn how this week's bracket busters will affect audience retention in the Sweet 16 and Elite Eight as Hive continues to track viewership and advertising trends.

Kevin Guo is Co-founder and CEO of Hive, a full-stack deep learning company based in San Francisco building an AI-powered dashboard for media analytics. For inquiries, please contact him at kevin.guo@thehive.ai.

Viewership data was powered by 605, an independent TV data and analytics company.

*A Brand Prominence Score is defined as the combination of a logo's clarity, size, and location on-screen, in addition to the presence of other brands or objects on screen.
Spark on Mesos Part 2: The Great Disk Leak

Hive | March 8, 2019

After ramping up our usage of Spark, we found that our Mesos agents were running out of disk space. It was happening rapidly on some of our agents with small disks.

The issue turned out to be that Spark was leaving behind binaries and jars in both driver and executor directories. Each uncompressed Spark binary directory contains 248MB, so for a small pipeline with one driver and one executor, this adds up to 957MB. At our level of usage, this was 100GB of dead weight added every day.

I looked into ways to at least avoid storing the compressed Spark binaries, since Spark only really needs the uncompressed version. It turns out that Spark uses the Mesos fetcher to copy and extract files. By enabling caching on the Mesos fetcher, Mesos will store only one cached copy of the compressed Spark binaries, then extract it directly into each sandbox directory. In the Spark documentation, it looks like this should be solved by setting the spark.mesos.fetcherCache.enable option to true: "If set to true, all URIs (example: spark.executor.uri, spark.mesos.uris) will be cached by the Mesos Fetcher Cache."

Adding this to our Spark application confs, we found that the cache option was turned on for the executor, but not the driver. This brought our disk leak down to 740MB per Spark application.

Reading through the Spark code, I found that the driver's fetch configuration is defined by the MesosClusterScheduler, whereas the executor's is defined by the MesosCoarseGrainedSchedulerBackend. There were two oddities about the MesosClusterScheduler:

- It reads options from the dispatcher's configuration instead of the submitted application's configuration.
- It uses the spark.mesos.fetchCache.enable option instead of spark.mesos.fetcherCache.enable.

So bizarre! Finding no documentation for either of these issues online, I filed two bugs. By now, my PRs to fix them have been merged in, and should show up in upcoming releases. In the meantime, I implemented a workaround by adding the spark.mesos.fetchCache.enable=true option to the dispatcher. Now the driver also used caching, reducing the disk leak to 523MB per Spark application.

Finally, I took advantage of Spark's shutdown hook functionality to manually clean up the driver's uberjar and uncompressed Spark binaries:

```scala
import java.io.File

import org.apache.commons.io.FileUtils
import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd}

// Shutdown hook to clean the driver's Spark binaries and uberjar after the application finishes.
// sparkSession is the application's active SparkSession.
sys.env.get("MESOS_SANDBOX").foreach((sandboxDirectory) => {
  sparkSession.sparkContext.addSparkListener(new SparkListener {
    override def onApplicationEnd(sparkListenerApplicationEnd: SparkListenerApplicationEnd): Unit = {
      val sandboxItems = new File(sandboxDirectory).listFiles()
      // Match the uncompressed Spark binary directory and our application uberjar.
      val regexes = Array(
        """^spark-\d+\.\d+\.\d+-bin""".r,
        """^hive-spark_.*\.jar""".r
      )
      sandboxItems
        .filter((item) => regexes.exists((regex) => regex.findFirstIn(item.getName).isDefined))
        .foreach((item) => {
          FileUtils.forceDelete(item)
        })
    }
  })
})
```

This reduced the disk leak to just 248MB per application. This still isn't perfect, but I don't think there will be a way to delete the uncompressed Spark binaries from your Mesos executor sandbox directories until Spark adds more complete Mesos functionality. For now, it's a 74% reduction in the disk leak.

Last, and perhaps most importantly, we reduced the time to live for our completed Mesos frameworks and sandboxes from one month to one day. This effectively cut our equilibrium disk usage by 97%. Our Mesos agents' disk usage now stays at a healthy level.
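To recap the two cache settings described above in one place, here is a minimal sketch of where each one goes. The builder-style conf and the spark-defaults.conf path are illustrative assumptions; passing the application-side option via spark-submit --conf works just as well.

```scala
import org.apache.spark.sql.SparkSession

// Application side: enable Mesos fetcher caching for executor sandboxes.
// This must be in the SparkConf before the SparkContext is created.
val sparkSession = SparkSession.builder()
  .appName("hive-spark-job") // hypothetical application name
  .config("spark.mesos.fetcherCache.enable", "true")
  .getOrCreate()

// Dispatcher side: until the fixes above land in a release, the driver's fetch caching is
// controlled by the dispatcher's own configuration, under the differently spelled key.
// One way to set it is in the conf used when launching the dispatcher, e.g. its spark-defaults.conf:
//
//   spark.mesos.fetchCache.enable   true
```

With both in place, the compressed Spark binaries are fetched from the Mesos fetcher cache for the driver and the executors alike.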
Spark on Mesos Part 1: Setting Up

Hive | February 12, 2019

At Hive, we've created a data platform including Apache Spark applications that use Mesos as a resource manager. While both Spark and Mesos are popular frameworks used by many top tech companies, using the two together is a relatively new idea with incomplete documentation.

Why choose Mesos as Spark's resource manager?

Spark needs a resource manager to tell it what machines are available and how much CPU and memory each one has. It uses this information to request that the resource manager launch tasks for the executors it needs. There are currently four resource manager options: standalone, YARN, Kubernetes, and Mesos. For us, the choice came down to resource sharing: we wanted our Spark applications to use the same in-house pool of resources that other, non-Hadoop workloads do, so only Kubernetes and Mesos were options for us. There are great posts out there contrasting the two of these, but for us the deciding factor was that we already use Mesos. Spark applications can share resources with your other frameworks in Mesos.

Learnings from Spark on Mesos

Spark's guide on running Spark on Mesos is the best place to start setting this up. However, we ran into a few notable quirks it does not mention.

A word on Spark versions

While Spark has technically provided support for Mesos since version 1.0, it wasn't very functional until recently. We strongly recommend using Spark 2.4.0 or later. Even in Spark 2.3.2, there were some pretty major bugs:

- The obsolete MESOS_DIRECTORY environment variable was used instead of MESOS_SANDBOX, which caused an error during sparkSession.stop in certain applications.
- Spark-submit to Mesos would not properly escape your application's configurations. To run the equivalent of spark-submit --master local[4] --conf1 "a b c" --class package.Main my.jar on Mesos, you would need to run spark-submit --master mesos://url --deploy-mode cluster --conf1 "a\ b\ c" --class package.Main my.jar. Spark 2.4.0 still has this issue for arguments.
- Basically everything under the version 2.4.0 Jira page with Mesos in the name.

Still, Spark's claim that it "does not require any special patches of Mesos" is usually wrong on one count: accessing jars and Spark binaries in S3. To access these, your Mesos agents will need Hadoop libraries. Otherwise, you will only be able to access files stored locally on the agents or accessible over HTTP. In order to use S3 links or HDFS links, one must configure every Mesos agent with a local path to the Hadoop client. This allows the Mesos fetcher to successfully grab the Spark binaries and begin executing the job.

Spark Mesos dispatcher

The Spark dispatcher is a very simple Mesos framework for running Spark jobs inside a Mesos cluster. The dispatcher does not manage the resource allocation or the application lifecycle of the jobs. Instead, for each new job it receives, it launches a Spark driver within the cluster. The driver itself is also a Mesos framework with its own UI and is given the responsibility of provisioning resources and executing its specific job. The dispatcher is solely responsible for launching and keeping track of Spark drivers.

Figure: How a Spark driver runs jobs in any clustered configuration.

While setup for the dispatcher is as simple as running the provided startup script, one operational challenge to consider is the placement of your dispatcher.
The two pragmatic locations for us were running the dispatcher on a separate instance outside the cluster, or as an application inside the Marathon Mesos framework. Both had their trade-offs, but we decided to run the dispatcher on a small dedicated instance, as it was an easy way to have a persistent endpoint for the service.

One small concern worth mentioning is the lack of HA for the dispatcher. While Spark drivers continue to run when the dispatcher is down and state recovery is available with Apache ZooKeeper, multiple dispatchers cannot be coordinated together. If HA is an important feature, it may be worthwhile to run the service on Marathon and set up some form of service discovery so you can have a persistent endpoint for the dispatcher.

Dependency management

There are at least three ways to manage dependencies for your Spark repo:

1. Copying dependency jars to the Spark driver yourself and specifying spark.driver.extraClassPath and spark.executor.extraClassPath.
2. Specifying spark.jars.packages and optionally spark.jars.repositories.
3. Creating an uberjar that includes both your code and all necessary dependencies' jars.

Option 1 gives you total control over which jars you use and where they come from, in case there are some items in the dependency tree you know you don't need. This can save some application startup time, but is very tedious. Option 2 streamlines option 1 by listing the required jars only once and pulling from the list of repositories automatically, but loses the very fine control by pulling the full dependency tree of each dependency. Option 3 gives back that very fine control, and is the simplest, but duplicates the dependencies in every uberjar you make.

Overall, we found option 3 most appealing (a minimal build sketch for this option appears at the end of this post). Compared to option 2, it saved 5 seconds of startup time on every Spark application and removed the worry that the Maven repository would become unavailable. Better automating option 1 might be the most ideal solution of all, but for now, it isn't worth our effort.

What next?

Together with Spark's guide to running on Mesos, this should address many of the hiccups you'll encounter. But join us next time as we tackle one more: the great disk leak.
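As referenced above, here is a minimal sketch of the uberjar route (option 3). This post doesn't name our build tool, so the sketch assumes sbt with the sbt-assembly plugin; the project name, versions, and dependency list are placeholders.

```scala
// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.9")

// build.sbt
name := "hive-spark"        // hypothetical project name
scalaVersion := "2.11.12"   // Spark 2.4.0 is built against Scala 2.11 by default

libraryDependencies ++= Seq(
  // Spark is "provided": the Spark binaries on the cluster supply it at runtime,
  // so it is not duplicated inside the uberjar.
  "org.apache.spark" %% "spark-sql" % "2.4.0" % "provided",
  // Everything else gets bundled into the uberjar (placeholder example).
  "commons-io" % "commons-io" % "2.6"
)

// Typical merge-strategy tweak for conflicting META-INF entries when assembling.
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}
```

Running sbt assembly then produces a single jar for spark-submit to ship to the cluster, which is what makes this option the simplest one to operate.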
Learning Hash Codes via Hamming Distance Targets

Hive | January 18, 2019

We recently submitted our paper Learning Hash Codes via Hamming Distance Targets to arXiv. This was a revamp and generalization of our previous work, CHASM (Convolutional Hashing for Automated Scene Matching). We achieved major recall and performance boosts against state-of-the-art methods for content-based image retrieval and approximate nearest neighbors tasks. Our method can train any differentiable model to hash for similarity search.

Similarity search with binary hash codes

Let's start with everyone's favorite example: ImageNet. A common information retrieval task selects 100 ImageNet classes and requires hashing "query" and "dataset" images to compare against each other. Methods seek to maximize the mean average precision (MAP) of the top 1000 dataset results by hash distance, such that most of the 1000 nearest dataset images to each query image come from the same ImageNet class.

This is an interesting challenge because training requires a differentiable loss, whereas the final hash is discrete. Trained models must either binarize their last layer into 0s and 1s (usually just taking its sign), or (like FaceNet) pair up with a nearest neighbors method such as k-d trees or Jegou et al.'s Product Quantization for Nearest Neighbor Search.

Insight 1: It's not a classification task.

While information retrieval on ImageNet is reminiscent of classification, its optimization goal is actually quite different. Every image retrieval paper we looked at implicitly treated similarity search as if it were a classification task.

Some papers make this assumption by using cross entropy terms, asserting that the probability that two images with last layers x and y are similar is something like 1 / (1 + e^(−x·y)). The issue here is that the model uses hashes at inference time, not the asserted probabilities. An example of this is Cao et al.'s HashNet: Deep Learning to Hash by Continuation.

Other papers make this assumption by simply training a classification model with an encoding layer, then hoping that the binarized encoding is a good hash. The flaw here is that, while the floating-point encoding contains all information used to classify the image, its binarized version might not make a good hash. Bits may be highly imbalanced, and there is no guarantee that binarizing the encoding preserves much of the information. An example of this is Lin et al.'s Deep Learning of Binary Hash Codes for Fast Image Retrieval.

Finally, a few papers make this assumption by first choosing a target hash for each class, then trying to minimize the distance between each image and its class's target hash. This is actually a pretty good idea for ImageNet, but leaves something to be desired: it only works naturally for classification, rather than more general similarity search tasks, where similarity can be non-transitive and asymmetric. An example of this is Lu et al.'s Deep Binary Representation for Efficient Image Retrieval. This seems to be the second-best-performing method after ours.

We instead choose a loss function that easily extends to non-transitive, asymmetric similarity search tasks without training a classification model. I'll elaborate on this in the next section.

Insight 2: There is a natural way to compare floating-point embeddings to binarized hashes.

Previous papers have tried to wrestle floating-point embeddings into binarized hashes through a variety of means.
Some add "binarization" loss terms, punishing the model for creating embeddings that are far from -1 or 1. Others learn "by continuation", producing an embedding by passing its inputs through a tanh function that sharpens during training. The result is that their floating-point embeddings always lie close to ±1, a finding that they boast. They use this in order to make Euclidean distance or inner product correspond more closely to Hamming distance (the number of bits that differ). If your embedding is just -1's and 1's, then Hamming distance is simply a quarter of the squared Euclidean distance.

However, this is actually a disaster for training. First of all, forcing all outputs to ±1 does not even change their binarized values; 0.5 and 0.999 both binarize to 1. Also, it shrinks gradients for embedding components near ±1, forcing the model to learn from an ever-shrinking gray area of remaining values near 0.

We resolve this by avoiding the contrived usage of Euclidean distance altogether. Instead, we use a sound statistical model for Hamming distance based on embeddings, making two approximations that turn out to be very accurate. First, we assume that a model's last layer produces embeddings that consist of independent random unit normals (which we encourage with a gentle batch normalization). If we pick a random embedding x, this implies that x / ‖x‖ is a random point on the unit hypersphere, and we can simply evaluate the angle θ between two embeddings x and y. The probability that such vectors differ in sign on a particular component is θ/π, so we make our second approximation: that the probability for each component to differ in sign is independent. This implies that the probability for the Hamming distance between x and y to be k follows a binomial distribution:

P(Hamming distance = k) = C(n, k) · (θ/π)^k · (1 − θ/π)^(n−k), where n is the number of hash bits.

This allows us to use a very accurate loss function for the true optimization goal, the chance for an input to be within a target Hamming distance of an input it is similar to (and dissimilar inputs to be outside that Hamming distance). Using the natural geometry of the Hamming embedding, we achieve far better results than previous work.

Insight 3: It's important to structure your training batches right.

Imagine this: you train your model to hash, passing in 32 pairs of random images from your training set, and averaging the 32 pairwise loss terms. Since you have 100 ImageNet classes, each batch consists of 31.68 dissimilar pairs and 0.32 similar pairs on average. How accurate is the gradient? The bottleneck is learning about similar images. If the random error for each similar pair is σ, the expected random error in each batch is roughly σ/√0.32 ≈ 1.8σ, even greater than σ. This is a tremendous amount of noise, making it incredibly slow for the model to learn anything.

We can first improve by comparing every image in the batch to every other image. This takes 64·63/2 = 2016 comparisons, which will be only slightly slower than the original 32 comparisons since we still only need to run the model once on each input. This gives us 2016 pairwise comparisons, with 1995.84 dissimilar pairs and 20.16 similar pairs on average. Our random error is now roughly σ/√20.16 ≈ 0.22σ at best, and somewhat higher in practice because those pairs share images, but either way it is a big improvement.

But we can do even better by constructing a batch with random similar images. By first choosing 32 random images, then for each one choosing a random image it is similar to, we get 51.84 similar pairs on average, 32 of which are independent. This reduces our random error to between σ/√51.84 ≈ 0.14σ and σ/√32 ≈ 0.18σ, another big improvement. Under reasonable conditions, this improves training speed by a factor of 10.
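To make Insight 2 concrete, here is a minimal sketch of the statistical model in plain Scala (illustrative only, not the paper's differentiable training loss): estimate the angle between two embeddings, convert it to a per-bit disagreement probability θ/π, and compute the probability that the sign-binarized hashes land within a target Hamming distance.

```scala
object HammingTargetSketch {
  // Angle between two embedding vectors.
  def angle(x: Array[Double], y: Array[Double]): Double = {
    val dot   = x.zip(y).map { case (a, b) => a * b }.sum
    val normX = math.sqrt(x.map(a => a * a).sum)
    val normY = math.sqrt(y.map(b => b * b).sum)
    math.acos(math.max(-1.0, math.min(1.0, dot / (normX * normY))))
  }

  // Binomial pmf in log space: P(exactly k of n bits differ) with per-bit probability p.
  def binomialPmf(n: Int, k: Int, p: Double): Double = {
    val logChoose = (1 to k).map(i => math.log(n - k + i) - math.log(i)).sum
    math.exp(logChoose + k * math.log(p) + (n - k) * math.log1p(-p))
  }

  // Probability that the two sign-binarized hashes land within the target Hamming distance.
  def probWithinTarget(x: Array[Double], y: Array[Double], target: Int): Double = {
    val n   = x.length                                      // number of hash bits
    val raw = angle(x, y) / math.Pi                         // per-bit disagreement probability
    val p   = math.min(math.max(raw, 1e-12), 1.0 - 1e-12)   // clamp to keep the log-space pmf finite
    (0 to target).map(k => binomialPmf(n, k, p)).sum
  }
}
```

The paper's loss is built around probabilities like this one, the chance that similar inputs land within the target Hamming distance and dissimilar inputs land outside it, computed differentiably from the embeddings.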
Discussion

Read our paper for the full story! We boosted the ImageNet retrieval benchmark from 73.3% to 85.3% MAP for 16-bit hashes and performed straight-up approximate nearest neighbors with 2 to 8 times fewer distance comparisons than previous state-of-the-art methods at the same recall.
Multi-label Classification

Hive | May 31, 2018

Classification challenges like ImageNet changed the way we train models. Given enough data, neural networks can distinguish between thousands of classes with remarkable accuracy. However, there are some circumstances where basic classification breaks down, and something called multi-label classification is necessary. Here are two examples:

- You need to classify a large number of brand logos and what medium they appear on (sign, billboard, soda bottle, etc.).
- You have plenty of image data on a lot of different animals, but none on the platypus, which you want to identify in images.

In the first example, should you train a classifier with one class for each logo and medium combination? The number of such combinations could be enormous, and it might be impossible to get data on some of them. Another option would be to train a classifier for logos and a classifier for medium; however, this doubles the runtime to get your results. In the second example, it seems impossible to train a platypus model without data on it.

Multi-label models step in by doing multiple classifications at once. In the first example, we can train a single model that outputs both a logo classification and a medium classification without increasing runtime. In the second example, we can use common sense to label animal features (fur vs. feathers vs. scales, bill vs. no bill, tail vs. no tail) for each of the animals we know about, train a single model that identifies all features for an animal at once, then infer that any animal with fur, a bill, and a tail is a platypus.

A simple way to accomplish this in a neural network is to group a logit layer into multiple softmax predictions. You can then train such a network by simply adding the cross entropy loss for each softmax where a ground truth label is present (a minimal sketch of this setup appears at the end of this post).

To compare these approaches, let's consider a subset of ImageNet classes and two features that distinguish them. First, I trained two 50-layer ResNet V2 models on this balanced dataset: one trained on the single-label classification problem, and the other trained on the multi-label classification problem. In this example, every training image has both labels, but real applications may have only a subset of labels available for each image.

The single-label model trained specifically on the 6-animal classification performed slightly better when distinguishing all 6 animals:

- Single-label model: 90% accuracy
- Multi-label model: 88% accuracy

However, the multi-head model provides finer information granularity. Though it got only 88% accuracy on distinguishing all 6 animals, it achieved 92% accuracy at distinguishing scales/exoskeleton/fur and 95% accuracy at distinguishing spots/no spots. If we care about only one of these factors, we're already better off with the multi-label model.

But this toy example hardly touches on the regime where multi-label classification really thrives: large datasets with many possible combinations of independent labels. In this regime, we get the interesting benefit of transfer learning. Imagine if we had categorized hundreds of animals into a dozen binary criteria. Training a separate model for each binary criterion would yield acceptable results, but learning the other features can actually help in some cases by effectively pre-training the network on a larger dataset.

At Hive, we recently deployed a multi-label classification model that replaced 8 separate classification models.
For each image, we usually had truth data available for 2 to 5 of the labels. Out of the 8 labels, 2 ended up more accurate than their standalone predecessors (think 93% instead of 91%). These were the labels with less data, which makes sense, since they would benefit most from domain-specific pretraining on the same images. But most importantly for this use case, we were able to run all the models together in an eighth of the time as before.
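As mentioned above, here is a minimal sketch of the multi-head idea: a shared logit layer split into one softmax group per label, with cross entropy summed only over the heads that have ground truth for a given image. It's plain Scala over arrays for illustration (in practice this lives inside a deep learning framework), and the group sizes in the comments are hypothetical.

```scala
object MultiLabelHeads {
  // Softmax over one group of logits.
  def softmax(logits: Array[Double]): Array[Double] = {
    val m    = logits.max                        // subtract max for numerical stability
    val exps = logits.map(l => math.exp(l - m))
    val z    = exps.sum
    exps.map(_ / z)
  }

  // Total loss: sum of cross-entropy terms over the heads that have a ground truth label.
  // groupSizes splits the shared logit layer into one softmax per label,
  // e.g. Array(6, 3, 2) for animal / texture / spots heads (hypothetical sizes).
  // labels(i) is Some(classIndex) if head i has truth data for this image, else None.
  def loss(logits: Array[Double], groupSizes: Array[Int], labels: Array[Option[Int]]): Double = {
    require(groupSizes.sum == logits.length && groupSizes.length == labels.length)
    val offsets = groupSizes.scanLeft(0)(_ + _)  // start index of each head's logits
    groupSizes.indices.flatMap { i =>
      labels(i).map { classIndex =>
        val probs = softmax(logits.slice(offsets(i), offsets(i) + groupSizes(i)))
        -math.log(probs(classIndex))             // cross-entropy for this head
      }
    }.sum
  }
}
```

Every head shares the same backbone and forward pass, so adding heads costs little extra runtime, and images that are missing some labels still contribute gradients through the heads they do have.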
How to Use SSH Tunneling

Hive | March 15, 2018

This post is a step-by-step guide to solving common connectivity problems using SSH tunnels. Throughout, the goal is to let box A reach a service listening on box B, port 8080. To read some useful comments and context, skip to the end.

Step 0: If box A can already hit box B:8080, then good for you. Otherwise, follow the steps below to make that happen.

Step 1: If box A can ssh into box B, read the following section. If not, go to Step 2.

Here we will use ssh local port forwarding to PULL the service port through the ssh connection. The ssh tunnel command (to be run from A) is:

user@A > ssh -vCNnTL A:8080:B:8080 user@B

This pulls the service port over to box A at port 8080, so that anyone connecting to A:8080 will transparently get their requests forwarded over to the actual server on B:8080.

Step 2: If box B can ssh into box A, read the following section. Otherwise, skip to step 3.

Now we will use ssh remote port forwarding to PUSH the service port through the ssh connection. The command (to be run from B) is:

user@B > ssh -vCNnTR localhost:8080:B:8080 user@A

Now users on A hitting localhost:8080 will be able to connect to B:8080. Users not on either A or B will still be unable to connect. To enable listening on A:8080, you have two options:

A) If you have sudo, add the following line to /etc/ssh/sshd_config and reload the sshd service:

GatewayPorts clientspecified

Then rerun the above command with "localhost" replaced by "A":

user@B > ssh -vCNnTR A:8080:B:8080 user@A

B) Pretend "localhost (A)" is another box, and apply step 1, since A can generally ssh into itself:

user@A > ssh -vCNnTL A:8080:localhost:8080 user@localhost

Now we come to the situation where neither A nor B can ssh into the other.

Step 3: If there are any TCP ports that allow A and B to connect, continue reading. Otherwise, move on to step 4.

Suppose that B is able to connect to A:4040. Then the way to allow B to ssh into A is to turn A:4040 into an ssh port. This is doable by applying Step 1 on A itself, to pull the ssh service on 22 over to listen on A:4040:

user@A > ssh -vCNnTL A:4040:A:22 user@A

And then you can apply Step 2, specifying port 4040 for ssh itself:

user@B > ssh -vCNnTR localhost:8080:B:8080 user@A -p 4040

Similarly, if A is able to connect to B:4040, you'll want to forward B:22 to B:4040 using Step 1, then apply Step 1 again as usual.

Step 4: Find a box C which has some sort of connectivity to both A and B, and continue reading. Otherwise, skip to step 10.

If A and B have essentially no connectivity, then the way to proceed is to route through another box.

Step 5: If C is able to hit B:8080, continue reading. Otherwise, skip to step 9.

Step 6: If A is able to SSH to C, continue reading. Otherwise, skip to step 7.

This is very similar to step 1: we will pull the connection through the SSH tunnel using LOCAL forwarding.

user@A > ssh -vCNnTL A:8080:B:8080 user@C

Step 7: If C is able to SSH to A, continue reading. Otherwise, skip to step 8.

Again, this is analogous to step 2: we will push the connection through the SSH tunnel using REMOTE ssh forwarding.
user@C > ssh -vCNnTR localhost:8080:B:8080 user@A

Just as before, we will need to add an additional forwarding step to listen on a public A interface rather than localhost on A:

user@A > ssh -vCNnTL A:8080:localhost:8080 user@localhost

Step 8: If neither C nor A can ssh to each other, but there are TCP ports open between the two, apply the technique in step 3. Otherwise, skip to step 10.

Now we are in the situation where C can't hit B:8080 directly.

Step 9: The general idea is to first connect C:8080 to B:8080 using one of Steps 1 or 2, and then do the same to connect A:8080 to C:8080. Note that it doesn't matter in which order you set up these connections.

9a) If C can ssh to B and C can ssh to A. This is a very common scenario: maybe C is your local laptop, which is connected to two separate VPNs. First pull B:8080 to C:1337 and then push C:1337 to A:8080:

user@C > ssh -vCNnTL localhost:1337:B:8080 user@B
user@C > ssh -vCNnTR localhost:8080:localhost:1337 user@A
user@C > ssh user@A
user@A > ssh -vCNnTL A:8080:localhost:8080 user@localhost

9b) If C can ssh to B and A can ssh to C: Again, a fairly common scenario if you have a "super-private" network accessible only from an already private network. Do two pulls in succession:

user@C > ssh -vCNnTL localhost:1337:B:8080 user@B
user@A > ssh -vCNnTL A:8080:localhost:1337 user@C

You can actually combine these into a single command:

user@A > ssh -vC -A -L A:8080:localhost:1337 user@C 'ssh -vCNnTL localhost:1337:B:8080 user@B'

9c) If B can ssh to C and C can ssh to A: Double-push.

user@B > ssh -vCNnTR localhost:1337:B:8080 user@C
user@C > ssh -vCNnTR localhost:8080:localhost:1337 user@A
user@C > ssh user@A
user@A > ssh -vCNnTL A:8080:localhost:8080 user@A

Again, these can be combined into a single command.

9d) If B can ssh to C and A can ssh to C: Push from B and then pull from A.

user@B > ssh -vCNnTR localhost:1337:B:8080 user@C
user@A > ssh -vCNnTL A:8080:localhost:1337 user@C

Any number of these may also need the trick from Step 3 to enable SSH access.

Step 10: If box C doesn't have any TCP connectivity to either B or A, then having box C doesn't really help the situation at all. You'll need to find a different box C which actually has connectivity to both and return to step 4, or find a chain (C, D, etc.) through which you could eventually patch a connection through to B. In this case you'll need a series of commands such as those in 9a)-d) to gradually patch B:8080 to C:1337, to D:42069, etc., until you finally end up at A:8080.

Addendum 1: If your service uses UDP rather than TCP (this includes DNS, some video streaming protocols, and most video games), you may have to add a few steps to convert to TCP; see https://superuser.com/questions/53103/udp-traffic-through-ssh-tunnel for a guide.

Addendum 2: If your service host and port come with URL signing (e.g. signed S3 URLs), changing your application to hit A:8080 rather than B:8080 may cause the URL signatures to fail. To remedy this, you can add a line in your /etc/hosts file to redirect B to A; since your computer checks this file before doing a DNS lookup, you can do completely transparent ssh tunneling while still respecting SSL and signed S3/GCS URLs.

Addendum 3: My preferred tools for checking which TCP ports are open between two boxes are nc -l 4040 on the receiving side and curl B:4040 on the sending side. ping B, traceroute B, and route -n are also useful for diagnostic information but may not tell you the full story.
Addendum 4: SSH tunnels will not work if there is something already listening on that port, such as the SSH tunnel you created yesterday and forgot to remove. To easily check this, try ps -efjww | grep ssh or sudo netstat -nap | grep LISTEN.

Addendum 5: All the ssh flags are explained in the man page: man ssh. To give a brief overview: -v is verbose logging, -C is compression, -nNT together disable the interactive part of ssh and make it only tunnel, -A forwards over your ssh credentials, -f backgrounds ssh after connecting, and -L and -R are for local and remote forwarding respectively. -o StrictHostKeyChecking=no is also useful for disabling the known-hosts check.

FURTHER COMMENTS

SSH tunnels are useful as a quick-fix solution to networking issues, but are generally recognized as inferior to proper long-term networking fixes. They tend to be difficult to maintain for a number of reasons: the setup does not require any configuration and leaves no trace other than a running process; they don't automatically come up when restarting a box, unless manually added to the startup daemon; and they can easily be killed by temporary network outages.

However, we've made good use of them here at Hive, for instance most recently when we needed to keep up production services during a network migration, but also occasionally when provisioning burst GPU resources from AWS and integrating them seamlessly into our hardware resource pool. They can also be very useful when developing locally or debugging production services, or for getting Gmail access in China.

If you're interested in different and more powerful ways to tunnel, I'm no networking expert; all I can do is point you in the direction of some interesting networking vocabulary.

References

https://en.wikipedia.org/wiki/OSI_model
https://help.ubuntu.com/community/SSH/OpenSSH/PortForwarding#Dynamic_Port_Forwarding
https://en.wikipedia.org/wiki/SOCKS
https://en.wikipedia.org/wiki/Network_address_translation
https://en.wikipedia.org/wiki/IPsec
https://en.wikipedia.org/wiki/Iptables
https://wiki.archlinux.org/index.php/VPN_over_SSH
https://en.wikipedia.org/wiki/Routing_table
https://linux.die.net/man/8/route