{"id":937,"date":"2018-01-05T05:49:00","date_gmt":"2018-01-05T05:49:00","guid":{"rendered":"https:\/\/thehive.ai\/blog\/?p=937"},"modified":"2024-07-05T06:53:43","modified_gmt":"2024-07-05T06:53:43","slug":"inside-a-neural-networks-mind","status":"publish","type":"post","link":"https:\/\/thehive.ai\/blog\/inside-a-neural-networks-mind","title":{"rendered":"Inside a Neural Network&#8217;s Mind"},"content":{"rendered":"\n<p><strong>Why do neural networks make the decisions they do? Often, the truth is that we don\u2019t know; it\u2019s a black box. Fortunately, there are now some techniques that help us peek under the hood to help us understand how they make decisions.<\/strong><\/p>\n\n\n\n<p>What has the neural network learned is attractive? Where does it look to decide if an image is safe for work? Using grad-cam, we explore the predictions of our models: sport type, action \/ non-action, drugs, violence, attractiveness, race, age, etc.<\/p>\n\n\n\n<p>Github repo: <a href=\"https:\/\/github.com\/hiveml\/tensorflow-grad-cam\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/github.com\/hiveml\/tensorflow-grad-cam<\/a><\/p>\n\n\n\n<div class=\"wp-block-columns\">\n<div class=\"wp-block-column\">\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" width=\"380\" height=\"368\" src=\"https:\/\/staticblog.thehive.ai\/uploads\/2018\/01\/1-1.png\" alt=\"\" class=\"wp-image-1002\" srcset=\"https:\/\/staticblog.thehive.ai\/uploads\/2018\/01\/1-1.png 380w, https:\/\/staticblog.thehive.ai\/uploads\/2018\/01\/1-1-300x291.png 300w\" sizes=\"(max-width: 380px) 100vw, 380px\" \/><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column\">\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" width=\"379\" height=\"368\" src=\"https:\/\/staticblog.thehive.ai\/uploads\/2018\/01\/2.png\" alt=\"\" class=\"wp-image-993\" srcset=\"https:\/\/staticblog.thehive.ai\/uploads\/2018\/01\/2.png 379w, 
https:\/\/staticblog.thehive.ai\/uploads\/2018\/01\/2-300x291.png 300w\" sizes=\"(max-width: 379px) 100vw, 379px\" \/><\/figure>\n<\/div>\n<\/div>\n\n\n\n<p>Hey, my face is up here! Clearly, the attractiveness model focuses on body over face in the mid-range shots above. Interestingly, it has also learned to localize people without any bounding box information in training. The model is trained on 200k images, labeled by <a href=\"http:\/\/thehive.ai\/data\" target=\"_blank\" rel=\"noreferrer noopener\">Hive<\/a> into three classes: hot, neutral, and not. The scores for the three buckets are then combined into a rating from 0 to 10. This classifier is available <a href=\"https:\/\/thehive.ai\/demos\" target=\"_blank\" rel=\"noreferrer noopener\">here<sup>1<\/sup><\/a>.<\/p>\n\n\n\n<div class=\"wp-block-columns\">\n<div class=\"wp-block-column\">\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" width=\"380\" height=\"368\" src=\"https:\/\/staticblog.thehive.ai\/uploads\/2018\/01\/3.png\" alt=\"\" class=\"wp-image-994\" srcset=\"https:\/\/staticblog.thehive.ai\/uploads\/2018\/01\/3.png 380w, https:\/\/staticblog.thehive.ai\/uploads\/2018\/01\/3-300x291.png 300w\" sizes=\"(max-width: 380px) 100vw, 380px\" \/><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column\">\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" width=\"380\" height=\"366\" src=\"https:\/\/staticblog.thehive.ai\/uploads\/2018\/01\/4.png\" alt=\"\" class=\"wp-image-995\" srcset=\"https:\/\/staticblog.thehive.ai\/uploads\/2018\/01\/4.png 380w, https:\/\/staticblog.thehive.ai\/uploads\/2018\/01\/4-300x289.png 300w\" sizes=\"(max-width: 380px) 100vw, 380px\" \/><\/figure>\n<\/div>\n<\/div>\n\n\n\n<p>The main idea is to apply the weights of the logit layer to the feature maps of the last convolutional layer, just before global pooling. 
This creates a map showing the importance of each pixel in the network\u2019s decision.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" width=\"940\" height=\"304\" src=\"https:\/\/staticblog.thehive.ai\/uploads\/2018\/01\/5.png\" alt=\"Sports action, NSFW, violence\" class=\"wp-image-996\" srcset=\"https:\/\/staticblog.thehive.ai\/uploads\/2018\/01\/5.png 940w, https:\/\/staticblog.thehive.ai\/uploads\/2018\/01\/5-300x97.png 300w, https:\/\/staticblog.thehive.ai\/uploads\/2018\/01\/5-768x248.png 768w\" sizes=\"(max-width: 940px) 100vw, 940px\" \/><figcaption>Sports action, NSFW, violence<\/figcaption><\/figure>\n\n\n\n<p>The pose of the football player tells the model that a play is in action. We can clearly locate the nudity and the guns in the NSFW and Violence images, too.<\/p>\n\n\n\n<div class=\"wp-block-columns\">\n<div class=\"wp-block-column\">\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" width=\"380\" height=\"368\" src=\"https:\/\/staticblog.thehive.ai\/uploads\/2018\/01\/6.png\" alt=\"Snowboarding, TV show\" class=\"wp-image-997\" srcset=\"https:\/\/staticblog.thehive.ai\/uploads\/2018\/01\/6.png 380w, https:\/\/staticblog.thehive.ai\/uploads\/2018\/01\/6-300x291.png 300w\" sizes=\"(max-width: 380px) 100vw, 380px\" \/><figcaption>Snowboarding, TV show<\/figcaption><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column\">\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" width=\"379\" height=\"368\" src=\"https:\/\/staticblog.thehive.ai\/uploads\/2018\/01\/7.png\" alt=\"\" class=\"wp-image-998\" srcset=\"https:\/\/staticblog.thehive.ai\/uploads\/2018\/01\/7.png 379w, https:\/\/staticblog.thehive.ai\/uploads\/2018\/01\/7-300x291.png 300w\" sizes=\"(max-width: 379px) 100vw, 379px\" \/><\/figure>\n<\/div>\n<\/div>\n\n\n\n<p>A person in a suit, center frame, apparently indicates that it is a TV show instead of a commercial (right). 
The TV \/ commercial model is a great example of how grad-CAM can uncover unexpected reasons behind the decisions our models make. They can also confirm what we expect, as seen in the snowboarding example (left).<\/p>\n\n\n\n<div class=\"wp-block-columns\">\n<div class=\"wp-block-column\">\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" width=\"380\" height=\"375\" src=\"https:\/\/staticblog.thehive.ai\/uploads\/2018\/01\/8.png\" alt=\"The Simpsons, Rick and Morty\" class=\"wp-image-999\" srcset=\"https:\/\/staticblog.thehive.ai\/uploads\/2018\/01\/8.png 380w, https:\/\/staticblog.thehive.ai\/uploads\/2018\/01\/8-300x296.png 300w\" sizes=\"(max-width: 380px) 100vw, 380px\" \/><figcaption>The Simpsons, Rick and Morty<\/figcaption><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column\">\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" width=\"380\" height=\"375\" src=\"https:\/\/staticblog.thehive.ai\/uploads\/2018\/01\/9.png\" alt=\"\" class=\"wp-image-1000\" srcset=\"https:\/\/staticblog.thehive.ai\/uploads\/2018\/01\/9.png 380w, https:\/\/staticblog.thehive.ai\/uploads\/2018\/01\/9-300x296.png 300w\" sizes=\"(max-width: 380px) 100vw, 380px\" \/><\/figure>\n<\/div>\n<\/div>\n\n\n\n<p>This example uses our animated show classifier. 
Interestingly, the most important spot in the images above is the edge of Bart and Morty, including a substantial amount of the background in both cases.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" width=\"940\" height=\"486\" src=\"https:\/\/staticblog.thehive.ai\/uploads\/2018\/01\/10.png\" alt=\"\" class=\"wp-image-1001\" srcset=\"https:\/\/staticblog.thehive.ai\/uploads\/2018\/01\/10.png 940w, https:\/\/staticblog.thehive.ai\/uploads\/2018\/01\/10-300x155.png 300w, https:\/\/staticblog.thehive.ai\/uploads\/2018\/01\/10-768x397.png 768w\" sizes=\"(max-width: 940px) 100vw, 940px\" \/><\/figure>\n\n\n\n<h2>CAM and Grad-CAM<\/h2>\n\n\n\n<p>First developed by Zhou<sup>2<\/sup>, Class Activation Maps (CAM) show what the network is looking at. For each class, CAM highlights the parts of the image most important to that class.<\/p>\n\n\n\n<p>Selvaraju<sup>3<\/sup> extended CAM to apply to a wider range of architectures without any modifications. Specifically, grad-CAM can handle fully connected layers and more complicated scenarios like question answering. However, almost all popular neural nets like ResNet, DenseNet, and even NASNet end with global average pooling, so the heatmap can be computed directly using CAM without the backward pass. This is especially important for speed-critical applications. Fortunately, with the ResNet used in this post we don&#8217;t have to modify the nets at all to compute CAM or grad-CAM.<\/p>\n\n\n\n<p>Recently, Grad-CAM++<sup>4<\/sup> further generalized the method to increase the precision of the output heat maps. Grad-CAM++ is better at dealing with multiple instances of a class and at highlighting the entire object rather than just its most salient parts. 
It achieves this using a weighted combination of positive partial derivatives.<\/p>\n\n\n\n<h2>Here&#8217;s how it&#8217;s implemented in TensorFlow:<\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>one_hot = tf.sparse_to_dense(predicted_class, &#91;num_classes], 1.0)\nsignal = tf.multiply(end_points&#91;'Logits'], one_hot)\nloss = tf.reduce_mean(signal)\n<\/code><\/pre>\n\n\n\n<p>This builds a one-hot vector of num_classes elements, so only the logit of the predicted class remains non-zero. Its mean defines the loss.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>grads = tf.gradients(loss, conv_layer)&#91;0]\nnorm_grads = tf.divide(grads, tf.sqrt(tf.reduce_mean(tf.square(grads)))\n\t+ tf.constant(1e-5))<\/code><\/pre>\n\n\n\n<p>Next, we take the gradient of the loss with respect to the last convolutional layer and normalize it, adding a small constant to avoid division by zero.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>output, grads_val = sess.run(&#91;conv_layer, norm_grads],\n\tfeed_dict={imgs0: img})<\/code><\/pre>\n\n\n\n<p>Running the session evaluates both the convolutional feature maps and their normalized gradients for the input image.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>weights = np.mean(grads_val, axis = (0, 1))             # &#91;2048]\ncam = np.ones(output.shape&#91;0 : 2], dtype = np.float32)  # &#91;10,10]<\/code><\/pre>\n\n\n\n<p>Averaging the gradients over the spatial dimensions yields one weight per channel, scoring how important each feature map is to the predicted class. We then accumulate the weighted feature maps into the cam:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>for i, w in enumerate(weights):\n\tcam += w * output&#91;:, :, i]<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>cam = np.maximum(cam, 0)\ncam = cam \/ np.max(cam)\ncam = cv2.resize(cam, (eval_image_size, eval_image_size))<\/code><\/pre>\n\n\n\n<p>We pass the cam through a ReLU to keep only the positive contributions for that class. Then we resize the coarse cam output to the input size and blend it with the image for display.<\/p>\n\n\n\n<p>Finally, the main function grabs the TensorFlow-Slim model definition and pre-processing function. With these, it computes the grad-CAM output and blends it with the input photo. We use the class with the greatest softmax probability as the input to grad_cam, but we could instead choose any class. For example:<\/p>\n\n\n\n<p>The model predicted alcohol as the top choice with 99% and gambling with only 0.4%. By changing the predicted_class from alcohol to gambling, we can see how, despite the low class probability, the model can clearly pinpoint the gambling in the image.<\/p>\n\n\n\n<h2>References<\/h2>\n\n\n\n<ul><li>Our attractiveness classifier: <a href=\"https:\/\/thehive.ai\/demo\/attractiveness\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/thehive.ai\/demo\/attractiveness<\/a><br><\/li><li>Bolei Zhou, Aditya Khosla, \u00c0gata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. CoRR, abs\/1512.04150, 2015<br><\/li><li>Ramprasaath R. Selvaraju, Abhishek Das, Ramakrishna Vedantam, Michael Cogswell, Devi Parikh, and Dhruv Batra. Grad-CAM: Why did you say that? Visual explanations from deep networks via gradient-based localization. 
CoRR, abs\/1610.02391, 2016<br><\/li><li>Aditya Chattopadhyay, Anirban Sarkar, Prantik Howlader, and Vineeth N. Balasubramanian. Grad-CAM++: Generalized gradient-based visual explanations for deep convolutional networks. CoRR, abs\/1710.11063, 2017<br><\/li><li>TensorFlow-Slim: <a href=\"https:\/\/github.com\/tensorflow\/models\/tree\/master\/research\/slim\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/github.com\/tensorflow\/models\/tree\/master\/research\/slim<\/a><br><\/li><li>Our grad-CAM GitHub repo: <a href=\"https:\/\/github.com\/hiveml\/tensorflow-grad-cam\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/github.com\/hiveml\/tensorflow-grad-cam<\/a><br><\/li><li>Original grad-CAM repo: <a href=\"https:\/\/github.com\/Ankush96\/grad-cam.tensorflow\">https:\/\/github.com\/Ankush96\/grad-cam.tensorflow<\/a><\/li><\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Why do neural networks make the decisions they do? There are now some techniques that help us understand.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"kia_subtitle":""},"categories":[8],"tags":[],"_links":{"self":[{"href":"https:\/\/thehive.ai\/blog\/wp-json\/wp\/v2\/posts\/937"}],"collection":[{"href":"https:\/\/thehive.ai\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/thehive.ai\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/thehive.ai\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/thehive.ai\/blog\/wp-json\/wp\/v2\/comments?post=937"}],"version-history":[{"count":5,"href":"https:\/\/thehive.ai\/blog\/wp-json\/wp\/v2\/posts\/937\/revisions"}],"predecessor-version":[{"id":1007,"href":"https:\/\/thehive.ai\/blog\/wp-json\/wp\/v2\/posts\/937\/revisions\/1007"}],"wp:attachment":[{"href":"https:\/\/thehive.ai\/blog\/wp-json\/wp\/v2\/media?parent=937"}],"wp:term":[{"taxonomy":"category",
"embeddable":true,"href":"https:\/\/thehive.ai\/blog\/wp-json\/wp\/v2\/categories?post=937"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/thehive.ai\/blog\/wp-json\/wp\/v2\/tags?post=937"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}