Final Project - Improving Brand Analytics with an Image Logo Detection Convolutional Neural Net in TensorFlow

For my final Metis project, I developed an application that can improve brand analytics through logo detection in images. The core of my solution leverages a Deep Convolutional Neural Network developed and trained using Google’s Deep Learning library, TensorFlow. Since my presentation was constrained to only four minutes, I’ll use this post to elaborate on the slides I presented (provided as screenshots throughout this post). My hope is this post will be useful to you if you’re trying to build your own image detection model or just trying to understand more about deep learning!

Note: while I focus on detecting Patagonia’s logo, this project was in no way associated with Patagonia. I just really like their gear and their logo!

Why image logo detection?

“A picture is worth a thousand words” – almost everyone ever

It’s not easy for brands to understand who is using their products and how they’re using them. One method brands use is surveys, but surveys can be time-consuming and expensive, and their results can be biased. A more scalable solution is text analytics on social media posts, specifically on the captions, comments, tags, and other metadata such as location and user information. However, text analytics relies on text that users enter manually, so it misses posts where the brand’s products appear only in the image. To capture these image-only posts, a brand can pay someone to manually sift through the torrent of social media image posts and pick out the relevant ones. Unfortunately, that kind of manual search is expensive and doesn’t scale! One way we can create a scalable solution is with a logo detection Convolutional Neural Net model…

Designing the logo detector model and app

The core of the image logo detection model I developed uses Inception v3, the state-of-the-art Deep Convolutional Neural Net developed by Google. Instead of training the Inception v3 model from scratch, I used a technique called transfer learning to retrain the model to classify whether an image contains a Patagonia logo (logo) or not (no-logo). I used transfer learning because it took significantly less time to train the model and I didn’t need as many training images.
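To make the idea concrete, here is a minimal sketch of transfer learning in plain Python. It is not the project's actual code: `frozen_features` is a hypothetical stand-in for the pretrained Inception v3 layers, and the toy "images" are made-up lists of numbers. The only point is that the feature extractor stays fixed while a small new final layer is trained on top of it.

```python
import math


def frozen_features(image):
    """Hypothetical stand-in for the pretrained network; it is never updated."""
    mean = sum(image) / len(image)
    spread = max(image) - min(image)
    return [mean, spread]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_final_layer(features, labels, steps=2000, learning_rate=0.5):
    """Fit a new logistic-regression layer on the fixed features with gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(steps):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi / n
            grad_b += error / n
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


def predict(image, weights, bias):
    """Return 1 ("logo") or 0 ("no-logo") for a single toy image."""
    x = frozen_features(image)
    score = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return 1 if score >= 0.5 else 0


# Toy data (made up): label 1 = "logo", 0 = "no-logo"
images = [[0.9, 0.8, 1.0], [0.7, 0.9, 0.8], [0.1, 0.2, 0.0], [0.2, 0.1, 0.3]]
labels = [1, 1, 0, 0]

# The features are computed once and cached, because the extractor never changes.
cached = [frozen_features(img) for img in images]
weights, bias = train_final_layer(cached, labels)
predictions = [predict(img, weights, bias) for img in images]
```

Because only the small final layer is trained and the features can be cached, each training step is cheap, which is where the savings in time and training images come from.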

To make the retraining easier, I started with this great TensorFlow retraining tutorial and associated script. As I’ll explain in the Model development section below, I had to make custom changes to the script, but it allowed me to get up and running quickly!

Next, let’s explore the end-to-end design of the logo detection model and associated web app, including the technical details.

Data collection

I chose to focus my data collection (i.e., image collection) efforts on Instagram given the platform’s image-centered nature and popularity among brands. Unfortunately, the Instagram API recently underwent some major changes that greatly limit access. Therefore, I chose to manually scrape images from the Instagram accounts of many Patagonia-sponsored athletes and Patagonia stores. To do the manual scraping, I used JavaScript in a Google Chrome console. Not ideal, but it allowed me to quickly gather images and start training my model!

Because this was a supervised problem, I also manually went through all the images I downloaded and grouped them into two folders: 1) has a Patagonia logo (logo), or 2) doesn’t have a Patagonia logo (no-logo).
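A folder-per-class layout like this can be turned into labeled examples with a few lines of Python. The sketch below is an illustration only: the function name and the list of accepted file extensions are assumptions, not part of the retraining script.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the downloaded files.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def load_labeled_paths(root):
    """Return sorted (path, label) pairs, using each sub-folder name as the label."""
    examples = []
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue
        for image_path in sorted(class_dir.iterdir()):
            if image_path.suffix.lower() in IMAGE_EXTENSIONS:
                examples.append((image_path, class_dir.name))
    return examples
```

Calling `load_labeled_paths("images")` on a directory that contains `logo` and `no-logo` sub-folders would return every image path paired with the name of the folder it sits in.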

Model development

I spent most of my time iterating on the logo detection model. The main steps included modifying the out-of-the-box Inception v3 model and associated retraining script in TensorFlow, retraining the modified model, analyzing model performance, and tuning model parameters accordingly.

The major modifications I made to the out-of-the-box Inception v3 model and its TensorFlow retraining script were:

Retrained the final model layer to categorize logo vs. no-logo
Added a dropout layer to minimize overfitting. If you’ve never heard of dropout before, check out this great short video from the Google Udacity Deep Learning course.
Up-sampled the logo class of images to improve precision and recall for this under-represented class.
Removed the script’s random sampling of the validation and test sets, which had distorted the evaluation results.
Added TensorBoard summaries to be able to visualize model training. See my other post for more details.
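The up-sampling step in the list above can be sketched as follows. This is a simplified illustration, not the modified retraining script: the function name, the file names, and the choice to balance the classes exactly are all assumptions.

```python
import random


def upsample_minority(examples, minority_label, seed=0):
    """Duplicate randomly chosen minority examples until both classes are the same size."""
    rng = random.Random(seed)
    minority = [e for e in examples if e[1] == minority_label]
    majority = [e for e in examples if e[1] != minority_label]
    if not minority or len(minority) >= len(majority):
        return list(examples)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    return balanced


# Hypothetical training examples: 8 no-logo images and 2 logo images
examples = [("img%d.jpg" % i, "no-logo") for i in range(8)]
examples += [("logo0.jpg", "logo"), ("logo1.jpg", "logo")]
balanced = upsample_minority(examples, "logo")
```

Up-sampling should only be applied to the training split. Duplicating images before splitting would let copies of the same image land in both the training and test sets and inflate the test metrics.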

To tune the model, I experimented with different hyper-parameters and image distortions. The hyper-parameters included the learning rate, number of training steps, testing/validation percentage, and training batch size. The distortions included random image cropping, scaling, and brightness changes. While I found distortions to improve model quality somewhat, they came at a major computational cost, so I did not end up using them in my final model.
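One simple way to organize this kind of experimentation is to enumerate every combination of settings up front and launch one training run per combination. The sketch below assumes a plain grid search; the parameter names and values are hypothetical and are not the ones used in the project.

```python
from itertools import product

# Hypothetical search space; the values are placeholders.
search_space = {
    "learning_rate": [0.001, 0.01, 0.1],
    "training_steps": [2000, 4000],
    "train_batch_size": [50, 100],
}


def enumerate_runs(space):
    """Return one settings dict per combination of hyper-parameter values."""
    names = sorted(space)
    combos = product(*(space[name] for name in names))
    return [dict(zip(names, values)) for values in combos]


runs = enumerate_runs(search_space)
```

Each dict in `runs` describes one independent training job, so the jobs can be handed to separate machines and run in parallel.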

Web app

The web app is a simple prototype that allows a user to upload an image (either from a desktop or mobile phone) and test whether or not there is a Patagonia logo in the image, as determined by the TensorFlow logo detection model I trained and tested.

Check out the App Screencast! section below for a screencast and link to the live web app.

Technical architecture

Here’s the technical view, highlighted by TensorFlow and several AWS services:

A few notes about the technical architecture:

Speeding up model training – Training the TensorFlow image detection model can require a lot of computing power. To speed things up, I used AWS EC2 GPU instances (g2.2xlarge instance type) and automated the instance setup to run multiple model trainings in parallel.
Reducing costs – On-demand AWS EC2 GPU instances can be expensive ($0.65 per hour), so I used AWS spot instances, which cost significantly less (e.g., ~$0.10 per hour, roughly 85% cheaper). The main downside was that my servers were often terminated without any warning when the spot price rose above my bid price. Luckily, I always saved everything to AWS S3, so I could spin up a new spot instance and continue training the model once the spot price went back down!
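The save-and-resume pattern that makes spot instances workable can be sketched like this. It is an illustration under stated assumptions: a local JSON file stands in for the S3 copy, the training step is a placeholder, and `stop_after` exists only to simulate an instance being terminated.

```python
import json
import os


def train_with_checkpoints(checkpoint_path, total_steps, save_every=100, stop_after=None):
    """Run (or resume) a training loop, saving progress so an interruption loses little work."""
    state = {"step": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)  # resume from the last saved step

    steps_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and steps_run >= stop_after:
            break  # simulate the spot instance being terminated
        state["step"] += 1  # placeholder for one real training step
        steps_run += 1
        if state["step"] % save_every == 0 or state["step"] == total_steps:
            tmp_path = checkpoint_path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump(state, f)
            os.replace(tmp_path, checkpoint_path)  # swap in the new checkpoint atomically
    return state["step"]
```

If a run is cut off at step 250 with `save_every=100`, the next call picks up from step 200, so at most `save_every` steps of work are repeated.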

Evaluating the logo detector model

The major challenge with this logo vs. no-logo classification problem is class imbalance (i.e., most images don’t contain a Patagonia logo). Initially, my trained models had much poorer precision and recall on the logo class than the final model. To improve these metrics, I experimented with up-sampling the logo class and with down-sampling the no-logo class, each of which improved precision and recall on the logo class. In the end, the model achieved 77% precision and 40% recall on the logo class: when it flagged a logo it was right about three times out of four, but it found only two of every five logo images. There’s definitely room for improvement in the next iteration, but still a good start!

Need a refresher on the difference between precision and recall? Take a look at this post on Quora.
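For a quick numerical illustration, the sketch below computes both metrics from lists of true and predicted labels. The counts are invented so that the results land near the figures reported above; they are not the project's actual test-set counts.

```python
def precision_recall(true_labels, predicted_labels, positive="logo"):
    """Return (precision, recall) for the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical test set: 100 logo images and 900 no-logo images
true_labels = ["logo"] * 100 + ["no-logo"] * 900
# Hypothetical predictions: 40 logos found, 60 missed, 12 false alarms
predicted_labels = ["logo"] * 40 + ["no-logo"] * 60 + ["logo"] * 12 + ["no-logo"] * 888

precision, recall = precision_recall(true_labels, predicted_labels)
```

With these made-up counts, precision is 40 / 52 (about 77%) and recall is 40 / 100 (40%). A model that always predicted no-logo would be right on 900 of the 1,000 images yet have zero recall, which is why accuracy alone is misleading on an imbalanced problem.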

Curious which test images the model correctly and incorrectly predicted? Let’s take a look:

As you can see, the model struggled with:

Logo scale differences
Multiple logos for the same brand
Other images with text and other logos

However, sometimes the model was even smarter than me…

App Screencast!

Contributing to the Google TensorFlow project

One of the most exciting parts of my project was being able to contribute code I developed back to the Google TensorFlow project so others can benefit from it. For more details, take a look at my other post.

Future work

I was very happy with the results of my initial model and the basic functionality of the web app. However, that was just the tip of the iceberg! Here are some of the potential next steps for this project I’ve been thinking about:

Better understand how a solution like this could be successful while still protecting people’s privacy
Pitch use-cases to brands/marketers to determine product/market fit
Expand prototype of product and collect feedback
- Enable batch image processing via a RESTful API
- Integrate with Instagram (and other social media image services) API to automatically collect images
Improve and optimize model/process
- Improve recall on the “has logo” class by adding class weights to the loss function, similar to what’s suggested in this Stack Overflow post
- Improve the model’s ability to generalize to a broader set of images by training it on more data
- Localize detected logos (i.e., where in this picture is a logo, if any?)
- Combine logo detection with existing textual analytics
- Automate model selection / hyper-parameter tuning process
- Explore other model architectures
- Compare model performance to existing logo-detection services
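The class-weighting idea in the list above can be sketched as a weighted binary cross-entropy. This is a plain-Python illustration of the concept, not code from the project; the function name and the weight value in the usage note are assumptions. TensorFlow provides a comparable op, `tf.nn.weighted_cross_entropy_with_logits`.

```python
import math


def weighted_cross_entropy(probabilities, labels, positive_weight=1.0):
    """Mean binary cross-entropy where errors on the positive class count positive_weight times.

    probabilities: predicted probability of the positive ("has logo") class per example
    labels: 1 for "has logo", 0 for "no logo"
    """
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for p, y in zip(probabilities, labels):
        p = min(max(p, eps), 1.0 - eps)
        if y == 1:
            total += -positive_weight * math.log(p)
        else:
            total += -math.log(1.0 - p)
    return total / len(labels)
```

With `positive_weight=3.0`, for example, a missed logo costs three times as much as a false alarm of the same confidence, which pushes the model toward higher recall on the logo class, usually at some cost in precision.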

Conclusion

Combining the power of Deep Convolutional Neural Networks, like the logo detector model I developed, with existing text analytics capabilities presents an opportunity for brands to discover new insights about their customers and how customers use their products.

Additional References

I did a lot of research on computer vision and neural networks during this project. Here are the sources I found most useful:

31 Aug 2016
