So if you recall, I spoke to this other smart person at Google…
And he told me I was tackling the problem all wrong.
Well, let’s go back to the original sample photos…
Notice the first image, he said. Well, it not only has two vehicles in it, but two different TYPES of vehicles.
Oh? I asked. Why does that matter?
Well, you see, this is multi-task learning. Long story short: if you have multiple object types being classified side by side, each one needs its own recognition model. So, doing this the way I had been, I would need to build a car recognition model, a truck recognition model, and so on. With two weeks until the conference, my brain was about ready to just throw up the flowers object recognition demo and call it a day. But my audience… they had to see… this other dataset…
Smart person at Google: well, just use the images that contain one type of vehicle in them and try again.
My brain: wait… you mean that whole weekend of watching Hell on Wheels and fighting World War III with LabelImg was for nothing?! I have to do that all over again? Well, I guess I could finish Westworld while I do it… but really?!
So I set to the task of going through the OIRDS dataset all over again, this time picking out only the images with a single vehicle type in them, and throwing them into the right folders…
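That re-sorting step can be sketched in a few lines of Python. This is a hedged illustration, not my actual script: it assumes you already have a mapping from each image filename to the set of vehicle types it contains (in practice that would come from the OIRDS annotation files), and it simply skips any image with more than one type.

```python
import shutil
from pathlib import Path

# Hypothetical filename-to-types mapping; the real source would be
# the OIRDS annotations, not a hand-written dict like this.
labels = {
    "img_001.jpg": {"car"},
    "img_002.jpg": {"car", "truck"},  # two types -> skipped
    "img_003.jpg": {"truck"},
}

def sort_single_type_images(src_dir, dst_dir, labels):
    """Copy images containing exactly one vehicle type into dst_dir/<type>/."""
    src, dst = Path(src_dir), Path(dst_dir)
    kept = 0
    for name, types in labels.items():
        if len(types) != 1:
            continue  # multi-type image: the multi-task case I was avoiding
        vehicle_type = next(iter(types))
        target = dst / vehicle_type
        target.mkdir(parents=True, exist_ok=True)
        shutil.copy(src / name, target / name)
        kept += 1
    return kept
```

The resulting one-folder-per-class layout (`dst_dir/car/…`, `dst_dir/truck/…`) is exactly the structure the TensorFlow for Poets retraining script expects for its image directory.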
Here is where the story ends somewhat happily, somewhat sadly.
The happy news? I demoed my model training (disconnected!) with MobileNet as my final model of choice, mostly for speed. I followed the basic steps in TensorFlow for Poets, using its folder-per-class structure, and ran the right command-line commands to test the model.
The problem was that the final test accuracy was an abysmal 52%.
52%? I couldn’t show that on stage.
So I proceeded to test as many of my 200 test images as my patience would allow. All of them classified successfully, almost too successfully. The confidences coming out of the model were skewed: the right vehicle was being identified with 0.99 confidence, while the other classes were vanishingly small (scientific notation, with negative exponents, was involved…). How could my accuracy be so low?
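Those skewed confidences are, in hindsight, just what a softmax output layer produces: it pushes the winning class toward 1.0 and squashes everything else toward tiny values with negative exponents. And a near-1.0 confidence on any single image says nothing about aggregate accuracy; plenty of other images can be confidently wrong. Here is a minimal sketch with made-up logits, not real model output:

```python
import math

def softmax(logits):
    """Convert raw model scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative logits for (car, truck, other); the gap between the top
# score and the rest is what drives the 0.99-vs-1e-5 spread I saw.
probs = softmax([9.0, -2.0, -3.0])
```

With that spread, the top class lands above 0.99 while the losers print in scientific notation, which matches what I was seeing scroll by on screen.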
But I didn’t give it any more thought, and said screw it. You know what? I’ll make another point on stage that I think is very important that I learned from this whole experience.
It can be summed up in the photo below:
Watching a model learn, let alone trying to train one as a total beginner, was very difficult… and I also learned that models are not right all the time, even when inference seems to be doing its magic.
So, what is the takeaway?
I had a military commander in the audience ask me, following my successful live demo (which crashed several times prior to getting on stage, per usual): how do I execute or make decisions on anything at 52% certainty?
My answer was: don’t do it with the machine alone. At this point, for certain tasks in certain [most] situations, machine learning models augment, and maybe automate, certain aspects of human tasks, with the goal of making processes more efficient (as in this case) or surfacing conclusions about data that a human might have missed entirely, due to… just being human. But replacing a human talent entirely? Not so sure about that.
So that was the sad-ish part of the ending of my first foray into machine learning. The happy part? I think I found out why my model kept outputting such low accuracy percentages each time I repeated this demo at the conferences that followed the initial one that led to this horrid scramble. The inference seemed to be performing very well, but the percentages were so low…
But wait. Only the final ones were. As I scrolled up my command-line screen to watch the training accuracy percentages, I noticed they had been as high as 79% a few training steps before reaching the 400th and final step. What had happened?
I currently [still] believe that my model had overfit to my data. After whittling down the images to eliminate the need for a multi-task machine learning scenario, I realized my final training dataset was down to about 600 images rather than the 1,000+ I had begun with. Could this have led to over-training, or over-fitting, given the number of training steps? Probably.
Will I change this demo to fix these problems? I am still debating. The lessons I learned from these bloops were huge, and the value I could communicate to the audience even larger, if my goal is to orient machine learning newbs to just how intricate machine learning development can be.
Interested in replicating my process? Eliminate the heartache and follow the TensorFlow article to take a shot at transfer learning for object classification with your own dataset. Or take the long, hard road and go back to this tutorial, which I think is an awesome walkthrough of the long way.
Once you’re done, if you choose to use the OIRDS dataset (or any dataset, really), consult this article for an easy step-by-step outline, with commands, on how to perform this demo if you must be shoved in front of hundreds of people.
Happy learning, and never give up.