What You Need to Know to Sustain a 10-Minute Conversation About Artificial Intelligence and Machine Learning
Once upon a time, I knew nothing about the two buzzwords driving everyone up the wall at the moment. When I decided to educate myself about the field, one of the first things I had to do was get some definitions straight…
So if you want to claim you know a thing or two about AI or ML (without being THAT person in the room who throws those two terms, and others related to them, around without really knowing what the hell is going on), let’s get through this post together…
A disclaimer before we get started…
Definitions in the field of AI are up for debate, are constantly changing, and frankly mean different things at different levels of granularity to different members of the technical community who touch AI and ML. To a PhD in data science and machine learning, this whole post might be a funny joke… but for those who have never touched the terms before, yet whose manager told them to apply them to a problem if they want to see a promotion, this will at the very least get you started on the winding road that is the field.
At the end of the day, engineers can argue over minute details for hours and maybe even days, which is why I decided to go out on a limb and just get some damn answers for you in the meantime until the next re-org of AI and ML happens.
So what is Artificial Intelligence?
Defining AI is a long and hard discussion; if you were to place a handful of engineers, data scientists, and developers (whatever you want to call them) in one room, you would get a handful of different answers to this question. Potentially some heated arguments with name calling. Who knows.
While AI and ML are considered sparkly fields now, the field of artificial intelligence has in fact existed since at least the 1950s, if not before. This field is not new; post-“AI winter” it has acquired a newness to it, but humans have been exploring how we map our own data, information, knowledge, and decision-making processes for decades.
“AI involves machines that can perform tasks that are characteristic of human intelligence.” -a common paraphrase of John McCarthy, who coined the term in 1956
So there you have it, nice and simple. But one of the wisest things I have heard said to me as I humbly stumble around giants in the AI and ML field who vastly outpace me on knowledge and experience:
Artificial Intelligence is neither artificial nor intelligent.
After reading through some of these upcoming definitions, you might get a gist as to why.
So now that we have defined AI, let’s gravitate towards my current field of specialization, machine learning (my least uncomfortable comfort zone in the land of AI, so go with it).
Artificial Intelligence can be divided into two general fields: General AI and Narrow AI.
Think of it like this:
Therefore General AI would have all of the characteristics of human intelligence. Not just some, but all. Some of those characteristics could simply be the combination of (for example) several machine learning models deployed at once- this in fact is not a demonstration of intelligence, but math multi-tasking (which is much better than you or me trying to multitask, that’s for sure).
Narrow AI exhibits some facet(s) of human intelligence, and can do that facet extremely well, but is lacking in other areas. This is where we will dive further into machine learning as a sub-field of artificial intelligence…
Which brings me to my next point.
Artificial intelligence and machine learning as terms in the field do NOT refer to the same thing. Do not use them interchangeably.
What you have just learned (and will dive into in more depth) is that machine learning is a sub-field to artificial intelligence, and is an example of narrow AI.
So what are some examples of technologies that fall under the field of Narrow AI?
Notice that most of the technologies mentioned above tackle very distinctive and defined tasks. The area of image recognition, for example, encompasses narrow tasks such as facial recognition, object classification, OCR (optical character recognition, or finding text in images and extracting it properly), etc. This diagram is fuzzy in a sense, since the grouping of technologies as they are mapped in this tree is up for debate, but they have all been accurately classified as fields under narrow AI.
So what do I need to know about Machine Learning to not make an ass of myself in a high level conversation about the subject?
Let’s first be clear that artificial intelligence can be accomplished without machine learning as a means to an end. There are many other fields under AI that do not involve machine learning explicitly. However, when it comes to defining the goal of machine learning under AI:
“A field of computer science that uses statistical techniques to give computer systems the ability to “learn” with data without being explicitly programmed.” -a definition popularly attributed to Arthur Samuel
Please keep the word “statistical” in mind during this entire conversation; it is very, very important. Probably the most important word to keep in mind when ML makes its pizzazzy entrance into any conversation.
So the idea is… screw the millions of lines of code that account for every seen and known type of spam email, and instead take (some very complicated, in my opinion) algorithmic approaches that are constantly learning, in a way, to classify what spam means through methods like clustering and reinforcement learning.
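To make that concrete, here is a toy sketch of the statistical approach: a tiny Naive Bayes text classifier written from scratch. The word lists and the “spam”/“ham” labels below are made up for illustration, and a real spam filter involves far more than this, but notice there is not a single hand-written spam rule in it:

```python
from collections import Counter
import math

def train_naive_bayes(docs, labels):
    """Count word frequencies per class from labeled training examples."""
    counts = {"spam": Counter(), "ham": Counter()}
    class_totals = Counter(labels)
    for doc, label in zip(docs, labels):
        counts[label].update(doc.lower().split())
    return counts, class_totals

def predict(counts, class_totals, doc):
    """Score each class by log-probability with add-one (Laplace) smoothing."""
    vocab = set(counts["spam"]) | set(counts["ham"])
    best_label, best_score = None, float("-inf")
    for label in counts:
        # prior: how common the class is in the training data
        score = math.log(class_totals[label] / sum(class_totals.values()))
        total = sum(counts[label].values())
        for word in doc.lower().split():
            # likelihood of each word under this class, smoothed
            score += math.log((counts[label][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

docs = ["win free money now", "free prize claim now",
        "meeting notes attached", "lunch at noon tomorrow"]
labels = ["spam", "spam", "ham", "ham"]
model = train_naive_bayes(docs, labels)
print(predict(*model, "claim your free money"))   # spam
print(predict(*model, "notes from the meeting"))  # ham
```

The classifier never saw the exact phrase “claim your free money”; it learned, statistically, which words tend to show up in which class.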
So how does the thing learn, you might be asking? Many talk about the “black box” when it comes to machine learning… how algorithms are exposed to data, and how the output comes from a process largely unknown. I am skeptical of the concept of a black box, and have seen some brilliant minds work through the math of concepts like forward and back propagation to at least throw some glow sticks into the dark and shed light on some of the motions we know to be happening during the process.
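For the curious, here is one of those glow sticks: a single sigmoid neuron doing a forward pass (compute a prediction) and a backward pass (nudge its weight and bias along the gradient of the error), sketched in plain Python. The starting values and learning rate are made up for illustration:

```python
import math

# One sigmoid neuron learning to output y = 1 for input x = 1.0.
w, b = 0.5, 0.0   # initial parameters
x, y = 1.0, 1.0   # one training example
lr = 1.0          # learning rate

for step in range(100):
    # forward pass: compute the prediction
    z = w * x + b
    pred = 1 / (1 + math.exp(-z))
    # backward pass: chain rule on squared error gives the gradients
    error = pred - y
    grad_z = error * pred * (1 - pred)  # d(loss)/d(z)
    w -= lr * grad_z * x                # d(loss)/d(w)
    b -= lr * grad_z                    # d(loss)/d(b)

print(round(pred, 2))  # well above the ~0.62 it started at, approaching 1.0
```

Nothing mysterious happened in there: just repeated small corrections pushed through the calculus.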
Machine learning is, very simply put, the art of prediction. Not prediction in the crystal-ball sense, but prediction in terms of finding patterns and projecting the most likely patterns that future trends might follow, considering what has already happened in the past.
Let’s be clear about some definitions and words that oftentimes get confused (even by myself) in conversations about supervised machine learning (or machine learning at all!)
Algorithms: these are the mathematical/statistical functions or string of functions that are exposed to training data.
Model training is the process of exposing data examples (also known as training data, which includes both features and labels) to those algorithms, which then produce a model. This model gradually learns relationships between features and labels by seeing them a bagillion times.
Models therefore map data examples to predicted labels. Those predictions are driven by learned internal parameters, which take shape as the algorithms run over the training data (in supervised learning, the correct labels are dictated by the human in the loop).
Inference is the process of applying the trained model to unlabeled examples of data (aka magic).
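To put those four definitions together in one place, here is a minimal sketch: the algorithm is ordinary least squares, training produces the model (just two learned parameters), and inference applies it to an example it has never seen. The hours-studied numbers are invented for illustration:

```python
# algorithm: ordinary least squares for y ≈ slope * x + intercept
def train(xs, ys):
    """Fit the model's internal parameters (slope, intercept) to training data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
            / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return slope, intercept  # the "model" is nothing more than these parameters

def infer(model, x):
    """Apply the trained model to an example it has never seen."""
    slope, intercept = model
    return slope * x + intercept

# features (hours studied) and labels (exam score)
model = train([1, 2, 3, 4], [52, 54, 56, 58])
print(infer(model, 5))  # 60.0
```

Algorithm, training data, model, inference: all four terms, twenty lines, no magic.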
So what did all of that mean? Here is your talk track:
Algorithms are any one of a long menu of mathematical functions: linear regression, logistic regression, decision trees, support vector machines, neural networks, and so on.
Now run those mathematical functions over your dataset (labeled or unlabeled). The resulting mappings of math over data = your resulting model, which you will deploy to perform inference on data that the model has not seen before.
At this point, two different types of machine learning technologies should be delineated:
Supervised vs. Unsupervised machine learning
What is supervised machine learning?
Think of supervised machine learning as the ML approach or process that has heavy human involvement and control. You teach the machine what is right and wrong, the machine trains for a while, you help the machine find the algorithms that train best on your data, and then the machine predicts right and wrong on data it has never seen before.
What is unsupervised machine learning?
Wikipedia defines unsupervised machine learning as the following:
“Unsupervised machine learning is the machine learning task of inferring a function that describes the structure of “unlabeled” data (i.e. data that has not been classified or categorized). Since the examples given to the learning algorithm are unlabeled, there is no straightforward way to evaluate the accuracy of the structure that is produced by the algorithm.”
Okay. So think of it this way: I do not tell a mathematical function what is right or wrong. I throw data at said mathematical function. The mathematical function will then seek patterns and relationships amongst the data as its function dictates it should, and come out with a guesstimated answer on what it thinks is going on in the data.
Techniques such as clustering and density analysis oftentimes find trends and patterns in data that might have been otherwise unnoticed. Just like the supervised ML training process, this requires experimentation to see exactly which unsupervised algorithms work best for your dataset and problem.
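As a concrete example of clustering, here is a bare-bones k-means (Lloyd’s algorithm) running on made-up one-dimensional data. Note that no labels are provided anywhere, yet the algorithm finds the two groups on its own:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Group 1-D points into k clusters with plain Lloyd's algorithm."""
    random.seed(seed)
    centers = random.sample(points, k)
    for _ in range(iters):
        # assignment step: each point joins its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # update step: each center moves to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# no right/wrong answers given -- just data
data = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]
print(kmeans(data, 2))  # two centers, one near 1.0 and one near 9.0
```

Swap in your own numbers and the same loop will guesstimate whatever grouping structure is actually there.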
Unsupervised learning is oftentimes the technology I hear the most referred to in my circles. “Can’t you just throw data at an algorithm and let it figure it out?” Well, the answer is yes AND it depends. Sometimes solving your problem with machine learning will involve both supervised and unsupervised methods depending on the best way to break out your tasks. It also greatly depends on who you have around to actually see the project through…
A quick deviation: how can my data reveal patterns, trends and predictions to me that I might have not known otherwise through manual/traditional methods of analysis?
Keep in mind that the best way to get machine learning and even AI as a field more broadly working for you quickly (and by you I mean all of the minions of the world who are not DeepMind, Brain, Google writ-large, etc) is:
Start off with a defined goal or answer you are trying to achieve.
This will get you started working with your data, getting to know it better, understanding it, and experimenting with running algorithms (supervised and unsupervised) over it. Another thing a wise data scientist once told me:
You start off with a goal in mind for building a machine learning suite of tools. Machine learning by nature forces you to become more intimate with your data and, most importantly, explore it. In starting that process of exploring your data, you suddenly might come upon insights you had never seen before… and suddenly, you have another goal to add onto your list of goals where machine learning can help answer your questions.
Okay, so how do I actually do the thing you call training a model so magic comes out at the end?
Here are six basic operational steps that generally must be accomplished to make machine learning actually work for you:
1. Define Objectives.
These objectives must be narrow in scope- think task streams that can be automated. This is one of the most important steps, and the one where you will need your team to work together the most: define which processes slow them down and where help is most needed at a task.
2. Collect data.
So I’ll just quote it and get it out of the way so that we can move on:
Garbage data in. Garbage models out. Garbage predictions. Useless machine learning.
Yes, I know it seems basic. But if your data is in questionable shape (e.g. incomplete, too scarce, not cleansed, not uniform, not understood), your model will be set up for failure. Think about raising a toddler to learn French in an isolated jail cell for its childhood, then unleashing it into a world where everyone speaks English. The child will only know what he/she (poor thing, you can tell I’m not a parent) has been exposed to.
How you go about collecting your data (or amassing data from old data stores) is crucial to success. The data you use for training must look as close as possible to what the model will see in production. Your data also must include “bad” and “good” examples. If you are training a model to detect whether a ship is present in a satellite image, you must include images with no ships at all, and images with things that are present but are not ships. You must include images with good resolution, mediocre resolution, and bad resolution. Imagery of ships during the day and at night. You get the idea.
3. Understand and prepare the data.
This goes back to my previous point, but let me make this remark first to manage your expectations:
Understanding and preparing data for machine learning training is 80% of the battle. It will take the longest, and if done wrong, will set you up for complete failure.
So my point being: this stage is important. Understanding your data means your data scientists can talk for about an hour continuously about the data to give you everything you need to know without staring at the data yourself for hours on end. Where does it fall short? What types of integrity issues might you run into? Where do null or zero values pop up when they shouldn’t? What fields are corrupted? How many dimensions and features? Etc.
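A sketch of what “understanding your data” can look like in practice: a quick integrity check that surfaces nulls, missing fields, and value ranges before anyone touches a training job. The `id` and `amount` field names here are hypothetical; substitute your own schema:

```python
def profile(rows, required=("id", "amount")):
    """Quick integrity report: nulls, missing fields, and value ranges."""
    issues = []
    amounts = []
    for i, row in enumerate(rows):
        for field in required:
            # flag both absent keys and present-but-empty values
            if field not in row or row[field] in (None, ""):
                issues.append(f"row {i}: missing or null '{field}'")
        if isinstance(row.get("amount"), (int, float)):
            amounts.append(row["amount"])
    return {
        "rows": len(rows),
        "issues": issues,
        "amount_range": (min(amounts), max(amounts)) if amounts else None,
    }

rows = [{"id": 1, "amount": 9.99},
        {"id": 2, "amount": None},   # null where it shouldn't be
        {"amount": 120.0}]           # corrupted row: no id at all
print(profile(rows))
```

A report like this is the start of the hour-long conversation your data scientists should be able to have about the data.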
Preparing the data: think of our basic ETL (or ELT, if you’re using special Google Cloud Platform products). This involves extracting data from its source, transforming the data into a clean format that is ready for labeling, and loading it onto the platform where training will be performed on the data (whether that is across software platforms or on-premises to cloud).
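Here is a miniature ETL round trip as a sketch, assuming a CSV source and a SQLite destination (the column names and sample text are made up; your real sources, formats, and target platform will differ):

```python
import csv
import io
import sqlite3

# extract: pull raw rows out of the source (a string here, a real file in practice)
raw_csv = "text,label\n Free MONEY ,spam\nteam meeting,ham\n,spam\n"
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# transform: normalize whitespace/case and drop rows with empty text
clean = [{"text": r["text"].strip().lower(), "label": r["label"]}
         for r in rows if r["text"].strip()]

# load: store the labeled examples where training will read them
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE training_data (text TEXT, label TEXT)")
db.executemany("INSERT INTO training_data VALUES (:text, :label)", clean)
print(db.execute("SELECT COUNT(*) FROM training_data").fetchone()[0])  # 2
```

Three rows came in, one was garbage, two clean labeled examples landed where training can reach them; that is the whole ETL idea in miniature.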
4. Create and evaluate the model.
This part, while tedious and time consuming, might not be as gut wrenching as the last step (note I said might…). At this point, you have cleansed your data and have decided to do the right thing and give your ML pet project as much time as you need to experiment with… all the algorithm combinations possible! Just kidding, but close. Generally, you want to try to train a model on your data with as many different types of algorithms (or already-trained models, using transfer learning) as possible. There are a million ways to perform training (reinforcement learning, transfer learning, or ground-up model training with the confusing algorithm map I featured previously).
The most important thing to know is that you should go into this phase and process knowing you will fail, and knowing that you will have to try a multitude of different models to determine where you get the best initial accuracy.
Once the models are created, evaluating the training process is a meta-process in and of itself, one with its own field of research attached to it. Elements to think about here: how do I visualize my model training process so I can observe where I might have done something like overfit my model to my data? Tools such as TensorBoard are aimed at helping machine learning developers with this process.
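One way to see overfitting without any visualization tools at all: compare accuracy on the training data against accuracy on held-out data. The “model” below deliberately just memorizes its exact training inputs, so the gap is dramatic (all the strings and labels are invented for the demo):

```python
def memorize_model(train_x, train_y):
    """A deliberately overfit 'model': it only memorizes exact training inputs."""
    table = dict(zip(train_x, train_y))
    return lambda x: table.get(x, "ham")  # default guess for anything unseen

def accuracy(model, xs, ys):
    """Fraction of examples the model labels correctly."""
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(xs)

train_x = ["free money", "win prize", "team lunch", "status report"]
train_y = ["spam", "spam", "ham", "ham"]
val_x = ["free prize", "lunch order"]   # held out -- never memorized
val_y = ["spam", "ham"]

model = memorize_model(train_x, train_y)
print("train accuracy:", accuracy(model, train_x, train_y))  # 1.0
print("val accuracy:  ", accuracy(model, val_x, val_y))      # 0.5
```

A perfect training score paired with a coin-flip validation score is the classic overfitting signature: the model memorized rather than learned.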
5. Refine the model.
An algorithm that is training on a dataset (and is in the process of becoming an adult model, but is still in fetus-stage, if you will) comes with anywhere from 10 to 50 to hundreds of knobs and dials that can be adjusted during the training process, also called hyperparameters. The process of refining the model can also be referred to as hyperparameter tuning. An example is deciding how many hidden layers to add to a neural network, or how many training epochs (passes of the algorithm over the training dataset) you will run to produce a model. Think of epochs as push-up repetitions: you don’t want to do too many days in a row or you will get sore, but you also don’t want to do too few or you will never improve. The same rough, high-level logic applies to how many times you run an algorithm over your training dataset.
With hundreds of potential hyperparameters to tune, this process can (as you can imagine) get very arduous very quickly. It will be important to be clear with your development team about accuracy thresholds, expectations, and limitations such as compute power and time. Ultimately, the timeline a program manager would want to be mindful of here would not be unlike a development timeline to an MVP: enough time to actually experiment, but controlled, so that your engineers do not go down rabbit holes of hyperparameter tuning (then they stop showering, eating, etc… it can get messy).
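The simplest version of hyperparameter tuning is a grid search: try every combination of knob settings and keep whichever scores best on validation data. The `train_and_score` function below is a hypothetical stand-in for a real (and expensive) training run; the loop around it is the actual tuning pattern:

```python
from itertools import product

def train_and_score(learning_rate, epochs):
    """Stand-in for a real training run; returns a validation score.
    (Hypothetical scoring surface, just to demonstrate the search loop.)"""
    return -abs(learning_rate - 0.1) - abs(epochs - 30) / 100

# the "knobs and dials": every combination gets its own training run
grid = {"learning_rate": [0.01, 0.1, 1.0], "epochs": [10, 30, 100]}

best_params, best_score = None, float("-inf")
for lr, ep in product(grid["learning_rate"], grid["epochs"]):
    score = train_and_score(lr, ep)
    if score > best_score:
        best_params, best_score = {"learning_rate": lr, "epochs": ep}, score

print(best_params)  # {'learning_rate': 0.1, 'epochs': 30}
```

Nine runs for two knobs with three settings each; you can see why hundreds of knobs turn this into a rabbit hole, and why smarter search strategies exist.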
6. Serve and monitor the model.
Last! But not least… no, it does not end with a “and then they lived happily ever after” ending here. Actually, this is where the fun just gets started…
A common misconception is that once the model training is done, the hard work is over. In fact, launching the model into production against your high velocity real time datasets is where everything… breaks. Then you have to go back, re-evaluate how your training data compares to data seen in production… Think of it as launching a brand new software product into production for the first time. It will break. Badly.
So keep this in mind when your model (or more likely, suite of models) launches into production. Things like the workflow of how the models interact and are called on for inference in your production widget will need to be fine-tuned. Once that is taken care of, you are looking at long-term model maintenance… which (like so many other aspects of machine learning!) is in and of itself a field of research and exploration. Think of things like models developing bias: you will have to closely monitor model performance to ensure it is not overcompensating in the training process due to streaks in the types of data it might be seeing. Also consider tasks such as model versioning, model re-training once enough new data has been seen, and the different model serving and inference demands your users might have if your product becomes really, really popular.
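As a taste of that monitoring work, here is a crude drift check: compare summary statistics of incoming production batches against the statistics of the training data, and flag batches that wander too far from what the model was trained on. The numbers and the two-standard-deviation threshold are purely illustrative:

```python
def summarize(values):
    """Mean and standard deviation of a batch of one numeric feature."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var ** 0.5

def drifted(train_stats, live_batch, threshold=2.0):
    """Flag the batch if its mean sits more than `threshold` training
    standard deviations away from the training mean."""
    train_mean, train_std = train_stats
    live_mean, _ = summarize(live_batch)
    return abs(live_mean - train_mean) > threshold * train_std

train_stats = summarize([10, 12, 11, 9, 13, 10, 11])  # what the model learned on
print(drifted(train_stats, [10, 11, 12]))  # False: production looks familiar
print(drifted(train_stats, [40, 42, 39]))  # True: the world has changed
```

When a check like this fires, that is your cue to investigate, and probably to kick off the re-training conversation.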
And there… you have it! Look how much it takes to get smart for a ten-minute conversation. Now imagine becoming a SME in the field in its entirety… it’s a lifelong path of learning, re-learning, more learning, and applying what you learn along the way…