5 Minutes of Machine Learning: Loss, Line Fitting, and Model Performance [Day 3]
Enough time has elapsed that your machine learning hangover from the last post should be over now…
But here we go. Next up: the concept of loss, what it means when you are training an ML model, and how to judge how well a model predicts. All with pretty pictures yours truly will draw.
Let’s start with Linear Algebra.
Yes, it is still useful, and if you’re a weirdo like myself, you even found linear algebra soothing (solving for x is the best busywork that makes you feel smart while also entering zen mode). Let me remind you of a familiar equation:
y = mx + b
Cool. Let’s review what each of these variables means in this equation as it relates to machine learning:
y = value you want to predict
m = slope (also known as weight, more on this in a jiffy)
x = value of the input feature
b = y-intercept
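Because machine learning is special, we actually change this equation slightly to fit our needs (haha… fit.. get it? No..?):
y’ = b + w1x1
y’ = predicted label (desired output)
b = bias / y-intercept
w1 = weight / slope of feature 1
x1 = feature (known input)
Wait. There is a subscript. The 1 just says this is feature number 1; a model with more features would simply get more terms (w2x2, w3x3, and so on).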
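If it helps to see the equation as code, here is a minimal Python sketch of y’ = b + w1x1. The function name predict and the numbers for the weight and bias are made up purely for illustration; in a real model those values come out of training.

```python
def predict(x1, w1, b):
    """Return the predicted label y' = b + w1 * x1 for one feature value."""
    return b + w1 * x1


# Made-up values, for illustration only (a trained model would learn these).
weight = 2.0  # w1: the slope of the line
bias = 1.0    # b: where the line crosses the y-axis

# Feed a few known inputs (features) through the line.
features = [0.0, 1.0, 2.5]
predictions = [predict(x, weight, bias) for x in features]
print(predictions)  # [1.0, 3.0, 6.0]
```

Change the weight and the line gets steeper or flatter; change the bias and the whole line slides up or down.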
So now that we are talking about such equations, the graph below should not look too startling:
Now, let’s add some context to it:
Cool. Let’s add our special machine learning lingo to create even more context:
Okay, so the line is your model; think of it that way. You want that line to hit, or at least get as close as possible to, as many data points (the small x’s) as it can. What if it doesn’t? What if those data points are just a little bit off?
Loss can be defined as how far off the line is at prediction: the smaller the loss, the better the line is doing. The small, thick black arrows below (Sorry, I only carry black pens…) indicate how loss is measured.
Without getting too into it… how do we mathematically define loss? Well, for a single data point it’s the squared difference between the label (where the x is actually placed on the graph) and the prediction (the line): (y - y’)^2. Does that look slightly familiar? If not, don’t worry:
The only thing you really need to understand at this point is that loss measures the distance between the x’s (the data points on the graph) and the line that represents the model’s predictions. The goal in training a model is to minimize this loss as much as possible. Think of loss as a penalty for a bad prediction.
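To make the penalty idea concrete, here is a small Python sketch that scores two candidate lines against the same data points. The data and both lines are invented for illustration, and averaging the squared differences over all points (usually called mean squared error) is my assumption about how to sum things up, not something defined above.

```python
def squared_loss(label, prediction):
    """Penalty for one data point: the squared gap between label and prediction."""
    return (label - prediction) ** 2


def mean_squared_error(xs, labels, w1, b):
    """Average squared loss of the line y' = b + w1 * x over all data points."""
    losses = [squared_loss(y, b + w1 * x) for x, y in zip(xs, labels)]
    return sum(losses) / len(losses)


# Invented data points that sit close to (but not exactly on) y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
labels = [3.5, 4.5, 7.0, 9.5]

close_line = mean_squared_error(xs, labels, w1=2.0, b=1.0)
far_line = mean_squared_error(xs, labels, w1=1.0, b=0.0)

print(close_line)  # 0.1875
print(far_line)    # 14.6875
```

The line that hugs the data points gets a tiny loss, while the line that misses them gets a much bigger penalty. Squaring also means one big miss hurts more than several small ones.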
So how can loss be minimized? Stay tuned for the next post on how to reduce loss and increase accuracy in prediction.