5 minutes of Machine Learning: Basic Definitions [Day 2]

For those who are starting here: these articles (titled “5 minutes of Machine Learning”) share my bottom-line-up-front takeaways as I work through Google’s Machine Learning Crash Course.

Now, let’s start with some definitions that you probably knew OF, but didn’t know about in a mathematical context:

Labels: labels are the targets we are trying to predict, and also what you attach to data when… well, labeling it. A label is written as y.

Labeling data with LabelImg

Features: features are the input variables that describe our data. A feature is typically written as x. When there is more than one, the subscripts (x1, x2, … xN) tell the individual features apart.

Example: Let’s take a spam detector, and everything that makes spam email really annoying. Here are some features it could use:

  • words in the email text
  • sender’s address
  • time of day the email was sent
  • email contains the phrase “one weird trick.”

A labeled example: one piece of data with its features and the accompanying label. Mathematically, it takes the form {features, label}: (x, y). Labeled examples are used to train the actual model.

On the other hand, an unlabeled example has features but no label, in the format of {features, ?}: (x, ?). Once the model is trained, unlabeled examples are the new data it makes predictions on.
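To make the (x, y) versus (x, ?) notation concrete, here is a minimal sketch of both kinds of example as one data structure. The field names and encodings (`word_count`, `sender_known`, `hour_sent`, `has_one_weird_trick`) are my own assumptions based on the spam features above, not something the course prescribes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Example:
    # Features (x): hypothetical encodings of the spam-email features.
    word_count: int
    sender_known: bool
    hour_sent: int
    has_one_weird_trick: bool
    # Label (y): True = spam, False = not spam, None = unlabeled (the "?").
    label: Optional[bool] = None

    def is_labeled(self) -> bool:
        return self.label is not None


# Labeled example (x, y): usable for training.
labeled = Example(120, False, 3, True, label=True)

# Unlabeled example (x, ?): what a trained model would predict on.
unlabeled = Example(45, True, 14, False)

print(labeled.is_labeled(), unlabeled.is_labeled())
```

The only difference between the two is whether `label` is filled in, which is exactly the difference between (x, y) and (x, ?).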

A model defines the relationship between features and label.

Model training is when you show the model labeled examples and enable the model to gradually learn the relationships between features and label.

Inference occurs when you apply the trained model to unlabeled examples to predict their labels.
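Here is a rough sketch of training versus inference, assuming the simplest model I could think of: one feature, one label, and a straight line y = w * x + b learned by gradient descent. The function names (`train`, `predict`), the learning rate, and the toy data are all assumptions made for illustration.

```python
def train(examples, learning_rate=0.05, epochs=2000):
    """Gradually adjust w and b to reduce squared error on labeled examples."""
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in examples) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def predict(model, x):
    """Inference: apply the trained model to an unlabeled example."""
    w, b = model
    return w * x + b


# Toy labeled examples (x, y) that follow y = 2x + 1.
labeled_examples = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]

model = train(labeled_examples)   # training: the model sees features AND labels
prediction = predict(model, 5.0)  # inference: the model sees features only
print(prediction)                 # should land close to 11
```

Training is the loop that sees both x and y. Inference is the single call that only gets x.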

REGRESSION vs. CLASSIFICATION

Initially… the two terms had me baffled. Classification… like object classification? Yes, Amina.

Regression and classification are two different types of models, distinguished by the kind of value they predict.

Regression [simplified] predicts continuous values and answers questions like: What is the value of farmland in Iowa? Better yet, what will it be in x years? When should I finally buy that cattle farm I have always wanted?

Staying hopeful… but will stick to the facts with my machine learning based Iowa farmland genie.
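For the farmland genie, a regression sketch could look like the one below. The numbers are made up purely for illustration and are NOT real Iowa farmland prices. The helper `fit_line` is a hypothetical name for a plain least-squares line fit.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: years since some starting point, and value per acre.
years = [0, 1, 2, 3, 4]
value_per_acre = [7000.0, 7200.0, 7400.0, 7600.0, 7800.0]

slope, intercept = fit_line(years, value_per_acre)

# The prediction is a continuous value: any number on the line is possible.
forecast = slope * 6 + intercept
print(forecast)
```

The point to notice is the output type: a regression model returns a number on a continuous scale, not a category.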

Classification [simplified] predicts discrete values. If you have read my series on building an object classification model, this is the part where you ask the model: is it a car? truck? semi? van? See! Discrete values. Classification models typically learn to distinguish between two or more discrete classes.

So what you’re telling me is… my pug, Zelda, is just a blob because her features are hard to detect… gotcha. Data from the Oxford-IIIT Pet Dataset.
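To contrast with regression, here is a small classification sketch. I am assuming a nearest-centroid classifier, which is just one simple way to do it and not what an object classification model actually uses. The vehicle measurements (length and height in meters) are hypothetical.

```python
def train_centroids(examples):
    """Average the feature vectors of each class into one centroid per class."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def classify(centroids, features):
    """Return the class whose centroid is closest to the given features."""
    return min(centroids, key=lambda label: squared_distance(centroids[label], features))


# Hypothetical labeled examples: ((length_m, height_m), class).
training = [
    ((4.5, 1.5), "car"), ((4.2, 1.4), "car"),
    ((6.0, 2.0), "truck"), ((5.8, 1.9), "truck"),
    ((16.0, 4.0), "semi"), ((17.0, 4.1), "semi"),
]

centroids = train_centroids(training)

# The prediction is one of a fixed set of discrete classes.
predicted = classify(centroids, (4.4, 1.5))
print(predicted)
```

Whatever features go in, the answer can only ever be one of the class names seen in training. That is the "discrete values" part.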

That is all for definitions today. Stay tuned for the next article on line fitting.