If you work in tech, or are even thinking about it, you’ve probably come across the term “machine learning”. A quick search on Indeed.com shows that at the time of this writing, there are over 300 active job postings containing the term “Machine Learning” in Vancouver alone, and the estimated median salary for these jobs is over $110,000 per year. Google Trends shows that the popularity of the search term “Machine Learning” has grown by about 400% in the last three years. Clearly, there is high demand for people with knowledge of machine learning, but what exactly is it? In this post, I’ll describe what is meant by the term “machine learning” and explain why it has become so important to modern businesses.

## A conceptual definition of Machine Learning

There is a famous, often-quoted definition by the computer scientist Tom M. Mitchell, which I will adapt here. We say that a computer program is *learning* how to perform some task if it gets better at performing the task as it accumulates *experience*. Computer programs that learn in this way fall under the umbrella of machine learning.

It might help to understand this definition by considering programs that do *not* satisfy it. An important problem in some areas of math and computer science is finding the prime factors of a number. If you input the number 1081 into a program that solves this problem, it returns the numbers 23 and 47, because 1081 = 23 × 47. It’s not too hard to figure out how to write a simple program that solves this problem: an obvious one simply tries dividing 1081 by all the numbers smaller than 1081. This program accomplishes the task, but it does not *learn*: it does not get better at factoring numbers with experience. You can run the program on a million inputs and it will never factor the number 1081 any faster or better than it did the first time you ran it.
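Here is a minimal sketch of that kind of program in Python. The function name `prime_factors` is my own choice for illustration, and I divide out each factor as it is found so that only primes end up in the result.

```python
def prime_factors(n):
    """Return the prime factors of n (with repetition) by trial division."""
    factors = []
    divisor = 2
    while n > 1:
        if n % divisor == 0:
            # divisor divides n: record it and keep factoring what is left
            factors.append(divisor)
            n //= divisor
        else:
            # otherwise move on to the next candidate
            divisor += 1
    return factors


print(prime_factors(1081))  # [23, 47]
```

Nothing in this function remembers earlier calls, so the millionth run does exactly the same work as the first.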

Now let’s think about a different kind of task: recognizing handwriting. Suppose we wish to write a program that takes as its input an image of a handwritten digit and gives as its output a digit from 0 to 9. Coming up with a simple solution here is not as straightforward as it was for the integer factoring problem above, but we might be able to come up with some ideas. We might write a program that says something like:

- If the image is an oval, return 0.
- If the image is a vertical line, return 1.
- If the image is two circles on top of each other, return 8.
- …

Maybe it’s obvious why this approach is doomed to fail: different people write differently. It is not remotely feasible to capture every acceptable variation in even a single digit with these kinds of rules, let alone all 10 digits. And even if you could, this type of solution would not count as machine learning: the rules don’t change or adapt as experience grows.
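To make the rule-based idea concrete, here is a rough sketch. It assumes away the hard part: I pretend that some earlier step has already turned the image into a short text description of its shape, which is a hypothetical simplification of mine. The names `RULES` and `classify_by_rules` are likewise made up for illustration.

```python
# Hand-written rules, keyed on a hypothetical description of the shape.
RULES = {
    "oval": 0,
    "vertical line": 1,
    "two stacked circles": 8,
}


def classify_by_rules(shape_description):
    """Return the digit for a known shape, or None if no rule matches."""
    return RULES.get(shape_description)


print(classify_by_rules("oval"))                            # 0
print(classify_by_rules("vertical line with a small flag"))  # None
```

A slightly unusual 1 falls straight through the rules, and it will keep falling through no matter how many times the program sees it.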

The machine learning approach attacks the problem in a completely different way. Rather than trying to impose rules from the start, a machine learning program seeks to *discover* the rules by looking at examples. In a machine learning solution, instead of trying to come up with rules, we try to come up with data. We gather as many pre-labelled images of digits as we can into what’s called a *training set*, which is used to literally train the computer program. We take all the images that we have of 1’s, show them to the computer, and tell it that they’re 1’s. And then we do the same with the images of 2’s, and so on. For each digit, the computer tries to figure out on its own what that digit’s images have in common.

I’m hand-waving over the details a little bit, but you can see how this approach would tend to improve with experience, in accordance with our working definition of machine learning. Some people cross their 7’s. If the set of images that I start with does not contain any crossed 7’s, my resulting program might not be able to recognize that a crossed 7 should be labelled as a 7. But as I increase the number of examples that it has to look at, eventually the set will include some crossed 7’s, and the program will learn that 7’s are sometimes crossed. The same goes for any other common variation that might occur.
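The crossed-7 story can be sketched with a toy example. To keep it short I use one particular technique, a nearest-neighbour classifier, which labels a new image with the label of the most similar training image; real handwriting systems are far more sophisticated. The tiny 3×5 "images" and all the names below are invented for illustration.

```python
def to_pixels(rows):
    """Flatten rows such as "#.#" into a list of 0/1 pixel values."""
    return [1 if ch == "#" else 0 for row in rows for ch in row]


def distance(a, b):
    """Count the pixels at which two flattened images differ."""
    return sum(pa != pb for pa, pb in zip(a, b))


def predict(training_set, image):
    """Return the label of the training image most similar to `image`."""
    pixels = to_pixels(image)
    nearest_image, nearest_label = min(
        training_set,
        key=lambda example: distance(to_pixels(example[0]), pixels),
    )
    return nearest_label


# Toy 3x5 bitmaps: '#' is ink, '.' is blank.
ONE = (".#.", ".#.", ".#.", ".#.", ".#.")
THREE = ("###", "..#", "###", "..#", "###")
PLAIN_SEVEN = ("###", "..#", "..#", "..#", "..#")
CROSSED_SEVEN = ("###", "..#", "###", ".#.", ".#.")

small_training_set = [(ONE, 1), (THREE, 3), (PLAIN_SEVEN, 7)]
larger_training_set = small_training_set + [(CROSSED_SEVEN, 7)]

# A new crossed 7, written a little differently from the training one.
new_image = ("###", "..#", "###", ".#.", "#..")

print(predict(small_training_set, new_image))   # 3 (no crossed 7 seen yet)
print(predict(larger_training_set, new_image))  # 7
```

The code is identical in both calls. Only the training set grew, and that extra experience is what fixes the mistake.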

It turns out that cleverly designed machine learning programs can become incredibly good at this kind of task. A common introductory machine learning project is to perform exactly this task on a well-known dataset of images called the MNIST database. Very simple machine learning algorithms can learn to classify these images correctly with better than 90% accuracy, and researchers have used more advanced machine learning tools to achieve better than 99.7% accuracy.

## When does Machine Learning work?

Why is the handwriting recognition problem well-suited to a machine learning solution while the integer factoring problem is not? There are a few key differences.

One is in the complexity of the rules governing the relationship between input and output. The integer factoring problem is very difficult in a certain technical sense, but the relationship between the input and output of the factoring problem is very straightforward: if the numbers output by the program are prime and multiply together to give the input, then you’ve got the right answer. The rules that link images of handwriting to the digits that they represent are much more complex and fuzzy and difficult to capture.
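That check for the factoring problem fits in a few lines of Python, which is a good sign of how simple the rule is. This is only a sketch, and the function names `is_prime` and `is_correct_factorization` are my own.

```python
from math import isqrt, prod


def is_prime(n):
    """Return True if n is a prime number."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, isqrt(n) + 1))


def is_correct_factorization(n, factors):
    """Check that every factor is prime and that they multiply to n."""
    return all(is_prime(f) for f in factors) and prod(factors) == n


print(is_correct_factorization(1081, [23, 47]))  # True
print(is_correct_factorization(1081, [1, 1081]))  # False: 1 is not prime
```

There is no equally short function that tells you whether a label for a handwritten digit is right.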

A related difference is that in the factoring problem, we are looking for an exact solution, whereas in the handwriting recognition problem, we are satisfied with a very good approximate solution. In fact, an exact solution to the handwriting recognition problem would not be feasible even in theory. Some 3’s look like 5’s and some 4’s look like 9’s, and the only way to tell for sure what the correct label is would be to ask the person who wrote down the digit in the first place. All we can reasonably expect out of a solution to the handwriting recognition problem is that it is right *most* of the time.

Finally, it seems that handwriting recognition is inherently a *statistical* or *probabilistic* task. As humans, we don’t actually ever know with certainty whether we’re looking at a 9 or a 4. We think that a digit is *probably* a 9 because it looks more like 9’s we’ve seen in the past than 4’s we’ve seen in the past. Most of the time we have a lot of certainty about our guess, but we are still making a guess. We shouldn’t expect the computer to be able to do any better than that either.
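One simple way to express "probably a 9" in code is to look at the labels of the past examples that most resemble the new digit and report each label's share of the vote. This is just one possible sketch; the list of labels below is made up, and `label_probabilities` is a name I chose for illustration.

```python
from collections import Counter


def label_probabilities(similar_labels):
    """Turn the labels of the most similar past examples into vote shares."""
    counts = Counter(similar_labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Hypothetical labels of the five past digits that look most like the new one.
similar_labels = [9, 9, 4, 9, 9]

probabilities = label_probabilities(similar_labels)
best_guess = max(probabilities, key=probabilities.get)

print(probabilities)  # {9: 0.8, 4: 0.2}
print(best_guess)     # 9
```

The program still commits to an answer, but it also reports how much room for doubt there is.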

## Why Machine Learning works for business problems

Machine learning is well suited to problems that share the characteristics of the handwriting recognition problem: problems that are highly complex, problems where approximate solutions will suffice, and problems that are inherently statistical or probabilistic. Businesses are increasingly discovering that many of their problems have these traits. For example, consider the problem of flagging fraudulent credit card transactions.

- **Complexity:** The rules that identify fraudulent credit card transactions are complex and ever changing.
- **Approximations suffice:** We are *flagging* transactions for further review, so it is alright if the program is wrong sometimes.
- **Solutions are probabilistic:** We are never certain that a transaction is fraudulent until we verify by contacting the customer.
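In code, flagging might look something like the sketch below. I assume that a trained model has already assigned each transaction an estimated probability of fraud. The transaction IDs, the scores, and the 0.5 threshold are all invented for illustration.

```python
def flag_for_review(scored_transactions, threshold=0.5):
    """Return the IDs of transactions whose fraud estimate meets the threshold."""
    return [
        transaction_id
        for transaction_id, fraud_probability in scored_transactions
        if fraud_probability >= threshold
    ]


# Hypothetical (transaction ID, estimated fraud probability) pairs.
scored = [("tx-001", 0.02), ("tx-002", 0.91), ("tx-003", 0.55)]

print(flag_for_review(scored))                 # ['tx-002', 'tx-003']
print(flag_for_review(scored, threshold=0.9))  # ['tx-002']
```

Raising the threshold sends fewer transactions to a human reviewer, and lowering it catches more fraud at the cost of more false alarms. Either way, an occasional wrong flag is tolerable because a person makes the final call.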

And what do we need to implement a machine learning solution to a business problem like this? Data, a commodity that modern businesses have in abundant supply. For these reasons, businesses are discovering that the tools of machine learning fit quite naturally with their activities and objectives, which is why we are seeing such a dramatic rise in the application of machine learning tools and technologies in the business world.

Machine learning as a field has matured quite a bit since researchers started considering these ideas in about the 1960s, and nowadays there are basic tools and ideas that are understood to be fundamental to machine learning. Beyond just an abstract conceptual definition, there is a language that anyone getting started in machine learning should understand. In my next post on machine learning, I will take a closer look at some of these tools and ideas and go over the language of modern machine learning in a little bit more detail.
