Channel: Philosophy – William M. Briggs

Every Result Of Unsupervised Learning Is Correct; Or, All Learning Is Supervised


The real point I wish to make is that there is no such thing as unsupervised learning; or, stated another way, Truth exists; or, stated another way, every solution to an unsupervised learning problem is conditionally correct.

To explain. In machine learning, artificial intelligence, and statistics, too, there are, it is said, two regimes: supervised and unsupervised learning. Let’s state these in terms of classification, since that’s the simplest and always discrete, a form of problem which regular readers will know I love.

In so-called supervised learning, a set of data is observed which have specified classes (more than one, of course). These classes are known: A, B, C, and so on. Male and female are a good example. Days of the week are another. And so on endlessly.

Not only are the classes known, but other measures thought probative of the classes are also measured; these additional measures are a requirement. If all, as in all, we had were labels, we would be done. Given that caveat, if we're interested in M/F, then perhaps we have "calories consumed today" and "weight bench pressed", or whatever.

A model then relates the probative measures to the classes, and in the end we form the prediction

     Pr( class = i | old data, new probative measures, M),

where M are the assumptions that led to our model. Notice we have "integrated out" any parameters of M, since they are of no interest to man nor beast. This form is so familiar that nothing more need be said about it. Except that I dislike the term "learning" applied to it. It doesn't matter whether M is a standard statistical model, a machine learning algorithm, a neural net, or whatever. It's just a model—unless we are in the rare situation where we know the cause of the class given the new probative measures, or have otherwise deduced M from known or assumed true premises.
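A minimal sketch of that prediction, assuming a particular M (independent Gaussians per class with a uniform class prior; all numbers invented for illustration):

```python
import math

# Hypothetical training data: (calories consumed today, weight bench pressed),
# with known classes "M" and "F". Every value here is made up.
data = {
    "M": [(2800, 225), (3100, 205), (2600, 185), (2900, 245)],
    "F": [(1900, 95), (2100, 115), (1800, 85), (2200, 105)],
}

def gaussian(x, mu, sigma):
    """Density of a normal distribution at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def fit(points):
    """Per-dimension mean and standard deviation: the parameters of this M."""
    n = len(points)
    stats = []
    for dim in range(2):
        vals = [p[dim] for p in points]
        mu = sum(vals) / n
        sigma = math.sqrt(sum((v - mu) ** 2 for v in vals) / n)
        stats.append((mu, sigma))
    return stats

def predict(new_point, data):
    """Pr(class = i | old data, new probative measures, M)."""
    scores = {}
    for cls, points in data.items():
        stats = fit(points)
        # Independent-Gaussian likelihood times a uniform class prior.
        like = 1.0
        for x, (mu, sigma) in zip(new_point, stats):
            like *= gaussian(x, mu, sigma)
        scores[cls] = like
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}

probs = predict((2700, 200), data)
```

The output is a probability for each class, correct conditional on M; swap in a different M and the (equally conditionally correct) probabilities change.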

What about unsupervised learning? All we have are some measures and no classes. For instance, somebody supplies us with a spreadsheet having just “calories consumed today” and “weight bench pressed”. Obviously—and most importantly—these are labels we assign, that we supply meaning to. This must not be forgotten. To the computer which reads these numbers, they are just numbers; and they’re not even numbers, just electrical impulses.

Now we might think to ourselves, “Say, I wonder if these two measures belong directly or indirectly, together or perhaps separately, to different classes? All I have are the measures and no indication of class. How many classes might there be? If there are two or three or whatever, what are their characteristics? It might be that there are k classes, and that it is true that these two measures vary by some set way inside each class. Indeed, if the measures do not vary in some set way between two or more classes, I will not be able to tell these classes apart.”

And so we come to an algorithm that identifies classes. There are many such algorithms. K-means clustering is one of the most popular, ably and succinctly described here (I’ll assume everybody reads this, or already knows about these algorithms).
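For those who'd rather see the machinery than read about it, here is a bare-bones sketch of Lloyd's k-means algorithm; the points and the starting centers are invented, and note that both k and the starting centers must be handed to it:

```python
import math

def mean(points):
    """Component-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def kmeans(points, k, centers):
    """Lloyd's algorithm: alternately assign each point to its nearest
    center, then move each center to the mean of its cluster, until
    the centers stop moving."""
    while True:
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[idx].append(p)
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            return centers, clusters
        centers = new_centers

# Made-up measures with two visually obvious groups, plus a hand-picked
# pair of starting centers. The algorithm is told how many clusters to
# find, and it dutifully finds that many.
points = [(1, 1), (1.5, 2), (1, 1.8), (8, 8), (8.5, 9), (9, 8)]
centers, clusters = kmeans(points, k=2, centers=[(0.0, 0.0), (10.0, 10.0)])
```

Ask it for k = 2 and it returns two clusters; ask for k = 5 and it returns five. It always does its job.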

We start with the assumption this algorithm will find the clusters, or classes, that are there. And we end with that assumption, too!

Put it another way. We started by assuming the algorithm will do its job. We run it and it does its job. Therefore, the algorithm has done its job. Assuming no mechanical or electrical failures in the computations, the algorithm has done what it promised to do. How could it not?

The clusters/classes the algorithm finds are correct—conditional on assuming the algorithm. It’s exactly the same situation as supervised learning. The probability above spits out correct probabilities conditional on the model. Unsupervised learning spits out correct classifications conditional on the model/algorithm.

So why, then, do people look at the output of unsupervised learning algorithms, and (sometimes) say, “The number of clusters/classes is too large. The algorithm is over-fitting,” or “I think these classes are right in number, but they don’t look right”? Why do they express any dissatisfaction, since the algorithm always does what it was designed to do?

Because, as should now be obvious, these folks are using higher-order criteria to judge the algorithms. Meaning the “learning” was supervised after all, but that the supervision wasn’t done inside the computer. Which proves the boundaries of the computing machine/algorithm are artificial, if you like. Or, since part of the algorithm is sans mathematics, not all aspects of the algorithm are quantified or quantifiable, which is also not surprising to long-time readers.

It turns out the algorithms are always set up in such a way that there was prior knowledge of the targets; i.e., supervision. (And given many of these algorithms are used in computer vision applications, the pun is apt.) The picture which heads this post (taken from the clustering/k-means link above) proves the point. Supervision is always there. And it’s always there because we’re always looking toward what we either know or suspect is true.

Technical Notes: All these clustering algorithms are based on two notions: the number of clusters/classes and the concept of variation within and between classes. Something has to guess how many, using some prior criterion, and something has to say what it means to vary and how that variation is measured, also pre-specified. Changing these two notions gives a different algorithm.
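To see that the pre-specified notion of variation matters, consider this toy example (centers and point invented): the very same point lands in different clusters depending on which distance the algorithm was told to use.

```python
import math

# Two pre-specified cluster centers and one new point. All values invented.
centers = [(2, 2), (3, 0)]
p = (0, 0)

def euclidean(a, b):
    """Straight-line distance."""
    return math.dist(a, b)

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def nearest(point, metric):
    """Index of the center closest to point under the given metric."""
    return min(range(len(centers)), key=lambda i: metric(point, centers[i]))

# Euclidean puts p with centers[0] (distance 2.83 vs 3.00);
# Manhattan puts p with centers[1] (distance 4 vs 3).
```

Change the notion of variation, change the classes. Both answers are correct, conditional on their algorithm.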

There is no such thing as random, and so pieces of algorithms that are said to do this or that “randomly” are always deterministic after all, but with an eye closed to the determinism.
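The "random" initialization step in these algorithms illustrates the point: it is a deterministic function of a seed, as this sketch (invented data) shows.

```python
import random

def pick_initial_centers(points, k, seed):
    """'Random' starting centers are a deterministic function of the
    seed: the same seed always yields the same centers."""
    rng = random.Random(seed)
    return rng.sample(points, k)

points = [(i, i % 3) for i in range(10)]
a = pick_initial_centers(points, 2, seed=42)
b = pick_initial_centers(points, 2, seed=42)
# a == b, every time: the randomness was determinism with an eye closed.
```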

Of course, since the “learning” (the high falutin’ term for estimating parameters in a model) is always supervised, and the problems to which these models are put are important (automatic classification, say), finding better algorithms is just the right thing to do. Obviously, the better we are at knowing the cause or measuring the determinants of classes, the better these algorithms will be. So our ultimate goal, just as in statistical modeling, is always the same: understanding cause.

