We’re interested in whether some proposition Y is true. One explanation, perhaps causal or deterministic, or even only probabilistic, is X. Thus, Pr(Y | X) is high or equals 1.
A second explanation, again causal, deterministic, or only probabilistic, is the joint proposition W = W1 & W2 & … & Wm. Again, Pr(Y | W) is high, maybe even higher than Pr(Y | X), or equals 1.
X is simple, in the sense that the proposition is brief and grammatically of less content than W, which is long and grammatically complex.
As a rule of thumb, Occam’s razor says to prefer X as the better explanation of Y because X is simpler than W. Why does this work? Or rather, does it always work?
Suppose Y = “The table card is an Ace of Clubs.” A fellow shows you three blank cards and one Ace of Clubs. He shuffles them around, showing them to you at various points, lays one down on a table, and asks you to agree that it is the Ace of Clubs. You are inclined to object, because the way he has been handling the cards made it appear certain that the Ace was one of those still in his hand.
Let X = “Magic”, which is the fellow’s causal explanation of Y. Then Pr (Y | X) = 1.
Now let W = “The fellow first holds the cards in such a way as to disguise their identity, pretending there are three blanks and one Ace, but where there are really three Aces and one blank, and let the delicate handling be such that the thumb of the right hand blocks the Ace mark on the bottom card while allowing you what appears to be a surreptitious peek, building your confidence you know where the so-called long Ace is, and let etc., etc., etc.”
W is very complex, takes much practice, and is most difficult to explain (a good 800 words is necessary). Nevertheless, Pr (Y | W) = 1. Since X is much simpler than W, via Occam’s razor, X is the preferred explanation.
Well, this is absurd, because W is the true explanation (when I do the trick). Occam’s razor has failed. But Occam’s razor was not meant to be more than a rule of thumb. It was never meant to be taken as an authoritative argument.
Occam’s razor starts with this premise: of all the times complex and simple explanations were put forth for a proposition, more of the simple than complex turned out to be true. This start is true upon common observation: nobody disagrees with it. Here’s the finish of Occam: X is a simpler explanation than W, therefore it’s more likely that X is true and W false. That conclusion also follows from the first premise. The conclusion would not follow if it were re-written like this: X is a simpler explanation than W, therefore X is true and W false. This is a fallacy. The “more likely” qualification is what makes Occam work.
Why does it work; that is, why is the first premise true? We want to know about Y and have X and W in hand. We all come equipped with more information than just Y, X & W. We also know Z, which itself is a very complex proposition about the way the world works.
In order to believe X, we have to have Pr(X | Z) be high or 1; again, in order to believe W, Pr(W | Z) should be high or 1. Now X is simple, so all we have is Pr(X | Z). But W is complex, and as a rough guide (it is exact when the Wi are independent of one another given Z), the following equation approximately holds:
Pr(W | Z) = Pr(W1 & W2 & … & Wm | Z) ≈ Pr(W1 | Z) × Pr(W2 | Z) × … × Pr(Wm | Z).
All those probabilities on the right-hand side are numbers equal to 1 or less, and when they’re multiplied the result is, usually, a number much less than 1. Thus, usually, Pr(X | Z) > Pr(W | Z).
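To see how quickly the product shrinks, here is a minimal Python sketch. It assumes the conjuncts are independent given Z, which is what makes the product rule exact. The numbers (ten conjuncts at 0.9 each, and 0.7 for the simple explanation) are invented purely for illustration, and the function name is my own.

```python
import math

def conjunction_probability(conjunct_probs):
    """Pr(W1 & ... & Wm | Z), ASSUMING the Wi are independent given Z."""
    return math.prod(conjunct_probs)

# Hypothetical values, not taken from any real problem.
pr_x_given_z = 0.7                  # the simple explanation X
conjuncts = [0.9] * 10              # ten fairly plausible pieces of W
pr_w_given_z = conjunction_probability(conjuncts)

print(round(pr_w_given_z, 3))       # 0.349
print(pr_x_given_z > pr_w_given_z)  # True
```

Each piece of W is individually more probable than X, yet the conjunction is less probable. That is the "usually" in action.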
Of course, in our example, and using my background knowledge Z, Pr(W1 | Z) = 1, Pr(W2 | Z) = 1, …, Pr(Wm | Z) = 1, so Pr(W | Z) = 1; also, Pr(X | Z) = epsilon (which is any number near 0 but not equal to it; I cannot prove magic wasn’t used). For you, Z is probably something like Z’ = “I know this guy is tricking me but I don’t know how, and there is no way he is using actual magic.” Again, Pr(W | Z’) = 1 (or near enough) and Pr(X | Z’) = 0 (or near enough).
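The card trick can be put in the same terms. The sketch below is only an illustration: the number of conjuncts (20) and the value standing in for epsilon are arbitrary choices of mine, and the helper `more_probable` is hypothetical. The point is that when every piece of W is certain given Z, length costs nothing.

```python
import math

EPSILON = 1e-9  # arbitrary stand-in for "near 0 but not equal to it"

def more_probable(pr_x, pr_w):
    """Return the label of the explanation with the higher probability given Z."""
    if pr_x > pr_w:
        return "X"
    if pr_w > pr_x:
        return "W"
    return "tie"

# With my background knowledge Z, every conjunct of W is certain.
pr_w_given_z = math.prod([1.0] * 20)  # m = 20 is an arbitrary choice
pr_x_given_z = EPSILON                # "Magic"

print(pr_w_given_z)                                 # 1.0
print(more_probable(pr_x_given_z, pr_w_given_z))    # W
```

The long explanation wins because multiplying by 1 any number of times still leaves 1.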
That’s it. That’s all Occam’s razor is. Nothing more than common sense boiled down.