Update: I often do a poor job setting the scene. Today we have the solution to an age-old problem (get it? get it?), a “problem” thought to be a reason not to adopt (certain aspects of) Bayesian theory or logical probability. I sometimes think solutions are easier to accept if they are at least as difficult as the supposed problems.
I was asked by Bill Raynor to comment on Deborah Mayo’s article “The Conversion of Subjective Bayesian, Colin Howson, & the problem of old evidence”.
Howson is Howson of Howson & Urbach, an influential book that showed the errors of frequentism, but then introduced a few new ones due to subjectivity. We’ve talked time and again about the impossibility that probability is subjective (where probability depends on how many scoops of ice cream the scientist had before taking measurements), but we’ve never yet tackled the so-called problem of old evidence. There isn’t one.
Though there is no problem of evidence, old or new, there are plenty of problems with misleading notation. All of this is in Uncertainty.
The biggest error, found everywhere in probability, is to write down only part of the evidence one has for a proposition, and then to let that information “float”, so that one falls prey to equivocation.
Mayo:
Consider Jay Kadane, a well-known subjective Bayesian statistician. According to Kadane, the probability statement: Pr(d(X) >= 1.96) = .025
“is a statement about d(X) before it is observed. After it is observed, the event {d(X) >= 1.96} either happened or did not happen and hence has probability either one or zero” (2011, p. 439).
Knowing d0= 1.96, (the specific value of the test statistic d(X)), Kadane is saying, there’s no more uncertainty about it.* But would he really give it probability 1? If the probability of the data x is 1, Glymour argues, then Pr(x|H) also is 1, but then Pr(H|x) = Pr(H)Pr(x|H)/Pr(x) = Pr(H), so there is no boost in probability for a hypothesis or model arrived at after x. So does that mean known data doesn’t supply evidence for H? (Known data are sometimes said to violate temporal novelty: data are temporally novel only if the hypothesis or claim of interest came first.) If it’s got probability 1, this seems to be blocked. That’s the old evidence problem. Subjective Bayesianism is faced with the old evidence problem if known evidence has probability 1, or so the argument goes.
Regular readers (or those who have understood Uncertainty) will see the problem. For those who have not yet read that fine, award-eligible book, here is the explanation.
To write “Pr(d(X) > 1.96)” is to make a mistake. The proposition “d(X) > 1.96” has no probability. Nothing has a probability. Just as all logical arguments require premises, so do all probabilities. Here the premises are missing; later they are supplied in different ways, and equivocation occurs. In this case, deadly equivocation.
We need a right hand side. We might write
(1) Pr(d(X) > 1.96 | H),
where H is some compound, complex proposition that supplies information about the observable d(X), and what the (here anyway) ad hoc probability model for d(X) is. If this model allows quantification, we can calculate a value for (1). Unless that model insists “d(X) > 1.96” is impossible or certain, the probability will be non-extreme (i.e. not 0 or 1).
Suppose we actually observe some d(X_o) (o-for-observed). We can calculate
(2) Pr(d(X) > d(X_o) | H)
and unless d(X_o) is impossible or certain, we’ll again calculate some non-extreme number. (2) is almost identical to (1), but with d(X_o), a possibly different number than 1.96. The following equation is not the same:
(3) Pr(1.96 >= 1.96 | H),
which indeed has a probability of 1.
Of course! “I observed what I observed” is a tautology in which knowledge of H is irrelevant. The problem comes in deciding where to put the actual observation: on the right hand side or the left.
Take the standard evidence of a coin flip, C = “Two-sided object which when flipped must show one of h or t”; then Pr(h | C) = 1/2. One would not say that because one just observed a tail on an actual flip, suddenly, Pr(h | C) = 0. Pr(h | C) = 1/2 because that 1/2 is deduced from C about h. (h is the proposition “An h will be observed”.)
Pr(I saw an h | I saw an h & C) = 1, and Pr(A new h | I saw an h & C) = 1/2. It is not different from 1/2 because C says nothing about how to add evidence of new flips.
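To see the deduction in miniature, here is a sketch in Python. The little counting function is my illustration, nothing contained in C itself, but it shows how the 1/2 falls out of the premise and from nothing else.

```python
# A minimal sketch: Pr(h | C) deduced by counting the possibilities
# the premise C admits. C names two outcomes and gives no reason to
# favor either, so the probability is 1/2.
outcomes = ["h", "t"]  # everything C says: one of h or t will show

def pr(proposition, possibilities):
    """Fraction of the admitted possibilities in which the
    proposition is true (the statistical syllogism)."""
    return sum(proposition(p) for p in possibilities) / len(possibilities)

print(pr(lambda o: o == "h", outcomes))  # 0.5
# Observing a tail changes nothing: C contains no rule for folding
# old flips into new predictions, so Pr(h | C) remains 1/2.
```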
Suppose for ease d() is “multiply by 1” and H says X follows a standard normal (ad hoc is ad hoc, so why not?). Then
(4) Pr(X > 1.96 | H) = 0.025.
If an X of (say) 0.37 is observed, then what does (4) equal? The same. But this is not (4):
(5) Pr(0.37 > 1.96 | H) = 0,
and the 0 comes not from the probability model of X, but from the assumptions H includes, as it always does: tacit and implicit knowledge of math and grammar, which is enough to see that “0.37 > 1.96” is false.
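If you want to check the arithmetic, here is a sketch assuming, as H here does, that X follows a standard normal; scipy’s survival function gives the tail area.

```python
# (4): Pr(X > 1.96 | H), deduced from the model H supplies.
from scipy.stats import norm

print(norm.sf(1.96))  # 0.0249978..., i.e. about 0.025

# (5): "0.37 > 1.96" is settled by arithmetic alone, which H tacitly
# includes; the probability model never enters. The statement is
# false, hence the 0.
print(0.37 > 1.96)  # False
```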
Or we might try this:
(6) Pr(X > 1.96 | I saw an old X = 0.37 & H) = 0.025.
The answer is also the same, because H, like C, says nothing about how to take an old X and modify the model of X.
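Made explicit in code (a sketch, with names of my own choosing): the old observation can sit on the right hand side all day, but since H supplies no rule connecting old X’s to new ones, the computation never touches it.

```python
# (6): the old X appears as a premise but goes deliberately unused,
# because H says nothing about how old observations modify the model.
from scipy.stats import norm

def pr_exceeds(threshold, old_x=None):
    # old_x is ignored: H contains no updating rule.
    return norm.sf(threshold)

print(pr_exceeds(1.96))              # 0.025
print(pr_exceeds(1.96, old_x=0.37))  # still 0.025
```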
Now there are problems in this equation, too:
(7) Pr(H|x) = Pr(H)Pr(x|H)/Pr(x) = Pr(H).
There is no such thing as “Pr(x)”, nor does “Pr(H)” exist, and we have already seen it is false that “Pr(x|H) = 1”.
Remember: Nothing has a probability! Probability does not exist. Probability, like logic, is a measure of a proposition of interest with respect to premises. If there are no premises, there is no logic and no probability.
Better notation is:
(8) Pr(H|xME) = Pr(x|HME)Pr(H|ME)/Pr(x|ME),
where M is a proposition specifying information about the ad hoc parameterized probability model, H is usually a proposition saying something about one or more of the parameters of M, but it could also be a statement about the observable itself, and x is a proposition about some observable number. And E is a compound proposition that includes assumptions about all the obvious things.
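Here is a toy instance of (8) with every premise written down. The particulars are mine and not anything above: M is a Bernoulli model with parameter theta restricted to a discrete grid, E includes a uniform “prior” over that grid, H is “theta > 0.5”, and x is “7 successes in 10 trials”.

```python
# A toy instance of (8): every probability conditioned on stated premises.
from scipy.stats import binom

thetas = [i / 20 for i in range(21)]     # the grid: part of M and E
prior = [1 / len(thetas)] * len(thetas)  # Pr(theta | ME): uniform, from E

# Pr(x | ME): the likelihood averaged over the prior (total probability).
px = sum(binom.pmf(7, 10, t) * p for t, p in zip(thetas, prior))

# Pr(H | ME) and Pr(x | HME): restrict to the grid points where H holds.
h = [t > 0.5 for t in thetas]
ph = sum(p for p, keep in zip(prior, h) if keep)
px_h = sum(binom.pmf(7, 10, t) * p
           for t, p, keep in zip(thetas, prior, h) if keep) / ph

# (8): Pr(H | xME) = Pr(x | HME) Pr(H | ME) / Pr(x | ME)
print(px_h * ph / px)  # ~0.85: non-extreme, as it must be
print(px_h)            # ~0.15: Pr(x | HME) is nowhere near 1
```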
There is no sense in which Pr(x|HME) or Pr(x|ME) equals 1 (unless we can deduce that via H or ME), before or after any observation. To say so is to swap in an incorrect probability formulation, as in (5) above.
There is therefore no old evidence problem. There are many self-created problems, though, due to incorrect bookkeeping and faulty notation.