Rationalists, like those at Less Wrong (think Eliezer Yudkowsky and Scott Alexander), are prone to fetishize Bayes theorem, seeing it as the key to all thought. It isn’t. Bayes is a helpful tool, and no more, and like all tools, it is not always needed. But because of the perceived importance of Bayes, people think they have discovered flaws in it. These supposed flaws are almost always based on simple mistakes, which can go decades without anybody noticing. As in the so-called problem of old evidence.
Here’s what one prominent author (Colin Howson) thinks is the “problem” of old evidence: Can a hypothesis h be confirmed by evidence e if the evidence is old and already known?
The answer will seem obvious: yes. Howson and others say no. That “no”, and the “problem”, arise when people write things like this (as in the link):
Pr(h|e) = [ Pr(e|h)Pr(h) ] / Pr(e).
That might look to you like Bayes theorem, favorite of “rationalists” everywhere, but it is not. It is missing something. The missing parts are what cause the “problem.”
Howson, and many like him, says (modifying his notation so that it’s consistent with mine): “This [existence of background knowledge] has the following unpleasant consequence, however. If e is known at the time h is proposed, then e is in [the background knowledge] and so Pr(e) = Pr(e|h) = 1, giving Pr(h|e) = Pr(h); which means that e gives no support to h.”
Before reading further, and recalling the hint about something missing, see if you can spot the flaw in this thinking.
Don’t cheat. Think.
The answer is this: There is no such thing as “Pr(h)” or “Pr(e)”. While “Pr(h|e)” and “Pr(e|h)” are fine, as such, the bare “Pr(h)” and “Pr(e)” are incomplete.
There is no such thing as unconditional probability: all probability is conditional. Every probability everywhere needs premises, conditions, assumptions, some evidence upon which to pass the judgement. That means “Pr(e)” is impossible. No such creature exists.
We can write, perhaps, Pr(h|K), which is the probability of h given some background knowledge K (the K is from Howson). We could also—and here comes the trouble—write Pr(e|K).
That’s fine as it stands, and it could be as Howson suggests that Pr(e|K) = 1. But that only happens when K includes the premise (or proposition, or assumption, or whatever you want to call it), “e has been observed.” That makes K = “‘bunch of other premises related to h’ & ‘e has been observed’.”
With that K, then indeed Pr(e|K) = 1. (Make sure you see this.)
Let’s rewrite the equation above properly, using this K (two letters put together mean logical “and”, so that “eK” means “e and K”):
Pr(h|eK) = [ Pr(e|hK)Pr(h|K) ] / Pr(e|K).
We have Pr(e|K) = 1, since K says e was observed, which obviously makes the probability of e equal to 1, given e was observed. Of course it does! Adding the h, unless that h says “e is impossible” or something like that, gives Pr(e|hK) = Pr(e|K) = 1. But since logically eK = K, then Pr(h|eK) = Pr(h|K). The math works! Both sides are Pr(h|K).
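To see the arithmetic, here is a minimal sketch with a toy joint distribution over h and e. The numbers are my own illustrative assumptions, not anything from Howson; the point is only that when K already contains “e has been observed”, conditioning on e a second time changes nothing:

```python
# Toy joint distribution over h and e; the numbers are illustrative assumptions.
joint = {("h", "e"): 0.30, ("h", "~e"): 0.10,
         ("~h", "e"): 0.20, ("~h", "~e"): 0.40}

def pr(pred):
    """Sum the probability of every (h, e) outcome satisfying pred."""
    return sum(p for outcome, p in joint.items() if pred(outcome))

# K includes "e has been observed", so every probability is conditioned on e.
pr_e = pr(lambda o: o[1] == "e")                      # 0.50
pr_e_given_K = pr(lambda o: o[1] == "e") / pr_e       # = 1, as the text says
pr_h_given_K = pr(lambda o: o == ("h", "e")) / pr_e   # 0.30 / 0.50 = 0.6

# Since eK = K logically, conditioning on e again is the identical computation:
pr_h_given_eK = pr(lambda o: o == ("h", "e")) / pr_e

print(pr_e_given_K, pr_h_given_K, pr_h_given_eK)  # 1.0 0.6 0.6
```

Both sides come out to Pr(h|K), exactly as the algebra says they must.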
And so it seems e says nothing about h. But that’s not how evidence works.
What happens with evidence in real life is this. We do indeed start with some background knowledge, or surmises, etc. about h. Call that B. B says nothing about e already having been observed. It says stuff about h. We then write:
Pr(h|eB) = [ Pr(e|hB)Pr(h|B) ] / Pr(e|B).
No change, except from K to B. Let’s look at each piece.
Pr(e|hB) is the probability that e can be observed given h is true and B (which are our assumptions). This is so even if e never is observed! Even if e remains a thought experiment. Don’t read more until you grasp this.
Since B is silent on e having been observed (and ignoring “degenerate” situations like hB = “e is impossible”), then 0 < Pr(e|hB) < 1.

Pr(h|B) is our “prior”, given by our background information. Again (and still ignoring degenerate scenarios like B = “h is impossible”), 0 < Pr(h|B) < 1.

Pr(e|B) is the probability e could be true given B, but it says nothing directly about h. We can always “expand” Pr(e|B) like this (using “total probability”):

Pr(e|B) = Pr(e|hB)Pr(h|B) + Pr(e|not-hB)Pr(not-h|B).

The first term on the right we have already handled. The second is similar, where “not-h” is the logical contrary of whatever h is¹. We could find Pr(e|not-hB), the probability e is true given h is false and B, recalling that Pr(h|B) + Pr(not-h|B) = 1 (this works for every h!).
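As a sketch of the total-probability expansion, with made-up numbers (my assumption, purely for illustration):

```python
# Illustrative assumptions: B is silent about e, so these are strictly
# between 0 and 1. The values themselves are invented for the example.
pr_h_given_B = 0.5        # the "prior" Pr(h|B)
pr_e_given_hB = 0.9       # Pr(e|hB)
pr_e_given_nothB = 0.3    # Pr(e|not-h B)

pr_noth_given_B = 1 - pr_h_given_B  # since Pr(h|B) + Pr(not-h|B) = 1

# Total probability: Pr(e|B) = Pr(e|hB)Pr(h|B) + Pr(e|not-hB)Pr(not-h|B)
pr_e_given_B = (pr_e_given_hB * pr_h_given_B
                + pr_e_given_nothB * pr_noth_given_B)

print(pr_e_given_B)  # ≈ 0.6
```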
So as long as
[ Pr(e|hB) / Pr(e|B) ] > 1,
which is to say, as long as the evidence e is more probable under hB than under B alone, then e supports or confirms h. Even if nobody in the world ever observes e! You must get this.
If [ Pr(e|hB) / Pr(e|B) ] < 1, then e disconfirms h. If [ Pr(e|hB) / Pr(e|B) ] = 1, then knowledge of e is irrelevant to h.
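A small sketch, again with invented numbers, showing that the ratio Pr(e|hB)/Pr(e|B) determines whether the posterior rises above, falls below, or equals the prior:

```python
def update(prior, lik_h, lik_noth):
    """Return (Pr(e|hB)/Pr(e|B), Pr(h|eB)) via Bayes with total probability.

    prior    = Pr(h|B), lik_h = Pr(e|hB), lik_noth = Pr(e|not-h B).
    All arguments are illustrative assumptions supplied by the caller.
    """
    pr_e = lik_h * prior + lik_noth * (1 - prior)  # Pr(e|B)
    return lik_h / pr_e, lik_h * prior / pr_e

print(update(0.5, 0.9, 0.3))  # ratio > 1: posterior ≈ 0.75 > 0.5, e confirms h
print(update(0.5, 0.3, 0.9))  # ratio < 1: posterior ≈ 0.25 < 0.5, e disconfirms h
print(update(0.5, 0.6, 0.6))  # ratio = 1: posterior = prior, e is irrelevant
```

Note that nothing in the computation asks when, or whether, e was actually observed.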
That’s it. The simple solution to the “problem”. It does not matter when e is observed, or even if it is observed. It could be ancient wisdom—like apples fall onto heads and do not soar into the air. And h is “gravity attracts”. Or it could be entirely novel.
It only matters whether e is already part of the background knowledge, as in the “problem” which uses K, or whether it is considered on its own, as with B.
There has been a lot of ink spilled on this “problem”, all of it because of bad notation. That notation became popular because it was forgotten that all probability is conditional. Change the conditions, change the probability.
¹ h is a complex proposition, usually, of the form P_1 & P_2 & … & P_q, where each P_i is some proposition; thus not-h is not-“P_1 & P_2 & … & P_q”. Only one of the P_i need be false for not-h to be true. Failure to understand this leads to much confusion about what models and theories are.
This is not the first time we have tackled this subject; the first article, however, was put in obscure terms in answer to a technical question, and the point was lost.