
How Not To Think Like A Bayesian Rationalist


Lisping Rationalists

When I read people like Eliezer Yudkowsky and Scott Alexander it becomes clear to me how the French could have built a Temple of Reason during The Terror. Constructed, you recall, to hold practicing members of the Culte de la Raison.

Alexander started a recent article with what I thought, and what any sane person would think, was a rhetorical question: “Does it matter if COVID was a lab leak?”

Why, yes, Mr Alexander. It matters a great deal. For if hubris-filled Dr Frankensteins are unbinding Prometheus in gain-of-lethality experiments and wreaking global death, we ought to know about it.

Which is what I thought he’d say. But he said this instead:

A good Bayesian should start out believing there’s some medium chance of a lab leak pandemic per decade. Then, if COVID was/wasn’t a lab leak, they should make the appropriate small update based on one extra data point. It probably won’t change very much!

I did fake Bayesian math with some plausible numbers, and found that if I started out believing there was a 20% per decade chance of a lab leak pandemic, then if COVID was proven to be a lab leak, I should update to 27.5%, and if COVID was proven not to be a lab leak, I should stay around 19-20%.

But if you would freak out and ban dangerous virology research at a 27.5%-per-decade chance of it causing a pandemic per decade, you should probably still freak out at a 19-20%-per-decade chance. So it doesn’t matter very much whether COVID was a lab leak or not.

I don’t entirely accept this argument – I think whether or not it was a lab leak matters in order to convince stupid people, who don’t know how to use probabilities and don’t believe anything can go wrong until it’s gone wrong before. But in a world without stupid people, no, it wouldn’t matter. Or it would matter only a tiny amount. You’d start with some prior about how likely lab leaks were – maybe 20% of pandemics – and then make the appropriate tiny update for having one extra data point.

This is gibberish, but gibberish couched in technical scientific language, which results in a form of scientism.

As if one data point is not enough! The residents of Nagasaki circa 1945 can tell you all about the sufficiency of a mere one additional data point.

Rationalists love Bayes’s Theorem. They have much to say about how everything is, or should be, a matter of BT, and how, if it were, a paradise in thinking would arrive. Here’s how the Rationalists at Less Wrong define themselves:

The rationalist movement, rationality community, rationalsphere or rationalistsphere represents a set of modes of bayesian thinking from self-described rationalists or ‘aspiring rationalists’ typically associated with the Less Wrong diaspora and their associated communities.

Diaspora. Good grief, how well they think of themselves. Never mind. Let’s see how they treat Bayes:

Bayesians conceive rationality as a technical codeword used by cognitive scientists to mean “rational”. Bayesian probability theory is the math of epistemic rationality, Bayesian decision theory is the math of instrumental rationality. Right up there with cognitive bias as an absolutely fundamental concept on Less Wrong.

Let’s see why this is more wrong. There’s going to be some math below, but I don’t want your eyes to glaze over. I made it as easy as possible. Plow through it to get to the main points. The point is that math is not the point.

We’ll do the math first and return to the lab leak at the bottom.

Bayes’s Theorem

Bayes’s Theorem is simplicity itself. That is, the dry, textbook version of it is. Which is near useless in most real-life examples.

You begin with some proposition H, called a hypothesis, and conjure some probative information about it, so that you can form your prior. Most rationalists write this incorrectly as “Pr(H)”, which implies the hypothesis just “has” a probability. Nothing has a probability. Write “Pr(H|E)” instead, where E is the long string of complex probative information you assume about H. Then Pr(H|E) is the probability H is true given E. An entirely objective calculation based on your subjective assumptions. This is your uncertainty in H assuming E.

Next, “data” is collected or assumed, even just one point. Call it “D”. Then, through some mathematical manipulation, all perfectly true, you get this:

Pr(H|DE) = Pr(D|HE) x Pr(H|E) / Pr(D|E).

Some things to note about this equation. Pr(H|E) we know. “Pr(D|HE)” is called the data model (some say likelihood), and it depends on the assumptions you bring just as much as Pr(H|E) does—with the additional assumption that H is true. The denominator is the data model not making any assumptions about H.

There are nice technicalities about how to work this out. Yet even if you don’t follow any of this, if I told you “Pr(H|E) = 0.2”, and “Pr(D|HE) = 0.7”, and that “Pr(D|E) = 0.18”, then anybody (well, some bodies) can calculate “Pr(H|DE) = 0.78”.

That “Pr(H|DE)” is called the posterior, the probability of H after seeing or assuming D, and assuming E. It has changed from Pr(H|E) = 0.2; it has increased, because, in this fictional example, D has a better chance if H is true than if H is not.
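If you like, here is the same arithmetic as a few lines of Python, a minimal sketch using the fictional numbers above. Note that Pr(D|E) = 0.18 is only consistent with the other two numbers if we also take Pr(D|not-H & E) = 0.05; that back-calculated value is mine, not stated above.

```python
# A minimal sketch of the textbook arithmetic, using the fictional numbers above.
# The implied Pr(D | not-H & E) = 0.05 is back-calculated, not given in the text.

pr_H_given_E = 0.2       # prior: Pr(H|E)
pr_D_given_HE = 0.7      # data model: Pr(D|HE)
pr_D_given_notHE = 0.05  # assumed, so that Pr(D|E) comes out to 0.18

# Total probability: Pr(D|E) = Pr(D|HE)Pr(H|E) + Pr(D|not-H & E)Pr(not-H|E)
pr_D_given_E = pr_D_given_HE * pr_H_given_E + pr_D_given_notHE * (1 - pr_H_given_E)

# Bayes's Theorem: Pr(H|DE) = Pr(D|HE) x Pr(H|E) / Pr(D|E)
pr_H_given_DE = pr_D_given_HE * pr_H_given_E / pr_D_given_E

print(round(pr_D_given_E, 2))   # 0.18
print(round(pr_H_given_DE, 2))  # 0.78
```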

This is the kind of calculation Alexander did, using H = “Lab leak”, and, like me, making up the other numbers. It’s Stats 101 stuff. The math is right. And useless.
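Alexander didn’t publish his “plausible numbers”, so here, as a hedged sketch only, is one set of made-up inputs that lands near his 27.5%. The likelihood ratio is my invention, chosen for no reason other than that it reproduces his figure.

```python
# A guess at the kind of "fake Bayesian math" Alexander describes.
# He did not publish his numbers; the likelihood ratio below is assumed,
# picked only because it lands near his 27.5%.

prior = 0.20             # "20% per decade chance of a lab leak pandemic"
likelihood_ratio = 1.52  # assumed: Pr(evidence | leak) / Pr(evidence | no leak)

prior_odds = prior / (1 - prior)
posterior_odds = likelihood_ratio * prior_odds
posterior = posterior_odds / (1 + posterior_odds)

print(round(posterior, 3))  # ~0.275
```

Pick a different ratio and you get a different posterior. The machinery never has to say what D actually is, which is the trouble.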

Well, of almost no use.

This is not how people think of uncertainty, nor should it be in general. It is not wrong. It is not even less wrong. It’s that this math is answering the wrong question.

Here’s how people really think about uncertainty, and should.

If you had asked anybody (or most bodies, none of whom were employed in gain-of-lethality research) in the fall of 2019, “What’s the chance of a lab leak of a deadly Expert-manufactured virus?”, they’d probably have said, “How would I know?” Which is the right answer. The rational answer. And the point at which Bayesian rationalists first go wrong, because here they are powerless: they can’t yet use BT! Pause and ponder this, because this will turn out to be the final answer.

Now most bodies in 2019 hadn’t any idea what the words “lab leak” even meant, let alone “deadly Expert-manufactured virus”. These are vague, fuzzy terms. They do not lend themselves to making quantitative judgments about uncertainty. Besides that, what assumptions about lab leaks were most bodies supposed to make, even presuming they knew what the words meant?

On the other hand, it was not impossible to form some kind of guess. You could have sat down with a body and tried to explain what all the terms meant, yet since we hadn’t (then) seen the coronadoom, none of us (not involved in the research) would have had a precise idea of what everything meant. Still, if pressed, some bodies might have been able to go as far as saying things like this:

Pr(Some kind of lab leak thingee | Some genetics and lab practices I just heard about) = “small, I guess.”

If you further pressed this body what “small” meant, you could have squeezed a number out of him, but these squeezings, I mean the assumptions that went into them, all go on the right hand side of the equation (to the right of the “|”).

All right, let’s have our “one” data point enter the picture. Just what is it?

Alexander couldn’t be bothered to specify his, instead just assigning it a number. But our “one” data point is not so simple as that. And it’s not the same for everybody, unless they all knew the exact same things about the claimed leak.

Here’s the kicker: the definitions of “lab leak” and the nature of the assumed evidence, and not just the new data point, changed, and changed radically, from late 2019 to today.

It’s a little known secret that you do not need Bayes’s Theorem at all. Not to quantify uncertainty. Sure, it makes things easier in many cases, and the math is dandy fine. But it’s not necessary. Here’s a mathematical example for the exceptionally curious.

I’ll try and explain it here without the math.

Real Uncertainty

What we always want, BT or no, are answers to questions like this:

Pr(H|E) = ?

Where H is our hypothesis (any proposition), and E our assumptions, data, beliefs, models, everything. Inside E are also the definitions of what H even means. E is all the evidence we have in favor of, or against, H. Everything we’re willing, or able, to consider, at any rate. E is a complex proposition. So can H be.

Unless it’s textbook homework, like above where everything is specified or obvious, no two people have the same E for a complex H. Which is why it’s rare for people to agree on the uncertainty of things. (I have two small papers on this recently, which I’ll highlight here at a later date.)

Here’s an illustration of what I mean. I don’t mean anybody ever writes things down like this, even professional researchers. I mean this is how people think, and that if researchers wanted to better grasp how people think they ought to write things like this.

We start with equations like this, which we can call our “prior”, if you like (take your time with this; the ‘&’ are logical ands.):

Pr(H|E) = Pr(H_1 & H_2 & H_3 & … & H_n(h) | E_h(1) & E_h(2) & E_h(3) & … & E_d(1) & E_d(2) & E_d(3) & … & E_1 & E_2 & E_3 & … & F_1 & F_2 & F_3 & …)

This is, of course, only a cartoon, but it is in the right form of how people think about uncertainty.

Each of those sub-propositions goes into making up H, D, and E. Some of them, the ones with the subscripts ‘h’ and ‘d’, provide the definitions of the words used to form H and the data model D. Definitions are always part of your assumptions. Even the words in E need their own definitions! The ‘F’ are also part of E, but I want to separate them out for a reason I’ll tell you below. After all, “coronavirus” has to have a certain meaning, as does “leak”. We can’t get anywhere without having word, and grammar, definitions in mind. I cannot stress this strongly enough.

The remaining E are all those propositions we’re considering as probative of H (they explain, or explain away, H), even if the explanations are mere correlations. As you can see, even the hypothesis is a compound proposition (think of precisely what “lab leak” can mean).

The thing is, even before “D” arrives, the E shift and morph, especially when H is complex. But, ignoring that and assuming all evidence is fixed before we see D, we can at least give an impression of Pr(H|E), though maybe not a quantification.

Now “D” arrives. D looks like this:

D = D_1 & D_2 & D_3 & …

So what we now want is

Pr(H'|DE') = Pr(H_1 & … & H_n(h) & H'_2 & H'_4 & … | D_1 & D_2 & D_3 & … & E_h(1) & E'_h(2) & E'_h(3) & E'_h(4) & … & E_d(1) & E_d(2) & E'_d(3) & … & E_1 & E_3 & … & G_1 & G_2 & G_3 & …)

What happened to the F? They were expunged and replaced by G! Because when we see the hideously complex D, we change our minds about some of the other evidence we first considered. Out it goes! And, perhaps, here comes some new evidence to replace it. The hypothesis has also changed, with some parts removed and other parts added (the primes, marked with single quotation marks), moves which necessitate changes in E.
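Here is a toy sketch of that shifting, only another cartoon like the equations above and not the formalism of the papers; the propositions are stand-ins, but the moral is the one just stated: after D arrives, both sides of the equation name different propositions than they did before.

```python
# A toy illustration (a cartoon, not a formal method): represent H and E as
# sets of sub-propositions and watch both sides change when D arrives.

H_before = {"some kind of lab leak thingee"}
E_before = {"2019 definitions of 'leak' and 'virus'",
            "F: bat-soup video",
            "vague talk of lab accidents"}

D = {"D_1: furin cleavage site arguments", "D_2: what the labs were up to"}

# Seeing D expunges some old evidence (F) and brings in new (G),
# and the hypothesis itself gets redefined (the primes).
H_after = {"escape of an engineered virus from a particular lab"}
E_after = (E_before - {"F: bat-soup video"}) | {"G: knowledge of gain-of-lethality work"} | D

# Pr(H_before | D & E_before) and Pr(H_after | D & E_after) are probabilities
# of different propositions; Bayes's Theorem links a prior to a posterior
# only when H and E stay fixed throughout.
print(H_before != H_after, E_after != E_before | D)  # True True
```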

I show just how all this works in the papers I mentioned, which are really only ideas borrowed from ET Jaynes (who, like us, was a logical and not subjective probabilist). All you need take away here is that both sides of the equation change, and not just because of the addition of new data D.

There is no single fixed textbook set of hypothesis, data, model, prior, and therefore posterior in these kinds of real-life questions. If there were, as in trivial and textbook examples, then everybody would always agree, or come closer to agreeing, when new data comes in.

That is not what happens in real life. People disagree, and often move farther apart when new data arises. And though many people are indeed irrational, and make any of a seemingly infinite number of thinking mistakes, it is not irrationality that drives this conclusion. In fact—sneak peek!—it is Bayes’s Theorem itself!
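For the curious, here is a sketch of that sneak peek. Two bodies see the identical D but carry different E, and therefore different data models; the numbers are invented for illustration, and it is BT alone that drives them apart.

```python
# Two bodies, same D, different E (hence different data models).
# All numbers invented for illustration.

def posterior(prior, pr_D_given_H, pr_D_given_notH):
    """Pr(H|DE) by Bayes's Theorem, with Pr(D|E) from total probability."""
    numerator = pr_D_given_H * prior
    return numerator / (numerator + pr_D_given_notH * (1 - prior))

# Both start at Pr(H|E) = 0.5 but read the same report differently,
# because their E (trust, background knowledge, definitions) differ.
body_A = posterior(0.5, pr_D_given_H=0.1, pr_D_given_notH=0.9)  # ~0.10
body_B = posterior(0.5, pr_D_given_H=0.8, pr_D_given_notH=0.4)  # ~0.67

print(round(body_A, 2), round(body_B, 2))  # they move in opposite directions
```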

The idea that rationality comes only with Bayes’s Theorem is false.

Lab Leak

I want to convey that the whole idea of Alexander’s “one extra data point”, and his gross hand-waving with BT, is ridiculous, but I don’t want to rehash the entire saga of how the lab leak hypothesis was refined.

It should be obvious enough that the term “lab leak” itself morphed and changed from the days of that video of a Chinese woman eating bat soup to today’s discussion of furin cleavage sites and the like. Both the “H” and the “E” have undergone radical surgery.

The point of all these changes is to get to the causes of “lab leak”, one way or the other, so that we know the full truth. Not that this is always possible. It’s not that you couldn’t use BT for every micro-step along the very long path of this argument—but if you did, H had to remain rigorously fixed, which of course it did not. That judgment follows directly from the math. It is “Pr(H|DE)” and not “Pr(H’|DE)”.

The evidence for a “lab leak” is long and complex. All this data can be considered as “one” data point, treating D as yet another complex proposition. But nobody serious should do this. Treating D as “one extra data point” leads to Alexander’s hand-waving, absurd conclusion. Because he doesn’t treat that “one extra data point” with any seriousness; he doesn’t consider rationally all the points of D; it’s only a tool that he can assign a ridiculous number to in his slavish use of BT.

We want to know if hubris-filled Dr Frankensteins are unbinding Prometheus in gain-of-lethality experiments and wreaking global death. All the evidence we’ve seen so far indicates they did so once. Will they do so again? Bayes Theorem that.

Subscribe or donate to support this site and its wholly independent host using credit card click here. Or use the paid subscription at Substack. Cash App: $WilliamMBriggs. For Zelle, use my email: matt@wmbriggs.com, and please include yours so I know who to thank.

