Seventy-some research groups were given identical data and asked to investigate an identical question. The groups did not communicate. Details are in the paper “Observing Many Researchers Using the Same Data and Hypothesis Reveals a Hidden Universe of Uncertainty”, by some enormous number of authors.
As is the wont of sociologists, each group created several models, about 15 on average. There were 1,253 different models from the seventy-some groups. Each was examined after the fact, and it was discovered that no two models were the same.
The question was this: Whether “more immigration will reduce public support for government provision of social policies.”
The answers are yes, no, or can’t tell. Only one group said they could not investigate the question. All other groups went to town.
The answer was standardized across the models, and called the “Average Marginal Effect” (AME). Clever idea. Here are all the quantified answers, plotted from smallest to largest AME. The 95% “confidence interval” of each model’s AME is also shown.
[Figure: the AME of each of the 1,253 models, plotted from smallest to largest, with 95% confidence intervals.]
About half the models were somewhere in the middle, about a quarter said the effect was negative, and about a sixth said positive.
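If you want to see how a picture like that gets built, here is a minimal sketch in Python. The numbers are simulated stand-ins, not the paper’s data; the spreads and interval widths are invented for illustration.

```python
# A minimal sketch of the kind of plot described above: each model's
# estimate (simulated here, NOT the paper's data), sorted smallest to
# largest, with its 95% interval drawn as a vertical bar.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
n_models = 1253                                  # same count as in the paper
ame = rng.normal(0.0, 0.05, n_models)            # pretend Average Marginal Effects
half_width = rng.uniform(0.02, 0.10, n_models)   # pretend CI half-widths

order = np.argsort(ame)                          # sort models by their estimate
ame, half_width = ame[order], half_width[order]
x = np.arange(n_models)

plt.figure(figsize=(10, 4))
plt.vlines(x, ame - half_width, ame + half_width, color="lightgray", lw=0.5)
plt.plot(x, ame, ".", ms=2, color="black")
plt.axhline(0, color="red", lw=1)                # the "no effect" line
plt.xlabel("Model (sorted by AME)")
plt.ylabel("AME with 95% interval")
plt.tight_layout()
plt.show()
```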
There were many, many, many wee p-values. There were wee p-values galore! Each “confirming” the researchers had the right answer, and that all the other researchers were wrong. Further, those CIs were nice and tight, “proving”, just like p-values, each model was right on the money.
Now, I don’t know about you, but when I saw this, I laughed and laughed and laughed and then laughed some more. I am laughing now. Later, I will laugh again.
There are many warnings about models that we have examined over the years, you and I, dear readers. Two that should have stuck by now are these:
1. All models only say what they are told to say.
2. Science models are nothing but a list of premises, tacit and explicit, describing the uncertainty of some observable.
The first warning is easy to see, and it goes some way in removing the mysticism of “computer” models (that a model was computed still impresses many civilians). Every one of those 1,253 models was a computer model.
The second warning I can’t make stick. Let me try again. By premises I mean all the propositions, or assumptions, observational or otherwise, that speak of the observable. This also includes all premises that can be deduced from those premises.
If notation helps, here is every model ever:
Pr(Y | P_1, P_2, …, P_q),
where the P_i are some enormous long list of propositions (synonymous with premises). One might be P_j = “I observed x_j = (2,5,2,3,4…)”, i.e. some measure thought by the modeler to modify the probability of the observable Y in the presence of all the other premises. Both phrases, “thought by the modeler” and “in the presence of all the other premises”, are crucial. (The probabilities of Y can be extreme, i.e. 0 or 1, as in many physics models; e.g. a gravity model F = GmM/r^2.)
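To make the notation concrete, here is a toy sketch, mine and not the paper’s, showing that with identical data the probability of Y shifts the moment the premise list changes. The variables x1 and x2 and the cutoffs are invented.

```python
# A toy illustration (invented, not from the paper) that Pr(Y | premises)
# changes when the premise list changes, even though the data is identical.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
x1 = rng.integers(0, 2, n)              # a measured "x" one modeler conditions on
x2 = rng.integers(0, 2, n)              # another "x" a second modeler also includes
y = (rng.random(n) < 0.2 + 0.3 * x1 + 0.2 * x2).astype(int)   # the observable Y

pr_a = y[x1 == 1].mean()                # Model A's premises: condition on x1 only
pr_b = y[(x1 == 1) & (x2 == 1)].mean()  # Model B's premises: condition on x1 and x2

print(f"Pr(Y | x1 = 1)         ~ {pr_a:.2f}")   # one model's answer
print(f"Pr(Y | x1 = 1, x2 = 1) ~ {pr_b:.2f}")   # a different answer, same data
```

Same data, different premise list, different probability of Y. That is the whole point of the notation.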
Too many think that writing down, say, the statistical math is the model, the whole of it. That math, such as in a regression, is only a small, a very small, piece of any model. Writing down only equations because that is easy (and it is) leads to the Deadly Sin of Reification, which afflicts not only sociologists, but every scientist, from physicists on down.
The temptation to say the math part of the model is Reality (or “close enough” to it) is overpowering. The temptation is almost never resisted. I guarantee you it wasn’t resisted by the researchers in the paper above.
That’s where that “thought by the modeler” comes into play. He might toss in an “x” to see how it does, because that’s what modelers are trained to do. But that carries with it tacit premises about the strength of the relationship between that premise and Y, and all the other premises in the model at that time. Since that is difficult or impossible to quantify, it doesn’t show up in the math. And the premise is lost from view. It’s still there, though.
Also tacit are the humongous number of premises that accompany data collection (where, who, how, when, etc.). How many unmeasured things about a measured “x” affect the model of Y? Many. But because we can’t quantify these, we forget they are there.
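To get a feel for how fast these choices pile up, here is a back-of-the-envelope sketch. The covariate names, estimators, and data slices below are invented for illustration, not taken from the paper.

```python
# A rough sketch (invented choices, not the paper's) of how quickly
# ordinary modeling decisions multiply into distinct specifications.
from itertools import chain, combinations

candidate_xs = ["immig_flow", "immig_stock", "gdp", "unemployment",
                "welfare_regime", "age", "education"]   # hypothetical covariates

def all_subsets(items):
    """Every subset of covariates a modeler might 'toss in' to see how it does."""
    return chain.from_iterable(combinations(items, k)
                               for k in range(len(items) + 1))

estimators = ["OLS", "logit", "multilevel"]   # choice of machinery
slices = ["pooled", "latest-wave-only"]       # choice of data slice

n_specs = sum(1 for _ in all_subsets(candidate_xs)) * len(estimators) * len(slices)
print(n_specs)   # 2**7 * 3 * 2 = 768 specifications from a handful of casual choices
```

Seven hundred-odd distinct models from a handful of casual choices, and none of those where-who-how-when premises of data collection has even entered yet.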
Incidentally, the reason for the prejudice toward math is because often researchers believe there is a “true model”. Of course, there will be true causes of Y, the truest model of all. But researchers weaken “true model” to mean “true probability model”. And there isn’t one.
There’s always a deducible (though perhaps not quantitative) locally true model given the modeler’s premises. But that does not mean the model is universally true, as causes are. (More on that here.)
I know I’ve lost a great many of you. Suffice it to say the model is more, and much more, than the math that is written down or coded. That picture above, which is the best-case scenario, where the data was identical for all, proves it.
It’s worth wondering whether, if these 1,253 models were converted to their predictive form, we’d still have this hilarious result.
We would, but not to the same extent. As a wild guess, we’d likely drop to maybe 8-10% of models insisting with high probability that the answer was no, and others that it was yes. Predictive methods cannot escape those hidden and tacit model premises.
Meaning that if you tried to replicate, or verify, the prediction, you’d surely get some of the tacit premises, those that existed only in the modelers’ heads, wrong. Your predictions would be weakened.
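What might “converting to predictive form and verifying” look like? Here is a minimal sketch with simulated data and two invented specifications; it is one way of doing out-of-sample verification, not the paper’s method.

```python
# A minimal sketch (assumptions mine) of predictive verification: fit two
# specifications on one chunk of data, predict the held-out chunk, and
# score the predictions against what actually happened.
import numpy as np

rng = np.random.default_rng(7)
n = 2_000
x1, x2 = rng.normal(size=(2, n))
y = 1.0 + 0.5 * x1 + 0.3 * x2 + rng.normal(0, 1, n)   # simulated outcome

train, test = slice(0, 1_000), slice(1_000, None)      # fit on one half, verify on the other

def fit_predict(cols):
    """Least-squares fit on the training chunk; predictions for the test chunk."""
    X = np.column_stack([np.ones(n)] + cols)
    beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    return X[test] @ beta

pred_a = fit_predict([x1])        # specification A: x1 only
pred_b = fit_predict([x1, x2])    # specification B: x1 and x2

for name, pred in [("A", pred_a), ("B", pred_b)]:
    mse = np.mean((y[test] - pred) ** 2)
    print(f"Spec {name}: out-of-sample MSE = {mse:.3f}")
```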
The gist? The authors say researchers “should exercise humility and strive to better account for the uncertainty in their work.”
Humility is not known in science these days.
Therefore, my conclusion is: Stop trusting models so damned much.