Had a named person in statistics (Andrew Althouse) ask me about randomization, which he likes, and which I do not. “I want to compare outcomes for a specific patient group getting surgery A vs surgery B (assume clinical equipoise). If I’m not going to randomize, how should I allocate the patients in my study so I am confident that afterward I can make inference on the difference in outcomes?”
Excellent question. My response, though it was unsatisfying to the gentleman, was “I’d have independent experts allocate patients, ensuring balance of (what are thought to be) secondary causes, where the panel’s decisions are hidden from trial surgeons. Try to inject as much control as possible, while minimizing cheating etc.”
Too terse to be believed, perhaps. I expand the answer here.
Control in any experiment is what counts, not randomization. For one, there is no such thing as “randomization” in any mystical sense as required by frequentist theory. Probability does not exist. Randomness does not exist. This is proved elsewhere.
What we can do is to create some sort of device or artifice that removes control of allocating patients from a man and gives it to a machine. The machine then controls, by some mechanism, which patients get surgery A and which B.
A man could do it, too. But men are often interested in the outcome; therefore, the temptation to cheat, to shade, to manipulate, to cut corners, is often too strong to be resisted. I’ve said it a million times, and I say it again now: every scientist believes in confirmation bias; he just believes it happens to the other guy.
There is also the placebo effect to consider in medical trials. If a patient knows for sure he is getting a sham or older treatment, it affects him differently than if he were ignorant. The surgeons must know, of course, which surgeries they are performing; thus it is impossible to remove the potential for fooling oneself here. The surgeons doing the sham or older surgery (which we can imagine is A) might slack off; when switching to B they might cut with vigor and renewed enthusiasm.
Now suppose some sort of “randomization” (i.e. allocation-control) device spit out A and B, 100 of each (Althouse later gave this number). It could be that all 100 As were female and all 100 Bs male. It doesn’t matter that this is unlikely: it could happen. Imagine if it did. Would you be satisfied in analyzing the result?
No, because we all believe—it is a tacit premise of our coming model—that sex is important in analyzing results. Why? Because sex, or the various systems biologically related to sex, tend to cause different outcomes, which include, we suppose, the surgical outcomes of interest here. We would be foolish not to control for sex.
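To make the worry concrete, here is a minimal simulation of my own (the 100-per-arm count is Althouse’s; the fifty-fifty sex split in the pool is my assumption):

```python
# A sketch, not from any trial: 200 patients, half female, and a machine
# allocates 100 of them to surgery A. How badly can sex end up split?
import random

random.seed(1)  # illustrative only

patients = ["F"] * 100 + ["M"] * 100
imbalances = []
for _ in range(10_000):
    random.shuffle(patients)
    females_in_a = patients[:100].count("F")   # the machine's arm A
    imbalances.append(abs(females_in_a - 50))  # 50 = perfect balance

print("average |imbalance| of females in A:",
      sum(imbalances) / len(imbalances))
print("worst imbalance in 10,000 allocations:", max(imbalances))
```

The all-female arm never turns up in a mere ten thousand runs, but in this sketch a split off by five or more patients happens in roughly a fifth of them. The device guarantees nothing; it only makes gross imbalance improbable.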
Which is exactly why many trials “randomize” within sex by removing the control from the device and giving it back to some man, to ensure a good balance of males and females in the groups. This makes eminent sense: control is everything.
I don’t know what the surgery is, but it has to be something. Suppose it’s some kind of vascular surgery applied near or to the heart. We know there are lots of other causes, such as CHF (congestive heart failure), that might also play a causal role in the outcomes we’re tracking. If we’re sure of this, we would also “block” on CHF. That is, we would again remove control of the allocation device and give it to a man.
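In code, such blocked allocation might look like the following sketch (mine, with made-up patients; a real protocol would of course use the actual records):

```python
# A minimal sketch: allocate within strata defined by the causes we
# insist on balancing, here sex and CHF, so each stratum contributes
# (near) equally to surgery A and surgery B.
import random
from collections import defaultdict

random.seed(1)

# Hypothetical patient records: (id, sex, has_chf).
patients = [(i, random.choice("FM"), random.choice([True, False]))
            for i in range(200)]

strata = defaultdict(list)
for p in patients:
    strata[(p[1], p[2])].append(p)   # group by (sex, CHF)

arm_a, arm_b = [], []
for stratum in strata.values():
    random.shuffle(stratum)          # order within a stratum is arbitrary
    half = len(stratum) // 2
    arm_a.extend(stratum[:half])
    arm_b.extend(stratum[half:])     # an odd patient out lands in B

# By construction, each known cause is split as evenly as parity allows.
for key, group in sorted(strata.items()):
    in_a = sum(1 for p in group if p in arm_a)
    print(key, "-> A:", in_a, " B:", len(group) - in_a)
```

Each stratum is split down the middle, so sex and CHF cannot end up lopsided no matter what the within-stratum ordering does.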
And so on for the other causes. We might not have the funds or time to explicitly control for all of these, in this physical allocation sense. But we might later include these in any model of uncertainty of the outcome. This is also called “controlling”, although there is no control about it. We’re just looking at things as they stood: we had no control over these other measures. (I wish we’d drop the misleading terminology. See this award-eligible book for a longer discussion of this.)
Enter Don Rumsfeld’s unknown unknowns. There may be many other causes, secondary or more removed (mitigators and so on), of the outcome of which we are ignorant. This must be so, or science would be at its end. How many such things are there in our surgery? We don’t know. They are unknown unknowns. There could be one, there could be ten thousand. The human body is a complicated organism: there are feedbacks upon feedbacks.
How will the machine allocator split these possible causes between the groups? We have no idea. It could be that the machine, as we imagined for sex, puts all or most of a dastardly cause in A and all or most of a beneficent cause in B. And this could go back and forth, and forth and back across all the other causes.
There is nothing we can do about this. They are, after all, unknown unknowns. But the mechanical allocator can’t somehow magically fix the situation such that an equal number of all causes are distributed in the groups. You don’t know what you’ll get. Worse, the same ignorance holds for the causes we know about but don’t explicitly control for. “Randomization” is the experimental procedure of tossing darts and hoping for the best.
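If you like, simulate the dart-tossing yourself. Below is a hypothetical sketch of my own: each hidden cause afflicts roughly half the patients at random, and the more hidden causes there are, the more lopsided the worst of them lands in the arms:

```python
# A hypothetical sketch: 200 patients, 100 allocated to A by machine.
# For each hidden cause, compare how many of its carriers land in A
# against an even split. All numbers are made up for illustration.
import random

random.seed(1)

n_patients, n_arm = 200, 100
for n_causes in (1, 10, 100, 1000):
    causes = [[random.random() < 0.5 for _ in range(n_patients)]
              for _ in range(n_causes)]
    order = list(range(n_patients))
    random.shuffle(order)            # the machine's allocation
    arm_a = set(order[:n_arm])
    imbalance = [abs(sum(c[i] for i in arm_a) - sum(c) / 2)
                 for c in causes]
    print(f"{n_causes:4d} hidden causes -> worst split off by "
          f"{max(imbalance):.1f} patients")
```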
Notice closely, though, that the desire for uniform distribution of causes is sound. It is often thought “randomization” gives this. It cannot, as we have seen. But if it is so important—and it is—why not then control explicitly for the causes we know? Why leave it to “chance”? (That’s a joke, son.)
Consider that this is precisely how physics experiments are done. Especially in sensitive experiments, like tracking heat, extreme care is taken to remove or control all possible known causes of heat. Except, of course, for the cause the physicist is manipulating. He wants to be able to say that “When I pulled this lever by so much, the heat changed this much, because of the lever”. If he is wrong about removing other causes, it might not be the lever doing the work. This is what got Fleischmann and Pons into such deep kimchi.
Return to my panel of independent experts. They know the surgeries and the goals of these surgeries. They are aware, as can be, of the secondary and other causes. They do their best to allocate patients to the two groups so that the desired balance of the known causes is achieved.
Perfection cannot be had. Panel members can be bought; or, more likely, they won’t be as independent as we’d like. Who on the panel wouldn’t, deep in his heart, like the new treatment to work? I’ll tell you who: the rival of the man who proposed the treatment. The panel might control sub-optimally. Besides all that, there is always the possibility of unknown unknowns. Yet this panel still has a good chance to supply the control we so rightly desire.
Randomization isn’t needed; it does nothing and can cause harm. Blinding is often crucial, and control is paramount.
Bonus: Althouse also asked this (ellipsis original): “Your ‘expert panel’ has assigned 100 patients to receive A and 100 patients to receive B. 14 of the patients that received A died, 9 of the patients that received B died. Your statistical analysis is…what, exactly?”
He wasn’t satisfied (again) with my “Predictive analysis complete with verification.” Too terse once more. As regular readers know, if we cannot deduce a model from accepted-by-all premises (as we sometimes but rarely can), we have to apply looser premises, which often lead to ad hoc models. These are the most common kind of model in use.
I don’t know what ad hoc model I’d use in this instance; it would depend on knowing all the details of the trial. There are many choices of model, as all know.
“That’s a cop out. Which model is best here?”
Glad you asked, friend. We find that out by doing a predictive analysis (I pointed to this long paper for details on how this works) followed by a verification analysis—a form of analysis which is almost non-existent in the medical literature.
I can sum up the process briefly, though: make a model, make predictions, test the predictions against reality.
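For flavor, here is what one such predictive sketch might look like on Althouse’s numbers, using a simple beta-binomial model, which is only one ad hoc choice among the many just mentioned (the flat priors and the next-100-patients question are my assumptions):

```python
# A sketch of the predictive step on 14/100 deaths under A and
# 9/100 under B. Flat Beta(1, 1) priors give Beta posteriors for
# each arm's death risk; then we ask a question about FUTURE patients.
import numpy as np

rng = np.random.default_rng(1)  # seed so the sketch is reproducible

deaths_a, n_a = 14, 100
deaths_b, n_b = 9, 100

draws = 100_000
p_a = rng.beta(1 + deaths_a, 1 + n_a - deaths_a, size=draws)
p_b = rng.beta(1 + deaths_b, 1 + n_b - deaths_b, size=draws)

# Predictive question: in the next 100 patients per arm, what is the
# chance surgery A sees more deaths than surgery B?
future_a = rng.binomial(100, p_a)
future_b = rng.binomial(100, p_b)
print("Pr(more deaths under A than B, next 100 each):",
      (future_a > future_b).mean())
```

The verification step comes when the next patients are actually observed: check whether the outcomes fall where the predictive distribution said they would, and score the model on how well it did.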
Makes sense, yes?