October 17, 2021


Ten errors in randomized experiments

A recent review discusses errors in the implementation, analysis, and reporting of randomization within obesity and nutrition research


Anyone who’s read my stuff with any regularity is acutely aware of my disdain for the way many observational studies are conducted and interpreted in health and nutrition research, as well as my admiration for randomized controlled trials (RCTs). Randomization, a method by which study participants are assigned to treatment groups based on chance alone, is a critical component in distinguishing cause from effect. Randomization helps prevent investigators from introducing systematic (and often hidden) biases between experimental groups.
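
To make the idea concrete, here is a minimal sketch (my own illustration, not from the review) of what assignment "based on chance alone" looks like: each participant's group is determined by an independent random draw, so nothing about the participant can influence which group they land in.

```python
# A minimal sketch of simple randomization (illustrative only):
# each participant's group comes from an independent random draw.
import random

random.seed(42)  # fixed seed so the allocation list is reproducible

participants = [f"participant_{i:02d}" for i in range(1, 21)]
allocation = {p: random.choice(["treatment", "control"]) for p in participants}

for participant, group in allocation.items():
    print(participant, "->", group)
```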

But there are also many ways in which randomized experiments can fall short. Recently, David Allison and his colleagues published an excellent review discussing ten errors in the implementation, analysis, and reporting of randomized experiments, along with best practices for avoiding them. David, who trained as a psychologist, is the dean of the School of Public Health at Indiana University, where he conducts research on obesity. He is also one of the best statisticians in the world and will soon be joining me as a guest on the podcast. I’ve provided a brief summary of his review below, but to anyone interested in improving their ability to read and understand research, I suggest reading the original text in its entirety. Here, I focus on the general point that while RCTs may be considered the gold standard for establishing reliable knowledge, they are also prone to error and bias.

§

A) Errors in implementing group allocation

1 | Representing nonrandom allocation methods as random

Occasionally, in studies styled as “randomized,” participants are allocated to treatment groups using methods that are not, in fact, random.

The review authors provide the example of a vitamin D supplementation trial in which the control group came from a nonrandomized cohort from another hospital.
Lack of appropriate randomization can introduce selection bias: the selection of subjects into a study that is not representative of the target population.

A 2017 analysis by John Carlisle suggested that nonrandom allocation may be a concern in many studies labeled as “randomized.” One of the trials flagged was the well-known PREDIMED trial, in which participants at high cardiovascular risk were randomly assigned to a Mediterranean diet supplemented with mixed nuts, a Mediterranean diet supplemented with olive oil, or a low-fat diet. In some cases, whole households were collectively assigned to the same diet. Even more problematic, one of the sites in the trial assigned entire clinics to the same diet. The investigators did not initially report this, and they analyzed their data at the level of individual participants rather than at the level of household or clinic. After these problems surfaced in a post-publication audit, the PREDIMED investigators retracted the original paper and republished a reanalysis, which changed some of the findings.
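
Carlisle's approach, roughly speaking, exploits the fact that genuine randomization makes baseline comparisons behave predictably. The sketch below is my own loose illustration of that logic, not his exact method: under correct randomization, p-values from baseline tests should be approximately uniform, so a glut of extreme p-values across many variables or trials is a red flag.

```python
# A loose illustration (not Carlisle's exact method): under genuine
# randomization, p-values from baseline comparisons between arms are
# approximately uniform on [0, 1]. Distributions far from uniform can
# flag allocation that was not actually random.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def baseline_pvalues(n_tests=1000, n_per_arm=50):
    """Simulate baseline t-tests from correctly randomized trials."""
    pvals = []
    for _ in range(n_tests):
        arm_a = rng.normal(0, 1, n_per_arm)  # baseline variable, arm A
        arm_b = rng.normal(0, 1, n_per_arm)  # same population, arm B
        pvals.append(stats.ttest_ind(arm_a, arm_b).pvalue)
    return np.array(pvals)

# Under proper randomization, this test should NOT reject uniformity.
print(stats.kstest(baseline_pvalues(), "uniform"))
```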

2 | Failing to adequately conceal allocation

Allocation concealment hides the sorting of trial participants into treatment groups, preventing researchers from knowing the allocation of the next participant and preventing participants from knowing their assignment ahead of time. Allocation concealment is different from blinding: concealment ensures that the treatment to be allocated is not known before the participant is entered into the study, while blinding ensures that the participant or investigator (or both, in the case of double-blinding) remains unaware of the treatment allocation after the participant is enrolled. Studies with poor allocation concealment are prone to selection bias.

Poor allocation concealment from participants can lead to bias when, for example, certain study participants prefer one possible treatment over another. Those participants may drop out of the study if they become aware that they will not receive their preferred treatment, potentially skewing the group populations.

Poor allocation concealment from investigators can also lead to bias. Researchers may — consciously or unconsciously — place participants expected to have the best outcomes in the treatment group and those expected to have poorer outcomes in the control group.
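
As an illustration of what concealment can look like operationally (a hypothetical workflow, not one prescribed by the review), the randomization list below is generated up front, ideally by someone independent of recruitment, and enrollment staff can only ever see the next assignment after a participant has been enrolled.

```python
# A hypothetical sketch of a concealment-friendly workflow: the full
# randomization list is pre-generated and hidden; recruiters receive
# one assignment at a time, only after a participant is enrolled.
import random

def permuted_block_sequence(n_blocks):
    """Build a 1:1 randomization list in permuted blocks of four."""
    sequence = []
    for _ in range(n_blocks):
        block = ["treatment", "treatment", "control", "control"]
        random.shuffle(block)
        sequence.extend(block)
    return sequence

class ConcealedAllocator:
    """Reveals assignments one at a time; the list itself stays hidden."""
    def __init__(self, sequence):
        self._remaining = iter(sequence)

    def enroll(self, participant_id):
        return participant_id, next(self._remaining)

random.seed(7)
allocator = ConcealedAllocator(permuted_block_sequence(n_blocks=5))
print(allocator.enroll("participant_001"))  # all a recruiter ever sees
```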

3 | Not accounting for changes in allocation ratios

When designing an RCT, one step in the process is determining the ratio of subjects assigned to each group. It’s not always 1:1, that is, one subject assigned to treatment for every subject assigned to placebo. Sometimes it’s necessary from a statistical standpoint to assign twice (2:1) or three times (3:1) as many individuals to the treatment group as to the placebo group. Further, investigators may choose to change the ratio in the middle of a study for various reasons. However, changing the allocation ratio partway through a study requires corresponding changes to the statistical analyses, which don’t always happen.

Dr. Allison gives the example of a study investigating body weight changes associated with daily intake of sucrose or one of four low-calorie sweeteners. Participants were initially randomly allocated evenly among the five treatment groups (1:1:1:1:1). Because one group had a high attrition rate, the investigators changed to a 2:1:1:1:1 ratio halfway through the study, but they did not account for these different study phases in their statistical analyses.
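
Here is a hedged sketch of the bookkeeping that avoids this error (group labels and numbers are hypothetical, not taken from that study): when the ratio changes, each participant is tagged with the allocation phase under which they were randomized, so the analysis can stratify or adjust for phase instead of pooling the two regimes as if they were one.

```python
# A hypothetical sketch of allocation under a mid-study ratio change.
# The key detail is recording the phase, so the analysis can stratify
# or adjust for it rather than pooling the two regimes naively.
import random

random.seed(1)
groups = ["sucrose", "sweetener_1", "sweetener_2", "sweetener_3", "sweetener_4"]

def allocate(n, weights, phase):
    """Randomize n participants with the given allocation probabilities."""
    return [{"group": random.choices(groups, weights=weights)[0], "phase": phase}
            for _ in range(n)]

cohort = (
    allocate(100, weights=[1, 1, 1, 1, 1], phase=1)    # first half, 1:1:1:1:1
    + allocate(100, weights=[2, 1, 1, 1, 1], phase=2)  # second half, 2:1:1:1:1
)

# Any outcome model should now include `phase` as a stratum or covariate.
print(sum(1 for p in cohort if p["group"] == "sucrose" and p["phase"] == 2))
```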

4 | Replacements are not randomly selected

In virtually all RCTs, some participants will inevitably drop out. One way that investigators try to mitigate this problem is by using intention-to-treat (ITT) analysis, which we discussed in more depth in this article on the efficacy vs. effectiveness of a time-restricted eating trial. In an ITT analysis, every participant who is assigned to a treatment group must be included in the outcome analyses, regardless of whether they followed the protocol or dropped out of the study.

In some cases, investigators replace dropouts with more participants to ensure the study remains adequately powered. These replacements must be randomized to avoid another form of Error #3: changing allocation ratios. (For more information on statistical power, which represents the probability that a study will correctly identify a genuine effect, read Part V of our Studying Studies series.)
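
For intuition, power can be estimated by brute force: simulate many hypothetical trials in which a real effect exists and count how often the trial detects it. The sketch below (all numbers illustrative) also shows why attrition matters, since losing participants shrinks the sample and, with it, the power.

```python
# An illustrative power simulation: the fraction of repeated hypothetical
# trials in which a true effect reaches p < 0.05. All numbers are made up.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulated_power(n_per_arm, effect_size, n_sims=2000, alpha=0.05):
    hits = 0
    for _ in range(n_sims):
        control = rng.normal(0.0, 1.0, n_per_arm)
        treated = rng.normal(effect_size, 1.0, n_per_arm)  # a real effect
        if stats.ttest_ind(treated, control).pvalue < alpha:
            hits += 1
    return hits / n_sims

print(simulated_power(n_per_arm=50, effect_size=0.5))  # planned sample
print(simulated_power(n_per_arm=35, effect_size=0.5))  # after dropouts
```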

B) Errors in the analysis of randomized experiments

5 | Failing to account for non-independence

Sometimes groups of subjects are randomly assigned to a treatment together, but are analyzed as if they were randomized individually. For instance, an entire classroom might be randomized to one group while a separate classroom is assigned to another. These types of studies are referred to as cluster RCTs and are subject to error when they are powered and analyzed at the individual level instead of the group level. The PREDIMED study exemplifies this error, as groups of individuals within certain households or clinics were assigned to a treatment together, but the authors did not initially adjust their statistical analysis to account for clustering.
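
A standard way to quantify this is the design effect, DEFF = 1 + (m − 1) × ICC, where m is the cluster size and ICC is the intraclass correlation (how similar outcomes are within a cluster). The sketch below, with illustrative numbers, shows how even modest within-cluster correlation shrinks the effective sample size, which is why powering and analyzing at the individual level overstates the information in the data.

```python
# The standard design-effect formula for equal-sized clusters:
# DEFF = 1 + (m - 1) * ICC. Numbers below are illustrative only.
def design_effect(cluster_size, icc):
    return 1 + (cluster_size - 1) * icc

n_individuals = 400   # e.g., 20 clinics x 20 patients each
cluster_size = 20
icc = 0.05            # modest within-clinic similarity in outcomes

deff = design_effect(cluster_size, icc)
print(f"Design effect: {deff:.2f}")                          # ~1.95
print(f"Effective sample size: {n_individuals / deff:.0f}")  # ~205
# Treating the 400 patients as independently randomized analyzes them
# as if they carried 400 observations' worth of information; they don't.
```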

6 | Basing conclusions on within-group statistical tests instead of between-groups tests

The strength of an RCT lies in its ability to compare results between two or more groups. For example, I recently wrote about a study that randomized men to morning exercise, evening exercise, or no exercise. The investigators reported that nocturnal glucose profiles improved only in men who exercised in the evening. The improvement, however, was “within-group,” meaning that nocturnal glucose levels had improved relative to baseline values, not compared to the other groups in the study. The authors’ conclusion that evening exercise conferred greater benefit for glycemic control than morning or no exercise is thus an example of the difference in nominal significance (DINS) error. This error occurs when differences in within-group effects (each group compared with its own baseline) are used to draw conclusions about between-group effects, instead of directly comparing the groups to each other.
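
A quick simulation makes the trap visible. In the sketch below (all numbers invented), both arms improve somewhat from baseline; one arm's change happens to clear p < 0.05 and the other's doesn't, yet the direct between-group test, which is the comparison that actually matters, finds no significant difference.

```python
# An illustrative DINS-error simulation: within-group tests can disagree
# in "significance" even when the groups barely differ from each other.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 25
change_a = rng.normal(0.55, 1.0, n)  # arm A: modest true improvement
change_b = rng.normal(0.15, 1.0, n)  # arm B: smaller true improvement

print("Arm A vs. its baseline:", stats.ttest_1samp(change_a, 0).pvalue)
print("Arm B vs. its baseline:", stats.ttest_1samp(change_b, 0).pvalue)
print("Arm A vs. arm B:       ", stats.ttest_ind(change_a, change_b).pvalue)
# A typical run: A "significant," B "not significant," but no significant
# between-group difference. Concluding A beats B would be the DINS error.
```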

7 | Improper pooling of data

Pooling data from different sources under the umbrella of one study, without accounting for the pooling in statistical analyses, can introduce bias. Dr. Allison cites the example of a trial on the effects of weight loss on telomere length in women with breast cancer. Data were pooled from two phases of an RCT with different allocation ratios (see Error #3), and this was not taken into account in the analysis.

The different sites, subgroups, or phases of a study need to be accounted for during analysis. Otherwise, any differences among the subsets of data being pooled can bias the estimate of the treatment effect.
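
To see how pooling across phases with different allocation ratios can manufacture an effect out of nothing, consider this invented simulation: the treatment truly does nothing, but outcomes drift upward between phases, and phase 2 is allocated 2:1 toward treatment. Naive pooling loads the drift onto the treatment arm; a phase-stratified estimate does not.

```python
# An invented simulation of pooling bias. The treatment has zero effect,
# but outcomes drift between phases and phase 2 is allocated 2:1, so a
# naive pooled comparison confounds the drift with the treatment.
import numpy as np

rng = np.random.default_rng(5)

def phase(n_treat, n_ctrl, drift):
    treated = rng.normal(drift, 1.0, n_treat)  # true treatment effect: 0
    control = rng.normal(drift, 1.0, n_ctrl)
    return treated, control

t1, c1 = phase(100, 100, drift=0.0)  # phase 1, allocated 1:1
t2, c2 = phase(200, 100, drift=2.0)  # phase 2, allocated 2:1, drifted up

naive = np.concatenate([t1, t2]).mean() - np.concatenate([c1, c2]).mean()
stratified = np.mean([t1.mean() - c1.mean(), t2.mean() - c2.mean()])

print(f"Naive pooled estimate:     {naive:+.2f}")       # roughly +0.33
print(f"Phase-stratified estimate: {stratified:+.2f}")  # roughly  0.00
```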

8 | Failing to account for missing data

Missing data — whether due to dropouts, errors in measurement, or other reasons — may not occur completely at random, breaking the randomization component of the study and introducing bias.

The review authors provide the example of a trial of intermittent vs. continuous energy restriction on body composition and resting metabolic rate. The study had a 50% dropout rate, yet only data from participants who completed the protocol were analyzed. (This is an example of “per protocol” analysis, in which data from noncompliant subjects are removed from the analyses.) Reanalyzing the study with all participants included halved the magnitude of the effect estimates compared with the originally reported results.

Investigators may mitigate this problem by reporting both per protocol and ITT results: efficacy and effectiveness, respectively. However, Dr. Allison suggests that this isn’t a perfect fix: “ITT can estimate the effect of assignment, not treatment per se, in an unbiased manner, whereas the per protocol analysis can only estimate in a way that allows the possibility for bias.”

(As noted earlier, this article details efficacy vs. effectiveness of time-restricted eating.)
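
Here is an invented simulation of the bias Dr. Allison is describing (not the review's example): the treatment truly does nothing, but participants who respond poorly are more likely to quit the treatment arm. A completers-only (per protocol) comparison then flatters the treatment, while comparing everyone as randomized does not.

```python
# An invented simulation of nonrandom dropout. True treatment effect: 0.
# Poor responders preferentially leave the treatment arm, so analyzing
# completers only (per protocol) makes the treatment look beneficial.
import numpy as np

rng = np.random.default_rng(11)
n = 300

treated = rng.normal(0.0, 1.0, n)  # outcomes if we could observe everyone
control = rng.normal(0.0, 1.0, n)

# Dropout probability rises as the treated outcome worsens.
drop_prob = 1 / (1 + np.exp(2 * treated))
completed = rng.random(n) > drop_prob

per_protocol = treated[completed].mean() - control.mean()
as_randomized = treated.mean() - control.mean()  # the ITT-style contrast

print(f"Per-protocol estimate:  {per_protocol:+.2f}")  # spuriously positive
print(f"As-randomized estimate: {as_randomized:+.2f}  (truth: +0.00)")
# In a real trial, the dropouts' outcomes would be missing, which is
# exactly why unaccounted-for missing data (this Error #8) is so thorny.
```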

C) Errors in the reporting of randomization

9 | Failing to fully describe randomization

Investigators must provide sufficient information so that readers can fully comprehend and evaluate the methods used for randomization. The review authors themselves admit to having a history of inadequate reporting of randomization methods.

10 | Failing to properly communicate inferences from randomized studies

When following the ITT principle, an RCT tests the effect of assigning participants to a treatment on the outcome of interest, but investigators often communicate results as the effect of the treatment itself (meaning, how well the treatment works if followed exactly as prescribed). Avoiding this error depends on conscientious framing of the precise causal question the study addresses.

For example, in the article I wrote reviewing a time-restricted eating trial, I highlighted the investigators’ statement that, “Time-restricted eating, in the absence of other interventions, is not more effective in weight loss than eating throughout the day.” In actuality, the investigators found that being assigned to time-restricted eating, in the absence of other interventions, is not more effective in weight loss than being assigned to eating throughout the day.

§

The review from David Allison and his colleagues highlights that while randomized controlled trials are powerful tools for examining cause-and-effect relationships, they are not immune to errors and bias. The paper is a great reminder of the high level of rigor involved in designing, conducting, and reporting randomized experiments, as well as a useful guide for investigators and readers alike for avoiding many pitfalls associated with this study design.

Disclaimer: This blog is for general informational purposes only and does not constitute the practice of medicine, nursing or other professional health care services, including the giving of medical advice, and no doctor/patient relationship is formed. The use of information on this blog or materials linked from this blog is at the user's own risk. The content of this blog is not intended to be a substitute for professional medical advice, diagnosis, or treatment. Users should not disregard, or delay in obtaining, medical advice for any medical condition they may have, and should seek the assistance of their health care professionals for any such conditions.
  1. Greetings Peter,

    Incredible and well-detailed article. I am curious, however, why you haven’t seemed to comment at all on the data coming out of Israel which refutes the vaccine for COVID-19 (as this seems to be the biggest health care issue of our lifetimes). Wouldn’t it feel good for you to call a spade a spade and call out this shill narrative peddled by the CDC?

    thanks,

    James

  2. Peter,

    Excellent article, and I especially appreciate your emphasis on allocation concealment (one of the most important aspects of the randomization process that most investigators with whom I’ve worked don’t fully understand). And I’m very excited that you’ll have David Allison on your podcast; he truly is one of the best!

    I do, however, have one correction to offer. While you correctly note that lack of randomization can induce selection bias, for randomized trials that really has nothing to do with a target population. Instead, in RCTs, selection bias has to do with who gets what treatment. And, as you correctly noted, the whole goal is to take selection away from either the clinician or the patient and leave it entirely up to chance (which can only be accomplished with adequate allocation concealment!).

    Cheers,
    Mark

  3. The biggest issue in RCTs might be the improper control of comorbidities.

    Take age, for example, as a variable. Two arms might have the same average age, but one arm might have a larger variance, containing both more older people and more younger people. This alone can easily explain a couple of percentage points’ difference in trial outcomes!

    Fish Oil: One RCT finds reduced morbidity, another doesn’t.
    Aspirin: One RCT finds reduced morbidity, another doesn’t.

    How do you explain all this? Probably no one can. My guess is that the two arms were imbalanced in terms of baseline probability of death, which leads to conflicting results across trials.

    Nobody, remember, nobody controls for probability of dying. Studies control for average age, cholesterol, the number of people with diabetes, etc. But this isn’t enough. You literally need a statistical model for mortality rate, then to equalize the two arms in distribution by mortality rate. Then do the test.

    Most RCTs with a mortality endpoint are suspect. They have far too many confounders and/or are underpowered to detect small effects. Yet the authors present their results as if they could.

    Best.

  4. Kudos for a well laid out article.

    Gorkan’s insight that “nobody controls for probability of dying” was very good too. Most researchers would probably gloss over unequal end-of-study populations.

    Is there any chance you and Dr. Allison might chew a bit on how these systemic errors (and others) can tilt outcomes when using Bayesian methodologies?

    Best regards, Ted
