Randomized controlled trials (RCTs) have revolutionized modern medicine, offering a means for determining causation – rather than mere associations – in medical therapies and other interventions. When well-designed, they minimize bias and confounding variables and provide robust evidence as to whether a treatment is or isn’t effective in its intended purpose. However, as we move into a more personalized and preventative form of medicine (what I call “Medicine 3.0”), we need to recognize that RCTs, while powerful, aren’t always the right tool for every question.
Anatomy of an RCT
A properly executed RCT requires several key elements that, while crucial for scientific rigor, can limit the applicability of RCTs in longevity-related research. By definition, these studies must involve an appropriate control intervention for comparison to the treatment under investigation, and participants must be randomized as to which intervention they receive. But strategies for meeting these two requirements are often constrained by practical considerations.
One such practical consideration is ethics. Ethics prohibits randomization or optimal control design for certain research questions, as participants cannot be randomly assigned to an intervention which we have good reason to believe may be harmful. For instance, clinical trials have never been performed to solidify the causal link between cigarette smoking and lung cancer because the body of evidence from epidemiological studies and preclinical data is already so overwhelming that it would be unethical to randomly assign a participant to a smoking intervention. Likewise, knowing the benefits of exercise on health and well-being, we cannot ethically test those benefits by assigning individuals to a control group that is required to completely refrain from exercise for years. In some cases, control interventions can be modulated in a way that avoids active harm (e.g., by comparing a high-exercise group to a moderate-exercise group), but as we’ve discussed at length in a past newsletter on the Generation 100 study (which adopted just such an approach), doing so can easily undermine the results of the study.
Proper blinding is also essential for creating a valid control group, but not all interventions make blinding possible. To continue with the exercise example, someone will know whether they are exercising or not; you cannot blind them to this treatment. Thus, the treatment and control groups differ not only in the treatment itself but also in their awareness of being treated, potentially introducing placebo effects or performance bias (if participants know the group they are in, they may change other behaviors as a result, influencing the outcome). In other words, any differences in results between groups could arise from the knowledge of the intervention rather than the intervention itself. The same challenge applies to dietary interventions, sleep studies, and many other lifestyle modifications that are crucial to longevity research.
RCTs are ideally suited to test the effects of simple, binary interventions, as is the case with most drug studies – a participant is either given the active drug or they are not. But how would we design a control intervention to compare to, say, the Mediterranean diet? There is no “opposite” of olive oil and whole grains. How would we even standardize the Mediterranean diet itself across thousands of participants for years? Even when the intervention is simple and binary, humans are not simple, binary creatures. Human aging is a complex system with multiple interacting variables, and when we study interventions in isolation, we might fail to account for the way different interventions interact with each other.
Compromises for saving time and money
Due to the logistics and cost of running these trials, they must have a reasonable duration, usually no more than four to five years – beyond which the study may become cost-prohibitive. (For example, the Women’s Health Initiative was initially funded for 15 years at a whopping $625 million.) But for many interventions, meaningful impacts on human lifespan or incidence of chronic disease may take decades to manifest in a general population. Thus, researchers must often make compromises with respect to the readouts they choose to monitor and the population they choose to study to keep costs in check.
To get an “answer” to a given research question in a manageable period of time, investigators frequently forgo direct measurement of the outcomes we care about most – like chronic disease incidence or mortality – and instead opt for metrics that are related to these outcomes but are likely to show the effects of an intervention on shorter timescales. For instance, as I discussed recently with Eric Ravussin, the intention of the CALERIE trial was to evaluate the effect of calorie restriction on lifespan, but rather than continuing the study for decades to determine the average age at which participants died, the researchers monitored numerous metrics that might impact or correlate with lifespan, such as oxidative stress and immune function. The same logic is in play when we encounter trials that track apoB levels or coronary artery calcium (CAC) to draw conclusions about how a drug might affect risk for major adverse cardiac events (MACE, such as heart attacks or strokes) – the outcomes that are most clinically meaningful.
Another strategy investigators employ for limiting the duration of an RCT is to restrict the study population to individuals who are already at high risk for the outcome of interest. Trials on interventions to prevent MACE, for example, are often restricted to patients who have already experienced at least one such event in the past. Relative to studying a general population, this approach drastically reduces the length of time needed to see a separation in risk between the treatment and control groups, but it potentially limits the generalizability of findings.
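To make the arithmetic behind this trade-off concrete, here is a minimal sketch using the standard normal-approximation formula for comparing two proportions. The event risks and the 25% relative risk reduction are hypothetical numbers chosen purely for illustration, not figures from any actual trial, and the function name `n_per_group` is our own. The sketch expresses the cost in participants rather than years, but the two are linked: the fewer events a population produces per year, the more people or the more follow-up time a trial needs.

```python
import math
from statistics import NormalDist


def n_per_group(control_risk, relative_risk_reduction, alpha=0.05, power=0.80):
    """Approximate participants needed per arm to detect a given relative
    risk reduction (two-sided test, normal approximation, two proportions)."""
    p1 = control_risk                                  # event risk in the control arm
    p2 = control_risk * (1 - relative_risk_reduction)  # event risk in the treated arm
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)      # critical value for significance
    z_beta = NormalDist().inv_cdf(power)               # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Hypothetical event risks over the trial period (illustrative assumptions only).
general_population = n_per_group(control_risk=0.02, relative_risk_reduction=0.25)
high_risk_patients = n_per_group(control_risk=0.20, relative_risk_reduction=0.25)

print(f"General population: {general_population} participants per arm")
print(f"High-risk patients: {high_risk_patients} participants per arm")
```

Under these assumed numbers, the general-population trial needs more than ten times as many participants per arm as the high-risk trial to detect the same relative effect, which is why enrolling people who have already had an event is so attractive to investigators, even at the expense of generalizability.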
Moving beyond RCTs for Medicine 3.0
Let’s be clear: RCTs are still the most powerful tool we currently have for determining how an intervention may be causally related to an outcome. We certainly are not trying to diminish their importance, but the limitations we’ve addressed above point to a need for other forms of evidence to supplement RCT-derived insights when appropriate. Evidence-informed medicine must complement evidence-based medicine by intelligently extrapolating from multiple sources, allowing us to fill the gaps that RCTs leave in our knowledge.
For example, epidemiology and observational studies may help us identify interesting questions to ask, while animal and cell studies could help answer questions about mechanisms. Where ethical randomization is not possible, Mendelian randomization studies can be used to reduce the influence of confounders and improve our understanding of observational data. (Mendelian randomization uses gene variants associated with an exposure, like alcohol consumption, as proxies for that exposure. Since gene variants are randomly allocated at conception, this approach mimics a “natural” randomized trial, minimizing the influence of lifestyle or environmental factors.) Answering the difficult questions that longevity research seeks to address will take a combination of interesting questions, the appropriate study design to answer those questions, and intelligent interpretation and synthesis of these data.
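For readers who want to see the Mendelian randomization logic in action, the following is a toy simulation, not an analysis of real data. Every number in it (allele frequency, effect sizes, sample size) is an assumption made up for illustration. It creates an unmeasured confounder that drives both the exposure and the outcome, then compares a naive observational estimate with the Wald ratio, one common Mendelian randomization estimator. The sketch also assumes the variant influences the outcome only through the exposure and is unrelated to the confounder, conditions that real studies must argue for rather than take for granted.

```python
import random


def covariance(xs, ys):
    """Population covariance of two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)


def simulate(n=50_000, true_effect=0.5, seed=0):
    """Return (naive_estimate, mr_estimate) of the exposure's effect on the outcome."""
    rng = random.Random(seed)
    variants, exposures, outcomes = [], [], []
    for _ in range(n):
        # Number of copies (0, 1, or 2) of a hypothetical exposure-raising allele.
        g = sum(rng.random() < 0.3 for _ in range(2))
        u = rng.gauss(0, 1)                      # unmeasured confounder (e.g., lifestyle)
        x = 1.0 * g + 1.0 * u + rng.gauss(0, 1)  # exposure depends on gene and confounder
        y = true_effect * x + 1.0 * u + rng.gauss(0, 1)  # outcome depends on both
        variants.append(g)
        exposures.append(x)
        outcomes.append(y)

    # Naive observational estimate: regression slope of outcome on exposure.
    naive = covariance(exposures, outcomes) / covariance(exposures, exposures)
    # Wald ratio: gene-outcome association divided by gene-exposure association.
    mr = covariance(variants, outcomes) / covariance(variants, exposures)
    return naive, mr


naive_estimate, mr_estimate = simulate()
print(f"True effect: 0.5 | naive: {naive_estimate:.2f} | Mendelian randomization: {mr_estimate:.2f}")
```

In this simulated world, the naive estimate overstates the effect because the confounder pushes exposure and outcome in the same direction, while the gene-based estimate lands close to the true value because the variant is assigned independently of the confounder.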
In addition to combining evidence from RCTs with evidence from other study designs, we can look to emerging technologies and resources that are expanding the capabilities of RCTs themselves to address more complex and challenging research questions. For example, long-term exercise trials have historically been hampered by the difficulty of ensuring adherence to the intervention, but wearable activity trackers can mitigate this concern. Likewise, wearables for tracking sleep, blood glucose, temperature, and other metrics can provide rich, longitudinal data about individual responses to interventions in those arenas. Machine learning algorithms, artificial intelligence, network analysis, and other analytical and statistical techniques can help us identify patterns and interactions that might not be apparent in traditional statistical analyses. Adaptive trial designs that adjust study protocols based on new information can help cut down on the costs and durations of clinical trials without compromising their ability to reach reliable conclusions. These improvements will allow us to answer questions that have previously been difficult or impossible to address.
The bottom line
RCTs remain our mightiest tool for establishing causation in medical science. However, the future of longevity research requires a more nuanced approach that matches the right study design to the right question. Sometimes that will be an RCT, but often it will require synthesizing evidence from multiple sources and study types.
The absence of RCT evidence doesn’t necessarily mean an intervention is ineffective – it may simply be impractical to study in that format. As we move forward in longevity science, we need to become more sophisticated in how we evaluate evidence, using RCTs where appropriate while developing and validating other methodologies where they aren’t.
This doesn’t mean lowering our standards for evidence – if anything, it means raising them by requiring multiple lines of evidence converging on the same conclusion. The future of longevity research lies not in replacing RCTs, but in supplementing them with other forms of evidence to build a more complete understanding of how we can extend both lifespan and healthspan.