I make no secret of my disdain for nutritional epidemiology studies. Between their reliance on weak methods, weak measurements, weak analyses, and weak reporting, I would hardly know where to begin in simply describing the problems with these study types, let alone identifying their possible solutions.
Fortunately, David Allison and his colleagues have succeeded where I would surely fail. On the heels of their excellent review on errors in randomized experiments (which I highlighted in a recent post), David and his team have turned their attention toward the current state of nutritional epidemiology, publishing an insightful review in which they discuss the many flaws of such investigations and suggest potential remedies to enhance their rigor and utility. Specifically, they identify four areas of focus to improve the quality of nutritional epidemiology studies: (1) Stronger designs, (2) Stronger measurement, (3) Stronger analyses, and (4) Stronger execution and reporting.
Will David convince me that this brand of studies is worthy of saving, or that they even can be saved? First, let’s consider each point from his review in greater detail.
1 | Stronger designs
Forming a research question — which the research project sets out to answer — is an essential step in study design. Does the question matter? What is the benefit of answering the question? Can the question be answered by existing literature? Have others already asked this question? Did they answer it?
Nutritional epidemiology studies often fall short on this task. Many published studies fail to begin with a clear and concise question to be answered. Among those that do, many ask questions that have already been examined in numerous other published papers, making it very unlikely that another study will add any meaningful new insight on the topic. For example, a working group of the World Health Organization considered more than 800 epidemiological studies that investigated an association between cancer and red or processed meat consumption. If a research project is setting out to answer whether red or processed meat is carcinogenic to humans, investigators should first determine whether one or more of the hundreds of papers on the topic has already asked their specific question.
Let’s say you’ve arrived at a novel, meaningful research question. Can you design your study so that the methods align with your question?
Randomization is considered the most powerful study design to examine cause-and-effect relationships. For many research questions, randomized controlled trials are necessary to establish reliable knowledge. Conducting 200 more observational studies on cancer and red or processed meat consumption won’t get us any closer to a definitive result. This is largely because observational studies are by definition nonrandomized, allowing virtually limitless possibilities for confounding. Such data might help spark new hypotheses for subsequent, more rigorous testing through randomized trials, but all too often, those subsequent trials are never performed. Many point out that, for certain research questions, randomized controlled trials aren’t possible for reasons ranging from cost to ethics to difficulty in controlling for necessary variables, but these challenges are hardly a fair excuse for throwing up our hands and deciding that completely nonrandomized observational studies are our only option. As David and his colleagues point out, several creative solutions have been developed to facilitate interventional studies or introduce a degree of randomization to otherwise nonrandomized investigations.
All of this is to say that a staggering number of nutritional epidemiology studies trip at the starting line before even the earliest data are collected. If a poorly formulated research question or poor study design can poison all subsequent work and meaning on a given project, then the primary antidote, according to David and his colleagues, must be more mindful planning and preparation at the start.
Recommendations for stronger designs
- Begin with the research question to be answered, consider which measurements would most effectively answer this question, and develop a study design best-suited to delivering these data.
- First consider what a design can and cannot accomplish (to the extent this is known).
- Designs exist both for generalizability of results and for making inferences to a specific individual (e.g., pragmatic trials versus repeated N-of-1 trials, respectively), and investigators should be precise about the population or individual to which inferences are appropriately made.
- Before declaring a randomized trial to be impossible, impractical, or unethical, researchers should thoroughly review all available options.
2 | Stronger measurement
Most of us can probably recall very generally what we had for dinner last night. If asked to recall every ingredient, we may have to think a bit more carefully, but at least some of us could probably come up with a fairly accurate list. But when asked to recall every ingredient down to precise volumes and weights – over several meals – our ability to report with any reasonable accuracy falls precipitously. Unfortunately, this flawed recall is the primary source of data for many nutritional epidemiology studies.
Self-reporting from food frequency questionnaires (FFQs) is notoriously inaccurate, but the use of FFQs persists in nutritional epidemiology. David and his colleagues note that the FFQ does not accurately estimate frequency of intake or gauge serving size, and that energy intake estimated from FFQs is invalid. Many FFQs ask you how frequently you’ve eaten particular foods over the course of a year. While most of us can recall with some accuracy what we ate yesterday, it’s a lot more difficult to recall what we ate (and how much of it we ate) 11 months ago.
The authors point out that ideal measurements would be accurate, precise, detailed, and frequent, but that this combination is seldom achievable by a single measurement tool. FFQs clearly fall short on accuracy and often on precision and frequency as well. So if we can’t rely on self-reporting tools like FFQs in nutritional epidemiology, what are some ways in which measurement can be improved?
Recommendations for stronger measurement
- Self-reporting tools have utility for some uses; however, when possible, self-report should be used in conjunction with additional, objective means of evidence validation, and should be avoided when invalid or unfit for a particular use.
- Blending varying degrees of automation with traditional, observational studies can improve the quality of self-reported data.
- Researchers should continually seek additional biomarkers and other new technologies and methods for collecting objective data.
3 | Stronger analyses
Randomization, the process by which participants in trials are assigned by chance to separate groups, is a critical component in distinguishing cause and effect and in mitigating confounding.
Bias due to confounding is a core limitation of observational research. Confounding occurs when a factor is associated with both the exposure (or treatment) and the outcome (e.g., disease or death), and is not part of the causal pathway from exposure to outcome. In observational nutritional epidemiology, there are many confounding factors that can distort the results of a study.
Using statistical analyses, confounding factors can be adjusted for after data gathering. But confounding can persist, even after adjustment. There are often additional confounding factors that were not considered, or that could not be adjusted for because they were not measured during data gathering. And in some situations, confounding variables are measured with error. How do you adjust for multiple confounders in the same study? How much crosstalk is there between confounding variables? To add to the problem, different investigators can take different approaches to controlling for confounding variables because there isn’t a uniform standard.
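To make confounding and adjustment concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the counts are invented, smoking stands in as a hypothetical confounder of a red-meat exposure, and stratification with the Mantel-Haenszel risk ratio is just one standard adjustment method, not necessarily the one the review has in mind. The counts are built so that the exposure has no effect within either stratum, yet the pooled data suggest one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stratum:
    """Counts for one level of the (hypothetical) confounder."""
    name: str
    exposed_cases: int
    exposed_total: int
    unexposed_cases: int
    unexposed_total: int


def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)


def crude_risk_ratio(strata):
    """Risk ratio after pooling all strata, i.e. ignoring the confounder."""
    return risk_ratio(
        sum(s.exposed_cases for s in strata),
        sum(s.exposed_total for s in strata),
        sum(s.unexposed_cases for s in strata),
        sum(s.unexposed_total for s in strata),
    )


def mantel_haenszel_risk_ratio(strata):
    """Mantel-Haenszel pooled risk ratio, adjusted for the stratifying variable."""
    numerator = 0.0
    denominator = 0.0
    for s in strata:
        n = s.exposed_total + s.unexposed_total
        numerator += s.exposed_cases * s.unexposed_total / n
        denominator += s.unexposed_cases * s.exposed_total / n
    return numerator / denominator


# Invented counts (assumption): smokers eat more red meat AND have higher
# disease risk, but within each stratum red meat makes no difference.
STRATA = [
    Stratum("smokers", exposed_cases=160, exposed_total=800,
            unexposed_cases=40, unexposed_total=200),
    Stratum("non-smokers", exposed_cases=10, exposed_total=200,
            unexposed_cases=40, unexposed_total=800),
]

if __name__ == "__main__":
    print(f"crude risk ratio:    {crude_risk_ratio(STRATA):.3f}")
    for s in STRATA:
        rr = risk_ratio(s.exposed_cases, s.exposed_total,
                        s.unexposed_cases, s.unexposed_total)
        print(f"{s.name:>12} stratum: {rr:.3f}")
    print(f"adjusted (M-H) ratio: {mantel_haenszel_risk_ratio(STRATA):.3f}")
```

With these made-up numbers, the pooled comparison shows roughly double the risk among the exposed, while each stratum and the adjusted estimate show no association at all. The catch is that the adjustment only works because the confounder was measured, and measured correctly. An unmeasured or mismeasured confounder leaves the distorted crude estimate in place.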
Is there anything that can be done to mitigate biases like confounding in observational studies?
Recommendations for stronger analyses
- The relationship of dietary factors to numerous potential confounders, such as age, sex, education, and income, should be determined, and uniform standards developed to include and address these.
- Investigators should use multiple analytical methods, including appropriately robust and sometimes novel statistical tools, to mitigate biases common to simple observational studies.
- To resolve the complex problem of innumerable interacting variables in the exposome, investigators should seek information technology approaches to the investigation, reduction, and interpretation of data.
4 | Stronger execution and reporting
The reporting of observational studies is often inadequate. Two of the biggest issues raised by Allison and colleagues are the multiple hypothesis testing problem and selective non-reporting of studies and results.
One of the supposed benefits of nutritional epidemiology is the vast amount of data it can collect and generate. Studies often calculate multiple primary and secondary endpoints and evaluate the effects of multiple exposures. We are awash in a seemingly bottomless sea of observational studies reporting an almost endless number of results. When a study reports results of multiple outcomes and multiple exposures, each combination of outcome and exposure constitutes a separate hypothesis being tested. The more hypotheses are tested, the higher the probability that results from at least some of these tests may be statistically significant purely by chance, without any true underlying effect. Now think about the problem more globally: multiple studies testing multiple hypotheses. With hundreds of thousands of studies, how do we know which results are “true discoveries” versus false positives? This multiple hypothesis testing problem often leads to the reporting of nonexistent effects and associations and contributes to the poor reproducibility of many nutritional epidemiology studies.
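The arithmetic behind this problem is simple enough to sketch. The Python snippet below assumes, purely for illustration, that every test is independent, that no true effect exists for any of them, and that significance is declared at the conventional 0.05 threshold. The function names are hypothetical, and the Bonferroni correction is included only as one common and conservative remedy, not as something the review specifically prescribes.

```python
def prob_at_least_one_false_positive(num_tests, alpha=0.05):
    """Chance that at least one of num_tests independent true-null tests
    comes out 'significant' at level alpha."""
    return 1.0 - (1.0 - alpha) ** num_tests


def expected_false_positives(num_tests, alpha=0.05):
    """Average number of spurious 'significant' results among true-null tests."""
    return num_tests * alpha


def bonferroni_threshold(num_tests, alpha=0.05):
    """Per-test significance threshold that keeps the overall
    false-positive chance at or below alpha."""
    return alpha / num_tests


if __name__ == "__main__":
    # Hypothetical study: 5 exposures x 4 outcomes = 20 hypotheses.
    num_tests = 5 * 4
    print(f"P(at least one false positive): "
          f"{prob_at_least_one_false_positive(num_tests):.2f}")
    print(f"expected false positives: {expected_false_positives(num_tests):.1f}")
    print(f"Bonferroni per-test threshold: {bonferroni_threshold(num_tests):.4f}")
```

Under these assumptions, a single study crossing 5 exposures with 4 outcomes has about a 64% chance of producing at least one "significant" finding with no real effect behind it. Real exposures and outcomes are correlated, so the exact figures will differ, but the direction of the problem is the same.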
There’s also the problem of selective non-reporting of studies. Though countless nutrition studies have generated a massive volume of data over the years, many of the findings from such investigations are never reported. Known as “publication bias,” selective non-reporting typically leads to underrepresentation of negative (non-significant) results in the scientific literature. One reason for this bias is that investigators often think their (usually null) results are unimportant or uninteresting and therefore never submit them for publication.
Reproducibility, the ability to generate the same results as previous studies using the same methods and analysis, is an essential part of the scientific process. Multiple hypothesis testing and selective non-reporting are each major contributors to irreproducibility, particularly when the two issues are combined (as is often the case). In addition, the datasets used in nutritional epidemiology are rarely made available to the public, and since reproducing results requires access to the same data, this lack of transparency precludes any attempt to reproduce findings. Providing thorough information on the back end of a study through results reporting helps to make it reproducible, as does providing thorough information on the front end through study preregistration. Investigators can preregister by posting analysis plans to sites such as clinicaltrials.gov or osf.io (Center for Open Science), committing to the analytic plans without any advance knowledge of outcomes. Investigators can also commit to submitting their results to clinicaltrials.gov. Unfortunately, preregistration of observational studies is relatively rare: a 2016 analysis found that the number of registered observational studies paled in comparison with the number of published reports. In the cases where a study was registered, registration usually occurred after the study had started, with fewer than 3% of published articles having a registration date that preceded the study start date. Further, fewer than 11% of the registered studies were followed by publications, hinting at the magnitude of the problem of selective non-reporting.
Recommendations for stronger execution and reporting
- Nutritional epidemiology should adhere to reporting guidelines (e.g., CONSORT and STROBE-nut).
- To prevent selective non-reporting of studies and results, investigators should register research prospectively (e.g., on ClinicalTrials.gov) and report results for all outcomes and analyses.
- To improve transparency and openness, investigators should share research materials, data, and code.
- To promote scientifically appropriate interpretations, researchers should avoid “spin” in scientific reports and press releases and identify limitations associated with their findings.
Considering these proposed improvements to study designs, measurements, analyses, and reporting, do I expect nutritional epidemiology can ever reach the same standards as randomized controlled trials? It seems unlikely. But do these types of studies have the potential to increase their scientific rigor and provide more meaningful knowledge for human health? I believe so. The question is, will they?
In spite of their many faults, observational studies in nutrition continue to be popular with the press, where spurious associations are often translated into flashy headlines. Moreover, food industry sponsorship has long ensured ample funding for nutrition studies, and on many occasions, this conflict of interest has been found to lead to flawed methodology or misleading reporting of results. So if demand for these studies remains high and funding remains available, what provides the incentive for reform, and who should be responsible for ensuring quality? Editors and reviewers for scientific journals might help to elevate standards by requiring more rigorous methods and public disclosure of data and analysis code. Perhaps the ultimate gatekeepers must be investigators themselves. We can only hope that most have a hunger for truth over flash.