With the push for data disaggregation and the incorporation of intersectionality, concern about multiple comparisons adjustments has grown, including in research on health and racial inequities. Multiple comparisons adjustments are often used in genetics research as a statistical tool to protect against overinterpreting chance findings when many possible hypotheses are tested simultaneously. Using them entails a trade-off between tolerating type 1 errors (overinterpreting chance findings) and type 2 errors (missing real associations). The conventional application of multiple comparisons adjustments increases the risk of missing real findings, which could be harmful in the context of disparities research. Here, we briefly describe the (mis)use of multiple comparisons adjustments and the potential impact on the validity of racial and health equity research findings.
Multiple Comparisons Adjustments in Racial and Health Equity Research
When we conduct more than one statistical test in a given sample, the probability that at least one association arises by chance alone, and is deemed “statistically significant,” increases. Multiple comparisons adjustment matters for certain types of questions in which a great number of hypotheses, each with a very low likelihood of being true, are tested. It is intended to keep the likelihood of false positive findings at a tolerable level.
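As a rough illustration of this inflation, the sketch below computes the probability of at least one false positive across a set of tests, along with the per-test threshold a Bonferroni adjustment would impose. It assumes the tests are independent and that every null hypothesis is true, which is a simplification; the function names are illustrative and do not come from any particular library.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """P(at least one false positive) across independent tests of true nulls."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise rate at or below alpha."""
    return alpha / n_tests


for m in (1, 5, 20, 100):
    print(
        f"{m:>3} tests: family-wise error rate = {familywise_error_rate(0.05, m):.3f}, "
        f"Bonferroni per-test alpha = {bonferroni_alpha(0.05, m):.5f}"
    )
```

Under these assumptions, the chance of at least one false positive at a 0.05 threshold is roughly 23% for 5 tests and 64% for 20 tests, which is the problem the adjustment is designed to contain.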
However, applying multiple comparisons adjustments in the context of disparities research can obscure inequalities, making it difficult to identify important differences. For example, we may compare multiple groups against a reference category, perhaps the most privileged group. Each of several comparisons may be well motivated, with strong theoretical or empirical reason to believe there is an association, and each may have important substantive implications if group differences are verified. In this case, applying multiple comparisons adjustments could lead to missed associations, suggesting there are no differences in outcomes when, in fact, there are.
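To make the cost of missed associations concrete, the following sketch compares the power of a single two-sided z-test with and without a Bonferroni adjustment. The effect size (2.8 standard errors) and the number of comparisons (five) are hypothetical values chosen only for illustration, and the normal approximation is an assumption.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_sided_power(standardized_effect: float, alpha: float) -> float:
    """Power of a two-sided z-test when the true effect equals
    `standardized_effect` standard errors."""
    critical = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return (STD_NORMAL.cdf(standardized_effect - critical)
            + STD_NORMAL.cdf(-standardized_effect - critical))


effect_in_se = 2.8   # hypothetical: true difference is 2.8 standard errors from zero
n_comparisons = 5    # hypothetical: five groups, each compared with the reference

unadjusted = two_sided_power(effect_in_se, 0.05)
bonferroni = two_sided_power(effect_in_se, 0.05 / n_comparisons)
print(f"unadjusted power: {unadjusted:.2f}")
print(f"Bonferroni power: {bonferroni:.2f}")
```

With these illustrative inputs, power falls from about 0.80 to about 0.59, so a real difference that would usually be detected is missed roughly four times in ten.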
A common approach to evaluating disparities across multiple racial/ethnic groups begins with a joint test of the null hypothesis that all groups have equivalent health. If the joint null hypothesis is rejected, we then move on to group-by-group comparisons. Applying a multiple comparisons correction at that point is misguided, though it is a fairly common mistake: preceding the multiple hypothesis tests with a joint test obviates the need for a multiple comparisons correction. In some cases it may not even be appropriate to use a joint test as a threshold for evaluating group-by-group comparisons, for example if there is a strong prior belief that inequalities in one or more groups are likely, or if there is specific substantive interest in one or more of the group-by-group comparisons.
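The two-step approach described above can be sketched as follows. This is a minimal illustration and not a recommended analysis pipeline: the joint test is a permutation version of the one-way ANOVA F test, chosen here only to keep the example dependency-free (the text does not specify which joint test to use), the pairwise comparisons use a large-sample z-test, and the data are simulated. All function and group names are hypothetical.

```python
import random
from statistics import NormalDist, mean, variance


def f_statistic(groups):
    """One-way ANOVA F statistic: between-group over within-group variance."""
    pooled = [x for g in groups for x in g]
    grand_mean = mean(pooled)
    k, n = len(groups), len(pooled)
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)
    within = sum((len(g) - 1) * variance(g) for g in groups) / (n - k)
    return between / within


def joint_permutation_p(groups, n_perm=999, seed=0):
    """Permutation p-value for the joint null that all groups are equivalent."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def pairwise_p(sample, reference):
    """Two-sided large-sample z-test for a difference in means."""
    se = (variance(sample) / len(sample)
          + variance(reference) / len(reference)) ** 0.5
    z = (mean(sample) - mean(reference)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def protected_comparisons(data, reference, alpha=0.05):
    """Joint test first; only if it is rejected, run unadjusted comparisons
    of each group against the reference group."""
    joint_p = joint_permutation_p(list(data.values()))
    if joint_p >= alpha:
        return joint_p, {}
    return joint_p, {
        name: pairwise_p(values, data[reference])
        for name, values in data.items() if name != reference
    }


# Simulated data for illustration only: group B differs from reference group A.
rng = random.Random(42)
data = {
    "A": [rng.gauss(0.0, 1.0) for _ in range(150)],
    "B": [rng.gauss(0.6, 1.0) for _ in range(150)],
    "C": [rng.gauss(0.0, 1.0) for _ in range(150)],
}
joint_p, pairwise = protected_comparisons(data, reference="A")
print(f"joint p = {joint_p:.3f}")
for name, p in pairwise.items():
    print(f"{name} vs A: unadjusted p = {p:.4f}")
```

Note that the pairwise p-values are compared with the original threshold; no further division by the number of comparisons is applied after the joint test.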
We focus here on null hypothesis significance testing, which has been controversial for many reasons, but many of the same concerns apply when confidence intervals are presented to evaluate racial inequities.
Disparities research already struggles to overcome statistical power limitations due to inherently small sample sizes, especially when considering intersectional identities. By redundantly adjusting for multiple testing, we further restrict our ability to understand differential impacts on historically marginalized groups. Adding more groups to compare, as with intersectional groups defined by, for example, gender, sexual identity, ethnicity, socioeconomic status, and religion, will typically reduce statistical power, increasing the chances that we overlook important differences. In fact, one could reduce almost any racial/ethnic difference to statistical non-significance by defining ever more, and ever smaller, sub-groupings. Consequently, the choice to pursue multiple comparisons adjustments influences our conclusions (i.e., the validity of findings) when we focus exclusively on contrasts that remain statistically significant after redundantly adjusting for multiple testing.
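The compounding effect of smaller subgroups and stricter thresholds can be sketched with an approximate minimum detectable difference. The example assumes a fixed total sample split into equal-sized groups, a two-sided z-test of each group against one reference group, 80% power, and an outcome standard deviation of 1; all of these are illustrative assumptions, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_difference(total_n, n_groups, sd=1.0, alpha=0.05,
                              power=0.80, bonferroni=True):
    """Approximate smallest true difference from the reference group that is
    detectable with the requested power (equal group sizes, two-sided z-test)."""
    z = NormalDist().inv_cdf
    n_comparisons = n_groups - 1
    per_test_alpha = alpha / n_comparisons if bonferroni else alpha
    # Standard error of a difference between two groups of size total_n / n_groups.
    se = sd * sqrt(2 * n_groups / total_n)
    return (z(1 - per_test_alpha / 2) + z(power)) * se


total_n = 2000  # hypothetical total sample size
for k in (2, 4, 8, 16):
    plain = min_detectable_difference(total_n, k, bonferroni=False)
    adjusted = min_detectable_difference(total_n, k)
    print(f"{k:>2} groups: unadjusted = {plain:.3f} SD, Bonferroni = {adjusted:.3f} SD")
```

In this sketch the detectable difference grows as groups are subdivided even without any adjustment, because each group is smaller, and the adjustment widens it further.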
The major concern that multiple comparisons corrections address is the overinterpretation of unusual chance findings. Our research question should guide the selection of our methods, and when multiple comparisons adjustments are used, the reason they are needed should be clearly identified and reported. Racial and health equity research requires multiple comparisons across and within subgroups, including subgroups defined by multiple intersectional identities (e.g., race x gender, race x age). While this is complex, it is not necessarily concerning and should not automatically trigger adjustments for multiple comparisons. Indeed, “science comprises a multitude of comparisons, and this simple fact in itself is no cause for alarm.” But determining whether, when, and how to correct for these multiple comparisons, guided by a nuanced understanding of theoretical and contextual factors, is essential.
Resources
- Rubin M. Redundant multiple testing corrections: The fallacy of using family-based error rates to make inferences about individual hypotheses. Published online January 24, 2024. Accessed January 25, 2024. http://arxiv.org/abs/2401.11507
- García-Pérez MA. Use and misuse of corrections for multiple testing. Methods Psychol. 2023;8:100120. doi:10.1016/j.metip.2023.100120
- Kauh TJ. Racial Equity Will Not Be Achieved Without Investing In Data Disaggregation. Health Aff Forefr. doi:10.1377/forefront.20211123.426054
- Midway S, Robertson M, Flinn S, Kaller M. Comparing multiple comparisons: practical guidance for choosing the best multiple comparisons test. PeerJ. 2020;8:e10387. doi:10.7717/peerj.10387
- Rothman KJ. No Adjustments Are Needed for Multiple Comparisons. Epidemiology. 1990;1(1):43-46.