CDDmlr was designed to examine the Wilcox-Russell hypothesis [2, 4, 5] and its extensions, e.g. Hernández-Diaz et al. [6], and to provide quantitative estimates of the direct effects, which are independent of birth weight, and the indirect effects, which may operate through birth weight. As described above, we have implemented the same assumptions as Wilcox-Russell [2, 4, 5] and Hernández-Diaz et al. [6]. Nevertheless, a quantitative model has limitations beyond those of qualitative models, e.g. data quality and quantity, as well as the details of the implementation.

The analyses are based on the public use samples of the NCHS linked birth death files. These have very large sample sizes (Table 1), so there are unlikely to be issues with power. Birth weight is considered to be reliably measured. Mortality estimates may be slightly biased due to problems associated with linking birth and death certificates. However, these are the same data, with the same problems, that most representative analyses of the US are based upon. For our purposes the most troubling defects are that births at <500 grams and LMP gestational ages <20 weeks are not consistently reported by all states [23]. Following many analyses of these data, we have truncated the data at 500 grams to avoid this problem. Consequently, we have used Gaussian distributions truncated at 500 grams to match the data.
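The effect of left truncation on a single Gaussian component can be sketched as follows. This is a minimal illustration, not the CDDmlr implementation: the function names, the integration helper, and the example mean and standard deviation are all hypothetical and are not estimates from the paper.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of an untruncated Gaussian."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of an untruncated Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def truncated_normal_pdf(x, mu, sigma, lower=500.0):
    """Density of a Gaussian left-truncated at `lower` grams.

    The untruncated density is rescaled by the probability mass that
    lies above the truncation point, so it still integrates to one.
    """
    if x < lower:
        return 0.0
    mass_above = 1.0 - normal_cdf(lower, mu, sigma)
    return normal_pdf(x, mu, sigma) / mass_above


def integrate(f, lo, hi, n=20000):
    """Composite trapezoid rule on [lo, hi] with n intervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h


# Hypothetical parameters (grams) for a low-mean, high-variance component.
EXAMPLE_MU, EXAMPLE_SIGMA = 2400.0, 900.0
```

The share of the untruncated component that falls below 500 grams is `normal_cdf(500.0, EXAMPLE_MU, EXAMPLE_SIGMA)`; components with lower means or larger variances lose more mass to truncation, which is why truncation can affect the "compromised" components of two populations unequally.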

One technical difficulty in models of this kind is estimating unbiased direct and indirect effects. The qualitative analysis in Hernández-Diaz et al. [6] is based on the assumptions of counterfactuals [24–26]. Here we take an alternative approach, developed from statistical decision theory [18]. In our case, we have modelled the birth weight density as the sum of two Gaussian distributions and the subpopulation specific mortality curves as a second-degree polynomial of Z-scored birth weight standardized with respect to these Gaussians. This eliminates the main effects (associations) of race and birth weight and the logistic regressions can then estimate the direct effect of race on infant mortality versus potential interaction effects of race and birth weight on infant mortality. Direct and indirect effects can be estimated using procedures similar to direct standardization [18]. The result is called a "generated direct effect" by Geneletti [18], which is similar to Pearl's "natural direct effect" [24]. Since the "normal" and "compromised" subpopulations are defined as Gaussian distributions, the appropriate distribution is theoretically available for direct standardization. In this regard, truncation of the data at 500 grams creates a significant truncation difference in the standardized birth weight distributions between African and European American "compromised" births. Consequently, the results based on a common reference population (i.e. the European American distribution, Table 5) may be preferred. Identification issues concerning "generated direct effects" are discussed by Geneletti [18].
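The direct-standardization step can be illustrated with a short sketch. It assumes, for illustration only, a logistic mortality curve that is quadratic in Z-scored birth weight and an untruncated standard normal reference distribution. The function names and coefficient values are hypothetical, and the sketch is not the CDDmlr estimation procedure.

```python
import math


def std_normal_pdf(z):
    """Density of the standard normal reference distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def mortality(z, a, b, c):
    """Mortality risk from a logistic model quadratic in z-scored birth weight."""
    eta = a + b * z + c * z * z
    return 1.0 / (1.0 + math.exp(-eta))


def standardized_mortality(coefs, z_lo=-8.0, z_hi=8.0, n=4000):
    """Average the mortality curve over the reference distribution.

    Uses the composite trapezoid rule on [z_lo, z_hi]; `coefs` is the
    (intercept, linear, quadratic) triple of the logistic model.
    """
    a, b, c = coefs
    h = (z_hi - z_lo) / n
    total = 0.0
    for i in range(n + 1):
        z = z_lo + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * mortality(z, a, b, c) * std_normal_pdf(z)
    return total * h


def standardized_relative_risk(coefs_group, coefs_reference):
    """Ratio of two mortality curves averaged over the same reference distribution."""
    return standardized_mortality(coefs_group) / standardized_mortality(coefs_reference)


# Hypothetical coefficients: the two curves differ only in the intercept,
# i.e. a shift that does not depend on birth weight.
REFERENCE = (-6.0, -0.4, 0.3)
SHIFTED = (-6.0 + math.log(2.0), -0.4, 0.3)
```

Because both curves are averaged over the same reference distribution, `standardized_relative_risk(SHIFTED, REFERENCE)` reflects only the difference between the curves and not any difference between the birth weight distributions. Allowing the linear or quadratic coefficients to differ as well would correspond to a change in the shape of the curve.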

One advantage of the decision theory approach is that the assumptions concerning the existence of counterfactuals are not necessary. However, as with counterfactual methods, the same strong unmeasured covariate assumptions are required. In particular: a) no unmeasured covariates which affect the stressor (race in this case) and the racial disparities in infant mortality, b) no unmeasured confounding of race and birth weight, and c) no unmeasured confounding of birth weight and infant mortality. Assumption a is necessary to estimate total racial disparities; all three are needed to estimate "generated direct effects" [18].

These assumptions may be less of a problem with race than with other variables such as smoking, which have more precise definitions. Race is typically considered to be socially constructed and defined as that collection of variables (some of which may be observable and some of which are currently unobservable) that are associated in some way with reported race. Given this view, all confounders of racial effects on birth weight or infant mortality are integral parts of the definition of race. This is the assumption generally used when reporting total "racial disparities", such as those presented in Table 1. Of course, it is possible to partial out the effects of measured confounders on racial disparities, e.g. the effects of maternal age, but what is left in this case is simply all the unmeasured and unknown effects of race. The results presented above are uncorrected for confounders, and consequently represent the sum total of all direct and indirect effects associated in some way with race. This should be considered when interpreting the results.

Based on the "pediatric paradox", Wilcox has argued that racial disparities may be underestimated due to unmeasured confounding [5, 9–11]. Gage has hypothesized that the lower birth weight specific mortality of African compared to European American "compromised" birth cohorts [10] is due to the heavier fetal loss and selection documented among African Americans [27, 28]. If this assumption is correct, then differential fetal loss is associated with the direct effect of being African American in the "compromised" subpopulation and with the "pediatric paradox". This interpretation is also consistent with Platt et al.'s finding [29] that the race "birth weight paradox" disappears when observable fetal deaths (total fetal loss is not observable) are included (as well as live births and infant deaths) in the analysis of racial disparities in infant mortality. Should this selection bias be included in the definition of "race" or should differential fetal loss be excluded from the definition of race? The answer depends upon the question, but CDDmlr potentially makes it possible to correct for this "unmeasured" source of confounding.

Model-based adjustment of this effect yields relative risks of 4.2 and 3.6 for African American female and male births, respectively. These are higher than the predicted total relative risks in Table 5, and much higher than the observed relative risk of 2.1 for both sexes derived from Table 1. This adjusted racial disparity needs to be considered with some caution, since it assumes that the direct effect in the "compromised" subpopulation is completely due to selection bias and can be reduced to zero while all other modelled effects remain the same. Nevertheless, it is possible that a substantial part of the racial disparity in infant mortality is hidden by differential fetal loss.

We assume that unmeasured confounding of birth weight and infant mortality (assumption c) is responsible for the reverse-J shape of the birth weight specific mortality curve [16, 17] and that the reverse-J shape is not a "causal" effect of birth weight. We have implemented the characteristic reverse-J shape of birth weight specific infant mortality using a second-degree polynomial to account for this unmeasured confounding. This could cause some error if it cannot adequately represent the shape determined by the unmeasured covariates assumed to be responsible for this phenomenon (Figure 1). A second-degree polynomial, however, is a relatively flexible function, and is considered to provide an optimal fit to birth weight specific mortality in the homogeneous case [30].
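As a brief worked form, assuming the polynomial enters on the logit scale and using hypothetical coefficient symbols that are not taken from the paper, the mortality risk q at Z-scored birth weight z and the location of its minimum can be written as:

```latex
\operatorname{logit} q(z) = \beta_0 + \beta_1 z + \beta_2 z^2,
\qquad
z^{*} = -\frac{\beta_1}{2\beta_2} \quad (\beta_2 > 0)
```

Under this parameterization the curve is symmetric about z* on the logit scale. A reverse-J shape arises when the linear coefficient is negative, so that z* lies above the mean of the distribution: most births then fall on the long descending arm to the left of z*, and only the heaviest births fall on the short ascending arm to the right.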

Moreover, the CDDmlr model corrects for some unmeasured confounding of birth weight and infant mortality, referred to as "normal" versus "compromised" births. It is unlikely that dividing birth cohorts into two Gaussian subpopulations will account for all of the unmeasured confounding between birth weight and infant mortality. Nevertheless, the two subpopulations display significantly different mortality patterns indicating that the CDDmlr model accounts for some otherwise unmeasured heterogeneity [9, 10]. In particular, we have argued that the generally higher "normal" birth weight specific mortality compared to "compromised" birth weight specific mortality is due to greater fetal loss among "compromised" births, resulting in a highly selected "compromised" sample at live birth [9, 10] similar to the hypothesis concerning the "pediatric paradox". If correct, this effect would violate assumption c, unless the two subpopulations are examined separately, as they are here.

The statistical results presented above (Tables 2 and 5) are consistent with the Wilcox-Russell hypothesis [2, 4, 5] and its extensions [6] (Figure 1c), which suggest that birth weight is not on the "causal pathway" to infant mortality, at least for "normal" births. The racial disparity in birth weight has no significant association with the racial disparity in infant mortality after controlling for the other paths in Figure 1c. There is no evidence of any residual association between birth weight and infant mortality over and above the direct effect and the reverse-J shape of the standard population, European American births in this case. It is unlikely that this result is compromised by uncontrolled confounding of birth weight and infant mortality, since this would require that the sum total of associations generated by uncontrolled confounding equal zero. It is more likely that all of the effects of race on infant mortality in this subpopulation operate through pathways that do not include birth weight.

On the other hand, there is a substantial indirect effect, which disadvantages African American infant mortality among "compromised" births (Table 5). The results in Table 2 indicate that this association is largely due to a change in shape of the reverse-J-shaped birth weight specific mortality curve between the races. This could be due to an interaction of race and birth weight on infant mortality, or due to a violation of the no-unmeasured-confounding assumptions b or c. It is also equivalent to the interaction [6] required by Figure 1a and also possible in Figure 1b, both of which require that birth weight be on the "causal pathway" to infant mortality. In any event, an association between birth weight and infant mortality cannot be excluded, and it remains possible that birth weight has a "causal" effect on infant mortality among these "compromised" births.

Overall, the findings suggest that interventions with respect to birth weight will not reduce racial disparities in mortality among "normal" births, but might reduce them among "compromised" births. Identification of the exact mechanisms and whether birth weight plays a "causal" role conditional on "compromised" birth will require additional analysis, i.e. control of potential confounding. The "compromised" subpopulation accounts for about 29-41% of the observed racial disparity for females and males respectively (Table 4).

If our hypothesis concerning the selection effects of fetal loss on observed racial disparities is correct, then the total racial disparity is higher than observed, and the proportion of the disparity due to the "compromised" subpopulation is larger than observed. The confounding, represented by the mixing proportion, accounts for an additional 17-21% of the observed racial disparity for males and females, respectively (Table 4). Nevertheless, completely eliminating the "compromised" subpopulation would a) reduce both the low and the macrosomic birth weight rates, which are generally associated with elevated infant mortality in both African and European American birth cohorts, b) reduce the size of the racial disparity if direct standardization based on the European American distribution is accepted, c) reduce the size of the disparity yet again if our hypothesis concerning the selection effects of fetal loss in the "compromised" subpopulation is correct and included as a potential bias, but d) still result in a population with a racial disparity of 1.9 and 1.8 for females and males, respectively (Table 5), about the level of the relative risk currently observed in the raw data (2.1 for both sexes, Table 1).