Study design
The JECS gathered medical records, questionnaire results, and biological specimens from pregnant women from pregnancy through to child-rearing, with the content of the data collected depending on the stage of gestation, parturition, and childcare. This method of investigation enabled researchers to determine participants’ characteristics throughout the period in question.
We used the data obtained from the questionnaires and medical records. Pregnant women completed the first questionnaire (M-T1) during their first trimester, and the second questionnaire (M-T2) during their second and third trimesters. These respondents answered the questionnaires and returned them in person at subsequent prenatal visits or by sending them via mail to JECS Regional Centers. Where possible, the centers addressed incomplete questionnaires by performing subsequent face-to-face or telephone interviews with the respondents [24]. The participating women also recruited their partners, and there are approximately half as many registered fathers in the dataset as there are registered mothers. We limited the data used in our analysis to the mothers’ responses; this was to avoid the risk of sample selection bias that could be caused by including fathers’ responses. M-T1 includes question items concerning family characteristics, disease, tobacco use, substance use, working status, working environment, and various other topics. Moreover, M-T2 contains question items pertaining to health status, dietary habits, tobacco use, sleep quality, home appliances, substance use, working status, education history, household income, and social capital. Finally, medical records following delivery (Dr-0 m) contain details regarding the newborn baby, obstetric and delivery complications, and other topics.
From M-T1, we used the information regarding family characteristics, self-reported history of disease, and labor-force participation; from M-T2, we used the PCS and MCS scores, age, experience of stressful events, education history, household income, and level of social capital. Stressful events comprised the following, experienced over the course of the previous year: the death and/or illness of a loved one, the loss of the respondent’s and/or spouse’s job, the acquisition of a significant mortgage, divorce, moving home, and marital problems. The presence of obstetric complications was identified using information from M-T2 and Dr-0 m. In Dr-0 m, physicians reported the timing and diagnosis of obstetric complications; if the diagnosis of an obstetric complication was recorded before the respondent completed M-T2, we regarded the respondent as having experienced a pregnancy with an obstetric complication.
Outcome measures
In our statistical analysis, we considered the PCS and MCS scores as outcome variables. Specifically, the SF-8 PCS and MCS scores were calculated based on the respondents’ answers to the question items in M-T2, which includes items assessing general health, physical functioning, role-physical, bodily pain, vitality, social functioning, mental health, and role-emotional. The PCS and MCS scores measure physical and mental functioning, respectively, with higher scores indicating better health status; the validity of the Japanese translation of these question items has been verified in previous research [28].
Exposure
The main exposures are the variables measuring pregnant women’s social capital. Following the conceptualization used in previous studies, we regarded the resources embodied in an individual’s social network as individual social capital, and the resources formed by social cohesion, such as the stock of trust or reciprocal relationships within the community, as neighborhood social capital [3, 12, 29]. The M-T2 questionnaire contained question items pertaining to individual communication and to respondents’ evaluation of their trust in, and the support received from, neighbors.
Supplementary Table 1 (Additional File 1) shows the question items related to social capital. The questions concerning individual social capital (questions A to D) are similar in content to the six questions from the Social Support Questionnaire (SSQ) [30]. In answering these questions, respondents report how often and how strongly they depend on others. The variables we extracted from these questions represent social capital in terms of social networking at the individual level. The questions on neighborhood social capital (questions E and F) are similar to the “social cohesion and trust” components of a questionnaire used in the Project on Human Development in Chicago Neighborhoods (PHDCN) [31]. Questions E and F ask respondents to evaluate their degree of trust in, and the support they receive from, their neighbors. The answers to these questions capture group attributes as perceived by the individual. We considered the variables extracted from these questions to reflect social cohesion.
Participants
Research groups can access the JECS data through the JECS Program Office’s regional centers. We used the “jecs-ag-20160424” dataset, which includes questionnaire responses from mothers and fathers and medical records from physicians from the time of registration to 1 month after parturition. Before beginning the statistical analysis, we excluded some portions of the dataset, in accordance with our research criteria. More specifically, the total number of pregnancies registered in the dataset was 103,099. Women registered with multiple pregnancies within the survey period were included but, for each woman, we limited the data to that for the first pregnancy, which reduced the dataset to 97,454. Data from respondents who withdrew consent were eliminated; this resulted in a further reduction to 97,425 participants. Finally, we targeted data from only participating women who answered all question items related to the variables used in our analysis. Thus, we ultimately analyzed a dataset of responses obtained from 79,210 respondents. The study flow chart is shown in Fig. 1.
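As a quick arithmetic check on the selection steps above, the number of records excluded at each stage can be derived from the reported counts. The sketch below is illustrative only: Python is used for convenience, and the stage labels are shorthand introduced here rather than JECS terminology.

```python
# Counts reported in the text at each stage of sample selection.
# The labels are illustrative shorthand, not official JECS terms.
stages = [
    ("registered pregnancies", 103_099),
    ("first pregnancy per woman", 97_454),
    ("after consent withdrawals", 97_425),
    ("complete responses (analyzed)", 79_210),
]


def exclusions(stages):
    """Return (label, remaining, excluded_at_this_step) for each stage after the first."""
    out = []
    for (_, prev), (label, curr) in zip(stages, stages[1:]):
        out.append((label, curr, prev - curr))
    return out


for label, remaining, dropped in exclusions(stages):
    print(f"{label}: {remaining:,} remaining ({dropped:,} excluded)")
```

The largest reduction comes from the final step, which restricts the sample to women who answered every item used in the analysis.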
Statistical analysis
To estimate the causal influence of social capital on health, we adopted average treatment effect (ATE) estimation with an inverse probability weighting (IPW) estimator. The analysis must account for the fact that the richness of a person’s social capital is generally determined endogenously by individual characteristics, and it must measure the causal influence without the bias that this endogeneity introduces. In general, the ATE is the difference in average outcomes between the treated and untreated groups in an intervention. In this research, we treated individuals’ different levels of social capital as a kind of non-randomized treatment and applied ATE estimation accordingly.
The IPW estimator is useful for controlling the bias caused by non-randomized treatment. When the treatment is not randomized, a simple comparison of average outcomes between treated and untreated groups reflects both the effect of the treatment and the differences in characteristics between the groups. The IPW estimator corrects the sample selection bias caused by these differences in characteristics.
We regarded pregnant women with the lowest level of social capital as untreated samples and women with any other level of social capital as treated samples in the ATE estimation with the IPW estimator. When the level of social capital is determined by individual characteristics, those characteristics differ between treated and untreated samples. Our analysis used the IPW estimator to avoid the bias caused by these differences.
In IPW, the probabilities of assignment to the treated and untreated groups are estimated for each sample, and their reciprocals are used as weights when calculating the averages within the groups. These weights control for the effect of non-randomized assignment on the averages.
The IPW estimator requires a regression equation that models the assignment of samples to the treated and untreated groups. The dependent variable is a dichotomous variable equal to “1” for treated samples and “0” for untreated samples. The individual characteristics considered to affect the assignment are used as the independent variables. The estimation yields, for each individual, the predicted probabilities of belonging to the treated and untreated groups; the predicted probability of belonging to the treated group is generally called the “propensity score.” The inverse probability used in the IPW estimator is the reciprocal of the propensity score for treated samples and the reciprocal of one minus the propensity score for untreated samples. We then calculated the weighted averages for the treated and untreated samples, using these inverse probabilities as the weights.
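The assignment model described above can be sketched as a small logistic regression. This is a minimal illustration, not the study's implementation (the analysis was run in Stata): the single binary covariate, the toy data, and the plain gradient-ascent fit with its `lr` and `n_iter` settings are all assumptions chosen to keep the example self-contained.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_logistic(X, z, lr=0.5, n_iter=5000):
    """Fit P(z=1 | x) = sigmoid(b0 + b.x) by gradient ascent on the mean log-likelihood.

    lr and n_iter are illustrative settings, not values from the study.
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)  # beta[0] is the intercept
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, zi in zip(X, z):
            p = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))
            err = zi - p  # gradient of the log-likelihood w.r.t. the linear predictor
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity_scores(X, beta):
    """Predicted probability of belonging to the treated group for each row of X."""
    return [sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi))) for xi in X]


# Hypothetical data: one binary covariate; 1 of 4 treated when x=0, 3 of 4 when x=1.
X = [[0], [0], [0], [0], [1], [1], [1], [1]]
z = [1, 0, 0, 0, 1, 1, 1, 0]
beta = fit_logistic(X, z)
print(propensity_scores([[0], [1]], beta))  # close to 0.25 and 0.75
```

With a single binary covariate the fitted probabilities reproduce the share of treated samples within each covariate group, which makes the result easy to verify by hand.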
The mathematical specification of the IPW estimator is as follows. For individual \( i \), \( z_i \) is a dichotomous variable that indicates assignment to the treated or untreated group: \( z_i=1 \) if the sample is assigned to the treated group, and \( z_i=0 \) if it is assigned to the untreated group. \( x_i \) represents the vector of covariates for the assignment. The predicted probability that the sample is assigned to the treated group is written as \( e_i=p\left(z_i=1|x_i\right) \), which lies between 0 and 1. This probability is individual \( i \)’s propensity score. The predicted probability that the sample is assigned to the untreated group is \( 1-e_i \). These predicted probabilities are generally obtained from a logistic regression analysis. When \( y_i \) is the outcome of individual \( i \), the weighted average of the outcome variable among the treated samples, using the inverse probability, is \( \hat{E}\left({y}_1\right)=\sum \limits_{i=1}^{N_1}\frac{z_i{y}_i}{e_i}/\sum \limits_{i=1}^{N_1}\frac{z_i}{e_i} \). Moreover, the weighted average among the untreated samples is \( \hat{E}\left({y}_0\right)=\sum \limits_{i=1}^{N_0}\frac{\left(1-{z}_i\right){y}_i}{1-{e}_i}/\sum \limits_{i=1}^{N_0}\frac{\left(1-{z}_i\right)}{1-{e}_i} \). The ATE is calculated as \( \hat{E}\left({y}_1\right)-\hat{E}\left({y}_0\right) \) [32,33,34,35,36,37].
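The two weighted averages and their difference can be written out directly. The sketch below is an illustration under stated assumptions: the eight observations are hypothetical, the propensity scores are taken as known, and the sums run over all individuals, which gives the same result as the formulas above because the indicator zeroes out members of the other group.

```python
def ipw_ate(y, z, e):
    """Normalized IPW estimate of E(y1) - E(y0).

    y: outcomes, z: 1 for treated / 0 for untreated, e: propensity scores P(z=1 | x).
    """
    # Treated samples are weighted by 1/e_i, untreated samples by 1/(1 - e_i).
    num1 = sum(zi * yi / ei for yi, zi, ei in zip(y, z, e))
    den1 = sum(zi / ei for zi, ei in zip(z, e))
    num0 = sum((1 - zi) * yi / (1 - ei) for yi, zi, ei in zip(y, z, e))
    den0 = sum((1 - zi) / (1 - ei) for zi, ei in zip(z, e))
    return num1 / den1 - num0 / den0


def naive_difference(y, z):
    """Unweighted difference in mean outcome between treated and untreated samples."""
    treated = [yi for yi, zi in zip(y, z) if zi == 1]
    untreated = [yi for yi, zi in zip(y, z) if zi == 0]
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)


# Hypothetical data: two covariate groups with different treatment probabilities.
# Within each group, treatment raises the outcome by exactly 2 points.
z = [1, 0, 0, 0, 1, 1, 1, 0]
y = [52, 50, 50, 50, 42, 42, 42, 40]
e = [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]

print(naive_difference(y, z))  # -3.0: the group with lower outcomes is treated more often
print(ipw_ate(y, z, e))        # approximately 2.0 after weighting
```

In this toy example the unweighted comparison has the wrong sign, while the weighted comparison recovers the within-group effect.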
In the survey from which our data are sourced, the question items pertaining to social capital present three or more response choices, allowing for different levels of social capital. Based on their answers to these questions, the samples are divided into groups representing the various levels of social capital. Each answer is transformed into a categorical variable. We calculated the predicted probabilities through multinomial logistic regression, with the categorical variable as the dependent variable. The group with the lowest level of social capital was regarded as the untreated group; the other groups were regarded as the treated groups. The analysis considered the ATE between the group with the lowest level and each group with an intermediate level of social capital, as well as the ATE between the groups with the lowest and highest levels of social capital.
The following example illustrates the method we used to calculate the ATE, based on a question item for which an individual selects an answer from three options. Specifically, one of the items on the questionnaire regarding social capital is: “The number of friends or neighbors to whom you can talk casually about your concerns” (Supplementary Table 1, Additional File 1). The associated response options are: “none,” “one or two,” and “three or more.” This question item creates three groups with different levels of social capital. First, a multinomial logistic regression analysis is performed, which provides the probabilities that an individual selects each of the options. When the predicted probabilities for individual \( i \) are written as \( e_{i0} \) (for “none”), \( e_{i1} \) (for “one or two”), and \( e_{i2} \) (for “three or more”), \( e_{i0}+e_{i1}+e_{i2}=1 \). We then calculate the weighted averages of the outcome variable for the respective groups, using the reciprocals of the predicted probabilities as weights. The weighted average of the outcome among untreated samples who select “none” is: \( \hat{E}\left({y}_0\right)=\sum \limits_{i=1}^{N_0}\frac{y_i}{e_{i0}}/\sum \limits_{i=1}^{N_0}\frac{1}{e_{i0}} \). Similarly, the weighted averages among treated samples who select “one or two” and “three or more” are expressed as: \( \hat{E}\left({y}_1\right)=\sum \limits_{i=1}^{N_1}\frac{y_i}{e_{i1}}/\sum \limits_{i=1}^{N_1}\frac{1}{e_{i1}} \) and \( \hat{E}\left({y}_2\right)=\sum \limits_{i=1}^{N_2}\frac{y_i}{e_{i2}}/\sum \limits_{i=1}^{N_2}\frac{1}{e_{i2}} \), respectively. The ATEs are calculated as \( \hat{E}\left({y}_1\right)-\hat{E}\left({y}_0\right) \) and \( \hat{E}\left({y}_2\right)-\hat{E}\left({y}_0\right) \).
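The three-option case can be sketched in the same way. The code below is illustrative only: the outcomes and predicted probabilities are hypothetical and are supplied directly rather than estimated by multinomial logistic regression, and the level codes 0, 1, and 2 stand for “none,” “one or two,” and “three or more.”

```python
def ipw_level_means(y, level, probs):
    """Weighted mean outcome per response level.

    y: outcomes; level: chosen option per individual (0, 1, 2, ...);
    probs: per-individual predicted probabilities of each option (rows sum to 1).
    Each member of level k is weighted by the reciprocal of their probability of k.
    """
    n_levels = len(probs[0])
    means = []
    for k in range(n_levels):
        num = sum(yi / pi[k] for yi, li, pi in zip(y, level, probs) if li == k)
        den = sum(1.0 / pi[k] for li, pi in zip(level, probs) if li == k)
        means.append(num / den)
    return means


def ates_vs_lowest(means):
    """ATE of each higher level relative to the lowest level (the untreated group)."""
    return [m - means[0] for m in means[1:]]


# Hypothetical data: two covariate groups (A and B) with different answer patterns.
# Within each group, level 1 adds 2 points and level 2 adds 5 points to the outcome.
probs_a = (0.5, 0.25, 0.25)
probs_b = (0.25, 0.25, 0.5)
level = [0, 0, 1, 2, 0, 1, 2, 2]
y = [40, 40, 42, 45, 50, 52, 55, 55]
probs = [probs_a] * 4 + [probs_b] * 4

means = ipw_level_means(y, level, probs)
print(means)                  # weighted means for levels 0, 1, 2
print(ates_vs_lowest(means))  # ATEs of levels 1 and 2 versus level 0
```

Because each group is compared against the same reference level, adding further response options only adds entries to the returned list.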
To calculate the ATE with the IPW estimator, two assumptions must be satisfied; otherwise, the ATE evaluation cannot be justified [38]. The first is the overlap assumption: each sample has a positive probability of receiving each treatment level. It is assessed by comparing, between treated and untreated samples, the estimated densities of the predicted probability of assignment to the untreated group. When these two densities overlap at least to some extent, the overlap assumption is not violated. If the estimated density for the treated samples has most of its mass near 0 while that for the untreated samples has most of its mass near 1, the densities share no overlapping region, and the overlap assumption is violated [38, 39].
The second assumption is that the means of the covariates, after weighting by the IPW estimator, are balanced between treated and untreated samples. When the weighted means of the covariates for the treated samples are close to those for the untreated samples, the assumption can be considered not violated [38, 40]. Before obtaining the ATEs, the validity of both assumptions must be checked.
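A rough numeric stand-in for inspecting the density plots is to summarize the propensity scores within each group. The sketch below is an assumption-laden illustration rather than the procedure used in the study: it works with the probability of assignment to the treated group (the untreated-group probability is simply one minus this value), and the cut-off `eps` for scores considered too close to 0 or 1 is hypothetical.

```python
def overlap_summary(e, z, eps=0.05):
    """Summarize estimated propensity scores e_i = P(z_i = 1 | x_i) by observed group.

    eps is a hypothetical cut-off for scores considered too close to 0 or 1.
    """
    def describe(group):
        scores = [ei for ei, zi in zip(e, z) if zi == group]
        return {
            "min": min(scores),
            "max": max(scores),
            "n_extreme": sum(1 for s in scores if s < eps or s > 1 - eps),
        }

    untreated, treated = describe(0), describe(1)
    # Range of propensity scores that is observed in both groups, if any.
    lo = max(untreated["min"], treated["min"])
    hi = min(untreated["max"], treated["max"])
    common = (lo, hi) if lo <= hi else None
    return {"untreated": untreated, "treated": treated, "common_support": common}


# Hypothetical scores: the groups share the range 0.3 to 0.8, and one treated
# sample has a score above 1 - eps.
e = [0.2, 0.4, 0.6, 0.8, 0.3, 0.5, 0.7, 0.97]
z = [0, 0, 0, 0, 1, 1, 1, 1]
print(overlap_summary(e, z))
```

A `common_support` of `None` corresponds to the violated case described above, in which the two groups occupy disjoint regions of the propensity score.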
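The balance check can be illustrated by comparing a covariate's mean between groups before and after weighting. This is a minimal sketch with hypothetical data and known propensity scores; the study's own diagnostics may differ in form (for example, by using standardized differences).

```python
def weighted_mean(values, weights):
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def balance(x, z, e):
    """Raw and IPW-weighted difference in the mean of covariate x (treated minus untreated)."""
    # Inverse probability weights: 1/e_i for treated, 1/(1 - e_i) for untreated.
    w = [1.0 / ei if zi == 1 else 1.0 / (1.0 - ei) for zi, ei in zip(z, e)]

    def pick(vals, group):
        return [v for v, zi in zip(vals, z) if zi == group]

    x1, x0 = pick(x, 1), pick(x, 0)
    raw = sum(x1) / len(x1) - sum(x0) / len(x0)
    weighted = weighted_mean(x1, pick(w, 1)) - weighted_mean(x0, pick(w, 0))
    return raw, weighted


# Hypothetical data: a binary covariate that strongly predicts treatment.
x = [0, 0, 0, 0, 1, 1, 1, 1]
z = [1, 0, 0, 0, 1, 1, 1, 0]
e = [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]

raw, weighted = balance(x, z, e)
print(raw)       # 0.5: the covariate is unbalanced before weighting
print(weighted)  # approximately 0 after weighting
```

A weighted difference close to zero for every covariate is what the balance assumption requires.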
All analyses were performed using the Stata/MP software package, version 15.0 (StataCorp LLC, College Station, TX).