Prediction of newborn’s body mass index using nationwide multicenter ultrasound data: a machine-learning study

Background: This study introduced machine learning approaches to predict newborn’s body mass index (BMI) based on ultrasound measures and maternal/delivery information.
Methods: Data came from 3159 obstetric patients and their newborns enrolled in a multi-center retrospective study. Variable importance, the effect of a variable on model performance, was used for identifying major predictors of newborn’s BMI among ultrasound measures and maternal/delivery information. The ultrasound measures included biparietal diameter (BPD), abdominal circumference (AC) and estimated fetal weight (EFW) taken three times during the week 21 - week 35 of gestational age and once in the week 36 or later.
Results: Based on variable importance from the random forest, major predictors of newborn’s BMI were the first AC and EFW in the week 36 or later, gestational age at delivery, the first AC during the week 21 - week 35, maternal BMI at delivery, maternal weight at delivery and the first BPD in the week 36 or later. For predicting newborn’s BMI, linear regression (2.0744) and the random forest (2.1610) were better than artificial neural networks with one, two and three hidden layers (150.7100, 154.7198 and 152.5843, respectively) in the mean squared error.
Conclusions: This is the first machine-learning study with 64 clinical and sonographic markers for the prediction of newborns’ BMI. The week 36 or later is the most effective period for taking the ultrasound measures and AC and EFW are the best predictors of newborn’s BMI alongside gestational age at delivery and maternal BMI at delivery.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12884-021-03660-5.


Background
Low birthweight and childhood obesity are the leading causes of disease burden in the world. One in every seven babies was born with low birthweight (less than 2500 g) in the world for year 2015 and newborns with low birthweight are more likely to die in the first 28 days of life than their normal counterparts [1]. Likewise, 40 million children under the age of five were overweight or obese in the world for year 2016 [2] and childhood overweight or obesity is expected to have short-term and long-term consequences including asthma [3], depression [4], diabetes [5], hypertension [6], dyslipidemia [7] and cardiovascular disorders [8]. Given this global challenge, member states of the World Health Organization endorsed "No Increase in Childhood Overweight by 2025" as one of six global nutrition targets [9].
In this context, several retrospective studies of obstetric patients and their newborns endeavored to analyze newborn's weight and its major predictors [10][11][12][13]. These studies focused on ultrasound measures and maternal/delivery information and came from various regions including East Asia (Taiwan), the Middle East (Lebanon), North America (United States) and Northern Europe (Denmark). Based on the linear-regression results of these studies, the following variables were good predictors of newborn's weight: abdominal circumference or diameter, biparietal diameter, gestational age at delivery, maternal weight at delivery and maternal body mass index (BMI). However, these studies did not address (1) which predictors are more important for the prediction of newborn's weight and (2) which periods are more effective for taking the ultrasound measures and managing the delivery outcome. Also, existing literature highlights newborn's weight and ignores newborn's BMI. Yet newborn's BMI, which has a strong association with newborn's fat mass, would be a better indicator of newborn's adiposity, given that newborn's weight includes not only fat mass but also head size, lean mass and bone mass.
For this reason, this study introduces machine learning approaches to predict newborn's BMI based on ultrasound measures and maternal/delivery information. Machine learning (or data mining) methods are statistical methods to extract knowledge from large amounts of data. Specifically, the random forest and the artificial neural network (ANN) do not require unrealistic assumptions of linear regression such as ceteris paribus, "all the other variables staying constant". Also, the random forest can address (1) which predictors are more important for the prediction of newborn's BMI and (2) which periods are more effective for taking the ultrasound measures and managing the delivery outcome. Indeed, data in this study are larger than those in the previous studies: 3159 mother-baby pairs and 64 independent variables. This study attempts to demonstrate that machine learning approaches based on ultrasound measures would be a useful noninvasive tool for predicting newborn's BMI.

Participants and variables
Data came from the medical records of 3159 obstetric patients and their newborns enrolled in a multi-center retrospective study. This study was conducted during September 2019-April 2020 and 48 general hospitals participated in this study. This study was approved by the institutional review boards of the general hospitals. This process was followed by data collection, analysis and interpretation. One hundred women with singleton pregnancies were selected from each of the general hospitals. These women were Korean citizens aged 20-44 years. They gave birth during June 2015-June 2019 and their gestational age at delivery varied from 24 weeks 0 days to 41 weeks 6 days. These women did not have any disease including pre-gestational or gestational diabetes or hypertension. Newborns who were large for gestational age or had fetal growth restriction were included, whereas those with congenital anomalies were excluded.

Analysis
Five machine learning methods were applied for predicting newborn's BMI, the dependent variable of this study: linear regression, random forest and ANNs with one, two and three hidden layers [17]. Each hidden layer had three neurons in this study. Data on 3159 participants were divided into training and validation sets with a 75:25 ratio (2370 vs. 789 observations). The mean squared error (MSE), the average of the squares of errors among the 789 validation observations, was introduced as a criterion for validating the models trained. Here, errors are gaps between actual and predicted values of the dependent variable, newborn's BMI. Variable importance from the random forest, the effect of a variable on model performance, was used for identifying major predictors of newborn's BMI among ultrasound measures and maternal/delivery information. R-Studio was employed for the analysis in April 2020.
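The split and the validation criterion described above can be sketched as follows. This is a minimal illustration in Python (the study itself used R-Studio); the function names `split_indices` and `mean_squared_error`, the random seed and the toy values are assumptions made for the example and are not the authors' code.

```python
import random

def split_indices(n, valid_fraction=0.25, seed=0):
    """Randomly split range(n) into training and validation index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_valid = int(n * valid_fraction)  # 3159 * 0.25 = 789.75, truncated to 789
    return indices[n_valid:], indices[:n_valid]

def mean_squared_error(actual, predicted):
    """Average of the squared gaps between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

train_idx, valid_idx = split_indices(3159)
print(len(train_idx), len(valid_idx))  # 2370 789

# Toy values only (not study data): squared errors 0.25, 0.25 and 1.0
print(mean_squared_error([12.0, 13.0, 14.0], [12.5, 12.5, 13.0]))  # 0.5
```

Repeating the split with different seeds and averaging the resulting MSEs would mirror the repeated random splits reported in the Results.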

Results
Descriptive statistics for continuous and categorical variables in this study are summarized in Table 1. The median (Q2) values of newborn's BMI, GA36AC1, GA36EFW1 and gestational age at delivery were 12.74 kg/m², 322 mm, 2866 g and 38 weeks, respectively. Likewise, the median values of GA21AC1 and maternal BMI at delivery were 214.70 mm and 26.04 kg/m², respectively. The MSEs of the five machine learning models are shown in Table 2. The random split and the statistical analysis were repeated three times and their average MSE was calculated for each of the five statistical methods, i.e., linear regression, random forest and ANNs with one, two and three hidden layers. Linear regression and the random forest were much better models than the ANNs for predicting newborn's BMI. The average MSEs over the three runs were 2.0744 for linear regression, 2.1610 for the random forest and 150.7100, 154.7198 and 152.5843 for the ANNs with one, two and three hidden layers, respectively.
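For orientation on the BMI values in Table 1, BMI is weight in kilograms divided by the square of length in meters. The short Python sketch below applies this standard definition to a hypothetical newborn; the weight and length are illustrative values chosen for the example, not study data, and the function name `body_mass_index` is an assumption.

```python
def body_mass_index(weight_kg, length_m):
    """BMI in kg/m^2: weight divided by the square of length (or height)."""
    if weight_kg <= 0 or length_m <= 0:
        raise ValueError("weight and length must be positive")
    return weight_kg / length_m ** 2

# Hypothetical newborn: 3.2 kg and 0.50 m (illustrative values only)
print(round(body_mass_index(3.2, 0.50), 2))  # 12.8
```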
Based on variable importance from the random forest, major predictors of newborn's BMI were the first AC and EFW in the week 36 or later, gestational age at delivery, the first AC during the week 21 - week 35, maternal BMI at delivery, maternal weight at delivery and the first BPD in the week 36 or later (Table 3, Table S1 (supplementary information) and Fig. 1). The findings of linear regression present useful information about the effect of a major determinant on newborn's BMI. For example, newborn's BMI will increase by 0.0142 if GA36AC1 increases by 1 mm. Likewise, newborn's BMI will increase by 0.4142 if gestational age at delivery increases by 1 week. It is to be noted, however, that the results of linear regression are based on an unrealistic assumption of ceteris paribus, "all the other variables staying constant". For this reason, the coefficients of some predictors were statistically significant in linear regression but their importance rankings were not high from the random forest, a data-driven approach with no such assumption of "all the other variables staying constant". In this context, the findings of linear regression are to be considered as just supplementary information to the variable importance from the random forest.
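The idea of variable importance as "the effect of a variable on model performance" can be illustrated with permutation importance: shuffle one predictor at a time and record how much the MSE increases. The Python sketch below is a model-agnostic stand-in under stated assumptions. The study's random forest may have computed importance differently (for example, from node impurity), and the data, the `predict` function and all names here are synthetic and hypothetical.

```python
import random

def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def permutation_importance(predict, rows, target, n_repeats=20, seed=0):
    """Mean increase in MSE when a single column is shuffled, per column."""
    rng = random.Random(seed)
    baseline = mse(target, [predict(r) for r in rows])
    importances = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            # Rebuild each row with only this column replaced by a shuffled value
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            increases.append(mse(target, [predict(r) for r in permuted]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Synthetic data: column 0 drives the target, column 1 is pure noise
data_rng = random.Random(1)
rows = [[data_rng.uniform(0, 10), data_rng.uniform(0, 10)] for _ in range(200)]

def predict(row):
    """Hypothetical fitted model that uses only column 0."""
    return 2.0 * row[0]

target = [predict(r) + data_rng.gauss(0, 0.5) for r in rows]

importances = permutation_importance(predict, rows, target)
print(importances)  # large positive value for column 0, exactly 0.0 for column 1
```

A predictor that the model does not use receives an importance of zero, whereas shuffling an influential predictor raises the MSE sharply; ranking predictors by this increase gives an ordering of the kind reported in Table 3.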

Findings of study
This study introduced machine learning approaches to predict newborn's BMI based on ultrasound measures and maternal/delivery information. Based on variable importance from the random forest, the week 36 or later is the most effective period for taking the ultrasound measures and AC and EFW are the best predictors of newborn's BMI alongside gestational age at delivery and maternal BMI at delivery. These results are consistent with existing literature on the topic [18,19]. In terms of the MSE for predicting newborn's BMI, linear regression (2.0744) and the random forest (2.1610) were much better models than ANNs with one, two and three hidden layers (150.7100, 154.7198 and 152.5843, respectively). Indeed, the MSEs of linear regression (2.0744) and the random forest (2.1610) were smaller than the variation of newborn's BMI (2.4649). This suggests that machine learning approaches based on ultrasound measures would be a useful noninvasive tool for predicting newborn's BMI.
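The comparison between the MSEs and the variation of newborn's BMI can be made concrete under one assumption: that "variation" denotes the variance of newborn's BMI (the text does not specify this). A model that always predicts the mean has an MSE equal to the variance, so the variance serves as a mean-only baseline. The Python sketch below checks this identity on toy values and then computes the share of the baseline removed by each model from the reported figures; the function names are illustrative.

```python
from statistics import fmean, pvariance

def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Toy values (not study data): predicting the mean gives MSE == variance
actual = [11.0, 12.0, 13.0, 14.0, 15.0]
mean_only = [fmean(actual)] * len(actual)
print(mse(actual, mean_only), pvariance(actual))  # 2.0 2.0

def reduction_vs_baseline(model_mse, variance):
    """Share of the mean-only baseline MSE removed by the model."""
    return 1.0 - model_mse / variance

# Figures reported in the text; 2.4649 is ASSUMED to be a variance
reported = {"linear regression": 2.0744, "random forest": 2.1610}
for name, value in reported.items():
    print(name, round(reduction_vs_baseline(value, 2.4649), 3))
# linear regression 0.158
# random forest 0.123
```

Under this reading, both models improve on the mean-only baseline, whereas the ANN MSEs above 150 fall far short of it.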
The findings of this study are consistent with those of previous retrospective studies on the prediction of newborn's weight with clinical and sonographic markers. In a study of 238 obstetric patients in Denmark, AC and BPD during the third trimester were effective predictors of newborn's weight, given that the MSE of linear regression was similar to the variation of newborn's weight [10]. In a study of 109 pregnant women in the United States, newborn's weight had positive associations with fetal adiposity in the week 30 and gestational age at delivery [11]. In a study of 1000 obstetric patients in Lebanon, newborns with excessive maternal gestational weight gain were more likely to have macrosomia than those with normal gestational weight gain (Odds Ratio 1.888) [12]. Likewise, another study of 110 pregnant women in Taiwan reported that AC and BPD during the week 20 - week 24 are significant predictors of newborn's weight together with gestational age at delivery, maternal weight at delivery and maternal BMI at delivery [13]. However, the previous studies did not address (1) which predictors are more important for the prediction of delivery outcome and (2) which periods are more effective for taking ultrasound measures and managing delivery outcome. This study provides plausible answers to these challenging questions. Moreover, conventional studies focus on newborn's weight as a measure of newborn's adiposity but the findings of this study suggest that newborn's BMI would be a good alternative. Firstly, the United States Centers for Disease Control and Prevention recommends the BMI-for-age chart as a screening tool for the overweight and underweight of boys and girls aged 2 to 20 years [20]. Two major rationales behind this recommendation state that (1) the BMI is a more consistent indicator across different generations than weight and (2) the BMI contains the dimensions (and strengths) of weight and height measures at the same time.
Secondly, it is reported that newborn's BMI has stronger correlations with magnetic-resonance-imaging measures of newborn's fat mass than do newborn's other anthropometrics [21]. Thirdly, infant's BMI is expected to have a stronger correlation with early childhood obesity than infant's weight-for-length. Based on the medical records of 73,949 full-term infants from a large pediatric network, 47% of infants with BMI ≥ 97.7th percentile at 2 months (vs. 29% of infants with weight-for-length ≥ 97.7th percentile at 2 months) were obese at 2 years [22]. Fourthly, using newborn's BMI (instead of newborn's weight) would engender greater stability for statistical analysis. For example, the estimations of ANNs with two layers did not converge when newborn's weight (instead of newborn's BMI) was the dependent variable in this study.

Limitations of study
This study had some limitations. Firstly, for the calculation of EFW, one general hospital used a different formula. Using the same formula for EFW is expected to improve model performance in future studies. However, the results of this study did not change after removing the data based on the different formula. Secondly, this study did not consider possible mediating effects among variables. Thirdly, it would be a good topic for future research to develop a BMI guideline for newborn's adiposity. According to an international guideline, adult's categories of underweight, normal, overweight and obesity are defined as BMIs smaller than 18.5 kg/m², within 18.5-25.0 kg/m², within 25.0-30.0 kg/m² and equal to/greater than 30.0 kg/m², respectively [23]. An equivalent guideline for newborns needs to be developed based on comprehensive and systematic analysis. Fourthly, this study did not consider socioeconomic factors (education, income) and other possible obstetric variables such as periodontitis, upper gastrointestinal tract symptoms, gastroesophageal reflux disease, Helicobacter pylori, pelvic inflammatory disease history, diabetes mellitus (type I, type II, gestational), hypertension (chronic, gestational) and medication history (e.g., progesterone, calcium channel blocker, nitrate, tricyclic antidepressant, benzodiazepine and sleeping pills). Recent studies on preterm birth reported that these factors would affect the delivery outcome [24,25] and it would be an important contribution to extend this study based on these new variables. Fifthly, further analysis of specific patients, e.g., symptomatic vs. asymptomatic, single vs. multiple gestation, would offer more insight on this line of research with more detailed clinical implications. Sixthly, this study did not consider various options of parameter tuning for the ANN. Its performance was worse than those of linear regression and the random forest in this study.
Finding optimal parameters for the ANN is reported to be a challenging task and it will be a good topic for future research. Seventhly, the focus of this study was to find important predictors of newborn's BMI. Exploring possible mechanisms between each important predictor and newborn's BMI is expected to make a good contribution to this line of research. Finally, the values of the following variables outside 1.5 × (interquartile range), so-called "outliers", were deleted in this study: maternal weight at delivery, GA11CRL1, GA20BPD1, GA20FL1, GA21BPD1, GA21FL1, GA21BPD2, GA21FL2, GA21BPD3 and GA21FL3. It was beyond the scope of this study to evaluate other optimal strategies to handle outliers in the data.
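The outlier rule mentioned above can be sketched as follows, assuming the conventional reading of the 1.5 × IQR criterion: a value is dropped when it lies below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR. The quartile method, the function names and the maternal weights in this Python example are assumptions for illustration; the study does not state how quartiles were computed.

```python
from statistics import quantiles

def iqr_bounds(values, k=1.5):
    """Lower and upper fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread

def drop_outliers(values, k=1.5):
    """Keep only the values that fall inside the fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]

# Hypothetical maternal weights at delivery in kg (not study data)
weights = [58, 61, 63, 64, 66, 67, 69, 71, 74, 140]
print(iqr_bounds(weights))     # (52.375, 81.375)
print(drop_outliers(weights))  # [58, 61, 63, 64, 66, 67, 69, 71, 74]
```

Because the fences depend on the quartile convention, borderline values may be kept under one method and dropped under another, which is one reason alternative outlier strategies merit separate evaluation.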

Conclusions of study
The week 36 or later is the most effective period for taking the ultrasound measures and AC and EFW are the best predictors of newborn's BMI alongside gestational age at delivery and maternal BMI at delivery. Machine learning approaches based on ultrasound measures would be a useful noninvasive tool for predicting newborn's BMI.