Building a predictive model of low birth weight in low- and middle-income countries: a prospective cohort study

Abstract

Background

Low birth weight (LBW, < 2500 g) infants are at significant risk for death and disability. Improving outcomes for LBW infants requires access to advanced neonatal care, which is a limited resource in low- and middle-income countries (LMICs). Predictive modeling might be useful in LMICs to identify mothers at high risk of delivering a LBW infant and to facilitate their referral to centers capable of treating these infants.

Methods

We developed predictive models for LBW using the NICHD Global Network for Women’s and Children’s Health Research Maternal and Newborn Health Registry. This registry enrolled pregnant women from research sites in the Democratic Republic of the Congo, Zambia, Kenya, Guatemala, India (2 sites: Belagavi, Nagpur), Pakistan, and Bangladesh between January 2017 and December 2020. We tested five predictive models: decision tree, random forest, logistic regression, K-nearest neighbor and support vector machine.

Results

We report a rate of LBW of 13.8% among the eight Global Network sites from 2017–2020, ranging from 3.8% (Kenya) to approximately 20% (in each Asian site). Of the five models tested, the logistic regression model performed best with an area under the curve of 0.72, an accuracy of 61% and a recall of 72%. All of the top performing models identified clinical site, maternal weight, hypertensive disorders, severe antepartum hemorrhage and antenatal care as key variables in predicting LBW.

Conclusions

Predictive modeling can identify women at high risk for delivering a LBW infant with good sensitivity using clinical variables available prior to delivery in LMICs. Such modeling is the first step in the development of a clinical decision support tool to assist providers in decision-making regarding referral of these women prior to delivery. Consistent referral of women at high risk for delivering a LBW infant could have extensive public health consequences in LMICs by directing limited resources for advanced neonatal care to the infants at highest risk.

Background

More than 20 million low birth weight (LBW, < 2500 g) infants are born annually [1]. LBW infants are at increased risk for mortality and serious neurodevelopmental outcomes, making LBW a major global public health problem [2]. In addition to mortality risks, LBW infants often need advanced medical care after birth to treat problems associated with prematurity (e.g., respiratory distress syndrome, infections, feeding problems) or problems associated with being born small for gestational age (SGA; e.g., hypoglycemia, hypothermia, poor postnatal growth). Since few centers in low- and middle-income countries (LMICs) have the ability to provide advanced neonatal care, allocation of advanced care towards LBW infants is a critical part of improving health outcomes for this population.

Identification of pregnant women at risk for the delivery of LBW infants prior to birth could facilitate referral of these women to delivery centers with advanced neonatal care, thereby reducing neonatal mortality related to LBW. Machine learning, or predictive modeling, has been successful at identifying high-risk groups for certain health outcomes [3,4,5,6] and therefore could be a useful tool to risk-stratify pregnant women in low-resource settings [7]. If a machine learning tool could reliably identify women with pregnancies at high risk of LBW, it could be delivered through a user-friendly interface to help providers make decisions about referral of these women to delivery centers with advanced neonatal care. Prior studies have investigated the use of machine learning techniques for the prediction of birth weight, but the majority have used small datasets ranging from fewer than 100 to 50,000 women [8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27]. Predictive modeling tools based on high-quality data from larger datasets are needed to more accurately predict LBW in low-resource settings.

The Eunice Kennedy Shriver National Institute of Child Health and Human Development Global Network for Women’s and Children’s Health Research (GN) maintains a Maternal and Newborn Health Registry (MNHR) documenting pregnancy characteristics and outcomes for over 30,000 mother/infant dyads annually in seven LMICs. This large, high-quality dataset is a unique resource to investigate predictive models for LBW in low-resource settings. For this study, our goal was to determine pregnancy characteristics associated with greater probability of delivering LBW infants using the GN MNHR dataset. We also aimed to develop and compare the performance of five predictive models to identify LBW infants using the MNHR data. Understanding these predictors may assist in identifying who will need additional care at delivery, facilitating timely advanced care for LBW infants and thereby reducing long-term morbidity and mortality. We hypothesized that predictive model analysis would identify previously known prenatal predictors associated with LBW (e.g., infection and hypertension/eclampsia) and new predictors not previously considered or fully explored in prior analyses.

Methods

We used the GN MNHR dataset for this study, which includes data from eight GN research sites in seven LMICs (the Democratic Republic of the Congo, Zambia, Kenya, Guatemala, India [2 sites: Belagavi, Nagpur], Pakistan, and Bangladesh) [28]. The MNHR contains maternal, pregnancy and delivery characteristics collected by trained research staff using medical record abstraction and in-person interviews with pregnant women. In the MNHR, birthweights are measured on all livebirths and stillbirths, and fresh stillbirths are defined as having no signs of maceration, such as skin or soft tissue changes including skin sloughing or discoloration. The MNHR is approved by appropriate institutional review boards or research ethics committees at each participating institution. The MNHR undergoes routine quality assurance processes [29] and is registered as trial number NCT01073475 in clinicaltrials.gov.

For this study, we included singleton livebirths and fresh stillbirths in the GN MNHR who were not lost to follow-up prior to delivery and delivered at or after 20 weeks (in keeping with the MNHR definition of stillbirth occurring at or after 20 weeks) between January 2017 and December 2020 [28, 30]. We excluded maternal deaths prior to delivery, miscarriages, medical terminations of pregnancy (MTP), macerated stillbirths or stillbirths of unknown type, unknown birth outcomes, multiples, births with missing LBW status and births with any predictive model covariate missing.

Outcome and variable definitions

Our primary outcome was LBW, defined as birth weight < 2500 g by measured weight when available, or estimated weight. We selected LBW as a surrogate for preterm birth given the lack of reliable gestational age dating for the total birth population. We evaluated candidate predictors from the variables that are collected in the MNHR, focusing on characteristics that do not require the use of lab tests or ultrasound, which may not be available in all resource-poor settings. We selected characteristics or complications that were present prior to the time of delivery since our focus was to build a predictive model that could direct care prior to delivery. We evaluated maternal characteristics of age (< 20 years old, 20–35 years old, > 35 years old), education (no formal education, primary/secondary education, University +), parity (0, 1, 2, 3, 4 +), height, maternal weight, socioeconomic status (SES) score (< 34, 34–65, 66 +, where lower scores indicate lower household assets and SES) [31] and previous livebirth (yes, no, no previous pregnancy lasting 20 + weeks). Of note, SES data collection in the MNHR was initiated in 2017 but site initiation varied throughout the year. We also evaluated pregnancy characteristics including the number of antenatal care visits (0, 1–3, 4 +), use of iron supplementation, use of vitamin or calcium supplementation, hypertensive disorders (systolic blood pressure ≥ 140 mmHg and diastolic blood pressure ≥ 90 mmHg on two or more occasions after 20 weeks of pregnancy, proteinuria, or generalized seizures in the setting of preeclampsia), severe antepartum hemorrhage (vaginal bleeding after 22 weeks of pregnancy and before the onset of labor that is > 1,000 mL or heavy enough to soak a pad or cloth in less than five minutes), and severe infection during pregnancy (serious illness with symptoms that can include fever, chills, rapid breathing, rapid heart rate, confusion, disorientation, hypotension, and cold, clammy skin).

Analytic methods

We completed exploratory data analysis of study outcomes, maternal characteristics, and pregnancy characteristics, looking for predictors that were highly correlated with each other, had no or little variation or were missing for many subjects. We generated descriptive statistics of frequencies for categorical variables and count, mean, and standard deviation for continuous variables.

We prepared the data for the models by excluding participants missing one or more of the predictors. The binary outcome for the predictive models was LBW. The variables described above were included as predictors. We prepared data and descriptive tables using SAS 9.4 and ran predictive models using Scikit-learn in Python 3. We picked Belagavi as the reference site because this site generally enrolled women earlier than the other sites, thus representing the ‘best case scenario’ for having information available early in pregnancy. We picked parity of one as the reference category since nulliparous women are at greater risk for poor outcomes and, of the remaining parity groups, parity = 1 had the largest sample size.

We developed and tested five predictive models: decision tree and random forest (both tree-based models), logistic regression, K-nearest neighbors and support vector machines. The decision tree model, based on classification and regression trees (CART), splits the subjects into consecutive sub-groups based on the most important predictors at each node of the tree. The random forest model reduces the overfitting that may happen with a single decision tree by combining multiple trees. For both tree-based models, we used the Gini impurity criterion to determine which predictors yielded the most information for each classification split. Thirdly, we employed a regularized logistic regression model, which used an L2 (ridge) regularization penalty to avoid overfitting. For our fourth model, we used K-nearest neighbors with weights set by distance. Finally, our fifth model type was the support vector machine, for which we ran linear, degree 2 polynomial and radial basis kernel functions. For all models except K-nearest neighbors, we addressed the class imbalance in the study outcome by using balanced class weights.
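As a rough illustration of the five model families described above, the following sketch shows how they might be declared in Scikit-learn. It is not the study's code: the function name `build_models`, the `random_state` values and `max_iter` are assumptions made for the example, and the tuned hyperparameters are not reproduced here.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models(random_state=0):
    """Return untrained classifiers mirroring the model types in the text.

    Hypothetical helper: hyperparameter values are placeholders, not the
    values selected in the study.
    """
    return {
        # Tree-based models split on Gini impurity; balanced class weights
        # up-weight the minority (LBW) class.
        "decision_tree": DecisionTreeClassifier(
            criterion="gini", class_weight="balanced", random_state=random_state
        ),
        "random_forest": RandomForestClassifier(
            criterion="gini", class_weight="balanced", random_state=random_state
        ),
        # LogisticRegression applies an L2 (ridge) penalty by default.
        "logistic_regression": LogisticRegression(
            class_weight="balanced", max_iter=1000
        ),
        # KNeighborsClassifier has no class_weight option; neighbors are
        # weighted by inverse distance instead.
        "knn": KNeighborsClassifier(weights="distance"),
        # Three kernels for the support vector machine.
        "svm_linear": SVC(kernel="linear", class_weight="balanced"),
        "svm_poly2": SVC(kernel="poly", degree=2, class_weight="balanced"),
        "svm_rbf": SVC(kernel="rbf", class_weight="balanced"),
    }


models = build_models()
for name, estimator in models.items():
    print(name, type(estimator).__name__)
```

Keeping the estimators in a dictionary makes it straightforward to loop over them with a shared training and evaluation routine.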

To develop the models, we split the data into a training dataset (75% of available data) and a test dataset (25% of available data). Hyperparameters were tuned using tenfold grid-search cross-validation with scoring = 'roc_auc'. In addition to hyperparameter tuning, we varied the cut point for the probability used to classify an outcome as LBW for the logistic regression model from 0.1 to 0.9. We trained the models on the training data and validated the models on the test data. To validate the models, we generated predictive accuracy measures, including the area under the curve (AUC) and receiver operator characteristic (ROC) curves. We calculated precision (positive predictive value), recall (sensitivity), and f1 scores using the Scikit-learn classification_report() function. To further evaluate model performance, we generated calibration curves. We estimated permutation-based feature importance using the Scikit-learn permutation_importance() function. This method randomly shuffles each feature and computes the change in the model’s performance. The features whose shuffling degrades performance the most are the most important ones. Additionally, we created partial dependency plots of the probability of LBW based on the predictors for the models that performed the best.
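The workflow described above can be sketched end to end on synthetic data. This is a minimal illustration under stated assumptions, not the study's pipeline: the data come from `make_classification` with roughly 14% positives, the split is stratified (the text does not say whether it was), and the grid over `C` is invented for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    classification_report,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the registry data (assumption): about 14% positives.
X, y = make_classification(
    n_samples=2000, n_features=8, n_informative=4,
    weights=[0.86, 0.14], random_state=0,
)

# 75% / 25% train/test split; stratification is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Tenfold grid-search cross-validation scored by ROC AUC.
search = GridSearchCV(
    LogisticRegression(class_weight="balanced", max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},  # illustrative grid
    scoring="roc_auc",
    cv=10,
)
search.fit(X_train, y_train)
model = search.best_estimator_

# Validation on the held-out test data.
proba = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba)
print(f"test AUC: {auc:.2f}")
print(classification_report(y_test, model.predict(X_test)))

# Vary the probability cut point from 0.1 to 0.9.
for cut in np.round(np.arange(0.1, 1.0, 0.1), 1):
    pred = (proba >= cut).astype(int)
    rec = recall_score(y_test, pred)
    prec = precision_score(y_test, pred, zero_division=0)
    print(f"cut={cut:.1f} recall={rec:.2f} precision={prec:.2f}")

# Permutation importance: shuffle each feature and measure the AUC drop.
importance = permutation_importance(
    model, X_test, y_test, scoring="roc_auc", n_repeats=10, random_state=0
)
ranking = importance.importances_mean.argsort()[::-1]
print("features ranked by importance:", ranking.tolist())
```

Lowering the cut point trades precision for recall, which is the relevant trade-off when a model is used to screen for referral.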

Results

Of the 179,953 women screened in the MNHR from January 2017 to December 2020, 145,206 (80.7%) women were eligible, consented, were not lost to follow-up prior to delivery, and delivered a singleton fresh stillbirth or livebirth with known LBW status and non-missing covariates (Fig. 1, Table 1). The most common reasons for exclusion were miscarriage and MTP in the Asian sites (total of 17.7% of Belagavi, 7.1% of Nagpur, 9.4% of Pakistani and 4% of Bangladeshi deliveries). The most common missing covariates were SES (11.2% missing overall) and maternal height (4% of subjects in Kenya) (data not shown). Other exclusions (maternal death prior to delivery, macerated stillbirth or unknown stillbirth type, unknown birth outcome, multiples, and LBW status missing) occurred in < 2% of deliveries at each site. Of the analysis subset, 2,268 were fresh stillbirths and 142,938 were livebirths; 13.8% were LBW (of which 98.9% were measured weights). The Asian sites had the highest LBW rates of 19.2% or more, and Zambia and Kenya had the lowest rates at 6.4% and 3.8%, respectively.

Fig. 1 CONSORT diagram depicting reasons for exclusion and outcome for analysis population

Table 1 Screening, exclusion and outcome characteristics for analysis population from 2017 – 2020

We present the maternal and pregnancy characteristics in Table 2 for the analysis population. The majority of women included were 20–35 years old. Other maternal and pregnancy characteristics varied by site. Maternal and pregnancy characteristics by LBW status are provided in Supplement Table 1. Mothers of LBW infants were shorter (152 vs 155 cm), weighed less (49 vs 54 kg), and were less likely to have taken calcium or vitamin supplementation (14 vs 83%) than mothers who did not have a LBW infant. Mothers of LBW infants were also more likely to experience a complication of pregnancy, such as a hypertensive disorder (5.7 vs 1.7%), severe antepartum hemorrhage (2.3 vs 0.4%), severe infection of pregnancy (2.9 vs 1.1%), or fresh stillbirth (6.4 vs 0.8%).

Table 2 Maternal and pregnancy characteristics by site for the analysis population

The Pearson correlations for the variables included in the models were calculated (data not shown). Related variables include parity and previous livebirth (r = −0.72), SES and clinical site (r = 0.54), parity and age (r = 0.5), and previous livebirth and age (r = −0.45). The distributions of the outcome and predictors in the training dataset (N = 108,904) and test dataset (N = 36,302) were similar (data not shown).

The predictive accuracy measures for the five models are provided in Table 3. The logistic regression model performed slightly better than the other models with an AUC score of 0.72 and an accuracy score of 61%. The positive predictive value (model precision) for logistic regression was 22% and the sensitivity (model recall) was 72%. The harmonic mean of precision and recall (model f1-score) was 34%. For logistic regression, the default cut point value of 0.5 yielded AUC and accuracy scores that were as good as those for the other cut point values (data not shown). The support vector machine linear kernel model performed similarly to the logistic regression and tree-based models. The polynomial and radial basis function support vector machines performed similarly to the linear support vector machine (data not shown). The k-nearest neighbors model results were different, with an AUC value of 0.58 and an accuracy score of 83%. Although the accuracy for this model was higher, the positive predictive value and sensitivity for this model were only 20% and 7%, respectively. The Receiver Operator Characteristic (ROC) curves for the predictive models are provided in Fig. 2.
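As a quick check on how the f1-score relates to the reported precision and recall, the harmonic mean can be computed directly. The inputs below are the rounded values from the text, so the result is approximate.

```python
def f1_score_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Rounded values reported for the logistic regression model.
precision = 0.22
recall = 0.72

f1 = f1_score_from(precision, recall)
print(f"f1 = {f1:.2f}")  # prints "f1 = 0.34"
```

Because the harmonic mean is pulled toward the smaller of the two values, the f1-score sits much closer to the precision of 22% than to the recall of 72%.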

Table 3 Predictive accuracy measures by predictive model

Fig. 2 Receiver operator characteristic (ROC) curves for the predictive models

Figure 3 depicts calibration curves for each of the predictive models. The Y-axis is the true fraction of newborns who are low birth weight (LBW) and the X-axis is the model-predicted probability of being LBW. The worst performing model was k-nearest neighbors; the near-horizontal line for this curve indicates the model will predict a consistent LBW percentage of around 15% regardless of the true incidence of LBW. The best performing model was the linear support vector machine, which predicts nearly perfectly for the lowest incidence rates and begins to diverge around 40% incidence.

Fig. 3 Calibration curves for the predictive models
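For readers unfamiliar with calibration curves, the sketch below shows one way the plotted quantities could be computed with Scikit-learn's `calibration_curve`. It uses synthetic data and an unweighted logistic regression, both of which are assumptions for illustration; the number of bins and the quantile binning strategy are likewise arbitrary choices, not the study's settings.

```python
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): about 14% positives.
X, y = make_classification(
    n_samples=4000, n_features=8, n_informative=4,
    weights=[0.86, 0.14], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]

# For each bin of predicted probability, compare the mean prediction (x-axis)
# with the observed fraction of positives (y-axis).
frac_positive, mean_predicted = calibration_curve(
    y_test, proba, n_bins=10, strategy="quantile"
)

for predicted, observed in zip(mean_predicted, frac_positive):
    print(f"predicted={predicted:.2f} observed={observed:.2f}")
```

A well-calibrated model produces points near the diagonal, where predicted and observed values agree; a near-horizontal curve indicates that the observed fraction barely changes as the predicted probability changes.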

Figure 4 illustrates the permutation-based feature importance for the logistic regression model, and partial dependency plots provide the directionality of these risk factors. For the logistic model, the most important variable relative to the other variables in predicting LBW was clinical site. The partial dependence plots show a higher probability of LBW for those not in the African sites, which is consistent with the descriptive statistics showing that the African sites had LBW rates a third of those of the Asian sites. Following clinical site, variables in order of importance that result in higher probability of LBW were lower maternal weight, 0–3 antenatal care visits, hypertensive disorder, severe antepartum hemorrhage, severe infection during pregnancy, and lower maternal height. The random forest and linear support vector machine models also found similar variables to be the most important in predicting LBW. The most important variables for each model are provided in Table 3. Table 4 provides regression coefficients and the model intercept for the logistic regression model.

Fig. 4 Permutation-based feature importance for the logistic regression model. The permutation-based importance was computed using the Scikit-learn permutation_importance function. This method randomly shuffles each feature and computes the change in the model’s performance. The features whose shuffling degrades performance the most are the most important ones. The score shows how the variable compares to other variables in the model. Thus, a high score for any level of a categorical variable indicates the entire variable is important. For clinical sites, the reference group is Belagavi, India. For maternal age, the reference group is 20–35 years. For maternal education, the reference group is University +. For parity, the reference group is parity of 1. For socio-economic status, the reference group is 66 +. For previous livebirth, yes is the reference group. For antenatal care visits, the reference group is 4 + visits

Table 4 Logistic regression intercept and model coefficients for predictive model of low birth weight

Discussion

We report a rate of LBW of 13.8% among the eight GN sites from 2017–2020, ranging from 3.8% (Kenya) to approximately 20% (in each Asian site). We found that mothers of LBW infants were more likely to experience a complication of pregnancy, such as hypertensive disorder, severe antepartum hemorrhage, severe infection of pregnancy, or fresh stillbirth. We used five predictive modeling strategies to identify pregnancy characteristics that predict the outcome of LBW infants. Of the five models tested, the logistic regression model performed the best with an AUC of 0.72 and an accuracy of 61%. All of the top performing models identified clinical site, maternal weight, antenatal care, hypertensive disorders, and severe antepartum hemorrhage as key variables in predicting LBW.

Our logistic regression model had reasonable performance in predicting LBW using maternal and pregnancy characteristics available prior to delivery. If we had created a model that predicted every outcome to be non-LBW, our accuracy rate would be 86%, given the 14% incidence of LBW in our sample; however, the recall of such a model would be 0. The recall, or sensitivity (proportion of true positives correctly identified), of our logistic regression model was 0.72. Since this model is intended to identify women with high-risk pregnancies for referral, a preferable model is one that errs on the side of over-identification (more false positives, lower specificity) rather than under-identification (more false negatives, lower sensitivity). Our logistic regression model also had a precision, or positive predictive value (proportion of positives reported that are true positives), of 0.22. Lower precision means a larger number of false positives, that is, women incorrectly identified as high risk for LBW who go on to deliver a non-LBW infant. While over-predicting women who are at high risk for delivering a LBW infant could put strain on an under-resourced health system, this is a reasonable allowance for a screening test to direct women to increased surveillance.
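The arithmetic behind this comparison can be made explicit. The sketch below takes the rounded prevalence, recall and precision reported in the text and derives the implied confusion-matrix proportions; the helper name `confusion_rates` is hypothetical, and because the inputs are rounded, the derived quantities are approximations rather than reported results.

```python
def confusion_rates(prevalence, recall, precision):
    """Derive confusion-matrix cells as fractions of all births.

    Hypothetical helper for illustration; inputs are assumed to be in (0, 1].
    """
    tp = recall * prevalence                 # LBW infants correctly flagged
    fn = prevalence - tp                     # LBW infants missed
    fp = tp * (1 - precision) / precision    # non-LBW infants flagged
    tn = (1 - prevalence) - fp               # non-LBW infants not flagged
    return tp, fp, fn, tn


prevalence = 0.138  # LBW rate reported in the text

# A model that labels every birth as non-LBW.
baseline_accuracy = 1 - prevalence
baseline_recall = 0.0

# The logistic regression model (rounded recall and precision from the text).
tp, fp, fn, tn = confusion_rates(prevalence, recall=0.72, precision=0.22)
model_accuracy = tp + tn
share_flagged = tp + fp

print(f"all-negative baseline: accuracy={baseline_accuracy:.3f}, recall={baseline_recall:.2f}")
print(f"logistic regression:   accuracy={model_accuracy:.2f}")
print(f"share of women flagged as high risk: {share_flagged:.2f}")
```

The implied accuracy of about 0.61 agrees with the reported 61%, which illustrates why accuracy alone is a poor guide under class imbalance: the uninformative baseline scores higher on accuracy while identifying no LBW infants.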

A comparison of our results to those from prior studies illustrates the importance of studying critical variables in various populations/datasets and comparing across models. Our logistic regression model performed similarly to a predictive model for LBW from a case–control study in North India with 500 neonates, which included different factors associated with LBW. That study identified inadequate maternal weight gain, inadequate maternal protein intake, prior preterm infant, prior LBW infant, anemia and passive smoking as factors significantly associated with LBW [26]. Their predictive model had a sensitivity of 72% and specificity of 64%. Another model, using the Bangladeshi Demographic and Health Survey data, identified alive child, education, height, region, twin child and wealth index as significant risk factors for LBW [27]. This logistic regression-based classifier had an AUC of 0.59 and accuracy of 87.6%. A United Arab Emirates study from a dataset of 821 women evaluated 30 machine learning algorithms for LBW classification, and found that logistic regression with SMOTE oversampling techniques achieved an accuracy of 90.24% and recall of 90.2%, with critical variables of diabetes, hypertension, and gestational age [8]. Developing the best predictive model may require expanding data collection to include additional relevant predictors from a variety of prospective modeling studies, which would lead to better overall model performance.

In low-resource settings where prenatal ultrasound is infrequently available to evaluate fetal weight, identification of LBW in advance of delivery using predictive modeling could have a substantial impact on care. Our top performing models identified a consistent cluster of variables available prior to delivery as important predictors of delivering a LBW infant, including low maternal weight, hypertensive disorder and severe antepartum hemorrhage. Maternal weight, hypertensive disorder, and severe antepartum hemorrhage are detectable at a time when referral is still feasible, and thus could be incorporated into a clinical tool to predict LBW. In particular, maternal malnutrition is a major, potentially modifiable predictor of LBW identifiable early in pregnancy. Limited antenatal care was also identified as a risk factor, but this variable is confounded by the higher number of preterm infants that are LBW, since preterm delivery truncates the usual number of antenatal care visits. We suspect that improving data collection in these key domains could improve the reliability of the predictive model. For example, inclusion of additional clinical information, such as specific maternal blood pressure measurements, might improve the accuracy of the predictive tools and thereby enhance the clinical utility of the predictive models. Our predictive modeling study is the first step in the development of a clinical tool to support decisions regarding referral of pregnancies at high risk for LBW in low-resource settings. While our study did not identify novel predictors of LBW, a clinical decision support tool incorporating these results could enhance care by standardizing referral decisions related to the anticipated delivery of a LBW infant.

Our study also identified clinical site as a consistent predictor of LBW across our top performing models. It is important to recognize that sites are not necessarily reflective of care across a country; future studies could consider analysis by geographic clusters with similar LBW rates as an alternative approach. Ultimately, the influence of site on prediction of LBW suggests that clinical tools to predict LBW should be developed within the site in which they will be used. Our analysis provides a rubric for the development of similar tools in new sites, identifying an important set of predictors for collection that are not related to site, and indicating that a traditional logistic model is sufficient for analysis. Given the importance of site in the model, additional research could also focus on understanding how site differences are related to measurable characteristics, with replication of modeling with these new characteristics to improve predictive performance and reduce the importance of site identifiers in the model.

Our study has several notable strengths. We used high-quality, robust, prospectively collected research data from the NICHD GN MNHR. This unique, population-based dataset contains maternal characteristics, pregnancy characteristics and delivery outcomes collected for a large number of women in seven different LMICs in Latin America, Africa and Asia. Due to the paucity of detailed health records in LMICs, the MNHR is an exceptional resource with which to build a predictive model. We assessed and compared the performance of five different predictive models using independent training and test data. The side-by-side comparison showed that the logistic regression model performed similarly to the random forest and linear support vector machine models, which is encouraging since logistic regression models are widely used and less complex. However, our study also had some limitations. We were limited by the data collected in the MNHR to build the predictive models. We had missing data for socio-economic status and maternal height (primarily Kenya) in early 2017. Maternal weight and clinical site were both predictors of LBW but were related to each other, with lower average weights in the Asian sites than in Guatemala and the African sites. While using BMI instead of weight might account for some of the difference in weight across sites, we chose to maintain maternal height and weight as separate variables in our modeling since the MNHR includes sites where stunting or underweight are serious issues. We did not have information from clinical records such as maternal blood pressure, fundal height, or other features that might have improved the precision of our model. Despite these limitations, we believe that the variables collected approximate the typical data that might be readily available in a low-resource area, where clinical variables might be difficult to obtain. We limited the analysis to five different model types; other models such as extreme gradient boosting may have performed better.

Conclusion

We identified several predictive modeling strategies that risk-stratify women in LMICs based on their risk of delivering a LBW infant using clinical variables readily available prior to delivery in low-resource settings. Our creation of these predictive models is an important first step in the development of a clinical decision support tool to prompt early referral of women at high-risk of delivering a LBW infant in LMICs. Such a clinical tool could facilitate standard referral of these women before delivery, directing the limited resource of advanced neonatal care to infants at highest risk. Timely, advanced care for LBW infants could reduce mortality and serious morbidity of these infants.

Availability of data and materials

The dataset analyzed for this study is available at the National Institute of Child Health and Human Development Data and Specimen Hub (https://doi.org/10.57982/t880-rf36).

Abbreviations

LBW: Low birth weight

NICHD: National Institute of Child Health and Human Development

SGA: Small for gestational age

LMICs: Low- and middle-income countries

GN: Eunice Kennedy Shriver National Institute of Child Health and Human Development Global Network for Women’s and Children’s Health Research

MNHR: Maternal and Newborn Health Registry

MTP: Medical termination of pregnancy

SES: Socioeconomic status

CART: Classification and regression trees

AUC: Area under the curve

ROC: Receiver operator characteristic

References

  1. Blencowe H, Krasevec J, de Onis M, Black RE, An X, Stevens GA, et al. National, regional, and worldwide estimates of low birthweight in 2015, with trends from 2000: a systematic analysis. Lancet Glob Health. 2019;7(7):e849–60.

  2. Lee AC, Kozuki N, Cousens S, Stevens GA, Blencowe H, Silveira MF, et al. Estimates of burden and consequences of infants born small for gestational age in low and middle income countries with INTERGROWTH-21(st) standard: analysis of CHERG datasets. BMJ (Clin Res Ed). 2017;358: j3677.

  3. Sidey-Gibbons JAM, Sidey-Gibbons CJ. Machine learning in medicine: a practical introduction. BMC Med Res Methodol. 2019;19(1):64.

  4. Malacova E, Tippaya S, Bailey HD, Chai K, Farrant BM, Gebremedhin AT, et al. Stillbirth risk prediction using machine learning for a large cohort of births from Western Australia, 1980–2015. Sci Rep. 2020;10(1):5354.

  5. Venkatesh KK, Strauss RA, Grotegut CA, Heine RP, Chescheir NC, Stringer JSA, et al. Machine learning and statistical models to predict postpartum hemorrhage. Obstet Gynecol. 2020;135(4):935–44.

  6. Shukla VV, Eggleston B, Ambalavanan N, McClure EM, Mwenechanya M, Chomba E, et al. Predictive modeling for perinatal mortality in resource-limited settings. JAMA Netw Open. 2020;3(11): e2026750.

  7. Sheikhtaheri A, Zarkesh MR, Moradi R, Kermani F. Prediction of neonatal deaths in NICUs: development and validation of machine learning models. BMC Med Inform Decis Mak. 2021;21(1):131.

  8. Khan W, Zaki N, Masud MM, Ahmad A, Ali L, Ali N, et al. Infant birth weight estimation and low birth weight classification in United Arab Emirates using machine learning algorithms. Sci Rep. 2022;12(1):12110.

  9. Desiani A, Primartha R, Arhami M, Orsalan O. Naive Bayes classifier for infant weight prediction of hypertension mother. J Phys Conference Ser. 2019;1282:012005.

  10. Li J, Liu L, Sun J, Mo H, Yang J-J, Chen S, et al. Comparison of different machine learning approaches to predict small for gestational age infants. IEEE Transact Big Data. 2016;6(2):334–46.

  11. Akhtar F, Li J, Azeem M, Chen S, Pan H, Wang Q, et al. Effective large for gestational age prediction using machine learning techniques with monitoring biochemical indicators. J Supercomput. 2020;76(8):6219–37.

  12. Hussain Z, Borah MD. Birth weight prediction of new born baby with application of machine learning techniques on features of mother. J Stat Manag Syst. 2020;23(6):1079–91.

  13. Faruk A, Cahyono ES. Prediction and classification of low birth weight data using machine learning techniques. Ind J Sci Technol. 2018;3(1):18–28.

  14. Kuhle S, Maguire B, Zhang H, Hamilton D, Allen AC, Joseph K, et al. Comparison of logistic regression with machine learning methods for the prediction of fetal growth abnormalities: a retrospective cohort study. BMC Pregn Childb. 2018;18(1):1–9.

  15. Senthilkumar D, Paulraj S. Prediction of low birth weight infants and its risk factors using data mining techniques. Dubai, United Arab Emirates: Proceedings of the 2015 International Conference on Industrial Engineering and Operations Management; 2015.

  16. Loreto P, Peixoto H, Abelha A, Machado J. Predicting low birth weight babies through data mining. World conference on information systems and technologies. Springer; 2019.

  17. Feng M, Wan L, Li Z, Qing L, Qi X. Fetal weight estimation via ultrasound using machine learning. IEEE Access. 2019;7:87783–91.

  18. Campos Trujillo O, Perez-Gonzalez J, Medina-Bañuelos V. Early prediction of weight at birth using support vector regression. Latin American conference on biomedical engineering. Springer; 2019.

  19. Borson NS, Kabir MR, Zamal Z, Rahman RM. Correlation analysis of demographic factors on low birth weight and prediction modeling using machine learning techniques. London: 2020 Fourth World Conference on Smart Trends in Systems, Security and Sustainability (WorldS4); 2020. p. 169–73.

  20. Yarlapati AR, Roy Dey S, Saha S. Early prediction of LBW cases via minimum error rate classifier: a statistical machine learning approach. Hong Kong, China: IEEE International Conference on Smart Computing (SMARTCOMP); 2017.

  21. Al Habashneh R, Khader YS, Jabali OA, Alchalabi H. Prediction of preterm and low birth weight delivery by maternal periodontal parameters: receiver operating characteristic (ROC) curve analysis. Matern Child Health J. 2013;17(2):299–306.

  22. Ahmadi P, Alavimajd H, Khodakarim S, Tapak L, Kariman N, Amini P, et al. Prediction of low birth weight using random forest: a comparison with logistic regression. Arch Adv Biosci. 2017;8(3):36–43.

  23. Akhtar F, Li J, Pei Y, Imran A, Rajput A, Azeem M, et al. Diagnosis and prediction of large-for-gestational-age fetus using the stacked generalization method. Appl Sci. 2019;9(20):4317.

  24. Kumar SN, Saxena P, Patel R, Sharma A, Pradhan D, Singh H, et al. Predicting risk of low birth weight offspring from maternal features and blood polycyclic aromatic hydrocarbon concentration. Reprod Toxicol. 2020;94:92–100.

  25. Lu Y, Zhang X, Fu X, Chen F, Wong KKL. Ensemble machine learning for estimating fetal weight at varying gestational age. Proceedings of the AAAI conference on artificial intelligence; 2019;33(01):9522–7.

  26. Singh A, Arya S, Chellani H, Aggarwal KC, Pandey RM. Prediction model for low birth weight and its validation. Indian J Pediatr. 2014;81(1):24–8.

  27. Islam Pollob SMA, Abedin MM, Islam MT, Islam MM, Maniruzzaman M. Predicting risks of low birth weight in Bangladesh with machine learning. PLoS ONE. 2022;17(5):e0267190.

  28. McClure EM, Garces AL, Hibberd PL, Moore JL, Goudar SS, Saleem S, et al. The global network maternal newborn health registry: a multi-country, community-based registry of pregnancy outcomes. Reprod Health. 2020;17(Suppl 2):184.

  29. Garces A, MacGuire E, Franklin HL, Alfaro N, Arroyo G, Figueroa L, et al. Looking beyond the numbers: quality assurance procedures in the Global Network for Women’s and Children’s Health Research Maternal Newborn Health Registry. Reprod Health. 2020;17(Suppl 2):159.

  30. Bose CL, Bauserman M, Goldenberg RL, Goudar SS, McClure EM, Pasha O, et al. The Global Network Maternal Newborn Health Registry: a multi-national, community-based registry of pregnancy outcomes. Reprod Health. 2015;12(2):S1.

  31. Patel AB, Bann CM, Garces AL, Krebs NF, Lokangaka A, Tshefu A, et al. Development of the global network for women’s and children’s health research’s socioeconomic status index for use in the network’s sites in low and lower middle-income countries. Reprod Health. 2020;17(Suppl 3):193.

Acknowledgements

Not applicable.

Funding

We received funding for this work from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), with the following grant numbers: U24HD9209, 5UG1HD076461, 5UG1HD078438, 5UG1HD076474, 5UG1HD076465, 5UG1HD078437, 5UG1HD076457, 5UG1HD078439, 5UG1HD096730, 5U24HD092094. MK is employed by the funder and participated in the interpretation of data for this work. The views expressed in this manuscript are those of the authors and do not necessarily represent the views of the NICHD.

Author information

Contributions

JKP, VT, MB, CLB, RLG, and BE conceptualized and designed this study. AL, AT, SSG, RJD, EC, WAC, MM, NFK, SS, RLG, AP, PLH, FE, RH, BP, EAL, CLB, and MB collected the data. VT and BE did the modeling and statistical analysis. JKP, VT, TN, EMM, MK, and MB interpreted data for the work. JKP, VT and MB wrote the first draft of the report. All authors revised the manuscript critically for important intellectual content. All authors had full access to all the data in the study, approved the final version of the manuscript, and agree to be accountable for all aspects of the work. JKP and VT are guarantors for this work. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

Corresponding author

Correspondence to Jackie K. Patterson.

Ethics declarations

Ethics approval and consent to participate

This research was performed in accordance with the Declaration of Helsinki. Written informed consent was obtained from all participants prior to enrolling in the MNHR. The MNHR was approved by the IRBs affiliated with each US-based institution and LMIC partner institution in the NICHD GN, with details as follows: The Kinshasa School of Public Health (#ESP/CE/04208/2017), The University of North Carolina at Chapel Hill (#13–2099), The University of Zambia (# 008–01-08), The University of Alabama – Birmingham (#IRB-080521010), The Institute of Nutrition of Central America and Panama (INCAP) (#19–13), The University of Colorado at Denver Anschutz Medical Campus (#08–0511), The International Center for Diarrhoeal Disease Research, Bangladesh (ICDDR,B) (#PR-18098), The University of Virginia (#21330), KLE University's JN Medical College (#181219008), Thomas Jefferson University (#16F.349), The Aga Khan University (#0581), Columbia University (#IRB-AAAJ7651), Lata Medical Research Foundation (#RPC # 22E), Boston University School of Medicine (#H-35430), Moi University School of Medicine (#00305), and Indiana University School of Medicine (#1011003646).

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Supplement

Table 1. Maternal and pregnancy characteristics by LBW status for the analysis population.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Patterson, J.K., Thorsten, V.R., Eggleston, B. et al. Building a predictive model of low birth weight in low- and middle-income countries: a prospective cohort study. BMC Pregnancy Childbirth 23, 600 (2023). https://doi.org/10.1186/s12884-023-05866-1

  • DOI: https://doi.org/10.1186/s12884-023-05866-1

Keywords