The assessment of gestational age: a comparison of different methods from a malaria pregnancy cohort in sub-Saharan Africa

Background: Determining gestational age in resource-poor settings is challenging because of limited availability of ultrasound technology and late first presentation to antenatal clinic. Last menstrual period (LMP), symphysio-pubis fundal height (SFH) and Ballard Score (BS) at delivery are therefore often used. We assessed the accuracy of LMP, SFH, and BS to estimate gestational age at delivery and preterm birth compared to ultrasound (US) using a large dataset derived from a randomized controlled trial in pregnant malaria patients in four African countries.
Methods: Mean and median gestational age for US, LMP, SFH and BS were calculated for the entire study population and stratified by country. Correlation coefficients were calculated using Pearson’s r, and Bland-Altman plots were used to calculate mean differences in findings with 95% limits of agreement. Sensitivity, specificity, positive predictive value and negative predictive value were calculated considering US as reference method to identify term and preterm babies.
Results: A total of 1630 women with P. falciparum infection and a gestational age ≤ 24 weeks determined by ultrasound at enrolment were included in the analysis. The mean gestational age at delivery using US was 38.7 weeks (95%CI: 38.6–38.8), by LMP, 38.4 weeks (95%CI: 38.0–38.9), by SFH, 38.3 weeks (95%CI: 38.2–38.5), and by BS 38.0 weeks (95%CI: 37.9–38.1) (p < 0.001). Correlation between US and any of the other three methods was poor to moderate. Sensitivity and specificity to determine prematurity were 0.63 (95%CI 0.50–0.75) and 0.72 (95%CI 0.66–0.76) for LMP, 0.80 (95%CI 0.74–0.85) and 0.74 (95%CI 0.72–0.76) for SFH and 0.42 (95%CI 0.35–0.49) and 0.77 (95%CI 0.74–0.79) for BS.
Conclusions: In settings with limited access to ultrasound, and in women who had been treated for P. falciparum malaria, SFH may be the most useful antenatal tool to date a pregnancy when women first present in the second or third trimester. The Ballard postnatal maturation assessment has a limited role and lacks precision. Improving ultrasound facilities and skills, and early attendance, together with the development of new technologies such as automated image analysis and new postnatal methods to assess gestational age, are essential for the study and management of preterm birth in low-income settings.
Electronic supplementary material: The online version of this article (10.1186/s12884-018-2128-z) contains supplementary material, which is available to authorized users.



Background
Clinical trials and cohort studies investigating adverse pregnancy outcomes such as preterm birth (PTB, < 37 gestational weeks) and fetal growth restriction (suspected when the birthweight is below the 10th percentile of a birthweight for gestational age standard) rely on fetal biometry to estimate gestational age at delivery [1]. In high-income settings where most women attend health centres early in pregnancy (< 13 weeks) this approach has become routine practice, and pregnancies are dated according to fetal crown-rump length (CRL) [2].
In most low and middle-income countries (LMICs) determining gestational age at rupture of membranes and/or onset of labour is challenging, and postnatally birthweight alone is too crude a measure and is unable to differentiate between growth-restricted and preterm babies [3]. Although ultrasound technology is becoming more affordable and available, access tends to be limited to tertiary centres and private practice; the majority of pregnancies are thus dated using other methods [4]. Last menstrual period (LMP) can predict gestational age well if cycle characteristics and the date of onset of the last menstrual bleed can be clearly established, yet this has proven difficult in many LMIC settings [5,6]. Symphysio-pubis fundal height (SFH) is a cheap and feasible alternative, appears more accurate than other non-ultrasound based methods, and predicts gestational age at delivery best when sequential measurements are used [5,7]. SFH measurement at each visit is an essential part of antenatal care and a useful tool to detect pregnancies at risk of adverse outcomes. However, SFH accuracy depends on gestational age and body mass index [7]. Another option available to healthcare workers in LMICs is the Ballard score (BS), which estimates a gestational age range through postnatal examination of physical and neurological neonatal maturity characteristics [8,9]. Its obvious clinical disadvantage is that it cannot be used to instigate critical treatment such as antenatal steroids for fetal lung maturation in women presenting in suspected preterm labour and preterm rupture of membranes before 34 gestational weeks. Nevertheless, it is a practical solution to the aforementioned challenges, in particular when women with no antenatal care come to deliver. Most reports suggest that the postnatal maturation scores are of limited use in LMICs, and perform worse than LMP and SFH [5,10–13].
Ultrasound is increasingly used in LMICs but its availability remains limited in rural and remote settings; in addition, a large proportion of women still attend late for antenatal care, often after 24 gestational weeks [14]. In the research context, innovative methods are being developed to encourage early presentation [15], but late presentation will remain a critical issue in the general patient population. Using fetal biometry in later pregnancy to estimate gestational age has reduced accuracy as the standard deviation of growth measurements widens and fetal growth aberrations (growth restriction, macrosomia) are more likely. Late pregnancy fetal biometry can be used to correct LMP [11] and, even when used alone, it more accurately predicts gestational age and preterm birth than all other non-ultrasound methods [5], despite the fact that dating by head circumference after 24 gestational weeks (which is the most commonly used measurement) is known to underestimate gestational age, thereby overestimating preterm birth [12].
The present study is a secondary analysis of data collected as part of a large randomised controlled trial to assess the efficacy and safety of four different artemisinin-combination therapies in pregnancy [16]. Using fetal biometry as the reference, we assessed the accuracy of LMP, SFH, and BS to estimate gestational age at delivery and preterm birth.

Methods
This assessment was conducted in the framework of an open-label, randomized controlled clinical trial to assess the efficacy and safety of four different artemisinin-combination therapies in women presenting with P. falciparum malaria in the second and third trimester of pregnancy. The trial was conducted between June 2010 and August 2013 at seven sites across four countries, namely Burkina Faso, Ghana, Malawi and Zambia (ClinicalTrials.gov identifier: NCT00852423). Eligible patients were randomized to one of four treatment arms and followed up weekly until day 63 and then again at delivery. The methods of the trial, including details on quality assurance and quality control, are described in detail elsewhere [17], as are the results for the main outcomes [16].

Ultrasound
Since only women in the second or third trimester were eligible for the study, gestational age at enrolment was determined using diagnostic ultrasound (US) imaging equipment (FFSonic UF-4100) with a 3.5 MHz transducer for routine transabdominal examination and a 5 MHz transducer for very thin women. Gestational age was calculated based on biparietal diameter, abdominal circumference, and femur length [18] using standard algorithms [19]. For women in the first trimester of pregnancy, the crown-rump length (CRL) was used to confirm exclusion from the study.
Comprehensive quality assurance and quality control (QA/QC) systems were put in place to ensure the quality and reliability of measurements, and the inter-site comparability of US measurements. These included centrally purchased equipment, a standard operating procedure (SOP) that was mandatory across all sites (Additional file 1), two dedicated staff per site to carry out all US measurements, and central training before study start. Periodic training was delivered on site by experienced obstetricians, and internal QC measures were conducted at each site. These consisted of repeat measurements by the second trained staff member in the first week of each month, and repeated measurements of one patient every third week.

Symphysio-fundal height measurement
SFH measurement was undertaken at enrolment using a non-elastic tape measure. A single measurement was taken from the highest point of the uterus (fundus) to the top of the symphysis pubis.

Last menstrual period
At enrolment into the study, patients were asked about the date of their LMP. LMP was defined as the date of the first day of the last menstruation.

Ballard score
The gestational age of babies delivered at the hospital was assessed using the BS. Physical and neurological criteria were recorded according to standard guidelines [8]. Each of the criteria was scored from −1 to 5. The combined scores range from −10 to 50, with the corresponding gestational ages being 20 weeks and 44 weeks (2-week range).
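The two endpoints given above (a combined score of −10 at 20 weeks and of 50 at 44 weeks) imply a straight-line mapping of 2 weeks per 5 points. The sketch below illustrates that mapping only; the function name and the range check are assumptions made for illustration, and the published scoring chart [8] remains the reference for clinical use.

```python
def ballard_to_weeks(total_score: float) -> float:
    """Map a combined Ballard score to an estimated gestational age in weeks.

    Assumes straight-line interpolation between the two endpoints given in
    the text: -10 -> 20 weeks and 50 -> 44 weeks (2 weeks per 5 points).
    """
    if not -10 <= total_score <= 50:
        raise ValueError("combined Ballard score must lie between -10 and 50")
    return 20.0 + (total_score + 10) * (44.0 - 20.0) / (50 - (-10))


if __name__ == "__main__":
    # Hypothetical scores, chosen only to show the shape of the mapping.
    for score in (-10, 20, 35, 50):
        print(score, ballard_to_weeks(score))
```

Under this assumption a combined score of 35 corresponds to 38 weeks, and the preterm threshold of 37 weeks falls between scores of 30 and 35.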

Statistical analyses
All statistical analyses were done using Stata v14 (Stata Corp, USA). For the purpose of this analysis, women were excluded if their gestational age at enrolment was > 24 weeks, if the birth date of the baby was not documented, or if they had twins, a miscarriage or a stillbirth. In a sensitivity analysis, women ≥ 24 weeks' gestational age at enrolment were included.
The level of significance was defined as p ≤ 0.05 and US was considered the reference method. Mean and median gestational age for US, LMP, SFH and BS were calculated for the entire study population and stratified by country. Inter-country and inter-method comparisons were done using the Kruskal-Wallis test; correlations were quantified using Pearson's correlation coefficient (r), and Bland-Altman plots were used to calculate mean differences in findings and 95% limits of agreement (LoA). To improve clarity, results of all methods were rounded to the nearest full week for scatterplots [20].
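The agreement statistics described above can be sketched in a few lines. This is an illustration, not the Stata code used in the study: it assumes that differences are taken as reference minus comparator and that the limits of agreement are the mean difference ± 1.96 sample standard deviations. The function names and the example values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = mean(x), mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def bland_altman(reference, other, z=1.96):
    """Return (mean difference, lower LoA, upper LoA) for paired estimates.

    Differences are taken as reference minus other (an assumption about the
    sign convention); limits are mean difference +/- z * sample SD.
    """
    diffs = [r - o for r, o in zip(reference, other)]
    bias = mean(diffs)
    spread = z * stdev(diffs)
    return bias, bias - spread, bias + spread


if __name__ == "__main__":
    # Made-up gestational ages at delivery (weeks), for illustration only.
    us = [38.0, 39.0, 40.0, 37.0, 36.0]
    lmp = [37.0, 40.0, 39.0, 38.0, 34.0]
    print("r =", round(pearson_r(us, lmp), 2))
    print("bias, lower LoA, upper LoA =", bland_altman(us, lmp))
```

A small mean difference can coexist with wide limits of agreement, which is why both quantities are reported alongside r.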
To calculate the performance of each method, all babies with a gestational age ≥ 37 weeks were categorized as "term" and all other babies as "pre-term". A "term" result was defined as a negative result and a "pre-term" result as a positive result. Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were calculated accordingly, considering US as the reference method [21]. An additional analysis was performed using 32 weeks as the cut-off to define very preterm babies.
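The classification described above amounts to a 2×2 table of each method against US. The following sketch shows one way to compute the four measures; the function name, the handling of empty denominators and the example values are assumptions for illustration, not the study's code.

```python
def diagnostic_accuracy(reference_weeks, test_weeks, cutoff=37.0):
    """Compare a test dating method with the reference for preterm birth.

    A gestational age below `cutoff` counts as preterm (positive); anything
    else counts as term (negative). Returns a dict of the four measures,
    with None where a denominator is zero.
    """
    tp = fp = tn = fn = 0
    for ref, test in zip(reference_weeks, test_weeks):
        ref_preterm = ref < cutoff
        test_preterm = test < cutoff
        if ref_preterm and test_preterm:
            tp += 1
        elif ref_preterm:
            fn += 1
        elif test_preterm:
            fp += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


if __name__ == "__main__":
    # Made-up gestational ages (weeks): reference (US) versus a test method.
    us = [35, 36, 38, 39, 40, 41]
    sfh = [36, 38, 36, 39, 40, 38]
    print(diagnostic_accuracy(us, sfh))
```

Passing `cutoff=32.0` gives the corresponding measures for very preterm birth.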

Results
A total of 3428 women in the second and third trimester of pregnancy and with microscopy-confirmed P. falciparum infection were included in the main trial [16]. Women with a gestational age > 24 weeks at enrolment (n = 1579), those without a documented birth date of the baby (n = 130), and those who had twins (n = 38), miscarriages or stillbirths (n = 51) were excluded from the present analysis, resulting in a total of 1630 (47.5%) women included (Table 1). There were no differences in mean height and weight by country. Mean gestational age at enrolment was 20.3 weeks (IQR: 16–22) and no patient with gestational age below 13 weeks was enrolled (Table 1).
Correlation between US and any of the other three methods was poor to moderate (LMP: r = 0.38, SFH: r = 0.63, BS: r = 0.31). However, correlations varied considerably between countries (Table 2 & Fig. 2).
The mean difference between US and any of the other methods was less than 1 week overall (LMP: 0.34 weeks, SFH: 0.40 weeks, BS: 0.80 weeks) but showed great inter-country variation (Table 2 & Fig. 3). The 95% limits of agreement were considerable (LMP: −7.9 to 8.6 weeks, SFH: −4.9 to 5.8 weeks, BS: −3.5 to 5.1 weeks) and again showed great variation between countries (Table 2 & Fig. 3).
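To help read the limits of agreement above, the sketch below converts each reported interval back to its centre and to the implied standard deviation of the differences. It assumes the limits were computed as mean difference ± 1.96 SD, which the text does not state explicitly; the function name is hypothetical.

```python
def summarise_limits(lower, upper, z=1.96):
    """Recover (mean difference, SD of differences) from limits of agreement.

    Assumes the limits were computed as mean difference +/- z * SD.
    """
    centre = (lower + upper) / 2
    sd = (upper - lower) / (2 * z)
    return centre, sd


if __name__ == "__main__":
    # Limits of agreement (weeks) as reported in the text above.
    reported = {"LMP": (-7.9, 8.6), "SFH": (-4.9, 5.8), "BS": (-3.5, 5.1)}
    for method, (lower, upper) in reported.items():
        centre, sd = summarise_limits(lower, upper)
        print(f"{method}: centre {centre:.2f} weeks, SD {sd:.2f} weeks")
```

Under that assumption, the LMP limits imply an SD of the differences of roughly 4.2 weeks, against roughly 2.7 weeks for SFH and 2.2 weeks for BS.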
The sensitivity analysis including women ≥ 24 weeks' gestation at enrolment showed results consistent with the main analysis for BS, with a mean difference of 0.66 weeks (Pearson r = 0.27; 95% limits of agreement −4.1 to 5.4; range of averages 27.29–43.21 weeks), and higher mean differences for LMP, at 1.15 weeks (Pearson r = 0.33; 95% limits of agreement −7.9 to 10.1; range of averages 25.21–50.14 weeks), and for SFH, at 0.89 weeks (Pearson r = 0.63; 95% limits of agreement −4.9 to 6.3; range of averages 22.14–46.85 weeks).
Using ultrasound as the reference, 1391 mothers delivered term babies compared to 239 preterm babies (< 37 weeks' gestation). Sensitivity, specificity, PPV and NPV

Discussion
The findings of this study suggest that all three non-sonographic tools for estimating gestational age at delivery generally correlate poorly to moderately with US. A single SFH measurement at enrolment (> 13 and < 25 weeks' gestation) correlates best, and BS the least, with the reference method, ultrasound. The correlation between methods differed substantially by country. For example, r was close to 0 (r = 0.16) when comparing US and BS in Zambia, whereas performance was better in Malawi (r = 0.5). While a QA/QC system was set up for US, no such system was in place for the other methods, and inter-site comparability may therefore be limited. When comparing US to LMP, the correlation turned negative (r = −0.06) for Burkina Faso and was not high in any of the other countries. This contradictory finding may partly be due to the small sample size of LMP data for Burkina Faso, but presumably also reflects substantial recall bias and uncertainty due to irregular menstrual periods, as well as potential differences in literacy rates.
In settings with limited access to ultrasound, SFH may be the most useful antenatal tool to date a pregnancy, at least at the range of gestational age at enrolment in this study. This is corroborated by findings from previous research, and precision may be improved when multiple measurements are available [5,7]. Although there were differences in correlation across country sites, these were less marked for SFH than for BS.
In a research context in LMIC settings, ultrasound dating and early attendance are pivotal to assess outcomes such as gestational age at delivery and preterm birth correctly. Yet, since such equipment and expertise are often unavailable, SFH (preferably sequential) is probably the best alternative to US measurements in routine care and for further clinical management of the pregnancy. In order to ensure quality measurements, healthcare workers must be taught to assess SFH in a methodical manner [22], and should be supported by ongoing training and audit.
This paper provides further confirmation of the limitations of BS postnatal maturation assessment for pregnancy dating. One significant issue will undoubtedly be that of training, which may explain the poor correlation observed at the site in Zambia; BS assessment, in particular its neurological component, requires training and refresher training [23]. Postnatal maturation assessment is the most complex of all gestational age estimation methods, as the examiner is required to adequately assess and process 'images' and findings from clinical examination. When one or more SFH measurements are available it may be reasonable to forgo BS assessments and focus limited resources and time on other assessments and activities, such as effective neonatal resuscitation and accurate birthweight measurements. BS retains a role in unbooked pregnancies, and its predictive ability in this context may be improved by taking birthweight into account [24], and by establishing and evaluating quality control and training methods for use in busy clinical settings. Improving the postnatal prediction of gestational age is the subject of an ongoing large multi-centre study, which aims to develop a simplified and pragmatic algorithm based on existing assessment approaches, anthropometry and neonatal feeding maturity [25].
A recent call by the Bill and Melinda Gates Foundation has recognised the need for new postnatal tools [26]. Approaches such as using newborn infant screening metabolite measurements [27], complex modelling integrating a number of simple clinical parameters [25], smartphone ultrasound devices, and automated image analysis are being evaluated at present.
Accuracy of any method, including ultrasound, is assessor-dependent; training and quality control are key tools to ensure optimal measurements, which can be achieved for routine care in challenging LMIC settings [28]. Expanding ultrasound services in low-income settings may be a key strategy to improve pregnancy care and outcomes [29], and is certainly feasible [28,30]. Handheld ultrasound devices, including smartphone ultrasound, may assist with expanding services in LMICs, and batteries can be charged using solar power.
The present evaluation has a number of limitations. The current reference standard for pregnancy dating is the measurement of the fetal crown-rump length before 13 weeks' gestation. No measurements were done at this early gestation due to the inclusion and exclusion criteria of the main trial, and thus algorithms estimating gestational age from fetal head circumference, femur length and abdominal circumference had to be used, introducing imprecision [12]. Amongst women > 24 weeks' gestational age at enrolment, the sensitivity analyses showed no major difference in trends with regard to agreement between methods for BS, but showed a higher mean difference in weeks for SFH, suggesting increasing variation [12]. Moreover, analyses were performed on measurements taken amongst women with malaria infection, which may cause early fetal growth restriction [31] and could lead to an underestimation of gestational age, in particular when head circumference (HC) and femur length (FL) are used to date the pregnancy. Lastly, data analysis for LMP was limited primarily to one site, and only one SFH measurement per woman was available for analysis. However, the sample size for the other measurements was adequate.

Conclusions
In conclusion, in settings where ultrasound scanning is still limited, SFH may be the most useful tool to predict gestational age at delivery if measured between 13 and 24 gestational weeks amongst women undergoing treatment for P. falciparum malaria. Postnatal maturation assessments have a limited role and lack precision. Improving ultrasound facilities and early attendance, together with the development of new technologies such as automated image and video analysis for both ultrasound and BS, and of new postnatal methods to assess gestational age, will greatly assist with the management of preterm birth in low-income settings.

Availability of data and materials
The data are available for access via the WorldWide Antimalarial Resistance Network (WWARN.org). Requests for access will be reviewed by a Data Access Committee to ensure that use of data protects the interests of the participants and researchers according to the terms of ethics approval and principles of equitable data sharing. Requests can be submitted by email to malariaDAC@iddo.org via the Data Access Form available at WWARN.org/accessing-data. The WWARN is registered with the Registry of Research Data Repositories (re3data.org).

Authors' contributions
HU, KT, BL conducted the analyses and wrote the first draft. HT1, MT, IV, HT2, GA, PG, MN, JK, MM1, VM, GC, MM2 were responsible for data collection and study conduct. SR provided guidance and training for ultrasound to the study site. MdC, YC, RR provided oversight of the study conduct, monitoring and data management. UA was responsible for the conception, design and conduct of the study. All authors read, revised and approved the final manuscript.