Symphysis-fundus height measurement to predict small-for-gestational-age status at birth: a systematic review

Background Fetal growth restriction is among the most common and complex problems in modern obstetrics. Symphysis-fundus (SF) height measurement is a non-invasive test that may help determine which women are at risk. This study is a systematic review of the literature on the accuracy of SF height measurement for the prediction of small-for-gestational-age (SGA) status at birth in unselected and low-risk pregnancies. Methods The Medline, Embase, Cinahl, SweMed, and Cochrane Library databases were searched with no limitation on publication date (through September 2014), which returned 722 citations. Two reviewers then developed a short list of 51 publications of possible relevance and assessed them using the following inclusion criteria: cohort study of test accuracy performed in a routine prenatal care setting; SF height measurement for all participants; classification of SGA, defined as birth weight (BW) < 10th, 5th, or 3rd percentile or ≥ one or two standard deviations below the mean; study conducted in Northern, Western, or Central Europe; USA; Canada; Australia; or New Zealand; and sufficient data for 2 × 2 table construction. Quality of the included studies was assessed in duplicate using criteria suggested by the Cochrane Collaboration. Review Manager 5.3 software was used to analyze the data, including plotting of summary receiver operating curve spaces. Results Eight studies were included in the final dataset and seven were included in summary analyses. The sensitivity of SF height measurement for SGA (BW < 10th percentile) prediction ranged from 0.27 to 0.76 and specificity ranged from 0.79 to 0.92. Positive and negative likelihood ratios ranged from 1.91 to 9.09 and from 0.29 to 0.83, respectively. Conclusions SF height can serve as a clinical indicator along with other clinical findings, information about medical conditions, and previous obstetric history. However, SF height has high false-negative rates for SGA. Clinicians must understand the limitations of this test. The protocol has been registered in the international prospective register of systematic reviews, PROSPERO (Registration No. CRD42014008928, http://www.crd.york.ac.uk/prospero/display_record.asp?ID=CRD42014008928). Electronic supplementary material The online version of this article (doi:10.1186/s12884-015-0461-z) contains supplementary material, which is available to authorized users.


Background
Screening for fetal growth restriction (FGR) is one of the main purposes of antenatal care.
FGR is used to describe a fetus that did not reach its genetic growth potential and is associated with increased risks of morbidity and mortality, as well as adverse effects in childhood and later life [1][2][3][4]. Because no unanimously agreed-upon definition of FGR currently exists, small-forgestational-age (SGA) is often used as a proxy. SGA is defined as weight below a specific percentile for gestational age, usually the 10th percentile. Although not all SGA neonates are pathologically growth restricted, detection of this group aims to facilitate the identification of at-risk pregnancies requiring further investigation due to potential FGR. Early identification and appropriate management of FGR can reduce perinatal morbidity and mortality [5].
In Scandinavia, screening relies on routine measurement of SF height, complemented by ultrasound measurement of fetal size in women with pregnancy complications or with a relevant history or clinical evidence of FGR [6][7][8]. SF height is a technique involving measurement of the maternal abdomen from the symphysis pubis to the uterine fundus with a tape measure. The measurement is plotted on a curve and compared with the distribution of the reference population [9,10]. If the recorded measurement is below acceptable limits according to the reference curves, further investigations of fetal growth and wellbeing are to be performed, including ultrasound estimations, uteroplacental and fetoplacental flow evaluations by Doppler, as well as cardiotocography.
Despite the routine use of SF height to predict SGA at birth, evidence for this method remains unclear. To date there is insufficient evidence from high quality trials to fully evaluate the effect of routine use of SF height during prenatal care on pregnancy outcomes [11]. Several studies have examined the accuracy of SF height in predicting SGA status at birth, but inconsistency in the results has been observed [12]. Most SF height research has been conducted in hospital-based settings and has investigated the relationship between SF height and SGA status in high risk populations [13][14][15]. Because of a different prevalence (pre-test probability) of SGA, results from hospital-based studies cannot be extrapolated to primary care settings.

Objectives
In this systematic review we aim to assess the sensitivity and specificity of SF height for the prediction of SGA status at birth in unselected and low-risk pregnancies.

Methods
Criteria for considering studies for this review Studies were selected for inclusion in the review according to the population, index test, target condition, reference standard, outcome measure, and study design.

Population
Studies examining singleton pregnancies in unselected or low-risk populations, conducted in comparable health care systems to Scandinavia (Northern, Western and Central Europe, USA, Canada, Australia, and New Zealand).

Index test
SF measurement compared to the SF distribution of the population.

Target condition
SGA or FGR.

Reference standard
Diagnosis of FGR or SGA, defined as birth weight (BW) < 10th, 5th, or 3rd percentile, or ≥ one or two standard deviations (SDs) below the mean (performed postnatally).

Outcome measures
Data required to populate 2 × 2 contingency tables.

Study design
Diagnostic cohort studies.

Search methods for identification of studies
Electronic databases (PubMed, Medline, Embase, CINAHL, Cochrane Library, and SweMEd) were searched to identify eligible diagnostic studies from the earliest year possible through September 2014. The search strategy was developed for PubMed and modified for use in other databases (see Additional file 1). The reference lists of all included publications and relevant systematic reviews were checked and forward citation searches were performed.

Electronic searches
The search strategy involved combinations of SF-related terms appearing in subject headings and as keywords. Our Medline search query was (fund* adj height*) OR (symph* adj fund*) OR (uter* adj height*) OR (symph* adj height*) OR (gravidogram*) OR (uterus fundus height*) OR (uter* fund* height*). We conducted our search and reported our findings according to the Meta-Analysis of Observational Studies in Epidemiology and Preferred Reporting Items for Systematic Reviews and Meta-Analyses statements [16][17][18].

Data collection and analysis Study selection
A list of articles meeting the inclusion criteria based on abstracts was compiled. The full texts of these studies and those of uncertain relevance were retrieved. Two reviewers (ASDP and JW) independently evaluated the studies' fulfillment of the inclusion criteria, with any discrepancy discussed with a third reviewer until a final set of relevant studies was agreed upon.

Data extraction and management
The following data were extracted from all selected studies: general information (first author, publication year, country of investigation), population (health care setting, number of participants, level of risk), study design (design, data collection), characteristics of SF height test (SF height curve, cut-off points), reference standard (SGA definition) and results (data required for the construction of 2 × 2 contingency tables). Data were entered into a database using Review Manager 5.3 software.

Assessment of methodological quality
The quality of each included study was assessed by two review authors (ASDP, JW) using the QUality Assessment of Diagnostic Accuracy Studies (QUADAS-2) checklist [19,20]. The QUADAS-2 checklist asks signaling questions in four risks of bias domains relating to patient selection, index test, reference standard, and flow and timing. Each domain is assessed in terms of risk of bias, and the first three domains are also assessed in terms of applicability. The review authors classified each item as "yes" (adequately addressed), "no" (inadequately addressed), or "unclear" (inadequate detail presented to allow a judgment to be made). The QUADAS-2 tool is shown in Additional file 2.

Statistical analysis and data synthesis
Data on sensitivity, specificity, and true-positive, falsepositive, true-negative, and false-negative results were taken directly from the source papers or, if necessary, calculated from the data provided. Positive likelihood ratios (PLRs), negative likelihood ratios (NLRs), diagnostic odds ratios (DORs), and 95% confidence intervals (CIs) were calculated.
An LR describes how many times more likely it is that a person with the target condition will receive a particular test result than will a person without it. Categorization of LRs was adopted from Deeks et al. [21] where PLRs > 10 or NLRs < 0.1 are considered to provide convincing diagnostic evidence. The DOR is commonly used as an overall indicator of diagnostic performance and calculated as the odds of a positive test result among those with the target condition, divided by the odds of a positive test result among those without the condition. As a general rule, a DORs > 100 indicates high accuracy, values of 25-100 indicate moderate accuracy, and those < 25 indicates that the test is not useful [21].
The data were displayed graphically on forest and summary receiver operating characteristic (SROC) plots [22]. The SROC curve was fitted using the hierarchical bivariate random-effects method [23]. For studies that used more than one SF threshold, the analysis was based on the cut-off point of "one value < 10 th percentile".

Investigation of heterogeneity
Both clinical and statistical heterogeneity were evaluated. Assessment of clinical heterogeneity involved comparison of SF reference curves, cut-off criteria used to identify abnormal results, and SGA definitions. Assessment of statistical heterogeneity involved visual inspection of forest plots and calculation of the inconsistency index (I 2 ), which describes the percentage of total variation across studies that is due to heterogeneity, rather than chance [24].

Results
Initial database searches retrieved 722 citations of which 525 citations remained after duplicates were removed ( Figure 1). Screening of the titles and abstracts identified 51 potentially relevant articles that were retrieved in full text format. Forward and backward citation tracking did not result in the identification of additional relevant articles. Eight articles were included in final analyses. Additional file 3 lists the reasons for excluding 43 articles on the basis of study population, design or outcome measures.

Included studies
Characteristics of included studies [25][26][27][28][29][30][31][32] are presented in Table 1. All studies were published before 1991. Most studies used locally derived SF curves. Different cut-off criteria were used to identify abnormal results, including one value < 10th percentile; two consecutive or three isolated values < 10th percentile; one value > 2 cm below the mean; one value > 2 cm below the mean or three static or falling values; and one value > two SDs below the mean. Definitions of SGA included BW < 10th percentile, < 5th percentile, and ≥ two SDs below the mean, according to local standards.

Methodological quality of included studies
The QUADAS-2 ratings of risk of bias and study applicability are shown in Table 2. Based on the inclusion criteria, no included study had a case-control design. All studies avoided inappropriate exclusions. Six of the eight studies used consecutive or random recruitment of participants. The two remaining studies [30,32] did not report such information and were considered to be at unclear risk of patient selection bias. Most studies had a low risk of bias due to patient flow and timing; seven of eight studies involved the analysis of all recruited participants and one analysis included 78% of recruited participants [32]. Studies included in this review had a low risk of bias for the conduct of the reference standard. All studies used pre-specified index test thresholds. No study reported blinding to test results, but BW is objective and should not result in bias. Regarding the applicability of studies to the  review questions, no study raised concern about the index test, reference standard or patient selection.

Statistical analysis
Tables 3, 4, 5 display core information collected from all included studies according to the SGA definition used by the study authors.
Accuracy of SF height for the prediction of SGA defined as BW < 10th percentile Seven studies assessed the accuracy of SF height for the prediction of SGA defined as BW < 10th percentile. Sensitivities ranged from 0.27 to 0.76 and specificities ranged from 0.79 to 0.92. All studies produced DORs exceeding 1 and CIs that did not include 1, implying that the positive   (Figure 2) constructed using data from these studies lies to the left of the diagonal, signifying that the SF height test has value. The I 2 value was typically high (98%). Given the small number of included studies (and thus low statistical power), subgroup analyses and covariate hierarchical modeling to investigate heterogeneity were not performed.
Accuracy of SF height for the prediction of SGA defined as BW <5th percentile One study assessed the accuracy of SF height for the prediction of SGA defined as BW < 5th percentile. This study used several cut-off points, with stricter criteria yielding lower sensitivity and higher specificity values. NLRs and PLRs did not meet the accepted criteria for classification of SF height measurement as a useful test.
Accuracy of SF height for the prediction of SGA defined as BW ≥ 2 SDs below the mean One study assessed the outcome of SGA defined as BW ≥ 2 SDs below the mean. For a less strict SF cut-off point (one value > 2 cm below mean or falling or static values), the authors reported low sensitivity (59%) and high specificity (97%). The PLR exceeded 10, but the NLR did not meet the required criterion of <0.1.

Discussion
SF height measurement seems to have some significance for the prediction of SGA defined as BW < 10th percentile. All studies reported DORs > 1. The SROC curve (Figure 2) lies to the left of the diagonal, signifying that the SF height test has value. Adequate levels of sensitivity appear to be achieved at the expense of lower specificity, with higher numbers of false-positive SF results. The study of Rogers et al. [30] positioned at the upper left of the SROC curve produced the most significant results supporting the use of SF height. Its false negative rate of only seven is likely to be due to the small size of the study. In contrast, the study of Persson et al. [29] is the largest study and has the narrowest CI. Its sensitivity and specificity lies along the SROC line, adding weight to our findings. For the prediction of SGA defined as BW < 5th percentile and BW ≥ 2 SDs below the mean, no summary measure could be performed due to the insufficient number of studies assessing these outcomes. Further assessment of the predictive value of SF in prediction of SGA defined as BW < 5th percentile and BW ≥ 2 SDs below the mean is required.
The diagnostic accuracy of SF height in other populations of pregnant women has recently been reviewed.  Goto [33] assessed the diagnostic value of SF height, mainly in developing countries. However, this review included studies across a wide range of ethnic groups, clinical settings and disease spectrums. Despite such a diverse case mix, the study did not assess its effect on the pooled estimates, thus making it difficult to interpret its finding in a low-risk setting. In view of these limitations, we applied more strict inclusion criteria in our study, focusing mainly on a more homogenous and relevant population.

Strengths and weaknesses of the review
The majority of studies available in this systematic review were conducted in the 1980s. Given the limited amount of data available for the accuracy of SF height measurement, we did not discard studies based solely on year of publication. All included studies had low concern regarding applicability, implying that evidence is relevant to current practice. The focus on nations with comparable health systems means that the findings may not be relevant to different and less well-resourced national health systems. Many parameters involving the performance of SF height measurement, such as technique, frequency of measurement, and performer's experience, potentially affect test accuracy. Unfortunately, we did not have detailed information about the test conditions, limiting our ability to explore the effects of potential differences in methods. As no universal SGA definition has been established, the studies included in this review may also have been biased by the choice of reference test. Our inclusion criteria required postnatal confirmation of SGA classification. All studies fulfilled this requirement, but most did not provide information about how gestational age was determined or which BW reference were used to classify SGA status postnatally.
This review focused on the role of SF height in detecting SGA as a proxy for FGR. However, FGR can exist without SGA. The role of SF height in this setting remains undefined because all SF height studies in this review used SGA as an outcome. Customized SF charts (adjusted for ethnicity, parity, and body mass index) are said to be better predictors of FGR [34]. Furthermore, this review did not address the issue of effect, for which additional studies would be needed to assess the role of SF height.
Ultimately, the lack of large cohort studies conducted in routine prenatal care setting that were suitable for our analysis was the main limitation of this review.
Applicability of findings to clinical practice and policy SF height can be the first parameter raising suspicion of FGR. We have previously discussed the limitations of the study populations. However, our results can be applied to low-risk and unselected pregnancies in routine prenatal care setting, which is useful for general practitioners and midwifes to assure the identification of pregnancies at risk of SGA.
We found that the SF height test had a sensitivity ranging from 0.27 to 0.76, which means it potentially fails to identify over 70% of pregnancies affected by SGA. This is important to consider in counselling of pregnant women. However, in clinical practice the SF height test is not carried out in isolation and the combination of other clinical findings, medical conditions and previous obstetric history, together will contribute to estimating the likelihood of being at risk for SGA.
Our results show that the SF height test has a high degree of specificity (≥80% in all studies), indicating that few pregnancies not characterized by SGA are referred for ultrasound examination in practice. However, in this case over-referral or the misidentification of pregnancies as at risk is of less concern than the failure to identify pregnancies at risk.
Primary screening should emphasize the importance of sensitivity over specificity to identify almost all at-risk participants. No test is perfect and there will always be problems with incorrect results, e.g., anxiety and unnecessary intervention due to a false-positive result or a false sense of security caused by a false-negative result. A positive SF screening result can usually be confirmed or refuted with further evaluation of fetal growth and well-being by a specialist.