Prediction of pre-eclampsia and its subtypes in high-risk cohort: hyperglycosylated human chorionic gonadotropin in multivariate models

Background: The proportion of hyperglycosylated human chorionic gonadotropin (hCG-h) to total human chorionic gonadotropin (%hCG-h) during the first trimester is a promising biomarker for prediction of early-onset pre-eclampsia. We wanted to evaluate the performance of clinical risk factors, mean arterial pressure (MAP), %hCG-h, hCGβ, pregnancy-associated plasma protein A (PAPP-A), placental growth factor (PlGF) and mean pulsatility index of the uterine artery (Uta-PI) in the first trimester in predicting pre-eclampsia (PE) and its subtypes early-onset, late-onset, severe and non-severe PE in a high-risk cohort.
Methods: We studied a subcohort of 257 high-risk women in the prospectively collected Prediction and Prevention of Pre-eclampsia and Intrauterine Growth Restriction (PREDO) cohort. Multivariate logistic regression was used to construct the prediction models. The first model included background variables and MAP. Additionally, biomarkers were included in the second model and mean Uta-PI was included in the third model. All variables that improved the model fit were included at each step. The area under the curve (AUC) was determined for all models.
Results: We found that lower serum PlGF concentrations were associated with early-onset PE, whereas lower %hCG-h was associated with late-onset PE. Serum PlGF was lower and hCGβ higher in severe PE, while %hCG-h and serum PAPP-A were lower in non-severe PE. In multivariate regression analyses, the best prediction for all PE was achieved with the third model: AUC was 0.66, and sensitivity 36% at 90% specificity. The third model also gave the highest prediction accuracy for late-onset, severe and non-severe PE: AUC 0.66 with 32% sensitivity, AUC 0.65 with 24% sensitivity, and AUC 0.60 with 22% sensitivity at 90% specificity, respectively. The best prediction for early-onset PE was achieved using the second model: AUC 0.68 and 20% sensitivity at 90% specificity.
Conclusions: Although the multivariate models did not meet the requirements to be clinically useful screening tools, our results indicate that the biomarker profile in women with risk factors for PE differs according to the subtype of PE. The heterogeneous nature of PE makes it difficult to find new, clinically useful biomarkers for prediction of PE in early pregnancy in high-risk cohorts.
Trial registration: International Standard Randomised Controlled Trial number ISRCTN14030412, Date of registration 6/09/2007, retrospectively registered.
Electronic supplementary material: The online version of this article (10.1186/s12884-018-1908-9) contains supplementary material, which is available to authorized users.


Keywords: Pre-eclampsia, Screening, Biomarkers, Early-onset pre-eclampsia, Late-onset pre-eclampsia, Severe pre-eclampsia, Placental growth factor, hCGβ, Hyperglycosylated human chorionic gonadotropin, Pregnancy-associated plasma protein A

Background
Pre-eclampsia is a pregnancy-specific multisystem disorder, the pathogenesis of which is incompletely understood. It occurs in 2-8% of pregnancies and is one of the leading causes of maternal and fetal morbidity and mortality globally [1]. The early identification of women at high risk for pre-eclampsia would guide the planning of their follow-up during pregnancy and the application of preventive measures. There is evidence that low-dose aspirin, started at 12-16 weeks of gestation, reduces the risk of pre-eclampsia [2,3].
Clinical risk factors, e.g. antiphospholipid antibody syndrome, prior pre-eclampsia, chronic hypertension and pregestational diabetes, have been used in early pregnancy to identify women at high risk of developing pre-eclampsia [4,5]. A predictive test with high sensitivity and positive predictive value that incorporates maternal risk factors, biomarkers and biophysical measurements is needed to implement prophylaxis strategies [6]. Recently, we have shown that serum hyperglycosylated human chorionic gonadotropin (hCG-h) is a promising marker of early-onset pre-eclampsia [7], but this was a case-control study. Thus, our aim in the present study was to test hCG-h or the proportion of hCG-h to hCG (%hCG-h) in a cohort of high-risk women. We wanted to investigate whether hCG-h could predict pre-eclampsia when combined with other biomarkers, including serum free hCG beta (hCGβ), pregnancy-associated plasma protein-A (PAPP-A) and placental growth factor (PlGF), as well as with biophysical measurements: mean arterial pressure (MAP) and Doppler ultrasound measurement of the mean pulsatility index of the uterine artery (Uta-PI). For this purpose, we constructed multivariate regression models and tested their predictive value to detect the different subtypes of pre-eclampsia.

Study cohort
The present study is a part of the multidisciplinary 'Prediction and Prevention of Pre-eclampsia and Intrauterine Growth Restriction' (PREDO) project [8]. A subcohort with known risk factor status for pre-eclampsia was prospectively collected between September 2005 and June 2009 in ten participating maternity clinics in Finland. The inclusion and exclusion criteria are described in Additional file 1: Table S1. Of the women with clinical risk factors for pre-eclampsia, 66 were excluded from the analyses because they took prophylactic aspirin during pregnancy as a part of a randomised trial [9]. Originally the subcohort of the present study comprised 267 women. Ten women were not included in the analyses for various reasons: one woman had a miscarriage at 19 weeks of gestation, one woman discontinued the study for a non-medical reason and eight women were excluded due to missing data.

Biophysical measurements
Gestational age was confirmed by first trimester ultrasound measurement of the fetal crown-rump length. Doppler ultrasound measurements were performed transvaginally at 11 0/7-13 6/7 weeks of gestation. The uterine artery flow was measured from the level of the inner os of the cervix, as it approaches the uterus laterally. The mean PI was calculated and then the multiple of the median (MoM) of the mean PI was calculated with the equation $\log_e(\text{mean Uta-PI}) = 1.39 - 0.012 \times \text{GA} + 0.0000198 \times \text{GA}^2$ [10], where GA = gestational age in days. The MAP was calculated from the first trimester visit blood pressure measurement with the equation $\text{MAP} = \text{diastolic blood pressure} + (\text{systolic blood pressure} - \text{diastolic blood pressure})/3$. The bilateral measurements of the Uta-PI were available for 83.7% (215/257) of high-risk women.
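The two calculations in this section can be sketched in a few lines of Python. This is an illustration only, not the code used in the study. The function names are hypothetical, and the MoM step assumes that the equation from [10] gives the natural logarithm of the reference mean Uta-PI for a given gestational age, so that MoM is the observed value divided by the back-transformed reference value.

```python
import math


def mean_arterial_pressure(systolic: float, diastolic: float) -> float:
    """MAP = diastolic + (systolic - diastolic) / 3, all in mmHg."""
    return diastolic + (systolic - diastolic) / 3.0


def expected_log_uta_pi(ga_days: float) -> float:
    """Right-hand side of the equation from [10]; GA is gestational age in days."""
    return 1.39 - 0.012 * ga_days + 0.0000198 * ga_days ** 2


def uta_pi_mom(observed_mean_pi: float, ga_days: float) -> float:
    """Assumption: MoM = observed mean PI / exp(reference log mean PI)."""
    return observed_mean_pi / math.exp(expected_log_uta_pi(ga_days))
```

For example, `mean_arterial_pressure(120, 80)` evaluates the MAP for a blood pressure of 120/80 mmHg, and `uta_pi_mom(pi, 84)` expresses a mean PI measured at 12 0/7 weeks relative to the reference value for that day.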

Biomarkers
Fasting blood samples were drawn from antecubital veins at 11 0/7-13 6/7 (mean 13 0/7) weeks of gestation. Serum was separated within an hour by centrifugation and stored at −80 °C until analysis. Serum hCG-h concentrations were determined using a time-resolved immunofluorometric assay. To eliminate the interaction between complement and B152 antibody in the hCG-h assay, serum samples were diluted 100-fold prior to analysis with EDTA-containing buffer (5 mmol/L) [11].

Outcome measures
The primary outcome was pre-eclampsia, defined as a systolic blood pressure ≥ 140 mmHg and/or a diastolic blood pressure ≥ 90 mmHg occurring after 20 weeks of gestation combined with a urinary 24-h protein excretion of ≥ 0.3 g or the dipstick equivalent in two consecutive measurements [12]. Secondary outcomes were: early-onset pre-eclampsia (diagnosed before 34 0/7 weeks of gestation), late-onset pre-eclampsia (diagnosed at or after 34 0/7 weeks of gestation), severe pre-eclampsia (systolic blood pressure ≥ 160 mmHg and/or diastolic blood pressure ≥ 110 mmHg and/or proteinuria ≥ 5 g/24 h), non-severe pre-eclampsia (pre-eclampsia not fulfilling the criteria of severe pre-eclampsia) and small for gestational age (SGA) (birthweight < −2 standard deviations (SD), i.e. approximately below the 2.5th percentile) [13]. All diagnoses were independently confirmed by a jury of two physicians and one research nurse, as described previously [9].
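The outcome definitions above amount to a small set of threshold rules, which the following Python sketch makes explicit. It is an illustration under simplifying assumptions and not a diagnostic tool: the function name and return format are hypothetical, the dipstick equivalent and the requirement of two consecutive measurements are not modelled, and in the study every diagnosis was confirmed by a jury rather than by rule.

```python
from typing import Dict, Optional

MIN_GA_DAYS = 20 * 7             # pre-eclampsia is diagnosed after 20 weeks
EARLY_ONSET_LIMIT_DAYS = 34 * 7  # 34 0/7 weeks of gestation


def classify_pre_eclampsia(
    sbp: float, dbp: float, proteinuria_g_24h: float, ga_days: int
) -> Optional[Dict[str, str]]:
    """Return onset and severity subtype, or None if the criteria are not met."""
    hypertensive = sbp >= 140 or dbp >= 90
    proteinuric = proteinuria_g_24h >= 0.3
    if not (hypertensive and proteinuric and ga_days > MIN_GA_DAYS):
        return None
    severe = sbp >= 160 or dbp >= 110 or proteinuria_g_24h >= 5.0
    return {
        # Diagnosis before 34 0/7 weeks is early-onset, otherwise late-onset.
        "onset": "early" if ga_days < EARLY_ONSET_LIMIT_DAYS else "late",
        "severity": "severe" if severe else "non-severe",
    }
```

Note that the onset and severity classifications are independent of each other, so a single case contributes to one onset subtype and one severity subtype.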

Statistical analyses
Binary logistic regression was used to compare the characteristics of the study groups. The mean of continuous variables was calculated to compare the differences between the groups, if the variable was normally distributed. If the number of subjects in a subgroup was low or the continuous variable was not normally distributed, the median and interquartile range were reported.
The concentrations of hCG-h, hCG, hCGβ, PAPP-A, and PlGF, as well as %hCG-h were normally distributed after log-transformation and were adjusted for gestational age using linear regression analysis. The median regression equation was calculated from the respective concentrations measured from 107 women without clinical risk factors in the PREDO cohort and from a screening cohort from the Kuopio University Hospital [7]. Then, concentrations measured from women with clinical risk factors in the PREDO cohort were compared to the gestational age-adjusted median and expressed as MoM.
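The gestational age adjustment described above can be sketched as follows. This is a minimal illustration, not the study's analysis code: the function names are hypothetical, the natural logarithm is an arbitrary choice (the MoM does not depend on the base), and an ordinary least-squares line on the log scale is assumed to stand in for the median regression equation, which is reasonable only if the log-transformed values are approximately normally distributed.

```python
import math
from typing import Sequence, Tuple


def fit_log_linear(
    ga_days: Sequence[float], concentrations: Sequence[float]
) -> Tuple[float, float]:
    """Least-squares fit of log(concentration) = intercept + slope * GA
    in the reference group (women without clinical risk factors)."""
    logs = [math.log(c) for c in concentrations]
    n = len(ga_days)
    mean_x = sum(ga_days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in ga_days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ga_days, logs))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def to_mom(concentration: float, ga_days: float, intercept: float, slope: float) -> float:
    """Observed concentration divided by the expected value at that gestational age."""
    expected = math.exp(intercept + slope * ga_days)
    return concentration / expected
```

The regression is fitted once on the reference women, and `to_mom` is then applied to each measurement from the high-risk women with the fitted intercept and slope.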
Binary logistic regression was used to evaluate univariate associations between measured variables and outcomes. The results were presented as odds ratios (OR). Since the number of investigated predictor variables was large compared to the number of women who developed pre-eclampsia, prediction models were built using regularised logistic regression with the L1/L2-norm using the R package glmnet [14]. Cross validation was used to select regularisation variables and separately to assess model fit. Three separate models were fitted for all outcomes on clinical grounds. Maternal clinical background variables and MAP in the first trimester were included in model 1, since these variables are easily obtainable and inexpensive. Biomarkers, which were the main interest of the study, were added in model 2, and MoM of the mean Uta-PI was added in model 3 to clarify whether it further improved prediction rates. All variables that improved the model fit in the multivariate logistic regression analyses were included at each step, because we wanted to investigate the maximum prediction potential of each model. We used 10-fold cross validation to compensate for the lack of a replication cohort.
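The role of 10-fold cross validation in assessing model fit can be illustrated with a short Python sketch. It is not the glmnet procedure used in the study: the function names are hypothetical, the model is passed in as generic `fit` and `predict` callables, the folds are random rather than stratified by outcome, and the separate cross validation used to choose the regularisation variables is not shown. The point is only that every woman's predicted risk comes from a model that was fitted without her data.

```python
import random
from typing import Any, Callable, List, Sequence


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle the indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def out_of_fold_predictions(
    X: Sequence[Any],
    y: Sequence[int],
    fit: Callable[[list, list], Any],
    predict: Callable[[Any, Any], float],
    k: int = 10,
    seed: int = 0,
) -> List[float]:
    """Predict each observation with a model trained on the other k-1 folds."""
    preds = [0.0] * len(y)
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            preds[i] = predict(model, X[i])
    return preds
```

Performance measures computed from these out-of-fold predictions are typically lower than those computed on the data used for fitting, which is the kind of decrease after validation noted in the Discussion.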
The background variables used for model fitting were maternal age, primiparity, pre-pregnancy body mass index (BMI), infertility treatment before the present pregnancy, pre-eclampsia in a previous pregnancy, an SGA infant or gestational hypertension, type 1 diabetes mellitus and fetus mortus in a previous pregnancy. The area under the receiver operating characteristic (ROC) curve (AUC) was determined for each model and models were compared using the R package pROC [15]. Results of multivariate logistic regression analyses are presented without confidence intervals, as confidence intervals obtained from regularised methods are problematic [16]. In addition to the AUC comparison, models were compared by calculating p-values with DeLong's method [17]. As our aim was to assess the screening performance of the models in clinical practice, the reference group was all women who did not develop the outcome of interest, as in the study by Kenny et al. [18]. For example, for severe pre-eclampsia the reference group consisted of women who did not develop severe pre-eclampsia: women without pre-eclampsia and women who developed non-severe pre-eclampsia.
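The two performance measures reported for each model, the AUC and the sensitivity at a fixed specificity, can be computed from predicted risks and observed outcomes as in the following Python sketch. It is a plain illustration and not the pROC implementation used in the study. The function names are hypothetical, and the threshold convention (a score strictly above the threshold counts as screen-positive) is an assumption.

```python
from typing import List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a randomly chosen case scores higher than a randomly
    chosen non-case, counting ties as one half (Mann-Whitney form of the AUC)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_at_specificity(
    scores: Sequence[float], labels: Sequence[int], specificity: float = 0.90
) -> float:
    """Highest sensitivity over all thresholds whose specificity is at least
    the target; a score strictly above the threshold is screen-positive."""
    pos: List[float] = [s for s, l in zip(scores, labels) if l == 1]
    neg: List[float] = [s for s, l in zip(scores, labels) if l == 0]
    best = 0.0
    for t in sorted(set(scores)):
        spec = sum(n <= t for n in neg) / len(neg)
        if spec >= specificity:
            best = max(best, sum(p > t for p in pos) / len(pos))
    return best
```

Here `labels` is 1 for women who developed the outcome of interest and 0 for the reference group, that is, all other women in the cohort.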
Calculating study power is problematic for the regularised logistic regression analyses used in the present study, because power calculations are based on statistical significance, which cannot be calculated with this method. To give an indication of the power, a power analysis was done for a univariate logistic regression model. The power decreases as the OR approaches 1: e.g. for pre-eclampsia (prevalence 13.2%, N = 257) power is 96% with OR 0.5, 79% with OR 0.6, 49% with OR 0.7 and 22% with OR 0.8 [19].

Baseline and pregnancy characteristics
The total cohort comprised 257 women with risk factors for pre-eclampsia. Pre-eclampsia occurred in 34 (13.2%) of the women in the study cohort. Of those who developed pre-eclampsia, 9 (26.5%) had early-onset pre-eclampsia and 17 (50%) had a severe form of the disease. Clinical characteristics of the high-risk women who did or did not develop pre-eclampsia are presented in Table 1 and characteristics by pre-eclampsia subtype in Table 2.
Twelve (4.7%) women gave birth to an SGA newborn. Of these women, eight (67%) had developed pre-eclampsia. The prevalence of each inclusion criterion in women who developed pre-eclampsia and in women who did not develop pre-eclampsia is presented in Additional file 2: Table S2. There was one stillbirth at the 27th week of gestation. Pregnancy characteristics are presented in Table 3 and the median or mean values of measured variables in Additional file 3: Table S3.

Univariate analyses
All results of univariate analyses are summarised in Table 4. None of the biomarkers differed between pre-eclamptic and non-pre-eclamptic women. The median Uta-PI and MAP were higher in women who developed pre-eclampsia than in women who did not develop pre-eclampsia. Women who developed early-onset pre-eclampsia had lower median PlGF and higher MAP compared to women who did not develop early-onset pre-eclampsia. Women who developed late-onset pre-eclampsia had lower %hCG-h and higher MAP than other women. Median hCGβ, Uta-PI and MAP were higher, and median PlGF lower, in women with severe pre-eclampsia compared to other women. In women with non-severe pre-eclampsia, both the median PAPP-A and %hCG-h levels were lower than in other women.

Multivariate logistic regression models
The predictive models were constructed for pre-eclampsia and its subtypes as outcomes. All variables that improved the model fit in the multivariate logistic regression analyses were included; therefore, each regression model is specific to its outcome. For example, model 1 for early-onset pre-eclampsia consists of different variables than model 1 for late-onset pre-eclampsia. The multivariate models, AUC values and diagnostic characteristics of pre-eclampsia and its subtypes are presented in Table 5.
The effect of each risk factor on the risk of developing pre-eclampsia was estimated from model 1. The factor that most increased the risk of developing all pre-eclampsia (OR 1.69) and the late-onset (OR 2.40) or the non-severe (OR 2.32) subtypes was a history of pre-eclampsia. Primiparity (OR 3.34) was the strongest factor for the early-onset subtype.
For all pre-eclampsia, the best validated AUC value was 0.66, with sensitivities of 36% and 16% at 90% and 95% specificity, respectively. It was achieved by combining maternal characteristics, MAP, biomarkers and Uta-PI MoM (model 3). With this model, the positive predictive value (PPV) was 33% and the negative predictive value (NPV) was 88%. The best multivariate model for prediction of early-onset pre-eclampsia was achieved by combining maternal characteristics, MAP and biomarkers (model 2). The validated AUC value was 0.68 with 20% sensitivity at both 90% and 95% specificity. The PPV for early-onset pre-eclampsia was 25% and the NPV was 88%.
For prediction of late-onset pre-eclampsia, model 3 gave the highest prediction rates with an AUC value of 0.66, with 32% and 16% sensitivity at 90% and 95% specificity, respectively. For prediction of severe pre-eclampsia, the best validated AUC value was 0.65 with 24% sensitivity at 90% specificity. It was achieved by combining MAP, the a priori risk factor of having a previous fetus mortus, biomarkers and Uta-PI MoM (model 3). A sensitivity of 23% at 95% specificity was achieved by combining MAP and biomarkers (model 2). The best multivariate model for predicting non-severe pre-eclampsia, with a validated AUC value of 0.60 and 22% and 15% sensitivity at 90% and 95% specificity, respectively, was attained with model 3.
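The relation between sensitivity, specificity, prevalence and the predictive values can be sketched in Python with the standard identity below. The function name is hypothetical, and the sketch is not a re-derivation of the PPV and NPV reported above: those were obtained from the validated models, and values recomputed from rounded sensitivities and the cohort prevalence need not coincide with them.

```python
from typing import Tuple


def predictive_values(
    sensitivity: float, specificity: float, prevalence: float
) -> Tuple[float, float]:
    """Return (PPV, NPV) from sensitivity, specificity and outcome prevalence,
    all given as proportions between 0 and 1."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv
```

The identity shows why the PPV stays modest when sensitivity is low: at a fixed specificity the number of false positives is set by the size of the unaffected group, so few true positives leave the PPV dominated by false alarms.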

Discussion
To our knowledge, this is the first study investigating first trimester hCG-h as a potential pre-eclampsia predictor in a prospectively recruited high-risk cohort. In univariate analyses, %hCG-h levels were lower in women with late-onset and non-severe pre-eclampsia. Lower levels of serum PlGF were associated with early-onset and severe pre-eclampsia, and lower serum PAPP-A levels with non-severe pre-eclampsia. Higher serum hCGβ levels were associated with severe pre-eclampsia. The first trimester MAP was higher in pre-eclampsia and its subtypes, except in the non-severe subtype, when compared to all other participants. Uta-PI MoM was higher in women who developed pre-eclampsia compared to women who did not, and in women who developed severe pre-eclampsia compared to all other participants. Despite these differences in serum %hCG-h, PlGF, PAPP-A, hCGβ, MAP and Uta-PI between the groups, multivariate models gave only a modest prediction of the disease and did not meet the requirements of a clinically useful screening test.
Pre-eclampsia in a previous pregnancy was the most important risk factor associated with pre-eclampsia in a subsequent pregnancy. This is in line with a recent meta-analysis, wherein women with prior pre-eclampsia had the greatest pooled relative risk for developing pre-eclampsia [5].
Interestingly, in our study, prior pre-eclampsia had the strongest association with late-onset and non-severe pre-eclampsia, whereas primiparity had the highest OR for early-onset pre-eclampsia in multivariate analyses. There are a few studies that distinguish between pre-eclampsia subtypes [20,21]. In contrast to our study, other studies have been conducted in unselected populations. In accordance with our results, in the study of Odegård et al., prior pre-eclampsia and primiparity had the highest ORs for predicting pre-eclampsia, but primiparity did not appear to be specifically associated with either of the clinical subtypes [21].
The impaired invasion of cytotrophoblasts into the spiral arteries is thought to be one pathophysiological mechanism of pre-eclampsia [22]. The exact role of hCG-h in the pathophysiology of pre-eclampsia is not known. However, there is strong evidence that extravillous cytotrophoblasts initiate the production of hCG-h during their differentiation from proliferative cytotrophoblasts to invasive cytotrophoblasts in normal pregnancies and that hCG-h circulating in maternal serum reflects the invasion process of trophoblasts during the first trimester [23]. Thus, hCG-h or its ratio to total hCG might represent a biomarker of early placentation [24]. Lower %hCG-h in late-onset and non-severe forms of pre-eclampsia may indicate a mild developmental insufficiency of the placenta [25].
The results of the present study stand in contrast to a previous report, in which %hCG-h was lower in early-onset pre-eclampsia [7]. One possible explanation is that the median gestational age at the time of sampling was higher in this study (13.0 vs. 10.3 gestational weeks). In very early pregnancy (at 4 to 5 weeks of pregnancy) virtually all (90-100%) of the hCG in serum consists of hCG-h. The proportion decreases to 5-10% at 10 weeks and to 3% after 20 weeks [11,24]. In the present study, the regression line reflecting median %hCG-h against gestational weeks in the scatterplot showed lower MoM values in early-onset pre-eclampsia before 12 weeks of gestation than in the other groups. This is in agreement with previous publications [7,26]. However, only nine women had early-onset pre-eclampsia. Thus, it is not possible to draw any definitive conclusions.
Adding the Doppler measurement of Uta-PI to the multivariate regression models increased the AUCs of the models (from model 2 to model 3) for all pre-eclampsia and its subtypes except for early-onset pre-eclampsia. This stands in contrast to earlier studies conducted in high-risk cohorts, where Uta-PI predicted early-onset pre-eclampsia and severe pre-eclampsia better than all pre-eclampsia or late-onset pre-eclampsia. However, our results are in accordance with the results from the same studies showing that mean Uta-PI is much less useful for prediction of pre-eclampsia and its subtypes in the first trimester in a high-risk population than in a low-risk or screening population [27-29].

Strengths and limitations
It should be noted that the definition of early-onset pre-eclampsia in the present study differed from that used in most studies. We defined early-onset pre-eclampsia as cases diagnosed before 34 weeks of gestation, whereas the definition in most other studies is pre-eclampsia requiring delivery before 34 weeks of gestation.
A strength of our study is a carefully characterised, prospectively collected cohort of women with clinical risk factors for pre-eclampsia. A jury of two physicians and one research nurse independently confirmed all pre-eclampsia diagnoses. Furthermore, our prospective study reflects the true incidence of early-onset and late-onset pre-eclampsia in high-risk women.
A limitation of our study is the relatively small sample size, but the high incidence of PE (13.2%) allowed us to obtain some interesting results. Another limitation is that only six Doppler measurements were available in the early-onset group.
The lack of a replication cohort is a limitation. Therefore, we used 10-fold cross validation to obtain more realistic performance estimates for the models. Initially our models reached quite promising AUC values for predicting pre-eclampsia, but after validation the values decreased. A recent study using a combination of maternal risk factors, PAPP-A, PlGF, MAP and Uta-PI for predicting pre-eclampsia in the first trimester in a screening population also reported lower detection rates after five-fold cross validation [30] than some earlier studies in which the prediction rates were not validated [31,32]. The reason for the modest prediction rates of our study compared to the studies conducted in low-risk or screening populations may be that the role of impaired placentation is less obvious in high-risk women, while maternal predisposing factors for vascular injury become more important. It should be noted that high-risk conditions per se multiply the risk for pre-eclampsia.
The negative result of our attempt to increase the predictive power of a multivariate model with a new biomarker, as well as the similar results from studies of others, raise the question: why are we not finding a good predictive model? The heterogeneous nature of pre-eclampsia poses a challenge. Myatt and co-workers [33] have suggested a strategy for pre-eclampsia research that includes standardised data collection, to hasten our understanding of the cause of the disease and to improve its early recognition.

Conclusions
This was a preliminary study investigating the potential of hCG-h or %hCG-h to improve the prediction of pre-eclampsia in a high-risk cohort in the first trimester using a multivariate regression model. There was a significant reduction of %hCG-h levels in women with late-onset and non-severe pre-eclampsia, but in combination with other biomarkers, maternal characteristics, MAP and Uta-PI, the sensitivity and the positive predictive values of the multivariate regression models did not meet the requirements for a clinically useful screening test among high-risk women.