A systematic review and time-response meta-analysis of the optimal timing of elective caesarean sections for best maternal and neonatal health outcomes

Background The rate of caesarean sections (CS) has increased over the last decades to about 30% of births in high-income countries. Many CSs are planned electively without an urgent medical reason for mother or child. However, an early CS may harm the newborn. Our aim was to identify the gestational time point at term (from 37 + 0 weeks of gestation (WG) onward) at which elective CS is associated with the lowest morbidity for mother and child, by assessing the time course from 37 + 0 to 42 + 6 WG. Methods We performed a systematic literature search in MEDLINE, EMBASE, CENTRAL and CINAHL in November 2018. We included studies that compared different time points of elective CS at term, regardless of the reason for elective CS. Our primary outcomes were the rates of admission to the neonatal intensive care unit (NICU), neonatal death and maternal death in early versus late term elective CS. Various binary and dose-response random-effects meta-analyses were performed. Results We identified 35 studies including 982,749 women. Except for one randomised controlled trial, all studies were cohort studies. A linear time-response meta-analysis of 14 studies on the primary outcome NICU admission yielded a relative risk (RR) of 0.63 (95% CI 0.56 to 0.71) per additional week from 37 + 0 to 39 + 6 WG. Assuming a U-shaped course and using a cubic spline model to pool four studies, the RR for neonatal death decreased up to 39 + (0–6) WG (RR 0.59, 95% CI 0.43 to 0.83) and increased thereafter (RR 2.09, 95% CI 1.18 to 3.70). We identified only one study analysing maternal death, which yielded an RR of 0.38 (95% CI 0.04 to 3.40) for 37 + 0 to 38 + 6 WG versus ≥39 + 0 WG. Conclusion Our systematic review showed that elective CS (primary and repeat) before 39 + 0 WG led to more NICU admissions and neonatal deaths, although death is rare and its risk increases again after 39 + 6 WG. We did not find enough evidence on maternal outcomes. 
There is a need for more research that considers maternal outcomes, to support a balanced decision between neonatal and maternal health. Systematic review registration Registered in PROSPERO (CRD42017078231).


Background
While the World Health Organization (WHO) states that there is no medical justification for CS rates above 10-15%, the rates of caesarean section (CS) in high-income countries have increased to about 30% of all births over the last decades [1][2][3]. It is assumed that a large number of CSs are planned electively without an urgent medical need in either the woman or the unborn child. A previous CS is the most common reason for performing an elective CS. Researchers from the UK and USA showed that only 50% of women with a prior CS in the UK, and as few as 10% in the USA, undergo vaginal birth after CS (VBAC), even though VBAC is recommended for the majority of women with a prior CS [4,5]. Moreover, there is no consensus on the optimal time point for performing an elective CS. While 97% of elective CSs are performed at or beyond 37 + 0 WG, about 60% are performed in or beyond week 39 (39 + 0 to 39 + 6 WG), according to an analysis of 63 English NHS trusts [6].
The timing matters because women with a scarred uterus may face various risks in subsequent pregnancies, and placentation abnormalities may occur more often. The risk of scar rupture may increase as the unborn child grows in the last weeks of pregnancy [7]. Injuries to the bladder and a higher risk of bleeding requiring transfusion are also assumed. For these reasons, late term elective CS might even be associated with higher mortality than early term CS performed before the onset of labor [8]. Women without a prior CS (intact uterus) are not exposed to these risks. However, labor can begin before the planned date of CS, which may result in an emergency CS with higher risks [9].
However, the risks for the neonate may not align with those for the mother, and even at term (≥37 + 0 WG) the neonate is exposed to various health risks. Although the lungs are considered mature at 37 + 0 WG, neonates born by CS generally have a higher risk of respiratory disorders, particularly after early term CS [10].
The two guidelines "Caesarean Section" by NICE and "Birth after previous caesarean birth" by the Royal College of Obstetricians and Gynaecologists (RCOG) examine whether early term CS increases respiratory morbidity of the neonate. Both recommend not performing elective CS before 39 + 0 WG [11,12]. Furthermore, the American College of Obstetricians and Gynecologists recommends in its committee opinions 764 and 765 not to perform deliveries (either induction of labor or caesarean section) before 39 + 0 WG, except in the presence of specific pregnancy complications or comorbidities [13,14]. According to the NICE guideline "Twin and triplet pregnancy", elective delivery (vaginal or by CS) should be offered in 37 + (0-6) WG in uncomplicated dichorionic diamniotic twin pregnancies. Risks increase from 38 + 0 WG onwards. Nevertheless, about 60% of twins are born spontaneously preterm, before 37 + 0 WG [15]. This may result in a relevant number of elective CSs being performed late preterm.
However, high-level evidence is lacking, and there are currently no meta-analyses that sum up the existing evidence.
As the trend towards more electively planned CSs continues, it is essential to identify the time point for CS with the lowest risk for both mother and child, comparing early term (37 + 0 to 38 + 6 WG) and late term (≥39 + 0 WG) delivery.
We performed a systematic review of the literature to identify the optimal time point with a low risk of mortality and morbidity for both mothers and neonates. Previously, in 2016, we performed a systematic review on behalf of the German Federal Ministry of Health to answer the same question [16]. Here we update this review and extend the reach of its findings by publishing the update in English. Whereas the original review performed a random-effects meta-analysis comparing only 37 + 0 to 38 + 6 WG with ≥39 + 0 WG, in this update we additionally performed a time-response meta-analysis across gestational weeks.

Protocol and registration
We registered our review at PROSPERO (CRD42017078231) and published the protocol [17].

Eligibility criteria
We included women with a planned CS at term (≥37 + 0 WG), regardless of whether it was a first or repeat CS. We included studies of singleton and multiple pregnancies: although multiple pregnancies differ considerably from singleton pregnancies, we assumed similar uncertainties about the timing of elective CS. The interventions of interest were planned CSs at different time points. The primary outcomes were neonatal death, NICU admission and maternal death. Secondary neonatal outcomes were hospitalization ≥5 days, respiratory morbidity, respiratory distress syndrome (RDS), transient tachypnea of the neonate (TTN), pneumothorax, hypoglycemia (depending on the age at assessment: 0-3 h: < 2.0 mmol/l; 3-24 h: < 2.2 mmol/l; > 24 h: < 2.5 mmol/l) [18], Apgar score < 7, hyperbilirubinemia requiring phototherapy (jaundice) and near miss (a newborn infant who nearly died but survived a complication occurring during pregnancy, childbirth, or in the first 7 days after the termination of pregnancy). For mothers, we included the following outcomes: hysterectomy, bleeding requiring transfusion and near miss (a woman who nearly died but survived a complication that occurred during pregnancy, childbirth or within 42 days of termination of pregnancy). Outcomes without a uniform definition, such as respiratory morbidity, are reported as defined in each study. Inclusion was limited to studies from WHO Stratum A, which covers countries with very low child and very low adult mortality, including western Europe, North America and various Western Pacific countries [19]. We chose this stratum because of its very low general (and child) mortality and comparable access to health services, but also because of comparable CS rates and similar indications for CS, such as organizational (hospital), personal (maternal) and clinical reasons [20]. We did not define any other exclusion criteria regarding the population. We considered randomized controlled trials (RCTs), quasi-RCTs and cohort studies.
RCTs are much more difficult to conduct in this setting (e.g. because of spontaneous onset of labor), so we expected few RCTs. Although cohort studies are suspected to carry higher risks of systematic bias, we expected large amounts of data from birth registries. We did not restrict inclusion by language or publication date.
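The age-dependent hypoglycemia definition above can be written as a small classification rule. This is a minimal sketch; the function name `is_hypoglycemic` is hypothetical, and assigning an assessment at exactly 3 h to the 3-24 h band is our assumption, because the definition does not say which band that boundary belongs to.

```python
def is_hypoglycemic(age_hours: float, glucose_mmol_l: float) -> bool:
    """Apply the age-dependent hypoglycemia thresholds (mmol/l).

    Bands: 0-3 h: < 2.0; 3-24 h: < 2.2; > 24 h: < 2.5.
    Assumption: an assessment at exactly 3 h uses the 3-24 h band;
    exactly 24 h stays in the 3-24 h band because the last band is "> 24 h".
    """
    if age_hours < 0:
        raise ValueError("age at assessment must be non-negative")
    if age_hours < 3:
        threshold = 2.0
    elif age_hours <= 24:
        threshold = 2.2
    else:
        threshold = 2.5
    return glucose_mmol_l < threshold
```

Keeping the thresholds in one function makes it easy to see that the same glucose value (e.g. 2.1 mmol/l) counts as hypoglycemia at 10 h but not at 1 h.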

Information sources
We searched MEDLINE, EMBASE, CENTRAL and CINAHL on 29 November 2018, without restrictions on language or publication date. Study registries (ClinicalTrials.gov, Deutsches Register Klinischer Studien and the EU Clinical Trials Register) were searched for new and unpublished studies. To identify grey literature, we additionally searched Google Scholar.
We also checked the references of included studies, guidelines and systematic reviews and, if necessary, contacted authors for additional data.

Search strategy
The search strategy was developed using MeSH terms and text words and a librarian checked the strategy by applying the PRESS checklist [21]. The search strategies are available in Additional file 1.

Study selection
Records identified through the searches were imported into an EndNote X7 database and duplicates were removed. Two reviewers independently assessed the relevance of the identified titles and abstracts according to the inclusion criteria. Studies selected for full-text review were again assessed independently by the same two reviewers. Disagreements were discussed until consensus was reached or, if necessary, a third reviewer was consulted.

Data collection
One reviewer extracted the data into an a priori-piloted abstraction table, set up directly as an Excel spreadsheet, and the other reviewer checked all entries for completeness and accuracy. If the study authors reported only adjusted effect measures, we contacted them to request unadjusted data.

Data items
We extracted the following study characteristics: author, publication year, region, setting, study design, recruitment period, exclusion criteria, patient characteristics (age, body-mass index, ethnicity, diseases, parity, prior CS, indication for CS, marital/educational/socioeconomic status, payer, smoking status), time points measured and outcomes. All outcomes were extracted as dichotomous variables for each time point.

Risk of bias assessment
Two reviewers independently assessed risk of bias. Disagreements were discussed until consensus was reached or, if necessary, a third reviewer was consulted. For RCTs we used the Cochrane Risk of Bias Tool [22]; for cohort studies we used the ROBINS-I tool [23]. We first assessed risk of bias at study level and then summarized it at outcome level.

Data synthesis
We pooled only studies that reviewers with clinical expertise judged to be sufficiently clinically homogeneous. For such studies, a random-effects meta-analysis was performed. We performed a multivariate dose-response meta-analysis in which gestational time, from 37 + 0 WG to 42 + 6 WG in weekly steps, represented the different doses. For each outcome, we examined visually whether the assumed time-response relationship was actually present and how it was shaped [24]. To this end, we created plots showing the intervention effect of each study over time. Based on these curves, we determined the shape (e.g. linear, U-shaped) specified in the dose-response meta-analysis. For most neonatal (adverse) outcomes we observed a decreasing or U-shaped pattern (with a minimum at week 39), and for maternal (adverse) outcomes an increasing trend [10,16,25]. In the first stage of our analysis, we estimated a time-response curve (i.e. gestational week versus outcome) for each study across the WG values observed in the whole dataset. In the second stage, these curves were pooled into an overall gestational week-outcome curve. The time-response analysis followed the two-stage method for dose-response meta-analysis by Greenland & Longnecker [26]. We calculated study-specific slopes (linear trends) and 95% confidence intervals from the natural logarithms of the reported effect measures and confidence intervals across WG, taking the correlations between RRs into account. If the reference category was not the lowest category, we first recalculated the data so that (depending on the shape) week 39 or the lowest category was the reference category. Where this was not possible, we excluded the categories below the reference category from the linear time-response analysis. For studies reporting ranges of weeks, the midpoint of the lower and upper cut-off was assigned to each category.
When the lowest or highest category was open-ended, its lower or upper cut-off was set to 37 or 42 weeks, respectively, and the midpoint of the resulting cut-offs was again assigned. When authors reported the median or mean gestational age per category, this value was used to assign the corresponding RR for each study.
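The first stage described above can be sketched as follows, assuming crude (unadjusted) event counts per week category, which matches the review's use of unadjusted data. With crude counts, every log RR in a study shares the same reference group, so the covariances that the Greenland & Longnecker approach needs follow directly from the delta method; the full method additionally reconstructs pseudo-counts when only adjusted estimates are available, which this sketch omits. The functions `category_dose` and `study_slope`, the encoding of categories as lower/upper cut-offs in weeks, and the example counts are all hypothetical.

```python
import numpy as np

# Cut-offs used for open-ended categories, as described in the text.
OPEN_LOWER_WEEK = 37.0
OPEN_UPPER_WEEK = 42.0


def category_dose(lower, upper):
    """Assign the midpoint of a week category; None marks an open end."""
    lo = OPEN_LOWER_WEEK if lower is None else float(lower)
    hi = OPEN_UPPER_WEEK if upper is None else float(upper)
    return (lo + hi) / 2.0


def study_slope(doses, cases, totals, ref_index):
    """Stage 1: GLS slope of log RR on gestational week for one study.

    Uses crude counts only. Every log RR shares the reference group, so by
    the delta method Var(logRR_j) = 1/a_j - 1/n_j + 1/a_ref - 1/n_ref and
    Cov(logRR_i, logRR_j) = 1/a_ref - 1/n_ref for i != j.
    Returns the slope (log RR per week) and its variance.
    """
    doses = np.asarray(doses, dtype=float)
    a = np.asarray(cases, dtype=float)
    n = np.asarray(totals, dtype=float)
    keep = np.arange(len(doses)) != ref_index

    x = doses[keep] - doses[ref_index]  # weeks relative to the reference
    y = np.log(a[keep] / n[keep]) - np.log(a[ref_index] / n[ref_index])

    v_ref = 1.0 / a[ref_index] - 1.0 / n[ref_index]
    v = 1.0 / a[keep] - 1.0 / n[keep]
    cov = np.full((x.size, x.size), v_ref) + np.diag(v)

    cov_inv = np.linalg.inv(cov)
    var_slope = 1.0 / (x @ cov_inv @ x)  # no intercept: log RR is 0 at the reference
    slope = var_slope * (x @ cov_inv @ y)
    return slope, var_slope


# Hypothetical study: three single-week categories, lowest week as reference.
doses = [category_dose(37, 38), category_dose(38, 39), category_dose(39, 40)]
slope, var_slope = study_slope(doses, cases=[100, 50, 25],
                               totals=[1000, 1000, 1000], ref_index=0)
print(f"RR per week: {np.exp(slope):.2f}")  # -> RR per week: 0.50
```

Exponentiating the slope gives the RR per additional week; the study-level slopes and their variances are then the inputs to the second, pooling stage.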
Statistical heterogeneity was assessed with the Q test, the I² statistic and prediction intervals. Prediction intervals aid the interpretation of heterogeneity by presenting the expected range of true effects in similar studies [27,28].
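The second stage and the heterogeneity measures can be sketched as below. This is an illustration, not the review's code: it pools per-study slopes with the DerSimonian-Laird estimator for brevity (the R packages cited here offer other estimators, and which one was used is not stated in this section), and it uses the common form of the prediction interval with a t distribution on k - 2 degrees of freedom. The function name `pool_slopes` is hypothetical.

```python
import numpy as np
from scipy import stats


def pool_slopes(slopes, variances, level=0.95):
    """Stage 2: DerSimonian-Laird random-effects pooling of per-study slopes.

    Returns the pooled slope, its CI, the prediction interval, tau^2,
    Cochran's Q with its p-value, and I^2 (as a proportion).
    """
    y = np.asarray(slopes, dtype=float)
    v = np.asarray(variances, dtype=float)
    k = y.size
    if k < 3:
        raise ValueError("a prediction interval needs at least 3 studies")

    w = 1.0 / v
    mu_fixed = np.sum(w * y) / np.sum(w)
    q = float(np.sum(w * (y - mu_fixed) ** 2))
    df = k - 1
    tau2 = max(0.0, (q - df) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    w_re = 1.0 / (v + tau2)  # random-effects weights
    mu = float(np.sum(w_re * y) / np.sum(w_re))
    se = float(np.sqrt(1.0 / np.sum(w_re)))

    z = stats.norm.ppf(0.5 + level / 2)
    t = stats.t.ppf(0.5 + level / 2, df=k - 2)
    half_pi = t * np.sqrt(tau2 + se**2)  # adds between-study variance
    return {
        "slope": mu,
        "ci": (mu - z * se, mu + z * se),
        "pi": (mu - half_pi, mu + half_pi),
        "tau2": tau2,
        "q": q,
        "p_q": float(stats.chi2.sf(q, df)),
        "i2": i2,
    }
```

For slopes on the log scale, `np.exp(result["slope"])` is the pooled RR per week, and exponentiating the bounds gives the corresponding confidence and prediction intervals. The prediction interval is always wider than the confidence interval once between-study variance is present.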
All analyses were performed with R 3.3.2 using the meta and dosresmeta packages [29,30].
If data were too heterogeneous, we performed a structured narrative analysis of the outcome. We used GRADE to rate the certainty of the evidence [31]. Two reviewers independently performed the GRADE assessment for each outcome with the GRADEpro GDT software. The domains assessed with the GRADE approach are risk of bias, inconsistency, indirectness, imprecision, publication bias, large effects, confounding and dose-response gradient.

Risk of bias across studies
Publication bias: We assessed publication bias by visual inspection of the funnel plot. We assumed publication bias if we found asymmetry in the plots. Furthermore, we applied Egger's test and Begg's test [32,33]. A p-value < 0.1 was considered statistically significant.
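Both tests can be sketched in a few lines. In this minimal version (function names hypothetical), Egger's test regresses the standardized effect (estimate divided by its standard error) on precision and tests whether the intercept differs from zero; Begg's test computes Kendall's rank correlation between variance-stabilized standardized effects and their variances. With few studies both tests have low power, which is consistent with applying them only to an outcome with at least ten studies.

```python
import numpy as np
from scipy import stats


def egger_test(estimates, std_errors):
    """Egger's regression test: returns the intercept and its two-sided p-value."""
    y = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    snd = y / se                      # standardized effect
    precision = 1.0 / se
    X = np.column_stack([np.ones_like(precision), precision])
    coef, *_ = np.linalg.lstsq(X, snd, rcond=None)
    resid = snd - X @ coef
    df = y.size - 2
    sigma2 = resid @ resid / df       # residual variance of the OLS fit
    cov = sigma2 * np.linalg.inv(X.T @ X)
    t_stat = coef[0] / np.sqrt(cov[0, 0])
    return float(coef[0]), float(2 * stats.t.sf(abs(t_stat), df))


def begg_test(estimates, variances):
    """Begg's rank correlation test: returns Kendall's tau and its p-value."""
    y = np.asarray(estimates, dtype=float)
    v = np.asarray(variances, dtype=float)
    w = 1.0 / v
    mu = np.sum(w * y) / np.sum(w)    # fixed-effect pooled estimate
    v_star = v - 1.0 / np.sum(w)      # variance of (y_i - mu)
    t_star = (y - mu) / np.sqrt(v_star)
    tau, p = stats.kendalltau(t_star, v)
    return float(tau), float(p)
```

Asymmetry would be flagged when a p-value falls below 0.1, the threshold stated above.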
Selective reporting within studies: If available, study protocols were checked and compared with the reporting in the studies. If no protocol was referenced, we searched clinicaltrials.gov for one. We did not contact the authors of registry-based publications to request protocols.

Additional analyses
We performed subgroup analyses for repeat CS vs. first CS and for studies including exclusively multiple pregnancies. Besides the general differences between multiple and singleton pregnancies, we assumed that in multiple pregnancies CS is more often planned before 37 + 0 WG, which may require a time-response analysis covering a different range of WG than 37 + 0 to 42 + 6 WG.
In a sensitivity analysis for the primary outcomes, we conducted a univariate random-effects meta-analysis (37 + 0 to 38 + 6 vs ≥39 + 0 WG) to assess the robustness of the results. We used the Paule-Mandel heterogeneity variance estimator and modified Hartung-Knapp confidence intervals for the pooled estimates [34,35].
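The sensitivity analysis can be sketched as follows, assuming the pooled quantity is the log RR for 37 + 0 to 38 + 6 vs ≥39 + 0 WG. The Paule-Mandel estimator chooses tau² so that the generalized Q statistic equals its expected value k - 1; here it is solved by bisection. "Modified" Hartung-Knapp is implemented as flooring the Hartung-Knapp scale factor at 1, which is one common variant; the exact modification applied through the R meta package may differ. All function names are hypothetical.

```python
import numpy as np
from scipy import stats


def generalized_q(tau2, y, v):
    """Generalized Q statistic; it decreases monotonically as tau2 grows."""
    w = 1.0 / (v + tau2)
    mu = np.sum(w * y) / np.sum(w)
    return float(np.sum(w * (y - mu) ** 2))


def paule_mandel_tau2(y, v, tol=1e-12, max_iter=500):
    """Solve generalized_q(tau2) = k - 1 for tau2 >= 0 by bisection."""
    y = np.asarray(y, dtype=float)
    v = np.asarray(v, dtype=float)
    target = y.size - 1
    if generalized_q(0.0, y, v) <= target:
        return 0.0                    # no excess heterogeneity
    lo, hi = 0.0, 1.0
    while generalized_q(hi, y, v) > target:  # widen until the root is bracketed
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if generalized_q(mid, y, v) > target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def modified_hartung_knapp(y, v, tau2, level=0.95):
    """Pooled estimate with a Hartung-Knapp CI whose scale factor is floored at 1."""
    y = np.asarray(y, dtype=float)
    v = np.asarray(v, dtype=float)
    k = y.size
    w = 1.0 / (v + tau2)
    mu = float(np.sum(w * y) / np.sum(w))
    q_hk = np.sum(w * (y - mu) ** 2) / (k - 1)
    se = float(np.sqrt(max(1.0, q_hk) / np.sum(w)))  # never narrower than the usual SE
    t = stats.t.ppf(0.5 + level / 2, df=k - 1)
    return mu, (mu - t * se, mu + t * se)
```

Exponentiating the pooled value and its bounds gives the RR and its confidence interval. Using a t distribution with k - 1 degrees of freedom widens the interval when only a few studies are pooled, which is one reason wide intervals can appear for rare outcomes such as neonatal death.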

Study selection
We identified 3200 records in the databases after duplicate removal. Of the 120 publications screened in full text, we included 29 in the review. We identified six further references by screening the reference lists of five systematic reviews. The references from the guidelines, the search in Google Scholar and the search in registries yielded no additional inclusions. The included and excluded studies (with reasons for exclusion) are presented in Additional file 1 and Fig. 1.

Study characteristics
Of the 35 included publications, three (Brookfield et al., Chiossi et al. and Tita et al.) used the same birth registry [36][37][38]. Similarly, Vilchez et al. and Zanardo et al. each published two papers from the same cohort [39,40]. In each case we used the first publication and added outcome data from the subsequent publication. Except for one RCT by Glavind et al. [25], all studies were cohort studies. One study, Wilmink et al., examined only twin births. Two studies from Japan (Nakashima et al. and Yamazaki et al.) and one from Germany (Gawlik et al.) only compared 37 + (0-6) with 38 + (0-6) WG, and four (Doan, McAlister, Nir and Zanardo et al.) did not report single WGs but compared 37 + 0 to 38 + 6 WG with ≥39 + 0 WG [41][42][43][44][45][46][47][48][49]. These eight studies could not be included in any meta-analysis. Patient numbers of the included studies ranged from 96 to 785,340, with a median of 13,888. Twenty-two studies reported the exclusion of women with multiple pregnancies and 15 studies the exclusion of pregnancies with fetal congenital anomalies. Twenty-four studies reported excluding mothers with any morbidity influencing the timing of birth (e.g. hypertension, diabetes, placenta previa). Nineteen studies evaluated NICU admission and six studies evaluated neonatal death. Maternal death was assessed in only one study. None of the studies reported or considered near miss for neonates or mothers. One study, Terada et al., reported outcomes exclusively on oxygen supplementation and respiratory support with overlapping patients, so we did not include it in the meta-analysis [50]. For detailed study and patient characteristics, see Additional file 1 and Table 1.

Risk of bias within studies
Risk of bias in the RCT by Glavind et al. was assessed with the Cochrane Risk of Bias tool (see Fig. 2). We judged the overall risk of bias of this study to be moderate because of the lack of blinding. All other studies were assessed with the ROBINS-I tool.
Across all studies, confounding and selection of participants were the main issues, and we judged the risk of bias in these domains to be at least serious (see Table 2). The detailed ratings for each bias domain can be found in Additional file 1.
A number of studies attempted to control for confounding by multivariable logistic regression, but we could not use these data in the meta-analyses because the adjustment factors varied widely. Because we pooled and mainly reported unadjusted (univariate) analyses, risk of confounding was assessed for these analyses. Frequent confounders were maternal age, ethnicity, maternal and neonatal comorbidities, methods to determine gestational age and study setting. Women who were scheduled for elective CS in late term (≥39 + (0-6) WG) but needed an unplanned, earlier CS because of complications are more likely to drop out, so the proportion of healthy women with uncomplicated pregnancies potentially rises in the late term CS group. Conversely, women expected to have more complications during birth are scheduled for an earlier CS, which increases the proportion of complicated pregnancies in the early term CS group. Therefore, we rated almost all studies as having a critical or serious risk of bias.
We found no risk of bias due to classification of interventions or deviations from intended interventions. We could not determine whether there was a risk of bias due to missing data, as none of the studies described whether data were missing or how missing data were handled. Risk of bias in the measurement of outcomes was driven by the suspected influence of knowing the timing of CS on outcome assessment. Death and hysterectomy are not influenced by knowledge of the gestational age at CS (objective outcomes), whereas the neonatologists' or obstetricians' judgement about NICU admission is highly influenced by it (subjective outcome). We did not find any indication of selective reporting of results in any study. Table 2 shows the risk of bias assessment at study and outcome level.

Risk of bias across studies
Overall, we judged the body of evidence to be at serious or critical risk of bias. Figure 2 shows the risk of bias assessment for the outcome NICU admission; we did not produce graphs for each outcome as they would have been nearly identical (Fig. 3). All meta-analyses except the one for NICU admission included fewer than ten studies, so we could evaluate publication bias only for NICU admission. The funnel plot did not suggest publication bias (see Additional file 1), and neither Egger's test (p = 0.46) nor Begg's test (p = 0.83) indicated publication bias.

Results of individual studies
Individual study results for NICU admission, neonatal death and maternal death can be found in Additional file 1. We identified only one study, by Chiossi et al., that analyzed maternal mortality [37]. Events were very rare (one in week 38, four in week 39), and we calculated an RR of 0.38 (95% CI 0.04 to 3.40, very low quality of evidence) for 37 + 0 to 38 + 6 WG versus ≥39 + 0 WG.
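A risk ratio and its confidence interval from a 2×2 table can be computed as sketched below; with only a handful of events, the standard error of the log RR is dominated by the 1/events terms, which is why an interval such as 0.04 to 3.40 is so wide. The function name `risk_ratio_ci` is hypothetical, the counts in the test are invented for illustration (the group sizes behind the Chiossi et al. estimate are not reported in this section), and the 0.5 continuity correction for zero cells is an assumption, not necessarily what the review applied.

```python
import math


def risk_ratio_ci(events_a, total_a, events_b, total_b, z=1.959964):
    """Risk ratio of group A vs group B with a 95% Wald CI on the log scale.

    If either group has zero events, 0.5 is added to every cell of the 2x2
    table (events and non-events), i.e. +0.5 events and +1 total per group.
    """
    a, n1 = float(events_a), float(total_a)
    c, n0 = float(events_b), float(total_b)
    if a == 0 or c == 0:
        a, c = a + 0.5, c + 0.5
        n1, n0 = n1 + 1.0, n0 + 1.0
    rr = (a / n1) / (c / n0)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)  # delta-method SE of log RR
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)
```

Because the interval is symmetric on the log scale, a point estimate below 1 can still come with an upper limit well above 1 when events are rare.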

Synthesis of results
We extracted the outcome data for each WG and each study in Excel. We calculated RRs with 39 + (0-6) WG as the reference category and created graphs presenting the RRs over time. For each outcome and each study, graphs were produced in the same manner, and we visually inspected whether a linear trend could be expected. Figures 4 and 5 show the development of the primary outcomes NICU admission and neonatal death over time. The curves show the pooled RR from 14 studies on NICU admission and from four studies on neonatal death, respectively, each accompanied by the upper and lower confidence limits. NICU admission decreases from 37 + 0 to 39 + 6 WG, while neonatal death shows a U-shaped course from 37 + 0 to 42 + 6 WG, with its minimum at 39 + (0-6) WG. See Additional file 1 for the illustration of the individual study results underlying the chosen models.
We performed linear time-response meta-analyses for the outcomes NICU admission, respiratory morbidity, hypoglycemia, Apgar score < 7, jaundice, RDS, TTN, pneumothorax, maternal hysterectomy and maternal blood transfusion. The RR for NICU admission was 0.63 (95% CI 0.56 to 0.71, I² = 95.4%, low quality of evidence) for each additional WG (see Fig. 4). All outcomes except Apgar score < 7, pneumothorax and both maternal outcomes showed a significantly higher risk ratio the earlier the CS was performed. Except for sepsis, hypoglycemia, maternal hysterectomy and blood transfusion, all analyses showed high heterogeneity (I² > 30%). See Table 3 for the individual results of the meta-analyses. All studies had a serious or critical risk of bias, and we therefore rated the certainty of evidence as low or very low (see Table 4); only hypoglycemia was rated as moderate certainty of evidence. Three further meta-analyses were cubic spline time-response meta-analyses with 39 + (0-6) WG as the reference. The incidences of neonatal death, sepsis and hospitalization ≥5 days all showed U-shaped curves with a minimum at 39 + (0-6) WG, i.e. a decreasing incidence from 37 + 0 WG to 39 + (0-6) WG and a rising incidence from 40 + 0 WG. The RR for neonatal death from 37 + 0 to 39 + 6 WG drops to 0.59 (95% CI 0.43 to 0.83, I² = 77.5%, low quality of evidence) and after 39 + 6 WG rises to 2.09 (95% CI 1.18 to 3.70, I² = 77.5%, low quality of evidence) (see Fig. 5). For the results on sepsis and hospitalization, see Table 3. Because the GRADE summary in Table 4 cannot adequately display the results of the cubic spline models, we report these results as free text.
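A cubic spline time-response model replaces the single linear slope with a small spline basis. The sketch below builds a restricted cubic spline basis in Harrell's parameterization (as implemented, for example, by `rcspline.eval` in the R package Hmisc, whose default normalization divides by the squared distance between the outer knots) and contrasts it against the reference week, so that the log RR at the reference is zero. The knot positions are hypothetical, since this section does not report them; with three knots there is one nonlinear term, which is enough to represent a U-shaped curve. The function names are also hypothetical. The resulting design matrix would take the place of the single dose column in the stage-one GLS fit.

```python
import numpy as np


def rcs_basis(x, knots):
    """Restricted cubic spline basis (Harrell's parameterization).

    Column 0 is x itself; the remaining len(knots) - 2 columns are the
    nonlinear terms, scaled by (t_max - t_min)**2. The fitted curve is
    cubic between knots and linear beyond the outer knots.
    """
    x = np.asarray(x, dtype=float)
    t = np.sort(np.asarray(knots, dtype=float))
    m = t.size
    if m < 3:
        raise ValueError("at least three knots are required")

    def cube_pos(u):
        return np.clip(u, 0.0, None) ** 3  # truncated cubic (u)_+^3

    scale = (t[-1] - t[0]) ** 2
    gap = t[-1] - t[-2]
    columns = [x]
    for j in range(m - 2):
        term = (cube_pos(x - t[j])
                - cube_pos(x - t[-2]) * (t[-1] - t[j]) / gap
                + cube_pos(x - t[-1]) * (t[-2] - t[j]) / gap)
        columns.append(term / scale)
    return np.column_stack(columns)


def contrast_design(weeks, ref_week, knots):
    """Design matrix for log RR relative to the reference week (row of zeros at the reference)."""
    return rcs_basis(weeks, knots) - rcs_basis([ref_week], knots)


# Hypothetical knots and week midpoints, with week 39 (midpoint 39.5) as reference.
X = contrast_design([37.5, 38.5, 39.5, 40.5, 41.5], 39.5, [37.5, 39.5, 41.5])
```

With coefficients estimated from this design, the fitted log RR at each week is `X @ beta`, so the curve can fall towards week 39 and rise again afterwards instead of being forced into a single monotone trend.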

Additional analysis
We performed subgroup analyses for the primary outcomes NICU admission and neonatal death because we observed very high clinical and statistical heterogeneity. One subgroup analysis included only studies of women with repeat CS. For NICU admission, pooling four studies showed a 34% reduction with the reference 39 + (0-6) WG (RR 0.66, 95% CI 0.65 to 0.67, I² = 0%, moderate quality of evidence). The time-response meta-analysis showed decreasing neonatal mortality up to 39 + (0-6) WG (RR 0.67, 95% CI 0.51 to 0.87, I² = 0%, very low quality of evidence) and increasing mortality beyond 39 + 6 WG (RR 1.68, 95% CI 1.07 to 2.65, I² = 0%, very low quality of evidence). The individual study results can be found in Additional file 1. The included studies did not provide enough information on first CS to perform a subgroup analysis for first CS.
The sensitivity analyses using univariate meta-analysis for the primary outcomes resulted in an RR of 1.67 (95% CI 1.37 to 2.0, I² = 88%) for NICU admission (see Additional file 1) and an OR of 2.24 (95% CI 0.29 to 17.31, I² = 0%) for neonatal death, both pointing to higher risks in early term. For the funnel plot of NICU admission, see Additional file 1.

Main findings
We found that the rate of NICU admission decreases from 37 + 0 WG to 39 + (0-6) WG for elective CS. Risk of bias was at least serious in all studies and critical in some. The certainty of the evidence according to GRADE is low. The risks of respiratory morbidity in neonates and of other postnatal events (jaundice, hypoglycemia) decrease in the same manner. Assuming a U-shaped pattern with its minimum at 39 + (0-6) WG, we found a decreasing risk of death from 37 + 0 to 39 + (0-6) WG and an increasing risk thereafter. The certainty of this evidence is low, and a sensitivity analysis showed wide confidence intervals, which reduces the robustness of the results. Similar patterns were seen for hospitalization of the neonate for more than 5 days and for sepsis. The certainty of evidence is low or very low for respiratory morbidity, hospitalization of the neonate for more than 5 days, jaundice and sepsis. Only hypoglycemia showed moderate certainty of evidence.
Maternal mortality is a very rare event in countries of WHO stratum A [70], and we found only one study considering maternal death. The other maternal outcomes, hysterectomy and blood transfusion, showed higher event rates in late term, but given the statistical uncertainty this can only be considered a hint. All studies considering maternal outcomes had a serious risk of bias, and the certainty of evidence was very low. We found one study examining twin pregnancies, in which elective CS was more often planned preterm and generally earlier than in singleton pregnancies. We could not pool these data with those from singleton pregnancies and cannot draw any conclusions on outcomes from the identified data.
For future guidelines and decision making on the timing of elective CS, sufficient evidence is available only for neonatal outcomes. The evidence suggests fewer NICU admissions in late term, especially in repeat CS. There seems to be a U-shaped risk pattern for neonatal death with its minimum at 39 + (0-6) WG. Respiratory morbidity in neonates decreases in late term, although the evidence is uncertain. We cannot draw any conclusions from the findings regarding maternal outcomes.

Certainty of evidence
We identified a serious risk of bias in all included studies, mainly due to patient selection, confounding and lack of blinding. None of the cohort studies attempted to address the tendency to allocate pregnancies with fewer complications to late term groups and pregnancies with more complications to early term groups, nor did any study report why women were selected for either group. There are diverse possible sources of confounding; for example, ante- and postnatal care may differ not only between institutions but also between women considered for early term CS (increased monitoring) and those considered for late term CS. NICU admission policies may also vary between institutions. Moreover, we assume that the knowledge that a CS was performed early term favours NICU admission. As Glavind et al. show, an RCT is feasible even if randomization must take place within a short period of about two or three weeks (e.g. 38 + (0-6) vs 39 + (0-6) WG) [25].

Limitations in the review process
Our review has various limitations. Methodologically, we pooled studies with considerable heterogeneity. We included studies regardless of their inclusion criteria (e.g. elective CS without any medical indication vs. elective CS with a medical indication), which resulted in high heterogeneity. We could not use any data from the studies that controlled for confounding because the adjustment variables were too heterogeneous. Some studies reported using ultrasound to estimate gestational age, or a combined method with the date of the last menstrual period; others did not report the method.
We neither differentiated by this information nor included it in our analyses, and might have missed relevant issues. Moreover, we pooled outcomes such as respiratory morbidity whose definitions may differ between studies. Furthermore, given the rarity of maternal death in the countries we considered, a broader assessment of maternal adverse events might be more relevant than assessing maternal death.
Several outcomes, such as NICU admission and hypoglycemia, can be considered surrogates for neonatal morbidity rather than outcomes of direct importance to patients [71]. Nevertheless, NICU admission may have several negative effects on the development of the neonate and on the parental relationship, for example on breastfeeding [72,73]. As NICU admission is always associated with various stressors, it may also negatively affect the long-term development of the neonate [74,75]. Likewise, hypoglycemia is a surrogate for neuronal energy supply and may affect the (long-term) neurological development of the neonate [76,77].
In the meta-analyses for NICU admission, we pooled data from 37 + 0 to 39 + 6 WG because not all included studies reported later WGs and the linear trend showed no change after 39 + 6 WG. For the other outcomes, we ignored missing data beyond 39 + 6 WG and allowed the trend to continue decreasing, to remain constant, or to change direction and increase (cubic spline models).
We limited our research to high-income countries with very low general and child mortality. These countries have similar rates of elective CS and comparable reasons for CS (e.g. medical reasons, women's preference, hospital preference). We excluded lower WHO strata for several reasons. General and especially child mortality is higher in those strata, partly because of poorer access to health care, and access to health care also determines the use of CS; for example, in central African regions where health care is limited, the CS rate is lower than 5%. Moreover, access to health care and elective CS rates vary within countries between rural areas and areas with more infrastructure, reflecting the prosperity of the population (e.g. China, Middle Eastern countries). As women receiving elective CS in low- and middle-income countries may differ much more in their risks and backgrounds (education, prosperity, access to health care and cultural beliefs), this should be addressed in a more precise, separate analysis [20].

Conclusions
We found that elective CS before 39 + (0-6) WG led to more NICU admissions, more respiratory morbidity of the neonate and more neonatal deaths, although death is rare and its risk increases again after 39 + 6 WG. The decreasing respiratory morbidity is in accordance with the current NICE and RCOG guidelines [11,12]. Except for repeat CS, the evidence is very heterogeneous. Nevertheless, given the strength of the effects, performing elective CS in late term appears advantageous with regard to neonatal morbidity. Glavind et al. performed a systematic review comparing 38 + (0-6) and 39 + (0-6) WG for NICU admission, respiratory morbidity and maternal adverse events [78]. They showed similar results for the neonatal outcomes and likewise did not have enough data on maternal adverse events to draw any conclusion. Our results do not differ from those of the original review for the German Federal Ministry of Health [16]; although our methods differed slightly, we consider our results more precise owing to the time-response analysis. There is not enough evidence on maternal outcomes to support a decision between early and late term CS. More research is needed, especially on maternal outcomes, to support a balanced decision between neonatal and maternal health. Moreover, it would be desirable to know more about the sources of heterogeneity in order to support individual decisions based on pregnancy characteristics, morbidities or maternal characteristics.

Deviations from the protocol
We deviated from the protocol in the extraction of two outcomes. First, we did not extract neonatal birth weight, because we decided that early term neonates naturally have a lower birth weight than full term neonates. Second, we did not extract the outcome maternal adverse events, because their definitions were so heterogeneous that we could not identify any coherence. We also did not request study protocols directly from the authors, as we assumed that protocols were unlikely to have been developed for registry studies. As we did not pool maternal mortality, we ultimately did not use any beta-binomial model for pooling. Furthermore, we did not pool adjusted data, as the adjustment factors were too heterogeneous.