Setting
The setting of this study, Manitoba, is generally representative of Canada as a whole, ranking in the middle for several health and education indicators [12, 13]. At the time of the 2011 Census, approximately 1.2 million people resided in Manitoba, with more than half (783,247) living in the two urban areas, Winnipeg and Brandon [14]. Teenage pregnancy rates in Manitoba exceed the national rate: in 2010, the teenage pregnancy rate was 28.2 per 1000 in Canada and 48.7 per 1000 in Manitoba [15]. The Manitoba teen pregnancy rate in 2010 was slightly lower than rates in England and Wales (54.6 per 1000) and the United States (57.4 per 1000) [16, 17].
Data
The Manitoba Population Health Research Data Repository contains province-wide, routinely collected individual-level data over time (going back to 1970 in some files), across space (with residential location documented using six-digit postal codes), for each family (with changes in family structure recorded every 6 months), and for each resident. Health variables are measured continuously from physician claims and hospital abstracts (as long as an individual remains in Manitoba) [18].
A research registry identifies every provincial resident, with information on births, arrival and departure dates, and deaths created from the provincial health registry and coordinated with Vital Statistics files. Given approximately 16,000 births annually, follow-up (about 74 % over 20 years) is comparable to that in the largest cohort studies based on primary data [19]. Previous research using similar data shows the results are not biased by individuals leaving the province or dying. Information on data linkage, confidentiality/privacy, and validity of the datasets used has been described elsewhere [20–22]. Children are linked to mothers using hospital birth record information; the mother was noted in essentially all cases [23]. Sisters were defined as having the same biological mother.
The cohort consists of women who were born in Manitoba between April 1, 1979 and March 31, 1994, stayed in the province until at least their 20th birthday, had at least one older sister, and had no missing values on key variables. In this study, teenage pregnancies are defined as those between the ages of 14 and 19; pregnancies prior to age 14 were excluded due to low numbers and for comparability to other studies. For this reason, families in which at least one sister had a pregnancy before age 14 were removed (34 families). To address threats to independence, when a family had more than one younger sister (more than two daughters), one younger sister was randomly selected. Figure 1 diagrams the selection trajectory for the 17,115 individuals selected; boxes in bold indicate the included cohort. At age 14, just over 85 % of girls in this cohort were living in the same postal code as at least one older sister.
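The selection rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Woman` record, its field names, and the `select_cohort` helper are hypothetical and do not reflect the repository's actual file layout, and an "older sister" is taken to be any sister with an earlier birth date.

```python
import random
from dataclasses import dataclass
from datetime import date


@dataclass
class Woman:
    # Hypothetical record layout, not the repository's schema.
    person_id: int
    mother_id: int               # sisters share the same biological mother
    birth_date: date
    in_province_until_20: bool
    has_missing_values: bool
    pregnancy_before_14: bool


BIRTH_START = date(1979, 4, 1)
BIRTH_END = date(1994, 3, 31)


def select_cohort(women, seed=0):
    """Return at most one eligible younger sister per family."""
    rng = random.Random(seed)    # seeded so the random pick is reproducible
    by_mother = {}
    for w in women:
        by_mother.setdefault(w.mother_id, []).append(w)

    cohort = []
    for mother_id in sorted(by_mother):
        sisters = by_mother[mother_id]
        # Drop the whole family if any sister had a pregnancy before age 14.
        if any(s.pregnancy_before_14 for s in sisters):
            continue
        eldest_birth = min(s.birth_date for s in sisters)
        eligible = [
            s for s in sisters
            if s.birth_date > eldest_birth              # has an older sister
            and BIRTH_START <= s.birth_date <= BIRTH_END
            and s.in_province_until_20
            and not s.has_missing_values
        ]
        if eligible:
            # One younger sister per family keeps observations independent.
            cohort.append(rng.choice(eligible))
    return cohort
```

Choosing a single younger sister per family means that no family contributes more than one outcome to the analysis.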
Outcome
Teenage pregnancy was defined as having at least one pregnancy between the ages of 14 and 19 (inclusive). A pregnancy is defined as having at least one hospitalization with a live birth, missed abortion, ectopic pregnancy, abortion, or intrauterine death, or at least one hospital procedure of surgical termination of pregnancy, surgical removal of ectopic pregnancy, pharmacological termination of pregnancy, or intervention during labour and delivery. Pregnancy status was determined by ICD-9-CM codes (for diagnoses before April 1, 2004), ICD-10-CA codes (for diagnoses on or after April 1, 2004), and Canadian Classification of Health Interventions (CCI) codes in the hospital discharge abstract database [24]. Appendix 1 presents specific codes used to determine pregnancy status.
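As a minimal sketch of the two rules in this definition, the Python below picks the diagnosis coding system from the April 1, 2004 cutoff and checks the inclusive 14 to 19 age window. The function names are hypothetical, age is assumed to be measured in completed years on the event date, and the specific diagnosis and procedure codes (Appendix 1) are deliberately not reproduced.

```python
from datetime import date

ICD10_START = date(2004, 4, 1)  # cutoff stated in the text


def coding_system(diagnosis_date):
    """Diagnosis coding system in force on the given date."""
    return "ICD-10-CA" if diagnosis_date >= ICD10_START else "ICD-9-CM"


def age_in_years(birth_date, on_date):
    """Completed years of age on on_date."""
    before_birthday = (on_date.month, on_date.day) < (birth_date.month, birth_date.day)
    return on_date.year - birth_date.year - before_birthday


def had_teenage_pregnancy(birth_date, pregnancy_dates):
    """True if any pregnancy event falls at ages 14 to 19 inclusive."""
    return any(14 <= age_in_years(birth_date, d) <= 19 for d in pregnancy_dates)
```

Under this reading, an event on the day before the 20th birthday counts as a teenage pregnancy, while an event on the 20th birthday does not.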
Independent variable
The independent variables of interest were whether an individual had an older sister with a teenage pregnancy (defined for all sisters as described above) and whether an individual’s mother bore her first child before age 20.
Covariates
Based on an extensive literature review and availability of information in the database, several key variables describing neighborhood, maternal, and individual characteristics were included [4, 25]. Covariates measure characteristics in the younger sister’s life before age 14. Neighborhood socioeconomic status at age 14 was measured by the Socioeconomic Factor Index (SEFI) (higher SEFI score corresponds with lower socioeconomic status), which is generated using Manitoba (Statistics Canada) dissemination areas [26]. This index combines neighborhood information on income, education, employment, and family structure. These neighborhoods typically include between 400 and 700 urban individuals and are somewhat larger in rural areas. Neighborhood location at age 14 was divided into urban (Winnipeg and Brandon), rural south (South Eastman, Central, and Assiniboine Regional Health Authorities), and rural mid/north (North Eastman, Interlake, Parkland, Nor-Man, Churchill, and Burntwood Regional Health Authorities). The maternal characteristic included is marital status at birth of child. An individual’s number of older sisters was also accounted for.
Three time-varying covariates between birth and age 13 for the younger sister were included in the study: mental health conditions, residential mobility, and family structure change. These variables can occur at specific points in time, and the timing of their occurrence can differ across individuals. Mental health is defined using the Johns Hopkins University Adjusted Clinical Group (ACG) software; this software groups medical and hospital diagnoses over the course of a year into 27 Major Expanded Diagnostic Clusters (MEDCs) [27]. If, in any 1 year between birth and age 13, the diagnoses an individual received fell into the ‘Mental Health’ MEDC, that individual was categorized as having mental health conditions before age 13. Residential mobility was measured by at least one residential move (defined by a change in six-digit postal code) between birth and age 13. At least one change in family structure (parental divorce, death, marriage, remarriage) between birth and age 13 was noted as ‘family structure change’.
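Two of these indicators can be illustrated with a short Python sketch that turns dated records into binary flags. The input layouts, function names, and event labels are assumptions made for illustration, and "between birth and age 13" is read here as events occurring while the child is 13 completed years old or younger.

```python
from datetime import date

# Hypothetical labels for the family structure changes listed in the text.
FAMILY_EVENTS = {"divorce", "death", "marriage", "remarriage"}


def age_in_years(birth_date, on_date):
    """Completed years of age on on_date."""
    before_birthday = (on_date.month, on_date.day) < (birth_date.month, birth_date.day)
    return on_date.year - birth_date.year - before_birthday


def in_window(birth_date, on_date):
    """True for dates from birth through age 13 (completed years)."""
    return on_date >= birth_date and age_in_years(birth_date, on_date) <= 13


def moved_before_14(birth_date, postal_history):
    """postal_history: iterable of (effective_date, postal_code) pairs."""
    records = sorted((d, c) for d, c in postal_history if in_window(birth_date, d))
    # Normalize spacing and case so formatting differences are not counted as moves.
    codes = [c.replace(" ", "").upper() for _, c in records]
    return any(a != b for a, b in zip(codes, codes[1:]))


def family_structure_change(birth_date, events):
    """events: iterable of (event_date, kind) pairs."""
    return any(
        kind in FAMILY_EVENTS and in_window(birth_date, d) for d, kind in events
    )
```

Each flag records only whether the event occurred at least once in the window, which matches the "at least one" wording of the definitions above.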
Low educational achievement has been linked to an increased risk of teenage pregnancy [28]. The earliest measure of educational achievement available is the Grade 9 Achievement Index, which was built on a technique developed by Mosteller and Tukey using enrollment files, course grades, and the provincial population registry [29, 30]. As some of the individuals in this cohort experience their first pregnancy before completing grade 9, this covariate is only appropriate for girls having their first pregnancy after their 16th birthday. Sensitivity testing was done with this population to determine how strongly educational achievement affected the odds of the variables of interest.
Analytic approach
The relationship between pregnancy during one’s teenage years and having an older sister who became pregnant during adolescence or having a mother who bore her first child as a teenager is confounded by many measured and unmeasured characteristics. We adjusted for the measured confounding characteristics using 2:1 propensity score matching [31]; two controls were matched with every case as this “will result in optimal estimation of treatment effect” [32]. Propensity score matching both enables adjustment for several confounders simultaneously and facilitates diagnostic tests to identify whether the adjustment strategy created comparable exposure groups (i.e., whether women with and without an older sister who got pregnant during adolescence are similar on observed characteristics) [31]. Logistic regression models were used to calculate propensity scores for two responses: the predicted probability of having an older sister with a teenage pregnancy and the predicted probability of having a mother who bore her first child before age 20. For each model, we investigated the comparability of our two groups (those with and without an older sister having a teenage pregnancy, and those with and without a mother who bore her first child as a teenager) using two diagnostics. First, a kernel density plot verified that the distribution of propensity scores in our two groups overlapped [33]; each case was then matched to two controls using greedy matching [34]. Second, the balance of the covariates was assessed before and after matching using t-statistics calculated for the standardized differences between cases and controls on each covariate. In Figs. 2 and 3, any point outside of the two vertical dotted lines signifies a statistically significant difference between the cases and controls on that covariate (at p = 0.05).
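To make the matching and balance steps concrete, the sketch below shows one simple form of greedy 2:1 matching on precomputed propensity scores, together with a standardized difference for a single covariate. It is not the cited algorithm [34]: the nearest-neighbour rule without a caliper, the order in which cases are processed, and the function names are all assumptions made for illustration.

```python
from statistics import mean, variance


def greedy_match(case_scores, control_scores, ratio=2):
    """Match each case to its `ratio` nearest unused controls.

    Both arguments map an identifier to a propensity score. Matching is
    greedy: once a control is used it is never reconsidered.
    """
    available = dict(control_scores)
    matches = {}
    for case_id, score in case_scores.items():
        if len(available) < ratio:
            break  # not enough controls left for a full matched set
        nearest = sorted(available, key=lambda c: abs(available[c] - score))[:ratio]
        matches[case_id] = nearest
        for control_id in nearest:
            del available[control_id]
    return matches


def standardized_difference(case_values, control_values):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = ((variance(case_values) + variance(control_values)) / 2) ** 0.5
    return (mean(case_values) - mean(control_values)) / pooled_sd
```

Because the procedure is greedy, the result depends on the order in which cases are visited. The standardized difference as written is undefined when both groups have zero variance on the covariate.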
Conditional logistic regression analysis of the matched cohorts examined the impact of an older sister’s teenage pregnancy and of a mother’s teenage childbearing on teenage pregnancy. Sensitivity analysis helped assess the validity of the assumption of no unobservable confounders, and assessed how strong the influence of unobserved covariates would have to be in order to nullify our findings [35, 36]. The lower limit of the 99 % confidence interval (selected to be more conservative) was used to determine the threshold unobserved covariates would have to reach to void the observed relationship.
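For intuition about the conditional logistic regression step, the following sketch fits a model with a single binary exposure to matched sets by maximizing the conditional likelihood directly. It is a toy illustration and not the software used in the study: the data layout, function names, and the ternary search are assumptions, and the sensitivity analysis is not covered.

```python
import math
from itertools import combinations


def set_log_likelihood(beta, exposures, outcomes):
    """Conditional log-likelihood contribution of one matched set."""
    n_events = sum(outcomes)
    observed = beta * sum(x for x, y in zip(exposures, outcomes) if y)
    # Sum over every way the same number of events could fall within the set.
    denominator = sum(
        math.exp(beta * sum(subset))
        for subset in combinations(exposures, n_events)
    )
    return observed - math.log(denominator)


def log_likelihood(beta, matched_sets):
    return sum(set_log_likelihood(beta, x, y) for x, y in matched_sets)


def fit_beta(matched_sets, lo=-10.0, hi=10.0, tol=1e-8):
    """Maximize the concave one-parameter log-likelihood by ternary search."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_likelihood(m1, matched_sets) < log_likelihood(m2, matched_sets):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


# Toy data: each set is (exposures, outcomes) for one exposed and two
# unexposed women. Six sets have the event only in the exposed woman,
# three only in an unexposed woman, and two have no events at all.
toy_sets = (
    [((1, 0, 0), (1, 0, 0))] * 6
    + [((1, 0, 0), (0, 1, 0))] * 3
    + [((1, 0, 0), (0, 0, 0))] * 2
)
odds_ratio = math.exp(fit_beta(toy_sets))
```

Sets in which no one or everyone experienced the outcome contribute nothing to the likelihood, so only discordant sets inform the estimate. For this toy data the maximum can be worked out by hand as 2 × 6 / 3 = 4, which the numerical fit should reproduce closely.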