Validating the British Columbia Perinatal Data Registry: a chart re-abstraction study

Background: The British Columbia Perinatal Data Registry (BCPDR) contains individual-level obstetrical and neonatal medical chart data for virtually all births occurring in British Columbia, Canada. The objective of this study was to assess the validity of information in the BCPDR by performing a provincial chart re-abstraction study.
Methods: A two-stage stratified clustered sampling design was employed. Obstetrical facilities were stratified based on geographic location and obstetrical volume. Charts of mothers and newborns with a length of stay of five or more days or transfer to another facility following the delivery were oversampled. A total of 85 maternal and 32 newborn variables were assessed for completeness (percent completion) and validity (sensitivity and specificity for categorical variables, intra-class correlation coefficient [ICC] for continuous variables).
Results: 1,084 maternal and 1,142 newborn charts were abstracted. Mandatory variables such as primary indication for induction and primary indication for cesarean delivery were 100 % complete. Some variables such as pre-pregnancy weight were relatively more complete in the re-abstraction as compared with the BCPDR (83.0 % vs 76.8 %; p < 0.001). The validity of key surveillance variables was high (e.g., HIV screening completed [sensitivity 98.0 %, 95 % confidence interval (CI) 97.0–98.8 %; specificity 72.3 %, 95 % CI 60.8–81.9 %], induction of labour [sensitivity 93.9 %, 95 % CI 90.2–96.5 %; specificity 98.7 %, 95 % CI 97.7–99.3 %], primary elective cesarean delivery [sensitivity 96.0 %, 95 % CI 83.8–99.7 %; specificity 99.8 %, 95 % CI 99.4–100.0 %], gestational age from newborn examination [ICC 0.99, 95 % CI 0.99–0.99]). Examples of variables with lower validity included total admissions prior to delivery episode, maternal smoking status, and timing of breastfeeding initiation.
Conclusion: Many important clinical and population health variables in the BCPDR had excellent validity. Some key variables warrant strengthening through improved definitions, system changes, and abstractor training.
Electronic supplementary material: The online version of this article (doi:10.1186/s12884-015-0563-7) contains supplementary material, which is available to authorized users.


Background
Perinatal Services BC (PSBC), an agency of the Provincial Health Services Authority (PHSA), has the mandate to improve the capacity and processes of provincial perinatal services through strategic leadership on the full continuum of perinatal care in British Columbia (BC), Canada [1]. PSBC's mandate is directly supported by the operation and maintenance of the BC Perinatal Data Registry (BCPDR), a provincial database that contains individual-level obstetrical and neonatal medical chart data for virtually all births occurring in BC [2].
The BCPDR has maintained provincial coverage of hospital deliveries and Registered Midwife-attended home births since 2000. The registry collects over 300 data elements for approximately 45,000 births per year. The scope of data spans the antepartum, intrapartum, and postpartum periods and includes information on maternal, fetal, and newborn characteristics. Data from the BCPDR are widely used for surveillance and research purposes and to support health care providers, researchers, and policy makers in their work to improve fetal, neonatal, and maternal health outcomes as well as to enhance the delivery and quality of perinatal care in BC [3][4][5].
As an administrative database with data entry performed by multiple abstractors at numerous sites across the province, the BCPDR is vulnerable to errors. Data errors can result from incomplete or illegible chart documentation, incorrect data entry, misinterpreted or ambiguous data definitions, and inadequate abstractor training and monitoring [6][7][8]. To identify and minimize errors, the BCPDR is subject to a rigorous system of ongoing quality checks at both the hospital and provincial levels. Small-scale validation studies have provided additional insights into the reliability of data captured in the registry [9]. However, previous validation studies have typically been one-time projects focusing on a single jurisdiction and/or on select variables and cannot be generalized to provincial-level data or all variables contained in the database. The objective of our study was to perform a large-scale provincial evaluation of BCPDR data elements using expert chart re-abstraction.

Methods
A two-stage stratified clustered sampling design was used to obtain a provincially representative sample of medical charts to undergo re-abstraction. For hospital births, the province's 52 obstetrical facilities were stratified based on geographic location (Vancouver Island, Vancouver Coastal, Fraser, Interior, and Northern Health Authorities, plus BC Women's Hospital [Provincial Health Services Authority]) and obstetrical volume (<1,000, 1,000-2,499, and ≥2,500 deliveries per year). Home births attended by Registered Midwives were sampled independently from two strata based on site of data abstraction. A target sample size of 1,110 charts for each of maternal delivery and baby newborn (neonatal) discharges was based on achieving a precision of +/-3 % assuming (conservatively) an estimated proportion of 50 %. The study was powered to detect registry-level differences and was not designed to detect facility-level differences. The sample was allocated across strata using disproportional allocation methodology [10], which increased the sample size for small strata and decreased the sample size for large strata. The sample was equally distributed across all facilities selected within each stratum.
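The formula behind the target of 1,110 charts is not stated. As a rough check only, the sketch below applies the standard normal-approximation sample size for estimating a proportion under simple random sampling, n = z^2 p(1 - p) / d^2, with p = 0.5 and d = 0.03. The function name and the 95 % confidence level are assumptions made for illustration. The result is slightly below the study's target, and the reason for the difference is not given in the text.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p: float, margin: float, confidence: float = 0.95) -> int:
    """Charts needed to estimate a proportion p to within +/- margin.

    Assumes simple random sampling and a normal approximation; the
    95 % confidence level is an assumption, not taken from the study.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95 %
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


# p = 0.5 is the conservative choice because p * (1 - p) is largest there.
n = sample_size_for_proportion(p=0.5, margin=0.03)
print(n)  # 1068
```

Any other assumed proportion gives a smaller requirement, which is why 50 % is described as conservative.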
The sampling frame of charts was derived from the BCPDR and included all admission episodes with discharge dates between April 1, 2010 and March 31, 2012. Separate sampling frames were created for maternal and newborn charts. Within each hospital or home birth stratum, half of the charts were selected from the maternal frame and half of the charts were selected from the newborn frame. To ensure adequate sampling of fields pertaining to complex cases, the charts of mothers and babies with a total length of stay of five or more days or transfer to another facility following the delivery or newborn episode were oversampled by 50 % of the total sample. Each selected maternal chart was linked to the corresponding newborn chart(s), and vice versa. Linked maternal and newborn charts were re-abstracted including all babies (siblings) from multi-fetal pregnancies. This was done to ensure that the most complete information was re-abstracted for each pregnancy. In some cases, important newborn information was documented in the maternal chart only and vice versa. Pulling both maternal and newborn charts to abstract together helped to generate the most complete information in each of their re-abstracted charts. The final sample included 1,114 mother-baby dyads (or triads) from 17 facilities or births at home.
Re-abstraction was performed by five senior health records personnel with extensive experience working with the BCPDR. Abstractors underwent an additional three-week training period during which inter-rater agreement on an independent sample of maternal and newborn charts was subjectively assessed on an ongoing basis. Differences in abstraction were discussed and consensus was used to determine how to best align responses during the study. Data were entered directly into the existing BCPDR data entry tool using laptops provided by PSBC. Hard error checks (i.e., data entry restrictions based on logic checks) were replaced with soft warnings to permit increased flexibility with data entry. All data fields were re-abstracted with the exception of diagnosis, procedure, and doctor service fields typically imported from the Canadian Institute for Health Information's Discharge Abstract Database (DAD). Two extra fields designed specifically for this study were included to assess the potential impact of missing chart documentation on the quality of dating ultrasound information. An additional qualitative data collection tool was developed to capture feedback on the usability of individual data input fields from the perspective of the abstractors. The qualitative tool was also used to document information on missing charts and other feedback from the re-abstractors.
Permission to access patient information was obtained from each of the hospitals' data stewards. As a quality assurance project, this study was exempt from Research Ethics Board review under article 2.5 of the TCPS2 (the overarching ethical framework for research involving human participants in Canada including the University of British Columbia BC Children's and Women's Hospital Research Ethics Board). Primary data collection took place from February to April 2013 and was mostly performed on-site to allow access to paper charts. For a small number of facilities with electronic medical records, re-abstraction occurred from a satellite location within the same Health Authority.
The re-abstracted database was linked with the original BCPDR database using a unique numeric identifier assigned to each mother or baby. Analyses were performed using SAS version 9.3 and Stata version 13. Proportions of missing values for each variable were quantified and differences in completion between the re-abstraction database and the BCPDR were tested using a modified Rao-Scott chi-square with a p-value <0.05 considered to be significant [11]. We assessed the validity of variables that were completed in 10 % or more of deliveries to ensure that we would be able to estimate validity with reasonable statistical precision. Variables with less than 10 % completion were typically those that did not apply to all pregnancies (e.g., the variable indicating 'eligibility for vaginal birth after cesarean' is only completed for women with a previous cesarean delivery). The re-abstracted data were used as the gold standard, and agreement of the BCPDR data with this gold standard was assessed by calculating sensitivity, specificity, and positive predictive value of categorical variables and the intra-class correlation coefficient (ICC) for continuous variables. Date variables were dichotomized based on completion (completed/missing) and assessed for validity using sensitivity and specificity. Accompanying qualitative data were reviewed and analyzed for themes to identify reasons for discordance between the re-abstracted and original data. The stratified clustered sampling design was incorporated in the analysis using appropriate sampling weights.
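The validity measures for a dichotomous variable can be illustrated with a minimal sketch that treats the re-abstracted value as the gold standard. The function name, the toy data, and the optional per-chart weights are hypothetical; the study's analyses were run in SAS and Stata with design-based methods, and confidence intervals are not computed here.

```python
from typing import Dict, Optional, Sequence


def validity_measures(
    registry: Sequence[bool],
    gold: Sequence[bool],
    weights: Optional[Sequence[float]] = None,
) -> Dict[str, float]:
    """Sensitivity, specificity, and PPV of registry values against a gold standard.

    Each chart contributes its weight (1.0 if none is given) to one cell
    of the 2x2 table. An undefined ratio is returned as NaN.
    """
    if weights is None:
        weights = [1.0] * len(gold)
    tp = fp = fn = tn = 0.0
    for r, g, w in zip(registry, gold, weights):
        if r and g:
            tp += w  # registry and re-abstraction both positive
        elif r and not g:
            fp += w  # registry positive, re-abstraction negative
        elif g:
            fn += w  # registry negative, re-abstraction positive
        else:
            tn += w  # both negative

    def ratio(num: float, den: float) -> float:
        return num / den if den > 0 else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
    }


# Toy example with ten linked charts (not study data).
registry_values = [True, True, True, False, False, False, True, False, False, False]
gold_values = [True, True, False, False, True, True, True, False, False, False]
result = validity_measures(registry_values, gold_values)
print(result)  # sensitivity 0.6, specificity 0.8, ppv 0.75
```

The same function would apply to dichotomized date variables by passing completed/missing indicators in place of the clinical values.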

Results
Analyses were based on 1,084 maternal charts and 1,142 baby charts. The oversampling criteria resulted in an overrepresentation of multi-fetal pregnancies. Therefore, more newborn charts were re-abstracted compared to maternal charts. In total, 82 maternal and 25 newborn variables met the ≥10 % completion criterion and were assessed for validity. As shown in Table 1, the maternal and newborn characteristics in the final weighted cohort were similar to those of all births in the province. Table 2 presents the completeness of mandatory (i.e., produces a hard error if left blank) variables and other maternal variables routinely used for surveillance. Most variables had high (≥80 %) levels of completion. Examples of variables with lower levels of completion in the BCPDR included pre-pregnancy weight (77 %), admission weight (58 %), last menstrual period date (68 %), and first ultrasound date (71 %). Fourteen variables were significantly more complete in the re-abstraction (e.g., height, last menstrual period date). In contrast, Hepatitis B screening results, cervical dilation on admission, and spontaneous labour were significantly more complete in the BCPDR. Completion of first ultrasound date was higher in the re-abstraction database. Further analysis based on the two extra fields indicated that the proportion of missing first ultrasound information would have been reduced by 50 % if routine coding instructions had included first ultrasounds between 4 and 24 weeks (instead of the current instructions to include only ultrasounds between 4 and 19 weeks). The completion of additional maternal variables is presented in Additional file 1.
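Percent completion, as reported here, is the share of charts with a non-missing value for a given variable. The sketch below shows one way to compute it with optional sampling weights. The function name and the toy values are hypothetical, and the modified Rao-Scott test used to compare the two databases is not reproduced.

```python
from typing import Optional, Sequence


def percent_complete(
    values: Sequence[Optional[float]],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted percentage of charts in which the variable is not missing (None)."""
    if weights is None:
        weights = [1.0] * len(values)
    total = sum(weights)
    completed = sum(w for v, w in zip(values, weights) if v is not None)
    return 100.0 * completed / total


# Toy pre-pregnancy weights in kg for four charts (not study data).
registry_weight = [70.0, None, 65.5, None]
reabstracted_weight = [70.0, 58.0, 65.5, None]
print(percent_complete(registry_weight))      # 50.0
print(percent_complete(reabstracted_weight))  # 75.0
```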
Completeness of mandatory and other neonatal variables routinely used for surveillance is shown in Table 3. All variables had high levels of completion in the BCPDR (>90 %) with the exception of gestational age from newborn examination (75 %). Completion of additional newborn variables is shown in Additional file 2. Variables that were applicable to only a subset of records (e.g., 10 min Apgar score, surfactant given, resuscitation interventions) had lower levels of completion. Tables 4 and 5 summarize the prevalence and measures of validity for selected mandatory and other common categorical and continuous maternal variables routinely used for surveillance. Sensitivity, specificity, positive predictive value, and/or ICC were high for most maternal variables. For instance, the BCPDR captured completion of HIV screening with 98.0 % sensitivity (95 % confidence interval [CI] 97.0-98.8 %), and a positive

Discussion
This provincial chart re-abstraction study showed overall high quality of data contained in the BCPDR with some variation in the completion and validity of certain variables. In general, maternal, antenatal, labour and delivery, drug administration, maternal trauma, postpartum, and newborn information was relatively complete. Within these groups, specific variables related to gestational dating and maternal height and weight appear to be underreported. Many maternal and newborn variables in the BCPDR had high levels of validity where values were available from both the original and re-abstracted records. Lower levels of validity were observed for total prior admissions during the current pregnancy, position and presentation of baby during labour and delivery, VBAC eligibility, primary indications for induction and cesarean delivery, delivery provider, newborn resuscitation, breastfeeding variables, and postpartum infection.
Variables with lower completion rates mostly required precise measurements or specific dates. Low completion rates in the re-abstracted database may suggest this information is not available in the chart or is not documented in a format that is consistent with current BCPDR data entry specifications (e.g., "high school" instead of the number of school years completed). Low completion variables also tended to be related to sensitive risk factors such as maternal smoking. For other variables, a significantly higher rate of completion in the re-abstracted database suggests the information is available in the chart, but may not be documented in the recommended chart location that is typically reviewed by facility abstractors (e.g., height, weight). It was not surprising that variables in the re-abstraction were generally more complete. Abstractors recruited for this project were asked to thoroughly review all aspects of each chart to retrieve the most accurate information. Also, re-abstraction was carried out without time constraints. In the "real world", facility abstractors may be required to complete the abstraction process within a finite amount of time (e.g., 20 min per chart) and as a result, are limited in the time and number of locations they can search for information.
Data entry restrictions for the date and gestational age at first ultrasound fields impacted completion of these variables. The 4-19 weeks restriction was implemented at the time of the BCPDR's inception when determination of gestational age by ultrasound was considered to be the most accurate prior to 20 weeks. However, recent clinical practice guidelines suggest that ultrasound remains the most accurate method for estimating delivery date up to 23 weeks [12].
Finally, some variables had lower completion rates in the re-abstraction database. Among those with the greatest discrepancy, the lower completion rate for cervical dilation on admission likely resulted from clarifications that arose during the training period about criteria required to abstract this variable. BCPDR guidelines direct abstractors to record the cervical dilation measurement taken within the first hour of admission [13]. However, it was unclear if this definition includes measurements taken in the one hour prior to admission (e.g., during triage) as well as in the one hour after admission. For the purposes of the re-abstraction, abstractors were asked to record only measurements taken in the one hour after admission, which may have increased the number of missing values. For gestational age from newborn examination, the lower completion rate in the re-abstraction may have resulted from different abstractor practices related to gestational age descriptions on the medical chart. For example, care providers may document the gestational age from newborn examination as "term" on the medical chart. Some site abstractors may have translated this into a gestational age (e.g., 40 weeks) for the purposes of the BCPDR, whereas the re-abstraction staff would have left the field blank.
Potential explanations for disagreements between the two databases were highly variable-dependent. Incorrect documentation of position and presentation has been identified previously through routine data quality reviews. To address this known issue, a PSBC Bulletin was issued in 2011 to provide clearer guidance to abstractors for determining this information from the chart [14]. This educational strategy was implemented part-way through the re-abstraction study period, which may account for some of the disagreement observed. For cesarean delivery indications, feedback from abstractors highlighted that several indications may be provided for a delivery, requiring the abstractor to determine the most important (primary) indication for purposes of data entry. The same also applies to the primary indication for induction. Discordance within primary indication for operative delivery has also been reported previously and attributed, in part, to the absence of a specific place for consistent documentation in the chart and ambiguity of indication categories in the BCPDR [15]. Furthermore, the relatively large proportion of records with an 'other' primary indication for operative delivery suggests that existing response options may not be appropriate for current practice.
The discordance in other variables such as postpartum infection may also be explained by the absence of a specific place for documentation on the provincial perinatal forms. The lower sensitivity for "unknown time of stillbirth" likely reflects the larger clinical challenge of determining time of fetal demise in utero. Lower validity across the newborn resuscitation variables may have resulted from lack of clarity in the abstractor guidelines regarding definitions of resuscitation, ventilation, invasive and non-invasive CPAP, as well as time and place of intervention. These definitions have since been expanded upon and clarified in an updated version of the PSBC Reference Manual [16]. The lower sensitivity of breastfeeding initiation was likely impacted by the use of multiple versions of the Newborn Clinical Path record, which contained different breastfeeding interval categories, across the province. Qualitative feedback from the abstractors also indicated challenges with determining precise time of breastfeeding initiation where this was not clearly documented in the charts. This was reflected in the relatively large 'unknown' categories for both breastfeeding variables. The moderate ICC noted for gravida was due to a small number of identified typos in the re-abstraction study and should be interpreted as negligible for the purposes of this evaluation. The discrepancy for total prior admissions during the current pregnancy may have been due to reduced access to medical charts from prior admissions during the re-abstraction and should also be interpreted with caution. Finally, site abstractors were likely more familiar with names and designations of local health care providers thereby contributing to disagreement for the delivery provider variable.
The Niday Perinatal Database (NPD) in Ontario, Canada, has undergone a similar quality assurance evaluation using chart re-abstraction to determine the reliability, completeness, and comprehensiveness of provincial perinatal data. The findings for most data fields between the NPD and the BCPDR were similar. Examples of variables with different findings include gestational age at delivery and birth weight, both of which had excellent validity in the BCPDR but had poor agreement in the NPD. In contrast, the validities of "breech" and "dystocia" as indications for cesarean delivery were lower in the BCPDR compared to the NPD [17]. Although we did not assess the validity of diagnosis, procedure, and doctor service fields imported into the BCPDR from the DAD, clinical coding practices of hospitals contributing to the DAD are reviewed on an ongoing basis e.g., [18]. Validation of key perinatal fields captured by the DAD has also occurred through comparison to another provincial perinatal database in Canada, the Nova Scotia Atlee Perinatal Database [19].

Strengths and limitations
Key strengths of this study include a large sample size, provincial representation of hospital and home births in BC, and inclusion of many variables across the perinatal continuum. Furthermore, the mixed methods approach allowed us not only to quantify discordance and validity measures, but also to elucidate potential reasons for differential variable performance using qualitative feedback. For the purposes of this validation study, information documented in the medical chart was assumed to be accurate. However, the findings are limited by the absence of a true gold standard with which to compare BC's perinatal data registry. We compared the registry data against data obtained from abstractors who were highly experienced in obstetrical coding, routinely worked with BCPDR data, and underwent an extensive training period to clarify ambiguities in the abstractor guidelines prior to primary data collection. Electronic medical records with data entered by care providers at the point of care may help to increase the accuracy of the BCPDR in the future; however, until such time as a provincially-integrated system is available, chart abstraction is required. The results presented here were derived using sampling weights based on a sampling frame of designated obstetrical facilities. Thus, the results may not reflect the small number of charts for births that occurred in non-obstetrical facilities during the study period. High ICCs for continuous variables should be interpreted with caution as they were calculated after excluding charts with missing values. Finally, the sample of charts included in the study was too small to validate variables that represent low prevalence interventions such as some methods of induction and augmentation, conditions such as blood transfusions and severe maternal and newborn morbidity, and most maternal risk factors (e.g., gestational hypertension, gestational diabetes, antepartum hemorrhage, and congenital anomalies in prior pregnancy).
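The caution about ICCs can be made concrete with a small sketch. The form of ICC used in the study is not specified, so the code assumes a one-way random-effects ICC for two measurements per chart (registry and re-abstraction); the function name and toy values are hypothetical. Charts missing either value are dropped before the calculation, so the coefficient describes agreement only among charts where both values were recorded.

```python
from typing import List, Optional, Sequence, Tuple

Pair = Tuple[Optional[float], Optional[float]]


def icc_oneway_complete_cases(pairs: Sequence[Pair]) -> Tuple[float, int]:
    """One-way random-effects ICC for paired values, using complete cases only.

    Returns the ICC and the number of charts that contributed to it.
    Requires at least two complete pairs that are not all identical.
    """
    complete: List[Tuple[float, float]] = [
        (a, b) for a, b in pairs if a is not None and b is not None
    ]
    n, k = len(complete), 2
    means = [(a + b) / k for a, b in complete]
    grand = sum(means) / n
    # Between-chart and within-chart mean squares from a one-way ANOVA.
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum(
        (a - m) ** 2 + (b - m) ** 2 for (a, b), m in zip(complete, means)
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw), n


# Toy gestational ages in weeks (registry, re-abstraction); not study data.
toy_pairs = [(38, 38), (40, 40), (None, 39), (41, 41), (36, None)]
icc, n_used = icc_oneway_complete_cases(toy_pairs)
print(icc, n_used)  # 1.0 3
```

In this toy example two of the five charts are excluded, and the perfect agreement among the remaining three says nothing about the excluded charts.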