Comparison of birth certificates and hospital-based birth data on pregnancy complications in Los Angeles and Orange County, California

Background The incidence of both gestational diabetes mellitus and preeclampsia is on the rise; however, these pregnancy complications may not be systematically reported. This study aimed to examine differences in reporting of preeclampsia and gestational diabetes between hospital records and birth certificate data, and to determine if such differences vary by maternal socioeconomic status indicators. Methods We obtained over 70,000 birth records from 2001 to 2006 from the perinatal research database of the Memorial Care system, a network of four hospitals in Los Angeles and Orange Counties, California. Memorial birth records were matched to corresponding state birth certificate records and analyzed to determine differential rates of reporting of preeclampsia and diabetes. Additionally, the influence of maternal socioeconomic factors on the reported incidence of such adverse pregnancy outcomes was analyzed. Socioeconomic factors of interest included maternal education levels, race, and type of health insurance (private or public). Results It was found that the birth certificate data significantly underreported the incidence of both preeclampsia (1.38 % vs. 3.13 %) and diabetes (1.97 % vs. 5.56 %) when compared to Memorial data. For both outcomes of interest, the degree of underreporting was significantly higher among women with lower education levels, among Hispanic women compared to Non-Hispanic White women, and among women with public health insurance. Conclusion The Memorial Care database is a more reliable source of information than birth certificate data for analyzing the incidence of preeclampsia and diabetes among women in Los Angeles and Orange Counties, especially for subpopulations of lower socioeconomic status.


Background
Adverse pregnancy outcomes such as gestational diabetes mellitus and preeclampsia have important consequences on the growth, development, and health of children and mothers alike. Gestational diabetes mellitus is defined as carbohydrate intolerance with onset or recognition during pregnancy [1]. The true prevalence of gestational diabetes is unknown, with estimates ranging from 1 to 14 % of pregnancies affected in the United States annually [2][3][4]. Preeclampsia is a pregnancy complication that is characterized by the onset of hypertension and the presence of protein in the urine at >20 weeks gestation in a previously normotensive woman [5]. It is estimated to have affected 3.8 % of US deliveries in 2010, with the rate of severe preeclampsia rapidly increasing over the past three decades [6].
Although the incidence of both gestational diabetes and preeclampsia is on the rise, there is concern that these and other pregnancy complications are not systematically recognized, diagnosed, or reported. In fact, numerous studies have found significant inconsistency between medical information reported on birth certificates and other medical records such as hospital-discharge records. A recent study found that in Washington State, most pregnancy complications in general, and gestational diabetes in particular, are substantially underreported on birth certificates compared to hospital-discharge data [7]. Another study conducted in Ohio found that birth certificate data were less reliable than hospital records for identifying maternal risk factors, comorbid conditions, and complications of pregnancy, labor, and delivery [8]. Moreover, a number of other studies have found underreporting of birth defects [9], delivery complications [10], and other standard measures taken at birth [8] when state birth certificates are used alone. A review of the current literature suggests that birth certificates are generally not reliable sources of information on tobacco and alcohol use, prenatal care, maternal risk, pregnancy complications, labor, and delivery [11].
In addition to the question of systematic underreporting of adverse pregnancy outcomes, there is considerable interest in determining if this underreporting varies according to socioeconomic status. The socioeconomic gradient in preeclampsia is well established, with more recent data suggesting that there is a significant negative association between socioeconomic status and preeclampsia. Studies have found such an association using maternal education level [12], census tract income level [13], and household income level [5] as socioeconomic indicators. However, inconsistent findings have also been reported, with older studies finding no association between low socioeconomic status and preeclampsia [14][15][16].
Studies that have investigated the association between socioeconomic status and gestational diabetes, however, have generated conflicting results that vary according to the socioeconomic indicators used. Numerous studies have found low socioeconomic status to be associated with a higher risk of gestational diabetes using maternal employment status [17], neighborhood income level [18,19], education level [20,21], type of hospital services used [22], and family income level [23,24] as socioeconomic indicators. In contrast, a study that measured a combination of indicators including maternal employment status, education level, parity, and monthly income level, found no association between low socioeconomic status and gestational diabetes [25]. Other studies have also found no association using neighborhood deprivation level [26], insurance status [27], and level of maternal education [28] as markers.
It has not yet been determined if the discrepancies in the results of these studies may be accounted for by differential reporting rates among populations of varying socioeconomic levels, a feature that may vary across study settings. Additionally, the reliability of California's birth certificate registry compared to available medical records has yet to be determined. Thus, this paper examines the possibility of underreporting of adverse pregnancy outcomes in patients in Los Angeles County and Orange County, California, using data from both birth certificates and hospital birth records, with a specific focus on possible differential rates of underreporting according to socioeconomic status.

Birth record data
Birth record data from the period of 2001 to 2006 were obtained from the Memorial Care System, a network of four hospitals that maintains a perinatal database for research purposes [29]. Records are entered into this database by nurses when patients are admitted to the hospital for delivery. The four hospitals (Anaheim, Long Beach, Orange Coast, and Saddleback Memorial Medical Centers) are located in Los Angeles and Orange Counties in Southern California in the United States. Additionally, birth certificate data were obtained for the four hospitals and the same study period from the California Department of Public Health [30].

Data linkage
Birth records from the Memorial Care System were matched to birth certificate records using fuzzy matching logic through the SAS "COMPGED" function, which measures the "generalized edit distance" that summarizes the degree of difference between two text strings, i.e. the number of deletions, insertions, or replacements in the characters of a word required to arrive at the observed word [31]. Memorial records were matched to birth certificate records according to their calculated degree of similarity based on different combinations of four variables: mother's first and last name, date of birth (DOB) of child, DOB of mother, and hospital of delivery. These variables were selected because they were the only variables available in both datasets that were specific enough to identify a particular birth. To maximize the matching rate, we conducted multiple matching procedures. The following criteria, in order of priority, were used to compile the final dataset: 1) an exact match on all four variables, 2) an exact match on DOB of child, hospital of delivery, and mother's full name, but a non-exact match on DOB of mother, 3) an exact match on DOB of child, hospital of delivery, and DOB of mother, but a non-exact match on mother's name, and 4) an exact match on DOB of child and hospital of delivery, but a non-exact match on both mother's name and DOB of mother. A non-exact (partial) match on DOB of mother was assigned if two of the three fields in the date of birth (i.e. year, month, day) were the same. A partial match on mother's name was assumed if the first three characters in both the first name and the last name of the mother matched. Ten records in the Memorial data and three records in the birth certificate data had missing values in at least one of the four matching variables, and they were excluded from the matching process.
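The priority rules above can be illustrated with a minimal Python sketch. It is not the SAS procedure used in the study: it does not compute the COMPGED generalized edit distance and applies only the stated partial-match rules (two of three date fields, first three characters of each name). The record layout, the field names, and the case-insensitive name comparison are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class BirthRecord:
    # Hypothetical field names; the real datasets may label these differently.
    mother_first: str
    mother_last: str
    child_dob: date
    mother_dob: date
    hospital: str


def name_exact(a: BirthRecord, b: BirthRecord) -> bool:
    # Assumption: names are compared case-insensitively.
    return (a.mother_first.lower(), a.mother_last.lower()) == (
        b.mother_first.lower(),
        b.mother_last.lower(),
    )


def name_partial(a: BirthRecord, b: BirthRecord, n: int = 3) -> bool:
    # Partial name match: first n characters of both first and last name agree.
    return (
        a.mother_first[:n].lower() == b.mother_first[:n].lower()
        and a.mother_last[:n].lower() == b.mother_last[:n].lower()
    )


def dob_partial(d1: date, d2: date) -> bool:
    # Partial date match: at least two of year, month, and day agree.
    agree = (d1.year == d2.year) + (d1.month == d2.month) + (d1.day == d2.day)
    return agree >= 2


def match_tier(a: BirthRecord, b: BirthRecord) -> Optional[int]:
    """Return the priority tier (1 to 4) of a candidate pair, or None if no match."""
    # Every tier requires an exact match on child's DOB and hospital of delivery.
    if a.child_dob != b.child_dob or a.hospital != b.hospital:
        return None
    exact_name = name_exact(a, b)
    exact_dob = a.mother_dob == b.mother_dob
    if exact_name and exact_dob:
        return 1
    if exact_name and dob_partial(a.mother_dob, b.mother_dob):
        return 2
    if exact_dob and name_partial(a, b):
        return 3
    if name_partial(a, b) and dob_partial(a.mother_dob, b.mother_dob):
        return 4
    return None
```

In such a scheme, each Memorial record would be paired with the birth certificate candidate that has the lowest tier number, which mirrors the order of priority described above.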

Data analysis
We summarized the basic socio-demographic variables in both the Memorial and the birth certificate data and in the matched and the unmatched groups for each dataset. T-tests for continuous variables and chi-square tests for categorical variables were used for the comparisons of the rates of preeclampsia and diabetes, and the socio-demographic variables between the matched and the unmatched groups within each specific dataset.
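As an illustration of the categorical comparison, the following Python sketch computes a Pearson chi-square statistic for a 2x2 table (for example, cases and non-cases in the matched versus unmatched group). It is a sketch only: the text does not state whether a continuity correction was applied, so none is used here, and the counts in the example are hypothetical rather than study data.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the test statistic and its p-value on 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: 30 cases and 970 non-cases among matched records,
# 10 cases and 90 non-cases among unmatched records.
example_statistic, example_p = chi_square_2x2(30, 970, 10, 90)
```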
Further, for the matched birth records, we compared the influence of socioeconomic factors (i.e. maternal education, race, and private or public health insurance) on the reported incidences of preeclampsia or diabetes during pregnancy. The Memorial data contained only a combined variable for diabetes during pregnancy (gestational and pre-existing). Prior to 2006, gestational and pre-existing diabetes were also grouped into a single variable in the birth certificate data. Starting in 2006, gestational diabetes and pre-existing diabetes were reported separately on birth certificates. Therefore, we analyzed diabetes over 2001-2005 and conducted a sensitivity analysis using 2001-2006 data as well.
Given the potential for underreporting of adverse pregnancy outcomes on birth certificates, we did not specifically examine the agreement of the two databases using parameters such as sensitivity, specificity or Kappas.
Rather, we focused on determining if underreporting of diabetes and preeclampsia would disproportionately affect low socioeconomic status groups when level of maternal education, type of health insurance, and race were used as socioeconomic indicators. In order to explore this, we calculated the ratio of incidences of diabetes (both preexisting and gestational) and preeclampsia using birth certificate data compared to Memorial data. A permutation based statistical test was used to check whether the incidence of preeclampsia and diabetes was significantly underreported on birth certificate data when compared to Memorial data. The socioeconomic variables used for the comparison were those based on birth certificate data, which have been shown to provide reliable information on these variables [32][33][34]. Maternal education level was categorized as either high school or lower, or college or higher. Maternal race was categorized as Asian, Black, Hispanic, Non-Hispanic White, or Other, and insurance type was defined as either public or private.
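The ratio of incidences and a permutation-based test can be sketched as follows. The text does not specify how the permutation test was constructed, so the version below is only one plausible construction and should not be read as the authors' procedure: it treats the two sources as paired indicators on the same matched records and randomly swaps the source labels within each record. The function names, the number of permutations, and the toy data are assumptions.

```python
import random


def reporting_ratio(bc_flags: list[int], mem_flags: list[int]) -> float:
    """Ratio of birth certificate cases to Memorial cases on the same matched records.

    Because both sources cover the same records, the ratio of case counts
    equals the ratio of incidences.
    """
    memorial_cases = sum(mem_flags)
    if memorial_cases == 0:
        raise ValueError("no cases recorded in the Memorial data")
    return sum(bc_flags) / memorial_cases


def paired_permutation_pvalue(
    bc_flags: list[int], mem_flags: list[int], n_perm: int = 2000, seed: int = 0
) -> float:
    """One-sided p-value for 'birth certificates record fewer cases than Memorial'.

    Under the null hypothesis the two source labels are exchangeable within a
    record, so only discordant records can change the difference in counts.
    """
    rng = random.Random(seed)
    diffs = [m - b for b, m in zip(bc_flags, mem_flags) if m != b]
    observed = sum(diffs)
    hits = 0
    for _ in range(n_perm):
        # Swapping the labels within a record flips the sign of its difference.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if permuted >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 1000 matched records, 50 Memorial cases,
# of which only the first 20 also appear on the birth certificate.
memorial = [1] * 50 + [0] * 950
certificate = [1] * 20 + [0] * 980
toy_ratio = reporting_ratio(certificate, memorial)
toy_p = paired_permutation_pvalue(certificate, memorial)
```

A ratio below 1 indicates that the birth certificates record fewer cases than the Memorial data for the same births. Comparing the degree of underreporting between two socioeconomic groups would require a further step, for example permuting group labels and recomputing the difference in ratios, which is not shown here.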

Characteristics of study population
After removing multiple births (e.g. twins and triplets) and the records with missing matching variables, we obtained 66,352 and 71,512 records in the Memorial and the birth certificate data, respectively. Of these, a total of 62,200 records (93.7 % of Memorial data and 87.0 % of birth certificate data) were matched and subsequently used in our analysis (Table 1). The exact match on all four variables accounted for 95 % of all the matched records. Sensitivity analysis was conducted on the partial match on mother's name using different lengths of the sub-string of first and last names (from left to right: 2 to 15 characters) instead of the first three characters. Since the majority of records were matched by exact names, this only slightly changed the number of the final matched records (data not shown). Hence, we only reported the results based on the matching criteria listed above. Summary statistics showed some differences in the characteristics of the study population between the Memorial and the birth certificate data and between the matched and unmatched records in each specific dataset (Table 2). The majority of women were between the ages of 30 and 39, had private health insurance, and had college or higher rather than high school or lower levels of education (maternal education was only available in the birth certificate data). Hispanic and Non-Hispanic White women made up the two largest categories in both the Memorial and the birth certificate data and in the unmatched and matched records in each dataset. Compared to the matched group, the unmatched groups on average had higher rates of preeclampsia and diabetes, were older, had lower educational attainment, and had a higher percentage of Hispanic women and a lower percentage of Non-Hispanic White and Black women in both the Memorial and the birth certificate datasets (Table 2).
Because the sample sizes were large (N = 4162 to 62200), the majority of t-tests and chi-square tests showed statistically significant results when comparing the matched and the unmatched group in each dataset. Nevertheless, the pattern of difference in the unmatched vs. matched group was consistent in the Memorial data and the birth certificate data.
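The match rates quoted at the start of this section follow directly from the record counts. A short Python check, using only the counts stated above (the variable names are illustrative):

```python
# Record counts as reported in the text.
matched = 62_200
totals = {"Memorial": 66_352, "birth certificate": 71_512}

# Percentage of each dataset's records that ended up in the matched set.
match_rates = {source: round(100 * matched / n, 1) for source, n in totals.items()}
```

The resulting values agree with the 93.7 % and 87.0 % given in the text.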

Preeclampsia: 2001-2006
The birth certificate data significantly underreported the incidence of preeclampsia when compared to the Memorial data (1.38 % vs. 3.13 %; p < 0.01) (Table 2). The underreporting in birth certificate data was also observed in subcategories of the socioeconomic variables we examined.

Percentage of preeclampsia by maternal education
Of the mothers with known education level, the birth certificate data showed that the incidence of preeclampsia was significantly higher among mothers with education levels of college or higher compared with mothers with education levels of high school or lower (1.46 % vs. 1.24 %; p = 0.02) (Table 3). No difference was observed in the Memorial data. Nevertheless, the degree of underreporting of preeclampsia using birth certificate data compared to Memorial data was significantly higher among women with lower education levels (p = 0.01).

Percentage of preeclampsia by race
Both the Memorial and birth certificate data indicated the highest rate of preeclampsia in Black women (4.12 and 1.90 %, respectively) among the four major race/ethnicity categories. Within the Memorial data, Asian and Hispanic women had significantly lower rates of preeclampsia than Black women (p < 0.01). The same held true for the birth certificate data. In addition, Hispanic women had significantly lower rates of preeclampsia than Non-Hispanic White women (p = 0.03) in the birth certificate data, while Asian women had significantly lower rates of preeclampsia than Non-Hispanic White women (p = 0.01) in the Memorial data.
Compared to the Memorial data, the birth certificate data showed a significantly higher degree of underreporting of preeclampsia in Hispanic women compared to Non-Hispanic White women (p = 0.01). No significant difference for the degree of underreporting in birth certificate data was observed between the other race/ethnicity groups.

Percentage of preeclampsia by insurance
The Memorial data indicated a marginally significantly lower rate of preeclampsia in women with private insurance compared to those with public insurance (3.32 % vs. 3.02 %; p = 0.05). A different but insignificant pattern was observed in the birth certificate data. Compared to the Memorial data, the birth certificate data showed a significantly higher degree of underreporting of preeclampsia within the public insurance group compared to the private insurance group (p = 0.02).

Diabetes (2001-2005)
Similar to preeclampsia, the birth certificate data significantly underreported the incidence of diabetes when compared to the Memorial data (1.97 % vs. 5.56 %) (Table 2) (p < 0.01). Memorial data indicated that the incidence rate of diabetes during pregnancy was higher among women with lower socioeconomic status, but the pattern was insignificant in the birth certificate data. Sensitivity analysis showed similar results based on data in the 2001-2006 period (results not shown).
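For orientation, the overall ratio of incidences (birth certificate relative to Memorial) can be computed from the percentages reported above. This is a minimal Python sketch; only the four percentages come from the text, and the names are illustrative.

```python
# Overall incidences (percent) reported for the matched records:
# (birth certificate, Memorial).
reported = {
    "preeclampsia, 2001-2006": (1.38, 3.13),
    "diabetes, 2001-2005": (1.97, 5.56),
}


def incidence_ratio(birth_certificate_pct: float, memorial_pct: float) -> float:
    """Ratio of birth certificate incidence to Memorial incidence."""
    return birth_certificate_pct / memorial_pct


ratios = {outcome: round(incidence_ratio(*pcts), 2) for outcome, pcts in reported.items()}
```

The ratios come out at roughly 0.44 for preeclampsia and 0.35 for diabetes, i.e. the birth certificates recorded well under half of the incidence seen in the Memorial data for both outcomes.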

Percentage of diabetes by maternal education
Of the women whose educational level was known, the Memorial data indicated that the incidence of diabetes was significantly higher among women with levels of education of high school or lower compared to college or higher (6.18 % vs. 5.18 %; p < 0.01) (Table 4). The birth certificate data showed a similar but insignificant pattern. Furthermore, the degree of underreporting of diabetes using birth certificate data compared to Memorial data was significantly higher among women with lower education levels (p = 0.01).

Percentage of diabetes by race
The birth certificate data showed that the incidence of diabetes was highest among Asian women (2.65 %) and lowest among Non-Hispanic White women (1.54 %), while the Memorial data indicated the highest incidence rate among women of race/ethnicities classified as "other". For both the Memorial and birth certificate data, the difference in incidence rates was significant (p < 0.01) between all the race/ethnicity group pairs except Asian and Hispanic women. Furthermore, the degree of underreporting using birth certificate data compared to Memorial data was significantly higher among Hispanic women compared to Non-Hispanic White women (p < 0.01).

Percentage of diabetes by insurance
Memorial data found that the incidence of diabetes was significantly higher among women with public health insurance compared to women with private health insurance (6.25 % vs. 5.25 %, p < 0.01). The birth certificate data showed a similar but insignificant pattern. The degree of underreporting of diabetes using birth certificate data compared to Memorial data was significantly higher among women with public insurance compared to those with private insurance (p = 0.05).

Discussion
We found that the birth certificate data analyzed here significantly underreported the incidences of preeclampsia and diabetes compared to the Memorial data, which were collected specifically for research purposes. In addition, the degree of underreporting was unevenly distributed across groups of different socioeconomic status, with certain socioeconomic indicators exhibiting higher degrees of underreporting. The degree of underreporting of both preeclampsia and diabetes using birth certificate data was significantly higher in women with lower education levels compared to women with higher levels of education, in Hispanic women compared to Non-Hispanic White women, and in women with public insurance compared to those with private insurance. These results indicate a disparate underreporting problem in low socioeconomic groups for pregnancy complications when race, level of education, and insurance status are used as socioeconomic indicators.
Several factors may explain the discrepancies between our two datasets, including weaknesses of birth certificate data documented by several other studies. These include inadequate auditing of birth certificate data by individual hospitals, variations in data collection, diagnosis, and reporting procedures across hospitals, the use of nonclinical or untrained personnel to record data, and budgetary restrictions that prevent state agencies from thoroughly assessing and ensuring the quality of birth certificate data [8,10,35,36,37,38]. The Memorial data, in turn, may have had better quality on pregnancy complications because it was a research database that underwent more stringent quality checks by nurses. Although the Memorial database is not a gold standard, it is believed to be more accurate than birth certificate data because information is recorded when patients are physically present and able to verify records. We attempted to maximally match the Memorial and birth certificate records. However, there were still 8.47 % and 5.74 % of the Memorial records that could not be matched to the birth certificates due to missing matching variables (i.e. mother's name and mother's date of birth) and likely moderate to serious problems in misspelling of the names, respectively. We found differences between the matched and unmatched groups in the rates of preeclampsia and diabetes and in socio-demographic parameters. But since the patterns of difference between the matched and the unmatched groups were consistent between the Memorial data and the birth certificate data, we do not expect them to change our main conclusion regarding the underreporting problem in the birth certificate data.
Since the Memorial data did not differentiate between gestational diabetes and diabetes when reporting pregnancy outcomes, it was not possible to investigate gestational diabetes specifically in this study. However, because the birth certificate data began reporting gestational diabetes separately starting in 2006, we were able to perform two separate analyses on diabetes using 2001-2005 and 2001-2006 as periods of interest. Both periods showed the same patterns of underreporting of diabetes, and we thus suspect that our findings on total diabetes likely hold for gestational diabetes as well. It has been estimated that approximately 90 % of pregnancies that are complicated by diabetes mellitus represent women with gestational diabetes mellitus [39]. In the 2006 birth certificate data in California, we observed that gestational diabetes accounted for 72 % of the total diabetes (Table 2). This high degree of overlap between gestational diabetes and diabetes during pregnancy further suggests that the findings of this study regarding diabetes in general may also be applicable to cases of gestational diabetes in particular.
These results are consistent with previous studies, particularly those that determined that birth certificates are not reliable sources of information regarding preeclampsia, gestational diabetes, and other maternal complications and characteristics, particularly when compared to hospital discharge records [7][8][9][10][11]. Thus, our conclusion that the birth certificate database used in this study underreported the incidence of preeclampsia and gestational diabetes is supported by similar patterns found elsewhere in the United States. However, to our knowledge this is the first study to assess the reliability of hospital data and birth certificates in southern California, and the first to address differential reporting of preeclampsia and diabetes during pregnancy by socioeconomic status in the United States. The socioeconomic differences seen in the underreporting of preeclampsia and gestational diabetes as specific outcomes of interest are a unique observation that has not been studied in southern California. However, similar results have been found by studies that have analyzed related, though not identical, variables elsewhere in the United States. Most notably, a study on the sensitivity of birth certificate reports of birth defects in Atlanta found that Non-Hispanic Black maternal race/ethnicity and maternal education levels lower than high school were independently associated with a lower probability of a birth defect diagnosis being reported on a birth certificate [9]. The authors of this study hypothesized that this observation might be explained by disparities in access to healthcare, as well as variations in personnel and birth certificate completion procedures across hospitals. Although our study did not analyze birth defects, the underreporting of adverse pregnancy outcomes we found according to race and education level followed a similar pattern and may be explained by the same factors.
Further research is needed to explain the poor reliability of this particular set of birth certificate data for pregnancy complications, as well as the observed socioeconomic gradient in underreporting of such outcomes. Nevertheless, these findings have important implications for future public health research. Studies that rely solely on birth certificate data to draw conclusions regarding pregnancy complications should be aware of a potential bias towards underestimating the incidence of these conditions, particularly in low socioeconomic groups. This is critical for the descriptive study of socioeconomic disparities in pregnancy complications, and might help to explain why discrepant results were reported in the past [17][18][19][20][21][22][23][24][25][26][27][28], besides any true difference in disparities across study settings. Such biases are also critical for etiologic research studying the relationships between pregnancy complications and potential risk factors, especially when these are unevenly distributed according to socioeconomic status. For instance, exposure to most air pollutants (e.g., primary particles from road traffic) is typically higher in populations with low socioeconomic status than in better-off ones [40,41]. In such a situation, a higher underreporting of maternal complications in populations with lower socioeconomic status would create a downward bias when measuring the association between air pollution and pregnancy complications. Consequently, researchers should attempt to use high quality health outcome data such as the Memorial database, either in place of or in conjunction with birth certificate data, whenever possible in order to minimize bias.
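The direction of this bias can be made concrete with a small Python sketch. All numbers in it are hypothetical and chosen only for illustration; they are not estimates from this study. The sketch assumes that underreporting removes a fixed share of true cases in each group and that no false-positive reports occur.

```python
def observed_risk_ratio(
    true_risk_exposed: float,
    true_risk_unexposed: float,
    completeness_exposed: float,
    completeness_unexposed: float,
) -> float:
    """Risk ratio seen in a data source that captures only a share of true cases.

    'Completeness' is the share of true cases that the source records
    (hypothetical parameter; assumes no false-positive reports).
    """
    observed_exposed = true_risk_exposed * completeness_exposed
    observed_unexposed = true_risk_unexposed * completeness_unexposed
    return observed_exposed / observed_unexposed


# Hypothetical scenario: a more exposed, lower socioeconomic status group with a
# true risk of 6 % versus 4 % in the less exposed group (true risk ratio 1.5),
# with 35 % of cases recorded in the former and 50 % in the latter.
true_rr = 0.06 / 0.04
biased_rr = observed_risk_ratio(0.06, 0.04, 0.35, 0.50)
```

Under these assumed values the observed risk ratio falls from 1.5 to about 1.05, so a real association would appear close to null. If completeness were equal in both groups, the ratio would be unaffected even though both incidences would be underestimated.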
Furthermore, these findings indicate that there is a considerable need to improve the quality of birth certificate data in California, as far as pregnancy complications are concerned. It is possible that the quality of birth certificate data has improved since 2006, the last year of this analysis. It would be beneficial to assess the quality of current birth certificate data in order to identify areas that still require improvement. However, historical birth certificate data are still of high importance for research studies that examine the impact of in-utero exposure on various long-term health effects (e.g. cognition and school performance in children, obesity, cardiovascular diseases, etc.). Standardizing data collection and reporting procedures across hospitals would help minimize the discrepancies seen between birth certificate data and hospital databases such as the Memorial database. Because diabetes and preeclampsia are conditions that are oftentimes diagnosed prior to delivery and not at the hospital of delivery, there is also a need to improve the integration of prior medical records from other sources with hospital and birth certificate records. What is more, the fact that the birth certificate data underreported both preeclampsia and diabetes and did so to a higher degree among groups of lower socioeconomic status suggests that it would be most effective to focus standardization efforts on these particular conditions and among these identified groups, including Black and Hispanic women, women with lower levels of education, and women with public insurance. Finally, the most disadvantaged women may not have access to health care; thus improving health care access for low-income and minority people may also improve the reporting of pregnancy complications.

Conclusion
In summary, this comparison of two birth record databases found that the Memorial database is a more reliable source of information than birth certificate data for analyzing the incidence of preeclampsia and diabetes during pregnancy among women in Los Angeles and Orange Counties. This is especially true for subpopulations of lower socioeconomic status. Efforts to improve the available sources of data for the study of adverse pregnancy outcomes should thus focus on improving the reliability of birth certificate data, particularly for women of lower socioeconomic status.