Identifying urban built environment factors in pregnancy care and maternal mental health outcomes

Backgrounds Risk factors related to the built environment have been associated with women’s mental health and preventive care. This study sought to identify built environment factors that are associated with variations in prenatal care and subsequent pregnancy-related outcomes in an urban setting. Methods In a retrospective observational study, we characterized the types and frequency of prenatal care events that are associated with the various built environment factors of the patients’ residing neighborhoods. In comparison to women living in higher-quality built environments, we hypothesize that women who reside in lower-quality built environments experience different patterns of clinical events that may increase the risk for adverse outcomes. Using machine learning, we performed pattern detection to characterize the variability in prenatal care concerning encounter types, clinical problems, and medication prescriptions. Structural equation modeling was used to test the associations among built environment, prenatal care variation, and pregnancy outcome. The main outcome is postpartum depression (PPD) diagnosis within 1 year following childbirth. The exposures were the quality of the built environment in the patients’ residing neighborhoods. Electronic health records (EHR) data of pregnant women (n = 8,949) who had live delivery at an urban academic medical center from 2015 to 2017 were included in the study. Results We discovered prenatal care patterns that were summarized into three common types. Women who experienced the prenatal care pattern with the highest rates of PPD were more likely to reside in neighborhoods with homogeneous land use, lower walkability, lower air pollutant concentration, and lower retail floor ratios after adjusting for age, neighborhood average education level, marital status, and income inequality. Conclusions In an urban setting, multi-purpose and walkable communities were found to be associated with a lower risk of PPD. Findings may inform urban design policies and provide awareness for care providers on the association of patients’ residing neighborhoods and healthy pregnancy. Supplementary Information The online version contains supplementary material available at 10.1186/s12884-021-04056-1.


Background
The built environment, referring to the surroundings and physical artifacts of where humans live, is considered to be one of the five major social determinants of health (SDoH) [1]. The built environment is strongly associated with our way of life through determining the housing quality, mode of transportation, and exposure to pollutants, among others. Poor built environment has been reported to lead to adverse effects on physical and mental health by disrupting sleep, hindering healthy lifestyles, and lowering access to healthcare [2][3][4][5]. There is a gender difference in the association between the built environment and health. For example, an increased risk of depression among female was reported by Mulling et al. to be associated with living in an underdeveloped neighborhood characterized by inadequate sewer treatment, water supply, and dependable supply of electricity [6]. In addition, the Chicago Community Adult Health Study found the women's use of preventive care to be associated with objective and perceived neighborhood support and stressors such as odors, presence of trees, and noise levels [7].
The existing literature motivated this study to examine the impact of the built environment on health and healthcare utilization among women, and particularly, pregnant women as a population to further investigate [8][9][10]. Levels of prenatal care vary across the United States [11][12][13]. A substantial proportion of pregnant women, especially those with a higher comorbidity burden or low health literacy, seek and depend on care provided by emergency departments (ED) rather than primary and obstetric care [13][14][15]. The lack of adequate prenatal care is considered a risk factor for poor pregnancy outcomes and lack of proper postpartum care for mothers and infants [16]. Previous studies have studied the built environment on maternal health and birth outcomes including birth weight, gestational age, Apgar score, and newborn intensive care unit admission rates [5,17]. Yet, evidence is still accumulating on how the built environment affects the variability in prenatal care and maternal mental health outcomes. In particular, few studied the concurrent impacts of prenatal care and built environment on mental health outcomes [18][19][20][21][22]. Existing studies have also commonly relied on the subjective perceived measures obtained from interviews and questionnaires to define postpartum depression (PPD), thus limiting larger-scale analysis [7,18,19,23].
A study conducted in Mexico found that an increase in average particulate matter ≤ 2.5 μm in diameter (PM 2.5 ) exposure during pregnancy was statistically associated with an increased risk of PPD at six months and also for the late-onset PPD [18]. Likewise, in a US-based cohort, Sheffield et al. discovered that increased PM 2.5 exposure in mid-pregnancy was associated with higher anhedonia and depressive symptoms specifically in Black women [19]. He et al. found that pregnant women with no prior history of mental illness, exposed to noise, specifically night-time noise, have a higher risk of hospitalization for depression and other mental health disorders later in life [20]. In addition, Crockett et al. showed that the lack of public transportation acted as a barrier in accessing care for rural low-income African American pregnant women at risk for PPD [21]. Similarly, a study involving a home-based intervention for depression in low-income mothers noted that residing in neighborhoods with poor housing, higher crime rates, lack of essential resources increased the notion of uncertainty in life and participants required more encouragement to remain in the study [22].
In this study, based on existing evidence above, we hypothesize that the built environment, through a wide range of measures influencing the accessibility to the transportation system and infrastructure elements, green space, and other urban structure, is associated with variability in prenatal care and subsequent maternal mental health outcomes. Given findings from previous literature on the impact of the built environment on women's mental health and use of healthcare, we defined PPD as our primary outcome [24]. PPD has been associated with increased infant mortality, higher rates of hospitalizations, impaired mother-child attachment, developmental problems in children, and increased stress within families [25][26][27][28]. The plethora of physical and psychological effects of PPD reported in previous studies include postpartum weight retention, reduced physical health, bodily pain, anxiety, low self-esteem, risky addictive behavior of substances, and suicide ideation [29]. The biological risk factors of PPD include genetic factors, age, pregnancy complications, medical illness, and smoking during pregnancy [4,[30][31][32]. The social, cultural, and environmental risk factors include income status, domestic violence, lack of social support, quantity and quality of green spaces, and residential noise pollution [31,[33][34][35][36][37].
We tested our hypotheses by linking patients' health data extracted from de-identified electronic health records (EHR) with publicly available census-tract level data on the built environment. Routinely collected from clinical encounters, EHR data capture detailed longitudinal health data on health and health service utilization. Increasingly, EHR data have been used as a source of longitudinal data in population health studies for its ability to provide detailed and rich health information within patient cohorts [38]. Leveraging a large cohort of nearly 9,000 women in New York City from 2015 to 2017, we applied machine learning algorithms to EHR data to identify patterns in prenatal care [39]. We then evaluated the relationships among prenatal care patterns, PPD incidence, and the built environment using structural equation modeling [40]. The association found may inform patients, care providers, and public health policymakers in supporting a healthy pregnancy and new motherhood through a better understanding of the built environment as a modifiable social determinant of health.

Methods
Study setting EHR data EHR data on 8,949 pregnant women from an urban academic medical center from 2015 to 2017 were extracted. The cohort inclusion and exclusion criteria are described in Fig. 1. We excluded patients whose ages were below 18 or above 45, had no encounter recorded in the EHR from 1 year prior to pregnancy to 1 year after delivery, or missing home locations information. We extracted patient information including gender, age, race, ethnicity, body mass index (BMI), marital status, outpatient and inpatient diagnoses, outpatient and inpatient prescription medication orders, and corresponding encounter dates from the EHR data. Patient age was calculated as the time difference between the birth date and first prenatal checkup date. The gestational week was calculated using the date of delivery and the specific gestational age at prenatal checkups. Marital status was defined as single (single, divorced, widowed, unknown), and married, as extracted from unstructured clinical notes using regular expression. The trimester of each event was determined using the difference in time between each event and delivery. All diagnoses were represented as Systematized Nomenclature of Medicine-Clinical Terms (SNOMED-CT) codes [41]. Anatomical Therapeutic Chemical (ATC) Classification System was used to standardize the specific drug prescription and dosage information [42]. The primary outcome of PPD was defined as having at least one diagnosis of depression within 1 year after childbirth based on SNOMED codes (see Additional file 1).
Built environment data Accessibility to public and active transportation, and green spaces Three indicators were defined to measure the accessibility to public and active transportation facilities [43][44][45][46] within a 500-meter radius [47]: the number of bus stops, the number of subway stations within, and the length of bike paths within. The spatial data on public transportation and bike facilities were obtained in shapefile formats from New York State [48]. We used ArcGIS 10.6 spatial analysis tools to count the number of bus stops and subway stations within each 500-meter radius around each patients' home location and also to measure the length of bike paths within the 500-meter radius. Access to green spaces, as defined by the City of New York under recreation land use, were calculated using the green areas that fall within the 500-meter buffer.

Exposure to traffic
We obtained traffic data from the New York activitybased travel demand model referred to as "New York Best Practice Model (NYBPM)" [49]. The model predicts daily traffic volume in each roadway link for the different types of vehicles by two categories: light-(passenger vehicles and taxis) and heavy-duty (buses and trucks) vehicles for their different levels of health impacts [50]. The vehicle kilometer traveled (VKT) as an indicator for travel activity within the 500-meter radius was then calculated. The buffer was chosen since both monitoring [51] and simulation [52] studies have shown that vehicle pollution concentration reaches the background level. VKT is calculated by multiplying traffic volume by the distance of travel, representing the amount of traffic activity.

Land use
Four indicators were defined to measure the role of land use: entropy-based land use mix (LUM) index, retail floor area ratio (RetFAR), street connectivity, and sidewalk availability. The variables measure the availability and variety of land use types (i.e., type of activity) within 500 m of the subject's home location. The land use data including information about land use class and parcel area at the parcel level were extracted from the parcel shapefile obtained from New York State [48]. The LUM index within a 500-meter radius measures the heterogeneity of land use, such as residential, commercial, retail, and industrial, within the radius [53]. The LUM index ranges between 0 and 1, where 0 represents homogeneity and 1 represents maximum heterogeneity [53]. Higher LUM values indicate higher walkability of the area. The RetFAR is the retail building floor area divided by the retail land area within the 250-m radius [53]. RetFAR is indicative of pedestrian-orientated design and higher walkability. Examples with higher and lower RetFAR are multi-floor departmental stores and open-style outlets, respectively. The number of intersections within the 500meter radius is another land use indicator used to measure the walkability of the neighborhood [54]. The number of intersections was extracted from the transportation network developed for the NYBPM travel demand model. To calculate the sidewalk area, as a measure of access to walking facilities, within the 500-meter radius, we used the sidewalk shapefiles [49].
Air pollution PM 2.5 and ozone (O 3 ) concentrations at the census tract level for the period of 2015-2017 were obtained from the Center for Air, Climate and Energy Solutions which applied Land Use Regression (LUR) models to estimate every subject's exposure to air pollution [55]. PM 2.5 and O 3 together could represent both regional background and hotspot air pollution levels.

Other SDoH
Lastly, SDoH information at the census-tract (11-digit Federal Information Processing Standard code) level was extracted using the FACETS dataset [56]. Variables used in the analysis included census-tract level average percent of college degree education, GINI index, and uninsured percentage from American Community Survey, a binary indicator of low access to healthy food within half-mile from the Food Access Research Atlas, United States Department of Agriculture, the populationweighted distance to closest 7 parks from the Centers for Disease Control and Prevention, and lastly walk score scales the from Rundle-Columbia Built Environment and Health Research Group.

Patterns of prenatal care
We extracted the health and healthcare utilization information during the prenatal period for each patient from the EHR data. Patients who had similar overall prenatal care patterns were categorized into clusters as having experienced generally similar prenatal events. The similarity between pairs of patients was measured using the longest common subsequence (LCS) distance. LCS measures the longest overlap that 2 sequences have in common; thus, larger LCS indicates a more similar course of the clinical events. In this study, we compared the sequence of each patient's clinical events (e.g., encounters, diagnoses, prescription medications) to others in the cohort to generate pairs of LCS distances. Based on the similarity, the categorization of patients was performed using the hierarchical clustering algorithm, a well-established machine learning method for detecting underlying clusters in a population [39]. The final number and size of the clusters were determined using the Silhouette value [39]. This method was previously used to mine EHR data to identify health and healthcare utilization patterns among patients with chronic kidney disease, heart failure, and undifferentiated abdominal pain [39,57,58]. Because of the large number (n > 6,000) of unique clinical events recorded in the EHR data, we limited the pattern mining to focus on variables that were found to be most predictive of PPD in a related work preparatory to this study [59]. The list of variables, including complications during pregnancy and medication usage, are shown in Additional file 2. The cluster analysis was done in Python 3.6.5 and R 4.0.0.
Within each cluster, we applied a well-established sequence mining algorithm, Sequential Pattern Discovery using Equivalent Classes (SPADE) algorithm, [60] to discover and visualize common patterns within each cluster identified above. These patterns include sequential events that are shared by a large enough portion of the patients. For the implementation of SPADE, we used the "arulesSequences package" in R version 3.4.3 (R Foundation for Statistical Computing, Vienna, Austria).

Statistical analysis
The distribution of study variables described in sections EHR Data and Built Environment Data (Table 1) were assessed within each identified cluster. Multivariate Imputation by Chained Equations (MICE) was used to address the missing value issue [61]. We further studied the relationship between prenatal care, as reflected by the cluster membership, the built environment characteristics, and incidence of PPD using structural equation models (SEMs) [40]. Two SEMs were constructed for the primary and secondary outcomes separately. All independent variables were considered, but removed if there was multicollinearity as determined by variable inflation factor larger than 10. Statistical analysis was done using Stata/IC 16.0 and R 4.0.0. We applied Chi-square tests for categorical variables and analysis of variance (ANOVA) for continuous variables to compare the differences across clusters. P-value of 0.05 was used as the significance threshold. Table 1 shows the descriptive statistics of the study cohort where continuous variables are presented as mean (standard deviation (SD)), and categorical variables are presented as N (% in total cohort). The average age of our patient population was 33.7 years (SD = 4.59). Nearly half (49.27 %) of the patients were White, and the majority were married (86.7 %) and had Commercial insurances (84.0 %). Over 3 % of the cohort were diagnosed with PPD. A total of 3,900 (43.6 %) and 482 (5.4 %) patients had at least one ED visit pre-and post-delivery.

Results
We identified 3 clusters with 1,934 (cluster 1), 4,129 (cluster 2), and 2,886 (cluster 3) patients, respectively, based on their clinical event sequences. For the primary outcome of PPD, 6.72 % of the women in cluster 1 had a diagnosis of PPD within 1 year after childbirth, which was higher than clusters 2 (2.66 %) and 3 (1.14 %) (P < .001). Table 2 displays the distributions of variables used to determine the clusters. The distributions of variables were all statistically different across clusters except for antidepressant prescriptions and the diastolic blood pressure in the third trimester. Table 3 presents a posthoc analysis of the distribution of demographics, medications, diagnoses, and built environment factors that were significantly different across the three clusters. The mean (SD) age across three clusters were 35.01 (4.73) years, 33.78 (4.29) years and 32.68 (4.66) years, respectively (P < .001). There were more unmarried patients in cluster 1 than the other two clusters (P < .001). In   addition, the number of ED visits in both the pre-and post-delivery periods in cluster 1 was higher (P < .001) than the other clusters. We observed higher rates of prescription medications in cluster 1, such as analgesics, antipyretics and opioids (P < .001). Also, more patients in cluster 1 had complications during pregnancy, unplanned pregnancies, high-risk pregnancy, abnormal glucose level, elderly primigravida and advanced maternal age gravidas than the other two clusters (P < .001). Figure 2 showcases sequential patterns in the prenatal care identified from the study data. Green pathways signify pathways leading to PPD diagnoses, in contrast to blue pathways that do not. The pathways are made up of events (left-hand side) which converge or diverge to other events (right-hand side) before the development, or the absence, of PPD. For example, we found that 45 mothers with PPD had unplanned pregnancies followed by diagnoses of mental health disorders during pregnancy. Also, we observed that 64 mothers who developed PPD had multiple prescriptions of antidepressants during pregnancy. Another pattern indicated 50 mothers who developed, and 569 mothers who did not develop, PPD were prescribed Opioids, Other Analgesics and Antipyretics, and Anti-inflammatory and Antirheumatic Products, non-steroids, before the refill of Antiinflammatory and Antirheumatic Products, non-steroids. Table 4 displays the results from the SEM for the outcome of PPD. Regarding the primary outcome, patients in clusters 1 (odds ratio = 6.3, P < .001) and 2 (odds ratio = 2.43, P < .001) are more likely to have a diagnosis PPD within 12 months after childbirth than women in cluster 3. Relative to cluster 3, patients in cluster 1 are more likely to have patients living in census tracts that have lower PM 2.5 (odds ratio = 0.858, P = .02), lower retail floor area ratio (odds ratio = 0.882, P = .03), lower LUM (odds ratio = 0.508, P < .001), higher GINI (odds ratio = 4.317, P = .002), and higher college degree percentage (odds ratio = 4.401, P < .001). Patients are also more likely to be older (odds ratio = 1.115, P < .001) and not married (odds ratio = 0.404, P < .001). Relative to cluster 3, patients in cluster 2 are more likely to have patients living in census tracts that have lower PM 2.5 (odds ratio = 0.890, P = .03), lower retail floor area ratio (odds ratio = 0.867, P = .001), lower GINI (odds ratio = 0.412, P = .02), and higher college degree percentage (odds ratio = 4.996, P < .001). Patients are also moderately more likely to be older (odds ratio = 1.046, P < .001) and not married (odds ratio = 0.560, P < .001). Race and insurance types (commercial, Medicaid, Other including Medicare) were not significantly associated with the cluster membership in the models although the unadjusted association was significant.
We further contrasted the characteristics of PPD cases across clusters as shown in Additional file 3. The  association between PPD and the built environment factors was examined and shown in Additional file 4. The factors that were significantly associated with increased risk for PPD were the number of intersections within a 500-meter radius, the number of bus stops within a 500meter radius, and retail floor area ratio, while adjusting for GINI index for income inequality which were also significant in the model.

Discussion
There were two major findings in this study. Three clusters of prenatal health and healthcare utilization patterns were discovered from a cohort of women whose pregnancies were managed entirely or partially in an urban academic medical center from 2015 to 2017. The distribution of the primary outcome, PPD, was significantly different across the clusters. Clinically, the clusters differed in maternal age, BMI, marital status, medication use, chronic conditions, and complications during pregnancy. In addition, we found that the cluster membership was associated with built environment factors related to walkability, access to retail resources, air quality, and neighborhood income equality. These findings contribute to the growing body of evidence that the built environment in the community confers an impact on the trajectories of health and health service utilization during pregnancy. The associations found between retail, land use and the study outcomes among the pregnant cohort are novel and important contributions to the literature. The mixed land use and more retail access may be a proxy for the connectedness of the neighborhood in providing community support to women. These community resources potentially lead to increased opportunities for social contact, lower stress levels, and higher physical activity levels, which is consistent with previous literature tying maternal mental health to green space [9,10]. Air quality has been linked with adverse birth outcomes including preterm birth and miscarriages in previous literature [9]. However, we found that lower PM 2.5 concentration to be associated with clusters with higher PPD incidences in contrary to previous literature. In our urban study setting, PM 2.5 concentration is highest in the most affluent area and becomes lower as we move out to other parts of the study setting. Therefore, our findings on the association of poor air quality with higher incidence of PPD potentially reflect patient cohorts who are predominantly in or outside the most affluent part of the city who have better access to mental health reporting and care. Patterns learned from this study may inform expecting and new mothers, their care providers, as well as guideline and policymakers, to better prepare and navigate pregnancy and postpartum care. Additionally, our findings may have implications for policies during the current COVID-19 pandemic as our communities and their stores face significant changes.
There are limitations to the study. All diagnoses in the study were defined using diagnostic codes. Therefore, missed and under-diagnosis of health conditions during pregnancy, including PPD, is a crucial limitation. It is possible that this study missed PPD patients who did not disclose symptoms due to stigma against mental health, and patients who were diagnosed outside of our health system. The underdiagnosis and misdiagnosis may be more prevalent among women who live in lowincome neighborhoods. Some of these limitations may be addressed in future work by patient interviews and questionnaires, and prospective cohort studies. Additionally, the application of natural language processing on unstructured clinical notes may allow us to elicit underdiagnosed and missed PPD as well as other conditions. Moreover, we were not able to address the possible reporting bias in our study population with respect to information such as race and marital status. Nearly 15 % of the racial information was unknown from the EHR data. Future studies may explore the leveraging of patient-reported outcome data in overcoming this limitation. Furthermore, in analyzing the medication data, we did not consider the dose-response relationship between medications and the outcome as prescription fill information was not available. Detailed medication dose and frequency information can be analyzed in future work if pharmacy claims data become available. PPD cases across the clusters did not differ significantly in our analysis, but this may be due to the small sample size. Applying our methods to a larger patient cohort may allow us to further delineate the association of the built environment and potentially different PPD patient types. Lastly, while this study used data from a single health system in NYC, further work will aim to validate our findings using EHR data from other institutions and across different cities in the US.

Conclusions
We found that the built environment quality is associated with variability in prenatal care and maternal mental health outcomes in a large retrospective cohort study using EHR data. Built environment qualities that were identified in a structural equation model include LUM and RetFAR as indicators for walkability and street connectivity, and air quality. Findings from this study may inform healthcare providers and public health policymakers in understanding modifiable risk factors such as isolation that are associated with poor pregnancy care and outcomes.