Exploring the accuracy of self-reported maternal and newborn care in select studies from low and middle-income country settings: do respondent and facility characteristics affect measurement?
BMC Pregnancy and Childbirth volume 23, Article number: 448 (2023)
Accurate data on the receipt of essential maternal and newborn health interventions is necessary to interpret and address gaps in effective coverage. Validation results of commonly used content and quality of care indicators routinely implemented in international survey programs vary across settings. We assessed how respondent and facility characteristics influenced the accuracy of women’s recall of interventions received in the antenatal and postnatal periods.
We synthesized reporting accuracy using data from a known sample of validation studies conducted in Sub-Saharan Africa and Southeast Asia, which assessed the validity of women’s self-report of received antenatal care (ANC) (N = 3 studies, 3,169 participants) and postnatal care (PNC) (N = 5 studies, 3,326 participants) compared to direct observation. For each study, indicator sensitivity and specificity are presented with 95% confidence intervals. Univariate fixed effects and bivariate random effects models were used to examine whether respondent characteristics (e.g., age group, parity, education level), facility quality, or intervention coverage level influenced the accuracy of women’s recall of whether interventions were received.
Intervention coverage was associated with reporting accuracy across studies for the majority (9 of 12) of PNC indicators. Increasing intervention coverage was associated with poorer specificity for 8 indicators and improved sensitivity for 6 indicators. Reporting accuracy for ANC or PNC indicators did not consistently differ by any other respondent or facility characteristic.
High intervention coverage may contribute to higher false positive reporting (poorer specificity) among women who receive facility-based maternal and newborn care while low intervention coverage may contribute to false negative reporting (lower sensitivity). While replication in other country and facility settings is warranted, results suggest that monitoring efforts should consider the context of care when interpreting national estimates of intervention coverage.
The vast majority of maternal and newborn deaths occur in settings with the least data on intervention coverage and quality of care [1, 2]. Accurate data on effective intervention coverage, the proportion of individuals experiencing health gains from a service among those who need the service, is key to monitoring and scaling up the delivery of essential interventions to populations in need [3, 4]. Intervention coverage data is routinely used to track progress in national and global commitments such as Sustainable Development Goal 3 (which includes a target to reduce the maternal mortality ratio to less than 70 per 100,000 live births), Countdown to 2030, and the WHO strategies Ending Preventable Maternal Mortality (EPMM) and the Every Newborn Action Plan (ENAP) [5,6,7,8]. In response to evidence that intervention coverage indicators may overestimate progress due to poor content of care [9,10,11,12], monitoring strategies have shifted emphasis from health care access to quality-adjusted coverage [4, 13].
In resource-limited settings, data on the coverage of maternal and newborn health interventions often relies on women’s reports collected in nationally representative household surveys such as the Demographic and Health Surveys (DHS) and Multiple Indicator Cluster Surveys (MICS). The use of self-reported data from population-based surveys, however, assumes that women accurately recall interventions received during the antenatal, intrapartum, and postnatal periods. A growing number of studies have assessed the validity of self-reported maternal and newborn care interventions used (or with the potential to be used) in these surveys [15,16,17,18,19,20,21,22]. Collectively, evidence from these studies demonstrates considerable variability in indicator validity across settings (accuracy metrics defined in Fig. 1), raising the question of why [15,16,17, 19, 23].
The validity of self-reported data on maternal and newborn health interventions received may be influenced by a variety of factors. These include women not knowing whether an intervention occurred because they were not aware it was performed (i.e., it was not explained, or, in the case of newborn interventions, it was performed outside of the mother’s view). Recall may also be influenced by the nature and timing of questions. Prior research on maternal recall of interventions received in the intrapartum period has found that indicators which include technical terms (e.g., names of medications or diseases), refer to the timing (e.g., whether an intervention occurred immediately or within the first few minutes after birth) or sequence of events (e.g., whether the infant was wrapped before being laid on the mother’s chest), or concern interventions performed within the first hour of birth were unlikely to be recalled with high accuracy [15,16,17, 24]. The recall period may also influence reporting. Both DHS and MICS surveys typically ask women to recall events related to births occurring 2–5 years prior. Previous analysis of the recall accuracy of intrapartum and immediate postnatal care in Kenya has suggested that while accuracy generally declines with time, select interventions that are recalled with high accuracy at facility discharge maintain acceptable accuracy at 13 to 15 months follow-up.
Question comprehension related to respondent background characteristics or their expectation of care may also influence recall accuracy. For example, if a woman had a positive experience and/or delivered in a facility perceived to be high quality, she may be more likely to indicate that an intervention assumed to be beneficial occurred. Background characteristics may influence reporting if lower education contributes to poor understanding of questions with technical or complex wording, or if higher parity leads to confusion of care with a previous pregnancy. Adolescents may have lower reporting accuracy because they have had less familiarity with the health system. Accurate coverage estimates among adolescents are of particular importance given that infants born to nulliparous adolescent mothers are at higher risk of neonatal and infant mortality than those of any other maternal age group, and adolescent mothers are more likely to delay care seeking and to receive fewer components of maternal health care [25,26,27,28].
Explanatory analyses to examine patterns in the accuracy or consistency of reporting by respondent characteristics (e.g., age, education or prior parity) or by infant or facility characteristics have varied by indicator and setting, making it difficult to discern broad patterns [18, 29,30,31]. One study in rural Nepal found that maternal age and place of delivery (facility vs. home) did not influence maternal reporting of infant outcomes, while accuracy related to infant birth size was higher among multiparous mothers. Another study of intrapartum care recall among mothers in Ethiopia found that older women (ages 35–39 relative to ages 10–24) were more likely to report postpartum complications inconsistently, while those who delivered in a health facility were more likely to report inconsistently on newborn immediate thermal care practices. A third study, which assessed recall of facility-based postnatal care interventions among women in Kenya, found no pattern in reporting accuracy by maternal age, education, parity or infant age. Overall, heterogeneity in the types of indicators assessed, study methodology, question wording and limited sample sizes for subgroup analysis in some studies complicates collective understanding.
To better understand how respondent and facility characteristics influence the accuracy of self-reported maternal and newborn care, we synthesized data from five previous validation studies conducted in low and middle-income country settings. Studies were purposely sampled due to known similarities in question wording and validation design. Using these data, we examine whether respondent characteristics (e.g., age, education, prior parity), facility quality, or intervention coverage consistently predicted recall accuracy.
We synthesize patterns in reporting accuracy from a unique set of known validation studies led by the Population Council which used the same validation design to assess comparable indicators of maternal and newborn care in multiple low and middle-income country settings. We draw on five validation studies of maternal and newborn care reported across two publications [18, 32]. Three studies assessed antenatal care indicators (Bangladesh, Cambodia, Kenya) and five studies (Bangladesh, Cambodia, Kenya (2) and eSwatini) assessed postnatal care indicators for the mother and newborn. Studies were purposely selected from two multi-country intervention studies as each study used the same or very similar wording for client questionnaires and observer checklists (Additional Files 1 and 2). Table 1 describes the study context and sample characteristics for each study. In all studies the samples consisted of women of reproductive age who received facility-based care and were interviewed at discharge (exit interview). Women’s self-reports at exit interview were compared against direct observation by a trained third-party observer using a structured checklist (reference standard).
Data on routine postnatal care in Kenya and eSwatini were drawn from the Integra Initiative, a quasi-experimental study which aimed to strengthen provider capacity to deliver postnatal care to (1) the infant and (2) the mother, integrated with (3) family planning, (4) HIV counseling, testing and services, and (5) screening for and management of sexually transmitted infections. The study population for the Integra study was women who attended a postnatal check for themselves and/or their newborn (> 24 h to < 10 weeks after delivery) at a participating study facility and who provided informed consent to be interviewed. There were eight facilities (public health units/MCH-FP) in three regions (Lubombo, Manzini and Shiselweni) of eSwatini and 12 public health facilities located in the former Eastern province (present-day Kitui and Makueni counties) in Kenya. In total, matched exit interview and observer data were available for 545 women in Kenya and 319 in eSwatini.
Data on receiving antenatal or postnatal care in Bangladesh, Cambodia, and Kenya were originally collected as part of an evaluation of a voucher and accreditation intervention (henceforth “voucher study”) which assessed whether the voucher program improved service quality by verifying service delivery through reimbursements to providers [34,35,36]. The theory of change was that subsidized service demand stimulates greater service utilization and competition between service providers to improve service quality. Providers were effectively rewarded for quality service delivery through reimbursement of service provision at a contracted level of quality. As such, voucher intervention facilities were used as a proxy for higher quality of care relative to propensity-score matched control facilities. Although voucher intervention status is not a comprehensive measure of facility quality, existing evidence supports the link between such voucher-accreditation approaches and improved facility readiness and quality for reproductive health service delivery [37, 38]. While the influence of voucher schemes on antenatal care has been comparatively less studied, evaluation of the Kenya Safe Motherhood Voucher Scheme found significant improvement in the overall quality of delivered postnatal care components relative to comparable control facilities. Evaluation of the Bangladesh voucher scheme also found some evidence of postnatal care service quality improvements among high-performing voucher facilities relative to control areas; however, differences in antenatal service quality were less substantial. While the Cambodia scheme was found to increase ANC service utilization, no published findings with regard to quality are available.
Voucher studies in this analysis included a total of 22 government health facilities from six divisions of Bangladesh (Barisal, Chittagong, Dhaka, Khulna, Rajshahi and Sylhet), 40 government facilities from five provinces of Cambodia (Kampong Speu, Kampong Thom, Kampot, Prey Veng, Takeo), and 62 facilities in Kenya, which were a mixture of public (64%), private-for-profit (16%), faith-based (15%) or NGO (5%) and were located in Kisumu, Kiambu, Kitui counties and two informal settlements in Nairobi. Approximately half of facilities in each location were assigned to voucher or propensity-matched control facility status. In total, 3,169 women were interviewed and observed for antenatal care (n = 1,036 in Bangladesh, 957 in Cambodia and 1,176 in Kenya) and 2,462 for postnatal care (n = 208 in Bangladesh, 635 in Cambodia and 1,619 in Kenya).
Indicator selection and data extraction
All comparable indicators with available validation data from at least three studies were extracted, as this was considered sufficient for meta-analysis. For each indicator, two-by-two contingency tables comparing women’s self-report to the observer report (reference standard) were tabulated to obtain the number of true positive, false positive, false negative and true negative responses. “Don’t know” responses were set to missing for the validity analysis but are reported in the tables, as this response type is distinct from that of women who believe they know whether an intervention was received.
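To make the accuracy metrics concrete, the sketch below computes sensitivity and specificity from such a two-by-two table; the cell counts are hypothetical for illustration and do not come from the study data:

```python
# Sensitivity and specificity from a 2x2 validation table comparing
# women's self-report (the "test") against direct observation (reference standard).
def accuracy_metrics(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from 2x2 cell counts."""
    sensitivity = tp / (tp + fn)  # true positive rate among observed receipts
    specificity = tn / (tn + fp)  # true negative rate among observed non-receipts
    return sensitivity, specificity

# Hypothetical counts for one indicator in one study.
sens, spec = accuracy_metrics(tp=420, fp=60, fn=80, tn=140)
# sens = 420/500 = 0.84; spec = 140/200 = 0.70
```

As described above, “Don’t know” responses would be dropped before tabulating the four cells.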
Predictors were defined a priori as maternal age, maternal education, parity, type of facility (whether facilities were voucher accredited or control) and intervention coverage (observed intervention prevalence in each setting). Predictor selection was informed by prior evidence of factors with the potential to influence reporting accuracy. Age strata were adolescent (ages 15 to 20) vs. adult (ages 21–52). The adolescent age group was inclusive of clients aged 20 to maximize sample size for stratification. Prior parity was defined as first pregnancy (for ANC) or birth (PNC) vs. two or more prior pregnancies or births. Education was defined as less than primary completion vs. primary completion or greater. As described above, whether facilities were voucher accredited was used as a proxy for facility quality. Finally, intervention coverage was calculated as the mean observed indicator prevalence in each study.
To examine differences in reporting accuracy by respondent and facility characteristics, forest plots of sensitivity (true positive rate) and specificity (true negative rate) stratified by predictors of interest were examined. As a summary benchmark, high sensitivity and specificity were considered 80% or higher. This threshold was selected based on the empirical distribution of the accuracy of self-reported data related to maternal and newborn care. Stratified forest plots for age are shown, as this was the primary outcome of interest. To statistically test whether predictors were a source of heterogeneity between primary validation studies, we used fixed effects and bivariate random effects models, as the data allowed.
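The stratum-level 95% confidence intervals displayed in such forest plots can be computed directly from the counts. The source does not state which interval method was used, so the sketch below assumes the commonly used Wilson score interval:

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score confidence interval for a proportion such as
    sensitivity or specificity (z = 1.96 gives a 95% interval)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# e.g., sensitivity of 0.84 estimated from 420 true positives out of
# 500 observed intervention receipts (hypothetical counts)
lo, hi = wilson_ci(420, 500)  # roughly (0.805, 0.870)
```

Unlike the simple Wald interval, the Wilson interval remains sensible for proportions near 0 or 1, which matters for indicators with very high sensitivity or very low specificity.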
Bivariate random effects models were constructed when study indicators were validated in at least five studies, the minimum number required for model estimation [41, 42]. A bivariate random effects approach is the standard in diagnostic test accuracy meta-analysis because sensitivity and specificity are estimated simultaneously, which accounts for the trade-off between the two: typically, as a threshold is varied to increase sensitivity, specificity decreases, and vice versa. Bivariate models also account for variation in the diagnostic threshold used across studies (e.g., differences in observer ratings due to variation in training procedures or other factors across studies). Bivariate models accommodate study-aggregate covariates (i.e., intervention coverage, the prevalence of a given indicator in each study) to examine whether the predictor affects sensitivity, specificity, or both. Intervention coverage was examined as a predictor for PNC reporting accuracy only, as a minimum of five studies was required for parameter estimation. Within-study predictors (i.e., individual-level respondent and facility characteristics) are not accommodated in bivariate random effects models and were instead compared by assessing the degree of overlap in summary estimates (and corresponding 95% CIs) from stratified bivariate models.
As all ANC indicators, as well as the predictor facility quality, were collected in only three studies, univariate fixed effects models were constructed for these. Univariate models estimate the diagnostic odds ratio (DOR), which describes the odds of obtaining an affirmative response from a respondent who received the intervention compared to a respondent who did not. To assess whether results varied by level of the predictor, overlap in the summary DOR and corresponding 95% CIs from fixed effects models was examined. Univariate fixed effects models do not account for the trade-off between sensitivity and specificity or for between-study heterogeneity [41, 42]; however, they give reasonably consistent estimates of the DOR irrespective of variation in diagnostic threshold. Given these limitations, the ANC and facility-quality results from univariate fixed effects models are presented in Additional files 3, 4 and 5. Emphasis is given to results from bivariate random effects models in the discussion of study results.
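As a worked illustration of the DOR, its standard cross-product form, (TP × TN)/(FP × FN), can be computed directly from the 2 × 2 cell counts; the counts below are hypothetical:

```python
def diagnostic_odds_ratio(tp, fp, fn, tn):
    """DOR: odds of an affirmative self-report when the intervention was
    observed, divided by the odds when it was not observed."""
    return (tp * tn) / (fp * fn)

# Hypothetical counts (sensitivity 0.84, specificity 0.70):
dor = diagnostic_odds_ratio(tp=420, fp=60, fn=80, tn=140)
# (420 * 140) / (60 * 80) = 12.25
```

A DOR of 1 means the self-report is uninformative about whether the intervention was observed; values well above 1 indicate better discrimination. Equivalently, the DOR equals [sensitivity/(1 − sensitivity)] / [(1 − specificity)/specificity].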
Finally, indicators based on a small number of true (observed) positive or true negative cases which resulted in low precision (margin of error greater than 15 percentage points for bivariate models or a diagnostic odds ratio of five or greater for univariate models) are reported in the data tables, but not discussed in the text. Results from the bivariate and univariate models were obtained using the mada package in R (RStudio Version 1.1.383, Boston, MA).
Study and sample descriptions
Participant sociodemographic characteristics across studies are presented in Table 2. The pooled sample size for postnatal care was 3,326 women and for antenatal care indicators was 3,169 women. There were comparable indicators with sufficient sample size (no multiple zero cells) in three or more countries for 12 postnatal care indicators and six antenatal indicators.
Among postnatal care clients, mean age was highest in Cambodia (26.8 years) and lowest in Bangladesh (23.8 years). Higher educational attainment (completion of secondary school or more) was greatest among postnatal clients in eSwatini (73.9%) and lowest among participants in the Kenya Integra study (19.1%). Postnatal clients in the Kenya Integra study were most likely to be primiparous (29.9%).
Among antenatal clients, mean age was slightly lower in Bangladesh (23.5) relative to Cambodia (27.8 years) and Kenya studies (25.2 years). On average, a higher proportion of antenatal clients in Bangladesh completed secondary school or more (62.0%) relative to those in Cambodia (38.8%) and Kenya (40.8%). Antenatal clients were most likely to be pregnant for the first time in Bangladesh (39.6%).
Indicator validity across studies
Figures 2 and 3 display PNC indicator sensitivity and specificity across included studies. In general, indicators of PNC had higher sensitivity than specificity. With few noted exceptions, estimates of sensitivity and specificity varied widely by study. One of twelve PNC indicators demonstrated a sensitivity of greater than 80% in all five studies: whether the infant was weighed. An additional six PNC indicators had a sensitivity of approximately 80% or higher in three of five studies: blood pressure check, breast exam, abdominal exam, discussion of family planning, infant physical exam (undressed), and discussion of breast/infant feeding. In contrast, no PNC indicator achieved a specificity of 80% or higher in all five studies. All PNC indicators reflecting aspects of the maternal physical exam achieved a specificity of approximately 80% or more in three of five studies: blood pressure check, breast exam, abdominal exam, vaginal exam, anemia check/referral, and whether the provider asked or checked for excessive bleeding. One counseling-related indicator, whether danger signs for the mother were discussed, also achieved a specificity of 80% or more in three of five studies. Few indicators of newborn PNC achieved a specificity of 80% or higher in any one study. No PNC indicator achieved both sensitivity and specificity greater than 80% in more than one study, underscoring considerable heterogeneity in validity results across settings.
ANC indicators performed similarly to PNC with generally high sensitivity across the three settings and variable specificity (Additional File 3). Three indicators of the maternal physical health checks during an ANC consultation had a sensitivity greater than 80% in all three studies: weight taken, blood pressure check and abdominal exam. While no ANC indicator had a specificity of greater than 80% in all three settings, two ANC indicators – urine screen and fetal heart rate monitoring – had a specificity of at least 80% in two or more settings. No ANC indicator had both sensitivity and specificity of 80% or higher in more than one study.
Respondent characteristics: maternal age, education, and prior parity on PNC reporting accuracy
Age-stratified results showed overlap in the 95% CIs for sensitivity and specificity between adolescent and adult strata for all PNC indicators across studies (Fig. 4). Across individual studies, there was no clear pattern indicating that adolescent-reported sensitivity and specificity were better or worse than adult reporting. Wide confidence intervals in individual study estimates precluded detection of significant differences between age groups.
Estimates of sensitivity and specificity across postnatal care interventions (Table 3) stratified by adolescent and adult group also revealed substantial overlap in the 95% CI for all indicators, suggesting no differences by age. Similarly, no systematic differences between stratified bivariate models were observed for the predictors of education or prior parity for any postnatal care indicator examined (Tables 4 and 5).
The same general patterns were observed in univariate fixed effects estimates obtained for ANC indicators by age, education, and parity (Additional files 4 and 5). Although there were some exceptions, for most indicators there were either no differences by subgroup or comparison was not possible due to low precision.
Differences in the accuracy of PNC indicators by facility quality (whether respondents attended a voucher intervention facility or a comparable control facility) were inconsistent (Additional file 6). Of eight indicators with reasonable precision for comparison, two differed by facility quality level, but in mixed directions. The odds of correct reporting on whether the infant was examined (undressed) were greater among respondents who visited non-voucher facilities (proxy for lower facility quality), while information on infant danger signs was more likely to be reported accurately by mothers who attended voucher intervention facilities (proxy for higher facility quality) relative to control facilities.
Visual inspection of paired forest plots of sensitivity and specificity for PNC indicators sorted by intervention coverage (Figs. 2 and 3) illustrates that, for most indicators, specificity decreases (more false positive reporting) with higher levels of intervention coverage across studies. The forest plots also provide some evidence that indicator sensitivity improves with increasing prevalence (most apparent for indicators of the maternal physical exam); however, this pattern is less pronounced.
Results of the likelihood ratio test, which compared model fit for a bivariate random effects model incorporating intervention coverage as a study-level covariate against an intercept-only model, confirmed that intervention coverage significantly explained heterogeneity in reporting accuracy between studies for the majority (9 of 12) of indicators (Table 6). Separate tests examining the influence of intervention coverage on reporting accuracy demonstrated that indicator specificity decreased with higher intervention coverage for the majority (8) of indicators, implying greater false positive reporting. Sensitivity was positively associated with intervention coverage for half (6) of the indicators, implying less false negative reporting at higher coverage. The relationship between intervention coverage and indicator sensitivity and specificity was variable among ANC indicators (Additional File 6), with only three studies per indicator.
We assessed heterogeneity in self-reported antenatal and postnatal care by respondent and facility characteristics using data from five studies across Sub-Saharan Africa and Southeast Asia. Results show that no antenatal or postnatal care indicator achieved both high sensitivity and high specificity (80% or higher) in more than one study, underscoring variability in validity estimates across settings. We also did not find strong evidence that the accuracy of self-reported ANC or PNC systematically varied by maternal characteristics, such as adolescent vs. adult age, education, or parity, or by facility quality. Higher intervention coverage, however, was associated with reduced specificity (more false positive reporting) and somewhat improved sensitivity (less false negative reporting) for most indicators.
That validity did not systematically vary by respondent characteristics or facility quality is a perhaps surprising, although reassuring, result in terms of approaches to data collection and indicator construction. Our finding is largely consistent with prior studies of the influence of respondent characteristics on reporting accuracy for received maternal health services, which have found that associations vary by both indicator and respondent attribute [30, 31, 47]. In addition, no consistent evidence related to facility quality (voucher intervention or control facility) was observed across indicators. This finding aligns with a study which assessed how the accuracy of women’s perceptions of facility quality predicted their choice of where to receive care in informal settlements of Nairobi, Kenya. The study found substantial evidence of ‘information asymmetry’: a high proportion of women (two in five) were unable to discern which facilities offered the highest technical quality of care prior to using the facility’s services. It may be that inaccurate perceptions of facility quality explain, in part, why facility quality was inconsistently related to reporting accuracy. It is also possible that women value different aspects of care, including the patient care experience, than those typically emphasized in monitoring efforts. Our measure of facility quality may have been an incomplete proxy for how women perceive quality care.
The finding that higher study intervention coverage (i.e., prevalence) is associated with reduced specificity and somewhat improved sensitivity is also in accordance with prior findings and has important implications for efforts to monitor maternal and newborn quality of care. While sensitivity and specificity are independent of prevalence in their mathematical calculation, several studies and reviews have suggested an association. A study by Carter and colleagues, which assessed the reliability of maternal recall of delivery and immediate newborn care indicators in Nepal, for example, also documented an inverse association between indicator specificity and higher intervention coverage. This pattern may be the result of reporting biases in the classification of the reference standard (i.e., the observer report) and/or in women’s self-reports (the ‘test’). For example, it is possible that in settings where an intervention is commonplace, respondents are more likely to anticipate that it will occur and in turn respond affirmatively. This type of reporting bias would lead to more false positive reporting (lower specificity), implying monitoring efforts would overestimate coverage in high coverage settings. A high expectation of care could also imply few false negative reports (high sensitivity), which was observed for about half of the indicators in our analysis: in high coverage settings, women who did receive the intervention were unlikely to be undercounted. In low coverage settings, however, underestimation (low sensitivity) may be an issue. For monitoring progress in the quality of maternal and newborn care, the reduced specificity in high prevalence settings and reduced sensitivity in low prevalence settings are of public health importance.
Although descriptive only, our results suggest that monitoring efforts should consider the context of care when interpreting national estimates and time trends in intervention coverage, as mismeasurement may occur in both directions dependent on setting.
A strength of this study is that we were able to synthesize patterns in reporting accuracy across several studies which used exact or very similar question wording and recall timing, interviewing women at facility discharge from a routine antenatal or postnatal care visit. This addresses the limitations of prior studies on this subject, which were unable to discern patterns across settings and had smaller sample sizes for subgroup analysis. The ability to examine validation results across settings descriptively and with statistical assessment lends robustness to our main findings. However, several important limitations remain. Primarily, few studies have examined the accuracy of maternal reports of antenatal and postnatal care using comparable indicators, and it is possible that a relevant study was missed. Further research to examine variability in indicator accuracy across settings is warranted. The small number of studies assessed contributed to low precision in our analysis, particularly for ANC indicators, which were assessed in only three studies and used fixed rather than random effects models. Results from the fixed effects models should be considered exploratory, as variability by study, correlation between sensitivity and specificity, and heterogeneity attributed to threshold differences across studies are not accounted for. For example, observer training on what constituted an intervention having taken place may have varied across studies. Further, it was possible to incorporate only study-aggregate variables (i.e., intervention coverage), rather than within-study covariates (e.g., respondent age, education, parity). To assess variability by individual respondent characteristics we used stratification, which reduces precision. For example, the sub-sample of adolescents across studies was relatively small, despite expanding the age category to include respondents aged 20 years.
Finally, given data availability, it was not possible to examine facility type (e.g., public sector or not, tier of facility) across studies. This is a topic for future research. We hypothesize that intervention coverage within facility type may, at least in part, contribute to observed differences in validity.
Despite noted limitations, the finding that reporting accuracy does not consistently vary by respondent or facility characteristics is reassuring news for efforts to monitor the quality of maternal and newborn care. Evidence of consistently lower reporting accuracy by respondent characteristics such as adolescent age could, for example, suggest that self-reported data are insufficient to inform country-level interventions, policies and resource allocation for a group at high risk of adverse maternal or infant health outcomes [25, 26]; this was not the case. However, study findings do suggest that caution is warranted when interpreting self-reported results of interventions to improve the quality of maternal and newborn care in very low or very high prevalence settings, as false negative and false positive reporting, respectively, may be more likely. National monitoring efforts should consider the context of care in the interpretation of country estimates of self-reported quality of care and triangulate with other available data sources such as facility registries. Further research is warranted to validate indicators in additional study settings and to model the extent to which different intervention coverage levels affect the ability to detect changes in coverage between countries and over time. With sufficient confidence in such models, adjustment factors could be applied to coverage estimates in global monitoring efforts to account for bias attributed to differences in intervention prevalence. At the very least, caution is warranted in the interpretation of coverage estimates from very high or low prevalence settings.
Results from this study provide no evidence that self-reported receipt of maternal and newborn health interventions is consistently influenced by respondent characteristics, including adolescent vs. adult age group, education and parity, or by facility quality. Rather, this analysis suggests that differences in accuracy across studies are, at least in part, explained by differences in the prevalence of the intervention across settings. High intervention coverage may contribute to higher false positive reporting (poorer specificity) among women who receive PNC at health facilities, while low intervention coverage may lead to undercounting of interventions received (lower sensitivity). Caution may be warranted when interpreting population-based household survey estimates of quality, or of change in quality over time, in very high or very low prevalence settings.
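The sensitivity and specificity summarized above are computed from a 2×2 table of self-report against direct observation. The sketch below shows the calculation with simple Wald 95% confidence intervals; the counts are hypothetical, and the study itself additionally used fixed-effects and bivariate random-effects models to pool accuracy across settings:

```python
import math

def validity_metrics(tp, fn, fp, tn):
    """Sensitivity and specificity of self-report vs. direct observation,
    each returned as (estimate, lower 95% CI, upper 95% CI).
    tp/fn: women observed to receive the intervention who did/did not
    report it; fp/tn: women observed NOT to receive it who did/did not
    report it. Wald intervals are used here purely for illustration."""
    def prop_ci(k, n):
        p = k / n
        se = math.sqrt(p * (1 - p) / n)
        return p, max(0.0, p - 1.96 * se), min(1.0, p + 1.96 * se)
    return {
        "sensitivity": prop_ci(tp, tp + fn),  # correct recall among receivers
        "specificity": prop_ci(tn, tn + fp),  # correct denial among non-receivers
    }

# Hypothetical validation counts for one indicator:
m = validity_metrics(tp=180, fn=20, fp=30, tn=70)
# sensitivity 0.90, specificity 0.70
```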
Availability of data and materials
The datasets analyzed during the current study are available from the corresponding author on reasonable request.
Abbreviations
DHS: Demographic and Health Survey
MICS: Multiple Indicator Cluster Survey
AUC: Area under the receiver operating characteristic curve
Lawn JE, Cousens S, Zupan J. 4 million neonatal deaths: When? Where? Why? The Lancet. 2005;365(9462):891–900.
Day LT, Ruysen H, Gordeev VS, Gore-Langton GR, Boggs D, Cousens S, et al. "Every Newborn-BIRTH" protocol: observational study validating indicators for coverage and quality of maternal and newborn health care in Bangladesh, Nepal and Tanzania. J Glob Health. 2019;9(1):010902.
Munos MK, Blanc AK, Carter ED, Eisele TP, Gesuale S, Katz J, et al. Validation studies for population-based intervention coverage indicators: design, analysis, and interpretation. J Glob Health. 2018;8(2):020804.
Amouzou A, Leslie HH, Ram M, Fox M, Jiwani SS, Requejo J, et al. Advances in the measurement of coverage for RMNCH and nutrition: from contact to effective coverage. BMJ Glob Health. 2019;4(Suppl 4):e001297.
Countdown to 2030 Collaboration, Requejo J, Victora CG, Amouzou A, George A, Agyepong I, et al. Countdown to 2030: tracking progress towards universal coverage for reproductive, maternal, newborn, and child health. Lancet. 2018;391(10129):1538–48.
United Nations. Transforming our world: the 2030 Agenda for Sustainable Development. Sustainable Development Knowledge Platform. 2018. Available from: https://sustainabledevelopment.un.org/post2015/transformingourworld. [Cited 2019 Jan 3].
World Health Organization. Every Newborn: An action plan to end preventable deaths (ENAP). 2014. Available at: https://www.who.int/initiatives/every-newborn-action-plan.
World Health Organization. Towards ending preventable maternal mortality (EPMM). 2021. Available at: https://www.who.int/publications/i/item/9789240040519.
Marchant T, Tilley-Gyado RD, Tessema T, Singh K, Gautham M, Umar N, et al. Adding content to contacts: measurement of high quality contacts for maternal and newborn health in Ethiopia, North East Nigeria, and Uttar Pradesh, India. PLoS ONE. 2015;10(5):e0126840 (Roy JK, editor).
Nesbitt RC, Lohela TJ, Manu A, Vesel L, Okyere E, Edmond K, et al. Quality along the continuum: a health facility assessment of intrapartum and postnatal care in Ghana. PLoS ONE. 2013;8(11):e81089 (Szyld E, editor).
Hodgins S, D’Agostino A. The quality-coverage gap in antenatal care: toward better measurement of effective coverage. Global Health Sci Pract. 2014;2(2):173–81.
Leslie HH, Malata A, Ndiaye Y, Kruk ME. Effective coverage of primary care services in eight high-mortality countries. BMJ Glob Health. 2017;2(3):e000424.
Grove J, Claeson M, Bryce J, Amouzou A, Boerma T, Waiswa P, et al. Maternal, newborn, and child health and the sustainable development goals—a call for sustained and improved measurement. The Lancet. 2015;386(10003):1511–4.
Arnold F, Khan SM. Perspectives and implications of the improving coverage measurement core group’s validation studies for household surveys. J Glob Health. 2018;8(1):010606.
Blanc AK, Warren C, McCarthy KJ, Kimani J, Ndwiga C, RamaRao S. Assessing the validity of indicators of the quality of maternal and newborn health care in Kenya. J Glob Health. 2016;6(1):010405.
Blanc AK, Diaz C, McCarthy KJ, Berdichevsky K. Measuring progress in maternal and newborn health care in Mexico: validating indicators of health system contact and quality of care. BMC Pregnancy Childbirth. 2016;16(1):255.
McCarthy KJ, Blanc AK, Warren CE, Kimani J, Mdawida B, Ndwidga C. Can surveys of women accurately track indicators of maternal and newborn care? A validity and reliability study in Kenya. J Glob Health. 2016;6(2):020502.
McCarthy KJ, Blanc AK, Warren CE, Mdawida B. Women’s recall of maternal and newborn interventions received in the postnatal period: a validity study in Kenya and Swaziland. J Global Health. 2018;8(1):010605.
Stanton CK, Rawlins B, Drake M, dos Anjos M, Cantor D, Chongo L, et al. Measuring coverage in MNCH: testing the validity of women’s self-report of key maternal and newborn health interventions during the peripartum period in Mozambique. PLoS ONE. 2013;8(5):e60694.
Tuncalp O, Stanton C, Castro A, Adanu R, Heymann M, Adu-Bonsaffoh K, et al. Measuring coverage in MNCH: validating women’s self-report of emergency cesarean sections in Ghana and the Dominican Republic. PLoS ONE. 2013;8(5):e60761.
Liu L, Li M, Yang L, Ju L, Tan B, Walker N, et al. Measuring coverage in MNCH: a validation study linking population survey derived coverage to maternal, newborn and child health care records in rural China. PLoS ONE. 2013;8(5):e60762.
Kanyangarara M, Katz J, Munos MK, Khatry SK, Mullany LC, Walker N. Validity of self-reported receipt of iron supplements during pregnancy: implications for coverage measurement. BMC Pregnancy Childbirth. 2019;19(1):113.
Stanton CK, Dubourg D, De Brouwere V, Pujades M, Ronsmans C. Reliability data on caesarean sections in developing countries. Bull World Health Organ. 2005;83(6):449–55.
Yoder P, Rosato M, Mahmud R, Fort A, Rahman F, et al. Women’s recall of delivery and neonatal care: a study of terms, concepts and survey questions. Maryland: ICF Macro; 2010.
Kozuki N, Lee AC, Silveira MF, Sania A, Vogel JP, Adair L, et al. The associations of parity and maternal age with small-for-gestational-age, preterm, and neonatal and infant mortality: a meta-analysis. BMC Public Health. 2013;13(Suppl 3):S2.
Finlay JE, Özaltin E, Canning D. The association of maternal age with infant mortality, child anthropometric failure, diarrhoea and anaemia for first births: evidence from 55 low- and middle-income countries. BMJ Open. 2011;1(2):e000226.
Magadi MA, Agwanda AO, Obare FO. A comparative analysis of the use of maternal health services between teenagers and older mothers in sub-Saharan Africa: evidence from Demographic and Health Surveys (DHS). Soc Sci Med. 2007;64(6):1311–25.
Owolabi OO, Wong KLM, Dennis ML, Radovich E, Cavallaro FL, Lynch CA, et al. Comparing the use and content of antenatal care in adolescent and older first-time mothers in 13 countries of west Africa: a cross-sectional analysis of Demographic and Health Surveys. The Lancet Child and Adolescent Health. 2017;1(3):203–12.
Zimmerman LA, Shiferaw S, Seme A, Yi Y, Grove J, Mershon CH, et al. Evaluating consistency of recall of maternal and newborn care complications and intervention coverage using PMA panel data in SNNPR, Ethiopia. PLOS ONE. 2019;14(5):e0216612 (Ortiz-Panozo E, editor).
Chang KT, Mullany LC, Khatry SK, LeClerq SC, Munos MK, Katz J. Validation of maternal reports for low birthweight and preterm birth indicators in rural Nepal. J Glob Health. 2018;8(1):010604.
Carter ED, Ndhlovu M, Munos M, Nkhama E, Katz J, Eisele TP. Validity of maternal report of care-seeking for childhood illness. J Glob Health. 2018;8(1):1–13. https://doi.org/10.7189/jogh.08.010602.
McCarthy KJ, Blanc AK, Warren CE, Bajracharya A, Bellows B. Validating women’s reports of antenatal and postnatal care received in Bangladesh, Cambodia and Kenya. BMJ Glob Health. 2020;5(4):e002133.
Warren CE, Mayhew SH, Vassall A, Kimani JK, Church K, Obure CD, et al. Study protocol for the Integra Initiative to assess the benefits and costs of integrating sexual and reproductive health and HIV services in Kenya and Swaziland. BMC Public Health. 2012;12:973.
Bellows B, Warren C, Vonthanak S, Chhorvann C, Sokhom H, Men C, et al. Evaluation of the impact of the voucher and accreditation approach on improving reproductive behaviors and status in Cambodia. BMC Public Health. 2011;11(1):667.
Warren C, Abuya T, Obare F, Sunday J, Njue R, Askew I, et al. Evaluation of the impact of the voucher and accreditation approach on improving reproductive health behaviors and status in Kenya. BMC Public Health. 2011;11(1):177.
Talukder N, Rob U, Musa SA, Bajracharya S, Keya KT, Noor FR, et al. Evaluation of the impact of the voucher program for improving maternal health behavior and status in Bangladesh. Dhaka: Population Council. Available at: https://knowledgecommons.popcouncil.org/cgi/viewcontent.cgi?article=1937&context=departments_sbsr-rh.
Bellows NM, Bellows BW, Warren C. Systematic review: the use of vouchers for reproductive health services in developing countries: systematic review. Tropical Med Int Health. 2011;16(1):84–96.
Njuki R, Abuya T, Kimani J, Kanya L, Korongo A, Mukanya C, et al. Does a voucher program improve reproductive health service delivery and access in Kenya? BMC Health Serv Res. 2015;15(1):206.
Watt C, Abuya T, Warren CE, Obare F, Kanya L, Bellows B. Can reproductive health voucher programs improve quality of postnatal care? A quasi-experimental evaluation of Kenya’s safe motherhood voucher scheme. PLoS ONE. 2015;10(4):e0122828 (Kumar A, editor).
Bajracharya A, Dingle A, Bellows B. Trends and determinants of facility deliveries and antenatal care use in Cambodia: evidence from the recent increase in service uptake. In: XXVIII International Population Conference of the International Union for the Scientific Study of Population (IUSSP). Cape Town; 2017.
Macaskill P, Gatsonis C, Deeks JJ, Harbord RM, Takwoingi Y. Chapter 10: Analysing and presenting results. In: Deeks JJ, Bossuyt PM, Gatsonis C, editors. Cochrane handbook for systematic reviews of diagnostic test accuracy, version 1.0. The Cochrane Collaboration; 2010. Available from: http://srdta.cochrane.org/.
Lee J, Kim KW, Choi SH, Huh J, Park SH. Systematic review and meta-analysis of studies evaluating diagnostic test accuracy: a practical review for clinical researchers-part II. statistical methods of meta-analysis. Korean J Radiol. 2015;16(6):1188–96.
Reitsma JB, Glas AS, Rutjes AWS, Scholten RJPM, Bossuyt PM, Zwinderman AH. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol. 2005;58(10):982–90.
Doebler P, Holling H. Meta-analysis of diagnostic accuracy with mada. Available at: https://rdrr.io/rforge/mada/f/inst/doc/mada.pdf.
Glas AS, Lijmer JG, Prins MH, Bonsel GJ, Bossuyt PMM. The diagnostic odds ratio: a single indicator of test performance. J Clin Epidemiol. 2003;56(11):1129–35.
Dinnes J, Deeks J, Kirby J, Roderick P. A methodological review of how heterogeneity has been examined in systematic reviews of diagnostic test accuracy. Health Technol Assess. 2005;9(12):1–113 (iii).
Carter ED, Chang K, Mullany L, Khatry S, LeClerq S, Munos M, et al. Reliability of maternal recall of delivery and immediate newborn care indicators in Sarlahi Nepal. BMC Pregnancy Childbirth. 2021;21:82.
Siam ZA, McConnell M, Golub G, Nyakora G, Rothschild C, Cohen J. Accuracy of patient perceptions of maternity facility quality and the choice of providers in Nairobi, Kenya: a cohort study. BMJ Open. 2019;9(7):e029486.
Sheffel A, Heidkamp R, Mpembeni R, Bujari P, Gupta J, Niyeha D, et al. Correspondence to: Understanding client and provider perspectives of antenatal care service quality: a qualitative multi-method study from Tanzania. J Glob Health. 2019;9(1):11101.
Leeflang MM, Bossuyt PM, Irwig L. Diagnostic test accuracy may vary with prevalence: implications for evidence-based diagnosis. J Clin Epidemiol. 2009;62(1):5–12.
Conceived of the study design: AKB. Read and met the ICMJE criteria for authorship: AKB, AB, BB, KJM, CEW. Collected the data: BB, AB, CEW. Analysed the data: KJM. Wrote the first draft of paper: KJM, AKB. Provided critical revision to paper: BB, AB, CEW. Agree with the manuscript and conclusions: AKB, AB, BB, KJM, CEW.
This study was supported by the Bill & Melinda Gates Foundation through the Improving Measurement and Program Design grant (OPP1172551) to the Johns Hopkins Bloomberg School of Public Health. The funder had no role in the design of the study and collection, analysis, and interpretation of data or in the writing of the manuscript.
Ethical approval and consent to participate
Ethical clearance for the Voucher study was granted by the Population Council’s Institutional Review Board (IRB) (approval number 496 for Cambodia, 470 for Kenya, and 498 for Bangladesh), the National Ethics Committee for Health Research (NECHR) (approval numbers 173 and 186) in Cambodia and the Kenya Medical Research Institute (KEMRI) Ethical Review Board (approval number 164). Ethical clearance for the Integra protocol was granted by the Population Council’s IRB (approval number 444), the Ethics Review Committee of the London School of Hygiene & Tropical Medicine (approval number 5426), the Kenya Medical Research Institute (KEMRI) Ethical Review Board (approval number 114), and the Scientific Ethics Committee of the Swaziland Ministry of Health (approval number MH/599C). In all primary studies, informed consent was obtained from all participants prior to data collection, in accordance with the Declaration of Helsinki. All methods were carried out in accordance with relevant guidelines and regulations. For the present study, an exemption waiver from the Population Council IRB was obtained to conduct secondary analysis of the de-identified data (#EX2018002).
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Antenatal care (ANC) indicator construction.
Postnatal care (PNC) indicator construction.
Antenatal Care Indicator Sensitivity (Panel A) and Specificity (Panel B) by Country of Study, Sorted by Indicator Prevalence. Bangladesh (BA), Cambodia (CA) and Kenya (KE).
Antenatal Care Indicator Sensitivity and Specificity by Country of Study and Age Group (Adolescent vs. Adult).
Univariate fixed effects model: self-reported ANC indicator accuracy by respondent and facility characteristics.
Univariate fixed effects models: Self-reported PNC indicator accuracy by facility quality (non-voucher vs. voucher intervention facility).
About this article
Cite this article
McCarthy, K.J., Blanc, A.K., Warren, C.E. et al. Exploring the accuracy of self-reported maternal and newborn care in select studies from low and middle-income country settings: do respondent and facility characteristics affect measurement? BMC Pregnancy Childbirth 23, 448 (2023). https://doi.org/10.1186/s12884-023-05755-7