Neonatal resuscitation: EN-BIRTH multi-country validation study

Background Annually, 14 million newborns require stimulation to initiate breathing at birth and 6 million require bag-mask-ventilation (BMV). Many countries have invested in facility-based neonatal resuscitation equipment and training. However, there is no consistent tracking for neonatal resuscitation coverage.
Methods The EN-BIRTH study, in five hospitals in Bangladesh, Nepal, and Tanzania (2017–2018), collected time-stamped data for care around birth, including neonatal resuscitation. Researchers surveyed women and extracted data from routine labour ward registers. To assess accuracy, we compared gold standard observed coverage to survey-reported and register-recorded coverage, using absolute difference, validity ratios, and individual-level validation metrics (sensitivity, specificity, percent agreement). We analysed two resuscitation numerators (stimulation, BMV) and three denominators (live births and fresh stillbirths, non-crying, non-breathing). We also examined timeliness of BMV. Qualitative data were collected from health workers and data collectors regarding barriers and enablers to routine recording of resuscitation.
Results Among 22,752 observed births, 5330 (23.4%) babies did not cry and 3860 (17.0%) did not breathe in the first minute after birth. 16.2% (n = 3688) of babies were stimulated and 4.4% (n = 998) received BMV. Survey-report underestimated coverage of stimulation and BMV. Four of five labour ward registers captured resuscitation numerators. Stimulation had variable accuracy (sensitivity 7.5–40.8%, specificity 66.8–99.5%), BMV accuracy was higher (sensitivity 12.4–48.4%, specificity > 93%), with small absolute differences between observed and recorded BMV. Accuracy did not vary by denominator option. < 1% of BMV was initiated within 1 min of birth. Enablers to register recording included training and data use while barriers included register design, documentation burden, and time pressure.
Conclusions Population-based surveys are unlikely to be useful for measuring resuscitation coverage given low validity of exit-survey report. Routine labour ward registers have potential to accurately capture BMV as the numerator. Measuring the true denominator for clinical need is complex; newborns may require BMV if breathing ineffectively or experiencing apnoea after initial drying/stimulation or subsequently at any time. Further denominator research is required to evaluate non-crying as a potential alternative in the context of respectful care. Measuring quality gaps, notably timely provision of resuscitation, is crucial for programme improvement and impact, but unlikely to be feasible in routine systems, requiring audits and special studies. Supplementary Information The online version contains supplementary material available at 10.1186/s12884-020-03422-9.

• Neonatal resuscitation programmes are being scaled up globally, yet coverage of resuscitative interventions is not routinely tracked.
Resuscitation coverage and quality measures have not yet been validated in either population-based surveys or routine facility registers.
• Challenges exist for measurement of resuscitation coverage indicators:
° Numerator: Which action during clinical resuscitation (e.g. stimulation or bag-mask-ventilation [BMV]) is both measurable and valid?
° Denominator: What is measurable and useful (e.g. live births plus fresh stillbirths or non-breathing, or non-crying babies)?
• EN-BIRTH is the first observational study (> 23,000 births) to assess validity of neonatal resuscitation coverage measurement, in both exit survey of women's report and routine register records. Using time-stamped data, we analysed coverage and quality of neonatal resuscitation in five hospitals in Bangladesh, Nepal, and Tanzania.
Survey: what did we find and what does it mean?
• Denominator options: Crying at birth had low "don't know" responses (< 3%) in exit survey. Compared to observed crying within a minute of birth, sensitivity was high (> 95%); however, specificity was low (< 22%). Survey-reported BMV coverage validity was consistently low for all denominators assessed.
Register: what did we find and what does it mean?
• Numerator options: Stimulation and BMV were recorded by 4 of 5 labour ward registers, yet accuracy varied between hospitals even with the same register design. BMV sensitivity ranged from 12.4-48.4% and specificity was high (> 93%). For stimulation, sensitivity was low at 7.5-40.8% and specificity was more variable (range 66.8-99.5%).
• Denominators: Livebirths and fresh stillbirths were recorded in all registers. The "non-crying/non-breathing" combined denominator was only in the Bangladesh registers and could not be validated.
Register-recorded BMV coverage was consistent whichever denominator was applied.
Gap analysis for quality of care and measurement
• Most newborns (71.4-94.7%) who did not respond to stimulation did receive BMV, but only 1% received it within the recommended 1 minute after birth.

What next and research gaps?
• Population-based surveys are not likely to be useful for measuring neonatal resuscitation coverage, given low validity of exit-survey report. Additionally, household surveys would be underpowered since resuscitation is required by a small proportion of babies.
• Routine hospital registers have potential to track resuscitation coverage indicators, but implementation research is needed to standardise design and processes, including data flow to Health Management Information Systems. BMV is the most accurate numerator; true denominator measurement is complex and requires more research, including assessment of non-crying.
• Data use with feedback loops and support to frontline healthcare workers could help improve data quality and quality of care. Local clinical quality improvement and special studies are important to reduce quality gaps, particularly for timely BMV, and help meet global goals to end preventable deaths.

Background
Annually, 7-14 million newborns (5-10%) are estimated to require stimulation to initiate breathing at birth and 6 million newborns require bag-mask-ventilation (BMV) [1,2]. Intrapartum-related events (previously termed "birth asphyxia") are a leading cause of neonatal mortality, accounting for 11% of under-five deaths [2,3]. Such intrapartum-related events can cause stillbirths just before birth and neonatal deaths just after. The majority (> 98%) of stillbirths are in low- and middle-income countries (LMICs) and an estimated 50% are intrapartum [4]. Resuscitation is recommended for all babies who do not breathe after birth since live births may be misclassified as stillbirths [5,6]. Meeting Sustainable Development Goal (SDG) targets by 2030 for ending preventable neonatal deaths requires universal coverage of high quality care around birth for women and their babies, including resuscitation for those who do not breathe at birth [7,8]. Globally, ~80% of births are now in facilities [9], with many LMICs scaling up neonatal resuscitation programs [10-12]. However, lack of measurement for coverage and quality of neonatal resuscitation impedes tracking of progress [13]. The definition of coverage requires a numerator capturing the intervention (or a component) divided by a target denominator regarding clinical need. A good indicator may not include all of the clinical intervention but should "indicate" well and also not incentivise undesirable practices. Resuscitation coverage measurement has specific challenges. Clinical algorithms have multiple actions that could be used as numerators, notably stimulation of the baby or the action of BMV. Suction is indicated for some babies, but inappropriate suctioning can be harmful and thus should be avoided as a measurement focus [14].
Resuscitation algorithms start at birth for all babies, including fresh stillbirths, being dried and assessed for crying or breathing. WHO guidance on basic resuscitation focuses on the baby who is not breathing spontaneously or is depressed [15]. A global partnership called "Helping Babies Breathe" (HBB), widely used for neonatal resuscitation training in LMICs, uses crying during thorough drying as a rapid and objective assessment, followed by evaluation of breathing (Fig. 1) [16]. In line with WHO guidelines, if the baby is not crying and not breathing, then stimulation is provided to improve or initiate breathing, and the airway is cleared if it is blocked with secretions. If the baby is not breathing after these actions, BMV should begin within 1 minute of birth.
Most data on maternal and newborn health care coverage in LMICs relies on population-based surveys, notably Demographic and Health Surveys (DHS) and Multiple Indicator Cluster Surveys (MICS), none of which capture neonatal resuscitation. Routine facility data are currently an underutilised source for neonatal resuscitation coverage for routine Health Management Information Systems (HMIS). Interventions around the time of birth are typically recorded in one or more hospital documents: individual patient records, labour and delivery ward registers, and intervention-specific registers (e.g., neonatal resuscitation register) [17]. Previous research has demonstrated availability of some neonatal resuscitation data in routine labour ward registers [18,19]. Use of HMIS data aggregated from registers is impeded by concerns regarding data quality [20], but to date no validation studies have been undertaken regarding either survey or routine register data for neonatal resuscitation coverage indicators.
The Every Newborn Action Plan, agreed by all 195 United Nations member states, includes an ambitious Measurement Improvement Roadmap [8] to validate coverage indicator measurement for care and outcomes around the time of birth. The Every Newborn-Birth Indicators Research Tracking in Hospitals (EN-BIRTH) study was undertaken in three countries (Tanzania, Bangladesh, and Nepal) and aimed to assess validity of measurement of selected newborn and maternal indicators for routine facility-based tracking of coverage, quality of care, and outcomes [21].

Objectives
This paper is part of a supplement based on the EN-BIRTH multi-country study, 'Informing measurement of coverage and quality of maternal and newborn care', and focuses on neonatal resuscitation measurement with four objectives:

Methods
EN-BIRTH was an observational, mixed methods study comparing data from clinical observers (gold standard) to survey-reported and register-recorded coverage of perinatal care and outcomes (Fig. 2). Detailed information regarding the research protocol, methods, and analysis has been published separately [21,22]. Data were collected from July 2017 to July 2018 in five public comprehensive emergency obstetric and newborn care (CEmONC) hospitals in three high mortality burden countries: Bangladesh (BD): Maternal and Child Health Training Institute (MCHTI), Azimpur, and Kushtia District Hospital; Nepal (NP): Pokhara Academy of Health Sciences; and Tanzania (TZ): Temeke Regional Hospital and Muhimbili National Referral Hospital (Additional file 1). Baseline health facility assessments established that all five hospitals had capacity to resuscitate newborns. Resuscitation guidelines used in all five hospitals were based on HBB [16]. Participants were consenting women admitted in labour for care around birth. Exclusion criteria included imminent birth and no fetal heart on admission. Clinically trained researchers observed participants 24 h per day and recorded data on the baby's condition at birth (e.g., crying/breathing) and care (e.g., stimulation and BMV). The observers received refresher training in HBB as part of their clinical observation training before the study started [21]. Data were collected with a custom-built Android tablet-based application, including timestamps for observations. Research data collectors interviewed women after discharge before exit from hospital regarding their baby's condition after birth and care received. Resuscitation and outcome data were extracted from routine hospital registers. Metadata definitions of selected indicator options for validity testing are shown in Additional file 2. To determine the reliability of the observational data (gold standard), supervisors duplicated observation (and register data extraction) for a 5% subset of cases to calculate Cohen's Kappa coefficients. Health workers and data collectors were interviewed about barriers and enablers to use of routine registers in recording of perinatal care and outcomes. Results are reported in accordance with STROBE Statement checklists for cross-sectional studies (Additional file 3). Quantitative analysis was undertaken using R version 3.6.1 [23].
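As an illustration of the reliability check described above, the following is a minimal sketch of Cohen's Kappa for two raters. It is written in Python for readability, whereas the study analysis used R; the function name and the example ratings are hypothetical and are not study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of cases where both raters gave the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical duplicate observations of "BMV given" (1 = yes, 0 = no).
observer = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
supervisor = [1, 0, 0, 0, 0, 0, 0, 1, 0, 1]
kappa = cohens_kappa(observer, supervisor)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which matters for rare events such as BMV, where two raters recording "no" for almost every case would agree often even without observing carefully.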

Objective 1: Numerator for indicator measurement validation
Livebirths and fresh stillbirths (hereafter referred to as "newborns") were considered to require initial assessment for resuscitation, whilst macerated stillbirths were excluded. We explored accuracy of two possible numerator options, N1) Stimulation and N2) BMV, in both survey and register data compared to observation data.
In exit surveys, where a woman reported her newborn had difficulty breathing at birth, she was asked about resuscitation practices. In line with common survey indicator reporting, where women replied "don't know", we considered the survey-reported stimulation/BMV response as "no".
We compared observed coverage (gold standard) of stimulation and BMV to survey-reported and register-recorded coverage. We calculated absolute differences between measured coverage (survey or register) and observed coverage to understand under- or overestimation at the population level. Using two-way tables, we calculated individual-level validity statistics: sensitivity, specificity, and percent agreement ((true positive + true negative) / total) of register-recorded and survey-reported BMV coverage to observed coverage. Area under the curve, inflation factor, positive predictive value, and negative predictive value were also calculated. All calculations were stratified by hospital with 95% confidence intervals. Pooled results for validity analyses were calculated using random effects meta-analysis, presented with I², τ², and heterogeneity statistic (Q).
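The two-way table comparison above can be sketched as follows. This is an illustrative Python sketch (the study used R), with hypothetical counts rather than study data. The `validity_ratio` entry assumes the ratio is measured coverage divided by observed coverage, which this section does not define explicitly; confidence intervals, area under the curve, and the random effects pooling are not shown.

```python
def validation_metrics(pairs):
    """Summarise (observed, measured) boolean pairs, one pair per newborn.

    `observed` is the gold standard; `measured` is the survey or register value.
    """
    tp = fp = fn = tn = 0
    for observed, measured in pairs:
        if observed and measured:
            tp += 1
        elif observed:
            fn += 1
        elif measured:
            fp += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("no paired records supplied")

    def ratio(numerator, denominator):
        # None marks a metric that is undefined for this table.
        return numerator / denominator if denominator else None

    observed_coverage = (tp + fn) / total
    measured_coverage = (tp + fp) / total
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "percent_agreement": (tp + tn) / total,
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "observed_coverage": observed_coverage,
        "measured_coverage": measured_coverage,
        # Negative values mean the survey or register underestimates coverage.
        "absolute_difference": measured_coverage - observed_coverage,
        # Assumed definition: measured coverage / observed coverage.
        "validity_ratio": ratio(measured_coverage, observed_coverage),
    }


# Hypothetical two-way table: 40 TP, 60 FN, 10 FP, 890 TN.
pairs = (
    [(True, True)] * 40
    + [(True, False)] * 60
    + [(False, True)] * 10
    + [(False, False)] * 890
)
metrics = validation_metrics(pairs)
for name, value in metrics.items():
    print(name, value)
```

The example shows why population-level and individual-level results can diverge: a small absolute difference in coverage can coexist with low sensitivity when false positives partly offset false negatives.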

Objective 2: Denominator comparisons
We explored neonatal resuscitation coverage measurement using three possible denominator options: D1) all newborns (total births excluding macerated stillbirths), D2) newborns not crying within the first minute after birth and D3) newborns not breathing within the first minute after birth.
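To make the effect of the denominator choice concrete, the sketch below computes BMV coverage under the three options for a small hypothetical cohort. The record fields and names are illustrative assumptions, not the study's data dictionary, and the sketch is in Python although the study analysis used R.

```python
from dataclasses import dataclass


@dataclass
class Newborn:
    cried_within_1min: bool
    breathed_within_1min: bool
    received_bmv: bool


# Each denominator option is a rule selecting which newborns are counted.
DENOMINATORS = {
    "D1_all_newborns": lambda baby: True,
    "D2_non_crying": lambda baby: not baby.cried_within_1min,
    "D3_non_breathing": lambda baby: not baby.breathed_within_1min,
}


def bmv_coverage(newborns, in_denominator):
    """Share of newborns in the chosen denominator who received BMV."""
    eligible = [baby for baby in newborns if in_denominator(baby)]
    if not eligible:
        return None  # coverage is undefined for an empty denominator
    return sum(baby.received_bmv for baby in eligible) / len(eligible)


# Hypothetical cohort of ten newborns (not study data).
cohort = (
    [Newborn(True, True, False)] * 7      # cried and breathed, no BMV
    + [Newborn(False, True, False)]       # breathing but not crying
    + [Newborn(False, False, True)]       # non-breathing, received BMV
    + [Newborn(False, False, False)]      # non-breathing, no BMV
)
coverage = {name: bmv_coverage(cohort, rule) for name, rule in DENOMINATORS.items()}
print(coverage)
```

The same numerator gives very different coverage values as the denominator narrows towards clinical need, so coverage figures are only comparable when the denominator is stated.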

Objective 3: Gap analysis for coverage and quality of care, and measurement
We examined gaps in coverage and timely neonatal resuscitation amongst a subset of newborns with a clinical need for resuscitation within 1 minute of birth. These newborns were not breathing in the first minute after birth and did not respond to stimulation (or suction when performed). For this (A) eligible population subset, we analysed four gaps for neonatal resuscitation: (B) coverage gap for BMV, (C) quality of care gap between any BMV coverage, and timely coverage (within 1 minute), (D) measurement gap for survey-report, and (E) measurement gap for register-record.
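One way to picture the four gaps is the sketch below. It assumes each gap is a simple difference in proportions among the eligible subset (A) and that "timely" means a first ventilation breath at or before 60 seconds; both are assumptions made for illustration, as are the field names and the example records. The code is Python, whereas the study analysis used R.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EligibleNewborn:
    # Seconds from birth to first ventilation breath; None if no BMV was observed.
    bmv_start_seconds: Optional[float]
    survey_reported_bmv: bool
    register_recorded_bmv: bool


def gap_analysis(eligible, timely_cutoff_seconds=60):
    """Gaps B to E as differences in proportions (assumed definitions)."""
    n = len(eligible)
    if n == 0:
        raise ValueError("eligible subset (A) is empty")
    observed = sum(b.bmv_start_seconds is not None for b in eligible) / n
    timely = sum(
        b.bmv_start_seconds is not None
        and b.bmv_start_seconds <= timely_cutoff_seconds
        for b in eligible
    ) / n
    survey = sum(b.survey_reported_bmv for b in eligible) / n
    register = sum(b.register_recorded_bmv for b in eligible) / n
    return {
        "observed_bmv_coverage": observed,
        "B_coverage_gap": 1 - observed,
        "C_quality_gap": observed - timely,
        "D_survey_measurement_gap": observed - survey,
        "E_register_measurement_gap": observed - register,
    }


# Hypothetical eligible subset of ten newborns (not study data).
subset = (
    [EligibleNewborn(45, True, True)]            # timely BMV, reported and recorded
    + [EligibleNewborn(120, True, True)]         # late BMV, reported and recorded
    + [EligibleNewborn(120, False, True)] * 3    # late BMV, recorded only
    + [EligibleNewborn(120, False, False)] * 3   # late BMV, never documented
    + [EligibleNewborn(None, False, False)] * 2  # no BMV observed
)
gaps = gap_analysis(subset)
print(gaps)
```

Separating the gaps this way distinguishes care that was not given (B), care that was given late (C), and care that was given but not captured by the survey or register (D, E).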

Objective 4: Barriers and enablers to routine recording
Qualitative data collection tools for focus group discussions and in-depth interviews were informed by the Performance of Routine Information System Management Series (PRISM) conceptual framework [25]. Detailed qualitative methods and overall results are available in an associated paper [26]. A purposive sample of nurses, midwives, doctors, and EN-BIRTH data collectors from each of the five hospitals participated. Analysis identified themes based on three domains: register design, filling, and use [25]. In addition, respondents were asked questions regarding the order in which resuscitation is documented in registers, patient notes, and other documents as well as how long after resuscitation is documentation entered in the labour ward register. This paper presents emerging themes regarding recording of neonatal resuscitation.

Results
Overall, 98.7% were alive at discharge from labour and delivery, 1% were fresh stillbirths, and less than 1% were born alive but died on the labour ward. Nearly one-third of births (29.5%) were by caesarean section, highest (73.6%) in Azimpur BD. Among 22,752 newborns (denominator option D1), 3688 (16.2%) were stimulated (numerator option N1) and 998 (4.4%) received BMV (numerator option N2) (Fig. 4). Within the first minute after birth, 5330 were observed as non-crying (denominator option D2), and among these 3860 were also observed as non-breathing (denominator option D3).

Assessing biases in the data
Duplicate case observation inter-rater reliability showed substantial agreement (kappa > 0.71) for resuscitation elements (Additional file 4). Register extraction agreement was lower and varied greatly between sites, with kappa ranging from −0.035 to 0.939. Register-recorded coverage (0.8-34.8%) underestimated coverage in the Bangladesh hospitals and overestimated coverage in the Tanzania hospitals (Fig. 5). While sensitivity was low (< 41%), specificity was high across most sites (66.8-99.5%).
Objective 2: Denominator for indicator measurement comparison
Denominator option 1: all newborns (live births and fresh stillbirths)
The validation of birth outcomes is reported separately [27]. Survey validity ratios for BMV coverage measurement using this all newborn denominator performed poorly (0.11-0.71) and register validity ratios were moderate to poor (0.70-1.22) (Fig. 7).

Denominator option 3: non-breathing newborns
Prevalence of not breathing within the first minute ranged from 11.7% in Azimpur BD to 21.0% in Pokhara NP. The survey validity ratio for BMV coverage measurement using this non-breathing denominator performed poorly (0.14-0.49). Sensitivity ranged from 0 to 20.8% while specificity was > 97% across hospitals. Register validity ratios were better, but still classified as poor (0.45-0.78). While sensitivity ranged from 11.1-51.3%, specificity was high across all hospitals (> 92%).

Objective 3: Coverage and quality gap analysis
Among the subset proxy for true clinical need [newborns who did not cry/breathe in the first minute with no response to stimulation (or suction if needed)], most received BMV, ranging from 71.4% in Azimpur BD to 94.7% in Pokhara NP (Fig. 8) but timely coverage was very low (1%). Survey-reported coverage (< 28%) substantially underestimated true coverage. Register-recorded coverage also underestimated true coverage and ranged widely from 0.0% in Kushtia BD to 52.9% in Temeke TZ.
Among newborns receiving any BMV on the labour ward, the proportion receiving the first ventilation breath within 1 minute of birth ranged from 0.2% in Temeke TZ to 8.0% in Pokhara NP. Across the three denominators explored, time to initiation of BMV was similar (Fig. 9).

Objective 4: Barriers and enablers to routine recording
Register design
Labour ward registers varied in design between the five hospitals (Fig. 5). Bangladesh labour ward registers had three specific columns for recording neonatal resuscitation: (i) "baby did not breathe/cry after birth" (tick box for 'yes' and tick box for 'no'), (ii) "stimulation" (instructions to tick for 'yes' and leave blank for 'no') and (iii) "BMV" (instructions to tick for 'yes' and leave blank for 'no'). The Tanzanian register captured resuscitation steps by numerical code in a column headed "Helping Babies Breathe" (suction = 1, stimulation = 2, BMV = 3) or "no", and blanks were treated as not recorded. There was no specific column in the Nepal register for resuscitation.
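The practical difference between the two designs shows up when register cells are decoded. The sketch below parses a coded "Helping Babies Breathe" column as described for the Tanzanian register and computes column completeness. How several codes are written in one cell (here separated by commas or spaces) is an assumption, and the function names and example cells are hypothetical.

```python
# Codes as described for the column headed "Helping Babies Breathe".
HBB_CODES = {"1": "suction", "2": "stimulation", "3": "bmv"}


def parse_hbb_cell(cell):
    """Return the set of recorded actions, an empty set for "no", or None if blank."""
    text = (cell or "").strip().lower()
    if not text:
        return None  # blank cell: treated as not recorded
    if text == "no":
        return set()  # explicitly recorded as no resuscitation action
    actions = set()
    # Assumed convention: multiple codes are separated by commas or spaces.
    for token in text.replace(",", " ").split():
        if token not in HBB_CODES:
            raise ValueError(f"unrecognised register code: {token!r}")
        actions.add(HBB_CODES[token])
    return actions


def completeness(cells):
    """Share of register rows where the column was filled in at all."""
    if not cells:
        raise ValueError("no register rows supplied")
    return sum(parse_hbb_cell(cell) is not None for cell in cells) / len(cells)


# Hypothetical register column (not study data).
column = ["no", "2", "2, 3", "", "no"]
print(parse_hbb_cell("2, 3"))
print(completeness(column))
```

Under a tick-or-blank design such as the Bangladesh columns, a blank cell could mean either "not done" or "not filled in", so a completeness calculation like this one cannot be made.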

Documentation practices in registers
Resuscitation practices were recorded in varying order into multiple documents (Additional file 11). Reported time between care and documentation ranged from 2.5 min in Pokhara NP to 22.5 min in Temeke TZ.
Register design
Register design largely acted as a barrier to recording in Pokhara NP: "Drying, stimulation, and bag-mask ventilation are written [in the patient's chart], but in the main register it is not present… we do not have routine care of the newborn in the register, only in the patient's chart." -Data collector, Pokhara NP

In the other hospitals health workers duplicated documentation in registers with multiple other documents (e.g. partographs, patient case notes) (Additional file 12).
Register filling
Aspects of register filling acted as both barriers and enablers. Training and support from senior nurses enabled improved accuracy of documentation while limited time acted as a barrier. Health workers across the hospitals discussed the lack of time to document, particularly for complicated cases and resuscitation when you are focused on delivering care: "Just after finishing [resuscitation], you must keep everything clear… time is a problem… you must estimate, there are times it is difficult and other times you ask the [senior nurse]… because in an emergency you all work together; thus, you remind each other." -Health worker, Temeke TZ
Health workers in Pokhara NP received specific support for documentation in neonatal resuscitation: "We have received training on HBB and we were trained for documentation in that. We were doing documentation before, but we received direction for improving it." -Health worker, Pokhara NP
However, while health workers in Pokhara NP record resuscitation in other documents, it is not recorded in routine hospital registers.
Register use
While improved patient care and use of data by managers motivated documentation and was affirmation of the care health workers were giving, not all respondents could identify the use for resuscitation data in routine registers.
Feedback was lacking where documentation didn't line up with clinical need: "Sometimes when you look at the [APGAR] score of the baby, maybe it's 5, you wonder why they didn't perform resuscitation, there's a possibility they [did] but they haven't documented that… There's no one to follow up on that… The person responsible for data comes and copies what's written in the register, be it a low score… but they never ask them why they didn't perform resuscitation if the baby had a low score" -Health worker, Temeke TZ
Conversely, in Bangladesh, health workers were not sure what happened with resuscitation data: "Resuscitation is an emergency subject. There remains a referral slip while resuscitating a baby on emergency that indicates the baby went to
"Minor things like suctioning were not recorded and they only documented on a resuscitation case that took more than ten minutes." -Data collector, Muhimbili TZ
However, the importance of documentation was noted for organizational and personal protection: "For instance, if a child has been born but unfortunately, let us say she had a problem, you have resuscitated her, but you did not document… and the mother/parent has become very angry and start complaining, or the whole management has become angry with you why the child had this situation, but you did not record what you have done ... You will not defend yourself, but documentation

Discussion
The EN-BIRTH study's large sample size (22,752 live births and fresh stillbirths) allowed the first validity assessment of measurement for neonatal resuscitation coverage in routine hospital registers and surveys, against a gold standard of clinical observation. We found that survey report poorly captured resuscitation indicators. Routine labour ward registers performed better, but variably, and have potential, especially with data quality improvement. Survey-reported coverage was challenging, which is not surprising. We found most women who reported their baby had trouble breathing after birth did not know if their baby had been stimulated or received BMV. We recommend that questions on resuscitation need or BMV should not be added to existing population-based surveys. Furthermore, the sample size required for this relatively low-incidence practice would be challenging even in DHS surveys with large, nationally representative samples [28].
The numerator for neonatal resuscitation is key. Stimulation by rubbing the baby's back is easily conflated with the similar action of drying every newborn baby and was not recognized at all by mothers (< 3% in survey report). Suction is only necessary if the airway is blocked and a measurement focus on suction may unintentionally encourage this potentially harmful practice which can cause bradycardia. BMV is the most distinguishable option for a clear subset of non-breathing babies and had higher accuracy than stimulation. Though underestimated in surveys, BMV still performed better than stimulation by survey-report. Additionally, BMV is a more suitable intervention for which to assess quality and links to health facility assessments where standard questions include presence and recent use of neonatal bags and masks.
Health facilities are where ~80% of women now deliver [9], providing an opportunity to track neonatal resuscitation coverage through routine facility data using BMV as the numerator. Four of the five routine registers assessed were already capturing BMV count data. At the population level, register-recorded coverage of BMV was within 2.1% of observed coverage although individual-level validation metrics suggested low sensitivity. Selective register design is important in capturing what is needed yet avoiding documentation over-burdening. In Tanzania, the register column labelled "HBB" aligns measurement with scale-up programming. The design in Bangladesh instructed health workers to leave the column blank when BMV is not done; thus, calculating completeness and differentiating between truly 'not done' and register 'incomplete' was impossible. Where register instructions in Tanzania state to write "no" if BMV was not done, completeness was moderate to high (54.6-91.0%). Although data collectors rarely indicated data were not readable (< 0.5%), there were low inter-rater kappa results for register-recorded BMV in some sites [22]. Because extraction/aggregation is the first step for data flowing to higher levels in the health system, more research is needed to improve this. Capturing reliable data depends on user-friendly, appropriate recording systems; however, accuracy varied even within the same country using identical register design, highlighting the importance of information culture and supervision. Our qualitative findings suggest differences in understanding of importance and utility of resuscitation data at different hospitals.
Denominators are notably challenging for interventions such as resuscitation which are indicated based on clinical need for only a subset of babies [29]. Current WHO guidance recommends the number of live births in a facility as the denominator, with a footnote that this is pragmatic whilst ongoing work to test different denominators, including EN-BIRTH, is completed [30]. Here we have included live births plus fresh stillbirths, for whom resuscitation is recommended. Any newborn without maceration or major malformations, even if they appear completely lifeless, should be given the chance of resuscitation [31]. The reduction in stillbirth rates associated with resuscitation training [32-34] is likely a result of reduced misclassification of live births as stillbirths.
Measuring the true denominator for clinical need for resuscitation is complex. Newborns require BMV if non-breathing/gasping after initial drying/stimulation or if they suffer subsequent apnoea at any time. Breathing well may be difficult to measure as the concept excludes gasping, fast breathing and grunting. It is critical to emphasise these breathing patterns during clinical training as BMV is indicated for some (e.g. gasping) but not all of these breathing patterns. EN-BIRTH observers collected breathing or not breathing as a binary variable, as during formative research it was decided other breathing patterns were not feasible to capture. In our study, 2/5 registers captured non-breathing but as a composite non-crying and non-breathing indicator. Consequently, accuracy of this denominator in registers could not be assessed.
Non-crying has potential utility as a denominator as it is simple for health workers to capture and is part of the process in assessing need for resuscitation. Additionally, crying at birth is a single event and thus more straightforward to record as opposed to breathing which is a process and might change over time, particularly for preterm babies. While not all non-crying babies will require further steps of resuscitation, almost all babies who do need BMV are non-crying. One study has shown babies breathing but not crying after birth have an increased risk of death [35]. We found the observed coverage of BMV ranged from 3.6-17.8% among babies not crying in the first minute. Further research is required to assess if non-crying is useful and benchmarking is feasible. However, as considerations turn towards respectful newborn care and minimal handling, further research is needed related to newborn physiological responses after birth and what is appropriate to measure.
Apgar scores are captured in all the routine hospital registers in our study, including in Pokhara NP, which captured no resuscitation interventions. Apgar scores do not capture interventions around the time of birth, but rather describe a newborn's physical condition and response to any interventions at 1 and 5 minutes after birth, and are already known to have limitations, notably low inter-rater reliability. The one-minute Apgar score, which includes heart rate, does not fit well with current resuscitation algorithms which recommend checking the baby's heart rate after a minute of ventilation (2 minutes after birth). As such, the Apgar score is not a useful denominator for neonatal resuscitation and, as it is usually written in individual patient records, we suggest exploring replacing this column in routine labour ward registers with data elements that can be used for coverage measurement, e.g. not crying after birth.
Timely resuscitation is essential and even small delays in starting resuscitation can contribute to death or disability [36]. Our assessment of quality of care focused on timeliness of the start of BMV within the first minute after birth. While coverage of BMV was high (85%), only 1% of newborns received the first ventilation within 1 minute of birth. In the all newborns denominator, not all will require BMV within 1 minute of birth as many were crying/breathing at birth and subsequently became distressed or apnoeic. A coverage gap for BMV of fresh stillbirths is to be expected as it is not appropriate to resuscitate those babies who are diagnosed before birth to have died in utero e.g. confirmed by ultrasound. Measuring timing of BMV is clearly not feasible in surveys and very unlikely to be possible in routine labour ward registers. Given this major quality gap regarding timing of resuscitation initiation, local audit and special studies are important to drive quality improvement.
Strengths of this study include the multi-site and multi-country design and large sample size enabling the capture of multiple decision points on resuscitation algorithms. We evaluated how several possible numerators/denominators performed using clinical observation as a gold standard. We assessed possible bias in the observation data with double observation for a subset of cases. Overall, BMV had good inter-observer agreement. Whilst clinically trained observers provided gold standard data on coverage of interventions, subjectivity remains possible, e.g. differentiating stimulation from immediate drying. To limit this, the tablet application was designed to capture stimulation in a specific neonatal resuscitation section separate from the immediate care practices, such as drying. The low coverage of stimulation amongst non-crying/breathing newborns (34-38%) may reflect poor quality of care or difficulty in measurement for stimulation by an observer.
Some other limitations should be noted. Survey-reported coverage was assessed in exit survey, closer in time to the events in question than standard population-based surveys with 3-5-year reference periods. In the survey, only women who answered 'yes' to a question asking whether their baby had difficulty breathing at birth were asked further questions about resuscitation, thus some who may have recognised newborn stimulation were not counted towards survey-reported coverage. Additionally, the EN-BIRTH study sample may be healthier than the average in these facilities (excluding women too sick to consent, women with no fetal heart rate heard at admission, etc.). As the study sites were CEmONC hospitals, case mix, coverage, and measurement may differ at lower-level facilities.
Importantly, the true denominator of babies in need of BMV will not be captured by facility measurement, especially the disadvantaged who are more likely to deliver at home in LMICs. However, home births are less likely to receive BMV in most LMICs, so facility measurement is likely to capture nearly all the numerator in terms of newborns receiving BMV. Hence approaches such as those used in immunisation when the denominator is missing may help to estimate the coverage of the whole population for contexts with many home births.

Conclusion
Neonatal resuscitation is a high impact evidence-based intervention for a leading cause of under-five mortality, preventable stillbirth and disability. Yet the current lack of coverage measurement is impeding global tracking of scale-up in high-burden countries. We found bag-mask-ventilation was the most reliable numerator. Measuring the true denominator for clinical need is complex and further denominator research is required, including respectful care considerations and evaluation of non-crying as a potential alternative. Based on these results, we do not recommend tracking this indicator through population-based survey. Register measurement of neonatal resuscitation has potential and, if standardised and included in HMIS, could aid in tracking progress towards global targets across countries. An appropriate resuscitation denominator could potentially replace Apgar, which was recorded as a column in all five registers. Implementation research is needed regarding how to improve register data quality. Measuring and addressing quality of care gaps, notably for timely provision of resuscitation in the first minute, is crucial for programme improvement and impact, but unlikely to be feasible in routine systems, requiring audits and special studies. Improving data is possible and necessary, informing progress to meet global goals and meet every family's aspiration that their baby will survive and thrive.
Publication of this manuscript has been funded by CIFF. CIFF attended the study design workshop but had no role in data collection, analysis, data interpretation, report writing or decision to submit for publication. The corresponding author had full access to study data and final responsibility for publication submission decision.

Availability of data and materials
The datasets generated during and/or analysed during the current study are available on LSHTM Data Compass repository, https://datacompass.lshtm.ac.uk/955/.