Birthweight: EN-BIRTH multi-country validation study

Background Accurate birthweight is critical to inform clinical care at the individual level and to track progress towards national/global targets at the population level. Low birthweight (LBW) < 2500 g affects over 20.5 million newborns annually. However, data are lacking and may be affected by heaping. This paper evaluates birthweight measurement within the Every Newborn Birth Indicators Research Tracking in Hospitals (EN-BIRTH) study. Methods The EN-BIRTH study took place in five hospitals in Bangladesh, Nepal and Tanzania (2017–2018). Clinical observers collected time-stamped data (gold standard) for weighing at birth. We compared accuracy for two data sources: routine hospital registers and women’s report at exit interview survey. We calculated absolute differences and individual-level validation metrics. We analysed birthweight coverage and quality gaps including timing and heaping. Qualitative data explored barriers and enablers for routine register data recording. Results Among 23,471 observed births, 98.8% were weighed. Exit interview survey-reported weighing coverage was 94.3% (90.2–97.3%), sensitivity 95.0% (91.3–97.8%). Register-recorded coverage was 96.6% (93.2–98.9%), sensitivity 97.1% (94.3–99%). Routine registers were complete (> 98% for four hospitals) and > 99.9% legible. Weighing of stillbirths varied by hospital, ranging from 12.5 to 89.0%. Observed LBW rate was 15.6%; survey-reported rate 14.3% (8.9–20.9%), sensitivity 82.9% (75.1–89.4%), specificity 96.1% (93.5–98.5%); register-recorded rate 14.9%, sensitivity 90.8% (85.9–94.8%), specificity 98.5% (98–99.0%). In surveys, “don’t know” responses were 4.7% for whether the baby was weighed and 2.9% for the actual weight. Overall, 95.9% of observed babies were weighed within 1 h of birth, but only 14.7% with a digital scale. Weight heaping indices were around two-fold lower with digital than with analogue scales. Observed heaping was almost 5% higher for births during the night than during the day. 
Survey-report further increased observed birthweight heaping, especially for LBW babies. Enablers of register recording of birthweight identified in qualitative interviews included digital scale availability and adequate staffing. Conclusions Hospital registers captured birthweight and LBW prevalence more accurately than women’s survey report. Even in large hospitals, digital scales were not always available and stillborn babies were not always weighed. Birthweight data are being captured in hospitals; investment is required to further improve data quality, research on data flow in routine systems and use of data at every level. Supplementary Information The online version contains supplementary material available at 10.1186/s12884-020-03355-3.

• An estimated 20.5 million low birthweight (LBW) babies are born each year, and tracking progress in the highest burden countries still relies on population-based surveys, which are known to have missing data and substantial heaping (preference for recording weights ending in 00). Improving birthweight data in both routine systems and surveys is essential. • EN-BIRTH is the largest multi-country, multi-site study (> 23,000 births) to assess availability, validity and quality of birthweight data in both survey and routine registers. Qualitative data explored barriers and enablers for routine register recording of birthweight.
Survey: what did we find and what does it mean?
• Survey-reported birthweight coverage underestimated observed coverage by nearly 5% and LBW prevalence by 1%. • Survey-reported birthweight heaping was 1.5 times higher than the observed heaping. • Women with stillborn babies reported a much lower coverage of weighing than observed.
Register: what did we find and what does it mean?
Gap analysis for quality of care
• Nearly all (95.9%) babies were weighed within 1 h; however, only 14.7% were weighed on digital scales. Stillbirths were weighed much less often, despite birthweight data being fundamental to classifying stillbirths and intervening to prevent them. • Substantial heaping of observed birthweights included heaping at 2500 g, so the LBW rate will likely be inaccurate. • Birthweight heaping indices were approximately two-fold lower using digital compared to analogue scales and also 3-5% lower during day shifts compared to night shifts.

What next and research gaps?
• Routine register records outperformed exit-survey reports in accuracy for measurement of birthweight and LBW in these hospitals. Further research is needed to assess whether survey-reported accuracy decreases over time. • Investment is needed to explore how digital scales, standardised register processes and register design can further improve the quality of routine birthweight data. • Improving the flow of currently available hospital birthweight data into Health Management Information Systems (HMIS) has the potential to close the large LBW data gap in high-burden LMIC settings.

Background
Birthweight closely correlates with newborn survival and lifelong health. The World Health Organization (WHO) recommends measuring birthweight within the first hour of life, ideally using calibrated digital scales with 10 gramme (g) precision [1]. The low birthweight rate has agreed global targets, and data are needed to track progress [2]. Among neonatal deaths, 80% have low birthweight (LBW), defined as < 2500 g [3,4]. An estimated 20.5 million LBW neonates were born in 2015; 91% were born in low- and middle-income countries (LMICs), with almost half in south Asia (48%) and a quarter in sub-Saharan Africa (24%) [3,5]. LBW survivors continue to have a higher risk of morbidity, including stunting, lower intelligence quotient, and cardiovascular disease later in life [6][7][8]. Stillborn babies, estimated at > 2 million per year with 84% in LMICs, share contributing factors such as placental failure with LBW live births, yet are invisible in birthweight indicators because standard definitions use a live-birth denominator [9]. Tracking coverage of birthweight measurement is recommended, and the LBW rate is one of only four newborn health measures in WHO's 100 core health indicators [10]. Global nutrition targets set by WHO include a 30% reduction in LBW infants from 2012 to 2025 [2], but current progress is off track for the required annual rate of reduction [11]. Birthweight data are essential to reach the neonatal mortality rate (NMR) target of Sustainable Development Goal (SDG) 3.2 by 2030 [12]. NMR and stillbirth rates stratified by birthweight group need to be used for perinatal death surveillance and response in settings where accurate gestational age and cause of death assessment are not possible [13]. At an individual level, birthweight data ensure that at-risk newborns receive the immediate care they need and serve as the first measurement for monitoring a child's growth to promote health outcomes throughout the life course.
Birthweight data are not available for almost one-third (39.7 million) of newborns, the majority in LMICs [3]. Available birthweight data in high-mortality-burden countries are mostly from population-based surveys, notably the Demographic and Health Surveys (DHS) Program and the United Nations Children's Fund (UNICEF) Multiple Indicator Cluster Surveys (MICS) [14,15]. As > 80% of births globally are now in facilities [15], more birthweight data could potentially be made available through routine Health Management Information Systems (HMIS) [4,14]. When birthweight data are available, concerns about quality, including heaping, limit their use and usefulness. Previous birthweight-related indicator validation studies in LMICs have mostly focused on household survey measurement [16][17][18][19], with few addressing routine facility measurement [20]. The validity of birthweight measurement through routine hospital registers in LMICs has not previously been studied. The barriers and enablers that affect the quality of birthweight data in routine hospital registers in LMICs are not known.
The Every Newborn Action Plan, agreed by all United Nations member states and > 80 development partners, includes an ambitious measurement improvement roadmap [12,21] with an urgent focus on improving data for use towards high-quality care around the time of birth [12,22]. As part of this roadmap, the Every Newborn Birth Indicators Research Tracking in Hospitals (EN-BIRTH) study aimed to validate the measurement of selected newborn and maternal indicators for routine tracking of coverage and quality of facility-based care [23,24].

Objectives
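This paper is part of a supplement based on the EN-BIRTH multi-country validation study, 'Informing measurement of coverage and quality of maternal and newborn care', and focuses on birthweight with three objectives: 1) to assess the validity of birthweight coverage and LBW prevalence measured by women's report at exit-interview survey and by routine hospital registers, compared with observation; 2) to analyse gaps in coverage and quality of birthweight measurement, including timing, device and heaping; 3) to explore barriers and enablers to routine register recording of birthweight.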

Methods
The EN-BIRTH study was a mixed-methods observational study; detailed information regarding the EN-BIRTH research protocol and overall validation results has been published separately [23,24]. This is the first analysis of the EN-BIRTH birthweight data. A study on birthweight measurement processes and perceived value is published elsewhere in the supplement [25]. Data were collected between June 2017 and July 2018 in five public comprehensive emergency obstetric and newborn care (CEmONC) hospitals in three high-burden countries: Maternal and Child Health Training Institute (MCHTI), Azimpur and Kushtia District Hospital in Bangladesh (BD); Pokhara Academy of Health Sciences in Nepal (NP); Temeke Regional Hospital and Muhimbili National Referral Hospital in Tanzania (TZ) (Additional files 1 and 2). Results are reported in accordance with STROBE Statement checklists for observational studies (Additional file 3). Study participants were consenting women recruited on admission to the labour and delivery ward, and their newborn babies. We use the term "newborn" in this paper to cover both live births and stillbirths (total births). Exclusion criteria at admission were imminent birth and no fetal heartbeat heard. Trained clinical research observers read the birthweight from the weighing scale (external gold standard) as the health worker weighed the newborn. Observers time-stamped each birthweight record (in grammes) and noted the type of weighing scale (digital or analogue). Separate groups of data extractors captured birthweight data from existing routine labour ward registers and from women's responses to the exit-interview survey after discharge. Data were captured using a custom-built Android tablet-based application [26] (Additional file 5).
Implausible observed birthweights (< 350 g or > 6000 g) were excluded from all analyses. Calculations were done for each hospital and then combined using random-effects meta-analysis. We used 95% confidence intervals to indicate uncertainty when applying our results to a different population. We calculated I² and τ² to assess heterogeneity between hospitals. Results were stratified by mode of birth (vaginal/caesarean), birth outcome (live birth/stillbirth) and birth type (single/multiple, i.e. twins or higher), and associations were assessed using chi-squared tests.
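To make the pooling step concrete, the sketch below combines per-hospital proportions with the DerSimonian-Laird random-effects estimator and reports τ² and I². It is a minimal illustration, not the study's Stata code: the logit transformation, the 0.5 continuity correction and the function names are assumptions, and the counts in the usage line are hypothetical.

```python
import math


def _expit(x):
    """Inverse of the logit transformation."""
    return 1.0 / (1.0 + math.exp(-x))


def pool_proportions_dl(events, totals, z=1.96):
    """DerSimonian-Laird random-effects pooling of per-hospital proportions.

    Works on the logit scale (an assumption for this sketch). Returns
    (pooled proportion, lower 95% CI, upper 95% CI, tau^2, I^2 in %).
    """
    y, v = [], []
    for e, n in zip(events, totals):
        if e == 0 or e == n:
            # Assumed 0.5 continuity correction for hospitals at 0% or 100%
            e, n = e + 0.5, n + 1.0
        y.append(math.log(e / (n - e)))       # logit of the proportion
        v.append(1.0 / e + 1.0 / (n - e))     # approximate variance of the logit
    w = [1.0 / vi for vi in v]                # inverse-variance (fixed-effect) weights
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(y) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0   # between-hospital variance
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    w_re = [1.0 / (vi + tau2) for vi in v]    # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return _expit(y_re), _expit(y_re - z * se), _expit(y_re + z * se), tau2, i2


# Hypothetical weighing counts for three hospitals of 1000 births each
pooled, lo, hi, tau2, i2 = pool_proportions_dl([980, 950, 990], [1000, 1000, 1000])
```

Pooling on the logit scale keeps the back-transformed confidence limits inside 0-100%; other transformations (for example Freeman-Tukey) are common for proportions near the boundary and may give slightly different intervals.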
Analyses were undertaken using Stata version 16 [27], and R version 3.5.0 was used for graphs [28].

Assessing biases in the data
To determine the reliability of our gold standard, we calculated Cohen's kappa coefficient for 5% of the sample observed by both supervisors and data collectors [23]. To assess any change in routine register recording practices due to the presence of study observers, we compared the completeness of register data extracted during the study with that of register data from the year before the study, collected retrospectively, and calculated the absolute differences [29]. We also calculated kappa coefficients for a 5% sample of double-extracted study register data.
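Cohen's kappa compares observed agreement between two raters with the agreement expected by chance from each rater's category frequencies. A minimal sketch for two raters labelling the same births (for example, supervisor and data collector recording whether a baby was weighed) could look like this; the function name and example labels are illustrative only.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters classifying the same items into categories."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same category by both raters
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(count_a) | set(count_b)
    p_exp = sum(count_a[c] * count_b[c] for c in categories) / n ** 2
    if p_exp == 1:
        # Both raters used one identical category; kappa is undefined,
        # and returning 1.0 here is a convention chosen for this sketch
        return 1.0
    return (p_obs - p_exp) / (1 - p_exp)


supervisor = ["yes", "yes", "no", "yes"]
collector = ["yes", "no", "no", "yes"]
print(cohens_kappa(supervisor, collector))  # 0.5
```

Here observed agreement is 0.75 and chance agreement is 0.5, so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5.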

Objective 1: Determine numerator for indicator measurement accuracy/validity
We evaluated measurement of two aspects of birthweight data: a) Birthweight coverage defined as the number of facility total births (live births and stillbirths) that were weighed, among total births, expressed as a percentage. b) LBW prevalence defined as the number of facility total births (live births and stillbirths) whose birthweights were < 2500 g, among total births weighed, expressed as a percentage.
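The two indicator definitions differ in their denominators: coverage uses all total births, while LBW prevalence uses only the births that were weighed. The sketch below encodes that difference, assuming (for illustration only) that each birth is stored as a weight in grammes or `None` when the baby was not weighed.

```python
def birthweight_coverage(weights):
    """Weighing coverage (%): total births weighed / total births.

    `weights` holds one entry per total birth (live birth or stillbirth):
    the birthweight in grammes, or None if the baby was not weighed.
    """
    weighed = [w for w in weights if w is not None]
    return 100.0 * len(weighed) / len(weights)


def lbw_prevalence(weights, cutoff_g=2500):
    """LBW prevalence (%): weighed births below the cut-off / weighed births."""
    weighed = [w for w in weights if w is not None]
    return 100.0 * sum(w < cutoff_g for w in weighed) / len(weighed)


# Hypothetical example: five total births, one not weighed
births = [3100, 2450, None, 2500, 1980]
print(birthweight_coverage(births))  # 80.0
print(lbw_prevalence(births))        # 50.0
```

A baby weighing exactly 2500 g is not counted as LBW, because the definition uses a strict < 2500 g threshold.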
To assess data accuracy, we compared both routine register-recorded coverage and exit-interview survey-reported coverage with the gold standard, observed coverage (Fig. 1). Population-based surveys (e.g. DHS and MICS) typically measure coverage from "yes" responses and combine "don't know" with "no" responses as "no coverage." Thus, we analysed survey-reported coverage in this way and also with "don't know" responses excluded, to evaluate the effect on accuracy. We interpreted a register entry of "not recorded" to mean the baby had not been weighed. LBW classification was calculated using available numeric birthweight data from all three sources.
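The two ways of handling "don't know" answers, and the rule for register entries, can be written down in a few lines. This is a sketch assuming hypothetical response codes `"yes"`, `"no"` and `"dont_know"`; the survey instrument's actual coding is not given here.

```python
def survey_coverage(responses, exclude_dont_know=False):
    """Survey-reported weighing coverage (%) from 'yes' / 'no' / 'dont_know' answers.

    Default: DHS/MICS-style, only 'yes' counts and 'dont_know' is pooled with 'no'.
    exclude_dont_know=True drops 'dont_know' answers from the denominator instead.
    """
    if exclude_dont_know:
        responses = [r for r in responses if r != "dont_know"]
    return 100.0 * sum(r == "yes" for r in responses) / len(responses)


def register_weighed(entry):
    """Register rule: a 'not recorded' entry (None or blank) is read as 'not weighed'."""
    return entry is not None and str(entry).strip() != ""


answers = ["yes"] * 90 + ["no"] * 5 + ["dont_know"] * 5   # hypothetical responses
print(survey_coverage(answers))                           # 90.0
print(round(survey_coverage(answers, True), 1))           # 94.7
```

Excluding "don't know" shrinks the denominator, so it can only raise (or leave unchanged) the estimated coverage.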
To understand how coverage measurement affected low and normal birthweight categorisation, we calculated "validity ratios". Similar to verification ratios in the WHO Data Quality Review (DQR) methods [30], a ratio higher than 1.0 implies that survey-reported or register-recorded coverage overestimates observed coverage, and a ratio lower than 1.0 implies an underestimate. Cut-off ranges adapted from DQR methods were used for heat-maps [30].
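As a worked illustration, a validity ratio is the reported rate divided by the observed rate, and the heat-map colour depends on how far that ratio lies from 1.0. The sketch assumes the 5%, 10%, 15% and 20% DQR cut-offs are applied symmetrically to |ratio - 1|; the band labels are hypothetical, and the pooled abstract rates are used only to show the arithmetic (the study computed ratios per hospital).

```python
def validity_ratio(reported_pct, observed_pct):
    """Reported (survey or register) rate divided by the observed rate.
    > 1.0 means overestimation; < 1.0 means underestimation."""
    return reported_pct / observed_pct


def deviation_band(ratio, cutoffs=(0.05, 0.10, 0.15, 0.20)):
    """Heat-map band for |ratio - 1| using 5/10/15/20% cut-offs (symmetric bands assumed)."""
    deviation = abs(ratio - 1.0)
    lower = 0.0
    for cut in cutoffs:
        if deviation < cut:
            return f"{lower:.0%}-{cut:.0%}"
        lower = cut
    return f">={cutoffs[-1]:.0%}"


# Pooled LBW rates from the abstract, used purely as an illustration
print(deviation_band(validity_ratio(14.3, 15.6)))  # 5%-10% (survey-reported)
print(deviation_band(validity_ratio(14.9, 15.6)))  # 0%-5% (register-recorded)
```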
Individual-level validity was assessed with "diagnostic test" metrics calculated from two-by-two tables. When column totals were ≥ 10, we calculated sensitivity, specificity, negative predictive value, positive predictive value, area under the curve and inflation factor; otherwise we present percent agreement [23,32]. Individual-level agreement was assessed using Bland-Altman plots [33].
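The metrics above all derive from the four cells of a two-by-two table (report versus observation), while Bland-Altman analysis summarises paired numeric weights. A minimal sketch follows; treating the table columns as observed status, defining the inflation factor as reported coverage divided by observed coverage, and the example counts are assumptions for illustration.

```python
import statistics


def _ratio(num, den):
    """Division that returns NaN instead of raising when the denominator is 0."""
    return num / den if den else float("nan")


def diagnostic_metrics(tp, fp, fn, tn, min_column_total=10):
    """Validity of a binary report (survey or register) against observation.

    Columns are taken to be the observed (gold standard) status: tp + fn observed
    positives and fp + tn observed negatives. If either column total is below
    min_column_total, only percent agreement (as a proportion) is returned.
    """
    n = tp + fp + fn + tn
    agreement = (tp + tn) / n
    if tp + fn < min_column_total or fp + tn < min_column_total:
        return {"percent_agreement": agreement}
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        # A binary test gives a single ROC point, so AUC reduces to (sens + spec) / 2
        "auc": (sens + spec) / 2,
        # Assumed definition: reported coverage / observed coverage
        "inflation_factor": (tp + fp) / (tp + fn),
        "percent_agreement": agreement,
    }


def bland_altman(reported, observed):
    """Bias (mean difference) and 95% limits of agreement for paired weights in grammes."""
    diffs = [r - o for r, o in zip(reported, observed)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # requires at least two pairs
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


metrics = diagnostic_metrics(tp=90, fp=2, fn=8, tn=100)  # hypothetical counts
bias, lower_loa, upper_loa = bland_altman([3000, 2500, 2800], [3010, 2480, 2800])
```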

Objective 2: Gaps in coverage and quality of birthweight measurement
We calculated gap analyses for high-quality birthweight measurement among (A) total births as the total eligible population; (B) birthweight coverage; (C) timely measurement, weighed ≤ 1 h after birth; (D) right device, digital scales.
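The gap analysis can be read as a cascade in which each step narrows the previous one. The sketch below assumes the steps are nested (weighed, then weighed within 1 h, then weighed within 1 h on a digital scale) and uses hypothetical per-birth record keys; it is an illustration, not the study's analysis code.

```python
def gap_cascade(births):
    """Cumulative cascade (A)-(D) of high-quality birthweight measurement.

    Each birth is a dict with hypothetical keys: 'weighed' (bool),
    'minutes_to_weigh' (number or None) and 'scale' ('digital', 'analogue' or None).
    Each step is assumed to be nested within the previous one.
    Returns (label, count, percent of total births) for every step.
    """
    total = len(births)
    weighed = [b for b in births if b["weighed"]]
    timely = [b for b in weighed
              if b["minutes_to_weigh"] is not None and b["minutes_to_weigh"] <= 60]
    digital = [b for b in timely if b["scale"] == "digital"]
    steps = [("A: total births", total), ("B: weighed", len(weighed)),
             ("C: weighed <= 1 h", len(timely)), ("D: digital scale", len(digital))]
    return [(label, n, 100.0 * n / total) for label, n in steps]


example = [
    {"weighed": True, "minutes_to_weigh": 20, "scale": "digital"},
    {"weighed": True, "minutes_to_weigh": 35, "scale": "analogue"},
    {"weighed": True, "minutes_to_weigh": 90, "scale": "digital"},
    {"weighed": False, "minutes_to_weigh": None, "scale": None},
]
for label, n, pct in gap_cascade(example):
    print(f"{label}: {n} ({pct:.0f}%)")
```

The gap at each step is the drop from the previous bar; here the largest single loss would be in the final "digital scale" step, mirroring the pattern described in the Results.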
Fig. 1 Birthweight validation design, EN-BIRTH study. Adapted from EN-BIRTH protocol [23]
Data completeness for registers was assessed. Birthweight heaping and rounding were evaluated for observed, survey-reported and register-recorded data in two ways: first, the proportion of total birthweights that were multiples of 500 g; second, the proportion of heaped weight values (e.g. 2500 g) relative to all weight values within the adjacent 500 g bracket (e.g. 2250-2750 g). We stratified by type of weighing device and by time of birth according to midwifery shift (day/night). To demonstrate the effect of heaping on LBW prevalence in routine register documentation, we adjusted LBW prevalence by reallocating 25% of babies with an exact birthweight of 2500 g to the LBW category and compared this with exit-survey findings adjusted using the same method [34].
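The two heaping measures and the 25% reallocation adjustment described above can be sketched as follows. The half-open bracket [2250, 2750) around 2500 g is an assumption (the text gives "2250-2750 g" without specifying end-points), and the example weights are hypothetical.

```python
def share_multiple_of_500(weights):
    """Percent of birthweights (g) that are exact multiples of 500 g."""
    return 100.0 * sum(w % 500 == 0 for w in weights) / len(weights)


def heaping_index(weights, heap_value, half_width=250):
    """Percent of weights in [heap_value - 250, heap_value + 250) equal to heap_value.
    Bracket end-points are an assumption of this sketch."""
    bracket = [w for w in weights
               if heap_value - half_width <= w < heap_value + half_width]
    if not bracket:
        return float("nan")
    return 100.0 * sum(w == heap_value for w in bracket) / len(bracket)


def lbw_adjusted_for_heaping(weights, share=0.25, cutoff_g=2500):
    """LBW prevalence (%) after reallocating a share of exact-2500 g weights to LBW."""
    lbw = sum(w < cutoff_g for w in weights)
    at_cutoff = sum(w == cutoff_g for w in weights)
    return 100.0 * (lbw + share * at_cutoff) / len(weights)


weights = [2500, 2500, 2500, 2500, 2480, 2600, 3000, 3120]  # hypothetical, in grammes
print(share_multiple_of_500(weights))     # 62.5
print(lbw_adjusted_for_heaping(weights))  # 25.0
```

In this toy example, unadjusted LBW prevalence is 12.5% (one baby at 2480 g); moving a quarter of the four 2500 g babies into the LBW group doubles it to 25.0%, showing how sensitive the LBW rate is to heaping at the cut-off.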

Objective 3: Barriers and enablers to routine register recording
We evaluated barriers and enablers to recording of birthweight in routine registers as part of the wider barriers and enablers objective of the EN-BIRTH study. The structure of the routine labour ward register was correlated with completeness and accuracy of measurement [31].
We designed three tools: a) a semi-structured in-depth interview (IDI) guide, b) a semi-structured focus group discussion (FGD) guide and c) a "care-to-documentation checklist." Experienced qualitative researchers conducted IDIs with two purposively sampled groups of respondents in each EN-BIRTH study hospital: 1) hospital midwives and doctors involved in birthweight measurement and 2) study data collectors. To triangulate results, FGDs were carried out with health workers. The sample size was determined by saturation. Qualitative data were thematically analysed by categorizing pre-identified codes based on the Performance of Routine Information System Management (PRISM) conceptual framework [35], using NVivo 12 for data management. The care-to-documentation checklist was completed after the IDI and captured details regarding: which health worker cadre weighs the baby; who documents the birthweight; into which documents (patient notes, registers, partograph, etc.); the typical order of documentation; and an estimate of the time between weighing the baby and documentation. Data were entered into Microsoft Excel and analysed in R version 3.6.1 [28]. This paper specifically presents emerging themes regarding birthweight recording across three topics: 1) register design, 2) register filling and 3) register use. Detailed methods and results of all emerging themes for register recording of all EN-BIRTH selected indicators are available in an associated paper [36].

Results
Among the total 23,471 births observed, 22,617 (96.3%) newborns were weighed after birth and implausible weights were 0.01% (Additional file 4). Exit-survey interviews were completed by 88.4% of their mothers and register data were extracted for 95.3% (Fig. 2).
Background characteristics are shown in Table 1. Overall, 12.1% of mothers were adolescents (< 20 years) and almost half (48.4%) had completed secondary education. Live births accounted for 97.3% of births and twins/triplets for 3.9%. The proportion of babies delivered by caesarean section varied widely, from 7.2% in Temeke TZ to 73.2% in Azimpur BD. Hospital register design in Bangladesh was updated during the study as part of a national standardisation; we present revised register results in the multi-site tables and figures and report the effect of this natural experiment in Additional file 6.
Inter-rater reliability was very high for both observation and data extraction (Additional file 7). Comparison of routine register completeness before and during the study showed a decrease in completeness of < 1.5%, except in Kushtia BD, where completeness increased from 66.1% to 85.2% (Additional file 8).
Coverage data by observation, survey report and register record are shown in Fig. 3. Coverage comparisons and individual-level metrics are shown in Tables 2 and 3. Associations with delivery mode, multiple births and stillbirth are shown in Additional files 9, 10 and 11.
Register-recorded coverage of 96.6% (93.2-98.9%) underestimated the observed coverage of 98.8%. Heterogeneity was low, τ² = 0.03 (Additional file 12). In Temeke TZ, coverage was overestimated by 0.1%, and in the other four hospitals it was underestimated by 0.3-12.1%. Sensitivity was > 88% and specificity ranged from 3.5% in Muhimbili TZ to 82.0% in Kushtia BD. Register-recorded coverage was significantly higher among babies born by vaginal delivery than by caesarean section, and among live births than stillbirths (Additional files 10 and 11).

Birthweight heaping and rounding
Observer-assessed birthweight heaping was two-fold lower with digital (15.7%) than with analogue scales (36%). Survey-report further increased heaping (digital 25.3%, analogue 43.4%). Register-record increased heaping by only 1.4% for digital scales and 1.1% for analogue scales (Table 4). Heaping indices were consistently lower for digital than analogue scales across all 500 g increments (Table 5), and higher during night than day shifts (Table 4). Reallocation of 25% of 2500 g birthweights to the LBW category increased LBW prevalence by 2.0% for register-record and 2.5% for survey-report (Additional file 15).

Objective 3: Barriers and enablers to routine recording
All study hospital labour ward registers had a specific column to record birthweight, which was usually recorded in kilogrammes to one decimal place, despite the Bangladesh revised register column heading specifying the unit in grammes (Fig. 6). IDIs were conducted with 40 nurse-midwives/doctors and 65 EN-BIRTH study data collectors, and one FGD was conducted in each hospital (n = 5). Emerging themes functioning as either barriers or enablers in the five hospitals are shown in Fig. 7.

Register design
All respondents stated that the labour ward register was adequately designed for recording birthweight. Respondents described the complexity of documentation systems as a barrier, since birthweight is also written in several other formal and informal documents. In Bangladesh, birthweight was documented first in the register, while in Nepal and Tanzania it was recorded in one to three other documents before the register (Additional file 16).

Register filling
All respondents stated that recording birthweight in labour ward registers is standard practice. Birthweight is usually written down by the same nurse-midwife who weighed the newborn, but only after providing all other care around the time of birth for mother and baby. The estimated average time from weighing the newborn to documenting the birthweight in the register ranged from 4 to 31 min, with maximum estimates of 1-3 h (Additional file 17). Shortage of time was a frequently mentioned barrier to high-quality register documentation. EN-BIRTH data collectors described seeing that, when busy, health workers may record the birthweight on a separate piece of paper, or ask the mother or another colleague to remember the weight, and transfer it into formal documents later. The baby may be weighed again if no one can later recall the birthweight.
Respondents mentioned an enabler available only during the day shift: additional staff, such as students, who could assist.
"Most of the time documentation was done appropriately because there were students who could offer assistance during the day. But it was very difficult during night shift because the midwife should do everything by herself like getting the birthweight, resuscitation … when it comes to recording she will find that she has forgotten most of the information." -Health worker, Muhimbili TZ
EN-BIRTH study clinical observers described a further barrier: health workers did not trust the precision of the weighing scales, and sometimes used their personal judgement and rounded birthweights: "If [the analogue scale] shows 4 kilo 300 grammes, they assume it [is] 4 kilo, 500 grammes." -Data collector, Azimpur BD

Register use
Health workers acknowledged the importance of birthweight data and described using it for clinical care only: "Information recording is critical and exact [numbers] should be recorded … we take special care on managing babies with low birthweight, high birthweight … [which] can require paediatrics consultation." -Health worker, Pokhara NP
No respondent mentioned use of birthweight data higher up the health system. Low trust in the quality of birthweight data was expressed as a barrier to use: "Some nurses do not record the details after they have helped a mother to deliver … if [documents are] not fully filled so people start to estimate, so this leads to non-accurate data about the weight of a child … we sometimes fill not actual data." -Health worker, Temeke TZ

Discussion
Birthweight measurement in our five CEmONC study hospitals was almost universal, and routine facility registers measured coverage of weighing at birth and LBW classification more accurately than exit interview surveys. These findings align with our qualitative study in one EN-BIRTH hospital, Temeke TZ, which reported that birthweight is highly valued by both health workers and mothers [25]. The high completeness and accuracy of routine registers for birthweight across all five hospitals was especially notable. Importantly, register records for LBW babies had both sensitivity and specificity > 90%, higher than reported by a study from Nigeria (sensitivity 62%, specificity 85%) [20]. Birthweight coverage for babies of any birthweight (LBW and not LBW) similarly had a high overall sensitivity of 97.1%; however, specificity was very low (4-15%) in three hospitals. We postulate this might be due to the baby being weighed and documented in the register after observation had ceased (higher false positives). The exception was Kushtia BD's higher specificity of 82.0%, which may relate to its lower overall register completeness (85.2%) (higher true negatives). The better register performance for LBW babies than for babies of all birthweights may reflect the extra care health workers give to more vulnerable babies, for example, weighing them more quickly after birth, so that weighing was captured by the EN-BIRTH observers.
Fig. 4 Validity ratios for survey-reported and register-recorded low/normal birthweight prevalence compared to observation, EN-BIRTH study. Heat-mapped using WHO's Data Quality Review (DQR) 5%, 10%, 15% and 20% cutoffs [30]
Survey-reported birthweight at the point of hospital discharge soon after birth was also accurate compared to observation. Our results align with a systematic review of 40 studies that showed high agreement between survey-recalled birthweights and register-recorded birthweights as the standard [37]. For weighing coverage, survey report compared to observation had high sensitivity but lower specificity. As with registers, this could reflect mothers correctly reporting weighing that took place after observation stopped. Survey report for LBW babies again outperformed that for other babies, likely for the same reason of extra care given to LBW babies. This is in contrast to previous studies that revealed mixed but generally low accuracy for LBW prevalence, with sensitivity ranging from 45% in a study conducted in Nepal to 71% in Kenya [16,18,19,38]. These validation studies evaluated survey report at recall periods ranging from soon after birth to 22 months later in a household survey.
Quality of birthweight measurement was mixed. Whilst liveborn babies had timely birthweights, we found quality gaps in other dimensions, especially the widely recognized heaping on multiples of 500 g [5,29,34]. The EN-BIRTH study design permitted exploration of cumulative heaping at different measurement capture points: the birthweight observation, exit interview and register record. We found heaping increased slightly between observation and register record, even though the same health worker usually weighs and documents. Notably, heaping doubled when the same data were captured from women's report at exit interview. Obtaining a precise birthweight for all babies is fundamental. For instance, a baby with a true birthweight of 2480 g that is rounded to 2500 g would not be identified as LBW and could fail to receive appropriate care. The same logic applies to identifying newborns weighing 2000 g or less, for whom kangaroo mother care is recommended.
The stillbirth birthweight gap was a striking finding in all hospitals except Pokhara NP. If gestational age is uncertain, the definition of stillbirth uses birthweight, which is vital for the minimum dataset for perinatal death surveillance and response to reduce preventable death [39]. As such, we suggest tracking coverage of stillbirth birthweight has potential as an indicator of respectful maternal and newborn care. More in-depth analyses of stillbirths in the EN-BIRTH study are reported separately [40].
Digital scale measurement gave lower heaping indices across all weights compared to analogue scales in our study. A 1980s Canadian study postulated that digit bias was attributable to the use of analogue scales; however, a British study later found that significant rounding and truncation persisted even with digital scales [41,42]. Few published studies have explored the relationship between type of scale and LBW estimates. We found less heaping at 2500 g using digital scales, implying more babies would have been correctly classified as LBW. One previous study in India also found that the percentage of LBW babies identified by digital scales (29.5%) was higher than by analogue scales (23%) [43].
In our study, two of the five CEmONC hospitals used digital scales rarely or not at all, despite the relatively low cost of these devices. This high use of analogue scales remains a concern because heaping and rounding may be attributed to the instrument's imprecision and/or the health workers' resulting lack of confidence in the measurement. Increasing the availability of digital scales at hospitals is important; however, some nurses stated their preference for analogue scales because they were more familiar with these devices [44]. Thus, beyond providing digital scales, training and supportive supervision are required to improve the quality of birthweight measurement. Our findings provide additional support to inform health system decisions to invest in digital scales for all facilities providing care at birth and to improve the accuracy of birthweight, especially LBW, measurement.
High-quality care must be consistently provided during both day and night shifts. Our qualitative finding that fewer health workers are available, and under greater time pressure, during night shifts helps explain the poorer quality of birthweight measurement at night. We suggest that available hospital birthweight data, stratified by day/night time of birth, could be explored as a tracer indicator for measuring quality of care. Additionally, these data can be used to assess staffing needs across all shifts, so that midwives have sufficient support to complete care and documentation tasks in a timely manner.
Table 4 Heaping index of observer-assessed, survey-reported, and register-recorded birthweights stratified by time of birth, EN-BIRTH study
Table 5 Heaping index of observer-assessed, survey-reported, and register-recorded birthweights, EN-BIRTH study
We identified opportunities to improve the quality of birthweight register data. In Bangladesh, although original and revised register designs both included birthweight, register-recorded completeness improved substantially after introduction of the revised register design. The improvement was seen in both hospitals in Bangladesh; however, it was smaller in Kushtia BD, illustrating that design alone is not sufficient. In Azimpur BD, health workers continued to record birthweight in kilogrammes to one decimal place, despite the revised register instructions to record in grammes. Stock-outs of the revised register in Kushtia BD meant that original registers had to be used again during data collection. Improving feedback loops between health workers and those at other levels of the health system using facility birthweight data is critical. Feedback could increase understanding of how birthweight data are used, why accurate measurement is essential and how to address the opportunities to improve quality of birthweight measurement in LMIC settings.

Strengths and limitations
A major strength of this study was the multi-site, multi-country design using direct observation as the gold standard against which register records and survey report were compared. The large sample size of > 23,000 facility births enabled diagnostic validation testing with stratification by normal and low birthweight and by mode of birth. The reliability of our observational gold standard was assessed by duplicate observation, and the effect of researchers' presence on register completeness was assessed by comparison with pre-study data extraction. Another strength is our inclusion of stillbirths, lending insight into an important public health issue, as often only live births are included when calculating birthweight indicators [44,45]. Although the change in the Bangladesh registers midway through the study was unexpected, it provided the opportunity to examine the results of a "natural experiment." However, our study also had limitations. We did not observe whether scales were calibrated before weighing, which could contribute to heaping. The clinical observers read the scale at the same time as the health worker and thus could also have contributed to the observed heaping. The data collection tablet app recorded birthweight only in grammes, while health workers recorded in registers either kilogrammes or grammes. This may have introduced information bias affecting the accuracy and reliability of birthweight data, and was a missed opportunity to compare any effect of the unit of measurement on birthweight data quality.
Fig. 7 Barriers and enablers to routine register recording of birthweight, EN-BIRTH study. This figure illustrates the overall barriers and enablers to facility-based data collection identified by EN-BIRTH participants. Bold text indicates the issues specific to birthweight. The transition from red to green is a reminder that most factors identified by participants could serve as either a barrier or an enabling factor depending on facility-level resources and management.
For the purposes of calculating the heaping indices, we assumed that all birthweights of interest were heaped, even though a proportion of them were truly multiples of 500 g. We could not apply a correction for multiplicity.
Our findings of highly accurate register-recorded birthweights in CEmONC hospitals may not be generalizable to facilities at other levels of the health system. Moreover, our study intentionally focused on facility delivery; while the global facility delivery rate is > 80%, in the EN-BIRTH study countries, it is only 37% in Bangladesh, 57% in Nepal and 63% in Tanzania [15,46]. The validity of birthweight measurement in population-based studies has been addressed in a parallel study [47].

Research gaps
Globally, there remains a large gap between facility births and availability of birthweight data in routine systems in both south Asia (19.6%) and sub-Saharan Africa (48.3%) [48]. Further research regarding data flow and quality of aggregated facility birthweight data from facilities at all levels of the health system is critical.
Implementation research is also needed to explore how hospital birthweight data quality can be improved: using standardized weighing technique training to reduce heaping, utilizing calibrated digital scales and streamlining documentation. Even when stillbirths were weighed, women were not able to accurately report that weighing had happened. More research is required to better understand how information is provided to women following a stillbirth, and whether women are routinely allowed to see their stillborn baby. Since EN-BIRTH only assessed women's report at hospital exit, follow-up studies are needed to determine whether exit survey-reported accuracy decays over time, considering household surveys are usually conducted every 2-5 years. Studies could be conducted to explore whether household survey estimates of LBW are improved if birthweight is recorded on health cards given to parents, which they can show at the time of the survey [49].

Conclusions
We found high individual-level validity for coverage of weighing at birth and LBW classification in both registers and surveys, with the former outperforming the latter. Our results provide evidence supporting the use of both these data sources to increase the availability of birthweight data in LMICs. Surveys will remain an important data source, especially in the most vulnerable populations, where deliveries mostly occur at home. Given the increase in facility births worldwide, birthweight data recorded in registers and incorporated into routine administrative systems can provide essential information for programs and policies. Currently, registers are an underused source of information. However, registers could offer a cost-efficient way to generate more frequent coverage measurements compared to intermittent population-based surveys. Register data completeness is already high. Closing data quality gaps for birthweight heaping will require standardised processes and ensuring facilities have sufficient staffing to carry out care and documentation in a timely manner. Only then will each and every newborn, even the smallest, sickest and most marginalized, be counted and weighed, and countries have better data to track how many survive and thrive.