A questionnaire to measure women’s experiences with pregnancy, birth and postnatal care: instrument development and assessment following a national survey in Norway

Background The Norwegian authorities monitor the quality of public health-care services, including from the patients’ perspective. The aim of this paper is to describe the development and psychometric properties of a pregnancy- and maternity-care patients’ experiences questionnaire (PreMaPEQ). Methods The PreMaPEQ and data collection procedures were developed based on a literature review, reference group activities, user interviews, cognitive interviews and a pilot test. The PreMaPEQ was then used in a national survey that included a retest distribution. The participants were identified from the hospital records where the birth took place. The invitation to take part was sent by mail and the questionnaire was distributed in electronic (i.e. via the Internet) and (subsequently) paper forms. The completed questionnaires were assessed using descriptive statistics, explorative factor analyses, psychometric measures and confirmatory factor analysis (CFA). Results The PreMaPEQ response rate for the national sample was 56.6 % (N = 4904), and retest data were provided by 123 women. Statistics and theoretical considerations were used to construct 16 scales, covering the following 4 phases of the care: pregnancy control (4 scales), the delivery (3 scales), the postnatal hospital stay (5 scales) and the services in the public health clinic (4 scales). All scales had a Cronbach’s α of >0.7, and all but three scales had an intraclass correlation coefficient for test-retest stability of >0.700. CFA revealed a satisfactory fit between the questionnaire data and the model, with a four-factor solution of the care experiences with pregnancy, birth and postnatal care. CFA provided support for the suggested structures, and demonstrated that the first-order factors are indicators of a second-order factor. 
Conclusion The PreMaPEQ appears to be an acceptable, valid and reliable tool for collecting women’s experiences of the whole course of maternity care in health systems that have features in common with the Norwegian health system. Electronic supplementary material The online version of this article (doi:10.1186/s12884-015-0611-3) contains supplementary material, which is available to authorized users.


Background
Collection of patient-reported outcomes, including patient experiences, is an important aspect of evaluations of health services. According to an international review, several countries have programs for monitoring the quality of health care using surveys that inquire into the experiences of patients and other health-care users. These surveys call for descriptions of mainly non-technical aspects of the health-care services and may involve different target populations, such as the general population, broad groups of service users, or patients with specific conditions [1]. The users of the results also vary among the national programs, and include health authorities, health-care managers at different levels, health insurers, providers, potential service users, and researchers. Depending on the survey design, the results can be used to monitor health-system performance and/or inform quality improvement efforts at the level of service delivery.
In Norway, the responsibility to conduct surveys of those who use health services is assigned to the Norwegian Knowledge Centre for the Health Services (NOKC), which is a public organization that operates under the Norwegian Directorate of Health. NOKC has developed a variety of data collection tools and surveyed a range of target groups. The explicit purpose of these surveys is fourfold: social legitimacy and control, business control, professional quality improvement, and to inform choices made by patients. In 2009, the Ministry of Health and Care Services issued a white paper entitled "A happy event. About a comprehensive pregnancy, birth and postnatal care" [2], in which the Ministry commissioned a national user survey of women who had recently given birth and their partners. The whole course of the health-care event (i.e., from pregnancy to postnatal care) was to be included, with special attention paid to immigrant women.
The purpose of this paper was to describe the development and the psychometric properties of the pregnancy- and maternity-care patients' experiences questionnaire (PreMaPEQ).

Instrument development
Development of the PreMaPEQ followed an established procedure used previously by NOKC for developing data-collection tools and routines for conducting surveys in new target groups. The initial literature search in 2009 did not identify any instruments that met the specifications of this study [3], but did provide some information about relevant topics. Hence, a development process was commenced [4], the first step of which was to set up a reference group including service users, authorities and clinicians. The purpose of this group was to collect comments and views during the development process from various stakeholders representing important expertise. The group met three times and discussed questionnaire contents and inclusion criteria. The second step was to explore what is important for people in this situation by interviewing women and their partners with recent experiences with the services. The interviews were semi-structured, individual, and face to face, and the answers underwent conventional content analysis [5]. The interviewees varied with regard to age, parity and ethnicity, and the information they provided was highly consistent with findings from the literature search. The third step was to construct the questionnaire itself. Four sections were constructed, each with a specific colour to reflect the different phases of the healthcare course; that is, pregnancy control with green headings, birth with red headings, postnatal hospital stay with orange headings, and finally follow-up in the community health clinic (helsestasjon in Norwegian) with violet headings. The items (i.e. questions) in the questionnaire asked whether desirable properties or behaviours were present, with the intention of collecting a description that was concrete and factual rather than being judgmental [6]. 
Among the many possible response formats [7], most of the questions were answered on the following five-point ordered response scale: 1 = not at all, 2 = to a small extent, 3 = to some extent, 4 = to a large extent and 5 = to a very large extent. The alternative answer of "not applicable" (NA) was also allowed where it was important to discriminate between answers omitted by the respondent and items that did not apply because the respondent had not used the service in question. This five-point scale performed better than a ten-point scale in a previous study, and was considered more suitable for assessing patient experiences [8]. Consequently, the five-point scale has been applied consistently in NOKC's surveys, making it possible to compare results over time and, to some extent, between different groups of health-care users. In the fourth step, the questionnaire was administered to 18 women, who were then interviewed to evaluate whether its structure, questions and response formats were performing as intended [9]. The fifth step was a pilot test conducted in a university hospital, in which all women aged 16 years or older who gave birth over a 2-month period were included (births in which death occurred were excluded). The pilot was conducted as a small version of the large survey to come, to provide an opportunity to detect flaws and weaknesses in the study in time to correct them [10]. This pilot test also provided the opportunity to test and compare the efficiency and cost of two different data-collection routines, for which reason the sample was randomly divided into two groups. The sixth step involved revising the questionnaire in accordance with the findings of the pilot test, and then administering the final version to 13 women and interviewing them to evaluate it [11]. Technical revisions were made to capture the various combinations of services the women had used, so as to cover the diversity in the best possible way.
The last question asked whether the respondent would be willing to answer a new questionnaire after a short interval so that the stability of the results could be tested (i.e. test-retest).
The printed version of the questionnaire consisted of 145 items on 16 pages. The contents were generic, so that unusual but still relatively frequent events were left out, such as unplanned births at home or during transport. The women may have used many different combinations of services, both public and private, and not all variations could be captured by the questionnaire.
The final questionnaire was translated into English to facilitate responses from immigrants. The translations were carried out according to recommended procedures [12]. We had no knowledge of the linguistic abilities of the potential respondents, so the invitation was sent to all otherwise eligible women. See Additional file 1 for the printed version in English.
The same procedure was used to construct a questionnaire targeting the women's partners; however, that questionnaire was not assessed in this paper.

Setting
The focus of this nationwide survey was the entire care course from the first visit in prenatal services, through birth and postnatal care in birth institutions, and finally to the follow-up in community health clinics. The financing and delivery of prenatal monitoring and postnatal follow-up in health clinics is the responsibility of the municipality of residence of the individual women, while the birth institution (be it a large hospital or a local maternity clinic) is the responsibility of the state, via the hospital trusts.

Sampling
Women who gave birth in the last quarter of 2011 in a Norwegian institution and who were 16 years old or older were included. Experiences from previous surveys led to the requirement of samples with 400 potential respondents from each hospital. The women to be included were drawn randomly from institutions with a large number of births (more than 400 during the inclusion period), and women were included consecutively from institutions with fewer than 400 births. The Medical Birth Registry conducted the sampling routine. Before any list of names and addresses was transferred to the Knowledge Centre, information from the National Population Registry Office was collected and any birth in which either the woman or child had died was excluded.
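The per-institution inclusion rule described above can be sketched as follows. This is a minimal illustration only: the function name, the list of birth identifiers and the use of simple random sampling without replacement are assumptions, and the text does not say how an institution with exactly 400 births was handled (here all of its births are included).

```python
import random


def draw_sample(births, target=400, seed=None):
    """Return the births to include from one institution (hypothetical helper).

    births: list of birth identifiers in chronological order.
    Institutions with more than `target` births contribute a simple random
    sample of size `target` (assumed sampling method); smaller institutions
    contribute all of their births consecutively.
    """
    if len(births) > target:
        rng = random.Random(seed)  # seeded generator for reproducibility
        return rng.sample(births, target)
    return list(births)


# Example with made-up identifiers
large_institution = list(range(1000))
small_institution = list(range(50))
print(len(draw_sample(large_institution, seed=1)))
print(len(draw_sample(small_institution)))
```

Exclusions based on registry information (deaths of the woman or child) would be applied before such a list is handed over, and are not modelled here.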

Data collection
Potential respondents to the questionnaire were first contacted by mail about 17 weeks after the birth: they were sent a letter with information about the survey and an invitation to participate via the Internet, including a specific username and password. Two reminders were sent to non-respondents, both of which included a printed version of the questionnaire in addition to a username and password.
The names and addresses were deleted when all of the mailings were completed, and the questionnaire data were supplemented by clinical information from the Medical Birth Registry. Statistics Norway provided data regarding the country of origin for the women included in the survey and their parents.
Informed consent was considered expressed when the women had received the mailed information and submitted their response. The Regional Committee for Medical and Health Research Ethics (REK sør-øst D) approved the study.

Data analyses
SPSS software (version 15.0, SPSS, Chicago, IL, USA) was used to analyse the sample and variables, and for exploratory factor analysis (EFA).
Questionnaire items regarding experiences were entered into principal axis factoring analyses according to phase; that is, items pertaining to pregnancy control, birth, postnatal hospital stay and the public health clinic. EFA results were interpreted as supportive of a factor if loadings exceeded 0.40 and there were no cross-loadings. Some correlations between the factors were assumed, and the oblique rotation method of analysis was chosen [13].
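The retention rule for items can be illustrated with a short sketch. It assumes that a cross-loading means more than one loading above the same 0.40 limit and that absolute values are compared; neither detail is specified above, and the function name and example loadings are hypothetical.

```python
def screen_items(loadings, threshold=0.40):
    """Assign each item to a factor, or to None if it fails the rule.

    loadings: dict mapping item name -> list of loadings, one per factor.
    An item is kept only if exactly one absolute loading exceeds
    `threshold` (assumed definition of "no cross-loadings").
    """
    assignment = {}
    for item, row in loadings.items():
        salient = [i for i, value in enumerate(row) if abs(value) > threshold]
        assignment[item] = salient[0] if len(salient) == 1 else None
    return assignment


# Made-up pattern-matrix values for three items and two factors
example = {
    "q1": [0.72, 0.10],  # loads on the first factor only
    "q2": [0.45, 0.48],  # cross-loads on both factors
    "q3": [0.30, 0.20],  # no loading above the limit
}
print(screen_items(example))
```

Items mapped to None in such a screen correspond to candidates for reporting as single items.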
In cases where the factors resulting from EFA included diverse phenomena, the factors were split and the items grouped according to the structure and process categories in Donabedian's framework [14] or to ensure that the survey results reflected recognizable elements of care delivery. Examples include separating structure (resources and organization) from process (personal relationships), and separating information about the women's health from information about the child. We believe that this will render the survey results more useful for local improvement purposes. Items that were not included in a factor, due to poor factor loadings or a high missing rate, were reported as single items. The final outline of the questionnaire is shown in Table 1.
Most single items were scored on a scale of 1-5, and the index scores were transformed linearly to a scale of 0-100.
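As an illustration of the linear transformation, a score of 1 maps to 0, 3 maps to 50 and 5 maps to 100. The sketch below assumes that an index score is the mean of the answered items and that omitted or NA answers (represented here as None) are ignored; the text does not state either rule, and the function names are hypothetical.

```python
def to_0_100(score, low=1, high=5):
    """Linearly rescale a score from the low-high range to 0-100."""
    if score is None:
        return None
    return (score - low) / (high - low) * 100


def index_score(item_scores):
    """Mean of the answered items, rescaled to 0-100 (assumed scoring rule).

    item_scores: list of 1-5 answers, with None for omitted or NA answers.
    """
    answered = [s for s in item_scores if s is not None]
    if not answered:
        return None
    return to_0_100(sum(answered) / len(answered))


print(to_0_100(3))
print(index_score([4, None, 5]))
```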
An impression of potential differences between respondents and non-respondents was obtained by comparing those who responded to the first contact with those who responded after two reminders. In wave analyses such as this, the latter, who are the most difficult to reach, are considered proxies for non-respondents [15].
The LISREL analysis program (version 8.8) was used to test the goodness of fit of the models [16] by confirmatory factor analysis (CFA), and was applied to further test the relationships between the manifest variables and their underlying latent constructs. The 16 scales were constructed on the basis of a combination of the theoretical structure and process categories in Donabedian's framework, our EFA results, and to some extent the specific content of the items. Therefore the scales were not only data-driven, but also founded on a theoretical understanding underpinning the analyses. Accordingly, the CFA did not test a model based solely on the correlation of the test items but whether the measures are consistent with our understanding of the nature of the construct. Consequently, the objective was to test whether the data fitted our hypothesized model based on theory and analytic research.
It was hypothesized that the instrument had a second-order factor structure, with experiences with pregnancy control, the birth, the postnatal hospital stay, and the public health clinic as the lower-order factors, and care experiences in pregnancy, birth and postnatal care as the higher-order factor. In CFA there are two types of latent variables: endogenous and exogenous. Exogenous variables always act as independent variables, whereas endogenous variables are influenced by other variables in the model.
Various fit indexes were used, including the root-mean-square error of approximation (RMSEA), goodness-of-fit index (GFI), comparative fit index (CFI) and incremental fit index (IFI). An RMSEA of ≤0.05 and GFI and CFI values of ≥0.90 are generally taken to indicate a good fit. IFI values range from 0 to 1, with larger values indicating a better goodness of fit.
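For readers unfamiliar with these indexes, the sketch below shows one common point estimate of the RMSEA and a check against the cut-offs quoted above. The formula with N - 1 in the denominator is a widely used convention and is an assumption here (some programs use N); the function names are hypothetical and the code is not a reproduction of the LISREL output.

```python
import math


def rmsea(chi_square, df, n):
    """Point estimate of RMSEA from a model chi-square, its degrees of
    freedom and the sample size n (assumed N - 1 convention)."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


def meets_cutoffs(rmsea_value, gfi, cfi):
    """Compare fit indexes with the conventional cut-offs quoted in the text."""
    return {
        "RMSEA": rmsea_value <= 0.05,
        "GFI": gfi >= 0.90,
        "CFI": cfi >= 0.90,
    }


# Made-up values: chi-square = 200 on 100 degrees of freedom, n = 101
print(rmsea(200.0, 100, 101))
print(meets_cutoffs(0.04, 0.95, 0.96))
```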

Results
Among the 8670 eligible women in the sample, 4904 returned completed questionnaires, giving a 56.6 % response rate. Every fifth woman in the total sample responded after the first contact, and responding via the Internet was their only option (see Table 2). Among those who received one or two reminders (i.e. those who could choose to fill in a paper questionnaire instead of answering via the Internet), the majority opted to answer on paper (23 %) rather than via the Internet (12 %). The median completion time was 20 min for the women responding via the Internet (interquartile range = 15-31 min).
As listed in Table 3, the respondent and non-respondent groups differed with regard to age, parity, and ethnicity.
EFA yielded six factors describing experiences with pregnancy care. One factor pertaining to cooperation, for example between the community midwife and the hospital, was disqualified by a large number of "Don't know" responses. Another factor regarding incorrect treatment and conflicting information was disqualified because it failed the internal consistency tests [7]. Hence, the pregnancy-care phase was described by four scales in the final instrument (Table 4).
Twelve items produced one factor covering diverse aspects of birth care, and we chose to split that factor. Furthermore, we decided that two items about how the partner was taken care of should form a single index. Ten items were regrouped based on Donabedian's framework to cover interpersonal relationships (three items), and structure in the form of organization, material and human resources (seven items). One factor containing three items pertaining to incorrect treatment and conflicting information during the birth was left out because it failed internal consistency tests.
The fourth column in Table 4 lists the prevalence of omitted or NA responses. The prevalence rates of such responses were highest for check-ups by a general practitioner and by a midwife, at just above 20 and 14 %, respectively. This reflects that not all of the women had their check-ups completed by both health-care personnel. The prevalence of missing or NA responses with regard to experiences in the delivery room was 10-11 %. We believe that this was attributable to unclear wording of a filter question. Incorrect answers to this filter question led to the items comprising this index being withheld from respondents who used the Internet option to complete the questionnaire. For the remaining items, the proportion of respondents who gave the NA response was highest for questions that would be largely irrelevant for all respondents other than first-time mothers.
The mean proportion of omitted answers was 2 % (range = 0.8-3.8 %) among the items included in the indexes [17]. There was no tendency for this proportion to increase from the start to the end of the questionnaire.
Comparison between those who responded to the first contact and those who responded to the second reminder revealed statistically significant differences on 7 of the 16 indexes; the scores for the responders to the first contact were the most positive on all but one index, with a mean difference of 1.5 points (range = 1.3-3.7).
The internal consistency of the constructed scales, as measured by Cronbach's α, varied from 0.727 to 0.926 (median = 0.870). The test-retest stability of the scales, as measured by the intraclass correlation coefficient, varied from 0.662 to 0.890 (median = 0.795) between the scales.
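For reference, Cronbach's α for a scale with k items is k/(k - 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The sketch below applies this standard formula to complete cases only; how omitted and NA answers were handled in the actual analysis is not described here, and the function name and example responses are made up.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for one scale.

    rows: list of respondents, each a list of k item scores with no
    missing values (complete cases, an assumption of this sketch).
    """
    k = len(rows[0])
    # Sample variance of each item across respondents
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the summed scale score
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Four hypothetical respondents answering a three-item scale
responses = [[1, 2, 3], [2, 3, 4], [4, 4, 5], [5, 5, 5]]
print(round(cronbach_alpha(responses), 3))
```

The function needs at least two items and two respondents, and a total score with non-zero variance.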
The measurement model for the endogenous latent variables stipulates the relationship between the endogenous latent variables (η) and the corresponding manifest variables (y). In this model the manifest variables included latent variables. The questionnaire was comprehensive, but it was not possible to include all of the observed variables or items. The measurement of experiences with pregnancy control, the delivery, the postnatal hospital stay, and the public health clinic comprised four, three, five and four manifest variables, respectively. The structural part of the model stipulates the relationship between the endogenous latent variables (η) and the exogenous latent variable (ξ).
The four-factor solution of the care experiences for pregnancy, birth and postnatal services in the PreMaPEQ was tested, and revealed that there was a satisfactory model fit to the data (χ² = 1125.98, p < 0.001, degrees of freedom = 95, RMSEA = 0.074, GFI = 0.93, CFI = 0.90 and IFI = 0.97). The results are shown in Fig. 1. The exogenous latent variable was labelled pregnancy and maternity care, and the four endogenous latent variables were labelled experiences with pregnancy control, the birth, the postnatal hospital stay, and the public health clinic, introducing a second-order analysis. Experiences with pregnancy control (γ = 0.77) and postnatal care (γ = 0.77) had the strongest relationships with the exogenous latent variable, but experiences with the delivery (γ = 0.73) and the public health clinic (γ = 0.47) were also strongly associated with the exogenous latent variable.

Discussion
This study developed and assessed the properties of a tool, the PreMaPEQ, for measuring user experiences with health care through the phases of pregnancy and childbirth. The instrument development procedure was designed to ensure good content validity, and the assessment indicated that the questionnaire has good reliability. The PreMaPEQ can be used either as a whole or in parts that are adapted to the service in question. The English version is ready to use, since several measures were taken in the translation process to ensure consistency with the Norwegian questionnaire. There are some limitations to this study. We would have preferred a higher response rate than 56.6 %. However, this compares well with the rate we have achieved in recent surveys among somatic patients in Norway [18]. The low percentage of omitted answers suggests good acceptability, and the lack of increase in omitted answers indicates that the length of the questionnaire does not tire the respondents. Few respondents used the NA response option when this was presented, which indicates that the topics of the questionnaire are relevant to a large majority of the population.
The procedures for assessing aspects of reliability produced coefficients that for the majority of the scales were above the recommended 0.700 limit [7].
It can be argued that a questionnaire with 145 questions is too long, producing a response burden that is too great. The length was a consequence of including all of the phases applicable to this specific field of care. It is a political ambition to produce services of the same quality for all Norwegian citizens. The use of a centralized national survey that is uniform across the different phases of care and geographical regions is more likely to yield data that are comparable.
The purpose of the literature review conducted before the national survey was to identify and describe relevant national surveys and validated instruments with a primary focus on user experiences and satisfaction with different parts of maternity care [3]. The review showed that national surveys and validated instruments vary in approach and methods, including in how long after the birth women are asked to complete the questionnaires (from ten days to 14 months) [19][20][21]. In the present survey, the women received the questionnaire about 17 weeks after the birth. This relatively long period was necessary since we also wanted their information about the postnatal contacts they had with the public health clinic. Although this long period may have caused some recall bias, studies have indicated that information about major life events, such as pregnancy and childbirth, is more easily retrieved than information about fluctuating phenomena [22].
We considered it important to include experiences from the public health clinics in the national survey, but acknowledge that the precision of reported data about past experiences will always be threatened by the limitations of the respondents' memory and the influence of exposure status on the recall process. As pointed out by Bat-Erdene and colleagues, maternally reported data about the events occurring during labour and delivery are widely used, but the validity of these data is rarely confirmed [23]. However, their study showed that maternal recall at four months post-partum of important events that occurred during labour and delivery is excellent.
The results of the EFAs and tests of internal consistency provided empirical support for the multi-item scales, and confirmed that experiences with the care received during pregnancy, birth and postnatal care are multidimensional concepts. The confirmatory factor analysis provided support for the structures suggested by the EFAs, and demonstrated that the first-order factors are indicators of a second-order factor.