We developed and tested the Farsi version of the tool in two stages. In the first stage, the tool was translated and piloted to assess its face validity, content validity and reliability. In the second stage, the construct validity of the tool was evaluated in a large sample.
Translation procedure
Fourteen items that CEQ 2.0 shares with the original CEQ had already been translated in earlier work by Professor Abbaspoor and colleagues (Ahvaz University of Medical Sciences, Iran), and these translations were used in CEQ 2.0. The remaining 9 items of CEQ 2.0 were translated from English into Farsi independently by two female professional translators who were native Farsi speakers and fluent in English. The research team reviewed and compared the two translations, resolved discrepancies between them, and merged them into a single Farsi version. This version was then back-translated into English by two translators, also native Farsi speakers fluent in English, who were not familiar with the CEQ questionnaire. The back-translation was very close to the original English CEQ. The translated Farsi version was reviewed by two experts, one experienced in questionnaire translation and one familiar with the concepts (Additional file 2). Finally, four women assessed the Farsi version for simplicity and clarity, and all four found the items of CEQ 2.0 simple and easy to understand.
Face validity
Face validity was assessed qualitatively based on the opinions of 10 experts in Midwifery, Reproductive Health, Obstetrics and Gynecology, Clinical Psychology, Nursing and Tool Development, who were asked to comment on the simplicity, clarity and relevance of the translated items. The items were then revised for appropriate and clear wording, grammar, and the importance of each item in the Iranian context. In a pilot test, 20 postpartum women completed the CEQ 2.0 and were asked to comment on how easy the items were to understand, how relevant they were, and whether any were ambiguous. Based on their comments, no further changes were necessary. Face validity was also measured quantitatively with the item impact method, based on the women's ratings. Each item was rated for importance on a 4-point Likert scale from 4 (very important) to 1 (not important at all), and its impact score was calculated as Impact Score = Frequency (%) × Importance, where Frequency is the proportion of respondents who rated the item 4 and Importance is the item's mean rating. Items with an impact score higher than 1.5 were considered valid [30].
Content validity
Content validity was assessed from expert opinions using the Content Validity Ratio (CVR) and the Content Validity Index (CVI). Each expert completed a two-part checklist. The first part, used to calculate the CVI, rated the clarity, simplicity and relevance of each item on a 4-point Likert scale. The second part, used to calculate the CVR, rated the necessity of each item on a 4-point Likert scale ranging from not useful to necessary. A CVR higher than 0.62 and a CVI higher than 0.79 were considered acceptable [31].
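The two indices can be computed per item as in the sketch below. It uses Lawshe's CVR formula and the usual item-level CVI, meaning the share of experts who give a rating of 3 or 4. Two assumptions are involved: only the top "necessary" category counts as essential for the CVR, and the CVI is computed on the relevance ratings. The function names and the ratings from a panel of 10 experts are hypothetical.

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe's CVR = (n_e - N/2) / (N/2).

    n_essential: experts rating the item "necessary" (assumed here to be
    the top category of the necessity scale); n_experts: panel size N.
    """
    half = n_experts / 2
    return (n_essential - half) / half


def item_cvi(relevance_ratings):
    """Item-level CVI: proportion of experts rating the item 3 or 4
    on a 4-point relevance scale."""
    return sum(1 for r in relevance_ratings if r >= 3) / len(relevance_ratings)


# Hypothetical panel of 10 experts judging one item
cvr = content_validity_ratio(n_essential=9, n_experts=10)
cvi = item_cvi([4, 4, 3, 4, 3, 4, 4, 2, 4, 4])
print(cvr, cvi)                    # 0.8 0.9
print(cvr > 0.62 and cvi > 0.79)   # True
```

The 0.62 cut-off matches Lawshe's minimum CVR for a panel of 10, which is the number of experts described for face validity.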
Reliability
Reliability was determined by internal consistency and test-retest reliability. Internal consistency was calculated with Cronbach's alpha coefficient, and an alpha higher than 0.7 was considered reliable [31]. Test-retest reliability was assessed by administering the questionnaire twice, two weeks apart, to 20 eligible women and calculating the intraclass correlation coefficient (ICC). An ICC between 0.6 and 0.8 was regarded as good, and an ICC above 0.8 as excellent [30].
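Both reliability coefficients can be sketched in plain Python. The text does not say which ICC form was used. This example therefore assumes the two-way random-effects, absolute-agreement, single-measure ICC(2,1) of Shrout and Fleiss, which is one common choice for test-retest designs. All function names and data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """scores: one row per respondent, one column per item."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def icc_2_1(pairs):
    """ICC(2,1) from two-way ANOVA mean squares (assumed ICC form).

    pairs: one (test, retest) tuple of total scores per participant.
    """
    n, k = len(pairs), len(pairs[0])
    grand = sum(sum(p) for p in pairs) / (n * k)
    row_means = [sum(p) / k for p in pairs]
    col_means = [sum(p[j] for p in pairs) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for p in pairs for x in p)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # occasions
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def interpret_icc(icc):
    """Cut-offs as stated in the text."""
    if icc > 0.8:
        return "excellent"
    if icc >= 0.6:
        return "good"
    return "below 0.6"


# Hypothetical test and retest totals for four women
icc = icc_2_1([(10, 11), (12, 12), (15, 14), (20, 21)])
print(round(icc, 3), interpret_icc(icc))
```

Other ICC forms, such as consistency rather than absolute agreement, give somewhat different values on the same data. The form should therefore be reported alongside the coefficient.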
Study participants
This study enrolled primiparous women aged 18 years or older with a cephalic presentation at a gestational age of 38–42 weeks who had a vaginal childbirth. Women were excluded if they had obstetric complications such as placenta previa or placental abruption, an elective or unplanned caesarean section, a mental disability, or were deaf-mute. Women were also excluded if they had a history of depression during pregnancy or postpartum depression, reported using antidepressants, or had a newborn with major congenital anomalies.
Ethical considerations
The study protocol was approved by the Ethics Committee of Tabriz University of Medical Sciences (code: IR.TBZMED.REC.1396.786). All participants gave written informed consent. Illiterate participants provided a fingerprint in place of a signature after the study information had been explained to them orally.
Recruitment and data collection
First, 44 of the 87 urban health centres and 10 of the 15 suburban (rural) health centres in Tabriz were selected. Next, women in each centre who had given birth vaginally 4 to 16 weeks earlier were identified as eligible, and a list of these mothers was prepared for each centre from its electronic medical records. The sample size for each centre was allocated in proportion to its size (proportional-to-size method), and participants were randomly selected from each list. The researcher contacted the selected mothers, explained the research objectives and the confidentiality of their information, and invited them to participate. In a 15–20-minute meeting with each participant, the researcher completed the socio-demographic and CEQ questionnaires. Obstetric information was extracted from the participants' medical records with their permission.
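The allocation and selection steps can be sketched as follows. The text does not describe its rounding rule, so this example assumes the largest-remainder method. It then draws a simple random sample without replacement within each centre. The centre names, list sizes and function names are hypothetical.

```python
import random


def proportional_allocation(centre_sizes, total_n):
    """Split total_n across centres in proportion to the number of eligible
    women in each, rounding with the largest-remainder method (assumed)."""
    population = sum(centre_sizes.values())
    raw = {c: total_n * size / population for c, size in centre_sizes.items()}
    alloc = {c: int(x) for c, x in raw.items()}          # floor of each share
    shortfall = total_n - sum(alloc.values())
    # Hand the leftover places to the centres with the largest fractional parts
    by_remainder = sorted(raw, key=lambda c: raw[c] - alloc[c], reverse=True)
    for c in by_remainder[:shortfall]:
        alloc[c] += 1
    return alloc


def select_participants(eligible_by_centre, alloc, seed=None):
    """Simple random sample without replacement within each centre."""
    rng = random.Random(seed)
    return {c: rng.sample(eligible_by_centre[c], alloc[c]) for c in alloc}


# Hypothetical eligible-mother lists built from electronic records
sizes = {"centre_A": 120, "centre_B": 45, "centre_C": 80}
eligible = {c: [f"{c}-mother-{i}" for i in range(n)] for c, n in sizes.items()}
alloc = proportional_allocation(sizes, total_n=50)
sample = select_participants(eligible, alloc, seed=2018)
print(alloc)
```

A fixed `seed` makes the draw reproducible, which is useful when the selection has to be audited later.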
Sample size
For purifying an assessment tool through factor analysis, Nunnally and Bernstein (1994) recommended a minimum sample size of 10 participants per item [32]. Accordingly, the initial sample size was estimated at 250. Because cluster sampling was used, a design effect of 2 was applied, which increased the sample size to 500.
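The arithmetic can be checked with a short sketch. CEQ 2.0 has 23 items (14 carried over plus 9 newly translated), so the 10-per-item rule gives a floor of 230. The 250 used as the base lies above that floor, and doubling it for a design effect of 2 gives 500. The helper names are hypothetical.

```python
import math


def min_sample_items(n_items, per_item=10):
    """Rule-of-thumb minimum: per_item respondents for each item."""
    return n_items * per_item


def apply_design_effect(n, deff):
    """Inflate a simple-random-sample size for cluster sampling."""
    return math.ceil(n * deff)


print(min_sample_items(23))         # 230
print(apply_design_effect(250, 2))  # 500
```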
Statistical analyses
Data were analysed using IBM SPSS Statistics for Windows, version 25.0 (IBM Corp., Armonk, NY, USA) and Stata version 15 (StataCorp, College Station, TX 77845, USA). Construct validity was assessed by (a) exploratory factor analysis, (b) confirmatory factor analysis, and (c) discriminant validity, evaluated with the known-groups method.
Exploratory factor analysis
Exploratory factor analysis (EFA) was performed separately for each scale. The suitability of each scale's data for EFA was assessed with the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy and Bartlett's test of sphericity. A KMO value higher than 0.7 together with a significant Bartlett's test confirms that the data are adequate for factor analysis [33]. Eigenvalues and the scree plot were used to determine how many factors to retain. In the second stage, the factors were rotated to make the factor structure simple and interpretable. To obtain this simple structure, factors were extracted by principal axis factoring (PAF) and rotated with oblimin rotation (delta = 0, with Kaiser normalization). Correlated items were thereby summarized into new variables called factors, and each factor was named according to the items it contained. An item whose communality from principal axis factoring was lower than 0.3 was considered poorly correlated with the extracted set of factors and could be removed [34].
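The two adequacy checks can be sketched without external libraries. The overall KMO compares squared correlations with squared partial (anti-image) correlations taken from the inverse correlation matrix. Bartlett's statistic is −(n − 1 − (2p + 5)/6) · ln|R| with p(p − 1)/2 degrees of freedom. The sketch stops at the chi-square statistic; a p-value could then be obtained from a chi-square distribution, for example with `scipy.stats.chi2.sf`. The helper names and the small data set are hypothetical. In practice these values come from the SPSS or Stata output.

```python
import math


def correlation_matrix(data):
    """Pearson correlation matrix; data has one row per respondent."""
    n, p = len(data), len(data[0])
    cols = [[row[j] for row in data] for j in range(p)]
    means = [sum(c) / n for c in cols]
    dev = [[x - m for x in c] for c, m in zip(cols, means)]
    norm = [math.sqrt(sum(d * d for d in dv)) for dv in dev]
    return [[sum(a * b for a, b in zip(dev[i], dev[j])) / (norm[i] * norm[j])
             for j in range(p)] for i in range(p)]


def inverse_and_det(m):
    """Inverse and determinant via Gauss-Jordan elimination with pivoting."""
    p = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(p)]
         for i, row in enumerate(m)]
    det = 1.0
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))
        if piv != col:
            a[col], a[piv] = a[piv], a[col]
            det = -det                      # a row swap flips the sign
        pv = a[col][col]
        det *= pv
        a[col] = [x / pv for x in a[col]]
        for r in range(p):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[p:] for row in a], det


def kmo(r):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    inv, _ = inverse_and_det(r)
    p = len(r)
    sum_r2 = sum_q2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                # partial (anti-image) correlation of items i and j
                q = -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
                sum_r2 += r[i][j] ** 2
                sum_q2 += q ** 2
    return sum_r2 / (sum_r2 + sum_q2)


def bartlett_statistic(r, n):
    """Bartlett's test of sphericity: chi-square statistic and df."""
    p = len(r)
    _, det = inverse_and_det(r)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * math.log(det)
    return chi2, p * (p - 1) // 2


# Hypothetical scores: 6 respondents x 3 items
data = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 3, 4], [5, 5, 4], [2, 1, 2]]
r = correlation_matrix(data)
print(round(kmo(r), 3), bartlett_statistic(r, len(data)))
```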
Confirmatory factor analysis
To assess the factor structure obtained from the EFA, the model was fitted with confirmatory factor analysis (CFA), which tests the theoretical validity of the exploratory model and estimates the relationships between factors. Model fit was evaluated with the following indices and criteria: Root Mean Square Error of Approximation (RMSEA) < 0.08, Standardized Root Mean Square Residual (SRMR) < 0.08, Comparative Fit Index (CFI) ≥ 0.90, Tucker-Lewis Index (TLI) ≥ 0.95, and normed chi-square (χ²/df) < 5.0 [34, 35].
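The sketch below shows how the RMSEA point estimate follows from the model chi-square and how reported indices can be checked against the cut-offs. It assumes the common (N − 1) convention for RMSEA, although some software uses N. The standardized residual index is stored under its conventional abbreviation SRMR. The function names and fit values are hypothetical and are not results from this study.

```python
import math


def rmsea(chi2, df, n):
    """RMSEA point estimate, using the (N - 1) convention (assumed)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Cut-offs as listed in the text
THRESHOLDS = {
    "RMSEA": lambda v: v < 0.08,
    "SRMR": lambda v: v < 0.08,
    "CFI": lambda v: v >= 0.90,
    "TLI": lambda v: v >= 0.95,
    "chi2/df": lambda v: v < 5.0,
}


def check_fit(indices):
    """Return, for each supplied index, whether it meets its cut-off."""
    return {name: THRESHOLDS[name](value) for name, value in indices.items()}


# Hypothetical CFA output for a sample of 500
chi2, df, n = 520.0, 220, 500
fit = {"RMSEA": rmsea(chi2, df, n), "chi2/df": chi2 / df,
       "SRMR": 0.06, "CFI": 0.92, "TLI": 0.91}
print(check_fit(fit))
```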
Discriminant validity
Discriminant validity was assessed with the known-groups method. The Mann-Whitney U test for independent samples was used to compare overall childbirth experience scores and subdomain scores between groups defined by labour duration [20, 36], oxytocin augmentation [37], and sense of control over childbirth [38]. Sense of control over birth was measured with the question “Did you feel you had control on your labour and childbirth?”, answered Yes (1) or No (0). Based on previous studies of childbirth experience, women with shorter labour, women without oxytocin augmentation, and women who reported a sense of control over childbirth were expected to have a better childbirth experience. Effect sizes were calculated as Cohen's d, defined as the difference between the two group means divided by the pooled standard deviation [39]. Values between 0.2 and 0.5, between 0.5 and 0.8, and higher than 0.8 were considered low, moderate and high, respectively [40].
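A minimal sketch of the known-groups comparison appears below. It computes Cohen's d with the pooled standard deviation and applies the cut-offs given in the text. It also computes the Mann-Whitney U statistic for the first group, using mid-ranks for ties. P-values are not computed here; in practice they would come from the statistical package, for example `scipy.stats.mannwhitneyu`. The label for |d| below 0.2 is an assumption, since the text does not name that range. The group scores are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(a, b):
    """(mean_a - mean_b) / pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def effect_label(d):
    """Cut-offs from the text; the label for |d| < 0.2 is an assumption."""
    size = abs(d)
    if size > 0.8:
        return "high"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "low"
    return "below 0.2"


def mann_whitney_u(a, b):
    """U statistic for sample a, assigning mid-ranks to tied values."""
    pooled = sorted(list(a) + list(b))
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + j) / 2 + 1   # mean of 1-based ranks i+1..j+1
        i = j + 1
    rank_sum_a = sum(ranks[x] for x in a)
    return rank_sum_a - len(a) * (len(a) + 1) / 2


# Hypothetical overall CEQ scores by reported sense of control
with_control = [3.4, 3.1, 3.6, 2.9, 3.3]
without_control = [2.8, 2.5, 3.0, 2.6, 2.9]
d = cohens_d(with_control, without_control)
print(round(d, 2), effect_label(d), mann_whitney_u(with_control, without_control))
```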