First, we performed in-depth interviews with ten patients with GH, PE and/or IUGR and with ten obstetrical care professionals about the physical, psychological, and social burden and other consequences of GH, PE and IUGR. From these qualitative interviews, a list of 42 aspects emerged, which was aggregated into five attributes: 'maternal health ante partum', 'time between diagnosis and delivery', 'process of delivery', 'maternal outcome', and 'neonatal outcome'. The attribute levels were chosen according to interviewees' responses, literature review, and primary and secondary outcome measures from the HYPITAT (ISRCT08132825) [21, 22] and DIGITAT (ISRCT10363217) [20, 25] trials. Each attribute had 2 to 7 levels; all were defined to be present with certainty (i.e. no risks involved).
We converted the attributes and levels into vignettes containing both a visual and a written representation (Additional file 1). The visual part depicts a time line to visualize the course of maternal and neonatal health over time. The time lines start when GH, PE or IUGR is diagnosed and end one year post partum. A text box over the maternal time line depicts the process of delivery: induction of labour, onset of delivery, and mode of delivery. Colours were used to display the severity of health states; explanations of the colours and of obstetric/perinatal terms were given on a detailed reference sheet (Additional file 2). Details of this procedure are explained elsewhere.
The total number of usable unique vignette pairs was 37,990. Because of this large number of usable vignette pairs we applied an incomplete factorial design of 240 single vignettes for the VAS and TTO, and 120 paired vignettes for the DCE method (for details, we refer to ). We checked for the assumptions of orthogonality and level balance.
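The two design checks named above can be illustrated with a minimal sketch. It assumes that level balance means each level of an attribute occurs equally often, and that orthogonality is checked with the proportional-frequency condition for every pair of attributes; the text does not specify which criteria were used. The toy designs and function names are hypothetical and do not reproduce the 240-vignette design.

```python
from collections import Counter
from itertools import combinations


def is_level_balanced(design):
    """True if, within every attribute, each level occurs equally often."""
    for column in zip(*design):
        counts = Counter(column).values()
        if len(set(counts)) != 1:
            return False
    return True


def is_orthogonal(design):
    """True if every pair of attributes has proportional level frequencies.

    For each pair of attributes (a, b) and each level combination:
    n * count(level_a, level_b) == count(level_a) * count(level_b).
    """
    n = len(design)
    columns = list(zip(*design))
    for a, b in combinations(range(len(columns)), 2):
        count_a = Counter(columns[a])
        count_b = Counter(columns[b])
        joint = Counter(zip(columns[a], columns[b]))
        for level_a in count_a:
            for level_b in count_b:
                if n * joint[(level_a, level_b)] != count_a[level_a] * count_b[level_b]:
                    return False
    return True


# Hypothetical toy designs: two attributes with two levels each.
full_factorial = [(0, 0), (0, 1), (1, 0), (1, 1)]
confounded = [(0, 0), (1, 1)]  # balanced, but the attributes move together

print(is_level_balanced(full_factorial), is_orthogonal(full_factorial))  # True True
print(is_level_balanced(confounded), is_orthogonal(confounded))          # True False
```

In the confounded design each level appears once per attribute, so it is balanced, but the two attributes cannot be separated, which is what the orthogonality check detects.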
The 240 (120 paired) vignettes were distributed over six booklets. Each booklet consisted of two parts: 20 panel session vignettes (18 single VAS/TTO vignettes (9 paired DCE vignettes), plus the best and worst possible vignettes, which we used for anchoring), and 26 home-assignment vignettes (22 single vignettes (11 paired vignettes), plus 4 single re-test vignettes (2 paired vignettes)). In this study we compared the outcomes between groups using just one of the six booklets. The other five were used for the larger study, in which the outcomes of the total design were the objective. All participants in the current study valued the same set of 46 vignettes.
Participants in the valuation study were 24 patients, 30 obstetrical care professionals and 27 laypersons. The group of patients consisted of women who had had a pregnancy complicated by GH, PE, or IUGR and who had participated in either the DIGITAT or the HYPITAT trial [22, 26]. These women participated in the study within six months after childbirth. The group of obstetrical care professionals consisted of gynaecologists, midwives, and residents in gynaecology, none of them with specific expertise in health state valuation, but all involved in the Dutch Obstetric Consortium (for more information, see http://www.studies-obsgyn.nl). They were recruited by email invitation. The laypersons were men and women over 18 years of age who had previously participated in valuation studies. The laypersons and patients received a €50 participation fee.
Each participant valued the single vignettes with a VAS and TTO, and each paired vignette with a DCE.
The VAS is a psychometric rating method with equal-interval categories. Our VAS depicted a 100-point vertical thermometer ranging from 0: 'worst imaginable health state' (lower anchor) to 100: 'best imaginable health state' (upper anchor), the standard EuroQoL-format. Each respondent was asked to draw a horizontal line on the VAS to indicate where the combined maternal and neonatal health state vignette should be positioned, taking the top and bottom anchors into consideration.
The aim of the TTO method is to elicit the maximum amount of time in full health that respondents are willing to trade to avoid a suboptimal health state. Our TTO method involved a two-step procedure: first, the respondent had to state roughly how much maternal time he/she was willing to give up and, second, given the rough indication, exactly how much maternal time he/she was willing to trade (see Additional file 3). We specifically asked respondents to state how much time of the mother's life in full health he/she was maximally willing to trade off in order to attain full health for both mother and infant, given their health states as presented in the vignette. Respondents could trade off between 0 days and 10 years of the mother's life.
The aim of DCE is to derive patients' preferences for a number of different aspects ('attributes') of a health state by presenting hypothetical choices between two or more scenarios in which the levels of the attributes are systematically varied. In our study, respondents were invited to choose the better of two alternative vignettes (forced choice) within a vignette pair. For an example of one vignette, see Additional files 1 and 2.
The study consisted of group sessions with 6 to 16 participants per group. There were two sessions with laypersons, two sessions with patients, and three sessions with professionals. The participants within each session were of the same respondent group. Each session was conducted by a trained moderator (DB, GJB, JAH, MFJ) who followed a detailed protocol adapted from the Dutch Disability Weights, MiDAS and IBIS protocols. Ethical consideration was not deemed necessary for this type of study.
In the group sessions, participants were invited to value the first 20 vignettes (18 single vignettes (9 paired vignettes for DCE), plus the best and worst possible vignettes), first with DCE, then VAS, and finally TTO. We explained the vignettes thoroughly, and respondents could practise on some sample vignettes in order to get used to the layout of the vignettes, the meaning of the colours used, and the weighting of the health states. The DCE task took about 10 minutes, the VAS task about 15 minutes and the TTO task about 20 minutes. After the valuation tasks, participants filled out a questionnaire on background characteristics and their obstetric history.
Each session was followed by an individual home assignment, in which the participants valued the remaining 26 vignettes: 22 single vignettes (11 paired vignettes for DCE), 4 single retest vignettes for VAS and TTO and 5 paired retest vignettes for DCE. They valued the vignettes in the same order as in the group session (first DCE, then VAS, and finally TTO). Finally, they completed a questionnaire on the user-feasibility of the written and visual components of the vignettes (response mode: 'comprehensible', 'neutral', 'incomprehensible'); the reference handout; comprehensibility of the five individual attributes; difficulty of each valuation method (response mode: 'easy', 'not easy but not difficult', 'difficult'); and the self-reported amount of time needed to complete the home assignment. A telephone number was provided which the participants could call if they needed assistance with the tasks or the questionnaires.
We measured feasibility, reliability, and comparability of each combination of group and valuation method.
Regarding feasibility, differences between groups in the time needed to complete the home assignment were tested using one-way analysis of variance (ANOVA) followed by Tukey's post hoc test. Linear regression analysis was used to determine the impact of sex, age, educational level, and respondent group on the time needed to complete the home assignment. Feasibility ratings were compared between groups with the χ2 test or Fisher's Exact Test.
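As an illustration of the between-group comparison of completion times, the following sketch computes the one-way ANOVA F statistic and its degrees of freedom by hand. The completion times are hypothetical and the function name is ours; the p-value and Tukey's post hoc test are omitted here and would normally come from a statistics package.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (value - mean) ** 2
        for group, mean in zip(groups, group_means)
        for value in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical completion times in minutes, three respondents per group.
patients = [60, 75, 90]
professionals = [45, 60, 75]
laypersons = [90, 105, 120]

f_stat, df_between, df_within = one_way_anova_f([patients, professionals, laypersons])
print(f_stat, df_between, df_within)  # 7.0 2 6
```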
Reliability was investigated using generalizability theory (G-theory); restricted maximum likelihood estimation (REML) was used to determine the variation explained by respondent group in the VAS and TTO valuations. The test-retest reliabilities of the TTO and VAS were analyzed per group using the intra-class correlation coefficient (ICC; two-way random effects, single measures, absolute agreement; 95% CI). The DCE test-retest reliability was assessed using Cohen's unweighted kappa (κ). Within-group consistencies of the VAS and TTO were calculated per group using the ICC (two-way random effects, single measures, absolute agreement; 95% CI) to measure the degree of consensus within each group.
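The two reliability coefficients can be sketched as follows. The ICC function implements the point estimate of the two-way random effects, single measures, absolute agreement form from its mean squares; the confidence interval is omitted. The kappa function is the unweighted Cohen's kappa for two sets of categorical choices. The example values and function names are hypothetical and are not taken from the study data.

```python
def icc_absolute_agreement(ratings):
    """ICC: two-way random effects, single measures, absolute agreement.

    ratings: one row per rated object (e.g. a vignette) and one column
    per measurement occasion or rater.
    """
    n = len(ratings)      # number of rated objects
    k = len(ratings[0])   # number of measurements per object
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_error = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def cohen_kappa(first, second):
    """Unweighted Cohen's kappa for two equally long lists of categories."""
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    categories = set(first) | set(second)
    expected = sum(
        (first.count(c) / n) * (second.count(c) / n) for c in categories
    )
    if expected == 1:
        return float("nan")  # undefined when only one category is ever used
    return (observed - expected) / (1 - expected)


# Hypothetical test and retest scores for four vignettes (0-1 scale).
test_retest = [[0.20, 0.25], [0.50, 0.45], [0.70, 0.80], [0.90, 0.85]]
print(round(icc_absolute_agreement(test_retest), 3))

# Hypothetical DCE choices ('A' or 'B') at test and retest.
print(cohen_kappa(["A", "A", "B", "B"], ["A", "B", "B", "B"]))  # 0.5
```

Values from either function can then be read against the chosen interpretation guidelines.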
To measure comparability, crude VAS (vas) and TTO (tto) scores were conventionally transformed into a 0-1 score as follows, where 1 represents the optimum:
VAS = (vas/100)
TTO = 1-((tto/3650)^(1.61))
DCE = Σ βXi
The DCE score for a health state was indirectly derived by adding up all attribute level coefficients of the health state (βXi is the coefficient of attribute X, level i). The mean transformed (VAS and TTO) and the indirectly derived (DCE) vignette scores were calculated for all 46 presented vignettes. The correlation between each pair of valuation methods per respondent group was plotted to visualize group clustering in valuations and to expose valuation tendencies per group-method combination.
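The three scores can be written out as a short sketch. It assumes that tto is expressed in days of maternal life traded, so that the divisor 3650 corresponds to the 10-year maximum; the attribute names, level names and coefficients in the DCE example are hypothetical and serve only to show the summation.

```python
TTO_MAX_DAYS = 3650   # assumed unit: days, i.e. the 10-year maximum
TTO_EXPONENT = 1.61   # exponent from the TTO formula in the text


def transform_vas(vas):
    """Rescale a crude 0-100 VAS rating to the 0-1 range."""
    return vas / 100


def transform_tto(tto_days):
    """Convert days of maternal life traded into a 0-1 score (1 = optimum)."""
    return 1 - (tto_days / TTO_MAX_DAYS) ** TTO_EXPONENT


def dce_score(vignette, coefficients):
    """Add up the coefficient of the level shown for every attribute."""
    return sum(
        coefficients[attribute][level] for attribute, level in vignette.items()
    )


# Hypothetical coefficients for two attributes, for illustration only.
example_coefficients = {
    "maternal outcome": {"good": 0.0, "poor": -0.8},
    "neonatal outcome": {"good": 0.0, "poor": -1.5},
}
example_vignette = {"maternal outcome": "poor", "neonatal outcome": "good"}

print(transform_vas(75))                                  # 0.75
print(transform_tto(0))                                   # 1.0
print(transform_tto(3650))                                # 0.0
print(dce_score(example_vignette, example_coefficients))  # -0.8
```

Because the exponent is larger than 1, trading a small amount of time lowers the TTO score less than proportionally: trading one year out of ten yields a score above 0.9.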
The relative attribute weights (coefficients) were calculated per group for the VAS and TTO using linear regression of the transformed VAS and TTO scores, and by application of multinomial logit (conjoint analysis) on the DCE scores. ICCs were interpreted according to the guidelines of Landis and Koch. The estimated relative attribute weights were compared between groups within methods, and between methods within groups, with Kendall's Tau-b correlation coefficient.
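The comparison of attribute weights can be illustrated with a minimal sketch of Kendall's Tau-b, which counts concordant and discordant pairs and corrects for ties. The weights below are hypothetical, and the function name is ours; significance testing is omitted.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's Tau-b for two equally long lists of numbers."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue              # tied on both variables: not counted
            elif dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1       # pair ordered the same way in x and y
            else:
                discordant += 1       # pair ordered in opposite ways
    untied = concordant + discordant
    denominator = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    # Note: the denominator is zero if all values of x or of y are identical.
    return (concordant - discordant) / denominator


# Hypothetical attribute weights estimated in two respondent groups.
weights_group_1 = [1, 2, 3, 4]
weights_group_2 = [1, 1, 2, 3]

print(kendall_tau_b(weights_group_1, weights_group_1))            # 1.0
print(round(kendall_tau_b(weights_group_1, weights_group_2), 3))  # 0.913
```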
Analyses were conducted using SPSS 15.0 for Windows (SPSS Inc). Multinomial logit was performed using SAS 9.1.2 (SAS Institute Inc). A p-value < 0.05 (two sided) was considered to indicate statistical significance.