Subjects
We conducted a hospital-based case–control study of mothers whose fetuses or neonates, between the 28th week of gestation and the 7th day after birth (including live births, fetal deaths, and stillbirths), were diagnosed with non-syndromic cleft lip with or without cleft palate (NSCL/P) between July 2012 and June 2013 in 52 birth defects surveillance hospitals in Hunan Province, China. Controls were randomly selected from mothers who delivered normal infants at the same hospitals as the cases, and the birth dates of the normal infants and the patients with NSCL/P were no more than 1 month apart. The mothers were aged 20–45 years. NSCL/P was diagnosed by the clinical geneticists of the birth defects surveillance hospitals. Infants with chromosomal anomalies or other birth defects of known aetiology, and infants with cleft palate only, were excluded from the study. Subjects who could not cooperate with the survey were also excluded.
In this hospital-based study, the control-to-case ratio was 2:1 because the number of cases was relatively small and a large number of potential controls could be selected from the birth defects surveillance hospitals. When cases are few, a control-to-case ratio of 2:1 helps ensure the statistical power needed to identify important predictors.
Data collection
The survey was conducted through face-to-face interviews with the participants by obstetricians and gynecologists who had been trained as investigators, using a unified questionnaire. The questionnaire was designed by the experts on our research team and was modified based on a pilot study. Its contents comprised 28 variables in 5 categories: sociodemographic characteristics of the mothers, economic status of their families, family histories, conditions of the mothers from 6 months before conception through the first trimester of pregnancy, and characteristics and conditions of the fathers.
Measurements of variables
Sociodemographic characteristics and family income Maternal age was classified into four groups (years): 20–24, 25–29, 30–34, ≥35. Maternal education level was classified into three categories: primary school and below, middle school, college and above. Maternal occupations included farmers, migrant workers, employers/managers, workers, staff in administrative institutions, and housewives or others. Family income was classified into four groups (yuan/year/person): ≤5000, 5001–10,000, 10,001–15,000, >15,000.
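The grouping of maternal age and family income described above can be written as a small coding routine. The following Python sketch is illustrative only: the function names, the label strings, and the assumption that age is recorded in whole years and income in yuan per person per year are ours, not part of the study's questionnaire or software.

```python
from bisect import bisect_left, bisect_right

# Lower bounds of the second, third and fourth maternal age groups (years).
AGE_BOUNDS = [25, 30, 35]
AGE_LABELS = ["20-24", "25-29", "30-34", ">=35"]

# Upper bounds of the first three family income groups (yuan/year/person).
INCOME_BOUNDS = [5000, 10000, 15000]
INCOME_LABELS = ["<=5000", "5001-10000", "10001-15000", ">15000"]


def maternal_age_group(age_years):
    """Map maternal age in whole years to one of the four age groups.

    No range check is applied here; the study enrolled mothers aged 20-45.
    """
    return AGE_LABELS[bisect_right(AGE_BOUNDS, age_years)]


def family_income_group(income_yuan):
    """Map family income (yuan/year/person) to one of the four income groups.

    A value equal to an upper bound (e.g. 5000) stays in the lower group.
    """
    return INCOME_LABELS[bisect_left(INCOME_BOUNDS, income_yuan)]
```

For example, under these assumptions a 29-year-old mother falls in the "25-29" group and an income of exactly 10,000 yuan falls in the "5001-10000" group.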
Family histories A family history of NSCL/P was defined as one or more first-degree relatives of a person having NSCL/P. In this study, family histories of NSCL/P included those of both the mother and the father. Abnormal reproductive histories referred to histories of stillbirth, spontaneous abortion, or birth defect.
Conditions of the mothers In this study, most variables were dichotomous and were collected with yes/no questions, including occupational hazards exposure, premarital medical examination, chronic disease, upper respiratory tract infection, reproductive system infection, complications of pregnancy, contraceptive intake, folic acid intake, housing renovation and strong tea drinking. The exposure time of maternal variables was defined as the period from 6 months before conception through the first trimester of pregnancy. Occupational hazards exposure was defined as exposure to toxic and hazardous substances in the workplace, including organic solvents (benzene, toluene, n-hexane, methyl alcohol, glycol ether), noxious gases (hydrogen sulfide, ammonia, formaldehyde, sulfur dioxide, ozone), heavy metals (Pb, Hg, Cd, Cr, As), X-ray, noise, etc. Premarital medical examination referred to the examination of couples before marriage, intended to prevent diseases that might affect the health of offspring and to promote reproductive health; it included testing for serious hereditary diseases, infectious diseases, and psychiatric disorders. Chronic disease was defined as the mother or father having suffered from a chronic disease in the 6 months before conception, such as heart disease, kidney disease, liver disease, hypertension, diabetes, anemia, etc. Housing renovation was defined as the mother living in a house that had been renovated no more than 6 months earlier. Strong tea drinking was defined as drinking more than 200 ml per day on average. Pickled/smoked food intake, vegetable and fruit intake, fish/shrimp/meat/egg intake, and milk/soymilk intake were classified into three levels (times/week): ≤2, 3–5, >5, and the exposure time was defined as the first trimester of pregnancy. Smoking referred to active smoking, and the exposure levels were classified into five levels (cigarettes/day): 0, 1–10, 11–20, 21–40, >40. Alcohol drinking was defined as drinking any liquor, including beer, wine and white spirit, in the first trimester of pregnancy; the exposure levels were classified into three levels (times/week): 0, 1–2, ≥3.
Characteristics and conditions of the fathers In the present study, six variables related to the fathers were collected: age, occupational hazard exposure, chronic disease, smoking, alcohol drinking, and strong tea drinking. The definitions of the paternal variables were the same as those of the maternal variables, and the exposure time was defined as the 6 months before their wives’ conception.
Quality control
We modified the questionnaire based on the pilot study. Before the formal survey, unified and strict training was provided to all of the investigators. The subjects were strictly selected according to the inclusion criteria and the diagnostic criteria. A random 5 % of the completed questionnaires were reviewed, and questionnaires with >10 % missing data and/or >10 % logic errors were excluded from the study. To ensure the quality of data entry, double entry was used, and logic checks were performed on the entered data.
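The missing-data rule and the double-entry comparison described above can be illustrated with a short Python sketch. It is a minimal example under stated assumptions: questionnaires are represented as dictionaries, unanswered items are stored as None or an empty string, and all function and field names are hypothetical. The study itself used EpiData for data entry, so this is not the authors' procedure.

```python
def missing_fraction(record, fields):
    """Share of questionnaire items left unanswered (None or empty string)."""
    missing = sum(1 for name in fields if record.get(name) in (None, ""))
    return missing / len(fields)


def exceeds_missing_limit(record, fields, limit=0.10):
    """True when more than `limit` of the items are missing (exclusion rule)."""
    return missing_fraction(record, fields) > limit


def double_entry_mismatches(first_entry, second_entry, fields):
    """Fields whose values differ between two independent entries of a form."""
    return [name for name in fields
            if first_entry.get(name) != second_entry.get(name)]


# Hypothetical ten-item form used only to exercise the functions.
FIELDS = ["q%d" % i for i in range(1, 11)]
complete = {name: "yes" for name in FIELDS}
two_missing = dict(complete, q3=None, q7="")   # 20 % missing: excluded
retyped = dict(complete, q5="no")              # second entry disagrees on q5
```

With these hypothetical records, `exceeds_missing_limit(two_missing, FIELDS)` flags the form for exclusion, and `double_entry_mismatches(complete, retyped, FIELDS)` lists the single item that would need to be checked against the paper questionnaire.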
Statistical analysis
A total of 28 variables were investigated in this study. We used univariate logistic regression to identify the risk factors significantly associated with NSCL/P and then used Fisher discriminant analysis to establish a simple and useful prediction model based on the significant predictors. Univariate analysis can neither control the confounding effects of other variables nor avoid collinearity among variables. Thus, in the Fisher discriminant analysis, we used a stepwise method to determine the final prediction model, which could control the confounding effect and overcome the collinearity between variables.
Fisher discriminant analysis seeks a linear combination of the variables that separates the groups: the discriminant scores (Z) are calculated to maximize the between-group variance and minimize the within-group variance. This linear combination is known as the Fisher discriminant function:
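For a single yes/no exposure, the odds ratio estimated by univariate logistic regression equals the cross-product ratio of the 2 × 2 table, and the Wald standard error of its logarithm is the square root of the summed reciprocal cell counts. The Python sketch below uses this equivalence to show what one univariate screening step computes. The function name and the counts in the usage note are hypothetical and are not study data; the analysis in this study was run in SPSS.

```python
import math


def univariate_odds_ratio(exposed_cases, unexposed_cases,
                          exposed_controls, unexposed_controls, z=1.96):
    """Odds ratio and Wald confidence interval for one binary exposure.

    Matches a univariate logistic regression with a single 0/1 predictor.
    z = 1.96 gives a 95 % interval, corresponding to testing at P < 0.05.
    """
    cells = (exposed_cases, unexposed_cases,
             exposed_controls, unexposed_controls)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive")
    a, b, c, d = cells
    log_or = math.log((a * d) / (b * c))
    std_err = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(log_or - z * std_err)
    upper = math.exp(log_or + z * std_err)
    return math.exp(log_or), lower, upper
```

As a usage illustration with made-up counts, `univariate_odds_ratio(30, 70, 20, 180)` describes 100 cases and 200 controls (the 2:1 ratio used here); an interval that excludes 1 would mark the exposure as a candidate predictor for the discriminant analysis.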
$$ Z={C}_1{X}_1+{C}_2{X}_2+{C}_3{X}_3+\cdots +{C}_m{X}_m $$
where \( Z \): discriminant scores between two groups; \( X_1, X_2, X_3, \cdots, X_m \): discriminant variables; \( C_1, C_2, C_3, \cdots, C_m \): discriminant coefficients for each discriminant variable. The discriminant variables could be selected via two methods: ‘enter variables together’ and ‘enter variables stepwise’. The stepwise method selected the discriminant variables on the basis of Wilks’ lambda statistic, and in general, the F value was set at F Entry = 3.84 and F Removal = 2.71. The discriminant function established by stepwise discriminant was simpler and more effective. Assuming that the mean discriminant score of the controls was \( {\overline{\mathrm{Z}}}_{\mathrm{A}} \), \( {\overline{\mathrm{Z}}}_{\mathrm{B}} \) for the cases and \( \overline{Z} \) for the total, then \( \overline{Z}=\frac{{\overline{\mathrm{Z}}}_{\mathrm{A}}+{\overline{\mathrm{Z}}}_{\mathrm{B}}}{2} \). According to the discriminant function, we calculate the discriminant score \( Z_i \) for each subject; if \( Z_i > \overline{Z} \), the subject is considered highly likely to be a case, and if \( Z_i \le \overline{Z} \), the subject is regarded as a control.
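The two-group discriminant function and the midpoint cut-off can be sketched in Python as follows. This is a minimal illustration, not the SPSS procedure used in the study: it enters all variables together (no stepwise selection on Wilks’ lambda), takes the coefficients as the solution of S_w C = mean(cases) − mean(controls), where S_w is the pooled within-group scatter matrix, and omits the constant term and scaling that SPSS applies to its canonical coefficients. All function names and the toy data are hypothetical.

```python
from statistics import mean


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def scatter(rows, centre):
    """Sum of outer products of deviations from the group mean."""
    p = len(centre)
    s = [[0.0] * p for _ in range(p)]
    for row in rows:
        dev = [row[j] - centre[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                s[i][j] += dev[i] * dev[j]
    return s


def score(x, coef):
    """Discriminant score Z = C_1 X_1 + ... + C_m X_m."""
    return sum(c * v for c, v in zip(coef, x))


def fit_fisher(controls, cases):
    """Return (coefficients, cut-off) for a two-group Fisher discriminant."""
    mean_a = [mean(col) for col in zip(*controls)]
    mean_b = [mean(col) for col in zip(*cases)]
    s_a, s_b = scatter(controls, mean_a), scatter(cases, mean_b)
    p = len(mean_a)
    s_w = [[s_a[i][j] + s_b[i][j] for j in range(p)] for i in range(p)]
    coef = solve(s_w, [mean_b[j] - mean_a[j] for j in range(p)])
    z_a = mean(score(x, coef) for x in controls)   # mean score, controls
    z_b = mean(score(x, coef) for x in cases)      # mean score, cases
    return coef, (z_a + z_b) / 2                   # cut-off = midpoint


def classify(x, coef, cutoff):
    """Scores above the cut-off are labelled cases, otherwise controls."""
    return "case" if score(x, coef) > cutoff else "control"


# Toy data: two hypothetical yes/no predictors per subject (1 = exposed).
controls = [(0, 0), (0, 1), (1, 0), (0, 0)]
cases = [(1, 1), (1, 0), (0, 1), (1, 1)]
coef, cutoff = fit_fisher(controls, cases)
```

Because the coefficients point from the control mean towards the case mean, the mean score of the cases is always the larger of the two, so the rule "score above the cut-off means case" matches the direction used in the text.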
Using EpiData 3.1 software (Jens M. Lauritsen, Michael Bruus and Mark Myatt, Odense, Denmark), we constructed a database and then entered the data. The data were analyzed using SPSS 18.0 software (IBM, Chicago, IL, USA). The results were considered to be significant at P < 0.05.