Intraclass correlation coefficients in the Brazilian network for surveillance of severe maternal morbidity study

Background The purpose of the study was to evaluate intraclass correlation coefficients (ICC) of variables concerning personal characteristics, structure, outcome and process in the Brazilian Network for Surveillance of Severe Maternal Morbidity study conducted to identify severe maternal morbidity/near miss cases using the World Health Organization criteria. Method It was a cross-sectional, multicenter study involving 27 hospitals providing care for pregnant women in Brazil. Cluster size and the mean size of the primary sampling unit were described. Estimated prevalence rates, ICC, their respective 95% confidence intervals, the design effect and the mean cluster size were presented for each variable. Results Overall, 9,555 cases of severe maternal morbidity (woman admitted with potentially life-threatening conditions, near miss events or death) were included in the study. ICC ranged from < 0.001 to 0.508, with a median of 0.035. ICC was < 0.1 for approximately 75% of the variables. For process-related variables, median ICC was 0.09, with 0.021 for those related to outcome. These findings confirm data from previous studies. Homogeneity may be considered minor, thus increasing reliability of these findings. Conclusions These results may be used to design new cluster trials in maternal and perinatal health and to help calculate sample sizes.


Background
Cluster studies are widely used in epidemiological research to evaluate health interventions and implement public policies. In these cases, selection units or randomization units consist of population groups (specific geographical areas) or healthcare units (hospitals) or healthcare sectors rather than individuals [1,2].
In single-stage cluster sampling, all subjects belonging to each group are included to obtain data of interest.
Unlike simple random sampling (SRS) in which each individual has an equal likelihood of being selected within the general population, data obtained from clusters may not be sufficiently representative to allow for generalization. This is due to a greater degree of homogeneity in characteristics of the population under observation, as opposed to heterogeneity found in the general population [2].
Reliability of estimates obtained from studies using cluster sampling may be analyzed by measuring inter- and intra-cluster variance. One way to perform these measurements is to calculate δ, the intraclass correlation coefficient (ICC), for variables evaluated in the study [3]. Another way is to calculate the design effect (DEFF).
ICC is a coefficient that measures the degree of homogeneity among elements within clusters. The ICC of a variable indicates to what extent the variance of a parameter can be explained by variation between clusters [4,5]. Its value depends on the type of variable, cluster size and prevalence of the condition [6]. Coefficients close to zero suggest intraclass heterogeneity, indicating that the variable is randomly distributed among clusters. Conversely, values close to 1 indicate homogeneity within clusters, with the variance of cluster units being greater than that of single elements [2].
Design effect (DEFF) indicates the extent to which the variance of parameter estimate is a result of study design, in this case the cluster sampling, compared to what would be obtained if sampling had been carried out by the SRS method [7]. For example, a DEFF value of 3 indicates that the variance of parameter estimation is three times greater than would be found if the study had been based on a random sample of equal size. Components involved in calculating the DEFF are ICC and mean cluster size. The larger the ICC and the larger the cluster, the higher the DEFF value [8].
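As an illustration of how these two components combine, the widely used approximation DEFF = 1 + (m - 1) × ICC, where m is the mean cluster size, can be sketched in Python. This is a minimal sketch and not the computation used in this study, in which DEFF was obtained as a ratio of variances; the function name and the input values are hypothetical.

```python
def design_effect(icc: float, mean_cluster_size: float) -> float:
    """Approximate design effect: DEFF = 1 + (m - 1) * ICC.

    Assumes clusters of roughly equal size m; this is an approximation,
    not the variance-ratio definition of DEFF.
    """
    if mean_cluster_size < 1:
        raise ValueError("mean cluster size must be at least 1")
    return 1.0 + (mean_cluster_size - 1.0) * icc


# Hypothetical values, chosen only to show that DEFF grows with both inputs.
for icc in (0.001, 0.01, 0.05):
    for m in (50, 500):
        print(f"ICC={icc:<6} m={m:<4} DEFF={design_effect(icc, m):.2f}")
```

Even a small ICC produces a large DEFF when clusters are large, which is why both quantities need to be reported together.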
The routine adoption of ICC calculation in cluster studies may improve interpretation of the results and facilitate the development of new studies in the field. These values could be used as a correction factor for the calculation of sample size in future cluster studies, thus avoiding underestimates, since, in studies in which SRS is used, the sample size required to achieve sufficient statistical power is generally smaller [4].
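The use of the ICC as a correction factor can be illustrated with a short Python sketch. It assumes the common textbook approach of multiplying the sample size required under SRS by the approximate design effect 1 + (m - 1) × ICC, where m is the planned mean cluster size. The function names and the numbers are hypothetical and are not taken from this study.

```python
import math


def inflate_sample_size(n_srs: int, icc: float, mean_cluster_size: float) -> int:
    """Sample size for a cluster design, given the size required under SRS.

    Assumes DEFF = 1 + (m - 1) * ICC with clusters of similar size.
    """
    deff = 1.0 + (mean_cluster_size - 1.0) * icc
    # Round before taking the ceiling to guard against floating-point noise.
    return math.ceil(round(n_srs * deff, 6))


def clusters_needed(n_srs: int, icc: float, mean_cluster_size: float) -> int:
    """Number of clusters of the given mean size needed to reach that sample."""
    n_cluster = inflate_sample_size(n_srs, icc, mean_cluster_size)
    return math.ceil(n_cluster / mean_cluster_size)


# Hypothetical planning scenario.
n_required = inflate_sample_size(n_srs=400, icc=0.02, mean_cluster_size=30)
k_required = clusters_needed(n_srs=400, icc=0.02, mean_cluster_size=30)
print(n_required, k_required)
```

Ignoring the ICC in such a scenario would leave the study with fewer subjects than needed for the intended precision.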
The Brazilian Network for Surveillance of Severe Maternal Morbidity study was developed with the objective of identifying cases of severe maternal morbidity/near miss (woman admitted with potentially life threatening conditions, near miss) or death in 27 hospitals distributed throughout Brazil. The participating centers were selected using the following criteria: the availability of each center to participate in the surveillance study, the geographical region of the country in which the hospital was situated and a requirement that the hospital in question performed at least 1,000 deliveries per year. Therefore, the participating centers constituted the primary sampling units, while the subjects at each center represented the units of analysis [9].
The objective of the present report is to evaluate the intraclass correlation coefficients of the variables associated with outcome, process, personal characteristics and structure, based on the data collected by the Brazilian Network for Surveillance of Severe Maternal Morbidity, as well as the design effects.

Methods
The Brazilian Network for Surveillance of Severe Maternal Morbidity, established in 2009, conducted a cross-sectional, multicenter study involving 27 hospitals providing care for pregnant women in Brazil. The objective of this network was to identify cases of severe maternal morbidity/near miss, using the criteria established by the World Health Organization (WHO) to characterize these conditions [10]. According to this definition, a maternal near miss is a woman who experienced a very serious complication during pregnancy and almost died but survived at least until the 42nd day of the postpartum period [10].
The selection of clusters was based predominantly on the location of the hospitals in order to ensure broad coverage throughout the country, and on a minimum of 1,000 deliveries per center per year to enable the calculated sample size to be reached. The sample size calculation was based on a prevalence of 8 cases of near miss for every 1,000 deliveries [11] and a maternal death ratio of 140 for every 100,000 liveborn infants, with a confidence level of 95% and a precision level of 8/1,000 for near miss and 8.5/1,000 for maternal death; approximately 25% was added because this definition of near miss had not yet been tested. As a result, around 75,000 deliveries would have to be monitored to identify approximately 600 cases of near miss and 100 maternal deaths. The study was evaluated and approved locally by the respective Institutional Review Board of each participating hospital and also by the National Council for Ethics in Research.
Over the course of one year, all the pregnant women admitted to these institutions were monitored. Women found to have any of the morbidity criteria were included in the analysis. The data of interest to the study were collected from the women's charts immediately after they were discharged from hospital [12]. After data collection was complete in June 2010, procedures were initiated to verify and correct any inconsistencies to ensure the quality of the data obtained. These procedures included a daily check of all data entered into the database, reporting of possible inconsistencies to the researchers, correction of errors and a programmed check of the entire dataset at the end of the study.

Data analysis
Initially, the cluster size and the mean size of the primary sampling unit (PSU) obtained from the total sample of 9,555 women were described. Estimated prevalences of each dichotomized variable, intraclass correlation coefficients (ICC), their respective 95% confidence intervals (95% CI), design effects (DEFF) and mean cluster size for each variable were calculated. The software programs used for the analysis were SPSS, version 17.0 [13] and Stata, version 7.0 [14], taking into consideration the cluster sampling plan (centers) for data analysis. Information on health facilities and live births for the whole country came from official data [15,16].

Sampling plan used in the Brazilian Network for Surveillance of Severe Maternal Morbidity study
Single-stage cluster sampling was used, with 27 primary sampling units (PSU) corresponding to the 27 participating centers (hospitals). The sampling plan did not involve stratification of the PSU or weighting of the data. The unit of analysis (subject) was the record of each woman admitted with potentially life-threatening conditions, near miss or death.

Prevalence - ratio estimator (r) [2]
Let $y_{\alpha\beta}$ denote the dichotomized variable, with α indexing the cluster and β the individual. Then $y = \sum_{\alpha}\sum_{\beta} y_{\alpha\beta}$ is the total number of subjects in the sample possessing a certain category (for example "Yes") of that variable; for example, y is the total number of subjects in the sample with "prenatal care at the same facility". Likewise, $x = \sum_{\alpha} x_{\alpha}$ is the size of the available (valid) sample for that variable, where $x_{\alpha}$ is the sample size for cluster α. For this study, the ratio estimator is:

$$r = \frac{y}{x} = \frac{\sum_{\alpha} y_{\alpha}}{\sum_{\alpha} x_{\alpha}}, \qquad y_{\alpha} = \sum_{\beta} y_{\alpha\beta}$$

According to Kish [2], the intraclass correlation coefficient is:

$$\delta = \frac{s_a^2 - s_b^2 / b}{\hat{s}^2}$$

where $s_a^2$ is the variance between clusters, $s_b^2$ is the variance within clusters, b is the size of the clusters and $\hat{s}^2$ is the estimate of $S^2$ (the variance at the individual level). The estimate $\hat{s}^2$ is obtained by:

$$\hat{s}^2 = s_a^2 + \frac{b - 1}{b}\, s_b^2$$

Stata's equivalent computing formula [14] is expressed in terms of 'F', Snedecor's F-value from the ANOVA table, and 'a', the number of groups. The variance estimate for the ICC is obtained by an extensive asymptotic formula and is therefore not shown here.
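For readers who wish to experiment with this coefficient, the following Python sketch computes δ = (s_a² - s_b²/b) / ŝ², with ŝ² = s_a² + ((b - 1)/b) s_b², for the simplified case of clusters of equal size b. It assumes that s_a² is the sample variance of the cluster means and s_b² is the average of the within-cluster sample variances. The function name and the toy data are hypothetical, and this is not the software routine used in the study.

```python
from statistics import mean, variance


def kish_icc(clusters):
    """ICC for equal-sized clusters of 0/1 (or numeric) observations.

    clusters: list of lists, one inner list per cluster, all of length b.
    """
    b = len(clusters[0])
    if len(clusters) < 2 or b < 2 or any(len(c) != b for c in clusters):
        raise ValueError("need at least 2 clusters of equal size b >= 2")
    s2_a = variance([mean(c) for c in clusters])   # between-cluster variance
    s2_b = mean([variance(c) for c in clusters])   # pooled within-cluster variance
    s2_hat = s2_a + (b - 1) / b * s2_b             # individual-level variance
    if s2_hat == 0:
        raise ValueError("no variation in the data")
    return (s2_a - s2_b / b) / s2_hat


# Toy data: three hypothetical centers, four women each (1 = "Yes", 0 = "No").
toy = [[1, 1, 1, 0], [0, 0, 1, 0], [1, 0, 1, 1]]
print(round(kish_icc(toy), 3))
```

When every cluster is internally identical the function returns 1, and when all cluster means coincide it returns its minimum of -1/(b - 1), which matches the interpretation of the coefficient given in the Background.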

Design effect - DEFF [2]
$$\mathrm{Deff} = \frac{\mathrm{var}_{actual}(r)}{\mathrm{var}_{SRS}(r)}$$

where $\mathrm{var}_{actual}(r)$ is the estimated variance according to the complex design being studied and $\mathrm{var}_{SRS}(r)$ is the variance of the estimator as if it had been calculated using an SRS of the same size, n.
Variance estimator of r under the design being studied: Variance estimator of r under SRS: where r is the ratio estimator under the SRS.
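The ratio of variances that defines DEFF can be sketched in Python for a dichotomous variable. The sketch assumes two standard textbook forms that may differ in detail from those used in this study: a linearized between-cluster variance estimator for the ratio r, var(r) = a/(a - 1) × Σ(y_α - r x_α)² / x², without finite population correction, and the SRS variance r(1 - r)/(n - 1). The function name and the input counts are hypothetical.

```python
def ratio_and_deff(y, x):
    """Return (r, DEFF) from per-cluster counts.

    y[i]: number of subjects with the category "Yes" in cluster i.
    x[i]: number of subjects with valid data in cluster i.
    Assumes single-stage cluster sampling, no weights, no stratification.
    """
    a = len(x)
    n = sum(x)
    if a < 2 or len(y) != a or n < 2:
        raise ValueError("need matching counts for at least 2 clusters")
    r = sum(y) / n
    if not 0 < r < 1:
        raise ValueError("DEFF is undefined when r is 0 or 1")
    # Linearized residuals; they sum to zero by construction.
    z = [yi - r * xi for yi, xi in zip(y, x)]
    var_cluster = a / (a - 1) * sum(zi * zi for zi in z) / n ** 2
    var_srs = r * (1 - r) / (n - 1)
    return r, var_cluster / var_srs


# Hypothetical counts for four centers.
r, deff = ratio_and_deff(y=[30, 5, 12, 40], x=[50, 45, 60, 55])
print(f"r={r:.3f} DEFF={deff:.2f}")
```

A DEFF computed this way can fall below 1 when cluster proportions are more alike than chance would predict, which is consistent with the minimum DEFF of 0.94 reported in the Results.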

Recalculation of the sample size
Using the ICC from the main outcome as a correction factor [17]: Where 'a' is the number of clusters, 'Deff' the design effect and 'ICC' the intraclass correlation coefficient of the main outcome.

Results
During the study period, 82,388 deliveries were performed in the participating institutions, with 82,144 liveborn infants, and 9,555 cases of maternal morbidity were included in the study. The mean cluster size was 354. The distribution of cases per center and region and of liveborn infants per region is shown in Table 1.
The ICCs ranged from < 0.001 to 0.508, with a median of 0.035. The ICC was below 0.1 for approximately 75% of the variables. In this block of variables, the median ICC was 0.021, while in the other 25%, the median ICC was 0.195. The ICCs for the variables related to the process are shown in Table 2, with values ranging from 0.001 to 0.508 (median 0.09), while DEFF values varied from 1.5 to 292.63 (median 20.52). Table 3 shows the variables related to outcome, with ICCs that ranged from < 0.001 to 0.375 (median 0.021) and DEFF values of 0.94 to 289 (median 6.24). ICC values were < 0.05 for approximately 74% of the variables associated with outcome, while the proportion of variables for which ICC was < 0.07 was almost 80%. Tables 4 and 5 show the ICC values with their respective 95% confidence intervals, prevalence rates, DEFF values and the mean size of the clusters for the variables referring to population/obstetric characteristics and structure, respectively. With the exception of some variables related to structure, particularly those associated with delays in the service or healthcare system, ICC values were very low.
Based on the fact that the sample size calculation did not take into account the ICC for the primary outcome of interest, because it was not available from other studies at the time of project development, the sample size was recalculated using the ICC for the variable near miss/deaths as the correction factor.
This was an exercise to evaluate whether our sample would have been sufficient for analysis considering the cluster design. This variable was used because it is the main outcome of interest in this study; its ICC was 0.077. Using the formula for recalculating the sample size in such situations [17], the results indicate that a sample size of 7,072 would be needed to identify the near miss/death cases, which corresponds to 74% of the total number of subjects included in the Network.

Discussion
The values of the intraclass correlation coefficients found in this study can be considered low, close to zero, for the majority of the variables. Intraclass heterogeneity was greater in the variables related to outcome in comparison with the others.
In selecting the clusters, stratification by region was not performed. Proportionally more of the centers were situated in the southeast of the country (48%), and consequently a greater number of liveborn infants were also born in this region (46%). This distribution is in conformity with the actual distribution of healthcare institutions and the proportionality of liveborn infants per region of the country. According to the Ministry of Health's National Register of Healthcare Institutions, approximately 45% of the institutions registered are situated in the southeastern region [15] and, in 2008, 38.5% of all liveborn infants were born in this region [16]. Proportionality was also maintained in comparison with the other regions.
Although the surveillance of cases of severe maternal morbidity/near miss was prospective, the data were collected from the patients' charts immediately following the women's discharge from hospital. Therefore, for some variables, the number of individuals for whom data are available is less than the total number of cases, since it was impossible to recover some of the missing data. This possible loss of part of the data for some of the variables was predicted and taken into consideration in planning the study. Severely ill women may not be able to consent to participate in studies of this type because they may be unconscious, may die or may find themselves in various situations of emotional fragility. Therefore, data collection following hospital discharge was completely anonymous and avoided the need to obtain informed consent, which allowed a larger number of research subjects to be included, an important factor in studies of serious conditions with a relatively low prevalence rate.
Previous studies have shown that ICC values are generally higher for variables related to process compared to those for variables related to outcome, since for the same intervention (measure of process), responses may differ between the different individuals under therapeutic management (outcome measurement). Furthermore, higher healthcare levels tend to increase the degree of homogeneity [17-20]. The explanation for this is that when the level of care is higher, there is a greater likelihood that management techniques will be standardized and institutional protocols will be used. In these cases, when one of the objectives is to evaluate compliance with established guidelines, homogeneity may be interpreted as a leveling of institutions with respect to certain recommendations. This tendency is also seen in the present results, which were obtained from secondary and tertiary care hospitals, the majority of which were teaching hospitals. In this type of institution, the majority of procedures are performed in conformity with evidence-based healthcare protocols. Indeed, the mean ICC value for the variables related to process was 2.6 times higher than the mean ICC value for the variables related to outcome.
The variable with the highest ICC was "Major type of healthcare insurance used for hospital admission" with a value of 0.508. This homogeneity was expected, since only 3 of the 27 centers accepted private patients, all the other centers being exclusively public healthcare services.
Some of the variables related to process were obtained from a specific section of the study focusing on delays during patient care. In addition to obtaining data from the women's charts, this study also involved another step in which the investigators were instructed to make a subjective analysis of the chain of healthcare services provided, based on the data available on the charts. In addition, after all the variables had been completed, the data for each subject were reanalyzed by the principal investigators and standard procedures for the classification of delays were implemented for all the cases in all the clusters. Therefore, the greater homogeneity found for these variables may be explained by this "standard correction" adopted for all the centers.
Previous studies in the primary care sector have reported ICC values < 0.05 for variables related to outcome and ICC values > 0.05 for variables related to the process [20]. In the field of maternal and perinatal healthcare, Taljaard et al. calculated ICC values based on data obtained from secondary/tertiary services [18]. The ICCs of some variables analyzed in that study can be compared with ours, as reported in Table 6. The findings of those investigators showed that, in general, ICC values for the variables related to the process tended to be > 0.07, with values < 0.07 for the variables related to outcome. The present findings are in agreement with this observation. Furthermore, these values can probably be considered a good parameter of variance for calculating sample size in new studies in this area, similar to the 0.08 value used by van de Ven et al. [21].
Pagel et al. [22] estimated ICCs using data from five community-based cluster-randomized controlled trials, all evaluating community interventions to improve maternal and newborn health outcomes. The mean cluster size of these studies ranged from 3,934 to 27,953 people. Of 9 key perinatal indicators, only maternal death can be compared with our calculations, and its ICC ranged from 0.00 to 0.00051. These comparisons are consistent with what was already known: the smaller the cluster size, the higher the ICC, and the opposite occurs regarding the prevalence of the condition. Mortality events are rare in community-based populations, and the prevalence rises when morbidity is also considered. Pagel et al. [22] noted as a limitation of their findings that ICC estimates for rare outcomes such as maternal mortality are not likely to be reliable.
Taljaard et al. [18] collected data from a hospital-based population, as in our study, and obtained ICCs similar to our findings.
This study is based on pregnant women who had at least one of the potentially life-threatening conditions. Therefore, the prevalence of any factor associated with these conditions will be higher in the study sample than in the general obstetric population. This could potentially inflate ICC estimates, as the ICC generally tends to increase with higher prevalence. The ICCs of this study also depend on other factors such as the hospitals' characteristics. Caution should therefore be used in applying these ICCs to other settings, and this could be considered a possible limitation of the study.
The absence of previous studies using these new near miss criteria standardized by the WHO [10] made it impossible to obtain the respective ICC values for the variables of interest related to outcome. Therefore, in this study sample size was calculated based exclusively on previous prevalence rates of the condition.
Considering the general characteristics of the women in this study, most of whom received care in public hospitals linked to universities, and that the ICC values showed sample heterogeneity, studies conducted in middle-income countries with characteristics similar to those of Brazil can probably use the ICC values of this study as the basis for their calculations.

Conclusions
The Brazilian Network for Surveillance of Severe Maternal Morbidity conducted a pioneering, cross sectional, multicenter study on the application of the new WHO near miss criteria to identify severe cases in obstetrics. This paper reports the intraclass correlation coefficients for study variables. The results found are in agreement with those of previous studies and homogeneity of the data obtained from variables related to outcome may be considered minor (median ICC 0.021), increasing reliability of the study estimates. These values may be used to plan new cluster studies in maternal and perinatal health, mainly studies associated with severe maternal morbidity/near miss events. They may be useful for sample size calculation.

Competing interests
The authors declare that there are no competing interests.

Authors' contributions
JPD and JGC were responsible for the original idea of the study. SMH, JGC and MAP were responsible for its implementation. All of them, together with the whole group, were responsible for the care of women, data collection, data consistency and cleaning. MHS, SMH and JGC were responsible for planning and performing the analysis. SMH and MHS wrote the first draft of the manuscript, which was finalized with input from all the others, who read and approved the final version.