An integrated systems biology approach to the study of preterm birth using "-omic" technology - a guideline for research

Preterm birth (PTB) is the leading cause of neonatal mortality and perinatal morbidity. The etiology of PTB is multi-factorial and still unclear. As evidence increases for a genetic contribution to PTB, so does the need to explore genomics, transcriptomics, proteomics and metabolomics in its study. This review suggests research guidelines for the conduct of high throughput systems biology investigations into PTB with the expectation that this will facilitate the sharing of samples and data internationally through consortia, generating the power needed to study PTB using integrated "-omics" technologies. The issues to be addressed include: (1) integrated "-omics" approaches, (2) phenotyping, (3) sample collection, (4) data management-integrative databases, (5) international consortia and (6) translational feasibility. This manuscript is the product of discussions initiated by the "-Omics" Working Group at the Preterm Birth International Collaborative Meeting held at the World Health Organization, Geneva, Switzerland in April 2009.

Preterm birth (PTB; birth before 37 weeks gestation) is the leading cause of neonatal mortality and is associated with up to 75% of long-term morbidity, including developmental delay, cerebral palsy, retinopathy of prematurity, and hearing and vision problems [1,2]. Despite medical advances and better understanding of uterine activation and parturition, the rates of PTB have been increasing over the past three decades in developed countries [3], with current rates ranging from 5-7% [4]; worldwide, PTB complicates 9.6% of all births [5]. Late PTBs, defined as delivery at 34+0 to 36+6 weeks of pregnancy [6], have risen 25% since 1990 [7], now accounting for three quarters of preterm deliveries. This stark increase may be attributed to fetal indications, preterm premature rupture of membranes (PPROM) and its associated risks, and the increase in multiple pregnancies associated with assisted reproductive technology [8]. Complicating our understanding of PTB is that its etiology is multifactorial and varies by gestational age. Among factors associated with increased risk of PTB are maternal smoking during pregnancy [9,10], advanced maternal age [11,12], sub-optimal weight gain during pregnancy [13], maternal stress [14][15][16], decidual thrombosis [17], cervical insufficiency [18,19] and the presence of infection [20][21][22]. In addition, a variety of environmental and genetic factors play a role in PTB; however, the effect size of these factors is not clear. In the United States, PTB occurs disproportionately in women of African ancestry [23,24], even when controlling for social confounders. Twin studies suggest that the heritability of PTB may be 17-36% [25,26]. Clinically, the best predictor of PTB is a prior history [27,28], where recurrence risk increases by approximately 15% with each PTB [29]. Further, data suggest that the risk of PTB is inherited across generations [30].
As evidence increases for a genetic contribution to PTB, so does the need to explore genomics, transcriptomics, proteomics and metabolomics in its study.
High throughput systems biology, referred to as "-omics" technology, has revolutionized research methodologies. Through these high throughput technologies and the generation of massive data sets, it is now possible to do in an afternoon what previously took several years, and yet our understanding of the complex phenotypes of PTB remains incomplete, inconsistent and without clinical clarity. The "-omics" era has seen many publications (> 250,000); however, only a limited number (~6,000) have been in reproductive medicine (Figure 1). Many of the "-omics" publications relating to PTB have assessed single classes of "-omics" data, utilizing genomics, transcriptomics or proteomics in isolation. The results of many of these publications have failed to replicate, and their practical value has been limited, failing to translate into clinical practice. The limited successes of singular approaches emphasize the need for integrated approaches to investigate complex phenotypes across "-omics" categories.
To support both singular and integrated systems biology approaches, the "-omics", or systems biology, movement has seen the development of multiple consortia utilizing high throughput platforms to investigate complex phenotypes. Central to the study of complex phenotypes are accurate phenotype definitions. In the study of PTB, this necessitates collaboration among multiple research groups working synergistically to define phenotypes and to provide adequate sample size [31]. Consortia, by design, employ multiple sites for the collection of phenotype data and biological samples with the goal of creating sample sizes large enough to power studies at levels impossible for any single research group, institute or funding opportunity. Moreover, "-omics" technologies require high quality biologic samples with specific, consistent and precise collection and handling. Key to effective consortia is consistency in information gathered, specimen collection, storage and management, without which merging of data is problematic.
There is a need for guidelines for the conduct of integrated "-omics" studies into PTB. The genomic, transcriptomic and proteomic working group from the Preterm Birth International Collaborative (PREBIC) meeting in 2009 proposes these guidelines. The aim of this article is to establish guidelines for "-omics" studies of PTB such that data and samples collected can be merged, compared and replicated through consortia capable of integrated systems biology methodologies. The issues to be addressed in this guideline include: (1) integrated "-omics" approaches, (2) phenotyping, (3) sample collection, (4) data management-integrative databases, (5) international consortia and (6) translational feasibility.

Figure 1 Systems Biology "-Omics" Publications in Relation to Pregnancy. Published articles utilizing selected systems biology approaches from 1999-2010. Those related to pregnancy are generally less than 3% (note log scale) of the total published articles, and have only begun to increase in 2009. Data abstracted from PubMed with search terms: Human AND English + [transcriptome OR transcriptomics, transcriptome OR transcriptomics + pregnancy, proteome OR proteomics, proteome OR proteomics + pregnancy, genome OR genomics, genome OR genomics + pregnancy, metabolome OR metabolomics, and metabolome OR metabolomics + pregnancy].

Integrated "-omics" Systems Biology Approaches
Until recently the "-omics" era consisted of studies in genomics (the study of genes and their functions), transcriptomics (the study of the complete set of RNA transcripts produced by the genome at one time), and proteomics (the study of the complete set of proteins produced by a species). Recently, through the development of new technologies, metabolomics (the study of small-molecule metabolite profiles generated by cellular processes) has further expanded the "-omics" field. The considerations for using each "-omics" platform in studies of PTB and its sequelae have been reviewed elsewhere [32]. Additionally, the limitations of investigations using "-omics" in isolation have been discussed [33] and emphasize the need for integrated "-omics" approaches as the future path of research.
A step-wise integrated approach is central to "-omics" research, yet without strategic implementation, integrating "-omics" fields may be plagued by limitations comparable to those of singular approaches. Each step, or technique, yields distinctly different information (Figure 2) in discovery research. Transcriptomics, not in isolation but rather as an entry point for investigation, presents unique advantages for the study of PTB and perhaps other complex phenotypes alike. Unlike genomics, transcriptomics provides a snapshot of what appears to be happening at a given point in time in a biological sample. Therefore, if patterns are observed which are specific to PTB phenotypes, the functional consequences (protein products) or genetic predisposition (single nucleotide polymorphisms, SNPs) may be ascertained and feedback interactions and processes explored. Proteomics and metabolomics are also promising techniques as analytic steps essential to integrated discovery research studies. These steps are able to build upon the patterns revealed by transcriptomics, as transcriptomes are putative precursors to the actual physiology. However, sample processing requirements and still rapidly evolving technologies pose special challenges in the use of proteomics and metabolomics, as opposed to candidate driven research. In comparison, genomics is limited by the lack of linear associations between genetic variants and complex phenotypes (Figure 3). It holds its intrinsic value in secondary analyses and should also be included in integrated investigations. Utilizing multiple complementary techniques strategically in integrated studies may reveal the pathophysiological insights and clinical clarity PTB research seeks to discover.

The Phenotype of Preterm Birth
The World Health Organization defines PTB as "birth before 37 weeks (or 259 days) gestation" [34]. Preterm birth is therefore unique among adverse pregnancy outcomes in that it is defined by a time point and not by a specific etiology or pathophysiology. However, as an obstetric syndrome, PTB represents a common end point to a wide variety of clinical conditions that have been classified inconsistently in a number of ways that have included: 1) gestational age at which delivery occurs; 2) clinical presentation resulting in PTB; and 3) putative pathophysiology responsible for PTB. These classification systems are not mutually exclusive, with each of them offering different benefits depending on the scientific or clinical question of interest, but investigators must be clear and consistent in defining the PTB phenotype within the study population.
The most common classification system for PTB is based on gestational age at delivery where cases are classified into strata of extreme prematurity (< 28 weeks gestation), severe prematurity (28-31 weeks gestation), moderate prematurity (32-33 weeks gestation) and near term prematurity (34-37 weeks gestation) [3]. The majority of PTB occurs between 34 and 37 weeks gestation with smaller numbers occurring at lower gestational ages [35]. Unfortunately, accurate assessment of gestational age is not always possible, especially in low and middle-income settings where the burden of disease is high and resources are limited. In these settings the last menstrual period is frequently unknown, early dating ultrasound is unavailable, and most births occur at home. The need for better tools to accurately assess gestational age is therefore a key research imperative to facilitate "-omics" biology in PTB.
Until such tools are available, alternative classifications based upon clinical presentation or proposed pathophysiology are more likely to be of value in understanding the genetic and physiological processes that lead to PTB.
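The gestational-age strata described above can be expressed as a simple lookup. The following is a minimal sketch only: the function name `classify_gestational_age`, the use of completed days as input, and the treatment of each stratum as a half-open interval (so that "near term" ends at 36+6 weeks, consistent with PTB being birth before 259 days) are assumptions made here for illustration and are not part of the cited classification.

```python
def classify_gestational_age(days: int) -> str:
    """Return the prematurity stratum for a gestational age in completed days.

    Boundaries are an assumed half-open reading of the strata in the text:
    <28, 28-31, 32-33 and 34-36 completed weeks.
    """
    if days < 0:
        raise ValueError("gestational age cannot be negative")
    weeks = days // 7  # completed weeks of gestation
    if weeks < 28:
        return "extreme prematurity"
    if weeks < 32:
        return "severe prematurity"
    if weeks < 34:
        return "moderate prematurity"
    if weeks < 37:
        return "near term prematurity"
    return "not preterm"  # 259 days (37+0 weeks) or later
```

Making the boundary rule explicit in this way is one option for ensuring that sites contributing to a shared dataset assign borderline deliveries to the same stratum.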
In clinical classification schemes, after excluding multi-fetal pregnancy, severe fetal malformations and fetal death in-utero, PTB can be broadly classified into two clinical pathways: iatrogenic PTB and spontaneous PTB (Figure 4). Iatrogenic, or medically indicated, preterm birth occurs when the benefits to either the mother or fetus of delivery outweigh the benefits of continuing pregnancy. Iatrogenic PTB occurs in about 25% of all PTB with variations from 8.7% to 35.2% according to studied populations [36]. This clinical phenotype includes preeclampsia, diabetes, other maternal medical conditions and fetal growth restriction, a range of conditions with differing etiologies, risk factors and clinical outcomes. As a result of an increased ability to monitor fetal health during pregnancy and recognize the onset of maternal disease earlier, iatrogenic PTB is becoming more common and the increasing rate of late PTB is thought to be largely attributable to iatrogenic causes [3].
Spontaneous preterm birth (SPTB) can result from either spontaneous preterm labor (defined as regular contractions with cervical changes at less than 37 weeks gestation) or preterm pre-labor rupture of membranes (PPROM), defined as spontaneous rupture of the membranes at least 1 hour before the onset of labor and at less than 37 weeks gestation [37]. SPTB accounts for approximately 50% of all PTB (range 23.2% to 64.1%) [36,38]. It is more frequent in populations without any established risk factors, in which it represents 50% to 70% of all preterm deliveries [39,40].
It is important to recognize that classifications of PTB based on clinical presentation or gestational age at delivery result in different study groups, as the proportion of PTB that are iatrogenic and spontaneous varies between study populations [41][42][43][44][45][46], racial backgrounds [41][42][43][44][45][46] and gestational age [41,47].
Regardless, it is likely that the ultimate pathophysiological mechanisms and pathways involved in PTB are similar across groups.
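The clinical classification outlined above (exclusions first, then iatrogenic versus spontaneous PTB, with spontaneous PTB split into preterm labor and PPROM) can be sketched as a small decision function. The record type `Delivery`, its field names and the order in which the rules are applied (a medically indicated delivery is labeled iatrogenic even if PPROM is also recorded) are assumptions introduced here for illustration; an actual study would need to fix these rules in its protocol.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Delivery:
    # Hypothetical fields; names are illustrative, not from the source.
    gestational_days: int
    multifetal: bool = False
    severe_fetal_malformation: bool = False
    fetal_death_in_utero: bool = False
    medically_indicated: bool = False
    pprom: bool = False  # membranes ruptured before the onset of labor


def clinical_pathway(d: Delivery) -> Optional[str]:
    """Assign a clinical PTB pathway, or None if the birth is not preterm."""
    if d.gestational_days >= 259:  # 37+0 weeks or later
        return None
    # Exclusions are applied before any pathway is assigned.
    if d.multifetal or d.severe_fetal_malformation or d.fetal_death_in_utero:
        return "excluded"
    # Assumed precedence: medical indication overrides PPROM.
    if d.medically_indicated:
        return "iatrogenic PTB"
    if d.pprom:
        return "spontaneous PTB (PPROM)"
    return "spontaneous PTB (preterm labor)"
```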
It has been hypothesized that PTB and term birth share similar final physiological pathways, but that these pathways are triggered early in PTB [48]. This common pathway of parturition involves the activation of various physiological processes including myometrial contraction, decidual activation, membrane extracellular matrix degradation, weakening and rupture and cervical ripening resulting in labor and delivery [37,48]. While the physiological triggers of this final pathway in term birth are still not well understood, the pathological triggers that have been proposed for PTB are outlined in Figure 5. Activation of one (or more) of these triggers and their interaction with environmental factors and genetic susceptibility in the host can lead to activation of the common parturition pathway at an earlier gestational age and result in PTB.
This complexity is often disregarded in "-omics" research into PTB where spontaneous PTB, preterm premature rupture of membranes, and even iatrogenic PTB are commonly grouped together for analyses. The heterogeneity resulting from grouping known clinical presentations together decreases the sensitivity and power of many research studies. It is vital that PTB is distinguished by its different phenotypes prior to all analyses, including those utilizing "-omics" technologies. Although this may decrease numbers in any given study, it may increase biological homogeneity, thereby potentially replacing the lost statistical power.
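The trade-off between sample size and phenotypic homogeneity can be illustrated with a rough power calculation. The sketch below uses a normal approximation for a two-sided comparison of two proportions; every number in it (a marker carried by 30% of cases with one PTB phenotype, 20% of controls and of cases with other phenotypes, and the sample sizes) is hypothetical and chosen only to show the direction of the effect.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_cases, n_cases, p_controls, n_controls, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions."""
    se = sqrt(p_cases * (1 - p_cases) / n_cases
              + p_controls * (1 - p_controls) / n_controls)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z_effect = abs(p_cases - p_controls) / se
    # Probability of rejecting in either tail under the alternative.
    return NormalDist().cdf(z_effect - z_crit) + NormalDist().cdf(-z_effect - z_crit)


# Hypothetical scenario: the marker is associated with one phenotype only.
P_CONTROLS = 0.20
P_TRUE_PHENOTYPE = 0.30

# 1000 pooled cases, only half of which have the associated phenotype,
# so the marker frequency among pooled cases is diluted.
p_mixed = 0.5 * P_TRUE_PHENOTYPE + 0.5 * P_CONTROLS
power_mixed = power_two_proportions(p_mixed, 1000, P_CONTROLS, 1000)

# 500 cases restricted to the associated phenotype.
power_homogeneous = power_two_proportions(P_TRUE_PHENOTYPE, 500, P_CONTROLS, 1000)

print(f"pooled phenotypes: {power_mixed:.2f}")
print(f"single phenotype:  {power_homogeneous:.2f}")
```

Under these assumed values the restricted analysis retains higher approximate power despite using half as many cases, because the diluted difference in the pooled group shrinks faster than the standard error does.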

Sample Collection and Processing
Sample collection for "-omics" research is not without its own considerations, complications and detailed protocols. For PTB research, the biological sample collected is determined by the research question of interest; therefore a wide variety of fluids or tissues such as cervical mucus, blood, urine, saliva, vaginal discharge, myometrium, and uterine tissues are appropriate depending on the investigation. Collection of these specimens should be carefully planned a priori and the consistency of handling closely monitored to assure specimens are representative of the physiology rather than a reflection of ex vivo handling. To ensure samples are of maximum utility to individual and consortia investigations, extracted DNA, RNA, proteins or metabolites should be handled consistently. Table 1 provides an outline of approaches that can be utilized. The availability of biological specimens by itself is not sufficient for integrated "-omics" approaches to PTB research. Detailed documentation of the phenotypes is, as noted above, essential. Regardless of the classification system for PTB employed (Section 2), each sample should have the minimal dataset available (Table 2 [41]) and, if possible, the optimal dataset (Table 3 [41]), described originally for genetic epidemiology studies of PTB but which translate directly into integrated "-omics" approaches in general.
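One way a consortium might enforce a minimal dataset before pooling samples is a completeness check on each record. The sketch below is illustrative only: the field names in `REQUIRED_FIELDS` are hypothetical placeholders and are not the minimal dataset of Table 2, and the helper names `missing_fields` and `partition_for_pooling` are likewise assumptions.

```python
# Hypothetical field names chosen for illustration; they are NOT the
# minimal dataset of Table 2.
REQUIRED_FIELDS = (
    "site_id",
    "sample_type",
    "ptb_phenotype",
    "gestational_age_days",
    "minutes_to_freezing",
)


def missing_fields(record):
    """Return the required fields that are absent or empty in one record."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]


def partition_for_pooling(records):
    """Split records into those ready to pool and those needing follow-up."""
    ready, incomplete = [], []
    for record in records:
        gaps = missing_fields(record)
        if gaps:
            incomplete.append((record, gaps))
        else:
            ready.append(record)
    return ready, incomplete


example_records = [
    {"site_id": "A", "sample_type": "maternal blood",
     "ptb_phenotype": "spontaneous", "gestational_age_days": 230,
     "minutes_to_freezing": 20},
    {"site_id": "B", "sample_type": "urine", "ptb_phenotype": "",
     "gestational_age_days": 245},
]
ready, incomplete = partition_for_pooling(example_records)
```

Reporting the specific gaps per record, rather than silently dropping incomplete samples, lets the contributing site recover the missing phenotype or handling information where possible.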

Data Management - Integrative Databases
A major limitation to current progress in understanding the genetic predispositions to PTB is that only a limited number of genetic epidemiologic studies are available representing various ethnic/racial groups globally [49]. Therefore, there is a critical need for large and comprehensive clinical resources linked to biospecimen banks. At the level of individual investigators or small teams of researchers, clinical, environmental and biological data are continually being collected for studies with relatively small sample sizes. While it is possible to obtain high quality, mergeable data on large numbers of high-risk pregnancies, the use of this approach is limited (in part) by the absence of field standards and guidelines. Without these navigational beacons, the current use of inadequately sized cohorts or samples has resulted in inconsistent and possibly spurious initial findings for "-omics" results in PTB studies. This is likely due to the multi-genic/multifactorial origin of PTB where any given factor/gene may contribute at most a few percent to the phenotypic variation.
If consistency is present in sample and data collection, an integrated international dataset becomes possible and transparent. The creation of integrated databases that contain both clinical (phenotype) data and biospecimen data has two additional major benefits: access and dissemination. This will allow researchers across the globe to work synergistically to attempt to answer many of the unanswered questions about PTB utilizing adequate sample sizes and the latest developments in technology. Robust technology and analytical infrastructure is imperative to support the vast amounts of data generated.

Table 1 Suggested biological samples and sample handling considerations for each stream of "-omics" technology for the study of preterm birth. The optimal sample to collect should be determined by the driving research question for each investigation.
(Row heading and sample types not recovered): All samples require rapid preparation and preservation (+/- protease inhibitors) to prevent non-specific protein degradation.
Metabolomics (blood, amniotic fluid, urine): Samples require rapid metabolic "quenching" (e.g., flash freezing or acid precipitation) to prevent degradation of the metabolome.

International Consortia
PTB is a global problem with increasing rates in developed countries yet the vast majority of cases occur in the developing world (Table 4) [5]. International consortia are therefore needed to bring together resources, experts and data from low, middle and high-income countries to facilitate "-omics" research of PTB and to disseminate results to all who may benefit. An integrated "-omics" approach for PTB holds the potential for enormous scientific and, ultimately, clinical benefit. The ultimate goal of such research is the improvement of biological understanding leading to prevention, early diagnosis tools, and treatment for PTB and its associated outcomes.
There are currently consortia established to investigate genome wide associations with PTB. All of these consortia are limited in their sample sizes due to the costs of genotyping and it is likely that meta-analyses of these data will be required to make substantial advances in our understanding of the genetic contributions of PTB. Most international consortia have an organized structure including an executive that contains both consortia leadership and members from each of the individual studies or data collection sites contributing data to the consortia. Detailed memorandums of agreement are required between both participating universities and researchers to facilitate smooth working of these consortia.
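To make concrete how a meta-analysis combines evidence across consortia, the following minimal sketch pools per-study effect estimates for a single variant using fixed-effect inverse-variance weighting. This is one common approach and is an assumption here; the source does not specify a meta-analytic method, and the function name `fixed_effect_meta` and the input values are hypothetical.

```python
from math import sqrt


def fixed_effect_meta(estimates):
    """Pool (log_odds_ratio, standard_error) pairs by inverse-variance weighting.

    Returns the pooled log odds ratio and its standard error.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical per-consortium estimates for one variant.
study_estimates = [(0.20, 0.10), (0.20, 0.10)]
pooled_beta, pooled_se = fixed_effect_meta(study_estimates)
```

With two equally precise studies the pooled standard error shrinks by a factor of the square root of two, which is the sense in which combining consortia adds power. A fixed-effect model assumes a shared underlying effect; where populations or phenotype definitions differ, a random-effects model may be more appropriate.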
One example is the Preterm Birth Genome Project (PGP), a global consortium to study genetic predisposition in PTB. The PGP was initiated among PREBIC members in September 2007. This consortium includes investigators from four continents, not only from industrialized nations but also from low and middle-income countries in South America, Eastern Europe, Asia and Africa, and has established a memorandum of agreement to collaborate on GWAS by pooling resources (DNA) and establishing a database of phenotype definitions. The goals of the PGP consortium have been to 1) create a community of researchers to identify PTB susceptibility genes; 2) pool resources from multiple investigators to conduct GWAS across multiple geographic populations including detailed phenotypic and environmental data; 3) establish a large pool of replication samples; and 4) enable deep re-sequencing of genes with significant and/or interesting findings in GWAS. This consortium has been highly successful in both collecting resources (> 5000 cases, > 5000 controls) and also funding research into this rapidly evolving field.
A more recent consortium was established by the PREBIC biomarker working group based on a systematic review of SPTB biomarker literature published between 1965-2008. Due to heterogeneity among studies, including the issues of study design and phenotype definition detailed above and assay variability between different laboratories, no biomarker emerged as a risk predictor. The Preterm Birth Biomarker Project (PBP) has been set up to address these issues. This study will identify homogeneous studies/samples from around the globe to be tested on a panel of potential PTB biomarkers. Similarly, further consortia will be required to utilize "-omics" technology and a systems biology approach to study PTB; however, those in existence rarely have adequate samples amenable to multiple "-omics" analyses. We hope that this paper will motivate others to increase the variety of biological samples collected to better address the major hurdles to the study of PTB.

Table 4 The global burden of preterm birth varies by region with the highest rates occurring in developing regions. Adapted from Beck S, Wojdyla D, Say L, et al. The worldwide incidence of preterm birth: a systematic review of maternal mortality and morbidity. Bull World Health Organ;88: 31-8.

Translational Feasibility -Barriers and Constraints
Despite the increasing number of publications documenting the utilization of "-omics" technologies for PTB prediction or preterm labor diagnosis, translation into clinical utility is absent despite its continued promise. This apparent gap in knowledge translation reveals both barriers and constraints to applying insights gained from "-omics" investigations of complex diseases. The inconsistencies in defining PTB phenotypes, sample handling methods and environmental variables have, not surprisingly, made reproducibility of study findings nearly impossible. These mixed messages plague the PTB literature and limit the interpretation of "-omics" generated knowledge, hindering translational feasibility. This is of course not unique to PTB.
Genomic analyses of complex traits such as PTB implicitly and explicitly make assumptions regarding the nature of the risks conferred by genetic variants. The most important of these assumptions is that variants at the nucleotide level are linear (or nearly so) in terms of their effects on disease risk. Therefore, one can test for associations between single nucleotide polymorphisms (SNPs) and PTB with the expectation that the role of any given change is transparent to intermediate processes that are included in the central dogma of molecular biology and its correlates (DNA to RNA to protein; Figure 2). Specifically, a gene is transcribed and the mRNA translated in such a way that changes in base pair composition in a gene encoding a critical protein are easily detectable at the phenotypic level. Although this is a very powerful model and in general approximately true (especially for Mendelian disorders), recent research has indicated that this unidirectional process is not universal and many non-linear processes are part of the progression (Figure 4). The failure of the linear model has many implications for the genetic/genomic analyses of PTB. Foremost is the fact that any changes in the primary DNA sequences are not necessarily directly or easily translated into phenotypic changes. Instead, a large number of intermediate processes modulate the effects of DNA variation. Therefore, changes in the DNA may be difficult to detect using a simple association methodology even though they play a key role in disease etiologies. The goal therefore is to more completely model the overall process of gene to phenotype. As a field, we need to recognize that this approach in time will lead to clinical advances to better predict disease and to the design of more effective preventative strategies and treatment.
While it is universally accepted that translation is the goal, this is not a realistic deliverable or tangible aim of any single investigation or approach, although this is often implicitly promised.
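A toy calculation can show why a variant that matters biologically may be invisible to a simple single-SNP association test. In the sketch below, risk is elevated only when exactly one of two variants is carried, a deliberately extreme and purely hypothetical interaction; the risk values, the carrier frequency `P_CARRIER_B` and the assumption that the two loci are independent are all illustrative and are not drawn from PTB data.

```python
P_CARRIER_B = 0.5  # hypothetical carrier frequency at locus B


def risk(a, b):
    """Hypothetical PTB risk: elevated only when exactly one variant is carried."""
    return 0.20 if a != b else 0.10


def marginal_risk_a(a):
    """Risk given carrier status at locus A, averaged over locus B.

    Assumes the two loci are independent in the population.
    """
    return (1 - P_CARRIER_B) * risk(a, 0) + P_CARRIER_B * risk(a, 1)


# Effect of locus A when tested on its own (single-SNP view).
marginal_difference = marginal_risk_a(1) - marginal_risk_a(0)

# Effect of locus B among carriers of A (joint view).
joint_difference = risk(1, 0) - risk(1, 1)
```

Here the single-locus comparison shows no difference in risk, while the joint comparison shows that risk doubles. Real interactions are unlikely to be this clean, but the example illustrates how non-linear effects can be missed by tests that assume each variant acts additively.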
The now clear need to link data sets from multiple studies is pivotal to the progress of PTB research. However, the goal of integrated international consortia using "-omics" generated data will further complicate translational feasibility if ethical considerations are not addressed proactively at the individual study and consortia levels. When conducting integrated "-omics" investigations, ethics boards, participants and researchers face new challenges beyond that of the complexity of the huge data sets produced. The most significant of these include: (1) ensuring robust informed consent and community engagement, (2) dissemination or sharing of research results or other benefits, and (3) sensitivity to the special concerns of participants, typically pregnant women.
First, consortia will need to address a recurring question: "how can robust informed consent be sought from participants?" [50][51][52]. Because the details of future joint ventures are often unknown or impossible to anticipate, obtaining robust informed consent from participants at the time of enrolment to share samples with international consortia is difficult. Similarly, ethics committees face challenges during review, as they cannot evaluate each unanticipated use of data. Participants themselves may be hesitant to partake in studies when the destiny of their samples and information is unknown. The de-identification of biological samples and clinical information is designed to protect the confidentiality of individual participants. However, exactly what information collected by the original study team would be required by a secondary investigator to merge datasets? This may result in only the partial de-identification of participants. Therefore each study design must consider and clarify during the process of informed consent whether consent to share information internationally is optional or required for participation, a choice which may introduce participant bias into study populations. Furthermore, when samples and data are shared internationally, it becomes unclear who bears the onus to maintain the security of the integrated databases and storage of specimens. The security of internationally shared information raises significant ethical and legal considerations.
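One technical option for reconciling de-identification with the need to merge datasets is keyed pseudonymization, in which each contributing site replaces its participant identifiers with a keyed hash before sharing. The sketch below uses HMAC-SHA256 from the Python standard library; it is offered as an assumption about how this could be done, not as a practice described in the source, and the function name `pseudonymize`, the key and the identifiers are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(participant_id: str, site_key: bytes) -> str:
    """Return a stable pseudonym for a participant identifier.

    The same identifier and key always give the same pseudonym, so records
    from one site can be linked across data releases. Without the key, which
    stays with the original study team, the pseudonym cannot be recomputed
    from a guessed identifier.
    """
    digest = hmac.new(site_key, participant_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical key and identifiers, for illustration only.
SITE_KEY = b"example-key-held-by-the-original-site"
pseudonym_a = pseudonymize("participant-0001", SITE_KEY)
pseudonym_b = pseudonymize("participant-0002", SITE_KEY)
```

Pseudonymizing identifiers does not by itself make a dataset anonymous, since detailed clinical or genomic data may still allow re-identification; it addresses only the linkage key, and the consent and governance questions raised above remain.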
The second challenge surrounds the dissemination of research findings. Linking and integrating large data sets and their associated biospecimen banks is not inherently straightforward, nor is the dissemination of knowledge generated back to the original communities. These ethical challenges are not unique to PTB research but rather impact biobanking and international consortia efforts in all fields; as such, models and lessons developed in relation to cancer research, for example, may be tailored to PTB research. In regard to participants, consortia composed of representatives from each of the contributing data pools may facilitate the dissemination of study results to their respective participants while still maintaining subject privacy at the consortia level. This would also enable the channeling of information to the original investigators and local communities who supported the primary data and sample collection. The obligation to return consortia generated results to participants will depend upon the scope and duration of the relationship between the consortia, investigators and participants and therefore may not be possible in all cases. Sharing aggregate consortia generated results may be facilitated by password-protected web-based research updates and newsletters, to keep project level investigators and participants aware of ongoing aggregate findings. These can include contact information for participants and researchers with questions about the studies and aggregate findings. Such sites can also serve as a place for posting educational information about healthy pregnancies, child development or parenting strategies. Because of the psychological burden that attends PTB for women and parents globally, international consortia may be in a position to facilitate social networking among participants or communities by allowing voluntary anonymous participant-participant communication through these websites.
This is a way to engage participants in long-term studies and to provide benefit, when significant clinical findings (and therefore direct benefit) for individual participants are not expected.
Third, PTB research also needs to consider the specific expectations and experience of participants [53]. Women or couples who have suffered through one or more preterm births, pregnant women who have experienced a prior preterm or stillbirth, or have a family history of PTB, will likely experience heightened anxiety about their pregnancy that should be taken into consideration during the recruitment process [53]. Similarly, such women may have an expectation that by participating in research, they will "find a cure" to prevent preterm delivery in this pregnancy or a subsequent pregnancy [54]. Attention to these sensitive issues should shape the informed consent process and be considered by consortia utilizing data and specimens collected from these women. As integrated datasets come to the forefront of PTB research, the investments and interests of the participants, local investigators, local communities and international research communities cannot be forgotten. International consortia may be positioned to best preserve these interests by facilitating shared ethical practices rather than leaving each investigator or institution to wrestle with these issues on their own.

Conclusion
The "-omics" era presents an exciting time for PTB research. Opportunities now exist to address complex biology utilizing technology that can achieve in a matter of hours what once took many years, if possible at all. Although the "-omics" revolution has promise, there are important limitations and constraints to these approaches that cannot go unnoticed. Critical needs at the current time include: 1) improved phenotyping for PTB; 2) large and well-characterized case and control samples with DNA; 3) one or more genome-wide association studies for PTB with broad replication across different populations; and 4) an international consortium for PTB "-omics". Only through the use of multicenter collaborations, careful, detailed phenotyping, specific and consistent sample collections, integrated systems biology approaches and the shedding of simplistic assumptions of the gene-to-phenotype cascade will "-omics" technologies be able to provide new insights into the complex pathophysiology of PTB. The possibilities are within reach and consortia may offer the answer to data management. These guidelines for research provide the direction necessary to harness the promises of "-omics" technologies for advances in the understanding, treatment and prevention of PTB.