Using deep learning to predict the outcome of live birth from more than 10,000 embryo data
BMC Pregnancy and Childbirth volume 22, Article number: 36 (2022)
Recently, the combination of deep learning and time-lapse imaging has offered an objective, standardized and scientific approach to embryo selection. However, previously reported studies used blastocyst formation or clinical pregnancy as the end point. To the best of our knowledge, no predictive model has used live birth as the predictive end point. Can a deep learning model predict the probability of live birth from time-lapse system data?
This study retrospectively analyzed the time-lapse data and live birth outcomes of embryo samples from January 2018 to November 2019. We used the SGD optimizer with an initial learning rate of 0.025 and a cosine learning rate decay strategy. The network was randomly initialized and trained from scratch for 200 epochs. The model was evaluated quantitatively with a hold-out test and a 5-fold cross-validation, using the average area under the receiver operating characteristic (ROC) curve (AUC).
The deep learning model was able to predict live birth outcomes from time-lapse images with an AUC of 0.968 in 5-fold stratified cross-validation.
This research reports a deep learning model that predicts the live birth outcome of a single blastocyst transfer. The model automatically analyzes time-lapse images of a patient's embryos without manual embryo annotation or evaluation, assigns each embryo a live birth prediction score, and ranks the embryos by that score.
Background
Since the birth of Louise Brown, the first test-tube baby, more than seven million babies have been born worldwide thanks to assisted reproductive technology (ART). In the early stage of IVF development, multiple embryo transfer was the main transfer method. However, multiple pregnancy is often accompanied by premature delivery, higher costs and a higher risk of complications [3,4,5,6]. As assisted reproductive technology has developed, single embryo transfer has therefore gradually become the first choice in IVF. Single embryo transfer still faces an urgent problem: how to choose the best embryo so that a satisfactory success rate is maintained. The shift toward single embryo transfer is closely tied to progress in embryo selection technology, which makes embryo identification and selection particularly important. To address this problem, researchers have developed several methods for identifying and selecting the best embryos for transfer, such as blastocyst culture, time-lapse imaging systems and preimplantation genetic testing [8,9,10].
Before time-lapse imaging systems were introduced into the clinic, embryologists observed and evaluated embryos under an optical microscope. To do this, they removed the embryos from a conventional incubator at specific time points during the first 5 days of development. Because of this limitation, many events in embryonic development were missed. Time-lapse imaging technology addresses this shortcoming.
With a time-lapse imaging system, embryologists can observe and evaluate embryos in a stable environment instead of exposing them to variable conditions such as changes in gas composition, humidity, temperature and movement. They can also obtain rich information about embryo development over time and about embryo potential [13, 14].
To extract more information from the images obtained by time-lapse (TL) systems, researchers have introduced artificial intelligence (AI) techniques into ART, which may trigger a revolution in the field. AI is an umbrella term covering many areas, such as artificial neural networks (ANN), fuzzy logic, genetic algorithms (GA), machine learning and deep learning [15, 16].
Time-lapse incubation makes it possible to record the complete development of an embryo from fertilization to the blastocyst stage, a period during which all morphokinetic features emerge. Because it produces abundant data, time-lapse incubation has also opened up many new research directions in combination with deep learning, a data-driven method. Deep learning can uncover numerous subtle features that may escape manual attention but still help the corresponding classification or prediction. When fed with enough well-labeled data, a deep learning model can learn an effective representation of the dataset through repeated back-propagation. We can thus explore the general patterns that map the data to the desired task.
Deep learning studies reported so far on embryo selection have used blastocyst formation or clinical pregnancy as the end point. To the best of our knowledge, no deep learning model has been designed with live birth as the end point. In this study, we analyzed single-center data from a large sample of single blastocyst transfers to build an efficient predictive model.
Materials and methods
This was a noninterventional, retrospective, single-center cohort study of patients undergoing routine care. To reflect the broad range of patients typically encountered in clinical practice, no inclusion or exclusion criteria were applied to baseline characteristics. The time-lapse embryo data were collected at the Reproductive Medicine Center of Tongji Hospital, Huazhong University of Science and Technology, Wuhan, China. The full dataset contains 33,738 embryo samples captured by the Embryoscope Plus time-lapse microscope system. These embryos were fertilized between January 2018 and November 2019. Patients were followed up until January 2021 to confirm whether the IVF treatments resulted in live birth. All patients signed written informed consent and underwent routine clinical treatment at our center. No additional intervention was performed.
The study conformed to the Declaration of Helsinki for Medical Research involving Human Subjects. It was approved by the Ethical Committee of Reproductive Medicine Center, Tongji Hospital, Tongji Medicine College, Huazhong University of Science and Technology.
The outcome classification of each embryo is shown in Table 1, and the final indicator was live birth. The full dataset contained 33,738 embryos labeled positive, negative or pending, as shown in Fig. 1. Pending embryos were embryos that had not yet been thawed. They could be used in future work but were excluded from the experiments in this paper. In addition, only embryos from single blastocyst transfers were included, covering both fresh and frozen-thawed cycles. The dataset used in this paper therefore contained 15,434 embryos with positive or negative labels.
Embryo culture and frozen-embryo transfer (FET)
The methods used for sperm preparation, IVF and embryo culture have been described previously. Briefly, semen was collected by masturbation into sterile containers after 3–5 days of sexual abstinence and kept at 37 °C for 30 min. After liquefaction, samples were analyzed for sperm concentration, motility and morphology according to World Health Organization criteria. Oocytes were incubated in G-IVF medium (Vitrolife) and inseminated 3 to 4 h after retrieval. Normal fertilization was defined as a zygote with two pronuclei (2PN), and fertilized oocytes were cultured in G1 medium for 2 more days. The embryos were then transferred to G2 medium and cultured for 3 more days. Surplus good-quality blastocysts were cryopreserved for subsequent frozen-embryo transfer (FET) cycles.

In FET cycles, oral estradiol (Progynova, Bayer) was given at 2 mg/d on cycle days 1–4, 4 mg/d on days 5–8 and 6 mg/d on days 9–12. From day 13, transvaginal ultrasound was performed to assess endometrial thickness and ovulation, and the estradiol dose was adjusted according to endometrial thickness. Intramuscular progesterone (40 mg) was given once the endometrium reached a thickness of 8 mm or its maximum thickness. This was followed by 60–80 mg of progesterone for the next 5 days. Blastocyst transfer was performed on day 6, after 5 days of progesterone administration.
Serum hCG was measured 2 weeks after embryo transfer to diagnose pregnancy and then tested serially to monitor rising titers. Clinical pregnancy was defined as the presence of a gestational sac with fetal heart activity on ultrasound examination 5 weeks after oocyte retrieval. Live birth outcome data were obtained by telephone interviews with the parents after delivery.
Deep learning model
In this work, we designed an end-to-end deep learning model to predict the probability of live birth. Each embryo sample was labeled 1 if the transfer resulted in a live birth and 0 otherwise. Supervised by these ground-truth labels, the network outputs a continuous prediction value between 0 and 1.
The network consists of seven convolution modules and two fully connected layers. The first module contains three convolution blocks, each combining a convolution layer, a batch normalization layer and a ReLU (rectified linear unit) activation. The residual block proposed in ResNet has proved effective in numerous classification tasks. The six subsequent convolution modules therefore share the same architecture: three basic residual blocks followed by a convolution block. Feature maps are downsampled at the last convolution block of each module. The whole network can thus be described as a ResNet-like network, as shown in Table 2, although the number of modules differs from that of the benchmark structure. Our model is also considerably more complex than the benchmark model, mainly in the number of convolution kernels.
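The module layout described above can be sketched in PyTorch as follows. This is a minimal, hypothetical reconstruction, not the authors' code. Table 2 is not reproduced here, so the following are all assumptions:
- the class name LiveBirthNet;
- the channel widths;
- the single-channel (grayscale) input;
- the 3 × 3 kernels;
- downsampling by a stride-2 convolution;
- global average pooling before the head;
- the hidden size of the two fully connected layers.

```python
import torch
import torch.nn as nn


def conv_block(in_ch: int, out_ch: int, stride: int = 1) -> nn.Sequential:
    """Convolution + batch normalization + ReLU (one "convolution block")."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class BasicResidualBlock(nn.Module):
    """ResNet-style basic block: two 3x3 convolutions plus an identity shortcut."""

    def __init__(self, ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(ch),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + x)


class LiveBirthNet(nn.Module):
    """Hypothetical ResNet-like network: 7 convolution modules + 2 FC layers."""

    # Assumed widths; the real values are given in the paper's Table 2.
    def __init__(self, in_ch=1, widths=(32, 64, 64, 128, 128, 256, 256), hidden=128):
        super().__init__()
        # Module 1: three convolution blocks; the last one downsamples.
        modules = [nn.Sequential(
            conv_block(in_ch, widths[0]),
            conv_block(widths[0], widths[0]),
            conv_block(widths[0], widths[0], stride=2),
        )]
        # Modules 2-7: three basic residual blocks, then a downsampling conv block.
        for prev, ch in zip(widths[:-1], widths[1:]):
            modules.append(nn.Sequential(
                BasicResidualBlock(prev),
                BasicResidualBlock(prev),
                BasicResidualBlock(prev),
                conv_block(prev, ch, stride=2),
            ))
        self.features = nn.Sequential(*modules)
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Two fully connected layers ending in a single live-birth logit.
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(widths[-1], hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.head(self.pool(self.features(x)))


model = LiveBirthNet().eval()
with torch.no_grad():
    logits = model(torch.zeros(2, 1, 224, 224))  # two dummy 224 x 224 images
    scores = torch.sigmoid(logits)               # live birth scores in (0, 1)
```

With a 224 × 224 input, the seven stride-2 stages shrink the feature map to 2 × 2 before pooling. The network returns one logit per embryo, and `torch.sigmoid` turns it into the 0–1 score. During training, `nn.BCEWithLogitsLoss` combines that sigmoid with the BCE loss in a numerically stable way.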
We used binary cross-entropy (BCE) loss as the loss function to guide back-propagation during training. The loss measures the distance between the output predictions and the target labels, so training aims to minimize its value.
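For N embryos with labels y ∈ {0, 1} and predicted probabilities p, the BCE loss is L = −(1/N) Σ [y log p + (1 − y) log(1 − p)]. The plain-Python sketch below computes this quantity, which corresponds to the mean reduction of PyTorch's `nn.BCELoss`. The numerically stable sigmoid helper and the clamping constant `eps` are illustrative choices, not details taken from the paper.

```python
import math


def sigmoid(z):
    """Map a raw network output (logit) to a probability in (0, 1), stably."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so exp(z) cannot overflow
    return e / (1.0 + e)


def bce_loss(probs, labels, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


# Two embryos: a live birth scored 0.9 and a non-live-birth scored 0.2.
print(bce_loss([0.9, 0.2], [1, 0]))  # ≈ 0.164
```

Confident, correct predictions give a small loss, and confident wrong predictions are penalized heavily. Minimizing this value therefore pushes the scores of live-birth embryos toward 1 and the others toward 0.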
To counter the severe imbalance between positive and negative samples, we took the following measures during training. In the cross-validation experiment, data augmentation was performed after splitting the dataset according to Table 3.

First, we applied extensive data augmentation, including affine transformations and random coarse dropout. The affine transformations were flipping, translation, rotation and scaling, each applied at random with a probability of 50%. Coarse dropout randomly drops local pixel regions and paints them solid black; its probability was set between 2 and 5%.

Second, we oversampled the positive samples by a fixed multiple equal to the ratio between the two classes, i.e., sixteen in our experiments.

The original images captured by the time-lapse incubator are 800 × 800 pixels and were resized to 224 × 224 for network training after data augmentation.
We used the SGD optimizer with an initial learning rate of 0.025 and a cosine learning rate decay strategy. The network was randomly initialized and trained from scratch for 200 epochs.
The model was evaluated quantitatively with 5-fold cross-validation, using the average area under the receiver operating characteristic (ROC) curve (AUC).
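The oversampling step can be illustrated with a short sketch. It assumes that positives are the minority class and that the multiple is the rounded negative-to-positive ratio of the training split. The function name `oversample_positives` and the fixed shuffle seed are hypothetical. Because the text states that augmentation happens after splitting, the sketch is meant to be applied to the training portion only, so duplicated positives never leak into a validation fold.

```python
import random


def oversample_positives(samples, labels, seed=0):
    """Repeat every positive sample `multiple` times to balance the classes.

    `multiple` is the rounded negative-to-positive ratio of the given
    (training) split; the result is shuffled with a fixed seed.
    """
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]
    if not pos:
        raise ValueError("training split contains no positive samples")
    multiple = max(1, round(len(neg) / len(pos)))
    balanced = [(s, 1) for s in pos] * multiple + [(s, 0) for s in neg]
    random.Random(seed).shuffle(balanced)
    return balanced, multiple


# Toy split: 10 positives and 160 negatives -> positives repeated 16 times.
balanced, multiple = oversample_positives(list(range(170)), [1] * 10 + [0] * 160)
print(multiple, len(balanced))
```

Each repeated positive would then pass through the random affine and coarse-dropout augmentations independently. As a result, the copies are not pixel-identical when the network sees them.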
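The cosine schedule follows η_t = η_min + ½(η_0 − η_min)(1 + cos(π t / T)), here with η_0 = 0.025 and T = 200 epochs. The minimum rate η_min = 0 and per-epoch stepping are assumptions, since the text does not specify them. The same holds for the SGD momentum and weight decay, which the sketch does not model. In PyTorch, this schedule corresponds to `torch.optim.SGD(model.parameters(), lr=0.025)` combined with `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)`. The plain-Python function below just makes the formula concrete.

```python
import math


def cosine_lr(epoch, total_epochs=200, base_lr=0.025, min_lr=0.0):
    """Cosine-annealed learning rate for 0 <= epoch <= total_epochs (assumed per-epoch steps)."""
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for epoch in (0, 50, 100, 150, 199):
    print(epoch, round(cosine_lr(epoch), 5))
```

The rate starts at 0.025, is halved at the midpoint (epoch 100), and decays smoothly toward zero. This lets the randomly initialized network take large steps early and fine-tune late in training.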
The ROC curve connects the points given by the true positive rate and the false positive rate at every possible threshold, where the threshold is the cut-off value separating predicted positives from predicted negatives. The true positive rate and false positive rate trade off against each other as the threshold changes. Discriminating power can therefore be quantified by the area under this curve, the so-called AUC. A perfect binary classifier has an AUC of 1, whereas a classifier that guesses at random has an AUC of about 0.5; a higher AUC indicates better performance. AUC is more appropriate than accuracy, especially in classification tasks with imbalanced data.
To evaluate the model comprehensively, we performed both a hold-out test and a 5-fold cross-validation.

In the hold-out test, also called the train-validation-test approach, the dataset was randomly split in a 5:1:1 ratio into training, validation and test sets.

In the cross-validation, the data were randomly divided into five equal-sized parts, each with the same proportion of positive and negative samples, and five models were trained. In each case, one subset was used for validation while the remaining four served as the training set. The mean AUC over the five folds was then used to evaluate performance on the whole dataset. Compared with a single hold-out test, cross-validation reduces the possible overestimation or underestimation caused by an unfavorable split of the samples.
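A useful equivalent view of the AUC is that it equals the probability that a randomly chosen positive embryo receives a higher score than a randomly chosen negative one, with ties counted as one half (the Mann–Whitney statistic). The sketch below computes it that way in plain Python. The function name `roc_auc` is illustrative; `sklearn.metrics.roc_auc_score` returns the same value.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative).

    Ties count as half a correct ranking. This O(P*N) pairwise form is
    fine for illustration; rank-based versions scale better.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # perfect ranking -> 1.0
print(roc_auc([1, 1, 0, 0], [0.9, 0.2, 0.5, 0.1]))  # 3 of 4 pairs correct -> 0.75
```

The AUC depends only on how embryos are ranked, not on any single threshold or on the class proportions. This is why it suits an imbalanced task in which the model's main use is to order a patient's embryos.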
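Both splitting schemes can be sketched in plain Python. The stratified folds are built by shuffling each class separately and dealing its indices round-robin into five folds, so every fold keeps roughly the same positive-to-negative proportion. The hold-out split is a plain random 5:1:1 partition, as the text describes. The function names and seeds are hypothetical. `sklearn.model_selection.StratifiedKFold(n_splits=5, shuffle=True)` is a library alternative for the first function.

```python
import random


def stratified_kfold(labels, k=5, seed=0):
    """Return k index lists whose class proportions are (nearly) equal."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal this class's indices round-robin
    return folds


def holdout_split(n, ratios=(5, 1, 1), seed=0):
    """Randomly split range(n) into train/validation/test index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


labels = [1] * 20 + [0] * 80
folds = stratified_kfold(labels)
for i, val_idx in enumerate(folds):
    train_idx = [j for f in folds if f is not val_idx for j in f]
    # Train one model on train_idx, score val_idx with AUC, then average the 5 AUCs.
    print(i, len(train_idx), len(val_idx), sum(labels[j] for j in val_idx))
```

Each sample appears in exactly one validation fold. The mean of the five fold AUCs therefore uses every embryo once for evaluation, which is what makes it less sensitive to a lucky or unlucky single split.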
Results
From January 2018 to November 2019, a total of 5913 cycles used the time-lapse culture system. Some of these patients did not undergo a fresh-cycle transfer, and their embryos have not yet been thawed. In the end, 3382 fresh cycles and 3270 frozen-thawed transfer (FET) cycles were included in the study, and 33,738 embryo samples were analyzed. Baseline characteristics of the included patients are shown in Table 4.
The ROC analysis is shown in Fig. 2. The resulting AUC for predicting live birth on the testing dataset was 0.968.
Table 3 shows the results of the 5-fold cross-validation. The average AUC was 0.968, and the AUC was reproducible across the individual train-validation runs.
The hold-out test yielded an AUC of 0.957 on the test set, comparable with the 5-fold cross-validation result.
Discussion
This study is a preliminary investigation of deep learning with live birth in the IVF cycle as the end point. Our results show that time-lapse images can be combined with deep learning technology for clinical applications.
Morales et al., Xu et al. and Santos Filho et al. used static images to assess embryo quality or select the best embryos for transfer, without early embryo development data. These methods therefore lack the support of more comprehensive data.
Dirvanauskas et al. used a convolutional neural network (CNN) to predict the developmental stage of embryos by analyzing images obtained from a time-lapse system, with a success rate of 97.62%. However, this method cannot predict pregnancy. Khosravi et al. developed a framework (STORK) based on Google's Inception model to predict embryo quality, with an AUC as high as 0.98. That study has a large sample size, a complex model and high accuracy, but it cannot be used to predict live birth. Our model performed better than the existing benchmark model, but it still needs optimization because of its high complexity. Such a complex network requires considerable computing resources and therefore depends heavily on hardware.
A recent report also built a predictive model for blastocyst transfer. The authors analyzed data from more than 10,000 embryos and obtained a predictive model with an AUC of 0.93. However, the predictive end point of that study was clinical pregnancy, which is the most prominent difference from our study. Aiming for the best predictive effect, we chose the final live birth outcome as the end point for predicting the success of blastocyst transfer.
Clearly, no single method can solve all the problems in assisted reproduction, and different methods have their own research focuses. The model we developed is very complex and achieves high accuracy. The study includes a large sample, and the database covers patients and clinical protocols under various conditions. The results are repeatable and have considerable clinical guidance value.

However, our input consists of embryo images captured 105 to 125 h after fertilization rather than full time-lapse videos, so it lacks early embryo development data. Generalizing this model to prediction from day-3 embryos would require further refinement. In principle, using the entire video as input captures more spatiotemporal features. In practice, however, we found that using the whole video instead of the blastocyst frames did not noticeably improve predictive power. With video input, the number of model parameters is severely restricted by machine capacity.
There is no clear evidence that AI applied to IVF increases the cumulative success rate [28, 29]. Whether a patient finally gives birth to a healthy baby depends not only on the embryo itself but also on the patient's health, age, reproductive history, clinical protocol and many other factors. Our deep learning model does not include these variables, which is a direction for future work.

Notably, the live birth rate in this study was high (45.6%). Age and ovarian reserve are key factors determining the clinical pregnancy and live birth rates of IVF. This high live birth rate may be related to the young study population (mean age 30.4 years) and good ovarian reserve (mean of 12.9 oocytes retrieved).
In 2019, an important study of AI-assisted embryo selection retrospectively analyzed time-lapse videos and clinical outcomes of 10,638 embryos from eight IVF clinics. The deep learning model it reported predicted fetal heart pregnancy from time-lapse videos with an AUC of 0.93. Our research differs in two respects:
- It is a single-center study. Analyzing a large sample from one center avoids the influence of differing embryo handling procedures and embryo culture systems.
- We used live birth directly as the label of the deep learning model. This excludes the false positives that a pregnancy-based label would assign to embryos whose pregnancies ended in miscarriage.
Another limitation of this study is that all samples come from blastocyst transfers; no model was designed for cleavage-stage embryo transfer. We did try deep learning for evaluating cleavage-stage embryos, but the results were unsatisfactory. This may be one reason why no model has been reported for predicting the outcome of cleavage-stage embryos [27, 31, 32].
Conclusions
In conclusion, this deep learning model has good predictive value for embryo selection. It can help embryologists choose the best embryos for transfer, freezing and thawing, and can shorten the time from embryo transfer to parenthood for patients.
Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Steptoe PC, Edwards RG. Birth after the reimplantation of a human embryo. Lancet. 1978;2(8085):366.
Tiitinen A. Single embryo transfer: why and how to identify the embryo with the best developmental potential. Best Pract Res Clin Endocrinol Metab. 2019;33(1):77–88.
Pinborg A, Wennerholm UB, Romundstad LB, Loft A, Aittomaki K, Söderström-Anttila V, et al. Why do singletons conceived after assisted reproduction technology have adverse perinatal outcome? Systematic review and meta-analysis. Hum Reprod Update. 2013;19(2):87–104.
Tiitinen A. Prevention of multiple pregnancies in infertility treatment. Best Pract Res Clin Obstetr Gynaecol. 2012;26(6):829–40.
Bergh C, Wennerholm UB. Obstetric outcome and long-term follow up of children conceived through assisted reproduction. Best Pract Res Clin Obstetr Gynaecol. 2012;26(6):841–52.
Ledger WL, Anumba D, Marlow N, Thomas CM, Wilson EC. The costs to the NHS of multiple births after IVF treatment in the UK. BJOG. 2006;113(1):21–5.
Aparicio B, Cruz M, Meseguer M. Is morphokinetic analysis the answer? Reprod BioMed Online. 2013;27(6):654–63.
Gallego RD, Remohi J, Meseguer M. Time-lapse imaging: the state of the art. Biol Reprod. 2019;101(6):1146–54.
Sutherland K, Leitch J, Lyall H, Woodward BJ. Time-lapse imaging of inner cell mass splitting with monochorionic triamniotic triplets after elective single embryo transfer: a case report. Reprod BioMed Online. 2019;38(4):491–6.
Ebner T, Sesli O, Kresic S, Enengl S, Stoiber B, Reiter E, et al. Time-lapse imaging of cytoplasmic strings at the blastocyst stage suggests their association with spontaneous blastocoel collapse. Reprod BioMed Online. 2020;40(2):191–9.
Bormann CL, Thirumalaraju P, Kanakasabapathy MK, Kandula H, Souter I, Dimitriadis I, et al. Consistency and objectivity of automated embryo assessments using deep neural networks. Fertil Steril. 2020;113(4):781–7 e781.
Rubio I, Kuhlmann R, Agerholm I, Kirk J, Herrero J, Escribá MJ, et al. Limited implantation success of direct-cleaved human zygotes: a time-lapse study. Fertil Steril. 2012;98(6):1458–63.
Meseguer M. Time-lapse: the remaining questions to be answered. Fertil Steril. 2016;105(2):295–6.
Kirkegaard K, Ahlström A, Ingerslev HJ, Hardarson T. Choosing the best embryo by time lapse versus standard morphology. Fertil Steril. 2015;103(2):323–32.
Fernandez EI, Ferreira AS, Cecilio MHM, Cheles DS, de Souza RCM, Nogueira MFG, et al. Artificial intelligence in the IVF laboratory: overview through the application of different types of algorithms for the classification of reproductive data. J Assist Reprod Genet. 2020;37:2359–76.
Curchoe CL, Bormann CL. Artificial intelligence and machine learning for human reproduction and embryology presented at ASRM and ESHRE 2018. J Assist Reprod Genet. 2019;36(4):591–600.
Bori L, Paya E, Alegre L, Viloria TA, Remohi JA, Naranjo V, et al. Novel and conventional embryo parameters as input data for artificial neural networks: an artificial intelligence model applied for prediction of the implantation potential. Fertil Steril. 2020;114(6):1232–41.
Huang B, Qian K, Li Z, Yue J, Yang W, Zhu G, et al. Neonatal outcomes after early rescue intracytoplasmic sperm injection: an analysis of a 5-year period. Fertil Steril. 2015;103(6):1432–7 e1431.
Zhu L, Xi Q, Zhang H, Li Y, Ai J, Jin L. Blastocyst culture and cryopreservation to optimize clinical outcomes of warming cycles. Reprod BioMed Online. 2013;27(2):154–60.
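He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016. p. 770–8.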
Vabalas A, Gowen E, Poliakoff E, Casson AJ. Machine learning algorithm validation with a limited sample size. PLoS One. 2019;14(11):e0224365.
Morales DA, Bengoetxea E, Larrañaga P. Selection of human embryos for transfer by Bayesian classifiers. Comput Biol Med. 2008;38(11–12):1177–86.
Xu L, Wei X, Yin Y, Wang W, Zhou M. Automatic classification of human embryo microscope images based on LBP feature. In: Chinese Conference on Image & Graphics Technologies, vol. 2014; 2014.
Santos Filho E, Noble JA, Poli M, Griffiths T, Emerson G, Wells D. A method for semi-automatic grading of human blastocyst microscope images. Hum Reprod. 2012;27(9):2641–8.
Dirvanauskas D, Maskeliunas R, Raudonis V, Damasevicius R. Embryo development stage prediction algorithm for automated time lapse incubators. Comput Methods Prog Biomed. 2019;177:161–74.
Khosravi P, Kazemi E, Zhan Q, Malmsten JE, Toschi M, Zisimopoulos P, et al. Deep learning enables robust assessment and selection of human blastocysts after in vitro fertilization. NPJ Digit Med. 2019;2:21.
Chavez-Badiola A, Mendizabal-Ruiz G, Flores-Saiffe FA, Garcia-Sanchez R, Drakeley AJ. Deep learning as a predictive tool for fetal heart pregnancy following time-lapse incubation and blastocyst transfer. Hum Reprod. 2020;2:2.
Ahlstrom A, Park H, Bergh C, Selleskog U, Lundin K. Conventional morphology performs better than morphokinetics for prediction of live birth after day 2 transfer. Reprod BioMed Online. 2016;33(1):61–70.
Goodman LR, Goldberg J, Falcone T, Austin C, Desai N. Does the addition of time-lapse morphokinetics in the selection of embryos for transfer improve pregnancy rates? A randomized controlled trial. Fertil Steril. 2016;105(2):275–285.e210.
Aboulghar MM, El-Faissal Y, Kamel A, Mansour R, Serour G, Aboulghar M, et al. The effect of early administration of rectal progesterone in IVF/ICSI twin pregnancies on the preterm birth rate: a randomized trial. BMC Pregnancy Childbirth. 2020;20(1):351.
Tran D, Cooke S, Illingworth PJ, Gardner DK. Deep learning as a predictive tool for fetal heart pregnancy following time-lapse incubation and blastocyst transfer. Hum Reprod. 2019;34(6):1011–8.
Babayev E, Feinberg EC. Embryo through the lens: from time-lapse cinematography to artificial intelligence. Fertil Steril. 2020;113(2):342–3.
The authors would like to thank the nurses and doctors who helped to collect and organize the data. We thank Dr. Zhou Li for his suggestions on the experimental design of this article, and Dr. Xinling Ren, Dr. Li Wu and Dr. Lixia Zhu for their suggestions on writing it.
This work was supported by the National Natural Science Foundation of China (81801531).
Ethics approval and consent to participate
The study conformed to the Declaration of Helsinki for Medical Research involving Human Subjects. All patients signed written informed consent and underwent the routine clinical treatment performed in our center. No additional intervention was performed. It was approved by the Ethical Committee of Reproductive Medicine Center, Tongji Hospital, Tongji Medicine College, Huazhong University of Science and Technology (No. S097).
Consent for publication
Competing interests
The authors report no financial or commercial conflicts of interest.
About this article
Cite this article
Huang, B., Zheng, S., Ma, B. et al. Using deep learning to predict the outcome of live birth from more than 10,000 embryo data. BMC Pregnancy Childbirth 22, 36 (2022). https://doi.org/10.1186/s12884-021-04373-5
- Time-lapse microscopy
- Embryo development
- Embryo quality