Introduction
Remarkable efforts have been put forward to discover prominent non-invasive molecular biomarkers for determining embryo ploidy status. Non-invasive assessments of embryo ploidy status are desirable to replace the need for embryo biopsy as the standard sample collection procedure for chromosomal analysis in preimplantation genetic testing for aneuploidy (PGT-A). Current attempts at non-invasive preimplantation genetic testing for aneuploidy (niPGT-A) through biomarkers identified in spent embryo culture media have garnered the attention of several IVF experts (1-3). Concordance between the two methods (PGT-A and niPGT-A) has improved over time, but implementation of niPGT-A for routine PGT testing in IVF remains under investigation. Concurrently, embryo morphokinetic parameters (4-8), specific metabolomics (9), proteomics (10), and artificial intelligence-based image analysis (11-14) have exhibited clinical value that could potentially define embryo ploidy without invasive intervention. Among these approaches, developing an AI-based model that predicts IVF outcomes is the most imminent route to niPGT-A.
Widespread utilization of embryo images to develop AI-based prediction models has been observed, and commercialized products are already available (15). In the broad sense of AI, blastocyst images may contain essential information that cannot be discerned by the naked eye. Through embryo image processing, the availability of such information, which may encompass intelligence on embryo viability (13, 16) or ploidy status classification (17, 18), is explored. Different types of inputs (training images) have been utilized in the current literature to develop AI models that predict specific outcomes. Huang et al. (17) developed an AI platform using 10 hr of blastocyst expansion images (approximately 30 consecutive sequential images) extracted from a time-lapse video, with the output being embryo ranking for further use. A 2019 study by Tran et al. (19) used raw time-lapse videos (approximately 112 hr of culture) as input to train a deep learning system that predicts implantation potential. Other research groups have utilized static images captured by a standard light optical microscope (13, 16), or a combination of static light-microscope images and time-lapse images, to generalize the applicability of the model, considering that time-lapse incubators are not widely adopted worldwide (20). In addition to the diverse inputs, there are also variations in the methods used for AI development (1, 11-13). Both machine learning (ML) and deep learning (DL) algorithms have been used for image processing. While image processing in ML sometimes relies on additional algorithms such as the genetic algorithm (20), the convolutional neural network (CNN), a subclass of DL, is the prominent algorithm capable of performing such complex tasks.
The purpose of the current study was to construct a deep learning-based model for ploidy status prediction of human blastocysts by utilizing U-NET segmentation as well as three- and ten-hr sequential time-lapse embryo images before the commencement of blastocyst biopsy. This strategy was designed to determine whether the two culture periods contain useful information that could boost the accuracy of the CNN model, considering that embryo development is highly dynamic, particularly when approaching implantation.
Methods
Patient population, data collection, and image pre-processing: This was a single-center cohort study. A total of 425 couples who underwent PGT-A were identified in the private online database of Morula IVF Jakarta Clinic, Jakarta, Indonesia between January 2021 and October 2022. The study protocol was reviewed and approved by the Ethical Committee of Universitas Indonesia (approval number KET-74/UN2.F1/ETIK/PPM.00.02/2022).
The indications of the studied subjects were recurrent IVF failure following the transfer of top-quality embryo(s), a history of recurrent miscarriages, and advanced maternal age, in line with the clinic's policy on recommending PGT-A for infertile couples. Infertile couples undergoing PGT-A were excluded from the analysis if their PGT-A sample failed to pass quality control, required rebiopsy, or yielded undetermined results. Ovarian stimulation and embryo culture procedures were conducted as previously described (21). Briefly, embryos were cultured in a time-lapse incubator (Miri TL; Esco Medical, Denmark) immediately following the ICSI or IMSI procedure, under culture conditions of 37°C, 6% CO2, and 5% O2. Throughout the study period, either G-TL (Vitrolife, Sweden) or SAGE (Origio, Denmark) culture medium was utilized. Blastocyst quality was assessed according to the Gardner grading system, which scores the inner cell mass (ICM), the trophectoderm, and the expansion of the blastocyst cavity. Top-quality blastocysts were defined based on blastocoel cavity expansion and on ICM/trophectoderm grades of AA, AB, or BA (22). On day 4 of embryo culture, three laser pulses (OCTAX Laser Shot™) were applied to the zona pellucida to facilitate herniation of trophectoderm cells. Biopsy procedures were conducted on either day 5 or day 6 depending on blastocyst expansion. Between 2 and 5 trophectoderm cells were collected using a specific pipette (blastomere aspiration pipette; COOK, Ireland). After washing the biopsied embryonic cells in PBS medium supplemented with 1% polyvinylpyrrolidone (Origio, Denmark), samples were loaded into a 0.2 ml PCR tube (Gen-Follower, China) and sent to the genomic laboratory for ploidy analysis.
Altogether, 1,020 blastocysts were biopsied for PGT-A. Next-generation sequencing (MiSeq Sequencing System; Illumina, USA) was utilized to determine ploidy status, serving as the ground truth of the dataset. The VeriSeq PGS kit was used following the VeriSeq PGS Library Prep reference guide (15052877 v04). The PGT-A procedure was conducted as previously reported in detail (23). Ploidy analysis was performed using BlueFuse software (Illumina, USA), which generated three types of outcomes: euploid (a mixture of euploid and <30% aneuploid cells), mosaic (a mixture of euploid and 30-80% aneuploid cells, with 30-50% aneuploid cells defined as low-level mosaicism and the remainder as high-level mosaicism), and aneuploid (more than 80% aneuploid cells) (24, 25).
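For illustration, the outcome thresholds above can be expressed as a simple decision rule. The sketch below is ours, not part of the BlueFuse pipeline; the function name and interface are hypothetical:

```python
def classify_ploidy(aneuploid_pct: float) -> str:
    """Map the percentage of aneuploid cells in a biopsy sample to the
    three outcome categories used in this study (illustrative only)."""
    if aneuploid_pct < 30:
        return "euploid"            # <30% aneuploid cells
    elif aneuploid_pct <= 80:
        # 30-50% = low-level mosaicism; >50-80% = high-level mosaicism
        level = "low-level" if aneuploid_pct <= 50 else "high-level"
        return f"mosaic ({level})"
    return "aneuploid"              # >80% aneuploid cells
```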
Recorded time-lapse videos of the 1,020 blastocysts with known ploidy status were retrieved from the time-lapse incubators. This study only utilized blastocyst images captured from the TL videos as input for the CNN-based model. The image extraction process aligned raw tabular data with time-lapse video files through Python scripting based on their metadata. Sequential blastocyst images for specific three- and ten-hr periods preceding the biopsy procedure were extracted and summarized in table 1. The decision to extract blastocyst images 10 hr prior to biopsy was based on prior findings emphasizing the increased importance of data related to blastocyst formation compared to earlier preimplantation stages (26, 27).
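The alignment step can be sketched with Pandas as below. All column names and values are hypothetical, as the clinic's actual metadata schema is not described here; the idea is simply to join the tabular PGT-A results to the video files on a shared embryo identifier and derive the extraction window relative to the biopsy time:

```python
import pandas as pd

# Hypothetical tabular data (PGT-A results) and video metadata.
results = pd.DataFrame({
    "embryo_id": ["E001", "E002"],
    "ploidy": ["euploid", "aneuploid"],
    "biopsy_hr": [116.0, 140.5],       # hours of culture at biopsy
})
videos = pd.DataFrame({
    "embryo_id": ["E001", "E002"],
    "video_path": ["tl/E001.avi", "tl/E002.avi"],
})

# Join ground-truth labels to their time-lapse videos.
merged = results.merge(videos, on="embryo_id", how="inner")

# Start of the frame-extraction window, e.g. 10 hr before biopsy.
merged["window_start_hr"] = merged["biopsy_hr"] - 10
```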
Additionally, an attempt was made to explore whether blastocyst expansion patterns observed 3 hr prior to biopsy could be sufficient and effectively utilized in developing a predictive model. Extracted images in which the blastocyst was out of focus, and those in which the embryo was misaligned from the field of view of the TL's internal camera, were excluded. The AI environment was established on a Windows operating system, using an Intel CPU and an NVIDIA GPU. Python was the programming language for script management, and the TensorFlow library was utilized for CNN classification (28) and to build a U-NET image segmentation model. Tabular data were handled using the Pandas and NumPy libraries, and partial data preparation was conducted in Microsoft Excel. Image augmentation was performed using the OpenCV library, as depicted in figure 1.
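As a minimal sketch of the kind of geometric augmentation involved, the NumPy operations below mirror what OpenCV's cv2.flip and cv2.rotate would do on extracted frames; the study's exact augmentation pipeline is not published, so this is illustrative only:

```python
import numpy as np

def augment(img: np.ndarray) -> list:
    """Generate simple geometric variants of a blastocyst frame.
    Equivalent OpenCV calls are noted in the comments."""
    return [
        img,                 # original frame
        np.fliplr(img),      # horizontal flip: cv2.flip(img, 1)
        np.flipud(img),      # vertical flip:   cv2.flip(img, 0)
        np.rot90(img, k=1),  # 90-degree rotation of the image plane
    ]

frame = np.zeros((224, 224, 3), dtype=np.uint8)  # dummy frame
variants = augment(frame)
```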
U-NET image segmentation: A fully convolutional neural network architecture, U-NET, was used for blastocyst image segmentation in the present study, as it can easily be updated and trained on a limited dataset. The architecture was inspired by Ronneberger et al. (29) and consists of an encoder and a decoder. However, a few modifications were made to the original structure to tailor it to the unique demands of our image dataset. The encoder processes the input images to learn and identify structures of the whole blastocyst through convolution, dropout, and max pooling operations. Briefly, each pixel of the raw image is assigned to a group corresponding to a specific part of the image. The decoder maps the position of the blastocyst through convolution, up-sampling, and skip connections. Specifically, positions are determined by concatenating the encoder feature maps with the corresponding decoder feature maps in an end-to-end fashion.
Conceptually, the U-NET model is built from convolution, max pooling, dropout, up-sampling, and concatenate layers, with each layer complementing the encoder and decoder parts; the model is named after its U-shaped architecture (Figure 2). The convolution layer serves as an encoder function responsible for image feature extraction: a filter is applied to create a feature map of the input image, and the sizes of the filter and feature map can be specified for each subsequent layer. The max pooling layer selects the maximum value from the feature map for every pool or group, thereby sharpening prominent features. The dropout layer temporarily removes a fraction of the nodes, mitigating the risk of overfitting or underfitting; its application results in slight changes during each iteration of model training. The up-sampling layer adjusts the layer dimensions back to an appropriate node size; however, up-sampling cannot recover information lost during pooling. Lastly, the concatenate layer acts as a bridge that merges two different nodes into a single node. Eventually, the targeted blastocyst could be masked as an output for CNN model training (Figure 3).
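The shape behavior of these layers can be illustrated with minimal NumPy stand-ins. The actual model was built with TensorFlow layers (e.g. MaxPooling2D, UpSampling2D, Concatenate); this sketch only mirrors how the spatial dimensions change through the encoder and decoder:

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling: keep the maximum of each non-overlapping block,
    halving the spatial dimensions (the encoder's down-sampling step)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample_2x(x):
    """Nearest-neighbour up-sampling: repeat each value in a 2x2 block,
    doubling the spatial dimensions (the decoder's up-sampling step).
    Detail lost in pooling is not recovered, hence the skip connections."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def concat_skip(decoder_map, encoder_map):
    """Skip connection: stack an encoder map onto a same-sized decoder
    map along a channel axis, as U-NET does at each decoder level."""
    return np.stack([decoder_map, encoder_map], axis=-1)

x = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool_2x2(x)    # (4, 4) -> (2, 2)
up = upsample_2x(pooled)    # (2, 2) -> (4, 4)
fused = concat_skip(up, x)  # (4, 4) -> (4, 4, 2)
```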
Training and validation of the CNN-based model: Our research utilized supervised learning throughout the model-building process to avoid misfits from wrong segmentation or data cluster misplacement and to reduce the likelihood of systematic error. The non-segmented and segmented images, captured over the three- and ten-hr windows, were each tested individually with an 80-20 data split to assess the capability of the deep learning approach in predicting ploidy status based on morphological uniqueness and differentiating features of the embryos, elaborated with a pre-trained CNN model. This ratio was chosen to minimize the risk of overestimating measurement error (30).
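A minimal sketch of such an 80-20 split is shown below; the study's exact splitting code is not published, so the function and data are hypothetical. In practice a stratified split preserving the euploid/aneuploid/mosaic ratios in both subsets would further reduce sampling imbalance:

```python
import random

def split_80_20(items, seed=42):
    """Shuffle and split a list of (image_path, label) pairs into
    80% training and 20% held-out sets (illustrative sketch)."""
    rng = random.Random(seed)   # fixed seed for a reproducible split
    items = list(items)
    rng.shuffle(items)
    cut = int(0.8 * len(items))
    return items[:cut], items[cut:]

# Dummy dataset: 100 image paths across 3 ploidy classes.
data = [(f"img_{i}.png", i % 3) for i in range(100)]
train, test = split_80_20(data)
```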
The CNN training environment was kept identical across runs, with the pre-trained model serving as the only differentiator between training sequences. The three pre-trained CNN models, ResNet (31), InceptionV3 (32), and EfficientNet (33), were selected for their wide application in the deep learning field (34-36). Each pre-trained model holds a different combination of nodes and layers, and certain layers are designed with features that facilitate efficient model construction; for instance, batch normalization standardizes the input of a layer with a mathematical approach based on the activation condition. Moreover, each pre-trained model was built uniquely and hence has different input conditions or image sizes. Despite the relatively small size of an embryo, substantial physical information is encapsulated within its image. Larger image sizes (more pixels) therefore carry meaningful detail, but also demand high-performance computational resources.
Furthermore, model performance is influenced by the combination of image size and feature extraction method, i.e., the pre-trained model selected. A larger image size and more layers, however, do not guarantee more robust model performance. On the contrary, processing larger images with more pixels requires a greater amount of computing power compared to processing smaller images (37). Table 2 shows the pre-trained models with their specific input conditions.
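As an illustration of matching frames to each backbone's input condition, the sketch below resizes a frame with a hand-rolled nearest-neighbour routine; in a TensorFlow pipeline, tf.image.resize would normally do this. The input sizes are those reported for the three backbones in table 2:

```python
import numpy as np

# Square input sizes (pixels) expected by each pre-trained backbone.
INPUT_SIZE = {"EfficientNetB6": 528, "ResNet50V2": 224, "InceptionV3": 299}

def resize_nearest(img, size):
    """Nearest-neighbour resize of an HxWxC frame to size x size, so the
    frame matches the input condition of the chosen backbone."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    return img[rows][:, cols]

frame = np.zeros((800, 800, 3), dtype=np.uint8)  # dummy extracted frame
batchable = resize_nearest(frame, INPUT_SIZE["InceptionV3"])
```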
Evaluation metrics: Evaluation metrics are an important part of the model training algorithm, as they influence model performance on a given prediction task. A CNN can automatically learn from previous training iterations and calibrate its layers to an appropriate condition. However, such automation is difficult to refine because the internal workings of a CNN are largely hidden and challenging to disclose. Consolidating several evaluation metrics has therefore become the standard for an AI-based classifier or predictor.
Our approach to obtaining the most robust AI-based model was to adopt two evaluation metrics, namely accuracy and loss, and each pre-trained model was assessed on both. Accuracy represents the ratio of true positive and true negative values to the total number of classification cases, signifying the overall performance of the model; it is a standard evaluation metric owing to its simplicity and its coverage of all classification outcomes, whether true or false predictions. The loss metric quantifies the confidence of the model in making a prediction: a low loss indicates high confidence in the classification, and vice versa. Specifically, the loss function plays a crucial role in evaluating whether a model requires an update to improve its ability to predict the targeted outcomes.
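The two metrics can be written out explicitly. The NumPy sketch below assumes categorical cross-entropy as the loss, the usual choice for multi-class CNN classifiers; the paper does not name its exact loss function:

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of correct predictions over all classification cases."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def categorical_cross_entropy(y_onehot, p):
    """Mean cross-entropy loss: low values mean the model assigns high
    probability to the correct class (high confidence), and vice versa."""
    p = np.clip(p, 1e-12, 1.0)  # guard against log(0)
    return float(-np.mean(np.sum(y_onehot * np.log(p), axis=1)))

# Dummy 3-class (euploid/mosaic/aneuploid-style) predictions.
y_true = [0, 1, 2, 1]
y_pred = [0, 1, 1, 1]
acc = accuracy(y_true, y_pred)   # 3 of 4 correct

# A confident model: probability 0.9 on the true class each time.
onehot = np.eye(3)[y_true]
probs = np.full((4, 3), 0.05)
probs[np.arange(4), y_true] = 0.9
loss = categorical_cross_entropy(onehot, probs)  # = -ln(0.9), i.e. low
```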
Results
Extraction of sequential TL blastocyst images over the three- and ten-hr periods prior to the biopsy procedure yielded around 31 and 97 unique images per TL video, respectively. From a pool of 181 euploid blastocyst TL videos, 5,659 images were captured within the 3 hr preceding biopsy, and 17,772 images within the 10-hr period before biopsy. From 390 aneuploid blastocyst TL videos, a total of 12,094 images were gathered from the 3-hr period prior to biopsy, increasing to 37,915 images within the 10-hr timeframe. Among the 449 TL videos of mosaic blastocysts, the total numbers of extracted images within the 3- and 10-hr windows were 13,889 and 43,637, respectively (Table 1). Slight disparities between sequential embryo images make the classification task more complex; consequently, using images from multiple time points leads to more accurate results than classification from a single time point.
In the current study, every pre-trained model employed possessed distinct layers, each characterized by its unique features, as depicted in table 2. The image input sizes for EfficientNetB6, ResNet50V2, and InceptionV3 were 528, 224, and 299 pixels, respectively.
A comparison of the classification pathways for the two extraction periods, between U-NET-segmented and non-segmented embryo images, is presented in table 3. Integration of U-NET image segmentation boosted model accuracy at both time points: with the ResNet50V2 algorithm, accuracy rose from 0.59 to 0.61, and with the InceptionV3 algorithm, from 0.63 to 0.66. The highest accuracy of 0.66 was attained when the ten-hr image series was combined with U-NET image segmentation using the InceptionV3 algorithm. Interestingly, our model exhibited slightly higher accuracy when employing the non-segmented InceptionV3 and EfficientNetB6 algorithms with the three-hr data.
Discussion
Key findings: This study presented a subset of deep learning algorithms known as the convolutional neural network (CNN) to predict blastocyst ploidy status. The utilization of the U-NET architecture for image segmentation resulted in a model with higher accuracy compared to using raw images without segmentation. U-NET segmentation was proposed to enhance the capability of model classification: it enables an automated process of isolating the embryo from the culture dish and any unnecessary image information, allowing the prediction model to focus on the prominent embryo region. The present study also serves to strengthen image recognition-based artificial intelligence in the field of IVF.
This study demonstrated two case study comparisons using datasets of different durations: three hr and ten hr. Increased model accuracy was observed when sequential images from a ten-hr embryo culture period (prior to biopsy) were extracted. From an AI perspective, these results support the notion that a higher amount of training data coincides with improved model performance. In addition, the dynamic development of the blastocyst, particularly during the expansion phase, may contain meaningful information, thus leading to enhanced performance of the AI model, as previously suggested (17). Our model demonstrated a trend of slightly better accuracy when utilizing the non-segmented InceptionV3 and EfficientNetB6 models with the three-hr datasets. However, it is challenging to explicitly elucidate this trend due to the deep learning methodology, frequently referred to as a "black box" in research.
Artificial neural network (ANN) nodes, mimicking how the human brain works, have the ability to effectively learn specific patterns in a given image. As ANN nodes are organized hierarchically, each node calculates the weighted sum of its inputs and applies a specific activation function to that sum. As a result, an ANN can produce a final model that differentiates the targeted outcome (38). The multilayer perceptron (MLP) is the most common type of ANN architecture and a popular foundation of CNN architectures. In general, an MLP comprises an input layer, one or several hidden layers, and an output layer. A CNN is similar to an MLP in that each node applies calculated weights to several inputs and sums them to classify the outcome; however, a CNN stacks a far greater number of neural layers and nodes and additionally employs the convolution mathematical operation, which is an essential part of computation in CNN architectures. Briefly, a CNN architecture comprises several feature extraction stages and fully connected layers that can map specific patterns to classify the outcomes (39). While CNNs demonstrate remarkable ability in image classification, their main limitation is a fully automated differentiation process that cannot be followed by human logic, a characteristic often referred to as a "black box". Hence, many IVF experts have questioned whether the classification is trustworthy.
Comparison with previous studies: Notably, several studies utilizing embryo images for ploidy prediction have been reported in the current literature. A 2020 study by Chavez-Badiola et al. (11) is known to be the first to use static images for ploidy status classification. A total of 751 embryo images with known ploidy status were used to construct a ploidy prediction algorithm called the ERICA model through a deep learning neural network; the model attained an accuracy of 0.70 in model validation and testing. In 2021, two studies reported the use of raw time-lapse videos as input for ploidy prediction model development. Lee et al. (12) utilized sequential images of embryos captured from time-lapse videos and a deep learning model, Inflated 3D ConvNet (I3D), which obtained a high accuracy of 0.74. In contrast, Huang et al. (40) combined time-lapse embryo videos with the clinical characteristics of the studied participants and achieved an accuracy of 0.80. An interesting study by Huang et al. (17) showed that the average time of blastocyst expansion could be a better predictor for blastocyst ploidy classification. Using the U-NET architecture for semantic segmentation, an AI-based approach called the AI-qSEA1.0 expansion assay was implemented to rank the quality of blastocysts for clinical use. The researchers then retrospectively analyzed the outcomes of the respective cycles and observed that euploid blastocysts that resulted in live births had a higher expansion rate than euploid blastocysts that did not (p=0.007). As AI-based studies vary highly in their inputs, algorithms or classifiers, and outcomes, results cannot be compared directly (41); each constructed model is distinct, primarily due to the utilization of different datasets and research designs.
In clinical practice, it is expected that the availability of a non-invasive AI-based algorithm could serve as an alternative method for embryo selection in cases in which PGT-A is unaffordable.
The strength of the present study lies in the demonstration of U-NET implementation for blastocyst segmentation, which was shown to enhance model training for image classification with the deep learning method. Nonetheless, a limitation of this study is that the blastocyst segmentation does not differentiate the distinct parts of the blastocyst, such as the inner cell mass, trophectoderm area, blastocoel cavity, or zona pellucida thickness; segmenting these structures separately could improve prediction accuracy. Additionally, the accuracy of the obtained model is not yet sufficient for use as a non-invasive approach to predicting blastocyst ploidy status in clinical settings. Also, an imbalance was observed in the training results for both cases, highlighting the presence of bias in the image classification model for predicting embryo ploidy status.
Conclusion
This study demonstrated that extracting TL blastocyst images over a ten-hr period and implementing image segmentation prior to utilizing embryo images in a CNN-based model could enhance the accuracy of the developed model for predicting ploidy status.
Acknowledgement
The authors express gratitude toward the staff of Morula IVF Jakarta Clinic for providing the time-lapse videos used to develop the ploidy status prediction model.
Funding: The authors received a PUTI Grant from Universitas Indonesia with grant number NKB-1293/UN2.RST/HKP.05.00/2022.
Conflict of Interest
The authors have no conflicts of interest to declare.