Multi-Label Benthic Foraminifera Identification with Convolutional Neural Networks

Research Article (Research Square preprint)
Kübra YAYAN, Cem BAĞLUM
This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-3970510/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)

Abstract
Geological studies are important for tracking how living species have changed over time, for drawing inferences from the information those species provide, and for understanding how the structure of our planet has developed and changed. However, examining and interpreting fossil specimens is a complex and lengthy process. In thin sections used for microfossil studies in particular, several fossil and non-fossil structures are often observed together.
Computer-aided detection and classification of fossil specimens can simplify this process considerably compared with manual classification. This study presents a comparative analysis of three image classification models: a custom CNN, ResNet, and VGG. We developed a custom Convolutional Neural Network (CNN) architecture for the analysis and classification of geological specimens, focusing on the Endless Forams dataset for the identification of benthic foraminifera. This approach improves the precision of fossil identification by using deep learning to interpret complex image data efficiently. We also identified the ResNet-50 and VGG-16 models as well suited to our purposes because of their ability to handle high-dimensional data and to capture detailed image features. Applied to benthic foraminifera, the findings give insight into the models' performance, supported by statistical evaluation, and clarify their capabilities and limitations for image classification.

Keywords: Geology; Benthic Foraminifera; Multilabel Classification; Deep Learning; Convolutional Neural Networks

Introduction
Foraminifera are unicellular, shelled (testate) organisms that live mostly in seas and fresh waters. By life habit, they can be planktonic or benthic (Platon and Gupta 2001). Planktonic foraminifera are single-celled organisms found mostly near the sea surface, whereas benthic foraminifera live on sea and ocean floors at various depths, and 10,000 species still exist today (Vickerman 1992).
The first foraminifera are thought to have appeared in the Paleozoic era, and they are estimated to have lived until the middle of the Jurassic period of the Mesozoic era (Stanley 1999). Benthic foraminifera have acquired different physical properties by adapting to the depth at which they live on the sea floor: species living at greater depths have thicker tests (shells), while species living closer to the surface have thinner ones. These differences in physical structure provide a great deal of information about the region in which the specimens are found, including changes in its geological structure, age determination, the location of hydrocarbon deposits, the depth of the oceans in which the organisms lived, and paleogeographic connections. Inferences drawn from foraminifera are therefore important in many applications, such as petroleum exploration. However, identifying and classifying microfossils is a long and tiring process, and its difficulty leads to errors in classification studies. Companies that exploit underground resources, such as oil companies, choose their fields of study based on inferences drawn from the examination of fossils. Given the difficulty of the process and the operating costs, the possibility of error must be reduced. Because drilling is very costly, the findings in the area to be drilled must be analyzed and evaluated correctly; otherwise, drilling in the wrong area will inevitably have negative financial effects on these companies. In addition, paleontologists specialize in certain fossil groups and/or ages and may not be able to identify other fossil groups in detail in sections where many groups coexist. This process can be both tiring and error-prone.
Classification problems fall into three categories: binary, multiclass, and multilabel classification. Binary classification involves only two classes, encoded as zero or one. Multiclass classification involves more than two classes, but each output is assigned exactly one of them. Unlike the other two, multilabel classification can assign multiple labels to a single output. In this study, we classify fossil specimens with multilabel classification methods using deep Convolutional Neural Networks (CNNs), Residual Networks (ResNet), and the Visual Geometry Group Network (VGG). The core of our approach is a custom CNN architecture designed for the precise identification of benthic foraminifera, complemented by the ResNet-50 and VGG-16 models, which handle high-dimensional data and intricate image features well. We train the deep learning system on a dataset containing varied cross-sectional views and viewing angles of benthic foraminifera, with the aim of accurately assigning new fossil samples to their respective genera. This combination of deep learning architectures reflects recent progress in image classification and lays groundwork for future applications of deep learning in geological and micropaleontological studies, in which physical attributes such as chamber count, dimensions, and shape are critical for classification.

Related Works
Artificial intelligence-based classification is used in many fields of study. Its underlying purpose is to automate the manual classification process.
Automated classification can minimize the errors that occur during manual classification and shorten the process considerably. Multi-label image classification is performed in many fields today, for example in medicine, object recognition, distinguishing animals from one another and from related species, and plant classification. Various techniques, such as artificial intelligence, image processing, and pattern recognition, are used for classification. One article describes a CNN-based red blood cell (RBC) classifier: images in an RBC dataset are converted to grayscale and filtered with Canny edge detection, the preprocessed images are used to train the system, and the trained model then classifies individual RBC images (Parab et al. 2021). Another article presents a classifier for white blood cells (WBCs) aimed at understanding acute lymphoblastic leukemia, which can be fatal if left untreated; WBC images, whose cytoplasm was segmented with image processing methods such as background extraction and threshold-based contour extraction, were classified using a Support Vector Machine (SVM) (Putzu et al. 2014). A further paper detects and classifies retinal lesions with multilabel classification: non-perfusion regions (NP), microaneurysms, leakages, and laser scars were classified by DenseNet, ResNet50, and VGG16 models trained on 4067 samples from the eye center of the Second Affiliated Hospital of Zhejiang University School of Medicine (Pan et al. 2020). Multilabel classification methods have also been used for skin disease and skin lesion classification.
For disease-targeted skin disease classification, a CNN fine-tuned from BVLC AlexNet is used; for lesion-targeted skin disease classification, a multi-label CNN fine-tuned from BVLC AlexNet is used (Liao et al. 2016). These articles are examples from the medical field. Many studies in other fields have also sought models that give better classification results. One study describes a method for classifying images taken by an unmanned aerial vehicle (UAV). The model first divides the image into a grid of equal tiles and generates initial estimates for each tile with a suitable classifier; a conditional random field (CRF) model is then applied to the resulting multi-label map to refine it iteratively (Zeggada et al. 2018). Another study presents a method for hyperspectral image classification based on semi-supervised deep learning, in which a deep neural network is trained with limited labeled data and a large amount of unlabeled data. Deep convolutional recurrent neural networks, which interpret each hyperspectral pixel as a spectral sequence, are used for the classification (Wu and Prasad 2018). Another multilabel study trains a model with partial labels, where only some labels are available for each image. To demonstrate the potential of partial labels in multi-label datasets, several labeling procedures were first compared experimentally. A new classification loss was then introduced that uses the proportion of known labels in each example to learn with missing labels. A Graph Neural Network (GNN) was used for classification, and threshold strategies were used for prediction (Durand et al. 2019).
Another article presents a technique for learning spatial regularization with image-level supervision for multilabel classification. It proposes a unified neural network that exploits both semantic and spatial relations between labels using only image-level supervision. Given a multi-label image, the proposed Spatial Regularization Network (SRN) generates attention maps for all labels and captures the underlying relations between them through learnable convolutions. The regularized classification results are combined with the original results of a ResNet-101 network (Zhu et al. 2017). Another multi-label classification paper adds local information to improve the discriminative power of the features. It first extracts object proposals from each image; treating each image as a bag and the proposals extracted from it as instances, it transforms the multi-label recognition problem into a multi-class multi-instance learning problem (Yang et al. 2016). A similar study uses a framework combining RNN and CNN models, in which the RNN predicts multiple labels by finding the prediction path that maximizes the a priori probability (Jiang et al. 2016). Another article presents a multilabel classification method for colon cancer. Four features, namely color histogram, gray-level co-occurrence matrix, histogram of oriented gradients, and Euler number, form the discriminative feature set, and the multi-label model is built with one-against-all (OAA), one-against-one (OAO), and multi-structure SVMs (Xu et al. 2013). Another work improves the classification of small objects by CNN models in multilabel settings: it uses an LSTM to sequentially model latent semantic label dependencies among regions that contain multiple highly dependent labels (Zhang et al. 2018).
Another work adapts CNNs to accept images of various sizes: MarsNet, proposed in that study, is a CNN-based end-to-end network for multilabel classification that can take inputs of varied sizes (Park et al. 2020).

Preliminary Works
This study relies on established techniques for dataset preparation and image processing to address the challenges of classifying foraminifera species with neural network models. Careful image selection and labeling, together with computer vision methods, are intended to make the model robust and accurate. The research integrates these technological advances with marine biology and may serve as a basis for future studies that aim to automate ecological monitoring and biodiversity studies and make them more precise.

Dataset Preparation
A dataset is indispensable for training the neural network model, and it must be properly parsed and labeled. For this purpose, the Endless Forams dataset was used; with thousands of benthic and planktonic foraminifera images, it meets the requirements for training the model. Because the foraminifera samples were bright images, a binary segmentation mask was applied to reduce the effect of brightness (Xu et al. 2013). Nine species were selected from this dataset to test the model: Bulimina tenuata, Bolivina argentea, Bulimina pagoda, Bolivina seminuda, Bolivina spissa, Bolivina subadvena, Epistominella smithi, Trifarina bradyi, and Takanayanagia delicata. The data of the selected species were divided into two groups: 70% for training and 30% for testing. Samples of the species used for training and testing are shown in Fig. 1. Test and validation data were generated randomly from the training file. The images of the selected species were collected in an empty folder.
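The 70/30 division above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' script: the split is stratified per species (the text does not say so) so that rare species such as T. bradyi appear in both subsets, the per-species counts are those reported in the Results section, and the file names, the seed, and the helper `stratified_split` are hypothetical.

```python
import random

# Per-species image counts as reported in the Results section.
SPECIES_COUNTS = {
    "B_tenuata": 450, "B_argentea": 320, "B_pagoda": 68,
    "B_seminuda": 938, "B_spissa": 323, "B_subadvena": 236,
    "E_smithi": 131, "T_bradyi": 54, "T_delicata": 456,
}


def stratified_split(files_by_species, test_fraction=0.3, seed=42):
    """Hypothetical helper: shuffle and split each species separately (assumed stratification)."""
    rng = random.Random(seed)
    train, test = [], []
    for species, files in files_by_species.items():
        shuffled = list(files)
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test += [(f, species) for f in shuffled[:n_test]]
        train += [(f, species) for f in shuffled[n_test:]]
    return train, test


# Hypothetical file names standing in for the renamed images.
files = {s: [f"{s}_{i:04d}.jpg" for i in range(n)] for s, n in SPECIES_COUNTS.items()}
train, test = stratified_split(files)
print(len(train), len(test))  # 2084 892
```

Splitting each species separately keeps the class proportions of the training and test sets close to those of the full dataset, which matters here because the largest class (B. seminuda, 938 images) is more than seventeen times the smallest (T. bradyi, 54 images).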
To perform multi-label classification, the images in the dataset must be arranged systematically. The collected images were renamed in the _ format with the Advanced Renamer software, which made the dataset compatible with the label file used for training. The dataset, consisting of 8037 images in total, was labeled using Microsoft Excel.

Detecting Foraminifera
Before predictions can be made, the foraminifera specimens must first be detected in the image. Detection starts by reading the image in Python with the OpenCV library, which provides optimized real-time computer vision functions. OpenCV reads images in BGR (Blue Green Red) color order. To process the images, the number of color channels must be reduced: unlike RGB images, grayscale images are two-dimensional pixel arrays. Therefore, the BGR images were converted to grayscale. Thresholding was then applied to reveal the foraminifera. Several thresholding methods were tried, and Otsu's method gave the best results, so Otsu thresholding was selected. The threshold value was set to 70 and the maximum value to 255, meaning that pixels above 70 are set to 255. The thresholded images are shown in Fig. 2: bright spots become white and all other regions become black. To make the images more suitable for processing, Connected Component Labeling (CCL) was used to separate the images into blobs, with the CCL connectivity set to 2. To remove the background around the foraminifera, a NumPy zeros mask was created and applied to the CCL blobs. In this way, the locations of the foraminifera can be detected.
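The detection steps above (grayscale conversion, Otsu thresholding, connected component labeling, and a zeros mask) can be sketched with NumPy alone. This is a dependency-free re-implementation for illustration, not the authors' OpenCV code, and all helper names are hypothetical. Two assumptions are worth flagging: "connectivity 2" is read in scikit-image's convention (`skimage.measure.label(..., connectivity=2)`), which means 8-connectivity in 2D, and the `min_area` speck filter is invented for the sketch. Note also that when `cv2.threshold` is called with the `cv2.THRESH_OTSU` flag, the fixed value passed in (70 here) is ignored and Otsu's computed threshold is used instead. Bounding boxes are taken directly from each component, padded by the 10 pixels reported in the Results, rather than via `cv2.findContours`.

```python
from collections import deque

import numpy as np


def to_gray(bgr):
    """BGR uint8 image -> grayscale uint8 with BT.601 weights (as cv2.COLOR_BGR2GRAY uses)."""
    b, g, r = (bgr[..., i].astype(np.float64) for i in range(3))
    return np.clip(np.rint(0.114 * b + 0.587 * g + 0.299 * r), 0, 255).astype(np.uint8)


def otsu_threshold(gray):
    """Threshold t maximizing between-class variance; class 0 is pixels <= t."""
    prob = np.bincount(gray.ravel(), minlength=256) / gray.size
    omega = np.cumsum(prob)                  # weight of class 0 for each candidate t
    mu = np.cumsum(prob * np.arange(256))    # cumulative mean of class 0
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu[-1] * omega - mu) ** 2 / (omega * (1.0 - omega))
    return int(np.argmax(np.nan_to_num(between, nan=0.0, posinf=0.0, neginf=0.0)))


def label_components(binary):
    """8-connected component labeling by breadth-first search; label 0 is background."""
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=np.int32)
    count = 0
    for y, x in zip(*np.nonzero(binary)):
        if labels[y, x]:
            continue
        count += 1
        labels[y, x] = count
        queue = deque([(y, x)])
        while queue:
            cy, cx = queue.popleft()
            for ny in range(max(cy - 1, 0), min(cy + 2, h)):
                for nx in range(max(cx - 1, 0), min(cx + 2, w)):
                    if binary[ny, nx] and not labels[ny, nx]:
                        labels[ny, nx] = count
                        queue.append((ny, nx))
    return labels, count


def detect_specimens(bgr, min_area=20, pad=10):
    """Return (threshold, zeros-based mask, padded boxes (x0, y0, x1, y1) sorted left to right)."""
    gray = to_gray(bgr)
    t = otsu_threshold(gray)
    labels, count = label_components(gray > t)   # pixels above t become foreground
    mask = np.zeros(gray.shape, dtype=np.uint8)
    h, w = gray.shape
    boxes = []
    for lab in range(1, count + 1):
        ys, xs = np.nonzero(labels == lab)
        if ys.size < min_area:                   # hypothetical filter for dot-sized specks
            continue
        mask[ys, xs] = 255
        boxes.append((max(int(xs.min()) - pad, 0), max(int(ys.min()) - pad, 0),
                      min(int(xs.max()) + pad, w - 1), min(int(ys.max()) + pad, h - 1)))
    return t, mask, sorted(boxes)
```

On a real image, `detect_specimens(cv2.imread(path))` would return the boxes to crop for the classifier; `cv2.imread` returns arrays in B, G, R order, which is why `to_gray` weights the channels that way. The pure-Python labeling loop is slow on large images and is meant only to show the technique.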
Contours were detected with OpenCV's findContours function. The retrieval mode was set to external retrieval, which returns only the outer contour when one contour encloses another. The approximation method was chain approximation, which returns only the end points needed to draw the contour line. The detected contours were sorted so that the bounding boxes could be drawn one by one, and the minimum bounding boxes were drawn with the rectangle function. The boxed regions were cropped from the whole image and prepared for prediction. In a for loop, each bounding box was sent to the multilabel classification CNN model, and the labels returned by the predictions were placed on the bounding boxes to obtain the final segmented and labeled image.

Proposed Method
In this study, we introduce an approach for classifying benthic foraminifera with a custom Convolutional Neural Network (CNN) together with two well-known deep learning architectures, ResNet-50 and VGG-16, each applied to the Endless Forams dataset. The custom CNN is designed to extract features from, and learn, the high-dimensional, intricate patterns in microfossil images. In parallel, ResNet-50, known for mitigating the vanishing gradient problem through residual learning, and VGG-16, valued for its simple yet deep design that captures texture and form, are employed to enhance classification accuracy further. Combining these architectures aims to exploit their distinct strengths for robust and precise identification of benthic foraminifera.
Custom Proposed Convolutional Neural Network Model
Our custom Convolutional Neural Network (CNN) is a specialized solution for the nuanced task of foraminifera identification. The model is tailored to the characteristics of foraminifera imagery through a data normalization step that prepares each image consistently for analysis. It is trained with binary cross-entropy loss and the Adam optimizer at a learning rate of 0.001, calibrated to discriminate accurately among foraminifera species. The work applies deep learning to biological research, specifically to the classification of microorganisms such as foraminifera. The model predicts the dominant class among nine using a 70% score threshold, which makes it a potentially useful tool for micropaleontologists and others. By focusing on the specific requirements of foraminifera classification, this work supports more specialized applications of machine learning in the biological sciences.

Custom Proposed Model Architecture
A custom convolutional neural network was developed to train the model. A custom CNN may be a slower solution than a pretrained model, but it leaves ample room to tailor the network to its purpose. The images, collected as described in the previous section, were converted into arrays with Keras preprocessing functions, and the pixel values were divided by 255. Because the images are RGB images with pixel values from 0 to 255, the scaled pixels take values between 0 and 1. The images were also resized to 200 × 200 pixels with 3 channels, and this shape is used as the CNN input.
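A minimal `tf.keras` sketch of the input scaling above and of the layer stack described in this section, using the final configuration reported in the Results ([16C3-P2] through [512C3-P2], a 512-unit dense layer, and 9 sigmoid outputs). The `"same"` padding, the `binary_accuracy` metric, and the names `normalize` and `build_custom_cnn` are assumptions; the paper does not publish its code.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

IMG_SIZE = 200     # 200 x 200 RGB input, as described in the text
NUM_CLASSES = 9    # one sigmoid output per species


def normalize(images_uint8):
    """Scale 8-bit pixel values from [0, 255] to [0, 1]."""
    return np.asarray(images_uint8, dtype=np.float32) / 255.0


def build_custom_cnn(num_blocks=6, base_filters=16, dense_units=512):
    """Blocks of Conv(3x3, ReLU) -> BatchNorm -> MaxPool(2x2), doubling the filters per block."""
    model = models.Sequential()
    model.add(tf.keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3)))
    for i in range(num_blocks):
        # padding="same" is an assumption; the paper does not state the padding.
        model.add(layers.Conv2D(base_filters * 2 ** i, 3, padding="same", activation="relu"))
        model.add(layers.BatchNormalization())
        model.add(layers.MaxPooling2D(2))
    model.add(layers.Flatten())
    model.add(layers.Dense(dense_units, activation="relu"))
    model.add(layers.Dense(NUM_CLASSES, activation="sigmoid"))  # independent per-class scores
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss="binary_crossentropy",
                  metrics=["binary_accuracy"])
    return model


model = build_custom_cnn()
# Training would then call model.fit(x_train, y_train, batch_size=64, epochs=...),
# with y_train a hypothetical (n, 9) multi-hot label array.
```

With sigmoid outputs and binary cross-entropy, each of the nine scores is trained as an independent yes/no decision, which is what allows more than one label per image; a softmax layer would instead force the nine scores to sum to 1.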
The convolutional layers at the input side perform feature extraction. The first block of the network is a 3 × 3 convolutional layer with 16 filters followed by a rectified linear unit (ReLU) activation; this convolutional layer is combined with a batch normalization layer, which is followed by a 2 × 2 max pooling layer. Each subsequent block has twice as many filters as the previous one. Before the output, the feature maps are flattened by a flatten layer, which is followed by a 512-unit dense layer with ReLU activation. The output layer is another dense layer with 9 units, one per class, with sigmoid activation so that each unit produces an independent probability. The representation of the custom convolutional neural network is shown in Fig. 3. The network was trained with binary cross-entropy loss, using the Adam optimizer with a learning rate of 0.001 and a batch size of 64, in Python with TensorFlow. The final model was selected after accuracy tests in which the hyperparameters and layers of the model were varied. Because of the sigmoid activation, the network outputs a score between 0 and 1 for each class, representing the model's confidence that the test image belongs to that class. The class whose score reaches 70% is chosen as the dominant class among the 9 classes, and the test image is classified according to this dominant class.

Interface of The System
To simplify the search for the optimal system, an interface was developed that lets the user change specific training hyperparameters: the number of epochs, the steps per epoch, the batch size, the image size, and the percentage of the image set held out for testing.
The user types the desired values with the keyboard and clicks the start button to begin training the defined model. The interface is shown in Fig. 4.

ResNet-50 Neural Network Model
The ResNet-50 model is a pivotal development in deep learning, particularly because it addresses the long-standing vanishing gradient problem in deep neural networks. Its distinguishing feature is the skip connection, which keeps gradients flowing through its 50-layer depth and so preserves the learning process. Skip connections also make it easy for the model to learn identity functions, which improves its robustness and efficiency in image classification tasks. For benthic foraminifera classification, these architectural strengths help the model recognize the nuanced differences among microfossils, and its deep design allows efficient feature extraction and generalization, improving classification accuracy and, consequently, the reliability of geological research. The use of ResNet-50 in this geological setting rests on its deep residual learning framework, which overcomes the vanishing gradient problem that makes very deep networks hard to train, so that networks 50 layers deep (and deeper) can be trained without degrading performance. Its depth and architectural features make ResNet-50 well suited to the complex task of benthic foraminifera classification.
Applying this model improves the precision and generalization of the classification process and can reveal subtle distinctions between species that were previously difficult to detect. These outcomes show the potential of deep residual networks to improve the accuracy and efficiency of geological image analysis, and the role of ResNet-50 in advancing the understanding of benthic foraminifera species and contributing insights to paleoecological and paleoenvironmental studies. The model's architecture and depth let it capture a wide array of features from microfossil images, which is essential for classifying diverse benthic foraminifera, and illustrate both its effectiveness in learning complex visual information from microfossil imagery and the impact of deep learning on geological research.

VGG-16 Neural Network Model
The VGG-16 model, developed by the Visual Geometry Group at the University of Oxford, has become a cornerstone of deep learning thanks to its deep yet straightforward architecture. Designed for large-scale image recognition challenges such as ImageNet, its 16-layer structure performs very well in image recognition and classification tasks, and its popularity stems from its high accuracy in these domains. The architecture consists of stacked convolutional layers with 3 × 3 filters interleaved with max-pooling layers, which lets the model learn visual patterns at various scales; these are followed by fully connected layers and a softmax output layer for classification.
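The paper does not state how ResNet-50 and VGG-16 were adapted to the nine species. One common approach, sketched here under that assumption, is to drop the 1000-way ImageNet softmax head (`include_top=False`), average-pool the final feature maps, and attach a 9-unit sigmoid layer so that both backbones produce multilabel scores like the custom CNN. `tf.keras.applications.ResNet50` and `tf.keras.applications.VGG16` are real Keras constructors; the wrapper `build_backbone_classifier` and the reuse of the custom CNN's optimizer settings are hypothetical.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 9
IMG_SIZE = 200

BACKBONES = {
    "resnet50": tf.keras.applications.ResNet50,
    "vgg16": tf.keras.applications.VGG16,
}


def build_backbone_classifier(name, weights=None):
    """Hypothetical wrapper: backbone without its softmax top, plus a 9-unit sigmoid head."""
    # weights="imagenet" downloads pretrained weights for transfer learning; inputs should
    # then go through the matching preprocess_input (e.g.
    # tf.keras.applications.resnet50.preprocess_input) instead of a plain /255 scaling.
    base = BACKBONES[name](include_top=False, weights=weights,
                           input_shape=(IMG_SIZE, IMG_SIZE, 3), pooling="avg")
    outputs = layers.Dense(NUM_CLASSES, activation="sigmoid")(base.output)
    model = models.Model(base.input, outputs, name=f"{name}_multilabel")
    # Same loss and optimizer as the custom CNN: an assumption, not stated for these models.
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss="binary_crossentropy",
                  metrics=["binary_accuracy"])
    return model
```

For example, `build_backbone_classifier("vgg16", weights="imagenet")` would start from pretrained ImageNet weights, the transfer learning setting described for VGG-16, while the default `weights=None` trains from scratch and avoids a download.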
The model is also widely used for transfer learning, where the pretrained VGG-16 serves as a starting point for new tasks, saving time and resources, especially when labeled data are limited. In our research, VGG-16 helped improve the accuracy of microfossil classification, showing the model's capacity to discern intricate details in microfossil imagery and contributing data for paleoecological and paleoenvironmental studies of benthic foraminifera. Its uniform architecture and deep convolutional stack are particularly advantageous for capturing a broad spectrum of features from microfossil images, which is essential for classifying diverse benthic foraminifera. Using VGG-16 in this context deepens our grasp of microfossil morphology and highlights the impact of deep learning on geological research, making the model a useful tool for improving the precision and efficiency of geological sample analysis and our understanding of Earth's historical biodiversity.

Results
In this study, we compare three image classification models: a custom-designed Convolutional Neural Network (CNN), ResNet-50, and VGG-16. All three are evaluated on the same dataset to assess their performance in classifying a wide range of images.

Custom Proposed Convolutional Neural Network Model
Many training runs were carried out to find the optimal CNN model.
The system was trained and tested on the Endless Forams image set. The selected species and their sample counts are: B. tenuata, 450; B. argentea, 320; B. pagoda, 68; B. seminuda, 938; B. spissa, 323; B. subadvena, 236; E. smithi, 131; T. bradyi, 54; and T. delicata, 456. Training and evaluation were run with TensorFlow 2.4.0 with GPU support and Python 3.8.18 on Ubuntu 16.04.7 LTS. The hardware comprised an Intel Xeon CPU E5-2637 v4 @ 3.50GHz, multiple NVIDIA Quadro P5000 GPUs, and 128 GB of RAM.

Convolutional Neural Network
To obtain the best accuracy, the CNN model was tested many times. To automate testing, a Python loop was written that increments a specific parameter at the end of each training run. In the following, a convolutional layer is denoted 16C3, where 16 is the number of filters and 3 denotes a 3 × 3 kernel, and a max pooling layer is denoted P2, where 2 denotes a 2 × 2 pooling window. The number of blocks was examined first. Testing started with a [16C3-P2]-128-9 CNN; at each step, a feature extraction block was added, and the number of filters was doubled in every new block. The best results were obtained with [16C3-P2] - [32C3-P2] - [64C3-P2] - [128C3-P2] - [256C3-P2] - [512C3-P2] – 128-9, that is, six convolutional blocks. The other configurations overfitted more than this model.
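The bracket notation can be made concrete with a small parser. The helpers `parse_architecture` and `flattened_size` are hypothetical, and the size arithmetic assumes "same" convolution padding (not stated in the paper), under which six 2 × 2 poolings shrink a 200-pixel side to 3 pixels (200 → 100 → 50 → 25 → 12 → 6 → 3), so the flattened vector entering the dense layer has 3 · 3 · 512 = 4608 values.

```python
import re

# "16C3" = 16 filters with a 3x3 kernel, "P2" = 2x2 max pooling, a bare number = dense units.
TOKEN = re.compile(r"(\d+)C(\d+)|P(\d+)|(\d+)")


def parse_architecture(spec):
    """Turn e.g. "[16C3-P2]-128-9" into [("conv", 16, 3), ("pool", 2), ("dense", 128), ("dense", 9)]."""
    stack = []
    for m in TOKEN.finditer(spec):
        if m.group(1) is not None:
            stack.append(("conv", int(m.group(1)), int(m.group(2))))
        elif m.group(3) is not None:
            stack.append(("pool", int(m.group(3))))
        else:
            stack.append(("dense", int(m.group(4))))
    return stack


def flattened_size(stack, input_size=200, channels=3):
    """Length of the vector entering the first dense layer, assuming 'same' padding."""
    size = input_size
    for kind, *params in stack:
        if kind == "conv":
            channels = params[0]     # 'same' padding keeps the spatial size
        elif kind == "pool":
            size //= params[0]       # 2x2 pooling halves the size, rounding down
        else:
            break                    # dense layers come after Flatten
    return size * size * channels


final = ("[16C3 - P2] - [32C3 - P2] - [64C3 - P2] - [128C3 - P2] - "
         "[256C3 - P2] - [512C3 - P2] – 512–9")
stack = parse_architecture(final)
print(stack[-2:], flattened_size(stack))  # [('dense', 512), ('dense', 9)] 4608
```

The regular expression ignores brackets, spaces, hyphens, and en dashes alike, so it accepts the notation in both the compact and the spaced forms used in this section.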
The initial model reached a maximum training accuracy of 1.0 and a maximum validation accuracy of 0.57; its initial metrics are shown in Fig. 5. The parametric analysis showed that the best results were obtained with 16 filters in the first convolutional layer. The comparison of the number of blocks is given in Table 1 and the comparison of the number of filters in Table 2. With the number of blocks and filters fixed, the number of units in the dense layer was tuned. As Table 3 shows, the highest training and validation accuracies, 96% and 78%, were obtained with 512 units.

To reduce overfitting, dropout was then added to the CNN model. Adding a dropout layer after every convolutional block lowered both training and validation accuracy, while the gap between them stayed large or grew (Table 4). These extra dropout layers were therefore removed, leaving a single dropout layer before the output dense layer. The single-dropout results are given in Table 5 and show that dropout does not suit this model either: the best single-dropout configuration (rate 0.5) reached a validation accuracy of 0.74, below the 0.78 obtained without dropout. Based on these experiments, the optimum CNN architecture for this work is [16C3-P2]-[32C3-P2]-[64C3-P2]-[128C3-P2]-[256C3-P2]-[512C3-P2]-512-9.

Table 1 Number of Input Layer Comparison Table
CNN | Epoch | Step | Batch | Activation | Max Acc. | Max Val. Acc.
One Block | 100 | 10 | 64 | Sigmoid | 1.0 | 0.57
Two Blocks | 100 | 10 | 64 | Sigmoid | 1.0 | 0.61
Three Blocks | 100 | 10 | 64 | Sigmoid | 1.0 | 0.65
Four Blocks | 100 | 10 | 64 | Sigmoid | 1.0 | 0.68
Five Blocks | 100 | 10 | 64 | Sigmoid | 0.98 | 0.71
Six Blocks | 100 | 10 | 64 | Sigmoid | 0.90 | 0.74

Table 2 Number of Filters Comparison Table
Filters | Epoch | Step | Batch | Activation | Max Acc. | Max Val. Acc.
8 | 100 | 10 | 64 | Sigmoid | 0.9 | 0.69
16 | 100 | 10 | 64 | Sigmoid | 0.96 | 0.74
24 | 100 | 10 | 64 | Sigmoid | 0.93 | 0.71
32 | 100 | 10 | 64 | Sigmoid | 0.95 | 0.72

Table 3 Number of Units for Dense Layer
Dense | Epoch | Step | Batch | Activation | Max Acc. | Max Val. Acc.
64N | 100 | 10 | 64 | Sigmoid | 0.92 | 0.73
128N | 100 | 10 | 64 | Sigmoid | 0.96 | 0.74
256N | 100 | 10 | 64 | Sigmoid | 0.93 | 0.72
512N | 100 | 10 | 64 | Sigmoid | 0.96 | 0.78
1024N | 100 | 10 | 64 | Sigmoid | 0.95 | 0.71

Table 4 The Dropout Rate After Convolutional Layers Comparison Table
Dropout Per Layer | Epoch | Step | Batch | Activation | Max Acc. | Max Val. Acc.
0.2 | 100 | 10 | 64 | Sigmoid | 0.75 | 0.57
0.3 | 100 | 10 | 64 | Sigmoid | 0.63 | 0.52
0.4 | 100 | 10 | 64 | Sigmoid | 0.61 | 0.50
0.5 | 100 | 10 | 64 | Sigmoid | 0.56 | 0.35
0.6 | 100 | 10 | 64 | Sigmoid | 0.56 | 0.13

Table 5 Single Dropout Layer Comparison Table
Dropout Rate (Single Layer) | Epoch | Step | Batch | Activation | Max Acc. | Max Val. Acc.
0.2 | 100 | 10 | 64 | Sigmoid | 0.95 | 0.71
0.3 | 100 | 10 | 64 | Sigmoid | 0.89 | 0.71
0.4 | 100 | 10 | 64 | Sigmoid | 0.84 | 0.72
0.5 | 100 | 10 | 64 | Sigmoid | 0.94 | 0.74
0.6 | 100 | 10 | 64 | Sigmoid | 0.87 | 0.72

Thresholding and Masking

Before prediction, the foraminifera were detected and masked. To test the object detection and contour drawing process, test images containing two, three, four, five and six specimens were created. First, Otsu's threshold was applied to the images; connected-component labeling (CCL) then separated each image into blobs, from which a zero-initialized mask was built. Once the bounding boxes of the specimens had been detected, the images were ready for prediction. Fig. 6 shows a test image with four specimens, Bolivina argentea, Bulimina tenuata, Epistominella smithi and Bulimina pagoda, together with its thresholding result. As the figure shows, thresholding separated the specimens cleanly. Dot-sized white pixels remain in the background, but they do not affect the detection of the specimens.
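The threshold-and-label detection step can be sketched with NumPy alone. This is a minimal illustration, not the authors' implementation (which the text does not show): it assumes an 8-bit grayscale image in which specimens are brighter than the background, uses 8-connectivity for the labeling, and drops blobs smaller than a hypothetical `min_area` so that dot-sized background pixels yield no boxes. OpenCV offers equivalent routines, such as `cv2.threshold` with the `THRESH_OTSU` flag and `cv2.connectedComponents`.

```python
from collections import deque

import numpy as np


def otsu_threshold(gray):
    """Return the Otsu threshold t for a uint8 image; pixels > t are foreground."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # probability of the class <= t
    mu = np.cumsum(prob * np.arange(256))    # cumulative mean up to t
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    between[~np.isfinite(between)] = 0.0     # an empty class gives 0/0 or x/0
    return int(np.argmax(between))           # maximize between-class variance


def label_components(binary):
    """8-connected component labeling by breadth-first search."""
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=int)
    count = 0
    for y in range(h):
        for x in range(w):
            if not binary[y, x] or labels[y, x]:
                continue
            count += 1
            labels[y, x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not labels[ny, nx]:
                            labels[ny, nx] = count
                            queue.append((ny, nx))
    return labels, count


def detect_specimens(gray, min_area=20):
    """Threshold, label blobs, fill a zero-initialized mask and return tight boxes."""
    binary = gray > otsu_threshold(gray)
    labels, count = label_components(binary)
    mask = np.zeros_like(gray)
    boxes = []
    for k in range(1, count + 1):
        ys, xs = np.nonzero(labels == k)
        if ys.size < min_area:               # skip dot-sized background noise
            continue
        mask[labels == k] = 255
        boxes.append((int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())))
    return mask, boxes
```

Boxes are returned as inclusive (x0, y0, x1, y1) pixel coordinates in raster-scan order. The pure-Python labeling loop is slow on large images and only serves to show the logic.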
Drawing Bounding Boxes and Labeling

Thresholding and masking separate the specimens from the background so that they can be enclosed in bounding boxes and labeled. The bounding-box coordinates were obtained with OpenCV's bounding-box function and drawn with its rectangle function. The resulting boxes did not fully enclose the protruding structures of some specimens. To obtain more accurate boxes, each one was expanded by 10 pixels on every side: the starting coordinates were shifted by 10 pixels in the negative x and y directions, and the ending coordinates by 10 pixels in the positive x and y directions. These final coordinates give the locations of all specimens. The boxes were then classified one by one in a loop: each specimen was first resized to the input size of the trained model and then classified with TensorFlow's predict function. Because the model was built for multi-label classification, its prediction is a set of class probabilities, which are sorted from the most to the least probable class. Since the detection step isolates one specimen per box, the class with the highest probability is used as its label: the argmax is passed to the labeling function, which writes the class name above the specimen. An example of the labeling results is shown in Fig. 7.

Test Results of the Dataset

The CNN model was trained on the dataset described earlier in this paper. To examine the test outputs, a test set comprising 30% of the full dataset was held out and predicted. Fig. 8 shows the confusion matrix of the test set.
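A confusion matrix like the one in Fig. 8 can be built and read with a few lines of NumPy: count true/predicted pairs, divide each row by its total so the diagonal gives the proportion of each class predicted correctly, and look up the class each species is most often mistaken for. The sketch assumes that rows are true classes, columns are predicted classes, and the values in parentheses are row-normalized; the text states neither convention, and the function names and toy labels are hypothetical.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix: rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def normalize_rows(cm):
    """Divide each row by its total so the diagonal holds the per-class hit rate."""
    totals = cm.sum(axis=1, keepdims=True)
    return np.divide(cm, totals, out=np.zeros(cm.shape, dtype=float), where=totals > 0)


def most_confused_with(cm_norm, names):
    """For each true class, the other class it is most often predicted as."""
    off_diag = cm_norm.copy()
    np.fill_diagonal(off_diag, 0.0)
    return {names[i]: names[int(np.argmax(row))]
            for i, row in enumerate(off_diag) if row.max() > 0}
```

A perfect classifier would give an identity matrix after normalization, so the off-diagonal maxima point directly at the species pairs that need more data or more distinctive features.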
The confusion matrix relates the predicted labels to the true labels of the test set. In each cell, the upper value is the number of specimens of a true class that received a given predicted label, and the value in parentheses is the same count expressed as a proportion (probability). The confusion matrix is an n x n matrix, where n is the number of classes; for a perfect classifier the normalized matrix equals the identity matrix, so the diagonal values should be as close to 1 as possible. Darker shading indicates a higher proportion. The confusion matrix also reveals properties of the dataset used. According to Fig. 8, the highest value is obtained for Takayanagia delicata (0.95) and the lowest for Bolivina subadvena (0.35). Although B. subadvena has a simple morphology, the network appears to have difficulty learning it and confuses it with B. seminuda and B. spissa. There are two likely reasons. First, its morphology is similar to that of these two species. Second, despite these similarities, there may not be enough data for the network to learn the differences. A similar situation is observed for T. bradyi, the species with the second-lowest value, which also has the fewest specimens in the current dataset (54). In general, a neural network trained on more data tends to produce more accurate results.

Test Results With Additional Fragments and Particles

In real applications, a thin-section sample contains not only fossils but also many fragments and particles. Identifying these fragments and particles and separating them from fossils is therefore particularly important.
Accordingly, 2810 fragment samples and 1801 particle samples from the Endless Forams dataset were added to the edited dataset used in section e, B. subadvena was removed, and the model was retrained. The first experiments showed that the highly variable appearance of the fragments confused the model during training. To stabilize training and eliminate false positives, the model was adjusted: a batch normalization layer was added after each convolutional layer to curb overfitting and false positives during training. The resulting confusion matrix (Fig. 9) shows an improvement in the trained model, with accuracy rates generally higher than before. The highest accuracy is obtained for B. tenuata (95%) and the lowest for B. spissa (69%). The reasons for such low accuracies were discussed in the previous section; for B. spissa, the drop is most likely due to its morphological similarity to B. seminuda, and Fig. 9 shows that B. spissa is confused with both fragments and B. seminuda.

Labeling of the Fragments, Particles and Foraminifera

To test the model on the new dataset that includes fragments, test images were generated with one foraminifer and one fragment, two foraminifera and two fragments, and three foraminifera and three fragments. The foraminifera and fragments were selected randomly from the dataset. An example result is shown in Fig. 10; this test image contains B. seminuda and T. delicata together with a fragment and a particle. As the figure shows, all species and fragments are predicted correctly.
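The per-specimen labeling applied to these test images follows the procedure given under Drawing Bounding Boxes and Labeling: expand each box by 10 pixels, crop it, and rank the class probabilities so the most probable class becomes the label. A framework-free sketch is shown below. The probability list stands in for the output of TensorFlow's predict function, the resize step before prediction is omitted, clipping to the image border is an assumption, and `expand_box`, `crop` and `rank_classes` are hypothetical names.

```python
import numpy as np


def expand_box(box, image_shape, offset=10):
    """Grow an inclusive (x0, y0, x1, y1) box by `offset` pixels, clipped to the image."""
    x0, y0, x1, y1 = box
    h, w = image_shape[:2]
    return (max(x0 - offset, 0), max(y0 - offset, 0),
            min(x1 + offset, w - 1), min(y1 + offset, h - 1))


def crop(image, box):
    """Cut out the pixels inside an inclusive box (a resize would follow before predict)."""
    x0, y0, x1, y1 = box
    return image[y0:y1 + 1, x0:x1 + 1]


def rank_classes(probabilities, class_names):
    """Sort classes from most to least probable; the first entry is used as the label."""
    probs = np.asarray(probabilities, dtype=float)
    order = np.argsort(probs)[::-1]
    return [(class_names[i], float(probs[i])) for i in order]
```

Clipping keeps the expanded box valid for specimens lying near the image edge, where a plain 10-pixel shift would produce negative or out-of-range coordinates.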
For this test image, the reported accuracy values are close to 100%. The results show that the new dataset, with its additional fragments and particles, makes the model more accurate than the previous datasets did.

General Evaluation of Dataset Results

A model's metrics depend on many variables, such as its architecture and the size of its training inputs. To obtain the optimum model, we carried out extensive parametric studies. These improved the model up to a point but could not take it further, so changes to the dataset became necessary. First, B. pagoda and T. bradyi were removed from the initial dataset and replaced with U. peregrina, which is morphologically distinct from the other species and has many samples in the dataset. To improve the results further, and to detect the non-fossil microscopic fragments that may be encountered in real applications, B. subadvena was then removed and fragments and particles were added. Because the final dataset contained many more images, the CNN model also had to be improved, and a batch normalization layer was added after each convolutional layer. The results of these changes are given in Table 6, where Dataset 1 is the initial dataset, Dataset 2 the dataset with U. peregrina in place of B. pagoda and T. bradyi, and Dataset 3 the final dataset without B. subadvena and with fragments and particles. The changes to the dataset and the model raised the maximum validation accuracy considerably, from 0.78 to 0.88.

Table 6 Datasets Comparison Table
Dataset | Epoch | Batch | Activation | Max Acc. | Max Val. Acc.
Dataset 1 | 50 | 64 | Sigmoid | 0.96 | 0.78
Dataset 2 | 50 | 64 | Sigmoid | 0.99 | 0.79
Dataset 3 | 50 | 64 | Sigmoid | 0.99 | 0.88

ResNet-50 Neural Network Model

The ResNet-50 architecture was evaluated in 15 separate runs to measure its classification performance, and the results were summarized statistically.

Training and Testing Framework

The training and testing code for ResNet-50 was written with the TensorFlow and Keras libraries, which were used to detect GPU availability, apply data augmentation, customize the ResNet-50 model and train it. Additional code computed the performance metrics and plotted the confusion matrix.

Model Compilation and Training

During training, all ResNet-50 layers were frozen except the custom top layers added for this classification task. The model was compiled with the 'adam' optimizer, the 'categorical_crossentropy' loss function and 'accuracy' as the metric. To mitigate class imbalance in the training data, class weights were computed and applied during training. The data were split into training and validation portions; the model was evaluated on the validation set after each epoch, and a final assessment was made on a separate test set.
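The text does not give the class-weight formula. One common choice is the "balanced" heuristic, the same one scikit-learn's `compute_class_weight` uses, which weights each class by n_samples / (n_classes x n_c). The sketch applies it to the nine-species counts listed for the Endless Forams subset. Whether the ResNet-50 runs used these counts or a later version of the dataset is not stated, so the numbers are illustrative. The index-keyed dictionary has the form accepted by the `class_weight` argument of Keras `Model.fit`, assuming class indices follow the order of the dictionary below.

```python
def balanced_class_weights(counts):
    """n_samples / (n_classes * n_c) per class, keyed by class index for Keras."""
    total = sum(counts.values())
    n_classes = len(counts)
    # Index order follows dictionary insertion order; this must match the label encoding.
    return {i: total / (n_classes * c) for i, c in enumerate(counts.values())}


# Sample counts per species as listed for the Endless Forams subset.
counts = {
    "B. tenuata": 450, "B. argentea": 320, "B. pagoda": 68,
    "B. seminuda": 938, "B. spissa": 323, "B. subadvena": 236,
    "E. smithi": 131, "T. bradyi": 54, "T. delicata": 456,
}
weights = balanced_class_weights(counts)
names = list(counts)
rarest = names[max(weights, key=weights.get)]   # class with the largest weight
```

With these counts T. bradyi (54 samples) receives a weight of about 6.1 and B. seminuda (938 samples) about 0.35, so misclassifying a rare species costs more during training.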
Predictions on the test set were then used to compute accuracy, precision, recall and the F1 score.

Statistical Summary of Model Performance

Table 7 summarizes the performance of the ResNet-50 model over the 15 runs, giving the minimum, maximum, mean and standard deviation of accuracy, precision, recall and F1 score.

Table 7 Evaluation Metrics of ResNet
Metric | Min. | Max. | Mean | Std. Deviation
Accuracy | 0.85 | 0.91 | 0.89 | 0.02
Precision | 0.85 | 0.93 | 0.89 | 0.02
Recall | 0.83 | 0.93 | 0.89 | 0.02
F-Score | 0.83 | 0.92 | 0.88 | 0.02

Accuracy ranged from 85.44% to 91.59%, with a mean of 89.10%. Precision, recall and F-score were similarly consistent, each with a standard deviation of 0.02, indicating that the model generally produced balanced results.

Visualization of Model Performance

Fig. 11 shows the confusion matrix of the ResNet-50 model on the Endless Forams dataset. It breaks the predictions down into true positives, false positives, true negatives and false negatives for each class, showing which classes the model separates well and where it can still be improved. Across the 15 runs, ResNet-50 performed consistently on this classification task.
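The run-level statistics in Table 7 (minimum, maximum, mean and standard deviation over 15 runs) can be computed with Python's standard library. The per-run values below are hypothetical placeholders, not the paper's results. The text also does not say whether a sample or population standard deviation was used; this sketch uses the sample version (`statistics.stdev`).

```python
import statistics


def summarize_runs(values):
    """Min, max, mean and sample standard deviation of one metric across runs."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),   # sample (n - 1) standard deviation
    }


# Hypothetical per-run accuracies, for illustration only.
runs = {"accuracy": [0.85, 0.89, 0.91]}
summary = {metric: summarize_runs(vals) for metric, vals in runs.items()}
```

Adding the precision, recall and F-score lists of all 15 runs to `runs` would reproduce every row of the table in one pass.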
Combined with careful training and evaluation, the ResNet-50 architecture achieved high accuracy, making it a valuable tool for this kind of computer vision task.

VGG Model

The VGG model, built on the VGG16 architecture, was likewise evaluated over 15 runs to quantify how accurately it classifies images. The results are given in Table 8.

Development and Assessment Framework

The VGG model was also developed and assessed with the TensorFlow and Keras libraries, which were used to check for a GPU, apply data augmentation and fine-tune the VGG16 architecture. Separate scripts computed the performance metrics and visualized the confusion matrix.

Training Procedure

During training, all layers of the VGG model except the custom final layers were set as non-trainable. The model was compiled with the Adam optimizer at a learning rate of 0.0001, the 'categorical_crossentropy' loss function and 'accuracy' as the evaluation metric. The training data were augmented with rotations, shifts, shearing, zooming and flipping to make the model more robust. The model was trained for a series of epochs and evaluated on a validation set to monitor its performance. After training, it was run on a test set, and the resulting predictions were used to calculate the model's accuracy, precision, recall and F1 score.
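The metrics used for both pretrained models can be derived from integer test labels as below. The text does not say how per-class scores were averaged, so this sketch assumes macro averaging (an unweighted mean over classes); `classification_scores` is a hypothetical helper, not the authors' script. scikit-learn's `precision_recall_fscore_support` with `average="macro"` computes the same precision, recall and F1 quantities.

```python
def classification_scores(y_true, y_pred, n_classes):
    """Accuracy plus macro-averaged precision, recall and F1 for integer labels."""
    tp = [0] * n_classes
    fp = [0] * n_classes
    fn = [0] * n_classes
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted p, but the specimen belongs to another class
            fn[t] += 1   # a specimen of class t was missed
    precision, recall, f1 = [], [], []
    for c in range(n_classes):
        pr = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rc = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        precision.append(pr)
        recall.append(rc)
        f1.append(2 * pr * rc / (pr + rc) if pr + rc else 0.0)
    return {
        "accuracy": sum(tp) / len(y_true),
        "precision": sum(precision) / n_classes,
        "recall": sum(recall) / n_classes,
        "f1": sum(f1) / n_classes,
    }
```

Macro averaging gives a rare species such as T. bradyi the same influence as an abundant one, whereas a weighted average would let the large classes dominate the reported score.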
Performance Summary

Table 8 gives a statistical summary of the VGG model's test results over the 15 runs. Accuracy ranged from 84.47% to 88.35%, with a mean of 86.39% and a standard deviation of 0.01, showing stable performance. Precision and recall averaged 0.85 and the F-score 0.83, indicating reliable and fairly balanced classification across classes.

Table 8 Evaluation Metrics of VGG
Metric | Min. | Max. | Mean | Std. Deviation
Accuracy | 0.84 | 0.88 | 0.86 | 0.01
Precision | 0.82 | 0.87 | 0.85 | 0.01
Recall | 0.82 | 0.87 | 0.85 | 0.01
F-Score | 0.79 | 0.86 | 0.83 | 0.01

Visualization of Model Performance

Fig. 12 shows the confusion matrix of the VGG-16 model on the Endless Forams dataset, giving the true positives, false positives, true negatives and false negatives of each class. It shows where the model classifies well and where its accuracy could be improved. Across the 15 runs, VGG-16 performed steadily and reached good accuracy, though slightly below that of ResNet-50.

Conclusion

This study presented three models, a custom CNN, ResNet-50 and VGG-16, for classifying benthic foraminifera species with a multi-label classification approach.
The main aim of this study is to use technology to speed up foraminifera identification, a lengthy and demanding task in routine work, and to increase the precision of its results. The models were evaluated on the same dataset to compare their image classification performance. The custom CNN reached 99% training accuracy and 88% validation accuracy on Dataset 3; the 11-percentage-point gap between the two suggests possible overfitting. The ResNet-50 and VGG-16 models, developed with the TensorFlow and Keras libraries, showed their own strengths and limitations. ResNet-50 performed consistently, with a mean accuracy of 89% and a peak accuracy of 91.59%. VGG-16 trailed slightly, with a maximum accuracy of 88.35%, but still performed robustly. The comparison thus highlights the advantages and drawbacks of each model: the CNN reaches high training accuracy, but its lower validation accuracy hints at overfitting; ResNet-50 combines high accuracy with consistent performance; and VGG-16, despite its slightly lower peak accuracy, remains reliable. These results show that different architectures and training strategies can give different results on different datasets, so each model's performance should be evaluated against the specific characteristics and requirements of the dataset at hand.
Overall, this study emphasizes the importance of deploying and optimizing deep learning models carefully, and such comparative analyses can guide future research in this field.

Declarations

Competing interests
The authors have not disclosed any competing interests.

Funding
The authors have not disclosed any funding.

Author Contribution
Kübra Yayan checked and wrote the paleontology part of the manuscript. Cem Bağlum developed the artificial intelligence code and conducted the tests. All authors reviewed the manuscript.

Data availability
Not applicable.

References

Durand T, Mehrasa N, Mori G (2019) Learning a Deep ConvNet for Multi-Label Classification With Partial Labels. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 647-657
Liao H, Li Y, Luo J (2016) Skin disease classification versus skin lesion characterization: Achieving robust diagnosis using multi-label deep neural networks. 23rd International Conference on Pattern Recognition (ICPR), pp. 355-360. doi: 10.1109/ICPR.2016.7899659
Pan X, Jin K, Cao J, et al. (2020) Multi-label classification of retinal lesions in diabetic retinopathy for automatic analysis of fundus fluorescein angiography based on deep learning. Graefes Arch Clin Exp Ophthalmol 258: 779-785. doi: 10.1007/s00417-019-04575-w
Parab MA, Mehendale ND (2021) Red Blood Cell Classification Using Image Processing and CNN. SN Computer Science 2: 70. doi: 10.1007/s42979-021-00458-2
Park J, Hwang Y, Lee D, Kim J (2020) MarsNet: Multi-Label Classification Network for Images of Various Sizes. IEEE Access 8: 21832-21846. doi: 10.1109/ACCESS.2020.2969217
Platon E, Gupta BS (2001) Benthic Foraminiferal Communities in Oxygen-Depleted Environments of the Louisiana Continental Shelf. Coastal Hypoxia: Consequences for Living Resources and Ecosystems 58: 147-163. doi: 10.1029/CE058p0147
Putzu L, Caocci G, Di Ruberto C (2014) Leucocyte classification for leukaemia detection using image processing techniques. Artificial Intelligence in Medicine 62(3): 179-191. doi: 10.1016/j.artmed.2014.09.002
Stanley SM (2005) Earth System History. Macmillan
Vickerman K (1992) The diversity and ecological significance of Protozoa. Biodiversity & Conservation 1(4): 334-341
Wang J, Yang Y, Mao J, Huang Z, Huang C, Xu W (2016) CNN-RNN: A Unified Framework for Multi-label Image Classification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2285-2294. doi: 10.1109/CVPR.2016.251
Wu H, Prasad S (2018) Semi-Supervised Deep Learning Using Pseudo Labels for Hyperspectral Image Classification. IEEE Transactions on Image Processing 27(3): 1259-1270. doi: 10.1109/TIP.2017.2772836
Xu Y, Jiao L, Wang S, Wei J, Fan Y, Lai M, Chang EI (2013) Multi-label classification for colon cancer using histopathological images. Microsc Res Tech 76(12): 1266-1277. doi: 10.1002/jemt.22294
Yang H, Zhou JT, Zhang Y, Gao B, Wu J, Cai J (2016) Exploit Bounding Box Annotations for Multi-Label Object Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 280-288
Zeggada A, Benbraika S, Melgani F, Mokhtari Z (2018) Multilabel Conditional Random Field Classification for UAV Images. IEEE Geoscience and Remote Sensing Letters PP: 1-5. doi: 10.1109/LGRS.2018.2790426
Zhang J, Wu Q, Shen C, Zhang J, Lu J (2018) Multilabel Image Classification With Regional Latent Semantic Dependencies. IEEE Transactions on Multimedia 20(10): 2801-2813. doi: 10.1109/TMM.2018.2812605
Zhu F, Li H, Ouyang W, Yu N, Wang X (2017) Learning Spatial Regularization With Image-Level Supervisions for Multi-Label Image Classification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5513-5522
University","correspondingAuthor":true,"prefix":"","firstName":"Kübra","middleName":"","lastName":"YAYAN","suffix":""},{"id":274230740,"identity":"a1115181-3195-4626-bc26-17c32f5f7cd1","order_by":1,"name":"Cem BAĞLUM","email":"","orcid":"","institution":"Eskişehir Osmangazi University","correspondingAuthor":false,"prefix":"","firstName":"Cem","middleName":"","lastName":"BAĞLUM","suffix":""}],"badges":[],"createdAt":"2024-02-19 17:00:54","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-3970510/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-3970510/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":51563358,"identity":"dec86ae4-4658-4e95-98e0-4fa8a8315ee2","added_by":"auto","created_at":"2024-02-23 18:45:32","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":610917,"visible":true,"origin":"","legend":"\u003cp\u003eThe Samples of The Species\u003c/p\u003e\n\u003cp\u003eHeader line in the generated csv file contains these; ID; genus; B_argentea; B_pagoda; B_seminuda; B_spissa; B_Subadvena; B_tenuata; E_smithi; T_brady; T_delicata. 
The created tag set is now available to be used in model training.\u003c/p\u003e","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-3970510/v1/53cad2da266c4524bda0c448.png"},{"id":51563359,"identity":"f3d3c173-c542-49fe-aa35-5c9c8d565c49","added_by":"auto","created_at":"2024-02-23 18:45:32","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":137186,"visible":true,"origin":"","legend":"\u003cp\u003eThe Application Results of The Thresholding Methods\u003c/p\u003e","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-3970510/v1/45085da36cd517f8ba725f0f.png"},{"id":51564029,"identity":"893dbeb5-8590-4e12-a6e4-ddbef12f9946","added_by":"auto","created_at":"2024-02-23 18:53:32","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":140878,"visible":true,"origin":"","legend":"\u003cp\u003eThe Representation of The CNN Model\u003c/p\u003e","description":"","filename":"floatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-3970510/v1/09f5488118c9e0862f190c4d.png"},{"id":51563364,"identity":"6eed071e-df9a-4e42-b686-74edc88ae0b3","added_by":"auto","created_at":"2024-02-23 18:45:33","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":28999,"visible":true,"origin":"","legend":"\u003cp\u003eThe Interface of The System\u003c/p\u003e","description":"","filename":"floatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-3970510/v1/d0ba4695e0851b9084032059.png"},{"id":51564030,"identity":"88ae80e4-b694-407f-b21b-bbeafacb1704","added_by":"auto","created_at":"2024-02-23 18:53:33","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":44012,"visible":true,"origin":"","legend":"\u003cp\u003eInitial Metrics of The 
Model\u003c/p\u003e","description":"","filename":"floatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-3970510/v1/1f0bc3af4d5d9c50be69de20.png"},{"id":51563363,"identity":"68739188-2a5f-488f-bca5-8207221d9348","added_by":"auto","created_at":"2024-02-23 18:45:33","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":253621,"visible":true,"origin":"","legend":"\u003cp\u003eThe Testing Image and Results of The Threshold Application\u003c/p\u003e","description":"","filename":"floatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-3970510/v1/e77b98245e398bf2470494e8.png"},{"id":51563366,"identity":"2fa99bc7-a094-4cd4-8bd5-50dd9e2d3cb3","added_by":"auto","created_at":"2024-02-23 18:45:33","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":222297,"visible":true,"origin":"","legend":"\u003cp\u003eThe Result of The Labeling Process\u003c/p\u003e","description":"","filename":"floatimage7.png","url":"https://assets-eu.researchsquare.com/files/rs-3970510/v1/8a792d8db02530df91c40ffb.png"},{"id":51563361,"identity":"63105bce-40f1-425f-b968-9f09d2367967","added_by":"auto","created_at":"2024-02-23 18:45:32","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":94075,"visible":true,"origin":"","legend":"\u003cp\u003eConfusion Matrix of Test Image Set\u003c/p\u003e","description":"","filename":"floatimage8.png","url":"https://assets-eu.researchsquare.com/files/rs-3970510/v1/4a0502628fe80fe177eaf1df.png"},{"id":51563362,"identity":"f92652ed-68a3-47d8-9bc4-551795c6cf25","added_by":"auto","created_at":"2024-02-23 18:45:32","extension":"png","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":85024,"visible":true,"origin":"","legend":"\u003cp\u003eConfusion Matrix with Additional Fragments and 
Particles\u003c/p\u003e","description":"","filename":"floatimage9.png","url":"https://assets-eu.researchsquare.com/files/rs-3970510/v1/90aa339479ceeabb1633278b.png"},{"id":51563368,"identity":"17936072-01a8-4a5b-94d1-190c393d8662","added_by":"auto","created_at":"2024-02-23 18:45:33","extension":"png","order_by":10,"title":"Figure 10","display":"","copyAsset":false,"role":"figure","size":185571,"visible":true,"origin":"","legend":"\u003cp\u003eLabeling of Fragments, Particles and Fossils\u003c/p\u003e","description":"","filename":"floatimage10.png","url":"https://assets-eu.researchsquare.com/files/rs-3970510/v1/b6108268b1b91f308797d4a2.png"},{"id":51563370,"identity":"e613d784-b2aa-4ccb-a14e-4dd9eabc3286","added_by":"auto","created_at":"2024-02-23 18:45:33","extension":"png","order_by":11,"title":"Figure 11","display":"","copyAsset":false,"role":"figure","size":350150,"visible":true,"origin":"","legend":"\u003cp\u003eConfusion Matrix of ResNet-50 Model\u003c/p\u003e","description":"","filename":"floatimage11.png","url":"https://assets-eu.researchsquare.com/files/rs-3970510/v1/6fefe27b17b21dece304a0ec.png"},{"id":51563369,"identity":"38fac670-d5d5-4f3f-82ec-46f0c7622f91","added_by":"auto","created_at":"2024-02-23 18:45:33","extension":"png","order_by":12,"title":"Figure 12","display":"","copyAsset":false,"role":"figure","size":138102,"visible":true,"origin":"","legend":"\u003cp\u003eConfusion Matrix Using VGG for Endless Foram Dataset\u003c/p\u003e","description":"","filename":"floatimage12.png","url":"https://assets-eu.researchsquare.com/files/rs-3970510/v1/8807aea2bb50c01523955670.png"},{"id":53060155,"identity":"8aea61af-0727-40f9-8c32-52829b31d23d","added_by":"auto","created_at":"2024-03-20 
07:23:10","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":3302456,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-3970510/v1/20108f85-7d12-4bc5-a3f1-b055722f9ce2.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Multi-Label Benthic Foraminifera Identification with Convolutional Neural Networks","fulltext":[{"header":"Introduction","content":"\u003cp\u003eForaminifers are unicellular and crustacean creatures that mostly live-in sea and fresh water. As life forms, they can be planktonic or benthic (Platon and Gupta \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e2001\u003c/span\u003e). Although planktonic foraminifers are single-celled organisms mostly found on the sea surface, benthic foraminifers can be found on sea and ocean floors, at different depths, and 10,000 species still exist (Vickerman 1992). The foraminifera, the first examples of which are thought to have started from the Paleozoic era, are estimated to have lived until the middle of the Jurassic period of the Mesozoic era (Stanley 1999). When the benthic foraminifers are considered, they have acquired different physical properties by adapting to the depth they are on the sea floor. Thicker crust is seen in benthic species living at deeper depths, while species living closer to the surface have a thinner crust. The differences in their physical structures provide a lot of information about the region they are located as a result of the observations made. It provides us with a lot of information about the change of geological structure in the obtained region, age determination, determination of hydrocarbon deposits, the depths of the oceans in which they are located, paleogeographic connections. At the same time, the inferences obtained from foraminifera have great importance in many studies such as petroleum search. 
However, identifying and classifying microfossils is a long and tiring process, and its difficulty also leads to errors in classification studies. Companies that exploit underground resources, such as oil companies, choose their fields of operation on the basis of inferences drawn from the examination of fossils, so the possibility of error must be reduced, given both the difficulty of the process and the operating costs. Because drilling is very costly, the findings from an area to be drilled must be analyzed and evaluated correctly; otherwise, drilling in the wrong area will inevitably have negative financial consequences for the company. In addition, paleontologists specialize in particular fossil groups and/or ages, so in sections where many groups coexist they may be unable to identify the other groups in detail, which makes the work both tiring and error-prone. Classification tasks fall into three categories: binary, multiclass, and multilabel classification. Binary classification involves only two classes, encoded as 0 and 1. Multiclass classification involves more than two classes, but each output is exactly one of them. Unlike the other two, multilabel classification can assign several labels to the same input.\u003c/p\u003e \u003cp\u003eIn this study, we classify fossil specimens with multilabel classification methods built on deep Convolutional Neural Networks (CNNs), Residual Networks (ResNet), and the Visual Geometry Group network (VGG). 
The core of our approach is a custom CNN architecture designed for identifying benthic foraminifera, complemented by the ResNet-50 and VGG-16 models, which are well suited to high-dimensional data and intricate image features. The deep learning system is trained on a dataset containing benthic foraminifers seen in varied cross-sections and at different angles, with the aim of assigning new fossil samples to their species. Beyond demonstrating progress in image classification, this combination of deep learning architectures lays the groundwork for further applications of deep learning in geological and micropaleontological studies, where physical attributes such as chamber count, size, and shape are critical to classification.\u003c/p\u003e"},{"header":"Related Works","content":"\u003cp\u003eArtificial intelligence-based classification is used in many fields of study, with the aim of automating manual classification. Automated classification can minimize the errors that occur during manual classification and shorten the process considerably.\u003c/p\u003e \u003cp\u003eMulti-label image classification is applied in many fields today, such as medical imaging, object recognition, discrimination between animals and related species, and plant classification. Techniques such as artificial intelligence, image processing, and pattern recognition are used for these tasks. One study describes CNN-based red blood cell (RBC) classification: the images in an RBC dataset were converted to grayscale and then filtered with Canny edge detection. 
These preprocessed images were then used to train the system, and the trained model classified individual RBC images (Parab et al. 2021). Another study presents a classifier for white blood cells (WBCs) aimed at understanding acute lymphoblastic leukemia, which can be fatal if left untreated: the cytoplasm of each WBC was isolated with image processing methods such as background removal and threshold-based contour extraction, and the cells were then classified with a Support Vector Machine (SVM) (Putzu et al. \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2014\u003c/span\u003e). A further study detects and classifies retinal lesions by multilabel classification: non-perfusion regions (NP), microaneurysms, leakages, and laser scars were classified by training DenseNet, ResNet-50, and VGG-16 models on 4067 samples from the eye center of the Second Affiliated Hospital of Zhejiang University School of Medicine (Pan et al. \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). Multilabel classification has also been applied to skin diseases and skin lesions: a CNN fine-tuned from BVLC AlexNet was used for disease-targeted classification, and a multi-label CNN fine-tuned from BVLC AlexNet for lesion-targeted classification (Liao et al. 2016). These articles are examples from the medical domain.\u003c/p\u003e \u003cp\u003eOutside medicine, many studies have also aimed at models that give better classification results. One study describes a method for classifying images taken by unmanned aerial vehicles (UAVs). 
The method first divides the image into equal tiles and uses a suitable classifier to produce an initial multi-label estimate for each tile; a conditional random field (CRF) model is then applied to the resulting multi-label map to refine it iteratively (Zeggada et al. \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). Another study addresses hyperspectral image classification with semi-supervised deep learning, in which a deep neural network is trained with limited labeled data and a large amount of unlabeled data; deep convolutional recurrent neural networks, which treat each hyperspectral pixel as a spectral sequence, perform the classification (Wu and Prasad \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). A further multilabel study trains a model with partial labels, where only some labels are available for each image. To show the potential of partial labels in multi-label datasets, several labeling strategies were first compared experimentally; a new classification loss was then introduced that uses the proportion of known labels in each example to learn with missing labels. A Graph Neural Network (GNN) was used for classification, and thresholding strategies were used to produce predictions (Durand et al. 2019). Another article presents a technique for learning spatial regularization with image-level supervision for multilabel classification, proposing a unified network that exploits both semantic and spatial relations between labels using only image-level supervision. Given a multi-label image, the proposed Spatial Regularization Network (SRN) generates attention maps for all labels and captures the relations between them through learnable convolutions. 
The original predictions of a ResNet-101 network are combined with these regularized classification results (Zhu et al. 2017). Another multi-label paper adds local information to improve the discriminative power of the features. Specifically, object proposals are first extracted from each image; treating each image as a bag and the proposals extracted from it as instances turns the multi-label recognition problem into a multi-class multi-instance learning problem (Yang et al. \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e2016\u003c/span\u003e). A related study provides a unified CNN-RNN framework for classification, in which the RNN predicts multiple labels by finding the prediction path that maximizes the a priori probability (Jiang et al. 2016). Another article presents a multilabel classification method for colon cancer. Four features, namely the color histogram, the gray-level co-occurrence matrix, the histogram of oriented gradients, and the Euler number, form its discriminative feature set, and the multi-label model uses one-against-all (OAA), one-against-one (OAO), and multi-structure SVMs (Xu et al. \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2013\u003c/span\u003e). To improve the small-object classification performance of CNNs in multilabel settings, another work uses an LSTM to model latent semantic label dependencies sequentially over regions that contain multiple highly dependent labels (Zhang et al. \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). Finally, to let CNNs accept images of different sizes, one study proposes MarsNet, a CNN-based end-to-end network for multilabel classification that takes inputs of varied sizes (Park et al. 
\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2020\u003c/span\u003e).\u003c/p\u003e"},{"header":"Preliminary Works","content":"\u003cp\u003eThis study builds on established dataset preparation and image processing techniques to address the challenges of classifying foraminifera species with neural network models. Careful image selection and labeling, together with standard computer vision methods, are used to make the models robust and accurate. Beyond bringing these techniques to marine biology, the work provides a basis for future studies that aim to automate ecological monitoring and biodiversity studies and make them more precise.\u003c/p\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003eDataset Preparation\u003c/h2\u003e \u003cp\u003eA dataset is indispensable for training the neural network model, and it must be properly organized and labeled. For this purpose, the \u003cem\u003eEndless Forams\u003c/em\u003e dataset was used; with thousands of benthic and planktonic foraminifer images, it meets the requirements for training the model. Because the foraminifera samples appear as bright images, a binary segmentation mask was applied to reduce the effect of brightness (Xu et al. \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2013\u003c/span\u003e). Nine species were selected from this dataset to test the model: \u003cem\u003eBulimina tenuata\u003c/em\u003e, \u003cem\u003eBolivina argentea\u003c/em\u003e, \u003cem\u003eBulimina pagoda\u003c/em\u003e, \u003cem\u003eBolivina seminuda\u003c/em\u003e, \u003cem\u003eBolivina spissa\u003c/em\u003e, \u003cem\u003eBolivina subadvena\u003c/em\u003e, \u003cem\u003eEpistominella smithi\u003c/em\u003e, \u003cem\u003eTrifarina bradyi\u003c/em\u003e, and \u003cem\u003eTakayanagia delicata\u003c/em\u003e. 
The images of the selected species were split into two groups, 70% for training and 30% for testing. Sample images of the species used for training and testing are shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. The test and validation data were drawn randomly from the training file. The images of the selected species were first collected in an empty folder. Multi-label classification requires the images in the dataset to be arranged systematically, so the collected images were renamed in the \u0026lt;LabelName\u0026gt;_\u0026lt;#ofImage\u0026gt; format with the Advanced Renamer software, which made them compatible with the label file used for training. The resulting dataset, 8037 images in total, was labeled in Microsoft Excel.\u003c/p\u003e\u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003eDetecting Foraminifera\u003c/h2\u003e \u003cp\u003eBefore predictions can be made, the foraminifera specimens must first be detected in the image. Detection starts by reading the image in Python with the OpenCV library, which provides optimized real-time computer vision functions. OpenCV reads images in BGR (blue, green, red) channel order. To simplify processing, the number of color channels is reduced: a grayscale image is a single two-dimensional pixel array, whereas an RGB image has three channels, so the BGR images were converted to grayscale. Thresholding was then applied to reveal the foraminifera. Several thresholding methods were tried, and Otsu's method gave the best results, so it was selected. The threshold value was set to 70 and the maximum value to 255. 
This means that pixels brighter than the threshold value of 70 are set to 255 and all other pixels to 0. The thresholded images are shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e: bright regions become white and everything else becomes black. To make the images more suitable for processing, Connected Component Labeling (CCL) was used to separate the images into blobs, with the CCL connectivity set to 2. To remove the background around the foraminifera, a NumPy zeros mask was created and the CCL blobs were drawn onto it, which reveals the locations of the foraminifera. Contours were then detected with OpenCV's findContours function.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe contour retrieval mode was set to external retrieval (RETR_EXTERNAL), which returns only the outer contour when one contour encloses another. The contour approximation method was set to simple chain approximation (CHAIN_APPROX_SIMPLE), which returns only the end points needed to reconstruct the contour line. The detected contours were sorted so that the bounding boxes could be drawn one by one, and the minimal upright bounding boxes were drawn with the rectangle function. Each boxed region was cropped from the whole image and prepared for prediction, and in a for loop every individual crop was passed to the multilabel classification CNN model. 
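The detection steps described above (grayscale conversion, thresholding, connected-component labeling, and cropping of bounding boxes) can be sketched as follows. This is a minimal NumPy-only sketch under stated assumptions, not the authors' code: the paper used OpenCV, and the helper names (`to_grayscale`, `otsu_threshold`, `label_components`, `detect_specimens`) are hypothetical. Two details are worth noting. When Otsu's method is requested in OpenCV (`cv2.THRESH_OTSU`), `cv2.threshold` computes its own threshold and does not use a supplied value such as 70, so the sketch computes the Otsu threshold directly. A CCL connectivity of 2, as in scikit-image's `measure.label`, means 8-connectivity in 2-D, which is what `label_components` implements. The `min_area` filter is an assumed stand-in for the zeros mask onto which only sufficiently large blobs are drawn; the value 50 is illustrative.

```python
from collections import deque

import numpy as np


def to_grayscale(bgr):
    """BGR -> gray with the ITU-R BT.601 weights (as cv2.COLOR_BGR2GRAY uses)."""
    b, g, r = bgr[..., 0], bgr[..., 1], bgr[..., 2]
    return (0.114 * b + 0.587 * g + 0.299 * r).round().astype(np.uint8)


def otsu_threshold(gray):
    """Threshold t maximising between-class variance; foreground is gray > t."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # weight of the class gray <= t
    mu = np.cumsum(prob * np.arange(256))    # first moment of that class
    mu_total = mu[-1]
    numer = (mu_total * omega - mu) ** 2
    denom = omega * (1.0 - omega)
    sigma_b = np.zeros(256)
    valid = denom > 1e-12                    # skip t where one class is empty
    sigma_b[valid] = numer[valid] / denom[valid]
    return int(np.argmax(sigma_b))


def label_components(mask):
    """8-connected labelling (scikit-image connectivity=2); 0 is background."""
    labels = np.zeros(mask.shape, dtype=np.int32)
    rows, cols = mask.shape
    count = 0
    for r0, c0 in zip(*np.nonzero(mask)):
        if labels[r0, c0]:
            continue                         # already part of an earlier blob
        count += 1
        labels[r0, c0] = count
        queue = deque([(r0, c0)])
        while queue:                         # breadth-first flood fill
            r, c = queue.popleft()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr, nc] and not labels[nr, nc]):
                        labels[nr, nc] = count
                        queue.append((nr, nc))
    return labels, count


def detect_specimens(bgr, min_area=50):
    """Return the Otsu threshold, upright (x, y, w, h) boxes and their crops."""
    gray = to_grayscale(bgr)
    t = otsu_threshold(gray)
    mask = gray > t                          # bright specimens -> foreground
    labels, count = label_components(mask)
    boxes = []
    for k in range(1, count + 1):
        ys, xs = np.nonzero(labels == k)
        if ys.size < min_area:               # assumed debris filter
            continue
        x, y = int(xs.min()), int(ys.min())
        boxes.append((x, y, int(xs.max()) - x + 1, int(ys.max()) - y + 1))
    boxes.sort()                             # left-to-right, then top-to-bottom
    crops = [bgr[y:y + h, x:x + w] for x, y, w, h in boxes]
    return t, boxes, crops


if __name__ == "__main__":
    img = np.full((40, 60, 3), 20, dtype=np.uint8)   # dark background
    img[5:15, 5:15] = 220                            # specimen 1
    img[20:35, 30:50] = 220                          # specimen 2
    img[38, 58] = 220                                # one-pixel speck
    t, boxes, crops = detect_specimens(img)
    print(t, boxes)   # 20 [(5, 5, 10, 10), (30, 20, 20, 15)]
```

On real thin-section images, each crop would presumably still need resizing to the network's 200x200 input before prediction; the flood fill is written for clarity, and an optimized library routine would be preferable for large images.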
Once the predictions were made, the returned labels were placed on the bounding boxes, producing the final segmented and labeled image.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003eProposed Method\u003c/h2\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eIn this study, we classify benthic foraminifera with a custom Convolutional Neural Network (CNN) alongside two established deep learning architectures, ResNet-50 and VGG-16, each applied to the Endless Forams dataset. The custom CNN is designed to extract features from, and learn, the high-dimensional, intricate patterns in microfossil images. ResNet-50, which mitigates the vanishing gradient problem through residual learning, and VGG-16, whose simple, deep stack of convolutional layers captures texture and form, are used to further improve classification accuracy. This approach aims to exploit the distinctive capabilities of each architecture for robust and precise identification of benthic foraminifera.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003eCustom Proposed Convolutional Neural Network Model\u003c/h2\u003e \u003cp\u003eTo advance biological classification, we developed a custom Convolutional Neural Network (CNN) specialized for the nuanced task of foraminifera identification. The model is tailored to the characteristics of foraminifera imagery, and every image is normalized before analysis. 
Training uses the binary cross-entropy loss and the Adam optimizer with a learning rate of 0.001.\u003c/p\u003e \u003cp\u003eThis work illustrates how deep learning can support biological research, particularly the classification of microorganisms such as foraminifera. The model's ability to predict the dominant class among nine, using a sigmoid score threshold of 0.7 (70%), suggests its potential as a tool for micropaleontologists and beyond. By focusing on the specific requirements of foraminifera classification, this work paves the way for more specialized and effective applications of machine learning in the biological sciences.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eCustom Proposed Model Architecture\u003c/h2\u003e \u003cp\u003eA custom convolutional neural network was developed to train the model. Training a custom CNN from scratch may be slower than using pre-trained models, but it leaves ample room to tailor the network to the task. The images collected as described in the previous section were converted into arrays with the Keras preprocessing functions, and the pixel values were divided by 255: since the RGB images have pixel values from 0 to 255, the scaled values lie between 0 and 1. The images were also loaded at 200x200 pixels with 3 channels, which is the CNN input shape. The convolutional layers that follow the input are used for feature extraction. The first layer is a 3x3 convolutional layer with 16 filters followed by a rectified linear unit (ReLU) activation. This convolutional layer is followed by a batch normalization layer and then a 2x2 max pooling layer. 
Each subsequent block has twice as many filters as the previous one. Before the output, the feature maps are flattened by a Flatten layer, followed by a 512-unit dense layer with ReLU activation. The output layer is another dense layer with 9 units, one per class, with a sigmoid activation so that each unit outputs an independent probability. The representation of the custom convolutional neural network is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e. The network was trained with the binary cross-entropy loss, using Adam as the optimizer with a learning rate of 0.001 and a batch size of 64. Training was implemented in Python with TensorFlow. The final model was selected after accuracy tests in which the hyperparameters and layers of the model were varied. Because of the sigmoid activation, the network outputs a score between 0 and 1 for each class, representing the model's confidence that the test image belongs to that class. The class whose score reaches 0.7 (70%) is chosen as the dominant class among the 9 classes, and the test image is classified according to this dominant class.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003eInterface of the System\u003c/h2\u003e \u003cp\u003eTo simplify the search for the optimal system, an interface was developed that lets the user change specific training hyperparameters: the number of epochs, the steps per epoch, the batch size, the image size, and the percentage of the image set used for the test split. The user types the desired values with the keyboard. 
Clicking the start button starts training the defined model. The interface is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cdiv id=\"Sec10\" class=\"Section3\"\u003e \u003ch2\u003eResNet-50 Neural Network Model\u003c/h2\u003e \u003cp\u003eThe ResNet-50 neural network model is an important development in deep learning, particularly because it addresses the long-standing problem of vanishing gradients in deep neural networks. Its distinguishing feature is the skip (shortcut) connection, which lets gradients flow directly through its 50-layer depth and keeps training stable. Skip connections make it easy for the model to learn identity functions, which improves its robustness and efficiency in image classification tasks. In benthic foraminifera classification, these architectural strengths help the model recognize the subtle differences among microfossils, and its deep design supports efficient feature extraction and generalization, improving classification accuracy and, in turn, the reliability of geological research.\u003c/p\u003e \u003cp\u003eThe use of ResNet-50 for classifying benthic foraminifera in geological studies rests on its deep residual learning framework. This framework overcomes the vanishing gradient problem, a notable challenge in training very deep networks, and makes it practical to train networks as deep as the 50-layer ResNet-50 without degrading performance. 
These properties make ResNet-50 well suited to the complex task of benthic foraminifera classification. Applying the model improves the precision and generalization of the classification process and can reveal subtle distinctions between species that are difficult to detect otherwise. These outcomes illustrate the potential of deep residual networks to improve the accuracy and efficiency of geological image analysis, and of ResNet-50 in particular to advance the understanding of benthic foraminifera species and to contribute insights to paleoecological and paleoenvironmental studies. The model's architecture and depth allow it to capture a wide range of features from microfossil images, which is essential for accurately classifying diverse benthic foraminifera.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003eVGG-16 Neural Network Model\u003c/h2\u003e \u003cp\u003eThe VGG-16 neural network model, developed by the Visual Geometry Group at the University of Oxford, has become a cornerstone of deep learning because of its deep yet straightforward architecture. Designed for large-scale image recognition challenges such as ImageNet, VGG-16 has 16 weight layers and performs very well in image recognition and classification tasks; its popularity stems from the high accuracy it achieves in these domains. 
This effectiveness comes from its architecture: stacks of convolutional layers with 3x3 filters, each stack followed by a max-pooling layer, let the model learn visual patterns at various scales, and the network ends in fully connected layers and a softmax output layer for classification. The model is also widely used for transfer learning, where a pre-trained VGG-16 serves as the starting point for a new task, saving considerable time and resources, especially when labeled data are limited.\u003c/p\u003e \u003cp\u003eIn our research, VGG-16 contributed to the accuracy of microfossil classification, showing the model's capacity to discern fine details in microfossil images. Its application improved our understanding of benthic foraminifera species and contributes data for paleoecological and paleoenvironmental studies. The model's uniform architecture and the depth of its convolutional layers help capture a broad spectrum of features from microfossil images, which is essential for accurately classifying diverse benthic foraminifera. This capability reflects VGG-16's robustness in extracting and learning from the complex visual information in microfossil imagery. Deploying VGG-16 in this context deepens our grasp of microfossil morphology and illustrates the impact of deep learning on geological research. 
VGG-16 is thus a useful tool for improving the precision and efficiency of geological sample analysis and for enriching our understanding of Earth's past biodiversity.\u003c/p\u003e \u003c/div\u003e"},{"header":"Results","content":"\u003cp\u003eThis section compares three image classification models, a custom Convolutional Neural Network (CNN), ResNet-50, and VGG-16, trained and evaluated on the same dataset, in order to assess their performance in classifying the selected foraminifera species.\u003c/p\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003eCustom Proposed Convolutional Neural Network Model\u003c/h2\u003e \u003cp\u003eMany training runs were carried out to find the optimal CNN model. The system was trained and tested on the Endless Forams image set, using the species \u003cem\u003eB. tenuata\u003c/em\u003e, \u003cem\u003eB. argentea\u003c/em\u003e, \u003cem\u003eB. pagoda\u003c/em\u003e, \u003cem\u003eB. seminuda\u003c/em\u003e, \u003cem\u003eB. spissa\u003c/em\u003e, \u003cem\u003eB. subadvena\u003c/em\u003e, \u003cem\u003eE. smithi\u003c/em\u003e, \u003cem\u003eT. bradyi\u003c/em\u003e, and \u003cem\u003eT. delicata\u003c/em\u003e. \u003cem\u003eB. tenuata\u003c/em\u003e has 450 samples, \u003cem\u003eB. argentea\u003c/em\u003e has 320, \u003cem\u003eB. pagoda\u003c/em\u003e has 68, \u003cem\u003eB. seminuda\u003c/em\u003e has 938, \u003cem\u003eB. spissa\u003c/em\u003e has 323, \u003cem\u003eB. subadvena\u003c/em\u003e has 236, \u003cem\u003eE. smithi\u003c/em\u003e has 131, \u003cem\u003eT. bradyi\u003c/em\u003e has 54, and \u003cem\u003eT. 
delicata\u003c/em\u003e has 456 samples.\u003c/p\u003e \u003cp\u003eThe system was trained and evaluated with TensorFlow 2.4.0 (GPU build) and Python 3.8.18 on Ubuntu 16.04.7 LTS. The hardware comprised an Intel Xeon E5-2637 v4 CPU @ 3.50GHz, multiple NVIDIA Quadro P5000 GPUs, and 128 GB of RAM.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003eConvolutional Neural Network\u003c/h2\u003e \u003cp\u003eTo obtain the best accuracy, the CNN model was tested many times. To automate testing, a Python loop was written that increases a specific parameter at the end of each training run. In the following, a convolutional layer is denoted 16C3, where 16 is the number of filters and 3 denotes a 3x3 kernel, and a max pooling layer is denoted P2, where 2 denotes a 2x2 pooling window. The number of blocks was examined first. Testing started with a [16C3-P2]-128-9 CNN; at each step, a feature extraction block was added and the number of filters was doubled in every new block. The best results were obtained with [16C3-P2] - [32C3-P2] - [64C3-P2] - [128C3-P2] - [256C3-P2] - [512C3-P2] \u0026ndash; 128-9, that is, 6 convolutional blocks; the other configurations overfitted more than this one. The initial (one-block) model reached a maximum training accuracy of 1.0 but a maximum validation accuracy of only 0.57. The initial metrics of the model are shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e.\u003c/p\u003e \u003cp\u003eThe parametric analysis showed that the best results were obtained with 16 filters in the first convolutional layer. 
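The block search described above can be made concrete with a short sketch that enumerates the candidate architectures in the paper's own notation and computes the length of the flattened feature vector fed to the dense layer. This is an illustration, not the authors' search script: `block_notation` and `flatten_size` are hypothetical helpers, no training is performed, and the flattened size assumes 200x200x3 inputs, 'same'-padded 3x3 convolutions, and 2x2 pooling with stride 2 (rounding down), since the padding mode is not stated.

```python
def block_notation(n_blocks, first_filters=16, dense_units=128, n_classes=9):
    """Hypothetical helper: '[16C3-P2]-[32C3-P2]-...-128-9' with filters doubling."""
    blocks = [f"[{first_filters * 2 ** i}C3-P2]" for i in range(n_blocks)]
    return "-".join(blocks + [str(dense_units), str(n_classes)])


def flatten_size(n_blocks, side=200, first_filters=16):
    """Length of the Flatten output, assuming 'same' padding and floor pooling."""
    for _ in range(n_blocks):
        side //= 2                      # each 2x2/stride-2 pool halves H and W
    channels = first_filters * 2 ** (n_blocks - 1)
    return side * side * channels


if __name__ == "__main__":
    for n in range(1, 7):               # one to six blocks, as in the search
        units = flatten_size(n)
        dense_params = units * 128 + 128  # weights + biases of the 128-unit layer
        print(f"{block_notation(n):<70} flatten={units:>7} dense_params={dense_params:,}")
```

Under these assumptions, the one-block model feeds 160,000 values into its dense layer (over 20 million dense weights), whereas the six-block model feeds 4,608 (589,952 dense parameters), which may help explain why the deeper configurations overfitted less.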
The Number of Input Layer Comparison is given in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e and the results in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. Once the number of blocks and filters had been fixed, the number of units in the dense layer was tuned. As Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e shows, the highest accuracy and validation accuracy, 96% and 78%, were obtained with 512 units. To reduce overfitting, dropout layers were then added to the CNN model, but the results show that the model still overfitted during training with these dropout layers (Table\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e). To overcome this, the extra dropout layers were removed, leaving a single dropout layer before the output dense layer. The results show that dropout does not suit this neural network model. These experiments indicate that the optimal CNN architecture for this work is [16C3 - P2] - [32C3 - P2] - [64C3 - P2] - [128C3 - P2] - [256C3 - P2] - [512C3 - P2] \u0026ndash; 512\u0026ndash;9. 
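The output stage of this architecture (9 sigmoid units trained with binary cross-entropy, with a class accepted as dominant once its score reaches 0.7) can be sketched in plain Python. This is a hedged illustration rather than the authors' code: the class order in `SPECIES` is assumed, the logits in the usage example are made up, and the paper does not say what happens when several scores exceed 0.7 or none does, so the hypothetical `dominant_class` picks the highest score and returns `None` when it falls below the threshold. Unlike softmax outputs, the sigmoid scores are independent and need not sum to 1, which is what allows more than one label per image.

```python
import math

# Assumed label order; the paper lists the nine species but not their indices.
SPECIES = [
    "B. tenuata", "B. argentea", "B. pagoda", "B. seminuda", "B. spissa",
    "B. subadvena", "E. smithi", "T. bradyi", "T. delicata",
]


def sigmoid(z):
    """Numerically stable logistic function, applied to each output unit."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over the labels of one sample."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)    # clip so log() never sees 0
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


def dominant_class(scores, threshold=0.7):
    """Index of the top score if it reaches the threshold, otherwise None."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best if scores[best] >= threshold else None


if __name__ == "__main__":
    logits = [-3.0, 2.0, -1.0, -4.0, -2.5, -3.0, -5.0, -4.0, -2.0]  # made-up
    scores = [sigmoid(z) for z in logits]
    target = [0, 1, 0, 0, 0, 0, 0, 0, 0]   # multi-hot vector for B. argentea
    idx = dominant_class(scores)
    print(SPECIES[idx], round(scores[idx], 3))   # B. argentea 0.881
    print(round(binary_crossentropy(target, scores), 4))
```

Because `dominant_class` can return `None`, a caller would need to decide how to report an image for which no species reaches the 0.7 score; that policy is left open here, as it is in the text.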
The Single Dropout Layer Comparison Table can be seen from Table\u0026nbsp;\u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eNumber of Input Layer Comparison Table\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"7\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCNN\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eEpoch\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eStep\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eBatch\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eActivation\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eMax Acc.\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eMax Val. 
Acc.\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eOne Block\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSigmoid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e1.0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.57\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTwo Blocks\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSigmoid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e1.0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.61\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eThree Blocks\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSigmoid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e1.0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.65\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFour Blocks\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSigmoid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e1.0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.68\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFive Blocks\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSigmoid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.71\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSix Blocks\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e100\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSigmoid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.90\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.74\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eNumber of Filters Comparison Table\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"7\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFilters\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eEpoch\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e 
\u003cp\u003eStep\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eBatch\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eActivation\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eMax Acc.\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eMax Val. Acc.\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSigmoid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.69\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e16\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSigmoid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.74\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" 
colname=\"c1\"\u003e \u003cp\u003e24\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSigmoid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.93\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.71\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e32\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSigmoid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.72\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eNumber of Units for Dense Layer\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"7\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv 
align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDense\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eEpoch\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eStep\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eBatch\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eActivation\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eMax Acc.\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eMax Val. 
Acc.\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e64N\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSigmoid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.92\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.73\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e128N\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSigmoid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.74\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e256N\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c5\"\u003e \u003cp\u003eSigmoid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.93\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.72\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e512N\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSigmoid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.78\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e1024N\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSigmoid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.71\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e 
\u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eThe Dropout Rate After Convolutional Layers Comparison Table\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"7\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDropout Per Layer\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eEpoch\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eStep\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eBatch\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eActivation\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eMax Acc.\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eMax Val. 
Acc.\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e0.2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSigmoid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.75\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.57\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e0.3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSigmoid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.63\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.52\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e0.4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c5\"\u003e \u003cp\u003eSigmoid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.61\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.50\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e0.5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSigmoid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.56\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.35\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e0.6\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSigmoid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.56\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.13\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab5\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e 
\u003cdiv class=\"CaptionNumber\"\u003eTable 5\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eSingle Dropout Layer Comparison Table\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"7\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDropout Rate\u003c/p\u003e \u003cp\u003eSingle Layer\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eEpoch\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eStep\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eBatch\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eActivation\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eMax Acc.\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eMax Val. 
Acc.\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e0.2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSigmoid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.71\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e0.3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSigmoid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.89\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.71\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e0.4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c5\"\u003e \u003cp\u003eSigmoid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.84\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.72\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e0.5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSigmoid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.94\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.74\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e0.6\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSigmoid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.87\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.72\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003eThresholding and 
Masking\u003c/h2\u003e \u003cp\u003eThe foraminifera were detected and masked before the prediction process. To test object detection and contour drawing, images containing two, three, four, five and six specimens were created. First, Otsu thresholding was applied to the images; then, with the help of CCL, the images were separated into blobs and a mask initialized with zeros was created. Once the bounding boxes of the specimens had been detected, the images were ready for prediction. Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e shows a test image containing four specimens, \u003cem\u003eBolivina argentea\u003c/em\u003e, \u003cem\u003eBulimina tenuata\u003c/em\u003e, \u003cem\u003eEpistominella smithi\u003c/em\u003e and \u003cem\u003eBulimina pagoda\u003c/em\u003e, together with its thresholding result. The specimens are cleanly separated from the background. The threshold leaves some dot-sized white pixels in the background; however, these pixels do not affect the detection of the specimens.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003eDrawing Bounding Boxes and Labeling\u003c/h2\u003e \u003cp\u003eThresholding and masking separate the specimens from the background so that bounding boxes can be drawn and labeled. The bounding-box coordinates were obtained with OpenCV\u0026rsquo;s bounding box function, and the boxes were drawn with its rectangle function. However, the protruding structures of the specimens were not completely enclosed by these boxes.
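\u003c/p\u003e \u003cp\u003eThe thresholding, connected-component labeling and bounding-box steps described above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors\u0026rsquo; implementation: it assumes bright specimens on a darker background, labels blobs with a 4-connected breadth-first flood fill, and uses a hypothetical min_area parameter to discard dot-sized blobs. Each box is returned as inclusive (x0, y0, x1, y1) corners, enlarged by pad pixels on every side (10 pixels in this work) and clipped to the image borders; the clipping is an added assumption. In practice OpenCV provides equivalents such as cv2.threshold with the cv2.THRESH_OTSU flag and cv2.connectedComponentsWithStats.\u003c/p\u003e

```python
from collections import deque


def otsu_threshold(gray):
    """Otsu's method: pick the threshold that maximizes between-class variance."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(i * count for i, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(256):
        w0 += hist[t]          # pixels with value <= t (background class)
        sum0 += t * hist[t]
        w1 = total - w0        # pixels with value > t (foreground class)
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def find_boxes(gray, pad=10, min_area=5):
    """Threshold, label 4-connected blobs, and return padded, clipped boxes.

    Boxes are (x0, y0, x1, y1) with inclusive corners. `min_area` is a
    hypothetical filter for dot-sized noise blobs.
    """
    h, w = len(gray), len(gray[0])
    t = otsu_threshold(gray)
    mask = [[value > t for value in row] for row in gray]  # bright = specimen
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill collects one connected component.
            seen[y][x] = True
            queue = deque([(y, x)])
            x0 = x1 = x
            y0 = y1 = y
            area = 0
            while queue:
                cy, cx = queue.popleft()
                area += 1
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area >= min_area:
                # Enlarge the box by `pad` pixels on every side, clipped to the image.
                boxes.append((max(x0 - pad, 0), max(y0 - pad, 0),
                              min(x1 + pad, w - 1), min(y1 + pad, h - 1)))
    return t, boxes


if __name__ == '__main__':
    # Synthetic 20 x 30 image: dark background, two bright blobs, one noise dot.
    image = [[10] * 30 for _ in range(20)]
    for yy in range(3, 8):
        for xx in range(4, 10):
            image[yy][xx] = 200
    for yy in range(12, 17):
        for xx in range(18, 26):
            image[yy][xx] = 220
    image[0][29] = 255
    print(find_boxes(image, pad=2))
```

\u003cp\u003eEach returned box would then be cropped from the original image, resized and passed to the classifier.\u003c/p\u003e \u003cp\u003e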
To enclose the specimens completely, each box was enlarged by 10 pixels on every side: 10 pixels were subtracted from the x and y coordinates of its starting corner, and 10 pixels were added to the x and y coordinates of its ending corner. These final coordinates give the locations of all specimens. The boxes were then processed one by one in a loop: each specimen was first resized to the input size expected by the trained model and then classified with TensorFlow\u0026rsquo;s predict function. Because the proposed model was built for multi-label classification, its prediction results were sorted from the most probable to the least probable class. Since the detection step isolates a single specimen in each box, the class with the highest probability was used as that specimen\u0026rsquo;s label: the argmax was passed to the labeling function, and the label was placed on top of the specimen. An example of the labeling results is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003eTest Results of the Dataset\u003c/h2\u003e \u003cp\u003eThe CNN model was trained on the dataset described earlier in this paper. To examine the test outputs, a test set comprising 30% of the total dataset was predicted. Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003e shows the confusion matrix of the test set, which relates the predicted labels to the true labels.
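\u003c/p\u003e \u003cp\u003eThe path from per-specimen predictions to labels and then to a confusion matrix can be sketched as follows. The sketch assumes that the model returns one sigmoid score per class; it ranks the classes by score, takes the top one (the argmax) as the label, and counts true against predicted labels. It also row-normalizes the counts, on the assumption that the probabilities reported in the confusion matrix are per-true-class proportions, so that an ideal classifier gives the identity matrix. The function names, class names and scores are hypothetical.\u003c/p\u003e

```python
def rank_classes(scores, class_names):
    """Sort (class, score) pairs from the most to the least probable class."""
    return sorted(zip(class_names, scores), key=lambda pair: pair[1], reverse=True)


def predict_label(scores, class_names):
    """Label a specimen with its highest-scoring class (the argmax)."""
    return rank_classes(scores, class_names)[0][0]


def confusion_matrix(true_labels, predicted_labels, class_names):
    """Return raw counts and row-normalized proportions.

    Rows are true classes and columns are predicted classes, so a perfect
    classifier yields the identity matrix of proportions.
    """
    index = {name: i for i, name in enumerate(class_names)}
    n = len(class_names)
    counts = [[0] * n for _ in range(n)]
    for actual, predicted in zip(true_labels, predicted_labels):
        counts[index[actual]][index[predicted]] += 1
    proportions = [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]
    return counts, proportions


if __name__ == '__main__':
    classes = ['A', 'B', 'C']  # hypothetical class names
    # Made-up sigmoid outputs for four cropped specimens.
    outputs = [[0.9, 0.2, 0.1], [0.4, 0.7, 0.3], [0.1, 0.8, 0.2], [0.2, 0.1, 0.6]]
    truth = ['A', 'A', 'B', 'C']
    predicted = [predict_label(s, classes) for s in outputs]
    counts, proportions = confusion_matrix(truth, predicted, classes)
    print(predicted)
    print(counts)
    print(proportions)
```

\u003cp\u003eWith sigmoid outputs the scores of a specimen need not sum to one, but ranking them and taking the first entry still selects the single most probable class.\u003c/p\u003e \u003cp\u003e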
In each cell, the upper value gives the number of samples of the true class that were assigned the corresponding predicted label, and the value in parentheses gives the corresponding probability. The confusion matrix is an \u003cem\u003en\u003c/em\u003e \u0026times; \u003cem\u003en\u003c/em\u003e matrix, where \u003cem\u003en\u003c/em\u003e is the number of classes; for an ideal classifier the matrix of probabilities would equal the identity matrix, so the diagonal elements should be as close to 1 as possible. Higher probabilities are shown in darker shades. The confusion matrix also provides information about the dataset used. According to Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003e, the highest probability, 0.95, was achieved for \u003cem\u003eTakayanagia delicata\u003c/em\u003e and the lowest, 0.35, for \u003cem\u003eBolivina subadvena\u003c/em\u003e.\u003c/p\u003e \u003cp\u003eAlthough \u003cem\u003eB. subadvena\u003c/em\u003e has a simple structure, the neural network appears to have difficulty learning this species and confuses it with \u003cem\u003eB. seminuda\u003c/em\u003e and \u003cem\u003eB. spissa\u003c/em\u003e. Two main reasons can be suggested. First, its physical structure is similar to that of the other two species. Second, the dataset may not contain enough samples of it for the network to learn the differences despite these similarities. A similar situation is observed for \u003cem\u003eT. bradyi\u003c/em\u003e, the species with the second-lowest probability, which has the fewest specimens in the current dataset.
In general, a neural network yields more accurate results when more training data are available.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003eTest Results With Additional Fragments and Particles\u003c/h2\u003e \u003cp\u003eIn real-life applications, a thin-section sample contains not only fossils but also many fragments and particles, so identifying fragments and particles and separating them from fossils is particularly important. Therefore, 2810 fragment samples and 1801 particle samples from the \u003cem\u003eEndless Forams\u003c/em\u003e dataset were added to the edited dataset used in section e, \u003cem\u003eB. subadvena\u003c/em\u003e was removed, and the model was retrained. Initial experiments showed that the variable structure of the fragments confused the model during training. To stabilize training and eliminate false positives, a batch normalization layer was added after each convolutional layer, which reduced overfitting and the false positives that occurred during training. The confusion matrix obtained after this change (Fig.\u0026nbsp;\u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e9\u003c/span\u003e) shows an improvement in the model: the highest accuracy was obtained for \u003cem\u003eB. tenuata\u003c/em\u003e (95%) and the lowest for \u003cem\u003eB. spissa\u003c/em\u003e (69%), and overall the accuracy rates increased. The lower accuracy for \u003cem\u003eB. spissa\u003c/em\u003e can be attributed to its physical similarity to \u003cem\u003eB. seminuda\u003c/em\u003e, as discussed in the previous section; Fig.\u0026nbsp;\u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e9\u003c/span\u003e shows that \u003cem\u003eB. spissa\u003c/em\u003e is confused with fragments and with \u003cem\u003eB. seminuda\u003c/em\u003e.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec19\" class=\"Section2\"\u003e \u003ch2\u003eLabeling of the Fragments, Particles and Foraminifera\u003c/h2\u003e \u003cp\u003eTo test the model trained on the new dataset that includes fragments, test images were generated with the following pattern: one foraminifer and one fragment; two foraminifera and two fragments; three foraminifera and three fragments. The foraminifera species and fragments were selected randomly from the dataset. An example result is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig10\" class=\"InternalRef\"\u003e10\u003c/span\u003e; this test image contains the foraminifera species \u003cem\u003eB. seminuda\u003c/em\u003e and \u003cem\u003eT. delicata\u003c/em\u003e, together with a fragment and a particle. As the figure shows, all species and fragments were predicted correctly, and the accuracy values shown are close to 100%. These results suggest that adding fragments and particles to the dataset made the model more accurate than with the previous datasets.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec20\" class=\"Section2\"\u003e \u003ch2\u003eGeneral evaluation of dataset results\u003c/h2\u003e \u003cp\u003eThe metrics a model achieves depend on many variables, such as the architecture of the model and the size of its training inputs. To obtain the optimum model, we carried out extensive parametric studies. Although these studies improved the model up to a point, they could not take it further. At this point, changes to the dataset became inevitable.
First, \u003cem\u003eB. pagoda\u003c/em\u003e and \u003cem\u003eT. bradyi\u003c/em\u003e were removed from the initial dataset and replaced with \u003cem\u003eU. peregrina\u003c/em\u003e, with the aim of improving the results. \u003cem\u003eU. peregrina\u003c/em\u003e was chosen because it is morphologically distinct and well represented in the dataset. To improve the results further, and to detect the non-fossil microscopic fragments that may be encountered in real-life applications, \u003cem\u003eB. subadvena\u003c/em\u003e was then removed from the dataset and fragments and particles were added in its place. Because the final dataset contained a much larger number of images, the CNN model also had to be improved. For this, a batch normalization layer was added after each convolutional layer. The results of these changes are shown in Table\u0026nbsp;\u003cspan refid=\"Tab6\" class=\"InternalRef\"\u003e6\u003c/span\u003e. 
The changes made on the dataset and subsequently on the model show that the validation accuracy metric values have increased considerably.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab6\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 6\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eDatasets Comparison Table\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDataset\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eEpoch\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eBatch\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eActivation\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eMax Acc.\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eMax Val. 
Acc.\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDataset 1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eSigmoid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.78\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDataset 2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eSigmoid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.79\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDataset 3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eSigmoid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.88\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e 
\u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec21\" class=\"Section2\"\u003e \u003ch2\u003eResNet-50 Neural Network Model\u003c/h2\u003e \u003cp\u003eThe ResNet-50 architecture was evaluated in 15 separate test runs to measure its image classification performance, and the results of these runs were then analyzed statistically.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec22\" class=\"Section2\"\u003e \u003ch2\u003eTraining and Testing Framework\u003c/h2\u003e \u003cp\u003eThe code for training and testing the ResNet-50 model was written with the TensorFlow and Keras libraries. It detects whether a GPU is available, applies data augmentation, customizes the ResNet-50 model and trains it. Additional scripts compute the performance metrics and plot the confusion matrix, which shows how the model's predictions are distributed across classes.\u003c/p\u003e \u003cdiv id=\"Sec23\" class=\"Section3\"\u003e \u003ch2\u003eModel Compilation and Training\u003c/h2\u003e \u003cp\u003eDuring training, all ResNet-50 layers were frozen except for the custom top layers added for this classification task. The model was compiled with the Adam optimizer ('adam'), the 'categorical_crossentropy' loss function and 'accuracy' as the primary performance metric. 
To mitigate class imbalance in the training data, class weights were computed and applied during training so that under-represented classes contribute more to the loss.\u003c/p\u003e \u003cp\u003eThe dataset was split into a training set and a validation set. The model was evaluated on the validation set after each training epoch, and its final performance was assessed on a separate test set. The predictions on the test set were used to calculate accuracy, precision, recall and the F1 score.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec24\" class=\"Section2\"\u003e \u003ch2\u003eStatistical Summary of Model Performance\u003c/h2\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab7\" class=\"InternalRef\"\u003e7\u003c/span\u003e summarizes the performance of the ResNet-50 model over the 15 runs, giving the minimum, maximum, mean and standard deviation of the accuracy, precision, recall and F1 score. 
This statistical summary offers insights into the model's consistency and reliability in image classification tasks.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab7\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 7\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eEvaluation Metrics of ResNet\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMin.\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eMax\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eMean\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eStd. 
Deviation\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAccuracy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.85\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.91\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.89\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.02\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.85\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.93\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.89\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.02\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.83\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.93\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.89\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.02\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eF-Score\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.83\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e 
\u003cp\u003e0.92\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.88\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.02\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab7\" class=\"InternalRef\"\u003e7\u003c/span\u003e shows the overall performance of the ResNet-50 model and how consistent its classification accuracy was across runs. Accuracy ranged from 85.44% to 91.59%, with a mean of 89.10%. Precision, recall and F-score varied similarly (standard deviation 0.02 for each), which indicates that precision and recall were generally balanced.\u003c/p\u003e \u003cdiv id=\"Sec25\" class=\"Section3\"\u003e \u003ch2\u003eVisualization of Model Performance\u003c/h2\u003e \u003cp\u003eThe confusion matrix of the ResNet-50 model on the Endless Forams dataset, shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e11\u003c/span\u003e, breaks the results down by class. It shows which classes are identified correctly and which are confused with each other. This breakdown highlights the strengths and weaknesses of the model and the classes where improvement is most needed.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eAcross the series of tests, the ResNet-50 model performed consistently on this image classification task. 
With its architecture and careful training and evaluation, the model reached high accuracy on this task.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec26\" class=\"Section3\"\u003e \u003ch2\u003eVGG Model\u003c/h2\u003e \u003cp\u003eA VGG model based on the VGG16 architecture was evaluated in 15 test runs to measure its image classification performance. The results of these runs are reported in Table\u0026nbsp;\u003cspan refid=\"Tab8\" class=\"InternalRef\"\u003e8\u003c/span\u003e.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec27\" class=\"Section3\"\u003e \u003ch2\u003eDevelopment and Assessment Framework\u003c/h2\u003e \u003cp\u003eThe VGG model was developed and evaluated with the TensorFlow and Keras libraries. The code detects whether a GPU is available, applies data augmentation and fine-tunes the VGG16 architecture. Additional scripts compute the performance metrics and plot the confusion matrix.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec28\" class=\"Section2\"\u003e \u003ch2\u003eTraining Procedure\u003c/h2\u003e \u003cp\u003eDuring training, all layers of the VGG model except the custom final layers were set to non-trainable. The model was compiled with the Adam optimizer at a learning rate of 0.0001, the 'categorical_crossentropy' loss function and 'accuracy' as the evaluation metric. 
The training data were augmented with rotations, shifts, shearing, zooming and flipping to make the model more robust.\u003c/p\u003e \u003cp\u003eThe model was trained for a number of epochs and evaluated on a validation set after each epoch to monitor its performance. After training, the model was run on a test set, and its predictions were used to calculate accuracy, precision, recall and the F1 score.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec29\" class=\"Section2\"\u003e \u003ch2\u003ePerformance Summary\u003c/h2\u003e \u003cp\u003eThe statistics in Table\u0026nbsp;\u003cspan refid=\"Tab8\" class=\"InternalRef\"\u003e8\u003c/span\u003e show that the VGG model performed consistently across runs. Accuracy ranged from 84.47% to 88.35%, with a mean of 86.39%. Precision, recall and F-score were similarly stable (standard deviation 0.01 for each), with mean values of 0.85, 0.85 and 0.83, respectively. 
Table\u0026nbsp;\u003cspan refid=\"Tab8\" class=\"InternalRef\"\u003e8\u003c/span\u003e gives the full statistical summary of the VGG model's test results over the 15 runs.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab8\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 8\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eEvaluation Metrics of VGG\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMin.\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eMax\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eMean\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eStd. 
Deviation\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAccuracy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.84\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.88\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.86\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.01\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.82\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.87\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.85\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.01\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.82\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.87\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.85\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.01\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eF-Score\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.79\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e 
\u003cp\u003e0.86\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.83\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.01\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eVisualization of Model Performance\u003c/h3\u003e\n\u003cp\u003eThe confusion matrix of the VGG-16 model on the Endless Forams dataset, shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig12\" class=\"InternalRef\"\u003e12\u003c/span\u003e, breaks the results down by class. It shows which classes are identified correctly and which are confused with each other. This analysis reveals the capabilities and limitations of the model and where its classification accuracy could be improved.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eAcross the series of tests, the VGG-16 model performed steadily on this image classification task. With careful training and evaluation it reached good accuracy, although slightly below that of ResNet-50.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eIn this study, we present three models for the multi-label classification of benthic foraminifera species: a custom CNN, ResNet-50 and VGG-16. 
The main aim of this study is to use technology to speed up foraminifer identification, a lengthy and challenging task in routine studies, and to increase the precision of its results. The models were evaluated on the same dataset to compare their image classification performance. On Dataset 3, the custom CNN model reached 99% training accuracy and 88% validation accuracy. The gap between the two shows that its performance on unseen data is noticeably lower than on the training data.\u003c/p\u003e \u003cp\u003eThe ResNet and VGG models, developed with the TensorFlow and Keras libraries, showed their own strengths and limitations. The ResNet model performed consistently, with a mean accuracy of 89% and a maximum accuracy of 91.59%. The VGG model trailed slightly, with a maximum accuracy of 88.35%, but still performed reliably.\u003c/p\u003e \u003cp\u003eThis comparison shows the advantages and drawbacks of each model. The CNN model reached high training accuracy, but its lower validation accuracy suggests some overfitting. The ResNet model combined high accuracy with consistent performance across runs. The VGG model, despite its slightly lower peak accuracy, also performed reliably.\u003c/p\u003e \u003cp\u003eThese results show that different architectures and training strategies can yield different results on different datasets. Each model's performance should therefore be evaluated with respect to the specific characteristics and requirements of the dataset at hand. 
The analysis also shows that deep learning models must be deployed and optimized carefully to perform well on this task. Comparative studies of this kind can guide future work on applying deep learning to fossil identification.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eCompeting interests\u003c/h2\u003e \u003cp\u003eThe authors have not disclosed any competing interests.\u003c/p\u003e\u003ch2\u003eFunding\u003c/h2\u003e \u003cp\u003eThe authors have not disclosed any funding.\u003c/p\u003e\u003ch2\u003eAuthor Contributions\u003c/h2\u003e\u003cp\u003eK\u0026uuml;bra Yayan wrote and checked the paleontology sections of the manuscript. Cem Bağlum developed the artificial intelligence code and conducted the tests. All authors reviewed the manuscript.\u003c/p\u003e\u003ch2\u003eData availability\u003c/h2\u003e \u003cp\u003eNot applicable.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eDurand T, Mehrasa N, Mori G (2019) Learning a Deep ConvNet for Multi-Label Classification With Partial Labels. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 647-657\u003c/li\u003e\n\u003cli\u003eFeng Z, Hongsheng L, Wanli O, Nenghai Y, Xiaogang W (2017) Learning Spatial Regularization With Image-Level Supervisions for Multi-Label Image Classification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 5513-5522\u003c/li\u003e\n\u003cli\u003eHaofu L, Yuncheng L, Jiebo L (2016) Skin disease classification versus skin lesion characterization: Achieving robust diagnosis using multi-label deep neural networks. 23rd International Conference on Pattern Recognition (ICPR). pp. 355-360. doi: 10.1109/ICPR.2016.7899659\u003c/li\u003e\n\u003cli\u003ePan X, Jin K, Cao J, et al. (2020) Multi-label classification of retinal lesions in diabetic retinopathy for automatic analysis of fundus fluorescein angiography based on deep learning. Graefes Arch Clin Exp Ophthalmol. 258: 779\u0026ndash;785. doi: 10.1007/s00417-019-04575-w\u003c/li\u003e\n\u003cli\u003eParab MA, Mehendale ND (2021) Red Blood Cell Classification Using Image Processing and CNN. SN Computer Science. 2:70. doi: 10.1007/s42979-021-00458-2\u003c/li\u003e\n\u003cli\u003ePark J, Hwang Y, Lee D, Kim J (2020) MarsNet: Multi-Label Classification Network for Images of Various Sizes. IEEE Access. 8: 21832-21846. doi: 10.1109/ACCESS.2020.2969217\u003c/li\u003e\n\u003cli\u003ePlaton E, Gupta BS (2001) Benthic Foraminiferal Communities in Oxygen-Depleted Environments of the Louisiana Continental Shelf. Coastal hypoxia: consequences for living resources and ecosystems. 58: 147-163. doi: 10.1029/CE058p0147\u003c/li\u003e\n\u003cli\u003ePutzu L, Caocci G, Ruberto CD (2014) Leucocyte classification for leukaemia detection using image processing techniques. Artificial Intelligence in Medicine. 62:3, pp. 179-191. doi: 10.1016/j.artmed.2014.09.002\u003c/li\u003e\n\u003cli\u003eStanley SM (2005) Earth system history. Macmillan.\u003c/li\u003e\n\u003cli\u003eVickerman K (2014) The diversity and ecological significance of protozoa. Biodiversity \u0026amp; Conservation. 1:4, pp. 334\u0026ndash;341\u003c/li\u003e\n\u003cli\u003eWang J, Yang Y, Mao J, Huang Z, Huang C, Xu W (2016) CNN-RNN: A Unified Framework for Multi-label Image Classification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 2285-2294. doi: 10.1109/CVPR.2016.251\u003c/li\u003e\n\u003cli\u003eWu H, Prasad S (2018) Semi-Supervised Deep Learning Using Pseudo Labels for Hyperspectral Image Classification. IEEE Transactions on Image Processing. 27:3, pp. 1259-1270. 
doi: 10.1109/TIP.2017.2772836.
Xu Y, Jiao L, Wang S, Wei J, Fan Y, Lai M, Chang EI (2013) Multi-label classification for colon cancer using histopathological images. Microsc Res Tech. 76(12):1266-1277. doi: 10.1002/jemt.22294
Yang H, Zhou JT, Zhang Y, Gao B, Wu J, Cai J (2016) Exploit bounding box annotations for multi-label object recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 280-288
Zeggada A, Benbraika S, Melgani F, Mokhtari Z (2018) Multilabel conditional random field classification for UAV images. IEEE Geoscience and Remote Sensing Letters. PP:1-5. doi: 10.1109/LGRS.2018.2790426
Zhang J, Wu Q, Shen C, Zhang J, Lu J (2018) Multilabel image classification with regional latent semantic dependencies. IEEE Transactions on Multimedia. 20(10):2801-2813. doi: 10.1109/TMM.2018.2812605

Keywords: Geology, Benthic Foraminifera, Multilabel Classification, Deep Learning, Convolutional Neural Networks

Posted: February 23rd, 2024 (Version 1)
