Enhancing Object Segmentation Model with GAN-based Augmentation using Oil Palm as a Reference

Research Article

Qi Bin Kwong, Yee Thung Kon, Wan Rusydiah W Rusik, Mohd Nor Azizi Shabudin, and 3 more

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-3833628/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 08 Sep, 2024 in Journal of Big Data. This is Version 1 of the preprint.

Abstract

In digital agriculture, a central challenge in automating drone applications in the plantation sector, including oil palm, is the development of a detection model that can adapt across diverse environments. This study addresses the feasibility of using GAN augmentation methods to improve palm detection models.
For this purpose, drone images of young palms (< 5 years old) from eight different estates were collected, annotated, and used to build a baseline detection model based on DETR. StyleGAN2 was trained on the extracted palms and then used to generate a series of synthetic palms, which were then inserted into tiles representing different environments. CycleGAN networks were trained for bidirectional translation between synthetic and real tiles and subsequently used to make the synthetic tiles look more realistic. Both synthetic and real tiles were used to train the GAN-based detection model. The baseline model achieved precision and recall values of 95.8% and 97.2%, whereas the GAN-based model achieved precision and recall values of 98.5% and 98.6%. In challenge dataset 1, consisting of older palms (> 5 years old), both models achieved similar accuracies, with the baseline model achieving precision and recall of 93.1% and 99.4%, and the GAN-based model achieving 95.7% and 99.4%. In challenge dataset 2, consisting of storm-affected palms, the baseline model achieved a precision of 100% but a recall of only 13%, whereas the GAN-based model achieved high precision and recall values of 98.7% and 95.3%. This result demonstrates that images generated by GANs have the potential to enhance the accuracy of palm detection models.

Keywords: Oil palm segmentation, GAN, Object detection, Object segmentation, Data augmentation, Detection transformer, Detectron, Phenotyping

Introduction

As the world's population is poised to reach unprecedented levels in the coming decades (Lee, 2011), ensuring food security for this rapidly growing populace becomes a global imperative. Addressing this challenge necessitates a holistic and sustainable approach to agriculture. In recent years, digital agriculture has emerged as a transformative force, reshaping our conventional methods.
This surge in automation, driven by cutting-edge innovations such as computer vision, the Internet of Things, and robotics, is instigating a paradigm shift in the agriculture sector. Simultaneously, these technologies have facilitated automated phenotyping, which, with its capacity to accurately assess and characterize plant traits, plays a pivotal role in refining breeding selection processes and advancing precision breeding. These automated practices hold the potential to substantially boost agricultural yield per unit area.

Despite the potential of oil palm to yield up to 10 tonnes of oil per hectare per year (t/ha/yr), the global average productivity has plateaued at around 3 t/ha/yr. Unfortunately, progress in closing this gap has been sluggish for many years. Genomic selection initiatives have shown great promise in addressing this issue, particularly for yield component traits with high genetic heritability such as shell or fruit mesocarp thickness (Cros et al., 2017; Kwong et al., 2016). However, for complex traits like total oil yield (Kwong et al., 2017) and height (Garzón-Martínez, 2022), environmental factors account for 60% or more of the variation, which complicates improvement efforts. Furthermore, phenotyping for complex traits tends to be slow and labour-intensive, a significant challenge given labour shortages (Crowley, 2020). In response to these challenges, the integration of digital agriculture and automated phenotyping has emerged as a pivotal solution. Among the most cost-effective tools in this transformative journey is the drone.
The use of drone technology in agriculture is an emerging and continuously evolving practice, with applications including crop classification, pest and disease detection (Inoue, 2020; Kalischuk et al., 2019; Rejeb, Abdollahi, Rejeb, & Treiblmaier, 2022) and, more recently, phenotyping tasks such as height measurements (Volpato et al., 2021). While recent advancements in computer vision, especially convolutional neural networks (CNNs), have facilitated the development of highly accurate and automated agricultural object detection models (Chen et al., 2018; Lu, Tan, & Jiang, 2021; Volpato et al., 2021), the persistent challenge lies in constructing models that generalize robustly across diverse scenarios. This challenge is amplified by the vast diversity of environmental conditions encountered in real-world settings. Moreover, the manual annotation of extensive image datasets demands substantial time and labor. The introduction of generative adversarial networks (GANs) (Goodfellow, 2014) offers an intriguing prospect for addressing these challenges. GANs provide a pathway to augment existing image datasets and generate new data (Motamed, Rogalla, & Khalvati, 2021; Sandfort, Yan, Pickhardt, & Summers, 2019). Given that this approach is relatively new, the purpose of this study is to assess the influence of GANs on both detection and segmentation accuracy, with a particular emphasis on oil palm.

Methodology

Data Collection and Processing

Eight random estates with immature or young palms (< 5 years old, before palm canopies overlap) owned by Sime Darby Plantation across Malaysia were selected for this study. The planting density of these estates was around 180 stands/ha. A DJI Mavic 2 Pro drone equipped with a Hasselblad camera with an F2.8 EQV 28 mm lens was employed for mapping. The flight altitude was set at 80 m to capture detailed imagery.
As for drone settings, image overlap was set at 80%, sidelap at 60%, and flight speed at 5 m/s to balance efficiency with image quality. Additional drone settings, including camera parameters such as exposure and sharpness and GPS accuracy, were kept at their defaults. The collected images were uploaded to our customized WebODM (OpenDroneMap, 2020) server. After image processing, the stitched orthophotos were split into individual tiles of 640x640 pixels using the gdal_retile.py script from the GDAL library (GDAL, 2023). From this step, 7,755 generated tiles were selected and annotated using LabelMe (Torralba, Russell, & Yuen, 2010). Of these, 6,278 tiles were used as the training set and 1,477 as the validation set.

Two additional independent estates were chosen, and the images acquired from them followed the same tiling and processing procedures as described above. From these estates, a total of 100 tiles were selected and designated as test/challenge set 1. This dataset consists primarily of palms older than 5 years with overlapping canopies. In addition, a separate test set 2 of 100 tiles was assembled from an estate impacted by a destructive storm.

Detection and Segmentation model

The palm detection and segmentation models were built with Detection Transformer (DETR) (Carion et al., 2020) on top of the Detectron2 framework (Wu, Kirillov, Massa, Lo, & Girshick, 2019). The model backbone architecture was “ResNeXt50_32X4D” (Xie, Girshick, Dollár, Tu, & He, 2017), an extension of the ResNet architecture (He, Zhang, Ren, & Sun, 2016), featuring 50 network layers, a cardinality of 32 and a width of 4. For the transformer-based object detection training, the initial learning rate was set at 1e-4, batch size at 16, weight decay at 1e-4 and learning rate drop at 50. The encoding and decoding layers were both kept at 6, with an embedding size of 256, dropout of 0.1, 8 attention heads and 100 query slots.
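For reference, the detection-training settings listed above can be gathered into a single configuration object. The sketch below is only illustrative: the class name `DetrTrainConfig`, its field names, and the helper `learning_rate_at` are hypothetical and do not come from the DETR or Detectron2 code bases. It also assumes that the learning rate drop at 50 refers to an epoch index and that the drop is a step decay by a factor of 10, neither of which is stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetrTrainConfig:
    """Hyperparameters quoted in the text (field names are hypothetical)."""
    learning_rate: float = 1e-4   # initial learning rate
    batch_size: int = 16
    weight_decay: float = 1e-4
    lr_drop_epoch: int = 50       # "learning rate drop at 50" (assumed to be an epoch)
    encoder_layers: int = 6
    decoder_layers: int = 6
    hidden_dim: int = 256         # embedding size
    dropout: float = 0.1
    attention_heads: int = 8
    num_queries: int = 100        # number of query slots
    lr_drop_factor: float = 0.1   # ASSUMPTION: decay factor, not given in the text


def learning_rate_at(epoch: int, cfg: DetrTrainConfig) -> float:
    """Step schedule: full rate before the drop epoch, scaled rate afterwards."""
    factor = cfg.lr_drop_factor if epoch >= cfg.lr_drop_epoch else 1.0
    return cfg.learning_rate * factor


if __name__ == "__main__":
    cfg = DetrTrainConfig()
    # The embedding must split evenly across attention heads (256 / 8 = 32).
    print(cfg.hidden_dim // cfg.attention_heads)
    print(learning_rate_at(0, cfg), learning_rate_at(60, cfg))
```

Under these assumptions, the segmentation-head training described next would differ only in `lr_drop_epoch` (20) and `batch_size` (4).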
Training was stopped when both the training and validation losses converged. The segmentation head of the network was trained separately, using the same parameters except for a learning rate drop at 20, a batch size of 4, and the frozen weights from the previous training. The trained model was labelled the “Baseline palm model”. Training was carried out on a Google Cloud Platform virtual machine with a single NVIDIA Tesla A100 GPU, 85 GB RAM and 12 CPUs. During each validation step, the Common Objects in Context (COCO) (Lin et al., 2014) evaluator function was used to assess model accuracy. The COCO evaluation metrics used in this study were mean Average Precision (mAP) (at Intersection over Union (IoU) of 0.50:0.95) (Lin et al., 2014), Average Precision (AP) (at IoU of 0.50) (Everingham, Van Gool, Williams, Winn, & Zisserman, 2010) and mean Average Recall (mAR) (at IoU 0.50:0.95), all computed for a maximum of 100 detections and for all object areas. Besides the COCO metrics, a simpler and more practical metric, palm count precision/recall, was also calculated manually; it is the precision and recall at a detection score of 0.9.

GAN-based augmentation

From the training dataset, individual palms were segmented out of the tiles and placed at the center of 256x256 pixel images with black backgrounds using a Python script. The images were screened manually, and those showing entire and clear palm features were selected, giving 1,444 images. These images served as the dataset for training the Generative Adversarial Network (GAN) generator and discriminator from scratch. The GAN architecture used in this step was Style-based Generative Adversarial Network 2 (StyleGAN2) (Karras, Laine, et al., 2020) with adaptive discriminator augmentation (Karras, Aittala, et al., 2020), implemented in PyTorch (Paszke et al., 2017; Paszke et al., 2019) (Supplementary Image 1).
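The geometry of the palm-extraction step can be illustrated with a short sketch. It assumes that each palm annotation is available as a polygon of (x, y) points in tile coordinates, as LabelMe typically exports, and it only computes the translation that centers the palm's bounding box on the 256x256 canvas. The function name `center_polygon_on_canvas` is hypothetical, the authors' own script is not shown in the text, and the pixel copying onto the black background is left out.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def center_polygon_on_canvas(
    polygon: List[Point], canvas_size: int = 256
) -> Tuple[Tuple[float, float], List[Point]]:
    """Return the (dx, dy) shift and the shifted polygon so that the
    polygon's bounding box sits at the center of a square canvas."""
    if not polygon:
        raise ValueError("polygon has no points")
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    # ASSUMPTION: palms larger than the canvas are rejected rather than rescaled.
    if width > canvas_size or height > canvas_size:
        raise ValueError("palm does not fit on the canvas")
    dx = (canvas_size - width) / 2 - min(xs)
    dy = (canvas_size - height) / 2 - min(ys)
    shifted = [(x + dx, y + dy) for x, y in polygon]
    return (dx, dy), shifted


if __name__ == "__main__":
    # A 60x50 pixel palm outline somewhere inside a 640x640 tile.
    outline = [(300, 400), (360, 400), (360, 450), (300, 450)]
    offset, moved = center_polygon_on_canvas(outline)
    print(offset, moved)
```

The same shift can be applied both to the cropped pixels and to the annotation polygon, which keeps the image and its mask aligned.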
The “kimg” parameter was set at 25,000 and the learning rate at 0.0025, while the batch size was set at 64 with a single GPU. The other parameters were kept at their defaults. Training was stopped after the FID (Fréchet inception distance) score plateaued and no longer showed improvement on TensorBoard (Abadi et al., 2016). Using the resulting model, approximately 200,000 synthetic palm images were generated. For each of these synthetic palms, a palm segmentation was generated automatically in JSON (Pezoa, Reutter, Suarez, Ugarte, & Vrgoč) format using a customized Python script.

Thirty-seven random drone orthophotos from diverse global locations were retrieved from OpenAerialMap (OpenAerialMap, 2023) and partitioned into individual tiles. From this pool, a total of 20,225 tiles were selected, alongside an additional 29,775 background control tiles generated from vacant field images. This combined dataset of 50,000 tiles served as the background dataset for the subsequent phase of the study. Using a custom Python script, four synthetic palm images were randomly inserted into each background tile without overlap, and the corresponding segmentation JSON file was generated at the same time.

Cycle-Consistent Generative Adversarial Network (CycleGAN) (Zhu, Park, Isola, & Efros, 2017) was used to enhance the realism of the synthetic tiles. The dataset for this step comprised 50,000 synthetic (fake) tiles and 13,422 real (unannotated drone-captured) tiles. To facilitate model training and evaluation, both the synthetic and real tiles were divided into training and validation sets: 90% of the tiles from each category were designated for the training set, while the remaining 10% were set aside for validation. In the A-to-B direction, the synthetic tiles were used as training dataset A, and the real tiles were used as validation dataset B.
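The tile-composition step described above, in which four synthetic palms are placed on each background tile without overlap, can be sketched with simple rejection sampling. This is a minimal illustration rather than the authors' script: the function names are hypothetical, each palm is treated as a square bounding box, and `palm_size` (the pixel footprint of an inserted palm) is a free parameter, because the text does not state the size at which palms were pasted.

```python
import random
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def boxes_overlap(a: Box, b: Box) -> bool:
    """Return True if two axis-aligned boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def place_palms(
    tile_size: int,
    palm_size: int,
    count: int = 4,
    max_tries: int = 1000,
    rng: Optional[random.Random] = None,
) -> List[Box]:
    """Draw random positions and keep those that do not overlap earlier ones."""
    if palm_size > tile_size:
        raise ValueError("palm is larger than the tile")
    if count <= 0:
        return []
    rng = rng or random.Random()
    boxes: List[Box] = []
    for _ in range(max_tries):
        left = rng.randint(0, tile_size - palm_size)
        top = rng.randint(0, tile_size - palm_size)
        candidate = (left, top, left + palm_size, top + palm_size)
        if not any(boxes_overlap(candidate, placed) for placed in boxes):
            boxes.append(candidate)
            if len(boxes) == count:
                return boxes
    raise RuntimeError("could not place all palms without overlap")


if __name__ == "__main__":
    # 640x640 tile as in the text; the 128-pixel footprint is an assumed value.
    for box in place_palms(tile_size=640, palm_size=128, rng=random.Random(0)):
        print(box)
```

Each returned box can double as the offset for the palm's segmentation polygon, so the annotation file can be written in the same pass. For footprints that are large relative to the tile, an unlucky early placement can leave no room for later palms, in which case the sketch raises an error and the caller can simply retry.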
For the B-to-A direction, the roles were reversed: the real tiles constituted training dataset B, and the synthetic tiles served as validation dataset A. The selected GAN mode was “lsgan”, with the discriminator network (net_D) kept as “basic” and the generator network (net_G) set to “unet_128”. The learning rate was set at 0.0002, batch size at 20, decay epoch at 10 and loading size at 640. Training was stopped when the generator and discriminator losses had all stabilized and the image quality had reached an acceptable level. Using the final generator model, all the synthetic tiles were transformed to closely resemble the real drone tiles. The 33,746 transformed tiles of good quality were combined with the original 6,278 tiles and used as the new training set to build a new palm detection and segmentation model with the same network architecture and method as before. The resulting model, known as the “GAN Palm Detector”, was also evaluated on the final test/challenge datasets.

Result

StyleGAN2-ADA network training was completed after 1500 epochs, and the final plateaued FID score was 16.82. Sample synthetic palm images can be found in Fig. 2. CycleGAN training was stopped after 50 epochs. Generator A had a loss of 0.166 and discriminator A a loss of 0.266, while generator B and discriminator B each had a loss of 0.196. An example of a synthetic tile before and after CycleGAN transformation can be found in Fig. 2. The COCO-evaluated model performance for both the baseline and GAN-based models can be found in Table 1 for detection and Table 2 for segmentation.

Table 1. COCO evaluation table for palm detection model.
| Detection Model | Baseline Palm Detector | GAN Palm Detector | Baseline Palm Detector (Test Set 1) | GAN Palm Detector (Test Set 1) | Baseline Palm Detector (Test Set 2) | GAN Palm Detector (Test Set 2) |
|---|---|---|---|---|---|---|
| Mean Average Precision (mAP) | 0.628 | 0.758 | 0.468 | 0.462 | 0.025 | 0.512 |
| Average Precision (AP) | 0.966 | 0.972 | 0.922 | 0.927 | 0.053 | 0.927 |
| Mean Average Recall (mAR) | 0.702 | 0.785 | 0.585 | 0.587 | 0.083 | 0.604 |

Table 2. COCO evaluation table for palm segmentation model.

| Segmentation Model | Baseline Palm Detector | GAN Palm Detector | Baseline Palm Detector (Test Set 1) | GAN Palm Detector (Test Set 1) | Baseline Palm Detector (Test Set 2) | GAN Palm Detector (Test Set 2) |
|---|---|---|---|---|---|---|
| Mean Average Precision (mAP) | 0.502 | 0.625 | 0.401 | 0.398 | 0.032 | 0.413 |
| Average Precision (AP) | 0.958 | 0.963 | 0.878 | 0.876 | 0.065 | 0.937 |
| Mean Average Recall (mAR) | 0.569 | 0.647 | 0.484 | 0.483 | 0.143 | 0.495 |

For the validation dataset, the palm count precision and recall were 95.8% and 97.2% (F1 score 0.97) for the baseline model. The corresponding values for the GAN-based model were 98.5% and 98.6% (F1 score 0.99). For challenge dataset 1, the precision and recall were 93.1% and 99.4% (F1 score 0.96) for the baseline palm detector, whereas they were 95.7% and 99.4% (F1 score 0.98) for the GAN palm detector. For challenge dataset 2, the baseline model achieved a precision of 100% and a very low recall of 13% (F1 score 0.23), indicating that most palms were not detected, whereas the GAN-based model achieved high precision and recall values of 98.7% and 95.3% (F1 score 0.97).

Discussion

The introduction of CNN-based object detection models has spurred advancements in agricultural automation. Notably, these models have found application in tasks such as automated weed identification in crops (Hashemi-Beni, Gebrehiwot, Karimoddini, Shahbazi, & Dorbu, 2022) and the detection of plant diseases (Boulent, Foucher, Theau, & St-Charles, 2019).
In the context of the oil palm industry, CNN-based models are being explored for drone or satellite-based palm detection and counting (Freudenberg et al., 2019; Kipli et al., 2023; Li, Fu, & Yu, 2017). This study builds upon the application of CNNs in palm detection and extends the concept to palm segmentation. Here, segmentation refers specifically to instance segmentation, which focuses on the pixels representing individual palms, rather than panoptic or semantic segmentation (Chuang, Zhang, & Zhao, 2023).

However, a significant challenge lies in the generalizability of these palm CNN models across diverse environments. Unlike crops grown in controlled environments like greenhouses, oil palms are cultivated in open fields, exposed to numerous unpredictable factors. Environmental elements such as weather conditions (rain, wind, and fog) and varying lighting conditions influenced by sunlight and time of day are known to impact drone image quality (Puliti, Ørka, Gobakken, & Næsset, 2015), thereby introducing environmental variation and limiting the generalizability of the models. Additionally, factors such as drone camera type and flight altitude have been identified as contributors to variations in image quality (Domingo, Ørka, Næsset, Kachamba, & Gobakken, 2019). The intricacies of image processing and stitching further complicate this issue (Bouchekara et al., 2023; Duan, Liu, Huang, Wang, & Zhao, 2019). Together, these sources of variation increase the variability within the images. Hence, developing a generalizable model necessitates a diverse representation of palm images.

Rather than manually addressing every conceivable scenario, GANs (Goodfellow, 2014) offer a potential solution to mitigate these issues. GAN-based background switching has been proposed as an augmentation method for object detection (Hedayati, McGuinness, Cree, & Perrone, 2019).
This study shares a similar augmentation principle with the referenced publication, in that the object of interest is inserted into a new background image. However, one major difference is that the palms used here were generated via StyleGAN2 (Karras, Aittala, et al., 2020; Karras, Laine, et al., 2020). StyleGAN2 comprises a generator and a discriminator; the generator produces synthetic images, while the discriminator evaluates them and distinguishes them from real images. An essential feature of StyleGAN2 is its ability to independently manipulate high-level and fine-level details in images, known as style-mixing. It also introduces stochastic variation, adding randomness for greater diversity in synthetic image generation. Leveraging these capabilities, StyleGAN2, along with its predecessor StyleGAN, has been used to generate highly realistic human faces (Meira, Silva, Bianchi, & Rabelo, 2023), aerial imagery (Yates, Hart, Houghton, Torres, & Pound, 2022), medical images (Tariq et al., 2023), and microstructural images of alloys (Lambard, Yamazaki, & Demura, 2023). In this study, the synthetic palms were generated onto an empty background image, and the annotation masks, which are essential for pixel-wise classification, were automatically derived from the object boundaries.

While the individual synthetic palms introduced variation during development of the detection and segmentation models, the background and environmental variations across all possible scenarios remained unrepresented. Because the number of plantation drone images was limited in this study, environmental variation was introduced through a random selection of drone images from different flight missions across the world, found at OpenAerialMap (OpenAerialMap, 2023). However, imprinting individual palms onto these background tiles presented challenges, including color inconsistency, that resulted in an artificial appearance. To address this, CycleGAN (Zhu et al., 2017) was employed.
Known for its image-to-image translation capability, CycleGAN maps images from one domain to another while preserving content. Beyond artistic style transfer, CycleGAN has found applications in X-ray image augmentation (Bargshady et al., 2022 ) and laser–visible image translation (Qin, Fan, Guo, & Wang, 2022 ). In our case, CycleGAN was used mainly to enhance the realism of synthetic tiles by harmonizing color and lighting conditions. While it's acknowledged that a combination of manual mixing techniques can substitute GAN for this purpose (Wyawahare et al., 2023 ), this avenue was not explored here. Nevertheless, synthetic tiles were generated for augmentation purposes following the use of these two GANs. Data augmentation has been proven valuable in image classification (Shorten & Khoshgoftaar, 2019 ) and its extension into object detection has been explored, albeit with a slightly lower impact on accuracy (Zoph et al., 2020 ). In addition to conventional augmentations such as random flipping which were done for both datasets in this study, the synthetic tiles generated through GAN-based augmentation were used together with the real tiles to develop the GAN-based palm detector. The DETR framework (Carion et al., 2020 ) implemented on top of the Detectron2 object detection framework (Wu et al., 2019 ) was used to build the palm detectors used in this study. DETR integrates a transformer encoder-decoder with a CNN backbone. The CNN backbone used for feature extraction was a variation of the ResNeXt architecture (Xie et al., 2017 ), which was built upon ResNet (He et al., 2016 ) with the introduction of the “cardinality” concept. Nevertheless, both these architectures shared similarities in that they were both based on residual learning, which involves the use of bottleneck blocks and skip connections. 
The cardinality feature of the ResNeXt architecture, which divides the input channels into multiple groups and performs separate convolutions for each group, helps the model capture diverse features and learn different aspects of the training images in parallel. The transformer following the CNN backbone, which consists of an encoder and a decoder (Carion et al., 2020), is used for global context reasoning. The DETR architecture incorporates a set-based global loss with bipartite matching, enabling pair-wise and parallel decoding of object embeddings and simultaneous prediction of object coordinates and class labels. DETR's versatility has been demonstrated across various applications, including medical object detection and drone-based insulator defect detection (Cheng & Liu, 2022; Ickler, Baumgartner, Roy, Wald, & Maier-Hein, 2023). Given the typically structured and dense arrangement of palms in plantations, and that replanting is usually conducted on entire fields, DETR's ability to capture global context and relationships between objects makes it a particularly fitting choice for our application.

On their respective validation datasets, both the baseline and the GAN-based palm detectors demonstrated strong performance across the detection, segmentation and counting tasks. The GAN-based palm detector achieved precision and recall values of 98.5% and 98.6% respectively. These results were comparable, and in some cases slightly superior, to reported values for similar tasks involving various agricultural crops or plants (Morales et al., 2018; W. Zhao, Yamada, Li, Digman, & Runge, 2021). The mAP and AP values were also on par with the findings of other relevant research works (Cai & Vasconcelos, 2018; Cao, Chen, & Gao, 2020; Ren, He, Girshick, & Sun, 2015; L. Zhao & Li, 2020). It is noteworthy, however, that the single-class focus of this study (oil palm) likely contributed to these high performance metrics.
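The palm count precision and recall quoted above are defined at a detection score of 0.9. The sketch below shows how such values, and the accompanying F1 scores, relate to raw counts. It is illustrative only: the function names are hypothetical, each detection is assumed to have been checked by hand and flagged as a real palm or not, every correct detection is assumed to correspond to a distinct palm, and the threshold is assumed to be inclusive.

```python
from typing import Dict, List, Tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def count_metrics(
    detections: List[Tuple[float, bool]],
    num_ground_truth: int,
    score_threshold: float = 0.9,
) -> Dict[str, float]:
    """detections holds (score, is_real_palm) pairs from a manual check."""
    # ASSUMPTION: detections scoring exactly at the threshold are kept.
    kept = [is_palm for score, is_palm in detections if score >= score_threshold]
    tp = sum(kept)                 # correct detections above the threshold
    fp = len(kept) - tp            # non-palm objects reported as palms
    fn = num_ground_truth - tp     # palms that were missed
    precision = tp / len(kept) if kept else 0.0
    recall = tp / num_ground_truth if num_ground_truth else 0.0
    return {
        "tp": tp, "fp": fp, "fn": fn,
        "precision": precision, "recall": recall,
        "f1": f1_score(precision, recall),
    }


if __name__ == "__main__":
    # Toy example with made-up detections, not data from the study.
    toy = [(0.99, True), (0.95, True), (0.92, False), (0.50, True)]
    print(count_metrics(toy, num_ground_truth=4))
    # F1 for a precision of 100% and a recall of 13%.
    print(round(f1_score(1.0, 0.13), 2))
```

With a precision of 100% and a recall of 13%, `f1_score` returns roughly 0.23, which matches the F1 score reported for the baseline model on challenge dataset 2.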
When the models were applied to challenge dataset 1, a slight decrease in palm detection accuracy was generally observed. This can be attributed to the dataset's characteristics, which include older palms and overlapping canopies that make detection harder. Segmentation accuracy declined slightly more than detection accuracy. This decline can be attributed to the dense canopies, which cast shadows and obstruct the view of individual palm canopies in the surrounding area, hindering the delineation of palm canopies during segmentation.

Comparing palm count accuracies between the baseline and GAN models revealed that the GAN-based model is less susceptible to false positives. Though not reflected in the mAP and AP metrics, the baseline model displayed a slight inclination to mistakenly identify indistinct shrubs, which loosely resemble palm seedlings from the top view, as palms (as illustrated in Fig. 4b). It is important to note that the tiles used in this study were predominantly from plantations, with most tiles exclusively featuring palms. Tiles containing both shrubs and palms were rare, and tiles featuring only shrubs had no annotations and were therefore excluded from the COCO evaluations. Consequently, many of the falsely detected palms on these tiles could only be accounted for in the palm count precision metric. The lower false positive rate of the GAN-based model can be attributed to the diverse background objects present in the drone background tiles sourced and processed from OpenAerialMap (OpenAerialMap, 2023). This diversity produces synthetic tiles containing a wide array of background objects, ultimately improving the model.

Challenge dataset 2, collected from a plantation affected by a destructive storm, served as an example of an unforeseen plantation circumstance.
The palms found here were also > 5 years old, similar to challenge dataset 1. The mAP, AP and mAR of the baseline palm detector were all lower than 0.15. Although the count precision was 100%, the recall was only 13%: every detected palm was correct, but the model missed a substantial number of actual palms. By comparison, the GAN-based palm detector was capable of detecting, segmenting, and counting the palms, achieving accuracies comparable with those on challenge dataset 1 and its validation set. In many cases (as shown in Fig. 4c), the model was even able to detect fallen palms.

From the top view, an individual palm canopy appears radial and almost symmetrical. The formation of the canopy is driven by the sequential growth and arrangement of fronds in a spiral pattern, known as phyllotaxis (Aholoukpè et al., 2018; Thomas, Chan, & Easau, 1969). These fronds are usually packed in an organized spiral, which contributes to the vertical and horizontal dimensions of the canopy. The individual fronds of an oil palm have a fan-like shape with a central axis and radiating leaflets. While some StyleGAN2-generated synthetic palms resembled the phenotypic outcome of these biological patterns, others did not and showed asymmetrical canopies with irregular fronds. In challenge dataset 2, the storm had, to a certain extent, altered the canopy shapes and the orientation of the fronds. It is likely that the training dataset used to construct the baseline model inadequately represented these structural changes. On the other hand, the GAN-based training dataset exhibited a broader range of canopy structures that are “possible” in real life, which allowed the resulting model to achieve high accuracies. The storm-affected palm dataset exemplified an extreme instance of palm canopy variation; distortions to palm canopies in a plantation are usually not as severe.
Nevertheless, the results demonstrate that a detection or segmentation model constructed using GAN-generated synthetic tiles in conjunction with raw tiles exhibits superior generalizability and robustness compared to a model relying solely on raw tiles.

While both the GAN-based and baseline models perform effectively across various age groups, and the GAN-based model demonstrates the ability to detect palms with canopy distortions, our study has certain limitations. We did not explore factors such as variations in drone image capture heights, planting density, drone camera types, and camera settings. Despite these limitations, the GAN-based model has shown superior performance in terms of both precision and recall compared to the baseline model. The GAN-based palm detection model can be improved further by incorporating images of older palms, but this was not our focus in this study. Moving forward, our focus will be on leveraging this model to automate young palm (< 5 years old) growth phenotyping and abnormal palm detection. In doing so, we strive to usher the oil palm industry into a new epoch of digital agriculture, marked by advanced automation and precise phenotypic measurement.

Abbreviations

AP: Average Precision
CNN: Convolutional neural network
COCO: Common Objects in Context
CycleGAN: Cycle-Consistent Generative Adversarial Network
DETR: Detection Transformer
FID: Fréchet inception distance
GAN: Generative adversarial network
IoU: Intersection over Union
mAP: Mean Average Precision
mAR: Mean Average Recall
StyleGAN2: Style-based Generative Adversarial Network architecture 2
t/ha/yr: tonnes oil per hectare per year

Declarations

Ethics approval and consent to participate
Not applicable

Consent for publication
Not applicable

Availability of data and materials
The palm GAN generator model has been uploaded and deployed at https://huggingface.co/spaces/qibin85/fake_palm_generator.
Other datasets used during the current study are available from the corresponding author on reasonable request.

Competing interests
The authors declare that they have no competing interests.

Funding
This project was funded by Sime Darby Plantation Research Sdn Bhd.

Authors' contributions
QBK was involved in conception and design of the work. QBK, YTK, WRWR, MNAS & SSAR were involved in data acquisition and analysis. QBK, WRWR, MNAS, DRA & HK were involved in data interpretation. QBK, YTK & SSAR were involved in the development of new software/scripts used in the work. QBK drafted the work and all authors substantively revised and approved it.

Acknowledgements
All authors thank the employees of Sime Darby Plantation Research & Upstream Malaysia for their assistance in data collection. Also, they thank the editors and reviewers for their attention to the paper.

References

Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Zheng X. (2016). TensorFlow: A System for Large-Scale Machine Learning on Heterogeneous Distributed Systems. Paper presented at the Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation.
Aholoukpè HNS, Dubos B, Deleporte P, Flori A, Amadji LG, Chotte J-L, Blavet D. Allometric equations for estimating oil palm stem biomass in the ecological context of Benin, West Africa. Trees. 2018;32(6):1669–80. 10.1007/s00468-018-1742-8.
Bargshady G, Zhou X, Barua PD, Gururajan R, Li Y, Acharya UR. Application of CycleGAN and transfer learning techniques for automated detection of COVID-19 using X-ray images. Pattern Recognit Lett. 2022;153:67–74. 10.1016/j.patrec.2021.11.020.
Bouchekara HREH, Sadiq BO, Zakariyya SO, Sha’aban YA, Shahriar MS, Isah MM. SIFT-CNN Pipeline in Livestock Management: A Drone Image Stitching Algorithm. Drones. 2023;7(1). 10.3390/drones7010017.
Boulent J, Foucher S, Theau J, St-Charles PL. Convolutional Neural Networks for the Automatic Identification of Plant Diseases. Front Plant Sci.
2019;10:941. doi:10.3389/fpls.2019.00941.
Cai Z, Vasconcelos N. (2018, 18–23 June). Cascade R-CNN: Delving Into High Quality Object Detection. Paper presented at the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
Cao D, Chen Z, Gao L. An improved object detection algorithm based on multi-scaled and deformable convolutional neural networks. Human-centric Comput Inform Sci. 2020;10(1):14. doi:10.1186/s13673-020-00219-9.
Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S. (2020). End-to-End Object Detection with Transformers. Paper presented at the Computer Vision – ECCV 2020, Cham.
Chen J, Zhou H, Hu H, Song Y, Gifu D, Li Y, Huang Y. Research on agricultural monitoring system based on convolutional neural network. Future Generation Computer Systems. 2018;88:271–8. doi:10.1016/j.future.2018.05.045.
Cheng Y, Liu D. (2022). An Image-Based Deep Learning Approach with Improved DETR for Power Line Insulator Defect Detection. Journal of Sensors, 2022, 6703864. doi:10.1155/2022/6703864.
Chuang Y, Zhang S, Zhao X. Deep learning-based panoptic segmentation: Recent advances and perspectives. IET Image Processing; 2023.
Cros D, Bocs S, Riou V, Ortega-Abboud E, Tisne S, Argout X, Durand-Gasselin T. Genomic preselection with genotyping-by-sequencing increases performance of commercial oil palm hybrid crosses. BMC Genomics. 2017;18(1):839. doi:10.1186/s12864-017-4179-3.
Crowley MZ. Foreign Labor Shortages in the Malaysian Palm Oil Industry: Impacts and Recommendations. AgEcon Search; 2020.
Domingo D, Ørka HO, Næsset E, Kachamba D, Gobakken T. Effects of UAV Image Resolution, Camera Type, and Image Overlap on Accuracy of Biomass Predictions in a Tropical Woodland. Remote Sens. 2019;11(8). doi:10.3390/rs11080948.
Duan H, Liu Y, Huang H, Wang Z, Zhao H. (2019). Image Stitching Algorithm for Drones Based on SURF-GHT. IOP Conference Series: Materials Science and Engineering, 569(5), 052025. doi:10.1088/1757-899X/569/5/052025.
Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A. The Pascal Visual Object Classes (VOC) Challenge. Int J Comput Vision. 2010;88(2):303–38. doi:10.1007/s11263-009-0275-4.
Freudenberg M, Nölke N, Agostini A, Urban K, Wörgötter F, Kleinn C. Large Scale Palm Tree Detection in High Resolution Satellite Images Using U-Net. Remote Sens. 2019;11(3). doi:10.3390/rs11030312.
Garzón-Martínez GA, O.-G. J. A. MLPB, Barrero S, Lopez-Cruz LS, Enciso-Rodríguez M. Felix E. (2022). Genomic selection for morphological and yield–related traits using genome–wide SNPs in oil palm. Mol Breeding.
GDAL/OGR contributors. (2023). GDAL/OGR Geospatial Data Abstraction software Library. doi:10.5281/zenodo.5884351.
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. (2014). Generative adversarial nets. In Advances in neural information processing systems (pp. 2672–2680).
Hashemi-Beni L, Gebrehiwot A, Karimoddini A, Shahbazi A, Dorbu F. (2022). Deep Convolutional Neural Networks for Weeds and Crops Discrimination From UAS Imagery. Front Remote Sens. 2022;3. doi:10.3389/frsen.2022.755939.
He K, Zhang X, Ren S, Sun J. (2016, 27–30 June). Deep Residual Learning for Image Recognition. Paper presented at the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Hedayati H, McGuinness BJ, Cree MJ, Perrone JA. (2019, 2–4 Dec.). Generalization Approach for CNN-based Object Detection in Unconstrained Outdoor Environments. Paper presented at the 2019 International Conference on Image and Vision Computing New Zealand (IVCNZ).
Ickler MK, Baumgartner M, Roy S, Wald T, Maier-Hein KH. (2023). Taming Detection Transformers for Medical Object Detection. Paper presented at the Bildverarbeitung für die Medizin 2023, Wiesbaden.
Inoue Y. Satellite- and drone-based remote sensing of crops and soils for smart farming – a review. Soil Sci Plant Nutr. 2020;66(6):798–810. doi:10.1080/00380768.2020.1738899.
Kalischuk M, Paret ML, Freeman JH, Raj D, Da Silva S, Eubanks S, Das J.
An Improved Crop Scouting Technique Incorporating Unmanned Aerial Vehicle-Assisted Multispectral Crop Imaging into Conventional Scouting Practice for Gummy Stem Blight in Watermelon. Plant Dis. 2019;103(7):1642–50. doi:10.1094/PDIS-08-18-1373-RE.
Karras T, Aittala M, Hellsten J, Laine S, Lehtinen J, Aila T. (2020). Training Generative Adversarial Networks with Limited Data. Paper presented at the Proc. NeurIPS.
Karras T, Laine S, Aittala M, Hellsten J, Lehtinen J, Aila T. (2020, 13–19 June). Analyzing and Improving the Image Quality of StyleGAN. Paper presented at the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
Kipli K, Osman S, Joseph A, Zen H, Awang Salleh DNSD, Lit A, Chin KL. Deep learning applications for oil palm tree detection and counting. Smart Agricultural Technology. 2023;5:100241. doi:10.1016/j.atech.2023.100241.
Kwong QB, Ong AL, Teh CK, Chew FT, Tammi M, Mayes S, Appleton DR. Genomic Selection in Commercial Perennial Crops: Applicability and Improvement in Oil Palm (Elaeis guineensis Jacq). Sci Rep. 2017;7(1):2872. doi:10.1038/s41598-017-02602-6.
Kwong QB, Teh CK, Ong AL, Heng HY, Lee HL, Mohamed M, Appleton DR. Development and Validation of a High-Density SNP Genotyping Array for African Oil Palm. Mol Plant. 2016;9(8):1132–41. doi:10.1016/j.molp.2016.04.010.
Lambard G, Yamazaki K, Demura M. Generation of highly realistic microstructural images of alloys from limited data with a style-based generative adversarial network. Sci Rep. 2023;13(1):566. doi:10.1038/s41598-023-27574-8.
Lee R. The outlook for population growth. Science. 2011;333(6042):569–73. doi:10.1126/science.1208859.
Li W, Fu H, Yu L. (2017, 23–28 July). Deep convolutional neural network based large-scale oil palm tree detection for high-resolution remote sensing images. Paper presented at the 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS).
Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Zitnick CL. (2014).
Microsoft COCO: Common Objects in Context. Paper presented at the Computer Vision – ECCV 2014, Cham.
Lu J, Tan L, Jiang H. Review on Convolutional Neural Network (CNN) Applied to Plant Leaf Disease Classification. Agriculture. 2021;11(8). doi:10.3390/agriculture11080707.
Meira N, Silva M, Bianchi A, Rabelo R. (2023). Generating Synthetic Faces for Data Augmentation with StyleGAN2-ADA.
Morales G, Kemper G, Sevillano G, Arteaga D, Ortega I, Telles J. Automatic Segmentation of Mauritia flexuosa in Unmanned Aerial Vehicle (UAV) Imagery Using Deep Learning. Forests. 2018;9(12). doi:10.3390/f9120736.
Motamed S, Rogalla P, Khalvati F. Data augmentation using Generative Adversarial Networks (GANs) for GAN-based detection of Pneumonia and COVID-19 in chest X-ray images. Inf Med Unlocked. 2021;27:100779. doi:10.1016/j.imu.2021.100779.
OpenAerialMap A. (2023). OpenAerialMap.
OpenDroneMap A. ODM – A command line toolkit to generate maps, point clouds, 3D models and DEMs from drone, balloon or kite images; 2020.
Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lerer A. (2017). Automatic differentiation in PyTorch.
Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Chintala S. (2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32 (pp. 8024–8035): Curran Associates, Inc.
Pezoa F, Reutter JL, Suarez F, Ugarte M, Vrgoč D. Foundations of JSON schema.
Puliti S, Ørka HO, Gobakken T, Næsset E. Inventory of Small Forest Areas Using an Unmanned Aerial System. Remote Sens. 2015;7(8):9632–54. doi:10.3390/rs70809632.
Qin M, Fan Y, Guo H, Wang M. Application of Improved CycleGAN in Laser-Visible Face Image Translation. Sensors. 2022;22(11). doi:10.3390/s22114057.
Rejeb A, Abdollahi A, Rejeb K, Treiblmaier H. Drones in agriculture: A review and bibliometric analysis. Comput Electron Agric. 2022;198:107017. doi:10.1016/j.compag.2022.107017.
Ren S, He K, Girshick R, Sun J. (2015).
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Paper presented at the Advances in Neural Information Processing Systems.
Sandfort V, Yan K, Pickhardt PJ, Summers RM. Data augmentation using generative adversarial networks (CycleGAN) to improve generalizability in CT segmentation tasks. Sci Rep. 2019;9(1):16884. doi:10.1038/s41598-019-52737-x.
Shorten C, Khoshgoftaar TM. A survey on Image Data Augmentation for Deep Learning. J Big Data. 2019;6(1):60. doi:10.1186/s40537-019-0197-0.
Tariq U, Qureshi R, Zafar A, Aftab D, Wu J, Alam T, Ali H. (2023). Brain Tumor Synthetic Data Generation with Adaptive StyleGANs. Paper presented at the Artificial Intelligence and Cognitive Science, Cham.
Thomas RL, Chan KW, Easau PT. Phyllotaxis in the Oil Palm: Arrangement of Fronds on the Trunk of Mature Palms. Ann Botany. 1969;33(5):1001–8. doi:10.1093/oxfordjournals.aob.a084328.
Torralba A, Russell BC, Yuen J. (2010). LabelMe: Online Image Annotation and Applications. Proceedings of the IEEE, 98(8), 1467–1484. doi:10.1109/JPROC.2010.2050290.
Volpato L, Pinto F, Gonzalez-Perez L, Thompson IG, Borem A, Reynolds M, Rodrigues FA Jr. High Throughput Field Phenotyping for Plant Height Using UAV-Based RGB Imagery in Wheat Breeding Lines: Feasibility and Validation. Front Plant Sci. 2021;12:591587. doi:10.3389/fpls.2021.591587.
Wu Y, Kirillov A, Massa F, Lo W-Y, Girshick R. (2019). Detectron2.
Wyawahare M, Ekbote N, Pimperkhede S, Deshpande A, Bapat P, Aphale I. (2023). Comparison of Image Blending Using Cycle GAN and Traditional Approach. Paper presented at the Pervasive Computing and Social Networking, Singapore.
Xie S, Girshick R, Dollár P, Tu Z, He K. (2017, 21–26 July). Aggregated Residual Transformations for Deep Neural Networks. Paper presented at the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Yates M, Hart G, Houghton R, Torres MT, Pound M.
Evaluation of synthetic aerial imagery using unconditional generative adversarial networks. ISPRS J Photogrammetry Remote Sens. 2022;190:231–51. doi:10.1016/j.isprsjprs.2022.06.010.
Zhao L, Li S. Object Detection Algorithm Based on Improved YOLOv3. Electronics. 2020;9(3). doi:10.3390/electronics9030537.
Zhao W, Yamada W, Li T, Digman M, Runge T. Augmenting Crop Detection for Precision Agriculture with Deep Visual Transfer Learning: A Case Study of Bale Detection. Remote Sens. 2021;13(1). doi:10.3390/rs13010023.
Zhu JY, Park T, Isola P, Efros AA. (2017, 22–29 Oct.). Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. Paper presented at the 2017 IEEE International Conference on Computer Vision (ICCV).
Zoph B, Cubuk ED, Ghiasi G, Lin T-Y, Shlens J, Le QV. (2020). Learning Data Augmentation Strategies for Object Detection. Paper presented at the Computer Vision – ECCV 2020, Cham.
Editorial timeline
Journal publication published (Journal of Big Data): 08 Sep, 2024
Editorial decision, revision requested: 25 Jun, 2024
Reviews received at journal: 31 May, 2024
Reviews received at journal: 22 May, 2024
Reviewers agreed at journal: 11 May, 2024
Reviewers agreed at journal: 11 May, 2024
Reviewers invited by journal: 02 Feb, 2024
Editor assigned by journal: 28 Jan, 2024
Submission checks completed at journal: 05 Jan, 2024
First submitted to journal: 03 Jan, 2024
Authors and affiliations
Qi Bin Kwong (corresponding author), Yee Thung Kon, Wan Rusydiah W Rusik, Mohd Nor Azizi Shabudin, Harikrishna Kulaveerasingam, Shahirah Shazana A Rahman, David Ross Appleton. All authors: Sime Darby Plantation Research Sdn Bhd.
Published version
https://doi.org/10.1186/s40537-024-00990-x
Figure legends
Figure 1: Representative backbone architecture for ResNeXt50_32X4D. The convolutional layers are labelled as kernel size, convolutional layer name, in channel, out channel, cardinality level. The dotted curved arrows represent skip connections with dimension correction (convolutional residual block), whereas the solid curved arrows represent skip connections without dimension correction (identity residual block).
Figure 2: Sample GAN-generated palm images.
Figure 3: A) Synthetic palm tile before CycleGAN transformation. B) Synthetic palm tile after CycleGAN transformation.
Figure 4a: Comparison of A) raw tile, B) baseline detector and C) GAN palm model for immature palms. Both models performed almost equally in detecting palms.
Figure 4b: Comparison of A) raw tile, B) baseline detector and C) GAN palm model for challenge dataset 1. The baseline detector mistakenly detected some of the low-resolution shrubs as palms.
Figure 4c: Comparison of A) raw tile, B) baseline detector and C) GAN palm model for challenge dataset 2. The baseline model was not able to detect storm-affected palms. Comparatively, the GAN palm model was even able to detect the fallen palm in the middle.
\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAverage Precision (AP)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.966\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.972\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.922\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.927\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.053\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.927\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMean Average Recall (mAR)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.702\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.785\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.585\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.587\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.083\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.604\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/table\u003e\u003c/div\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003e \u003c/p\u003e\u003cdiv class=\"gridtable\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" 
colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eCOCO evaluation table for palm segmentation model.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e\u003ccolgroup cols=\"7\"\u003e\u003c/colgroup\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSegmentation Model\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eBaseline Palm Detector\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eGAN Palm Detector\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eBaseline Palm Detector (Test Set 1)\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eGAN Palm Detector (Test Set 1)\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eBaseline Palm Detector (Test Set 2)\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eGAN Palm Detector (Test Set 2)\u003c/p\u003e \u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMean Average Precision (mAP)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.502\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e 
\u003cp\u003e0.625\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.401\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.398\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.032\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.413\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAverage Precision (AP)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.958\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.963\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.878\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.876\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.065\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.937\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMean Average Recall (mAR)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.569\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.647\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.484\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.483\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.143\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.495\u003c/p\u003e 
\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/table\u003e\u003c/div\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eFor the validation dataset, the palm count precision and recall were 95.8% and 97.2% (F1 score 0.97) for the baseline detector. The corresponding values for the GAN-based model were 98.5% and 98.6% (F1 score 0.99). For challenge dataset 1, the precision and recall were 93.1% and 99.4% (F1 score 0.96), respectively, for the baseline palm detector, whereas they were 95.7% and 99.4% (F1 score 0.98) for the GAN palm detector. For challenge dataset 2, the baseline model achieved a precision of 100% but a very low recall of 13% (F1 score 0.23), indicating that most palms were not detected, whereas the GAN-based model achieved high precision and recall values of 98.7% and 95.3% (F1 score 0.97).\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eThe introduction of CNN-based object detection models has spurred advancements in agricultural automation. Notably, these models have been applied to tasks such as automated weed identification in crops (Hashemi-Beni, Gebrehiwot, Karimoddini, Shahbazi, \u0026amp; Dorbu, \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2022\u003c/span\u003e) and the detection of plant diseases (Boulent, Foucher, Theau, \u0026amp; St-Charles, \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). In the oil palm industry, CNN-based models are being explored for drone or satellite-based palm detection and counting (Freudenberg et al., \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Kipli et al., \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Li, Fu, \u0026amp; Yu, \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). This study builds upon the application of CNNs in palm detection and extends the concept to palm segmentation. 
Here, segmentation refers specifically to instance segmentation, which delineates the pixels representing each individual palm, rather than panoptic or semantic segmentation (Chuang, Zhang, \u0026amp; Zhao, \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). However, a significant challenge lies in the generalizability of these palm CNN models across diverse environments. Unlike crops grown in controlled environments such as greenhouses, oil palms are cultivated in open fields and exposed to numerous unpredictable factors. Environmental elements such as weather conditions (rain, wind, and fog) and varying lighting conditions influenced by sunlight and time of day are known to impact drone image quality (Puliti, \u0026Oslash;rka, Gobakken, \u0026amp; N\u0026aelig;sset, \u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e2015\u003c/span\u003e), thereby introducing environmental variation and limiting the generalizability of the models. Additionally, factors such as drone camera type and flight altitude have been identified as contributors to variations in image quality (Domingo, \u0026Oslash;rka, N\u0026aelig;sset, Kachamba, \u0026amp; Gobakken, \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). The intricacies of image processing and stitching further complicate this issue (Bouchekara et al., \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Duan, Liu, Huang, Wang, \u0026amp; Zhao, \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). Together, these factors increase the variability among images. Hence, developing a generalizable model requires a diverse representation of palm images. 
Rather than manually addressing every conceivable scenario, GANs (Goodfellow, 2014) offer a potential solution to mitigate these issues.\u003c/p\u003e \u003cp\u003eGAN-based background switching has been proposed as an augmentation method for object detection (Hedayati, McGuinness, Cree, \u0026amp; Perrone, \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). This study follows a similar augmentation principle to that publication, with the object of interest being inserted into a new background image. However, one major difference is that the palms used here were generated via StyleGAN2 (Karras, Aittala, et al., \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Karras, Laine, et al., \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). StyleGAN2 comprises a generator and a discriminator; the generator produces synthetic images, while the discriminator evaluates them and distinguishes them from real images. An essential feature of StyleGAN2 is its ability to independently manipulate high-level and fine-level details in images, known as style-mixing. It also introduces stochastic variation, adding randomness for greater diversity in synthetic image generation. Leveraging these capabilities, StyleGAN2, along with its predecessor StyleGAN, has been used to generate highly realistic human faces (Meira, Silva, Bianchi, \u0026amp; Rabelo, \u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e2023\u003c/span\u003e), aerial imagery (Yates, Hart, Houghton, Torres, \u0026amp; Pound, \u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e2022\u003c/span\u003e), medical images (Tariq et al., \u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e2023\u003c/span\u003e), and microstructural images of alloys (Lambard, Yamazaki, \u0026amp; Demura, \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). 
In this study, the synthetic palms were generated on an empty background image, and the annotation masks, essential for pixel-wise classification, were derived automatically from the object boundaries.\u003c/p\u003e \u003cp\u003eWhile the individual synthetic palms introduced variation during development of the detection and segmentation models, the background and environmental variations across all possible scenarios remained unrepresented. Because the number of plantation drone images was limited in this study, environmental variation was introduced by randomly selecting drone images from flight missions around the world hosted on OpenAerialMap (OpenAerialMap, \u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). However, imprinting individual palms onto these background tiles presented challenges, including color inconsistency, that resulted in an artificial appearance. To address this, CycleGAN (Zhu et al., \u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e2017\u003c/span\u003e) was employed. Known for its image-to-image translation capability, CycleGAN maps images from one domain to another while preserving content. Beyond artistic style transfer, CycleGAN has found applications in X-ray image augmentation (Bargshady et al., \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2022\u003c/span\u003e) and laser\u0026ndash;visible image translation (Qin, Fan, Guo, \u0026amp; Wang, \u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). In our case, CycleGAN was used mainly to enhance the realism of synthetic tiles by harmonizing color and lighting conditions. Although a combination of manual mixing techniques could substitute for a GAN for this purpose (Wyawahare et al., \u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e2023\u003c/span\u003e), this avenue was not explored here. 
Synthetic tiles for augmentation were thus generated by applying these two GANs in sequence.\u003c/p\u003e \u003cp\u003eData augmentation has proven valuable in image classification (Shorten \u0026amp; Khoshgoftaar, \u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e2019\u003c/span\u003e), and its extension to object detection has been explored, albeit with a slightly lower impact on accuracy (Zoph et al., \u003cspan citationid=\"CR62\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). In addition to conventional augmentations such as random flipping, which were applied to both datasets in this study, the synthetic tiles generated through GAN-based augmentation were used together with the real tiles to develop the GAN-based palm detector. The DETR framework (Carion et al., \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2020\u003c/span\u003e), implemented on top of the Detectron2 object detection framework (Wu et al., \u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e2019\u003c/span\u003e), was used to build the palm detectors in this study. DETR integrates a transformer encoder-decoder with a CNN backbone. The CNN backbone used for feature extraction was a variation of the ResNeXt architecture (Xie et al., \u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e2017\u003c/span\u003e), which builds upon ResNet (He et al., \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e2016\u003c/span\u003e) by introducing the \u0026ldquo;cardinality\u0026rdquo; concept. Both architectures are based on residual learning, which involves the use of bottleneck blocks and skip connections. The cardinality feature of the ResNeXt architecture, which divides the input channels into multiple groups and performs separate convolutions for each group, helps the model capture diverse features and learn different aspects of the training images in parallel. 
The transformer that follows the CNN backbone, which consists of an encoder and a decoder (Carion et al., \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2020\u003c/span\u003e), is used for global context reasoning. The DETR architecture incorporates a set-based global loss with bipartite matching, enabling pair-wise and parallel decoding of object embeddings and simultaneous prediction of object coordinates and class labels. DETR\u0026rsquo;s versatility has been demonstrated across various applications, including medical object detection and drone-based insulator defect detection (Cheng \u0026amp; Liu, \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Ickler, Baumgartner, Roy, Wald, \u0026amp; Maier-Hein, \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). Given the typically structured and dense arrangement of palms in plantations, and that replanting is usually conducted on entire fields, DETR's ability to capture global context and relationships between objects makes it a particularly fitting choice for our application. On their respective validation datasets, both the baseline and the GAN-based palm detectors performed strongly across all detection, segmentation and counting tasks. The GAN-based palm detector achieved precision and recall values of 98.5% and 98.6%, respectively. These results were comparable, and in some cases slightly superior, to reported values for similar tasks involving various agricultural crops or plants (Morales et al., \u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; W. Zhao, Yamada, Li, Digman, \u0026amp; Runge, \u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). 
The mAP and AP values also stood on par with the findings of other relevant research works (Cai \u0026amp; Vasconcelos, \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Cao, Chen, \u0026amp; Gao, \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Ren, He, Girshick, \u0026amp; Sun, \u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e2015\u003c/span\u003e; L. Zhao \u0026amp; Li, \u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). It is noteworthy, however, that the single-class focus of this study\u0026mdash;oil palm\u0026mdash;likely contributed to these high-performance metrics.\u003c/p\u003e \u003cp\u003eUpon applying the models to challenge dataset 1, in general a slight decrease in palm detection accuracies was observed. This can be attributed to the dataset's characteristics, which include older palms and overlapping canopies, posing increased challenges for detection. Compared with detection, the segmentation accuracies declined slightly more. This decline can be attributed to the presence of dense canopies, which cast shadows and obstruct the visibility of individual palm canopies in the surrounding area. This limitation hinders effective observation and delineation of the palm canopies during the segmentation process. Comparing palm count accuracies between the baseline and GAN models revealed that the GAN-based model demonstrates a lower susceptibility to false positives. Though not reflected in the mAP and AP metrics, the baseline model displayed a slight inclination to mistakenly identify indistinct shrubs, which loosely resemble palm seedlings from top view, as palms (as illustrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e4\u003c/span\u003eb). It is important to note that the tiles used in this study were predominantly from plantations, with most tiles exclusively featuring palms. 
Tiles containing both shrubs and palms were rare, and tiles featuring only shrubs lacked annotations and were therefore excluded from COCO evaluations. Consequently, many of the falsely detected palms on these tiles could only be accounted for in the palm count precision metric. The lower false positive rate of the GAN-based model can be attributed to the diverse background objects present in the drone background tiles sourced and processed from OpenAerialMap (OpenAerialMap, \u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). This diversity contributes to the generation of synthetic tiles containing a wide array of background objects, ultimately improving the model.\u003c/p\u003e \u003cp\u003eChallenge dataset 2, collected from a plantation affected by a destructive storm, represented an unforeseen plantation circumstance. The palms found here were also \u0026gt;\u0026thinsp;5 years old, similar to challenge dataset 1. The mAP, AP and mAR of the baseline palm detector were all lower than 0.15. Although the count precision was 100%, the recall was only 13%: all detected palms were correct, but the model missed a substantial number of actual palms. By comparison, the GAN-based palm detector was capable of detecting, segmenting, and counting the palms, achieving accuracies comparable to those on challenge dataset 1 and its validation set. In many cases (as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e4\u003c/span\u003ec), the model was even able to detect fallen palms. From the top view, an individual palm canopy appears radial and almost symmetrical. 
The formation of the canopy is driven by the sequential growth and arrangement of fronds in a spiral pattern, known as phyllotaxis (Aholoukp\u0026egrave; et al., \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Thomas, Chan, \u0026amp; Easau, \u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e1969\u003c/span\u003e). These fronds are usually packed in an organized spiral, which contributes to the vertical and horizontal dimensions of the canopy. The individual fronds of an oil palm have a fan-like shape with a central axis and radiating leaflets. While some StyleGAN2-generated synthetic palms resembled the phenotypic outcome of these biological patterns, others did not, showing asymmetrical canopies with irregular fronds. In challenge dataset 2, the storm had, to a certain extent, altered the canopy shapes and the orientation of the fronds. It is likely that the training dataset used to construct the baseline model inadequately represented these structural changes. The GAN-based training dataset, on the other hand, covered a broader range of \u0026ldquo;possible\u0026rdquo; real-life palm canopy structures, which likely explains the high accuracies achieved. The storm-affected palm dataset exemplified an extreme instance of palm canopy variation; distortions to palm canopies in a plantation are usually not as severe. Nevertheless, the results demonstrate that a detection or segmentation model constructed using GAN-generated synthetic tiles in conjunction with raw tiles exhibits superior generalizability and robustness compared to a model relying solely on raw tiles.\u003c/p\u003e \u003cp\u003eWhile both the GAN-based and baseline models perform effectively across various age groups, and the GAN-based model demonstrates the ability to detect palms with canopy distortions, our study has certain limitations. 
We did not explore factors such as variations in drone image capture heights, planting density, drone camera types, and camera settings. Despite these limitations, the GAN-based model has shown superior performance in terms of both precision and recall compared to the baseline model. The GAN-based palm detection model can easily be improved further by incorporating images with older palms, but it was not our focus in this study. Moving forward, our focus will be on leveraging this model to automate young palm (\u0026lt;\u0026thinsp;5-year-old) growth phenotyping and abnormal palm detection. In doing so, we strive to usher the oil palm industry into a new epoch of digital agriculture, marked by advanced automation and precise phenotypic measurement.\u003c/p\u003e"},{"header":"Abbreviations","content":"\u003cp\u003eAP\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;Average Precision\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eCNN\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Convolutional neural network\u003c/p\u003e\n\u003cp\u003eCOCO\u0026nbsp;\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Common Objects in Context\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eCycleGAN\u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Cycle-Consistent Generative Adversarial Network\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eDETR \u0026nbsp;\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Detection Transformer\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eFID\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Fr\u0026eacute;chet inception distance\u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eGAN \u0026nbsp;\u0026nbsp;\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Generative 
adversarial network\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eIoU\u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Intersection over Union\u0026nbsp;\u003c/p\u003e\n\u003cp\u003emAP\u0026nbsp; \u0026nbsp;\u0026nbsp;\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Mean Average Precision\u0026nbsp;\u003c/p\u003e\n\u003cp\u003emAR\u0026nbsp; \u0026nbsp;\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Mean Average Recall\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eStyleGAN2\u0026nbsp; \u0026nbsp; \u0026nbsp;Style-based Generative Adversarial Network 2\u003c/p\u003e\n\u003cp\u003et/ha/yr \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; tonnes of oil per hectare per year\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003eEthics approval and consent to participate\u003c/p\u003e\n\u003cp\u003eNot applicable\u003c/p\u003e\n\u003cp\u003eConsent for publication\u003c/p\u003e\n\u003cp\u003eNot applicable\u003c/p\u003e\n\u003cp\u003eAvailability of data and materials\u003c/p\u003e\n\u003cp\u003eThe palm GAN generator model has been uploaded and deployed at https://huggingface.co/spaces/qibin85/fake_palm_generator. Other datasets used during the current study are available from the corresponding author on reasonable request.\u003c/p\u003e\n\u003cp\u003eCompeting interests\u003c/p\u003e\n\u003cp\u003eThe authors declare that they have no competing interests.\u003c/p\u003e\n\u003cp\u003eFunding\u003c/p\u003e\n\u003cp\u003eThis project was funded by Sime Darby Plantation Research Sdn Bhd.\u003c/p\u003e\n\u003cp\u003eAuthors\u0026apos; contributions\u003c/p\u003e\n\u003cp\u003eQBK was involved in conception and design of the work. QBK, YTK, WRWR, MNAS \u0026amp; SSAR were involved in data acquisition and analysis. QBK, WRWR, MNAS, DRA \u0026amp; HK were involved in data interpretation. 
QBK, YTK \u0026amp; SSAR were involved in the development of new software/scripts used in the work. QBK drafted the work and all authors substantively revised and approved it.\u003c/p\u003e\n\u003cp\u003eAcknowledgements\u003c/p\u003e\n\u003cp\u003eAll authors thank the employees of Sime Darby Plantation Research \u0026amp; Upstream Malaysia for their assistance in data collection. Also, they thank the editors and reviewers for their attention to the paper.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eAbadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Zheng X. (2016). \u003cem\u003eTensorFlow: A System for Large-Scale Machine Learning on Heterogeneous Distributed Systems\u003c/em\u003e. Paper presented at the Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAholoukp\u0026egrave; HNS, Dubos B, Deleporte P, Flori A, Amadji LG, Chotte J-L, Blavet D. Allometric equations for estimating oil palm stem biomass in the ecological context of Benin, West Africa. Trees. 2018;32(6):1669\u0026ndash;80. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/s00468-018-1742-8\u003c/span\u003e\u003cspan address=\"10.1007/s00468-018-1742-8\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBargshady G, Zhou X, Barua PD, Gururajan R, Li Y, Acharya UR. Application of CycleGAN and transfer learning techniques for automated detection of COVID-19 using X-ray images. Pattern Recognit Lett. 2022;153:67\u0026ndash;74. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.patrec.2021.11.020\u003c/span\u003e\u003cspan address=\"10.1016/j.patrec.2021.11.020\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBouchekara HREH, Sadiq BO, Zakariyya O, Sha\u0026rsquo;aban S, Shahriar YA, M. S., Isah MM. SIFT-CNN Pipeline in Livestock Management: A Drone Image Stitching Algorithm. Drones. 2023;7(1). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3390/drones7010017\u003c/span\u003e\u003cspan address=\"10.3390/drones7010017\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBoulent J, Foucher S, Theau J, St-Charles PL. Convolutional Neural Networks for the Automatic Identification of Plant Diseases. Front Plant Sci. 2019;10:941. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3389/fpls.2019.00941\u003c/span\u003e\u003cspan address=\"10.3389/fpls.2019.00941\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCai Z, Vasconcelos N. (2018, 18\u0026ndash;23 June 2018). \u003cem\u003eCascade R-CNN: Delving Into High Quality Object Detection.\u003c/em\u003e Paper presented at the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCao D, Chen Z, Gao L. An improved object detection algorithm based on multi-scaled and deformable convolutional neural networks. Human-centric Comput Inform Sci. 2020;10(1):14. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/s13673-020-00219-9\u003c/span\u003e\u003cspan address=\"10.1186/s13673-020-00219-9\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCarion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S. (2020, 2020//). \u003cem\u003eEnd-to-End Object Detection with Transformers.\u003c/em\u003e Paper presented at the Computer Vision \u0026ndash; ECCV 2020, Cham.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen J, Zhou H, Hu H, Song Y, Gifu D, Li Y, Huang Y. Research on agricultural monitoring system based on convolutional neural network. Future Generation Computer Systems. 2018;88:271\u0026ndash;8. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.future.2018.05.045\u003c/span\u003e\u003cspan address=\"10.1016/j.future.2018.05.045\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCheng Y, Liu D. (2022). An Image-Based Deep Learning Approach with Improved DETR for Power Line Insulator Defect Detection. \u003cem\u003eJournal of Sensors, 2022\u003c/em\u003e, 6703864. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1155/2022/6703864\u003c/span\u003e\u003cspan address=\"10.1155/2022/6703864\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChuang Y, Zhang S, Zhao X. Deep learning-based panoptic segmentation: Recent advances and perspectives. IET Image Processing; 2023.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCros D, Bocs S, Riou V, Ortega-Abboud E, Tisne S, Argout X, Durand-Gasselin T. Genomic preselection with genotyping-by-sequencing increases performance of commercial oil palm hybrid crosses. BMC Genomics. 
2017;18(1):839. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/s12864-017-4179-3\u003c/span\u003e\u003cspan address=\"10.1186/s12864-017-4179-3\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCrowley MZ. Foreign Labor Shortages in the Malaysian Palm Oil Industry: Impacts and Recommendations. AgEcon Search; 2020.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDomingo D, \u0026Oslash;rka HO, N\u0026aelig;sset E, Kachamba D, Gobakken T. Effects of UAV Image Resolution, Camera Type, and Image Overlap on Accuracy of Biomass Predictions in a Tropical Woodland. Remote Sens. 2019;11(8). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3390/rs11080948\u003c/span\u003e\u003cspan address=\"10.3390/rs11080948\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDuan H, Liu Y, Huang H, Wang Z, Zhao H. (2019). Image Stitching Algorithm for Drones Based on SURF-GHT. \u003cem\u003eIOP Conference Series: Materials Science and Engineering, 569\u003c/em\u003e(5), 052025. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1088/1757-899X/569/5/052025\u003c/span\u003e\u003cspan address=\"10.1088/1757-899X/569/5/052025\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEveringham M, Van Gool L, Williams CKI, Winn J, Zisserman A. The Pascal Visual Object Classes (VOC) Challenge. Int J Comput Vision. 2010;88(2):303\u0026ndash;38. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/s11263-009-0275-4\u003c/span\u003e\u003cspan address=\"10.1007/s11263-009-0275-4\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFreudenberg M, N\u0026ouml;lke N, Agostini A, Urban K, W\u0026ouml;rg\u0026ouml;tter F, Kleinn C. Large Scale Palm Tree Detection in High Resolution Satellite Images Using U-Net. Remote Sens. 2019;11(3). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3390/rs11030312\u003c/span\u003e\u003cspan address=\"10.3390/rs11030312\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGarz\u0026oacute;n-Mart\u0026iacute;nez GA, O.-G. J. A. MLPB, Barrero S, Lopez-Cruz LS, Enciso-Rodr\u0026iacute;guez M. Felix E. (2022). Genomic selection for morphological and yield\u0026ndash;related traits using genome\u0026ndash;wide SNPs in oil palm. \u003cem\u003eMol Breeding\u003c/em\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGDAL, O. c. (2023). GDAL/OGR Geospatial Data Abstraction software Library. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.5281/zenodo.5884351\u003c/span\u003e\u003cspan address=\"10.5281/zenodo.5884351\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGoodfellow IP-A, Mirza J, Xu M, Warde-Farley B, Ozair D, Courville S, Bengio A. Yoshua. (2014). Generative adversarial nets. In \u003cem\u003eAdvances in neural information processing systems\u003c/em\u003e (pp.\u0026nbsp;2672\u0026ndash;2680).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHashemi-Beni L, Gebrehiwot A, Karimoddini A, Shahbazi A, Dorbu F. (2022). 
Deep Convolutional Neural Networks for Weeds and Crops Discrimination From UAS Imagery. \u003cem\u003e3\u003c/em\u003e. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3389/frsen.2022.755939\u003c/span\u003e\u003cspan address=\"10.3389/frsen.2022.755939\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHe K, Zhang X, Ren S, Sun J. (2016, 27\u0026ndash;30 June 2016). \u003cem\u003eDeep Residual Learning for Image Recognition.\u003c/em\u003e Paper presented at the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHedayati H, McGuinness BJ, Cree MJ, Perrone JA. (2019, 2\u0026ndash;4 Dec. 2019). \u003cem\u003eGeneralization Approach for CNN-based Object Detection in Unconstrained Outdoor Environments.\u003c/em\u003e Paper presented at the 2019 International Conference on Image and Vision Computing New Zealand (IVCNZ).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eIckler MK, Baumgartner M, Roy S, Wald T, Maier-Hein KH. (2023, 2023//). \u003cem\u003eTaming Detection Transformers for Medical Object Detection.\u003c/em\u003e Paper presented at the Bildverarbeitung f\u0026uuml;r die Medizin 2023, Wiesbaden.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eInoue Y. Satellite- and drone-based remote sensing of crops and soils for smart farming \u0026ndash; a review. Soil Sci Plant Nutr. 2020;66(6):798\u0026ndash;810. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1080/00380768.2020.1738899\u003c/span\u003e\u003cspan address=\"10.1080/00380768.2020.1738899\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKalischuk M, Paret ML, Freeman JH, Raj D, Da Silva S, Eubanks S, Das J. 
An Improved Crop Scouting Technique Incorporating Unmanned Aerial Vehicle-Assisted Multispectral Crop Imaging into Conventional Scouting Practice for Gummy Stem Blight in Watermelon. Plant Dis. 2019;103(7):1642\u0026ndash;50. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1094/PDIS-08-18-1373-RE\u003c/span\u003e\u003cspan address=\"10.1094/PDIS-08-18-1373-RE\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKarras T, Aittala M, Hellsten J, Laine S, Lehtinen J, Aila T. (2020). \u003cem\u003eTraining Generative Adversarial Networks with Limited Data\u003c/em\u003e. Paper presented at the Proc. NeurIPS.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKarras T, Laine S, Aittala M, Hellsten J, Lehtinen J, Aila T. (2020, 13\u0026ndash;19 June 2020). \u003cem\u003eAnalyzing and Improving the Image Quality of StyleGAN.\u003c/em\u003e Paper presented at the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKipli K, Osman S, Joseph A, Zen H, Awang Salleh DNSD, Lit A, Chin KL. Deep learning applications for oil palm tree detection and counting. Smart Agricultural Technology. 2023;5:100241. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.atech.2023.100241\u003c/span\u003e\u003cspan address=\"10.1016/j.atech.2023.100241\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKwong QB, Ong AL, Teh CK, Chew FT, Tammi M, Mayes S, Appleton DR. Genomic Selection in Commercial Perennial Crops: Applicability and Improvement in Oil Palm (Elaeis guineensis Jacq). Sci Rep. 2017;7(1):2872. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41598-017-02602-6\u003c/span\u003e\u003cspan address=\"10.1038/s41598-017-02602-6\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKwong QB, Teh CK, Ong AL, Heng HY, Lee HL, Mohamed M, Appleton DR. Development and Validation of a High-Density SNP Genotyping Array for African Oil Palm. Mol Plant. 2016;9(8):1132\u0026ndash;41. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.molp.2016.04.010\u003c/span\u003e\u003cspan address=\"10.1016/j.molp.2016.04.010\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLambard G, Yamazaki K, Demura M. Generation of highly realistic microstructural images of alloys from limited data with a style-based generative adversarial network. Sci Rep. 2023;13(1):566. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41598-023-27574-8\u003c/span\u003e\u003cspan address=\"10.1038/s41598-023-27574-8\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLee R. The outlook for population growth. Science. 2011;333(6042):569\u0026ndash;73. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1126/science.1208859\u003c/span\u003e\u003cspan address=\"10.1126/science.1208859\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi W, Fu H, Yu L. (2017, 23\u0026ndash;28 July 2017). 
\u003cem\u003eDeep convolutional neural network based large-scale oil palm tree detection for high-resolution remote sensing images.\u003c/em\u003e Paper presented at the 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Zitnick CL. (2014, 2014//). \u003cem\u003eMicrosoft COCO: Common Objects in Context.\u003c/em\u003e Paper presented at the Computer Vision \u0026ndash; ECCV 2014, Cham.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLu J, Tan L, Jiang H. Review on Convolutional Neural Network (CNN) Applied to Plant Leaf Disease Classification. Agriculture. 2021;11(8). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3390/agriculture11080707\u003c/span\u003e\u003cspan address=\"10.3390/agriculture11080707\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMeira N, Silva M, Bianchi A, Rabelo R. (2023). \u003cem\u003eGenerating Synthetic Faces for Data Augmentation with StyleGAN2-ADA\u003c/em\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMorales G, Kemper G, Sevillano G, Arteaga D, Ortega I, Telles J. Automatic Segmentation of Mauritia flexuosa in Unmanned Aerial Vehicle (UAV) Imagery Using Deep Learning. Forests. 2018;9(12). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3390/f9120736\u003c/span\u003e\u003cspan address=\"10.3390/f9120736\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMotamed S, Rogalla P, Khalvati F. Data augmentation using Generative Adversarial Networks (GANs) for GAN-based detection of Pneumonia and COVID-19 in chest X-ray images. Inf Med Unlocked. 2021;27:100779. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.imu.2021.100779\u003c/span\u003e\u003cspan address=\"10.1016/j.imu.2021.100779\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOpenAerialMap A. (2023). OpenAerialMap.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOpenDroneMap A. ODM \u0026ndash; A command line toolkit to generate maps, point clouds, 3D models and DEMs from drone. balloon or kite images; 2020.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePaszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lerer A. (2017). \u003cem\u003eAutomatic differentiation in PyTorch\u003c/em\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePaszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Chintala S. (2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library. In \u003cem\u003eAdvances in Neural Information Processing Systems 32\u003c/em\u003e (pp.\u0026nbsp;8024\u0026ndash;8035): Curran Associates, Inc.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePezoa F, Reutter JL, Suarez F, Ugarte M n., Vrgoč D. \u003cem\u003eFoundations of JSON schema\u003c/em\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePuliti S, \u0026Oslash;rka HO, Gobakken T, N\u0026aelig;sset E. Inventory of Small Forest Areas Using an Unmanned Aerial System. Remote Sens. 2015;7(8):9632\u0026ndash;54. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3390/rs70809632\u003c/span\u003e\u003cspan address=\"10.3390/rs70809632\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eQin M, Fan Y, Guo H, Wang M. Application of Improved CycleGAN in Laser-Visible Face Image Translation. Sensors. 2022;22(11). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3390/s22114057\u003c/span\u003e\u003cspan address=\"10.3390/s22114057\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRejeb A, Abdollahi A, Rejeb K, Treiblmaier H. Drones in agriculture: A review and bibliometric analysis. Comput Electron Agric. 2022;198:107017. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.compag.2022.107017\u003c/span\u003e\u003cspan address=\"10.1016/j.compag.2022.107017\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRen S, He K, Girshick R, Sun J. (2015). \u003cem\u003eFaster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks\u003c/em\u003e. Paper presented at the Advances in Neural Information Processing Systems.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSandfort V, Yan K, Pickhardt PJ, Summers RM. Data augmentation using generative adversarial networks (CycleGAN) to improve generalizability in CT segmentation tasks. Sci Rep. 2019;9(1):16884. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41598-019-52737-x\u003c/span\u003e\u003cspan address=\"10.1038/s41598-019-52737-x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShorten C, Khoshgoftaar TM. A survey on Image Data Augmentation for Deep Learning. J Big Data. 2019;6(1):60. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/s40537-019-0197-0\u003c/span\u003e\u003cspan address=\"10.1186/s40537-019-0197-0\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTariq U, Qureshi R, Zafar A, Aftab D, Wu J, Alam T, Ali H. (2023, 2023//). \u003cem\u003eBrain Tumor Synthetic Data Generation with Adaptive StyleGANs.\u003c/em\u003e Paper presented at the Artificial Intelligence and Cognitive Science, Cham.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eThomas RL, Chan KW, Easau PT. Phyllotaxis in the Oil Palm: Arrangement of Fronds on the Trunk of Mature Palms. Ann Botany. 1969;33(5):1001\u0026ndash;8. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/oxfordjournals.aob.a084328\u003c/span\u003e\u003cspan address=\"10.1093/oxfordjournals.aob.a084328\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTorralba A, Russell BC, Yuen J. (2010). LabelMe: Online Image Annotation and Applications. \u003cem\u003eProceedings of the IEEE, 98\u003c/em\u003e(8), 1467\u0026ndash;1484. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/JPROC.2010.2050290\u003c/span\u003e\u003cspan address=\"10.1109/JPROC.2010.2050290\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVolpato L, Pinto F, Gonzalez-Perez L, Thompson IG, Borem A, Reynolds M, Rodrigues FA Jr.. High Throughput Field Phenotyping for Plant Height Using UAV-Based RGB Imagery in Wheat Breeding Lines: Feasibility and Validation. Front Plant Sci. 2021;12:591587. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3389/fpls.2021.591587\u003c/span\u003e\u003cspan address=\"10.3389/fpls.2021.591587\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWu Y, Kirillov A, Massa F, Lo W-Y, Girshick R. (2019). Detectron2.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWyawahare M, Ekbote N, Pimperkhede S, Deshpande A, Bapat P, Aphale I. (2023, 2023//). \u003cem\u003eComparison of Image Blending Using Cycle GAN and Traditional Approach.\u003c/em\u003e Paper presented at the Pervasive Computing and Social Networking, Singapore.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXie S, Girshick R, Doll\u0026aacute;r P, Tu Z, He K. (2017, 21\u0026ndash;26 July 2017). \u003cem\u003eAggregated Residual Transformations for Deep Neural Networks.\u003c/em\u003e Paper presented at the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYates M, Hart G, Houghton R, Torres MT, Pound M. Evaluation of synthetic aerial imagery using unconditional generative adversarial networks. ISPRS J Photogrammetry Remote Sens. 2022;190:231\u0026ndash;51. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.isprsjprs.2022.06.010\u003c/span\u003e\u003cspan address=\"10.1016/j.isprsjprs.2022.06.010\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhao L, Li S. Object Detection Algorithm Based on Improved YOLOv3. Electronics. 2020;9(3). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3390/electronics9030537\u003c/span\u003e\u003cspan address=\"10.3390/electronics9030537\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhao W, Yamada W, Li T, Digman M, Runge T. Augmenting Crop Detection for Precision Agriculture with Deep Visual Transfer Learning\u0026mdash;A Case Study of Bale Detection. Remote Sens. 2021;13(1). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3390/rs13010023\u003c/span\u003e\u003cspan address=\"10.3390/rs13010023\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhu JY, Park T, Isola P, Efros AA. (2017, 22\u0026ndash;29 Oct. 2017). \u003cem\u003eUnpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks.\u003c/em\u003e Paper presented at the 2017 IEEE International Conference on Computer Vision (ICCV).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZoph B, Cubuk ED, Ghiasi G, Lin T-Y, Shlens J, Le QV. (2020, 2020//). \u003cem\u003eLearning Data Augmentation Strategies for Object Detection.\u003c/em\u003e Paper presented at the Computer Vision \u0026ndash; ECCV 2020, Cham.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"journal-of-big-data","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"bigd","sideBox":"Learn more about [Journal of Big Data](http://journalofbigdata.springeropen.com)","snPcode":"40537","submissionUrl":"https://submission.nature.com/new-submission/40537/3","title":"Journal of Big Data","twitterHandle":"@SpringerOpen","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"em","reportingPortfolio":"BMC/SO AJ","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Oil palm segmentation, GAN, Object detection, Object segmentation, Data augmentation, Detection transformer, Detectron, Phenotyping","lastPublishedDoi":"10.21203/rs.3.rs-3833628/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-3833628/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eIn digital agriculture, a central challenge in automating drone applications in the plantation sector, including oil palm, is the development of a detection model that can adapt across diverse environments. This study addresses the feasibility of using GAN augmentation methods to improve palm detection models. For this purpose, drone images of young palms (\u0026lt;\u0026thinsp;5 year-old) from eight different estates were collected, annotated, and used to build a baseline detection model based on DETR. StyleGAN2 was trained on the extracted palms and then used to generate a series of synthetic palms, which were then inserted into tiles representing different environments. CycleGAN networks were trained for bidirectional translation between synthetic and real tiles, subsequently utilized to augment the authenticity of synthetic tiles. Both synthetic and real tiles were used to train the GAN-based detection model. 
The baseline model achieved precision and recall values of 95.8% and 97.2%, whereas the GAN-based model achieved precision and recall values of 98.5% and 98.6%. In the challenge dataset 1 consisting older palms (\u0026gt;\u0026thinsp;5 year-old), both models also achieved similar accuracies, with baseline model achieving precision and recall of 93.1% and 99.4%, and GAN-based model achieving 95.7% and 99.4%. As for the challenge dataset 2 consisting of storm affected palms, the baseline model achieved precision of 100% but recall was only 13%, whereas GAN-based model achieved a high precision and recall values of 98.7% and 95.3%. This result demonstrates that images generated by GANs have the potential to enhance the accuracies of palm detection models.\u003c/p\u003e","manuscriptTitle":"Enhancing Object Segmentation Model with GAN-based Augmentation using Oil Palm as a Reference","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-01-08 19:22:59","doi":"10.21203/rs.3.rs-3833628/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision 
requested","date":"2024-06-25T11:43:08+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2024-05-31T12:59:35+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2024-05-22T10:24:28+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"267077478157760265276087556985618036583","date":"2024-05-11T11:55:07+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"106082652367497616646433292981307393930","date":"2024-05-11T11:38:44+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2024-02-02T14:49:23+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2024-01-29T00:02:51+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2024-01-05T09:16:09+00:00","index":"","fulltext":""},{"type":"submitted","content":"Journal of Big Data","date":"2024-01-04T04:37:33+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"journal-of-big-data","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"bigd","sideBox":"Learn more about [Journal of Big Data](http://journalofbigdata.springeropen.com)","snPcode":"40537","submissionUrl":"https://submission.nature.com/new-submission/40537/3","title":"Journal of Big Data","twitterHandle":"@SpringerOpen","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"em","reportingPortfolio":"BMC/SO AJ","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"ba7a9196-23a9-45d2-9a83-7fa2b09ec273","owner":[],"postedDate":"January 8th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[],"tags":[],"updatedAt":"2024-09-09T16:11:36+00:00","versionOfRecord":{"articleIdentity":"rs-3833628","link":"https://doi.org/10.1186/s40537-024-00990-x","journal":{"identity":"journal-of-big-data","isVorOnly":false,"title":"Journal of Big Data"},"publishedOn":"2024-09-08 15:57:13","publishedOnDateReadable":"September 8th, 2024"},"versionCreatedAt":"2024-01-08 19:22:59","video":"","vorDoi":"10.1186/s40537-024-00990-x","vorDoiUrl":"https://doi.org/10.1186/s40537-024-00990-x","workflowStages":[]},"version":"v1","identity":"rs-3833628","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-3833628","identity":"rs-3833628","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}