Unsupervised Domain Adaptation for Cross-domain Remote Sensing Object Detection Via Joint Input and Feature Space

Deliang Chen, Taotao Cheng, Siyu Hong, Wen Gao, Zixuan Lu, Liu Yang, and 2 more

Research Square preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-6642304/v1 This work is licensed under a CC BY 4.0 License. Status: Posted, Version 1.

Abstract

The rapid advancement of deep learning has led to significant achievements in remote sensing object detection. However, domain shift often causes notable performance drops when models trained on one domain are applied to real-world scenarios. Unsupervised domain adaptation (UDA) offers a solution by narrowing domain gaps. Generative adversarial networks (GANs) are commonly used for this purpose, but they can degrade key textures and details in source images.
To address this, we propose a method that integrates transformations in both input and feature spaces. First, we standardize image dimensions across source and target domains. Then, a Joint Color Space Transformation (JCST) module operates in the feature space to decouple and recombine color channels, preserving crucial image details while aligning data distributions. We validated our approach on a dataset containing large-, medium-, and small-scale objects, using multiple object detection models. Results show that our method boosts average detection accuracy by 2–4% on source domain images, demonstrating improved generalization and robustness in cross-domain tasks.

Subject areas: Physical sciences/Optics and photonics/Optical techniques/Imaging and sensing; Physical sciences/Mathematics and computing/Computer science

Keywords: Unsupervised domain adaptation; Joint Color Space Transformation (JCST); High-resolution remote sensing images; Cross-domain object detection; Multi-model federation; Multi-scale object

1. Introduction

Progress in earth observation technology has made high-resolution remote sensing imagery more accessible, easier to obtain, and cheaper [1–3]. We have entered the era of big data in remote sensing, in which automated image interpretation is crucial for realizing its full potential [4, 5]. In recent years, deep learning approaches have brought substantial breakthroughs in remote sensing object detection [6, 7]. When suitable labeled image examples are available, deep learning object detection models can be trained end to end, which gives them distinct advantages over conventional methods that rely on hand-crafted features.
Nonetheless, these deep learning methods rely heavily on strong supervision (i.e., large, well-labeled datasets) and require similar data distributions between training and testing sets. In contrast to natural images, remote sensing images frequently differ considerably across datasets because of changing imaging conditions, including time, location, and sensor specifications. Consequently, applying a model trained on source-domain images (i.e., the training set) directly to target-domain images (i.e., the test set) frequently results in a significant decline in performance [8]. There is therefore an urgent need to reduce the discrepancy between source-domain and target-domain datasets.

To lessen the impact of domain bias, unsupervised domain adaptation models aim to align the feature distributions of the source and target domains [9, 10]. Approaches based on Maximum Mean Discrepancy (MMD) project the features of the two domains into a high-dimensional space and then reduce the difference between their distributions, which facilitates feature transfer [11]. Adversarial learning-based methods, on the other hand, use adversarial training to make the feature distributions of the source and target domains more similar [12]. Although these algorithms have shown notable success in cross-domain object detection for natural images, they encounter difficulties on high-resolution satellite imagery. Remote sensing images differ markedly from natural images owing to differences in capture angle, temporal conditions, and spatial resolution, and they display considerable intra- and inter-class variability. This variability sometimes causes the target-domain images generated by adversarial learning to lose important texture and feature information, so that they differ from the source images.
These differences create additional challenges for detection tasks. Prior research has primarily focused on cross-domain recognition of small targets, whereas large targets such as airports and runways in remote sensing imagery have received scant attention. As the target size grows, the bounding box encompasses more background information, so a significant domain disparity between source and target leads to lower detection accuracy for larger targets.

This paper investigates the issues related to cross-domain object detection in remote sensing. To mitigate the performance deterioration caused by mismatched distributions across domains, we propose a joint domain adaptation strategy designed for cross-domain detection. To prevent the loss of detail and texture when translating images from the source to the target domain, we introduce the JCST module for effective image information transfer. Moreover, widely used remote sensing datasets, including DOTA [13], DIOR [7], and LEVIR [14], combine varied data sources with similar target scales, while datasets that provide consistent data sources and diverse target scales are uncommon. To fill this gap, we created a new dataset from uniform data sources with varied target scales to better assess the efficacy of our method. Extensive experiments validate the efficacy of our method across various detection models.

The primary contributions of this paper include:

(1) We present the first application of the JCST module to remote sensing object detection with unsupervised domain adaptation. The JCST module is incorporated into a generative adversarial network to facilitate image translation between domains, thereby improving detection performance.

(2) We present the DSOD dataset, which combines uniform data sources with multi-scale targets (large, medium, small) to evaluate the robustness of the framework. The dataset is available at https://github.com/ChiYuIII/DSOD .
The remainder of the paper is structured as follows: Section 2 reviews related work, Section 3 outlines the proposed method, Section 4 provides experimental results and discussion, and Section 5 concludes the paper.

2. Related work

2.1. Unsupervised deep domain adaptation

The goal of unsupervised domain adaptation is to resolve the distributional inconsistencies between source-domain and target-domain data, a problem common to many datasets. The main aim is to exploit the features shared across these domains and apply them to tasks in the target domain [8]. Earlier approaches in this field extracted domain-invariant features using the Maximum Mean Discrepancy (MMD) technique. With the growing integration of deep learning into domain adaptation research, unsupervised deep domain adaptation has become a prominent methodology. The methods can be classified into four categories.

One significant family is domain adaptation using Generative Adversarial Networks (GANs), which aim to align the source and target domains through adversarial training of a generator and a discriminator. Notable techniques, including CycleGAN [15], StarGAN [16], and GatedGAN [17], enable image style transfer between the source and target domains. Methods based on adversarial networks integrate adversarial strategies into the domain adaptation problem, enabling the trained network to differentiate between categories while also capturing domain-invariant features. Class-level adversarial networks, as introduced by [18], maintain semantic consistency locally. [19] enhances adversarial learning by incorporating both strong and weak data augmentation, as well as mutual learning within a teacher-student framework, thereby enabling the student model to acquire domain-specific characteristics through self-learning.
The reconstruction-based approach uses an autoencoder to reduce information loss when reconstructing the input data, thereby preserving the essential features of the original data. [20] implemented a domain separation network that isolates domain-specific information in the source and target domains, improving the extraction and refinement of shared features. [21] proposed a collaborative learning technique to find shared characteristics between synthetic and real images in the target domain at the pixel, region, and image levels. Discrepancy-based methods reduce the difference between domain distributions in order to lower the generalization error in the target domain. [22] implemented an optimal multiple kernel selection strategy for mean embedding alignment, which significantly reduced domain discrepancies and facilitated the effective adaptation of deep convolutional neural networks in domain transfer contexts. These approaches have notably advanced research in domain adaptation.

2.2. Cross Domain Object Detection

Unsupervised domain adaptation techniques for object detection have advanced significantly in recent years. Cross-domain object detection techniques based on unsupervised domain adaptation fall into three basic types: discrepancy-based, adversarial-based, and reconstruction-based. [23] developed two adaptation modules based on H-divergence theory to reduce the discrepancy between training and test data at the instance and image levels. [24] used the CycleGAN model to improve detection performance by transforming datasets from daytime to nighttime and training the detection model on the generated images. [25] proposed an unsupervised domain adaptation technique that combines weak global alignment with strong local feature alignment.
In a similar vein, [26] implemented consistency regularization within the Mean Teacher framework to mitigate domain bias during the transfer from synthetic to real images. [27] developed an automatic annotation framework that enhances the robustness of pedestrian detection via a dual-stream region proposal network. [28] generated high-quality synthetic images from 3D CAD models to mitigate data scarcity. [29] proposed an unsupervised image generation method that reduces the domain gap between thermal and visible imagery. [30] addressed the cross-domain recognition problem by employing a customized backbone network to extract local and global information, enabling the recognition of small and densely clustered targets. While these methods have improved object detection in natural images, cross-domain remote sensing object detection continues to pose challenges due to the complex structure of remote sensing imagery.

2.3. Remote sensing cross domain object detection

Although related fields have progressed significantly in recent years, cross-domain remote sensing object detection is still understudied and its adaptation potential remains largely untapped. [31] presented a change detection method based on deep transfer learning that pre-trains and fine-tunes source and target domain models to address poor adaptation. [32] proposed a progressive transfer-based unsupervised SAR ship detection system that enables precise detection in unlabeled SAR images by aligning the source and target domains at the pixel, feature, and prediction levels. [33] introduced a semi-synthetic data generator for producing remotely sensed data and developed domain-adapted detectors to address cross-domain recognition challenges in remote sensing images.
Additionally, [34] proposed a domain-adaptive multi-source change network to address the shortcomings of adversarial domain adaptation methods, particularly their neglect of instance features. The network of [35] includes a data augmentation module (SDA), a noise-reducing and feature-enhancing module (SFR), and a PLG module that uses source domain knowledge to label unlabeled target domains, thereby mitigating domain bias in remote sensing object detection. These studies offer significant theoretical and methodological foundations for the advancement of cross-domain remote sensing object detection.

3. Methodology

The overall framework of this study is depicted in Fig. 1. The first step performs unsupervised domain adaptation, using the JCST_GAN model to translate images from the source domain to the target domain. Next, the detection models are trained on a multi-scale target detection dataset. In the final phase, multiple detection models are applied to the domain-adapted images to yield the final detection results.

3.1. Multi-scale cross domain remote sensing object detection dataset

Current publicly available datasets vary significantly in their data sources, while their target sizes tend to be consistent. To assess the precision and robustness of our method, we built a remote sensing target detection dataset from consistent data sources with varying target sizes. The dataset comprises 14,640 Google tiles, each 1024 × 1024 pixels. Although the image dimensions are constant, the spatial resolutions vary, resulting in significant geographic and scene diversity. The dataset was partitioned into training, validation, and test sets in a 6:2:2 ratio so that each subset is balanced and representative of the target categories and distributions. The dataset comprises four target categories: storage tanks, airplanes, wind turbines, and airstrips.
The targets were chosen for their significant size variation in remote sensing images. Storage tanks and airplanes are categorized as small targets, wind turbines as medium-sized targets, and airstrips as large-scale targets. This variety allows the dataset to represent complex real-world conditions more accurately and supports a thorough evaluation of the detection model's capabilities. Figure 2 displays sample images for each target type, while Table 1 lists the quantity, spatial distribution, and associated Google tile levels of the dataset's targets. This configuration enables the assessment of model performance across various target sizes and environmental conditions.

Table 1 Data labeling details.

| Label | Train | Validate | Test | Label boxes | Total | Google level | Scale |
|---|---|---|---|---|---|---|---|
| Storage | 1415 | 470 | 472 | 36468 | 2357 | 18 | Small |
| Airplane | 1707 | 569 | 569 | 33379 | 2845 | 17 | Small |
| Wind turbine | 4449 | 1483 | 1483 | 29892 | 7415 | 17 | Medium |
| Runway | 1214 | 404 | 405 | 2806 | 2023 | 16 | Large |
| Total | 8785 | 2926 | 2929 | - | 14640 | - | - |

3.2. Unsupervised Domain Adaptation Based on Generative Adversarial Networks

By finding invariant semantic characteristics, unsupervised techniques can efficiently translate images from the source domain to the target domain, reducing domain differences. Unsupervised domain adaptation techniques, which mostly rely on Generative Adversarial Networks (GANs), are often divided into two types: those that require paired images and those that do not. The former use strictly paired images for supervised learning, as exemplified by conditional GANs, which were initially developed for image-to-image translation with paired data. This approach is ill-suited to cross-domain remote sensing object detection. The latter do not depend on paired images, which makes them more suitable for cross-domain remote sensing object detection applications.
CycleGAN removes the dependence on image pairs by employing a cycle-consistency loss together with adversarial training between the generator and discriminator, generating high-quality images that accurately reflect the target domain. Nonetheless, the varied acquisition conditions of remote sensing images, including imaging mode, object scale, color saturation, and shooting angle, can lead to substantial domain-specific discrepancies. These interrelated factors produce significant stylistic variation across remote sensing images from different domains. Consequently, style transfer may lose detail and texture information, which in turn can reduce target detection accuracy. To resolve this issue, we propose an improved CycleGAN model that maintains detail and texture during style transfer. Furthermore, we quantitatively compare our enhanced CycleGAN with other prominent unsupervised style transfer techniques, and the experimental findings show that our method preserves visual detail and texture better.

3.3. Unsupervised Domain Adaptation Based on Generative Adversarial Networks

3.3.1. Input space alignment

Because the model uses a fixed input size, images of differing spatial resolution exhibit discrepancies in context and layout, which complicates adversarial learning. To resolve this, we use uniform scaling and fixed-window cropping, as shown in Fig. 3, to match the resolution of the source and target domains. For low-resolution images in the target domain, we attain contextual consistency by scaling them to the resolution of the source domain. We then crop the scaled images with a fixed-size window to match the dimensions of the source domain images, ensuring layout consistency [36, 37].
When the target domain image has the greater resolution, applying fixed-window cropping to the source domain images, in conjunction with scaling the target images, yields a comparable outcome.

$$\begin{cases} \varphi\left(\delta\left(x_T, r\right), S\right) & \text{if } R_T < R_S \\ \delta\left(x_T, r\right),\ \varphi\left(x_S, S_{x_T \cdot r}\right) & \text{if } R_T > R_S \end{cases} \tag{1}$$

where \(\delta\) and \(\varphi\) denote uniform scaling and fixed-window cropping, \(x_T\) and \(x_S\) denote the target and source images, respectively, \(R_S\) and \(R_T\) are the resolutions of the source and target images, respectively, \(S\) is the input size, \(r\) is the resizing factor, i.e., the ratio between \(R_T\) and \(R_S\), and \(S_{x_T \cdot r}\) is the size of \(x_T\) after resizing.

3.3.2. JCST Module

Remote sensing object detection typically uses high-resolution images that contain abundant detail; however, CycleGAN is primarily designed for style transfer on low- and medium-resolution images. The depth and complexity of the current network are insufficient to capture all the detail in high-resolution images, and the generator cannot accurately capture and reproduce intricate textures and edge details when producing high-resolution images. As a result, much detail and texture information is lost in the transferred image.
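As a rough illustration of the input-space alignment in Eq. (1), the sketch below implements uniform scaling \(\delta\) and fixed-window cropping \(\varphi\) on toy single-channel images stored as nested lists. Several choices here are assumptions rather than details from the paper: resolution is treated as pixels per unit ground distance, the resizing factor is taken as `res_s / res_t`, scaling uses nearest-neighbour sampling, the crop window is anchored at the top-left corner, and the equal-resolution case (not covered by Eq. (1)) passes the images through unchanged. The function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]  # toy single-channel image: rows of pixel values


def uniform_scale(img: Image, r: float) -> Image:
    """delta(x, r): nearest-neighbour resize of both axes by the factor r."""
    h, w = len(img), len(img[0])
    new_h, new_w = max(1, round(h * r)), max(1, round(w * r))
    return [
        [img[min(h - 1, int(i / r))][min(w - 1, int(j / r))] for j in range(new_w)]
        for i in range(new_h)
    ]


def fixed_window_crop(img: Image, size: Tuple[int, int]) -> Image:
    """phi(x, S): keep a fixed (height, width) window (top-left is an assumption)."""
    height, width = size
    return [row[:width] for row in img[:height]]


def align_inputs(
    x_s: Image, x_t: Image, res_s: float, res_t: float, input_size: Tuple[int, int]
) -> Tuple[Image, Image]:
    """Return (aligned source, aligned target) following the two cases of Eq. (1)."""
    r = res_s / res_t  # assumed direction of the ratio between R_T and R_S
    x_t_scaled = uniform_scale(x_t, r)
    if res_t < res_s:
        # Target is coarser: upscale it, then crop to the model input size S.
        return x_s, fixed_window_crop(x_t_scaled, input_size)
    if res_t > res_s:
        # Target is finer: downscale it, then crop the source to the scaled size.
        scaled_size = (len(x_t_scaled), len(x_t_scaled[0]))
        return fixed_window_crop(x_s, scaled_size), x_t_scaled
    return x_s, x_t  # equal resolutions: nothing to align (assumption)
```

In practice an interpolating resize and random or sliding crop windows would likely replace the nearest-neighbour and top-left choices made here for brevity.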
To address these issues and preserve the detail and texture information of the source image during style transfer with CycleGAN, we developed a Joint Color Space Transformation (JCST) module, as illustrated in Fig. 4. The CycleGAN network first uses style transfer to convert the source domain image into a target domain image. The transferred image is then converted from the RGB color space to the YCbCr color space, which separates the luminance information (Y channel) from the chrominance information (Cb and Cr channels). The Y, Cb, and Cr channels of the transferred image are isolated, and the Y channel from the original source domain image is merged with the Cb and Cr channels from the transferred image. In this way the luminance of the source domain image preserves the structure and details of the original image, while the chrominance of the transferred image supplies the color style of the target domain. The resulting image keeps the details and structure of the source domain while taking on the stylistic attributes of the target domain, producing a visually coherent, high-quality target domain image.

3.4. Experimental design

3.4.1. Task setup

In this research, we conducted cross-domain remote sensing object detection experiments using the constructed multi-scale remote sensing object detection dataset (DSOD). Specifically, we take the DSOD dataset as the target domain data and select a Gaofen-2 image of a region as the source domain data. The Gaofen-2 image undergoes preprocessing to provide RGB three-channel data, which is subsequently cropped to match the dimensions of the DSOD dataset.
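The channel recombination described for the JCST module in Section 3.3.2 can be sketched as follows: take Y from the source image and Cb, Cr from the style-transferred image, then convert back to RGB. This is a minimal, dependency-free illustration rather than the authors' implementation. The paper does not state which YCbCr variant it uses, so the full-range ITU-R BT.601 (JFIF) coefficients below are an assumption, and the function names are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each in 0..255
RGBImage = List[List[Pixel]]   # rows of pixels


def rgb_to_ycbcr(p: Pixel) -> Tuple[float, float, float]:
    """Full-range BT.601 (JFIF) RGB -> YCbCr; the variant is an assumption."""
    r, g, b = p
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def ycbcr_to_rgb(y: float, cb: float, cr: float) -> Pixel:
    """Inverse transform, rounded and clipped to the valid 8-bit range."""
    r = y + 1.402 * (cr - 128.0)
    g = y - 0.344136 * (cb - 128.0) - 0.714136 * (cr - 128.0)
    b = y + 1.772 * (cb - 128.0)
    red, green, blue = (min(255, max(0, round(v))) for v in (r, g, b))
    return red, green, blue


def jcst(source: RGBImage, translated: RGBImage) -> RGBImage:
    """Combine source luminance (Y) with translated chrominance (Cb, Cr)."""
    if len(source) != len(translated) or any(
        len(a) != len(b) for a, b in zip(source, translated)
    ):
        raise ValueError("source and translated images must have the same shape")
    out: RGBImage = []
    for src_row, tr_row in zip(source, translated):
        row = []
        for src_px, tr_px in zip(src_row, tr_row):
            y, _, _ = rgb_to_ycbcr(src_px)    # structure and detail from the source
            _, cb, cr = rgb_to_ycbcr(tr_px)   # color style from the translated image
            row.append(ycbcr_to_rgb(y, cb, cr))
        out.append(row)
    return out
```

Because only the chrominance comes from the generator output, any high-frequency detail the generator loses in the luminance channel does not reach the final image, which is the motivation given in the text.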
We conducted three sets of object detection experiments using a multi-model detection strategy: 1. remote sensing object detection directly on images in the source domain; 2. remote sensing object detection on images after style transfer via CycleGAN; 3. remote sensing object detection on images after style transfer with our proposed technique. All experiments in this study are implemented in PyTorch. Model training is accelerated with an NVIDIA RTX 4080, which possesses 24GB of graphics RAM.

3.4.2. Evaluation of the indexes

We use the Peak Signal to Noise Ratio (PSNR) and the Structural Similarity Index (SSIM) to evaluate the quality of the images produced by the proposed method.

$$\text{PSNR} = 10 \cdot \log_{10}\left(\frac{\text{MAX}^2}{\text{MSE}}\right) \tag{2}$$

$$\text{SSIM}(x, y) = \frac{\left(2\mu_x\mu_y + C_1\right)\left(2\sigma_{xy} + C_2\right)}{\left(\mu_x^2 + \mu_y^2 + C_1\right)\left(\sigma_x^2 + \sigma_y^2 + C_2\right)} \tag{3}$$

where \(\text{MAX}\) is the maximum possible pixel value of the image, \(\text{MSE}\) is the mean square error, \(\mu_x\) and \(\mu_y\) are the mean values of images \(x\) and \(y\), \(\sigma_x^2\) and \(\sigma_y^2\) are the variances of images \(x\) and \(y\), \(\sigma_{xy}\) is the covariance of images \(x\) and \(y\), and \(C_1\) and \(C_2\) are constants used to stabilize the calculation.
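To make Eqs. (2) and (3) concrete, the following sketch computes PSNR and a single-window (global) SSIM over flattened pixel sequences. It is illustrative only: the paper does not give the values of \(C_1\) and \(C_2\) or say whether SSIM is averaged over local windows, so the commonly used constants \(C_1 = (0.01 \cdot \text{MAX})^2\) and \(C_2 = (0.03 \cdot \text{MAX})^2\) and the global formulation are assumptions, as are the function names.

```python
import math
from typing import Sequence


def psnr(x: Sequence[float], y: Sequence[float], max_value: float = 255.0) -> float:
    """Eq. (2): 10 * log10(MAX^2 / MSE); infinite for identical images."""
    mse = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


def _cov(x: Sequence[float], y: Sequence[float], mu_x: float, mu_y: float) -> float:
    """Population covariance; with x == y this is the variance."""
    return sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / len(x)


def global_ssim(x: Sequence[float], y: Sequence[float], max_value: float = 255.0) -> float:
    """Eq. (3) evaluated once over the whole image (no sliding window)."""
    c1 = (0.01 * max_value) ** 2  # assumed stabilizing constants
    c2 = (0.03 * max_value) ** 2
    mu_x = sum(x) / len(x)
    mu_y = sum(y) / len(y)
    var_x = _cov(x, x, mu_x, mu_x)
    var_y = _cov(y, y, mu_y, mu_y)
    cov_xy = _cov(x, y, mu_x, mu_y)
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Library implementations usually average SSIM over local windows, so their scores can differ from this global variant on the same image pair.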
We assessed the detection model's performance using four standard evaluation metrics derived from the confusion matrix: precision, recall, F1 score, and mAP. Precision measures the proportion of predicted positive instances that are correct, recall measures the proportion of actual positive instances that the model identifies, and the F1 score is the harmonic mean of precision and recall, offering a balanced evaluation of the model's performance. mAP, in turn, indicates the overall performance of the model across all categories.

$$\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}} \tag{4}$$

$$\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}} \tag{5}$$

$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{6}$$

$$\text{mAP} = \frac{1}{n}\sum_{i=1}^{n}\text{AP}_i \tag{7}$$

where \(\text{TP}\), \(\text{FP}\), and \(\text{FN}\) are the numbers of true positive, false positive, and false negative cases, respectively, \(n\) denotes the number of categories, and \(\text{AP}_i\) denotes the \(\text{AP}\) value of the \(i\)th category.

3.4.3. Multi-model detection

We propose a multi-model detection strategy that combines the high-quality images generated by the improved CycleGAN with multiple types of object detection models through an ensemble learning approach, in order to verify the accuracy and robustness of the proposed method in the detection system. First, we normalize and resize the input images so that they meet the input requirements of each target detection model and so that the models adapt better to varied settings and illumination. Subsequently, the optimized CycleGAN network performs image style transfer to generate higher-quality and more detailed target images.
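For reference, the detection metrics of Eqs. (4)–(7) earlier in Section 3.4.2 reduce to a few lines of arithmetic. The sketch below uses the standard definitions, with recall taken as TP / (TP + FN); the function names and the zero-denominator fallback of 0.0 are illustrative assumptions, and per-class AP values are taken as given rather than computed from precision-recall curves.

```python
from typing import Sequence


def precision(tp: int, fp: int) -> float:
    """Eq. (4): fraction of predicted positives that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Eq. (5): fraction of actual positives that are detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p: float, r: float) -> float:
    """Eq. (6): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def mean_average_precision(ap_per_class: Sequence[float]) -> float:
    """Eq. (7): mean of the per-category AP values."""
    return sum(ap_per_class) / len(ap_per_class)
```

As a quick consistency check, the YOLO V8 wind turbine entry in Table 4 (P = 0.88, R = 0.86) gives `f1_score(0.88, 0.86)` of about 0.87, which matches the reported F1.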
The JCST module, which is incorporated into the upgraded CycleGAN, maintains the features and edge information of the target objects while also greatly improving the clarity and realism of the output images. Next, the generated high-quality images are fed into each of the following four types of object detection models:

(1) Single-stage detection model: directly predicts the locations and classes of all targets in the image through a single forward pass [38].

(2) Models based on deformable convolution and Transformer architectures: by combining deformable convolutional networks with Transformer mechanisms, they effectively capture shape changes and contextual information of targets and are suitable for target detection and pose estimation in complex scenes [39].

(3) Two-stage model based on a region proposal network (RPN): first generates candidate target regions, which are then classified and refined by bounding box regression [40].

(4) Fully convolutional one-stage model: a fully convolutional network directly predicts the location and class of targets on feature maps at different scales [41].

These models are used to confirm that the high-quality images from the enhanced CycleGAN can be applied to multi-model detection tasks. This strategy not only brings substantial performance improvement for target detection in complex scenes, but also demonstrates the potential and practical value of the improved CycleGAN in remote sensing image processing.

4. Experimental results and Discussion

4.1. Unsupervised domain adaptation results

The aim of this research is to improve target detection through style transfer with the enhanced CycleGAN. The source domain comprises the preprocessed Gaofen-2 images, whereas the target domain consists of the images from the DSOD dataset. Table 2 presents the quantitative experimental data, and Fig. 5 illustrates the visualization results.
The CycleGAN generator employs a U-Net-like architecture, which enhances its ability to preserve global information; however, it handles high-frequency image details poorly, particularly during the downsampling and upsampling phases, leading to a loss of detail and texture information. This negatively impacts subsequent target detection. To reduce the loss of texture and detail in generated images, we integrate the JCST module into the CycleGAN network, aiming to modify only the image style while retaining more detail. Experiments show that the upgraded CycleGAN outperforms the original in structural similarity (SSIM) and peak signal-to-noise ratio (PSNR), indicating that the JCST module improves image quality. In the visualization results, our method excels at retaining high-frequency information, and the generated images closely resemble the original images, offering more dependable inputs for the target recognition model.

While our enhanced method alleviates the loss of high-frequency information to a degree, it cannot completely restore all details in very intricate texture areas. Furthermore, despite notable gains in SSIM and PSNR, inconsistencies may persist between the generated image and the target domain image in other respects (e.g., color and spatial consistency). This may influence the efficacy of target detection in various application contexts; hence, future work should investigate in greater depth how to regulate the extent of style transfer so as to maintain semantic coherence between the generated image and the target domain image.

Table 2 Comparison of the quality of images generated using CycleGAN and JCST_GAN.
| Metric | CycleGAN | JCST_GAN |
|---|---|---|
| SSIM | 0.85 | 0.99 |
| PSNR | 23.99 | 30.82 |

In addition, introducing the JCST module increases the complexity and training time of the model. The demand for computational resources rises significantly when processing high-resolution images, which may limit its use on large-scale datasets or in real-time applications. Future exploration of model pruning or parameter optimization may further reduce computing costs. Different detection tasks and datasets may also respond differently to the generated images, so future research will continue to validate the generalization of the method across a wider range of application scenarios.

CycleGAN is chosen as the base model for style transfer mainly because of its wide application and excellent performance in image translation tasks. Compared with other mainstream models, CycleGAN achieves high-quality image style transfer without paired training data, which makes it stand out on diverse datasets and in unsupervised settings. As the results in Table 3 show, CycleGAN produces images of noticeably higher quality than other popular models, which further validates its advantages in the style transfer task. In addition, the architecture of CycleGAN is highly flexible and easy to combine with other models. The JCST module introduced in this study is one example: combined seamlessly with CycleGAN, it significantly improves the detail fidelity of the generated images.

Table 3 Comparison of image quality generated by different models.

| Metric | DualGAN | DiscoGAN | CycleGAN |
|---|---|---|---|
| SSIM | 0.74 | 0.69 | 0.82 |
| PSNR | 17.63 | 14.83 | 21.36 |

4.2. Multi-model object detection results

This section compares and analyzes the performance of several object detection models, with an emphasis on the results before and after the proposed approach was applied.
Figure 6 presents the results, while Table 4 lists the detection outcomes for wind turbines, storage tanks, airplanes, and airport runways. The comparison indicates that the style-migrated images outperform the original images in the object detection task, with mAP improving by 2 to 4 percentage points. These findings show that the proposed CycleGAN enhancement not only migrates image styles effectively but also improves the performance of the object detection models, with particularly clear advantages in scenes containing many targets.

In terms of specific models, YOLOv8 and FCOS improve on every object class, with F1 scores rising by 2 to 6 points for YOLOv8 and by 1 to 3 points for FCOS. This suggests that style migration enhances the detection capability of these models, especially in complex multi-target scenarios. The RE-DETR model also improves its F1 score for wind turbines, airplanes, and airport runways, but its F1 score for storage tanks is unchanged because the gain in precision is offset by a drop in recall. In contrast, the F1 score of Faster R-CNN decreases for wind turbines, mainly because recall drops after style migration and the model misses some wind turbine targets. Its precision nevertheless improves, indicating that the style-migrated images more closely resemble the data distribution of the model's training set.

Despite its strong overall performance, the proposed approach has several drawbacks. First, style migration brings little benefit for certain targets (e.g., storage tanks), and recall decreases for some models (e.g., Faster R-CNN), suggesting that different models adapt differently to style-migrated images.
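The precision and recall trade-offs discussed here can be checked directly from the values reported in Table 4. The following is a minimal sketch, assuming the standard definition of F1 as the harmonic mean of precision and recall; the function name `f1_score` and the dictionary layout are illustrative choices, not taken from the paper.

```python
def f1_score(recall: float, precision: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall, precision) before and after style migration, transcribed from Table 4.
cases = {
    "RE-DETR, storage tank": ((0.63, 0.99), (0.62, 1.00)),
    "Faster R-CNN, wind turbine": ((0.86, 0.97), (0.72, 1.00)),
}

for name, (before, after) in cases.items():
    print(f"{name}: F1 {f1_score(*before):.2f} -> {f1_score(*after):.2f}")
```

Under this definition, the RE-DETR storage tank F1 stays at 0.77 because the small precision gain is cancelled by the small recall loss, while the Faster R-CNN wind turbine F1 falls from 0.91 to 0.84 because the recall drop outweighs the precision gain.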
Future research can tailor the optimization to individual detection models to broaden the applicability of style migration. Second, in extremely complex scenes, style migration may still fail to sufficiently improve the contrast between target and background. This implies that additional image enhancement techniques or more elaborate multi-scale feature extraction may be needed to further boost detection performance in such scenes.

Table 4. Object detection results of the proposed method (JCST_ prefix) and the baseline models. R: recall; P: precision.

| Model | Wind turbine R | Wind turbine P | Wind turbine F1 | Storage tank R | Storage tank P | Storage tank F1 | Airplane R | Airplane P | Airplane F1 | Runway R | Runway P | Runway F1 | mAP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| YOLO V8 | 0.86 | 0.88 | 0.87 | 0.57 | 1.00 | 0.72 | 0.60 | 1.00 | 0.75 | 0.46 | 0.81 | 0.59 | 0.92 |
| JCST_YOLO V8 | 0.81 | 0.97 | 0.89 | 0.62 | 1.00 | 0.77 | 0.67 | 1.00 | 0.81 | 0.54 | 0.83 | 0.65 | 0.95 |
| RE-DETR | 0.81 | 0.95 | 0.88 | 0.63 | 0.99 | 0.77 | 0.36 | 0.92 | 0.52 | 0.86 | 0.77 | 0.81 | 0.91 |
| JCST_RE-DETR | 0.86 | 1.00 | 0.93 | 0.62 | 1.00 | 0.77 | 0.44 | 0.95 | 0.60 | 0.86 | 0.80 | 0.83 | 0.94 |
| Faster R-CNN | 0.86 | 0.97 | 0.91 | 0.94 | 0.42 | 0.58 | 0.95 | 0.96 | 0.95 | 0.64 | 0.90 | 0.75 | 0.81 |
| JCST_Faster R-CNN | 0.72 | 1.00 | 0.84 | 0.96 | 0.42 | 0.59 | 0.95 | 0.98 | 0.96 | 0.64 | 1.00 | 0.78 | 0.85 |
| FCOS | 0.84 | 0.88 | 0.86 | 0.90 | 0.93 | 0.91 | 0.94 | 0.96 | 0.95 | 0.57 | 0.70 | 0.63 | 0.87 |
| JCST_FCOS | 0.84 | 0.90 | 0.87 | 0.92 | 0.95 | 0.94 | 0.95 | 0.99 | 0.97 | 0.57 | 0.73 | 0.64 | 0.89 |

4.3. Ablation experiment

We designed and carried out several comparison experiments to evaluate the efficacy of the proposed approach. First, the images to be detected were fed directly into the object detection model without any style migration. Second, the images underwent style migration using the original CycleGAN, and the resulting images were then fed into the object detection model. Finally, we used the improved CycleGAN, which includes the JCST module, to perform style migration on the images before detection. Table 5 summarizes the results of these experiments.
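To make the three-way comparison concrete, the following sketch computes the change in mAP relative to the original images for the two style-migration configurations. The mAP values are transcribed from Table 5; the names `MAP_SCORES` and `map_deltas` are hypothetical, and the differences are expressed in percentage points.

```python
# mAP per model from Table 5: (original images, CycleGAN, JCST-enhanced CycleGAN).
MAP_SCORES = {
    "YOLO V8": (0.92, 0.81, 0.95),
    "RE-DETR": (0.91, 0.79, 0.94),
    "Faster R-CNN": (0.81, 0.71, 0.85),
    "FCOS": (0.87, 0.82, 0.89),
}


def map_deltas(scores):
    """Return, per model, the mAP change in percentage points versus the original images."""
    deltas = {}
    for model, (baseline, cyclegan, jcst) in scores.items():
        deltas[model] = (
            round((cyclegan - baseline) * 100),  # plain CycleGAN vs. original
            round((jcst - baseline) * 100),      # JCST-enhanced CycleGAN vs. original
        )
    return deltas


for model, (d_cyclegan, d_jcst) in map_deltas(MAP_SCORES).items():
    print(f"{model}: CycleGAN {d_cyclegan:+d} pts, JCST {d_jcst:+d} pts")
```

Read this way, plain CycleGAN lowers mAP for every model (by 5 to 12 points), whereas the JCST-enhanced variant raises it by 2 to 4 points.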
The data in Table 5 indicate that the F1 scores and mAP values for object detection on images generated by the original CycleGAN are lower than those obtained on the original images. This can primarily be attributed to the significant loss of detail and texture information in the CycleGAN-generated images, which hampers the detection model's ability to extract key features of the targets, thereby weakening class discrimination and reducing detection performance. In contrast, our improved CycleGAN with the JCST module noticeably improves detection on the migrated images. Notably, for the Faster R-CNN model, mAP improved by 4 percentage points over the original images, and the other models gained 2 to 3 points. This suggests that the JCST module effectively mitigates the loss of image detail and texture, making the generated images better suited to the object detection models and consequently improving detection accuracy.

Table 5. Quantitative evaluation of the ablation experiments (F1 per class and overall mAP).

| Model | Wind turbine F1 | Storage tank F1 | Airplane F1 | Runway F1 | mAP |
|---|---|---|---|---|---|
| YOLO V8 | 0.87 | 0.72 | 0.75 | 0.59 | 0.92 |
| CycleGAN + YOLO V8 | 0.72 | 0.56 | 0.61 | 0.45 | 0.81 |
| JCST_YOLO V8 | 0.89 | 0.77 | 0.81 | 0.65 | 0.95 |
| RE-DETR | 0.88 | 0.77 | 0.52 | 0.81 | 0.91 |
| CycleGAN + RE-DETR | 0.74 | 0.68 | 0.46 | 0.69 | 0.79 |
| JCST_RE-DETR | 0.93 | 0.77 | 0.60 | 0.83 | 0.94 |
| Faster R-CNN | 0.91 | 0.58 | 0.95 | 0.75 | 0.81 |
| CycleGAN + Faster R-CNN | 0.81 | 0.47 | 0.84 | 0.69 | 0.71 |
| JCST_Faster R-CNN | 0.84 | 0.59 | 0.96 | 0.78 | 0.85 |
| FCOS | 0.86 | 0.91 | 0.95 | 0.63 | 0.87 |
| CycleGAN + FCOS | 0.79 | 0.83 | 0.89 | 0.55 | 0.82 |
| JCST_FCOS | 0.87 | 0.94 | 0.97 | 0.64 | 0.89 |

5. Conclusions

Owing to the rapid advancements in artificial intelligence and object detection technology, many object detection tasks now employ deep learning techniques.
However, remote sensing images come from various sources, and even images taken by the same sensor may exhibit inconsistent data distributions due to differences in acquisition times and imaging angles. This disparity between the training dataset and the images to be detected greatly affects the accuracy of the detection model. To resolve this issue, we propose a method that harmonizes the data distributions of the model's training set and the target images, and we apply it to real-world detection tasks. In particular, we incorporate a JCST module into the CycleGAN framework to preserve the details and texture of the images during style transfer, thereby improving the quality of the resulting images. The experimental results suggest that our method can improve detection accuracy (mAP) by 2–4 percentage points. Moreover, to evaluate detection across scales, we developed a multi-scale object detection dataset that accounts for differences in target size. This dataset covers targets at three scales (large, medium, and small), with all images originating from the same source. The final experimental results indicate that the proposed method improves both the performance of various detection models and the detection accuracy for targets of varying sizes. Future research will focus on integrating the domain adaptation model with the object detection model in order to improve efficiency and achieve a higher level of automation.

Declarations

Data availability
All data generated or analyzed during this study are included in this published article.

Author Contributions: Conceptualization, D.C. and T.C.; methodology, D.C., T.C. and S.H.; validation, W.G., Z.L. and L.Y.; data curation, S.H.; writing (original draft preparation), D.C.
and T.C.; writing (review and editing), S.H.; project administration, L.C.; funding acquisition, C.J. All authors have read and agreed to the published version of the manuscript.

Funding: This research was funded by the National Natural Science Foundation of China, grant number 42401548.

Competing interests
The authors declare no competing interests.

Consent to participate
All authors voluntarily agree to participate in this research study.

Consent to publish
All authors voluntarily approved the publication of this research study.

References
1. Chen, Z. et al. Joint alignment of the distribution in input and feature space for cross-domain aerial image semantic segmentation. Int. J. Appl. Earth Obs. Geoinf. 115 (2022).
2. Zhang, L., Zhang, L. & Du, B. Deep learning for remote sensing data: A technical tutorial on the state of the art. IEEE Geosci. Remote Sens. Mag. 4, 22–40 (2016).
3. Chi, M. et al. Big data for remote sensing: Challenges and opportunities. Proceedings of the IEEE 104, 2207–2219 (2016).
4. Sun, X. et al. From single- to multi-modal remote sensing imagery interpretation: A survey and taxonomy. Sci. China Inf. Sci. 66, 140301 (2023).
5. Li, Y. et al. Learning deep semantic segmentation network under multiple weakly-supervised constraints for cross-domain remote sensing image semantic segmentation. ISPRS J. Photogramm. Remote Sens. 175, 20–33 (2021).
6. Zhang, X. et al. Remote sensing object detection meets deep learning: A metareview of challenges and advances. IEEE Geosci. Remote Sens. Mag. (2023).
7. Li, K., Wan, G., Cheng, G., Meng, L. & Han, J. Object detection in optical remote sensing images: A survey and a new benchmark. ISPRS J. Photogramm. Remote Sens. 159, 296–307 (2020).
8. Oza, P., Sindagi, V. A., Sharmini, V. V. & Patel, V. M. Unsupervised domain adaptation of object detectors: A survey. IEEE Trans. Pattern Anal. Mach. Intell. (2023).
9. Shi, Y., Du, L., Li, C., Guo, Y. & Du, Y. Unsupervised domain adaptation for SAR target classification based on domain- and class-level alignment: From simulated to real data. ISPRS J. Photogramm. Remote Sens. 207, 1–13 (2024).
10. Wilson, G. & Cook, D. J. A survey of unsupervised deep domain adaptation. ACM Trans. Intell. Syst. Technol. 11, 1–46 (2020).
11. Yan, H. et al. Weighted and class-specific maximum mean discrepancy for unsupervised domain adaptation. IEEE Trans. Multimedia 22, 2420–2433 (2019).
12. Benjdira, B., Bazi, Y., Koubaa, A. & Ouni, K. Unsupervised domain adaptation using generative adversarial networks for semantic segmentation of aerial images. Remote Sens. 11, 1369 (2019).
13. Xia, G. S. et al. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3974–3983.
14. Zou, Z. & Shi, Z. Random access memories: A new paradigm for target detection in high resolution aerial remote sensing images. IEEE Trans. Image Process. 27, 1100–1111 (2017).
15. Chu, C., Zhmoginov, A. & Sandler, M. CycleGAN, a master of steganography. arXiv preprint arXiv:1712.02950 (2017).
16. Choi, Y. et al. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 8789–8797.
17. Chen, X., Xu, C., Yang, X., Song, L. & Tao, D. Gated-GAN: Adversarial gated networks for multi-collection style transfer. IEEE Trans. Image Process. 28, 546–560 (2018).
18. Luo, Y., Zheng, L., Guan, T., Yu, J. & Yang, Y. in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2507–2516.
19. Li, Y. J. et al. Cross-domain object detection via adaptive self-training. CoRR (2021).
20. Tsai, J. C. & Chien, J. T. in 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP). 1–6 (IEEE).
21. Sun, R. et al. in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4360–4369.
22. Long, M., Cao, Y., Wang, J. & Jordan, M. in International Conference on Machine Learning. 97–105 (PMLR).
23. Chen, Y., Li, W., Sakaridis, C., Dai, D. & Van Gool, L. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3339–3348.
24. Arruda, V. F. et al. in 2019 International Joint Conference on Neural Networks (IJCNN). 1–8 (IEEE).
25. Saito, K., Ushiku, Y., Harada, T. & Saenko, K. in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 6956–6965.
26. Cai, Q. et al. in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 11457–11466.
27. Cao, Y. et al. Pedestrian detection with unsupervised multispectral feature learning using deep neural networks. Inf. Fusion 46, 206–217 (2019).
28. Liu, W., Luo, B. & Liu, J. Synthetic data augmentation using multiscale attention CycleGAN for aircraft detection in remote sensing images. IEEE Geosci. Remote Sens. Lett. 19, 1–5 (2021).
29. Liu, P., Li, F., Yuan, S. & Li, W. Unsupervised image-generation enhanced adaptation for object detection in thermal images. Mobile Information Systems 1837894 (2021).
30. Biswas, D. & Tešić, J. Domain adaptation with contrastive learning for object detection in satellite imagery. IEEE Trans. Geosci. Remote Sens. (2024).
31. Yang, M., Jiao, L., Liu, F., Hou, B. & Yang, S. Transferred deep learning-based change detection in remote sensing images. IEEE Trans. Geosci. Remote Sens. 57, 6960–6973 (2019).
32. Shi, Y., Du, L., Guo, Y. & Du, Y. Unsupervised domain adaptation based on progressive transfer for ship detection: From optical to SAR images. IEEE Trans. Geosci. Remote Sens. 60, 1–17 (2022).
33. Xu, T. et al. FADA: Feature aligned domain adaptive object detection in remote sensing imagery. IEEE Trans. Geosci. Remote Sens. 60, 1–16 (2022).
34. Zhang, C. et al. A domain adaptation neural network for change detection with heterogeneous optical and SAR remote sensing images. Int. J. Appl. Earth Obs. Geoinf. 109, 102769 (2022).
35. Zhu, Y., Sun, X., Diao, W., Li, H. & Fu, K. RFA-Net: Reconstructed feature alignment network for domain adaptation object detection in remote sensing imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 15, 5689–5703 (2022).
36. Hashemzadeh, M., Asheghi, B. & Farajzadeh, N. Content-aware image resizing: an improved and shadow-preserving seam carving method. Signal Process. 155, 233–246 (2019).
37. Asheghi, B., Salehpour, P., Khiavi, A. M. & Hashemzadeh, M. A comprehensive review on content-aware image retargeting: From classical to state-of-the-art methods. Signal Process. 195, 108496 (2022).
38. Hussain, M. YOLO-v1 to YOLO-v8, the rise of YOLO and its complementary nature toward digital manufacturing and industrial defect detection. Machines 11, 677 (2023).
39. Zhao, Z. et al. RT-DETR-Tomato: Tomato target detection algorithm based on improved RT-DETR for agricultural safety production. Appl. Sci. 14, 6287 (2024).
40. Ren, S., He, K., Girshick, R. & Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39, 1137–1149 (2016).
41. Tian, Z., Shen, C., Chen, H. & He, T. FCOS: A simple and strong anchor-free object detector. IEEE Trans. Pattern Anal. Mach. Intell. 44, 1922–1933 (2020).
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-6642304","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":463079664,"identity":"4139dda6-d254-4cec-b026-c994804801b5","order_by":0,"name":"Deliang Chen","email":"","orcid":"","institution":"Nanjing University of Posts and Telecommunications","correspondingAuthor":false,"prefix":"","firstName":"Deliang","middleName":"","lastName":"Chen","suffix":""},{"id":463079665,"identity":"cadd0150-c95e-41d3-91d3-57257c251106","order_by":1,"name":"Taotao Cheng","email":"","orcid":"","institution":"Nanjing University of Posts and Telecommunications","correspondingAuthor":false,"prefix":"","firstName":"Taotao","middleName":"","lastName":"Cheng","suffix":""},{"id":463079666,"identity":"e9b0d3f5-4b75-4f73-bdcf-c15dc787344e","order_by":2,"name":"Siyu Hong","email":"","orcid":"","institution":"Nanjing University of Posts and Telecommunications","correspondingAuthor":false,"prefix":"","firstName":"Siyu","middleName":"","lastName":"Hong","suffix":""},{"id":463079667,"identity":"5b0e380b-1ae8-4d38-aacf-28e701b2ac1a","order_by":3,"name":"Wen Gao","email":"","orcid":"","institution":"Nanjing University of Posts and Telecommunications","correspondingAuthor":false,"prefix":"","firstName":"Wen","middleName":"","lastName":"Gao","suffix":""},{"id":463079668,"identity":"716c3a20-9547-4fde-8f1c-7938c2a26ac4","order_by":4,"name":"Zixuan Lu","email":"","orcid":"","institution":"Nanjing University of Posts and 
Telecommunications","correspondingAuthor":false,"prefix":"","firstName":"Zixuan","middleName":"","lastName":"Lu","suffix":""},{"id":463079669,"identity":"0384f055-947e-45ca-b19e-646819bbbd79","order_by":5,"name":"Liu Yang","email":"","orcid":"","institution":"Nanjing University of Posts and Telecommunications","correspondingAuthor":false,"prefix":"","firstName":"Liu","middleName":"","lastName":"Yang","suffix":""},{"id":463079670,"identity":"12065da7-1b55-4048-b746-92292e096018","order_by":6,"name":"Chen Ji","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAAxElEQVRIiWNgGAWjYDACZgYGA4YKEMXAIEGCljMkaQEBxjYITZwWeXf2BwU/591hNzjAfPA2D4NdHkEthod5DAx7tz1jNjjAlmzNw5BcTFhLMw+DMeO2w0AtPGbSPAwHEhsIa2F/YMw4B6SF/xtxWuSZGQyMGRvAtrARp8WAGeiXnmOHmSUPsxlbzjFIJsKW/uPPDH7UHE7mO9788MabCjsibDnAwGYApJMhkWlASD3IlgYG5gdA2o4ItaNgFIyCUTBSAQAu0TU6j6K/nQAAAABJRU5ErkJggg==","orcid":"","institution":"Nanjing University","correspondingAuthor":true,"prefix":"","firstName":"Chen","middleName":"","lastName":"Ji","suffix":""},{"id":463079671,"identity":"e689ee99-0b42-4300-ba56-11f0546adfa9","order_by":7,"name":"Liang Cheng","email":"","orcid":"","institution":"Nanjing University","correspondingAuthor":false,"prefix":"","firstName":"Liang","middleName":"","lastName":"Cheng","suffix":""}],"badges":[],"createdAt":"2025-05-12 02:53:24","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6642304/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6642304/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":83646049,"identity":"148e106f-e186-4e9a-89a7-2e15f9bf3d71","added_by":"auto","created_at":"2025-05-30 05:29:44","extension":"jpg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":74152,"visible":true,"origin":"","legend":"\u003cp\u003eOverall framework 
diagram.\u003c/p\u003e","description":"","filename":"Picture1.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6642304/v1/8376eed2ed923ad6db23fc55.jpg"},{"id":83646048,"identity":"de077f42-9801-4696-8834-888246ea66f7","added_by":"auto","created_at":"2025-05-30 05:29:44","extension":"jpg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":111793,"visible":true,"origin":"","legend":"\u003cp\u003eSample labeling schematic.\u003c/p\u003e","description":"","filename":"Picture2.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6642304/v1/f80f8f8845576d4cdfe21941.jpg"},{"id":83646387,"identity":"ec858808-6a69-4546-b583-d1174d1b2313","added_by":"auto","created_at":"2025-05-30 05:37:45","extension":"jpg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":24780,"visible":true,"origin":"","legend":"\u003cp\u003eUniform image resolution.\u003c/p\u003e","description":"","filename":"Picture3.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6642304/v1/7c5ccd3a25cb654694adf418.jpg"},{"id":83646057,"identity":"1b477e91-a538-4380-beb6-842c3218fada","added_by":"auto","created_at":"2025-05-30 05:29:45","extension":"jpg","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":40889,"visible":true,"origin":"","legend":"\u003cp\u003eJCST module working diagram.\u003c/p\u003e","description":"","filename":"Picture4.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6642304/v1/9c80148e76cd8aebd91f54bf.jpg"},{"id":83646389,"identity":"5e6c403e-29d4-4406-bde4-fab371b57136","added_by":"auto","created_at":"2025-05-30 05:37:45","extension":"jpg","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":85633,"visible":true,"origin":"","legend":"\u003cp\u003eComparison of image visualization before and after domain migration (a) represents the original image (b) represents the image after migration using CycleGAN (c) represents the image 
after migration using the proposed method.\u003c/p\u003e","description":"","filename":"Picture5.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6642304/v1/ca736263ec8539492af2fddd.jpg"},{"id":83646056,"identity":"af358718-d0ac-45ca-aa25-769ffd679303","added_by":"auto","created_at":"2025-05-30 05:29:45","extension":"jpg","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":157119,"visible":true,"origin":"","legend":"\u003cp\u003eDetection results of different models before and after utilizing the proposed method.\u003c/p\u003e","description":"","filename":"Picture6.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6642304/v1/a85d180b0d32c443dfa68eea.jpg"},{"id":84759049,"identity":"8cf87e1c-8ee0-4073-ae44-31bc05fa4f59","added_by":"auto","created_at":"2025-06-17 05:32:55","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1573494,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6642304/v1/caf5d0cb-d6ab-49c5-8863-7bfe0b8641dd.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Unsupervised Domain Adaptation for Cross-domain Remote Sensing Object Detection Via Joint Input and Feature Space","fulltext":[{"header":"1. Introduction","content":"\u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eProgress in earth observation technology has led to greater accessibility of high-resolution remote sensing imagery, now obtainable more easily and at a lower cost [\u003cspan additionalcitationids=\"CR2\" citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. 
We have now entered the era of big data in remote sensing, when automated picture interpretation is crucial for realizing its full potential [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e, \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eIn recent years, there have been substantial breakthroughs in the application of deep learning approaches for the detection of objects using remote sensing [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e, \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]. When suitable labeled image examples are available, it is possible to train object detection models based on deep learning in an integrated fashion, which has distinct advantages over conventional methods that rely on manually produced characteristics. Nonetheless, these deep learning methodologies are significantly reliant on substantial supervision (i.e., big, well labeled datasets) and necessitate analogous data distributions between training and testing sets. In contrast to natural photographs, remote sensing photos frequently display considerable diversity among several datasets due to changing imaging conditions, including time, location, and sensor specifications. Consequently, utilizing a model trained on source-domain images (i.e., the training set) directly on target-domain images (i.e., the test set) frequently results in a significant decline in performance [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. 
Therefore, it is an immediate need to fix the disparity between the datasets for the source and target domains.\u003c/p\u003e \u003cp\u003eIn order to lessen the impact of domain bias, unsupervised domain adaptation models aim to bring the distributions of features from the source and target domains into agreement [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e, \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. By projecting the characteristics of the two domains into a large-scale space and then reducing the difference between their distributions, approaches that use the maximum divergence of averages facilitate the transfer of characteristics [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. On the other hand, adversarial learning-based methods use adversarial training to help the source and target domains' feature distributions become more comparable [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e]. These algorithms encounter difficulties when used on high-resolution imagery from satellites, even if they have shown notable success in cross-domain object detection for natural images. Remote sensing images markedly differ from natural images owing to discrepancies in capture angles, temporal conditions, spatial resolutions, and display considerable intra- and inter-class variability. This variability sometimes causes the target-domain images generated by adversarial learning to lose important texture and feature information, resulting in differences from the source images. These differences create additional challenges for detecting tasks.\u003c/p\u003e \u003cp\u003ePrior research has primarily focused on cross-domain recognition of tiny targets, but large targets such as airports and runways in remote sensing imagery have received scant attention. As the target size enlarges, the detection frame encompasses additional background information. 
A significant domain disparity between the source and target leads to diminished detection accuracy for larger targets.\u003c/p\u003e \u003cp\u003eThis paper investigates the issues related to cross-domain object detection in remote sensing. To mitigate performance deterioration caused by uneven distributions across domains, we propose a joint domain adaptation strategy designed for cross-domain detection. To prevent the loss of detail and texture when transferring images from the source to the destination domain, we provide the JCST module for effective image information transfer. Moreover, extensively utilized remote sensing datasets, including DOTA[\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e], DIOR [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e], and LEVIR [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e], encompass varied data sources and comparable target scales. Nevertheless, datasets that provide consistent data sources and diverse target scales are uncommon. To rectify this deficiency, we created a novel dataset from uniform data sources while incorporating varied goal scales to enhance the assessment of our method's efficacy. A multitude of studies validates the efficacy of our methodology across various detection models.\u003c/p\u003e \u003cp\u003eThe primary contributions of this paper include:\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003col\u003e \u003cli\u003e \u003cp\u003e We present the first use case of the JCST module, which is for object detection in remote sensing using unsupervised domain adaptation. 
The JCST is incorporated into a generative adversarial network to facilitate picture transfer between domains, hence improving detection efficacy.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e We present the DSOD dataset, which includes uniform data sources and multi-scale targets (large, medium, small) to evaluate the framework's resilience. The dataset is available at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/ChiYuIII/DSOD\u003c/span\u003e\u003cspan address=\"https://github.com/ChiYuIII/DSOD\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/p\u003e \u003c/li\u003e \u003c/ol\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eThe subsequent sections of the paper are structured as follows: Section II addresses related work, Section III outlines the proposed method, Section IV provides experimental results and discussions, and Section V concludes the paper.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e"},{"header":"2. Related work","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e2.1. Unsupervised deep domain adaptation\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eThe goal of unsupervised domain adaptation is to find a solution to the distributional inconsistencies that exist between the data from the source domain and the data from the target domain. This is a problem that is common across a variety of datasets. The main aim is to efficiently leverage shared features across these domains and implement them in tasks within the source domain [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. Prior approaches in this field extracted domain-invariant features using the Maximum Mean Discrepancy (MMD) technique. With the growing integration of deep learning in domain adaptation research, unsupervised deep domain adaptation has become a prominent methodology. 
The methods can be classified into four distinct categories.\u003c/p\u003e \u003cp\u003eOne significant method is domain adaptation using Generative Adversarial Networks (GANs), which aim to balance source and target domains through an adversarial formation that consists of a generator and a discriminator. Notable techniques, including CycleGAN [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e], StarGAN [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e], and GatedGAN [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e], help the target domain's image styles to be transferred from the source domain.\u003c/p\u003e \u003cp\u003eMethods based on adversarial networks integrate adversarial strategies into the domain adaptation problem, enabling the trained network to differentiate between categories while also capturing domain-invariant features. Class-level adversarial networks, as introduced by [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e], maintain semantic consistency locally. [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e] Enhance adversarial learning by incorporating both strong and weak data enhancement techniques, as well as mutual learning within a teacher-student framework, thereby enabling the student model to acquire unique domain characteristics through self-learning.\u003c/p\u003e \u003cp\u003eThe reconstruction-based approach utilizes a self-encoder to reduce information loss in the reconstruction of input data, thereby preserving the essential features of the original data. [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e] To improve the extraction and refinement of shared characteristics, a domain separation network has been implemented to isolate specific information in source and target domains. 
[\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e] suggested a collaborative learning technique to find shared characteristics between synthetic and real images in the target domain at the pixel, region, and image levels.\u003c/p\u003e \u003cp\u003eDomain distribution differences are used to bridge the gap between domains to reduce generalization error in the target domain. [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e] implemented an optimal multiple kernel selection strategy for mean embedding alignment, which significantly minimized domain discrepancies and facilitated the effective adaptation of deep convolutional neural networks in domain transfer contexts. These approaches have notably advanced research in domain adaptation, enhancing the development of the field.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e2.2. Cross Domain Object Detection\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eRecent advancements in unsupervised domain adaptation techniques for object detection tasks have been significant. There are three basic types of cross-domain object identification techniques that use unsupervised domain adaptation: disparity-based techniques, adversarial-based techniques, and reconstruction-based techniques. [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e] to reduce the discrepancy between formation and test data at the example and image level, we developed two adaptation modules based on the H-divergence theory. [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e] utilized the CycleGAN model to improve detection performance by transforming datasets from daytime to nighttime, employing the generated images for training the detection model. 
[\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e] proposed an unsupervised domain adaptation technique that combines weak global alignment with strong local feature alignment. In a similar vein, [\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e] applied consistency regularization within the Mean Teacher framework to mitigate domain bias during the transfer from synthetic to real images. [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e] developed an automatic annotation framework that enhances the robustness of pedestrian detection via a dual-stream region proposal network. [\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e] generated high-quality synthetic images from 3D CAD models to mitigate data scarcity. [\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e] proposed an unsupervised image generation method aimed at reducing domain inconsistencies between thermal and visible imagery. [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e] addressed the cross-domain detection problem by employing a customized backbone network to extract local and global information, thereby enabling the recognition of small and closely clustered targets. While these methods have improved object detection in natural images, cross-domain remote sensing object detection continues to pose challenges due to the complex structure of remote sensing imagery.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003e2.3. 
Remote sensing cross domain object detection\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eSignificant progress has been made in related domains in recent years, but cross-domain remote sensing object detection is still understudied and its adaptation potential remains largely untapped. [\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e] presented a change detection method based on deep transfer learning that pre-trains and fine-tunes source and target domain models to address poor adaptation. [\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e] proposed a progressive transfer-based unsupervised SAR ship detection system that enables precise detection in unlabeled SAR images by aligning the source and target domains at the pixel, feature, and prediction levels. [\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e] introduced a semi-synthetic data generator for the production of remotely sensed data and developed domain-adapted detectors to address cross-domain recognition challenges in remote sensing images. Additionally, [\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e] proposed a domain-adaptive multi-source change network to address the shortcomings of adversarial domain adaptation methods, particularly the neglect of instance features. The network of [\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e] includes a data augmentation module (SDA), a noise-reducing and feature-enhancing module (SFR), and a PLG module that uses source domain knowledge to label unlabeled target domain data, thereby mitigating domain bias in remote sensing object detection. 
These studies offer significant theoretical and methodological foundations for the advancement of cross-domain remote sensing object detection.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"3. Methodology","content":"\u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eThis study's overall framework is depicted in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. The first step performs unsupervised domain adaptation, using the JCST_GAN model to translate images from the source domain into the target domain. After that, the corresponding detection models are trained on a multi-scale target detection dataset. In the final phase, multiple detection models are applied to the domain-adapted images to yield the final detection results.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003e3.1. Multi-scale cross domain remote sensing object detection dataset\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eCurrent publicly available datasets exhibit significant variability in data sources, while target sizes tend to be consistent. To assess the precision and robustness of our method, we built a remote sensing target detection dataset that uses consistent data sources but varies the target sizes. The dataset comprises 14,640 Google tiles, each measuring 1024 \u0026times; 1024 pixels. Although the image dimensions remain constant, the spatial resolutions vary, resulting in significant geographic and scene diversity. 
The dataset was partitioned into training, validation, and test sets in a 6:2:2 ratio so that each subset is balanced and representative of the target categories and distributions.\u003c/p\u003e \u003cp\u003eThe dataset comprises four distinct target categories: storage tanks, airplanes, wind turbines, and airstrips. 
The targets were chosen based on significant size variations observed in remote sensing images. Storage tanks and airplanes are categorized as small targets, wind turbines as medium-sized targets, and airstrips as large-scale targets. This variety allows the dataset to more accurately represent complex real-world conditions, providing a thorough evaluation of the detection model's capabilities. Figure\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e displays sample images for each target type, while Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e outlines the quantity, spatial distribution, and associated Google tile levels of the dataset's targets. This configuration enables the assessment of model performance across various target sizes and environmental conditions.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eData labeling details.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"8\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv 
align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLabel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTrain\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eValidate\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eTest\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eLabel box\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eTotal\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eGoogle level\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eScale\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eStorage\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e1415\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e470\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e472\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e36468\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e2357\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e18\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eSmall\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAirplane\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e1707\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" 
char=\".\" colname=\"c3\"\u003e \u003cp\u003e569\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e569\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e33379\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e2845\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e17\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eSmall\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eWind turbine\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e4449\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1483\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e1483\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e29892\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e7415\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e17\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eMedium\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRunway\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e1214\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e404\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e405\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e2806\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e 
\u003cp\u003e2023\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e16\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eLarge\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTotal\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e8785\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2926\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e2929\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e14640\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003e3.2. Unsupervised Domain Adaptation Based on Generative Adversarial Networks\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eBy identifying invariant semantic characteristics, unsupervised techniques can translate images from the source domain to the target domain, thereby reducing domain differences. Domain adaptation techniques based on Generative Adversarial Networks (GANs) are often categorized into two types: those that require paired images and those that do not. The former rely on strictly paired images for supervised learning, as exemplified by conditional GANs, which were initially developed for image-to-image translation with paired data. 
This approach is ill-suited for cross-domain remote sensing object detection problems. Conversely, the latter does not depend on paired images, rendering it more suitable for generalization to cross-domain remote sensing object detection applications. CycleGAN removes the dependence on image pairs by combining a cycle-consistency loss with adversarial training between the generator and discriminator to generate high-quality images that accurately reflect the target domain. Nonetheless, the varied acquisition conditions of remote sensing images, including imaging modes, object scale, color saturation, and shooting angles, can lead to substantial domain-specific discrepancies. These interrelated factors lead to significant stylistic variations across remote sensing images from different domains. Consequently, style transfer may cause a loss of detail and texture information, which in turn can reduce the accuracy of target detection.\u003c/p\u003e \u003cp\u003eTo resolve this issue, we propose an improved CycleGAN model that maintains detail and texture during style transfer. Furthermore, we conduct quantitative comparisons between our enhanced CycleGAN and other prominent unsupervised style transfer techniques, with experimental findings illustrating the better efficacy of our method in preserving visual details and texture.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003e3.3. Unsupervised Domain Adaptation Based on Generative Adversarial Networks\u003c/h2\u003e \u003cdiv id=\"Sec10\" class=\"Section3\"\u003e \u003ch2\u003e3.3.1. Input space alignment\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eBecause the model's input size is fixed, differences in image resolution between domains cause discrepancies in context and layout, which complicates adversarial learning. 
To resolve this, we use uniform scaling and fixed-window cropping, as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e, to align the resolution of the source and target domains. For low-resolution images in the target domain, we attain contextual consistency by scaling them to match the resolution of the source domain. We then crop the scaled images with a fixed-size window to match the dimensions of the source domain images, ensuring layout consistency [\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e, \u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e]. When the target domain image has the higher resolution, scaling the target images and applying fixed-window cropping to the source domain images yields a comparable outcome.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv id=\"Equ1\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ1\" name=\"EquationSource\"\u003e\n$$\\:\\left\\{\\begin{array}{c}\\text{\u0026phi;}\\left(\\text{\u0026delta;}\\left({\\text{x}}_{\\text{T}}\\text{,\\:r}\\right)\\text{,\\:S}\\right)\\text{}\\text{i}\\text{f}\\text{}{\\text{R}}_{\\text{T}}\\text{\u0026lt;}{\\text{R}}_{\\text{S}}\\\\\\:\\text{\u0026delta;}\\left({\\text{x}}_{\\text{T}}\\text{,\\:r}\\right)\\text{,}\\text{}\\text{\u0026phi;}\\left({\\text{x}}_{\\text{S}}\\text{,\\:}{\\text{S}}_{{\\text{x}}_{\\text{T}}\\text{\u0026middot;r}}\\right)\\text{}\\text{i}\\text{f}\\text{}{\\text{R}}_{\\text{T}}\\text{\u0026gt;}{\\text{R}}_{\\text{S}}\\end{array}\\right.$$\u003c/div\u003e \u003cdiv class=\"EquationNumber\"\u003e1\u003c/div\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003ewhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan 
class=\"mathinline\"\u003e\\(\\:\\text{\u0026delta;}\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\text{\u0026phi;}\\)\u003c/span\u003e\u003c/span\u003e denote uniform scaling and fixed-window cropping, respectively, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\text{x}}_{\\text{T}}\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\text{x}}_{\\text{S}}\\)\u003c/span\u003e\u003c/span\u003e denote the target and source images, respectively, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\text{R}}_{\\text{S}}\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\text{R}}_{\\text{T}}\\)\u003c/span\u003e\u003c/span\u003e are the resolutions of the source and target images, respectively, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\text{S}\\)\u003c/span\u003e\u003c/span\u003e is the input size, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\text{r}\\)\u003c/span\u003e\u003c/span\u003e is the resizing factor, i.e., the ratio between \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\text{R}}_{\\text{T}}\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\text{R}}_{\\text{S}}\\)\u003c/span\u003e\u003c/span\u003e, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\text{S}}_{{\\text{x}}_{\\text{T}}\\text{\u0026middot;r}}\\)\u003c/span\u003e\u003c/span\u003e is the size of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\text{x}}_{\\text{T}}\\)\u003c/span\u003e\u003c/span\u003e after resizing.\u003c/p\u003e 
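\u003cp\u003eThe two cases of Eq. 1 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: images are square, single-channel nested lists; scaling uses nearest-neighbour interpolation; the crop window is taken from the top-left corner; resolutions are expressed in pixels per unit ground distance; and the resizing factor is taken as r = R_S / R_T. The function names scale, crop, and align_inputs are hypothetical.\u003c/p\u003e

```python
def scale(img, r):
    """Uniform scaling (delta in Eq. 1), nearest-neighbour, on a 2-D list.

    Assumption: nearest-neighbour interpolation; the interpolation
    method is not specified in the text.
    """
    h, w = len(img), len(img[0])
    new_h, new_w = max(1, round(h * r)), max(1, round(w * r))
    return [
        [img[min(h - 1, int(y / r))][min(w - 1, int(x / r))]
         for x in range(new_w)]
        for y in range(new_h)
    ]


def crop(img, size):
    """Fixed-window cropping (phi in Eq. 1).

    Assumption: the window is anchored at the top-left corner.
    """
    return [row[:size] for row in img[:size]]


def align_inputs(x_s, x_t, res_s, res_t):
    """Return (source, target) after input space alignment.

    Assumptions: square images, resolutions given in pixels per unit
    ground distance, and r = res_s / res_t.
    """
    if res_t == res_s:
        return x_s, x_t                      # nothing to align
    r = res_s / res_t
    scaled_t = scale(x_t, r)
    if res_t < res_s:
        # Target is coarser: upscale it, then crop to the input size S.
        return x_s, crop(scaled_t, len(x_s))
    # Target is finer: downscale it, then crop the source to match.
    return crop(x_s, len(scaled_t)), scaled_t


if __name__ == "__main__":
    img = [[row * 4 + col for col in range(4)] for row in range(4)]
    src, tgt = align_inputs(img, img, res_s=2.0, res_t=1.0)
    print(len(src), len(tgt))  # both images share the same side length
```

\u003cp\u003eIn both branches the two returned images have the same pixel dimensions and cover ground areas of comparable extent, which is the property that the scaling and cropping steps are meant to provide.\u003c/p\u003e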
\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section3\"\u003e \u003ch2\u003e3.3.2. JCST Module\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eRemote sensing object detection typically uses high-resolution images that contain abundant detail; however, CycleGAN is primarily designed for style transfer of low- and medium-resolution images. The depth and complexity of the existing network are insufficient to capture all the detail in high-resolution images, so the generator cannot accurately capture and reproduce intricate textures and edge details when producing high-resolution images. As a result, much of the detail and texture information is lost in the translated image. To address these issues and preserve the detail and texture information of the source image during style migration with CycleGAN, we developed a Joint Color Space Transformation (JCST) module, as illustrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eThe CycleGAN network uses style migration to convert the source domain image into the target domain image. The migrated target domain image is then converted from the RGB color space to the YCbCr color space, which separates the luminance information (Y channel) from the chrominance information (Cb and Cr channels), and its Y, Cb, and Cr channels are split. The Y channel of the original source domain image, obtained through the same conversion, is then merged with the Cb and Cr channels of the migrated target domain image. 
This method uses the luminance data of the source domain image to preserve the structure and details of the original image, while the chrominance data of the migrated image supplies the color style of the target domain. The resulting image preserves the details and structure of the source domain while integrating the stylistic attributes of the target domain, producing a target domain image characterized by visual coherence and high quality.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003e3.4. Experimental design\u003c/h2\u003e \u003cdiv id=\"Sec13\" class=\"Section3\"\u003e \u003ch2\u003e3.4.1. Task setup\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eIn this research, we conducted remote sensing cross-domain object detection experiments using the constructed multi-scale remote sensing object detection dataset (DSOD). Specifically, we take the DSOD dataset as the target domain data and select the Gaofen-2 image of a region as the source domain data. The Gaofen-2 image undergoes preprocessing to provide RGB three-channel data, which is subsequently cropped to match the dimensions of the DSOD dataset. We conducted three sets of object detection experiments employing a multi-model detection strategy:\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003col\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003e1. Remote sensing object detection directly on images in the source domain;\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003e2. Remote sensing object detection on images after style migration via CycleGAN;\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003e3. 
Remote sensing object detection on images after style migration using our proposed method.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eAll experiments in this study are implemented in PyTorch. Model training is accelerated with an NVIDIA RTX 4080, which possesses 24GB of graphics RAM.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section3\"\u003e \u003ch2\u003e3.4.2. Evaluation metrics\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eWe use the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index (SSIM) to evaluate the quality of the images generated by the proposed method.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Equ2\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ2\" name=\"EquationSource\"\u003e\n$$\\:\\text{PSNR=10∙}{\\text{log}}_{\\text{10}}\\text{(}\\frac{{\\text{MAX}}^{\\text{2}}}{\\text{MSE}}\\text{)}$$\u003c/div\u003e \u003cdiv class=\"EquationNumber\"\u003e2\u003c/div\u003e\u003c/div\u003e \u003cdiv id=\"Equ3\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ3\" name=\"EquationSource\"\u003e\n$$\\:\\text{SSIM(x,\\:y)=}\\frac{\\text{(2}{\\text{\u0026mu;}}_{\\text{x}}{\\text{\u0026mu;}}_{\\text{y}}\\text{+}{\\text{C}}_{\\text{1}}\\text{)(2}{\\text{\u0026sigma;}}_{\\text{xy}}\\text{+}{\\text{C}}_{\\text{2}}\\text{)}}{\\text{(}{\\text{\u0026mu;}}_{\\text{x}}^{\\text{2}}\\text{+}{\\text{\u0026mu;}}_{\\text{y}}^{\\text{2}}\\text{+}{\\text{C}}_{\\text{1}}\\text{)(}{\\text{\u0026sigma;}}_{\\text{x}}^{\\text{2}}\\text{+}{\\text{\u0026sigma;}}_{\\text{y}}^{\\text{2}}\\text{+}{\\text{C}}_{\\text{2}}\\text{)}}$$\u003c/div\u003e \u003cdiv class=\"EquationNumber\"\u003e3\u003c/div\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003ewhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan 
class=\"mathinline\"\u003e\\(\\:\\text{MAX}\\)\u003c/span\u003e\u003c/span\u003e is the maximum possible pixel value of the image, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\text{MSE}\\)\u003c/span\u003e\u003c/span\u003e is the mean square error, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\text{\u0026mu;}}_{\\text{x}}\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\text{\u0026mu;}}_{\\text{y}}\\)\u003c/span\u003e\u003c/span\u003e are the mean values of the image \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\text{x}\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\text{y}\\)\u003c/span\u003e\u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\text{\u0026sigma;}}_{\\text{x}}^{\\text{2}}\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\text{\u0026sigma;}}_{\\text{y}}^{\\text{2}}\\)\u003c/span\u003e\u003c/span\u003e are the variances of the image \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\text{x}\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\text{y}\\)\u003c/span\u003e\u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\text{\u0026sigma;}}_{\\text{xy}}\\)\u003c/span\u003e\u003c/span\u003e is the covariance of the image \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\text{x}\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\text{y}\\)\u003c/span\u003e\u003c/span\u003e, and 
\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\text{C}}_{\\text{1}}\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\text{C}}_{\\text{2}}\\)\u003c/span\u003e\u003c/span\u003e are constants that stabilize the calculation.\u003c/p\u003e \u003cp\u003eWe assessed the detection model's performance in our tests using four traditional evaluation metrics obtained from the confusion matrix: precision, recall, F1 score, and mAP. Precision measures the fraction of predicted positive instances that are correct, recall measures the fraction of actual positive instances that the model identifies, and the F1 score is the harmonic mean of precision and recall, offering a balanced evaluation of the model's performance. mAP, in turn, indicates the overall performance of the model across all categories.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Equ4\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ4\" name=\"EquationSource\"\u003e\n$$\\:\\text{Precision=}\\frac{\\text{TP}}{\\text{TP+FP}}$$\u003c/div\u003e \u003cdiv class=\"EquationNumber\"\u003e4\u003c/div\u003e\u003c/div\u003e \u003cdiv id=\"Equ5\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ5\" name=\"EquationSource\"\u003e\n$$\\:\\text{Recall=}\\frac{\\text{TP}}{\\text{TP+FN}}$$\u003c/div\u003e \u003cdiv class=\"EquationNumber\"\u003e5\u003c/div\u003e\u003c/div\u003e \u003cdiv id=\"Equ6\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ6\" name=\"EquationSource\"\u003e\n$$\\:{\\text{F}}_{\\text{1}}\\text{=}\\frac{\\text{2\u0026times;Precision\u0026times;Recall}}{\\text{Precision+Recall}}$$\u003c/div\u003e \u003cdiv class=\"EquationNumber\"\u003e6\u003c/div\u003e\u003c/div\u003e \u003cdiv id=\"Equ7\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" 
id=\"FileID_Equ7\" name=\"EquationSource\"\u003e\n$$\\:\\text{mAP}\\text{=}\\frac{\\text{1}}{\\text{n}}\\sum_{\\text{i}\\text{=1}}^{\\text{n}}{\\text{AP}}_{\\text{i}}$$\u003c/div\u003e \u003cdiv class=\"EquationNumber\"\u003e7\u003c/div\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eHere, \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:\\text{TP}\\)\u003c/span\u003e \u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\text{FP}\\)\u003c/span\u003e\u003c/span\u003e, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\text{FN}\\)\u003c/span\u003e\u003c/span\u003e are the numbers of true positives, false positives, and false negatives, respectively, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\text{n}\\)\u003c/span\u003e\u003c/span\u003e denotes the number of categories, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\text{AP}}_{\\text{i}}\\)\u003c/span\u003e\u003c/span\u003e denotes the \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\text{AP}\\)\u003c/span\u003e\u003c/span\u003e value of the i-th category.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section3\"\u003e \u003ch2\u003e3.4.3. 
Multi-model detection\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eWe propose a multi-model detection strategy that combines the high-quality images generated by the improved CycleGAN with multiple types of object detection models through an ensemble learning approach, in order to verify the accuracy and robustness of the proposed method within the detection system.\u003c/p\u003e \u003cp\u003eFirst, we normalize and resize the input images so that they meet the input requirements of each target detection model and so that the models adapt more flexibly to varied scenes and illumination. Subsequently, the optimized CycleGAN network is used for image style migration to generate higher-quality and more detailed target images. The JCST module, which is incorporated into the upgraded CycleGAN, maintains the target objects' features and edge information while also improving the clarity and realism of the output images.\u003c/p\u003e \u003cp\u003eNext, the generated high-quality images are fed into the following four types of object detection models. Single-stage detection model: directly predicts the locations and classes of all targets in the image through a single forward pass [\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e]. Models based on deformable convolution and Transformer architectures: by combining deformable convolutional networks with Transformer mechanisms, they capture shape changes and contextual information of targets and are suitable for target detection and pose estimation in complex scenes [\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e]. 
Two-stage model based on a region proposal network (RPN): first generates candidate target regions, then applies classification and bounding-box regression to these regions [\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e]. Fully convolutional one-stage model: a fully convolutional network directly predicts the location and class of targets on feature maps at different scales [\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e]. These four detectors are used to confirm that the high-quality images produced by the improved CycleGAN can be applied to multi-model detection tasks. This strategy not only brings a substantial performance improvement for object detection in complex scenes, but also demonstrates the potential and practical value of the improved CycleGAN in remote sensing image processing.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003c/div\u003e"},{"header":"4. Experimental results and Discussion","content":"\u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003e4.1. Unsupervised domain adaptation results\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eThe aim of this research is to improve object detection through style transfer with the improved CycleGAN. The source domain comprises the preprocessed Gaofen-2 images, whereas the target domain consists of images from the DOSD dataset. Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e presents the quantitative results, whereas Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e shows the visual results. 
The CycleGAN generator employs a U-Net-like architecture, which helps it preserve global information; however, it handles high-frequency image details poorly, particularly during the downsampling and upsampling phases, leading to a loss of detail and texture information. This loss degrades subsequent object detection. To reduce the loss of texture and detail in the generated images, we integrate the JCST module into the CycleGAN network, aiming to modify only the image style while retaining more detail information.\u003c/p\u003e \u003cp\u003eExperiments show that the improved CycleGAN outperforms the original in structural similarity (SSIM, 0.85 to 0.99) and peak signal-to-noise ratio (PSNR, 23.99 to 30.82), indicating that the JCST module improves the quality of the generated images. In the visual results, our method retains high-frequency information well, and the generated images closely resemble the original images, offering more dependable inputs for the object detection model. While our method alleviates high-frequency information loss to a degree, it cannot completely restore all details in very intricate texture areas. Furthermore, despite the notable gains in SSIM and PSNR, inconsistencies may persist between the generated image and the target-domain image in other quantitative measures (e.g., color and spatial consistency). 
This may affect object detection performance in different application contexts; hence, future work should investigate in greater depth how to control the degree of style transfer so as to maintain semantic consistency between the generated image and the target-domain image.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eComparison of the quality of images generated using CycleGAN and JCST_GAN.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCycleGAN\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eJCST_GAN\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSSIM\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.85\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePSNR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e23.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" 
char=\".\" colname=\"c3\"\u003e \u003cp\u003e30.82\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eIn addition, the JCST module increases the complexity and training time of the model. The demand for computational resources grows significantly, especially when processing high-resolution images, which may limit its use on large-scale datasets or in real-time applications. Future work on model pruning or parameter optimization may further reduce the computational cost. Moreover, different detection tasks and datasets may influence the effect of the generated images, so future research will continue to validate the generalization of the method across a wider range of application scenarios. CycleGAN is chosen as the base model for style transfer mainly because of its wide use and strong performance in image translation tasks. CycleGAN achieves high-quality image style transfer without paired training data, which makes it well suited to diverse datasets and unsupervised settings. As Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e shows, CycleGAN produces images of noticeably higher quality than DualGAN and DiscoGAN, which further supports its use for the style transfer task. In addition, the architecture of CycleGAN is highly flexible and easy to combine with other modules. 
For example, the JCST module introduced in this study combines seamlessly with CycleGAN and significantly improves the detail fidelity of the generated images.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eComparison of the quality of images generated by different models.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDualGAN\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eDiscoGAN\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eCycleGAN\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSSIM\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.74\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.69\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.82\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" 
colname=\"c1\"\u003e \u003cp\u003ePSNR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e17.63\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e14.83\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e21.36\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003e4.2. Multi-model object detection results\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eThis section compares the performance of several object detection models before and after applying the proposed method. Figure\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e presents the visual results, while Table\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e lists the detection results for wind turbines, storage tanks, airplanes, and airport runways. The comparison indicates that the style-transferred images outperform the original images in the object detection task, with mAP improving by 2–4%. The results show that the proposed CycleGAN enhancement transfers image style effectively while also improving the performance of the object detection models, particularly in scenes containing many targets.\u003c/p\u003e \u003cp\u003eIn terms of specific model performance, YOLOv8 and FCOS improve across all object categories, with F1 scores rising by 0.01–0.06. 
This suggests that style transfer enhances the detection capability of these models, especially in complex multi-target scenarios. The RE-DETR model also improves its F1 score for wind turbines, airplanes, and airport runways, but its F1 score for storage tanks is unchanged because the gain in precision is offset by a decrease in recall. In contrast, the F1 score of Faster R-CNN decreases for wind turbines, mainly because recall drops after style transfer and the model misses some wind turbine targets; its precision nevertheless improves, indicating that the style-transferred images more closely resemble the data distribution of the model's training set.\u003c/p\u003e \u003cp\u003eDespite its strong overall performance, the proposed approach has several limitations. First, style transfer yields little improvement for certain targets (e.g., storage tanks), and recall decreases on some models (e.g., Faster R-CNN), suggesting that different models adapt differently to style-transferred images. Future research could tailor the optimization to individual detection models to broaden the applicability of style transfer. 
Second, in extremely complex scenes, style transfer may still fail to sufficiently improve the contrast between target and background. This implies that additional image enhancement techniques or more elaborate multi-scale feature extraction could be incorporated to further boost detection performance in such scenes.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eObject detection results of the proposed method and the baseline methods.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"14\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c10\" colnum=\"10\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c11\" colnum=\"11\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c12\" colnum=\"12\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" 
colname=\"c13\" colnum=\"13\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c14\" colnum=\"14\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colspan=\"3\" nameend=\"c4\" namest=\"c2\"\u003e \u003cp\u003eWind turbine\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"3\" nameend=\"c7\" namest=\"c5\"\u003e \u003cp\u003eStorage\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"3\" nameend=\"c10\" namest=\"c8\"\u003e \u003cp\u003eAirplane\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"3\" nameend=\"c13\" namest=\"c11\"\u003e \u003cp\u003eRunway\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c14\"\u003e\u0026nbsp;\u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eP\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eF1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eP\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eF1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eP\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003eF1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003eR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003eP\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c13\"\u003e \u003cp\u003eF1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c14\"\u003e \u003cp\u003emAP\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eYOLO V8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.86\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.88\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.87\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.57\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.72\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.60\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e1.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e0.75\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003e0.46\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003e0.81\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c13\"\u003e \u003cp\u003e0.59\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c14\"\u003e \u003cp\u003e0.92\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eJCST_YOLO V8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.81\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.89\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e 
\u003cp\u003e0.62\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.77\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.67\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e1.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e\u003cb\u003e0.81\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003e0.54\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003e0.83\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c13\"\u003e \u003cp\u003e\u003cb\u003e0.65\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c14\"\u003e \u003cp\u003e\u003cb\u003e0.95\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRE-DETR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.81\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.88\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.63\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.77\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.36\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.92\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e0.52\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c11\"\u003e \u003cp\u003e0.86\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003e0.77\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c13\"\u003e \u003cp\u003e0.81\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c14\"\u003e \u003cp\u003e0.91\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eJCST_RE-DETR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.86\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.93\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.62\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.77\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.44\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e\u003cb\u003e0.60\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003e0.86\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003e0.80\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c13\"\u003e \u003cp\u003e\u003cb\u003e0.83\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c14\"\u003e \u003cp\u003e\u003cb\u003e0.94\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFaster R-CNN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e 
\u003cp\u003e0.86\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.91\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.94\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.42\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.58\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003e0.64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003e0.90\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c13\"\u003e \u003cp\u003e0.75\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c14\"\u003e \u003cp\u003e0.81\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eJCST_Faster R-CNN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.72\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.84\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.42\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.59\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e\u003cb\u003e0.96\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003e0.64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003e1.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c13\"\u003e \u003cp\u003e\u003cb\u003e0.78\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c14\"\u003e \u003cp\u003e\u003cb\u003e0.85\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFCOS\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.84\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.88\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.86\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.90\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.93\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.91\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.94\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003e0.57\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003e0.70\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c13\"\u003e \u003cp\u003e0.63\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c14\"\u003e \u003cp\u003e0.87\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e 
\u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eJCST_FCOS\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.84\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.90\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.87\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.92\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.94\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e\u003cb\u003e0.97\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003e0.57\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003e0.73\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c13\"\u003e \u003cp\u003e\u003cb\u003e0.64\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c14\"\u003e \u003cp\u003e\u003cb\u003e0.89\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec19\" class=\"Section2\"\u003e \u003ch2\u003e4.3. Ablation experiment\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eWe designed and ran a set of comparison experiments to evaluate the effectiveness of the proposed approach. First, the images to be detected were input directly into the object detection model without any style transfer. 
Second, the images underwent style transfer using the original CycleGAN, and the resulting images were then fed into the object detection model. Finally, we used the improved CycleGAN, which includes the JCST module, to perform style transfer on the images prior to object detection. Table\u0026nbsp;\u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e summarizes the results of these experiments.\u003c/p\u003e \u003cp\u003eThe data in the table indicate that the F1 scores and mAP values for object detection on images generated by the original CycleGAN are lower than those obtained on the original images. This can primarily be attributed to the significant loss of detail and texture information in the CycleGAN-generated images compared to the originals, which hampers the detection model\u0026rsquo;s ability to extract key features of the targets, thereby affecting class discrimination and reducing detection performance. In contrast, our improved CycleGAN with the JCST module noticeably improves object detection on the transferred images. Notably, for the Faster R-CNN model, the mAP value improved by 4% relative to the original images, with similar improvements observed for the other models. 
This suggests that the JCST module effectively mitigates the loss of image detail and texture, enhancing the applicability of the generated images to the target detection models and consequently improving detection accuracy.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab5\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 5\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eQuantitative evaluation of ablation experiments.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eWind turbine\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eStorage\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eAirplane\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eRunway\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e 
\u003cp\u003eF\u003csub\u003e1\u003c/sub\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eF\u003csub\u003e1\u003c/sub\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eF\u003csub\u003e1\u003c/sub\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eF\u003csub\u003e1\u003c/sub\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003emAP\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eYOLO V8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.87\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.72\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.75\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.59\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.92\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCycleGAN\u0026thinsp;+\u0026thinsp;YOLO V8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.72\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.56\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.61\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.45\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.81\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eJCST_YOLO V8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.89\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.77\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.81\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.65\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRT-DETR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.88\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.77\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.52\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.81\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.91\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCycleGAN\u0026thinsp;+\u0026thinsp;RT-DETR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.74\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.68\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.46\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.69\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.79\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eJCST_RT-DETR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.93\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.77\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.60\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.83\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.94\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFaster R-CNN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.91\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.58\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.75\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.81\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCycleGAN\u0026thinsp;+\u0026thinsp;Faster R-CNN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.81\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.47\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.84\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.69\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.71\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eJCST_Faster R-CNN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.84\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.59\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.78\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e 
\u003cp\u003e0.85\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFCOS\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.86\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.91\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.63\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.87\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCycleGAN\u0026thinsp;+\u0026thinsp;FCOS\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.79\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.83\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.89\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.55\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.82\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eJCST_FCOS\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.87\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.94\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.89\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"5. 
Conclusions","content":"\u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eWith the rapid advancement of artificial intelligence and object detection technology, many object detection tasks now rely on deep learning. However, remote sensing images come from many sources, and even images taken by the same sensor can follow different data distributions because of differences in acquisition time and imaging angle. This mismatch between the training dataset and the images to be detected greatly affects the detection model's accuracy. To resolve this issue, we propose a method that harmonizes the data distributions of the model's training set and the target images, and we apply it to real-world detection tasks.\u003c/p\u003e \u003cp\u003eIn particular, we incorporate a JCST module into the CycleGAN framework to preserve image detail and texture during style transfer, thereby improving the quality of the generated images. The experimental results suggest that our method can improve target detection accuracy by 2\u0026ndash;4%.\u003c/p\u003e \u003cp\u003eMoreover, to evaluate target detection across scales, we built a multi-scale target detection dataset that accounts for differences in target size. The dataset covers targets at three scales (large, medium, and small), with all images originating from the same source. The final experimental results indicate that the proposed method improves both the overall performance of several detection models and the detection accuracy for targets of different sizes. 
The integration of the domain adaptation model with the target detection model will be the primary focus of future research in order to enhance the model's efficacy and achieve a higher level of automation.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eData availability\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll data generated or analyzed during this study are included in this published article.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor Contributions:\u0026nbsp;\u003c/strong\u003eConceptualization, D.C. and T.C.; methodology, D.C., T.C. and S.H.; validation, W.G., Z.L. and L.Y.; data curation, S.H.; writing—original draft preparation, D.C. and T.C.; writing—review and editing, S.H.; project administration, L.C.; funding acquisition, C.J. All authors have read and agreed to the published version of the manuscript.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding:\u003c/strong\u003e This research was funded by the National Natural Science Foundation of China, grant number No. 42401548.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare no competing interests.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent to participate\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll authors voluntarily agree to participate in this research study.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent to publish\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll authors voluntarily approved the publication of this research study.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eChen, Z. et al. Joint alignment of the distribution in input and feature space for cross-domain aerial image semantic segmentation. \u003cem\u003eInternational J. Appl. 
Earth Observation Geoinformation\u003c/em\u003e \u003cb\u003e115\u003c/b\u003e (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang, L., Zhang, L. \u0026amp; Du, B. Deep learning for remote sensing data: A technical tutorial on the state of the art. \u003cem\u003eIEEE Geoscience Remote Sens. Magazine\u003c/em\u003e \u003cb\u003e4\u003c/b\u003e, 22\u0026ndash;40 (2016).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChi, M. et al. Big data for remote sensing: Challenges and opportunities. \u003cem\u003eProceedings of the IEEE\u003c/em\u003e \u003cb\u003e104\u003c/b\u003e, 2207\u0026ndash;2219 (2016).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSun, X. et al. From single- to multi-modal remote sensing imagery interpretation: A survey and taxonomy. \u003cem\u003eSci. China Inform. Sci.\u003c/em\u003e \u003cb\u003e66\u003c/b\u003e, 140301 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi, Y. et al. Learning deep semantic segmentation network under multiple weakly-supervised constraints for cross-domain remote sensing image semantic segmentation. \u003cem\u003eISPRS J. Photogrammetry Remote Sens.\u003c/em\u003e \u003cb\u003e175\u003c/b\u003e, 20\u0026ndash;33 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang, X. et al. Remote sensing object detection meets deep learning: A metareview of challenges and advances. \u003cem\u003eIEEE Geoscience Remote Sens. Magazine\u003c/em\u003e (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi, K., Wan, G., Cheng, G., Meng, L. \u0026amp; Han, J. Object detection in optical remote sensing images: A survey and a new benchmark. \u003cem\u003eISPRS J. Photogrammetry Remote Sens.\u003c/em\u003e \u003cb\u003e159\u003c/b\u003e, 296\u0026ndash;307 (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOza, P., Sindagi, V. A., Sharmini, V. V. \u0026amp; Patel, V. M. Unsupervised domain adaptation of object detectors: A survey. 
\u003cem\u003eIEEE Trans. Pattern Anal. Mach. Intelligence\u003c/em\u003e (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShi, Y., Du, L., Li, C., Guo, Y. \u0026amp; Du, Y. Unsupervised domain adaptation for SAR target classification based on domain-and class-level alignment: From simulated to real data. \u003cem\u003eISPRS J. Photogrammetry Remote Sens.\u003c/em\u003e \u003cb\u003e207\u003c/b\u003e, 1\u0026ndash;13 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWilson, G. \u0026amp; Cook, D. J. A survey of unsupervised deep domain adaptation. \u003cem\u003eACM Trans. Intell. Syst. Technol. (TIST)\u003c/em\u003e. \u003cb\u003e11\u003c/b\u003e, 1\u0026ndash;46 (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYan, H. et al. Weighted and class-specific maximum mean discrepancy for unsupervised domain adaptation. \u003cem\u003eIEEE Trans. Multimedia\u003c/em\u003e. \u003cb\u003e22\u003c/b\u003e, 2420\u0026ndash;2433 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBenjdira, B., Bazi, Y., Koubaa, A. \u0026amp; Ouni, K. Unsupervised domain adaptation using generative adversarial networks for semantic segmentation of aerial images. \u003cem\u003eRemote Sens.\u003c/em\u003e \u003cb\u003e11\u003c/b\u003e, 1369 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXia, G. S. et al. in \u003cem\u003eProceedings of the IEEE conference on computer vision and pattern recognition.\u003c/em\u003e 3974\u0026ndash;3983.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZou, Z. \u0026amp; Shi, Z. Random access memories: A new paradigm for target detection in high resolution aerial remote sensing images. \u003cem\u003eIEEE Trans. Image Process.\u003c/em\u003e \u003cb\u003e27\u003c/b\u003e, 1100\u0026ndash;1111 (2017).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChu, C., Zhmoginov, A. \u0026amp; Sandler, M. Cyclegan, a master of steganography. 
\u003cem\u003earXiv preprint arXiv:1712.02950\u003c/em\u003e (2017).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChoi, Y. et al. in \u003cem\u003eProceedings of the IEEE conference on computer vision and pattern recognition.\u003c/em\u003e 8789\u0026ndash;8797.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen, X., Xu, C., Yang, X., Song, L. \u0026amp; Tao, D. Gated-gan: Adversarial gated networks for multi-collection style transfer. \u003cem\u003eIEEE Trans. Image Process.\u003c/em\u003e \u003cb\u003e28\u003c/b\u003e, 546\u0026ndash;560 (2018).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLuo, Y., Zheng, L., Guan, T., Yu, J. \u0026amp; Yang, Y. in \u003cem\u003eProceedings of the IEEE/CVF conference on computer vision and pattern recognition.\u003c/em\u003e 2507\u0026ndash;2516.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi, Y. J. et al. Cross-domain object detection via adaptive self-training. \u003cem\u003eCoRR\u003c/em\u003e (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTsai, J. C. \u0026amp; Chien, J. T. in \u003cem\u003e2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP).\u003c/em\u003e 1\u0026ndash;6 (IEEE).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSun, R. et al. in \u003cem\u003eProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.\u003c/em\u003e 4360\u0026ndash;4369.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLong, M., Cao, Y., Wang, J. \u0026amp; Jordan, M. in \u003cem\u003eInternational conference on machine learning.\u003c/em\u003e 97\u0026ndash;105 (PMLR).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen, Y., Li, W., Sakaridis, C., Dai, D. \u0026amp; Van Gool, L. 
in \u003cem\u003eProceedings of the IEEE conference on computer vision and pattern recognition.\u003c/em\u003e 3339\u0026ndash;3348.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eArruda, V. F. et al. in \u003cem\u003e2019 International Joint Conference on Neural Networks (IJCNN).\u003c/em\u003e 1\u0026ndash;8 (IEEE).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSaito, K., Ushiku, Y., Harada, T. \u0026amp; Saenko, K. in \u003cem\u003eProceedings of the IEEE/CVF conference on computer vision and pattern recognition.\u003c/em\u003e 6956\u0026ndash;6965.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCai, Q. et al. in \u003cem\u003eProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.\u003c/em\u003e 11457\u0026ndash;11466.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCao, Y. et al. Pedestrian detection with unsupervised multispectral feature learning using deep neural networks. \u003cem\u003eInform. Fusion\u003c/em\u003e \u003cb\u003e46\u003c/b\u003e, 206\u0026ndash;217 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu, W., Luo, B. \u0026amp; Liu, J. Synthetic data augmentation using multiscale attention CycleGAN for aircraft detection in remote sensing images. \u003cem\u003eIEEE Geosci. Remote Sens. Lett.\u003c/em\u003e \u003cb\u003e19\u003c/b\u003e, 1\u0026ndash;5 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu, P., Li, F., Yuan, S. \u0026amp; Li, W. Unsupervised Image-Generation Enhanced Adaptation for Object Detection in Thermal Images. \u003cem\u003eMobile Information Systems\u003c/em\u003e 1837894 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBiswas, D. \u0026amp; Tešić, J. Domain adaptation with contrastive learning for object detection in satellite imagery. \u003cem\u003eIEEE Trans. 
Geoscience Remote Sensing\u003c/em\u003e (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYang, M., Jiao, L., Liu, F., Hou, B. \u0026amp; Yang, S. Transferred deep learning-based change detection in remote sensing images. \u003cem\u003eIEEE Trans. Geosci. Remote Sens.\u003c/em\u003e \u003cb\u003e57\u003c/b\u003e, 6960\u0026ndash;6973 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShi, Y., Du, L., Guo, Y. \u0026amp; Du, Y. Unsupervised domain adaptation based on progressive transfer for ship detection: From optical to SAR images. \u003cem\u003eIEEE Trans. Geosci. Remote Sens.\u003c/em\u003e \u003cb\u003e60\u003c/b\u003e, 1\u0026ndash;17 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXu, T. et al. FADA: Feature aligned domain adaptive object detection in remote sensing imagery. \u003cem\u003eIEEE Trans. Geosci. Remote Sens.\u003c/em\u003e \u003cb\u003e60\u003c/b\u003e, 1\u0026ndash;16 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang, C. et al. A domain adaptation neural network for change detection with heterogeneous optical and SAR remote sensing images. \u003cem\u003eInt. J. Appl. Earth Obs. Geoinf.\u003c/em\u003e \u003cb\u003e109\u003c/b\u003e, 102769 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhu, Y., Sun, X., Diao, W., Li, H. \u0026amp; Fu, K. RFA-Net: Reconstructed feature alignment network for domain adaptation object detection in remote sensing imagery. \u003cem\u003eIEEE J. Sel. Top. Appl. Earth Observations Remote Sens.\u003c/em\u003e \u003cb\u003e15\u003c/b\u003e, 5689\u0026ndash;5703 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHashemzadeh, M., Asheghi, B. \u0026amp; Farajzadeh, N. Content-aware image resizing: an improved and shadow-preserving seam carving method. \u003cem\u003eSig. 
Process.\u003c/em\u003e \u003cb\u003e155\u003c/b\u003e, 233\u0026ndash;246 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAsheghi, B., Salehpour, P., Khiavi, A. M. \u0026amp; Hashemzadeh, M. A comprehensive review on content-aware image retargeting: From classical to state-of-the-art methods. \u003cem\u003eSig. Process.\u003c/em\u003e \u003cb\u003e195\u003c/b\u003e, 108496 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHussain, M. YOLO-v1 to YOLO-v8, the rise of YOLO and its complementary nature toward digital manufacturing and industrial defect detection. \u003cem\u003eMachines\u003c/em\u003e \u003cb\u003e11\u003c/b\u003e, 677 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhao, Z. et al. RT-DETR-Tomato: Tomato Target Detection Algorithm Based on Improved RT-DETR for Agricultural Safety Production. \u003cem\u003eAppl. Sci.\u003c/em\u003e \u003cb\u003e14\u003c/b\u003e, 6287 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRen, S., He, K., Girshick, R. \u0026amp; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. \u003cem\u003eIEEE Trans. Pattern Anal. Mach. Intell.\u003c/em\u003e \u003cb\u003e39\u003c/b\u003e, 1137\u0026ndash;1149 (2016).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTian, Z., Shen, C., Chen, H. \u0026amp; He, T. FCOS: A simple and strong anchor-free object detector. \u003cem\u003eIEEE Trans. Pattern Anal. Mach. 
Intell.\u003c/em\u003e \u003cb\u003e44\u003c/b\u003e, 1922\u0026ndash;1933 (2020).\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Unsupervised domain adaptation, Joint Color Space Transformation(JCST), High-resolution remote sensing images, Cross-domain object detection, Multi-model federation, Multi-scale object","lastPublishedDoi":"10.21203/rs.3.rs-6642304/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6642304/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eThe rapid advancement of deep learning has led to significant achievements in remote sensing object detection. However, domain shift often causes notable performance drops when models trained on one domain are applied to real-world scenarios. Unsupervised domain adaptation (UDA) offers a solution by narrowing domain gaps. Generative adversarial networks (GANs) are commonly used for this purpose, but they can degrade key textures and details in source images. To address this, we propose a method that integrates transformations in both input and feature spaces. First, we standardize image dimensions across source and target domains. Then, a Joint Color Space Transformation (JCST) module operates in the feature space to decouple and recombine color channels, preserving crucial image details while aligning data distributions. We validated our approach on a dataset containing large-, medium-, and small-scale objects, using multiple object detection models. 
Results show that our method boosts average detection accuracy by 2\u0026ndash;4% on source domain images, demonstrating improved generalization and robustness in cross-domain tasks.\u003c/p\u003e","manuscriptTitle":"Unsupervised Domain Adaptation for Cross-domain Remote Sensing Object Detection Via Joint Input and Feature Space","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-05-30 05:29:40","doi":"10.21203/rs.3.rs-6642304/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"ccae5521-807b-4673-8047-0cfa6e026b7f","owner":[],"postedDate":"May 30th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":49171341,"name":"Physical sciences/Optics and photonics/Optical techniques/Imaging and sensing"},{"id":49171342,"name":"Physical sciences/Mathematics and computing/Computer science"}],"tags":[],"updatedAt":"2025-06-17T05:08:49+00:00","versionOfRecord":[],"versionCreatedAt":"2025-05-30 05:29:40","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-6642304","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6642304","identity":"rs-6642304","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}