Multi-Modal Ancient Script Recognition via deep learning with Data Homogenization and Augmentation

Nan Wang, Weichen Wang, Bang Li, Han Zhang, Qingju Jiao, Chaofan Liu

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-6871550/v1. This work is licensed under a CC BY 4.0 License.

Status: Published. Journal publication published 15 Oct, 2025 in npj Heritage Science.

Abstract

Ancient scripts provide invaluable insights into ancient societies, and their effective recognition is crucial for cultural relic preservation, textual decipherment, and heritage research. Current research primarily focuses on recognizing ancient text data of a single modality, processing rubbings or handwritten scripts independently, yet ancient scripts exhibit diverse forms across modalities.
To address this, we propose a novel multimodal recognition framework capable of processing hybrid inputs such as oracle bone rubbings and handwritten scripts. Our method adds two modules, a cross-modal data homogenization block that unifies heterogeneous data representations and a data augmentation block that enhances model robustness, and then performs recognition with convolutional neural networks. Evaluated on oracle bone and bronze inscription datasets, our approach outperforms baseline methods in recognition accuracy and generalization across modalities.

Keywords: Ancient Script Recognition; Multi-Modal; Data Homogenization; Data Augmentation; Deep Learning

1. Introduction

Character recognition is one of the foundational pillars of ancient text research and holds significant importance for exploring ancient civilizations. Ancient Greece, Rome, Egypt, and China each developed unique writing systems, many of which have faded or disappeared over the centuries. Today, these texts are preserved in unearthed cultural relics, often fragmented and displaced, sometimes far from their original locations, owing to natural decay or human activities such as trafficking. Traditional ancient script recognition relies heavily on access to extensive repositories of information and on the expertise of scholars; the process depends primarily on a researcher's accumulated experience and the corpus available to them. When specialists study these inscriptions, they must invest considerable effort in organizing relevant materials, often undertaking demanding tasks such as reconstructing missing text and conducting comprehensive literature reviews. As a result, traditional methods are complex, time-consuming, and dependent on specialized workflows, and their limitations have become increasingly apparent in recent years.
The advent of artificial intelligence (AI) and deep learning has opened new avenues for researchers, enabling them to uncover and exploit intricate statistical patterns in large datasets. A notable example is Ithaca [1], a groundbreaking tool that has strengthened confidence in this research direction. To extend recognition technology to the study of ancient Chinese script, we propose an ancient script recognition model developed through separate studies of oracle bone inscriptions and bronze inscriptions. The model comprises two main components: a data preprocessing module, which includes a cross-modal data homogenization block and a data augmentation block, and a CNN-based recognition module. Our method is designed to address the following challenges in existing research:

(1) Limited data scale and highly uneven distribution. Existing data on oracle bone script and bronze inscriptions consist primarily of rubbings and their replicas, and the overall volume is relatively limited. In oracle bone script, for instance, some characters appear thousands of times, whereas others occur only once or twice. This limited data volume hampers the application of deep learning, particularly of large models.

(2) Diverse forms of inscriptions. Besides rubbings, handwritten forms are another primary medium for oracle bone and bronze inscriptions and are frequently encountered in academic papers and monographs. Because information is retrieved from diverse sources containing both rubbings and handwritten forms, a recognition system must accommodate hybrid data.

(3) High resource demand. Deep learning, particularly large language models, requires substantial hardware resources. This significantly constrains related research, especially since the study of ancient Chinese script is a niche field.
Therefore, exploring methods that achieve effective recognition with fewer resources is of critical importance. Our model is designed to address the problems of small data scale and hybrid inputs, and it uses CNN models to improve the accuracy of ancient script recognition.

2. Related Works

Several studies have addressed the recognition of oracle bone inscriptions and bronze inscriptions. Early exploratory work used traditional machine learning: features of ancient character images were analyzed and transformed into structural encodings, and classifiers such as support vector machines (SVM) and k-nearest neighbors were then used to recognize the characters. For example, Liu et al. proposed a recognition algorithm that uses an SVM to classify features of handwritten oracle bone inscriptions [2]. Gu divides the planar image of an oracle bone character into four quadrants, uses fractal geometry principles to convert it into description codes, and matches these codes against a fractal feature library of oracle bone inscriptions to recognize it [3]. Qu et al. analyzed the topological structure of oracle bone inscriptions, constructed topological features such as feature points and connected domains through extensive structural analysis, and then recognized characters by similarity comparison [4]. Zhao et al. fused HOG and GLCM features and used an SVM to recognize bronze inscriptions [5]. These traditional methods generally perform robustly on inscriptions with simple line structures.
However, challenges remain for characters with complex structures, particularly multi-component characters, where recognition performance tends to be significantly weaker.

Recent studies more commonly apply deep learning models to ancient script recognition. For example, Liu et al. studied oracle bone inscription recognition with deep convolutional neural networks and obtained relatively accurate results on a specific dataset [6]. Fujikawa et al. proposed a two-step method for oracle bone inscription recognition: YOLO first performs automatic detection and recognition, and in the second step the oracle bone characters that YOLO failed to detect are handled manually and recognized with MobileNet [7]. Meng et al. (2018) proposed an oracle bone inscription recognition method based on data augmentation and a deep learning model, which achieved good results. Guo et al. (2022) proposed an improved neural network model based on Inception-v3 for oracle bone character recognition and reported good results [8]. Mai et al. presented a new convolutional neural network architecture for oracle bone inscription recognition based on Inception modules and residual connections [9]. Qiao et al. used generative adversarial networks for image augmentation, a Pix2Pix model to restore text, and ResNet50 for robust feature extraction; this method achieves good recognition accuracy on both oracle bone and bronze inscription datasets [10]. Wu et al. designed a convolutional neural network (CNN)-based model that strengthens attention by introducing a spatial transformer network (STN), incorporates a robust loss function for implicit semantic data augmentation (ISDA), and uses a newly constructed large-scale bronze inscription character dataset, achieving an accuracy of 91.21% in automated recognition [11].
Beyond these studies, researchers have also investigated other types of ancient characters and obtained good recognition results with various models. For example, Zhao et al. constructed a generative model based on a multilayer adversarial neural network with a Laplacian structure, which effectively recognizes Shui characters [12]. Xu et al. proposed a large-scale continual learning framework based on the convolutional prototype network, which does not need to store the raw data of old classes and classifies all existing classes simultaneously without knowing the incremental batch number [13]. Barucci et al. proposed an improved CNN model that effectively recognizes and classifies ancient Egyptian hieroglyphs [14].

Existing research still has some shortcomings:

(1) Research on mixed data is insufficient. Most existing studies target a single type of ancient inscription, and only a few address recognition with mixed data (Zhang et al., 2021). However, an analysis of the actual data environment of "Yin Qi Wen Yuan" (Yin Qi Wen Yuan, 2024) shows that recognizing mixed types of oracle bone inscriptions better matches practical needs, and the same holds for bronze inscription research.

(2) Low recognition accuracy or narrow applicability. The accuracy of most current methods remains relatively low. Even some studies on handwritten ancient characters report unsatisfactory results, and results on rubbings are weaker still, which makes these methods difficult to apply in practical research.

To achieve better results, we approach the problem from the data side. In our previous study, we found that a recognition method performed well on handwritten characters [15]. We therefore use a generative model that converts rubbings into handwritten form, bypassing the problem of recognizing rubbings directly.
Image generation has been a popular research topic in many application areas in recent years. Deep learning models have been applied to image-to-image translation, text-to-image generation, and image inpainting [16][17][18]. For ancient inscriptions, some researchers have used similar technologies to generate and restore ancient texts. For example, Kaneko et al. applied zero-shot restoration based on diffusion models to degraded ancient documents, using the inpainting capability of Denoising Diffusion Restoration Models (DDRM) to restore missing ancient characters [19]. Inspired by this work, we focus on U-Net, an architecture more commonly used for image segmentation [20]. Rubbing images are essentially black-and-white images, similar in this respect to medical images such as CT scans. Exploiting these properties of rubbings, we take oracle bone inscriptions and bronze inscriptions as the research objects and study a method for homogenizing rubbings and handwritten data.

3. Materials and Methods

3.1 Datasets

The experimental data include two types of ancient script: oracle bone inscriptions and bronze inscriptions. Images of each type fall into two categories: rubbings and handwritings. The oracle bone inscriptions dataset is collected from OBC306 and HWOBC on the Yin Qiwen Yuan Oracle Big Data Platform (https://jgw.aynu.edu.cn/home/down/index.html). The HWOBC dataset contains 3,881 handwritten oracle bone characters with 83,245 images. The OBC306 dataset contains 306 oracle bone characters in both handwritten and rubbing forms, as shown in Fig. 1. However, because the dataset is unevenly distributed, some characters have only one or two images, too few for experiments.
We therefore filtered the original datasets and selected as experimental data 165 oracle bone characters that have sufficient samples and good image quality in both datasets. The final dataset contains 12,000 images, 8,474 for training and 3,526 for testing, as shown in Table 1.

Table 1 The oracle bone inscriptions dataset

Data                   Training  Testing  Total
Handwritings (HWOBC)   3007      1254     4261
Rubbings (OBC)         5467      2272     7739
Mixed data (MSO)       8474      3526     12000

The bronze inscriptions dataset is collected from "Jinwen script compilation", as shown in Fig. 2. This dataset is still being processed and organized; Tencent and the Key Laboratory of Oracle Bone Inscriptions Information Processing are currently responsible for organizing the relevant data. Nevertheless, we selected 60 characters as experimental data. The final dataset contains 2,551 images, including rubbings and handwritings; details are given in Table 2.

Table 2 The bronze inscriptions dataset

Data                            Training  Testing  Total
Bronze Inscription Handwriting  897       452      1349
Bronze Inscription Rubbings     899       303      1202
Bronze Inscription Mixed data   1796      755      2551

3.2 Proposed Model

3.2.1 Overall Framework

In our preliminary research, we found that handwritten facsimiles are recognized with high accuracy because of their simple image structure [15]. For rubbings, noise factors such as wear and shield patterns in the image keep recognition accuracy relatively low. Therefore, when facing mixed data, we first normalize the input data and then perform recognition. The normalization is implemented with a U-Net-based module that converts rubbing data into handwriting data. We also propose a data augmentation method specifically targeting pictographic characters such as oracle bone script and bronze inscriptions.
We expect this targeted augmentation to be more effective than generic random augmentation; the contribution of each block is evaluated in the ablation experiments. The architecture of our model is shown in Fig. 3.

3.2.2 Data homogenization

The data homogenization block is based on U-Net, and spatial and channel attention mechanisms are introduced to improve results. The generator is shown in Fig. 4. For low-order features, a residual block with spatial attention is inserted before the third down-sampling step. For the high-order features after down-sampling, each channel of a feature map is treated as a feature detector, and channel attention is introduced into the residual blocks. During down-sampling, the residual block (ResBlock) can be described as follows:

$$F_{\mathrm{down}}(x) = \sigma\left(W_2 \cdot \sigma(W_1 \cdot x)\right) \tag{1}$$

where \(W_1\) and \(W_2\) are the weights of the two convolutions and the stride \(S\) of \(W_1\) is set to 2. The output size is:

$$\mathrm{Height}_{out} = \left\lfloor \frac{\mathrm{Height}_{in}}{2} \right\rfloor, \quad \mathrm{Width}_{out} = \left\lfloor \frac{\mathrm{Width}_{in}}{2} \right\rfloor \tag{2}$$

The skip connection must be down-sampled in step with the main path:

$$W_s \cdot x \tag{3}$$

where \(S = 2\) and the convolution kernel size is \(1 \times 1\). The final output is:

$$Y = F_{\mathrm{down}}(x) + W_s \cdot x \tag{4}$$

Up-sampling uses transposed convolution, with the output size given by Eq. (5):

$$N_{out} = (N_{in} - 1) \times S + F - 2P \tag{5}$$

where \(F = 2\) is the size of the \(2 \times 2\) convolution kernel, \(S = 2\) is the stride, and \(P = 0\) is the padding. Through this facsimile generation, all rubbings can be converted into handwritten form, giving a unified representation of the different kinds of oracle bone inscription data.
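The resolution bookkeeping in Eqs. (2) to (5) can be checked with a few lines of plain Python. This is a sketch under stated assumptions, not the authors' code: the paper does not give the input resolution, the number of down-sampling stages, or the kernel size and padding of \(W_1\), so the 64-pixel input, the four stages, and the 3×3 kernel with padding 1 below are hypothetical choices. With these choices an even-sized map is halved exactly, as Eq. (2) states; for odd sizes they give ⌈N/2⌉ rather than ⌊N/2⌋, so Eq. (2) holds exactly only for even sizes under this assumption.

```python
from typing import List


def conv_out(n: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a standard convolution along one spatial axis."""
    return (n + 2 * padding - kernel) // stride + 1


def tconv_out(n: int, kernel: int = 2, stride: int = 2, padding: int = 0) -> int:
    """Output length of a transposed convolution, Eq. (5): (N_in - 1) * S + F - 2P."""
    return (n - 1) * stride + kernel - 2 * padding


def encoder_sizes(size: int, stages: int) -> List[int]:
    """Spatial size after each residual down-sampling block (Eqs. 1 to 4)."""
    sizes = [size]
    for _ in range(stages):
        # Main path: W1 with stride 2; the 3x3 kernel and padding 1 are assumptions.
        main = conv_out(sizes[-1], kernel=3, stride=2, padding=1)
        # Skip path W_s: 1x1 kernel, stride 2, no padding (Eq. 3).
        skip = conv_out(sizes[-1], kernel=1, stride=2, padding=0)
        # Eq. (4) adds the two paths, so their sizes must match.
        assert main == skip
        sizes.append(main)
    return sizes


def decoder_sizes(size: int, stages: int) -> List[int]:
    """Spatial size after each 2x2, stride-2 transposed convolution (Eq. 5)."""
    sizes = [size]
    for _ in range(stages):
        sizes.append(tconv_out(sizes[-1]))
    return sizes


# Hypothetical 64x64 input and four stages; the paper states neither.
down = encoder_sizes(64, 4)
up = decoder_sizes(down[-1], 4)
print(down)  # [64, 32, 16, 8, 4]
print(up)    # [4, 8, 16, 32, 64]
```

Because the 2×2, stride-2, zero-padding transposed convolution of Eq. (5) exactly doubles the size, the decoder returns to the input resolution whenever the encoder halves it exactly, which is what lets U-Net concatenate encoder and decoder features at matching scales.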
3.2.3 Data augmentation

Based on the characteristics of ancient Chinese scripts such as oracle bone inscriptions and bronze inscriptions, we propose the following targeted data augmentation plan:

(1) Horizontal flipping. Define a binary random variable \(\alpha \sim \mathrm{Bernoulli}(0.5)\). The flipped image \(I_{flip}\) is given by:

$$I_{flip}(x, y) = \begin{cases} I(x, W - y), & \alpha = 1 \\ I(x, y), & \alpha = 0 \end{cases} \tag{6}$$

where \(W\) is the image width.

(2) Rotation. A random rotation angle \(\theta \sim U(-20^\circ, 20^\circ)\) is applied about the center \((x_c, y_c)\), giving the transformed coordinates:

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x - x_c \\ y - y_c \end{bmatrix} + \begin{bmatrix} x_c \\ y_c \end{bmatrix} \tag{7}$$

(3) Affine transformation. Scaling factor \(s \sim U(0.6, 1.2)\) and shear factor \(\beta \sim U(5, 13)\):

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} s & \beta \\ 0 & s \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \tag{8}$$

Empty areas are filled with white: \(I_{affine}(x', y') = 255\).

(4) Salt-and-pepper noise. Noise mask \(M \in \{0, 1\}^{H \times W}\), noise density \(\rho = 0.15\) (SNR = 0.85):

$$I_{noise}(x, y) = \begin{cases} 0, & M(x, y) = 1 \ (\text{pepper}) \\ 255, & M(x, y) = 0 \ (\text{salt}) \\ I(x, y), & \text{otherwise} \end{cases} \tag{9}$$

Application probability: \(P(\text{apply}) = 0.3\).

(5) Gaussian noise. Noise \(\eta \sim N(0, 0.05)\):

$$I_{gauss}(x, y) = \mathrm{clip}\left(I(x, y) + \eta \cdot 255,\ 0,\ 255\right) \tag{10}$$

Application probability: \(P(\text{apply}) = 0.3\).
(6) Brightness/contrast adjustment. Random gain \(\gamma \sim U(0.8, 1.2)\):

$$I_{bright}(x, y) = \mathrm{clip}\left(\gamma \cdot I(x, y),\ 0,\ 255\right) \tag{11}$$

(7) Grayscale conversion. Luminance transformation:

$$I_{grey}(x, y) = 0.299R(x, y) + 0.587G(x, y) + 0.114B(x, y) \tag{12}$$

(8) Gaussian blur. Kernel \(K\) of size \(7 \times 7\) with \(\sigma = 0.15\):

$$I_{blur} = I * K, \qquad K(u, v) = \frac{1}{2\pi\sigma^2} e^{-\frac{u^2 + v^2}{2\sigma^2}} \tag{13}$$

Application probability: \(P(\text{apply}) = 0.3\).

Applying the steps above yields an augmented experimental dataset. These randomly altered samples reduce the model's dependence on specific attributes and thereby improve its generalization ability.

4. Experiments and Results

4.1 Baseline

Deep convolutional neural networks (CNNs) have demonstrated exceptional performance in image recognition tasks, owing to inherent advantages including local receptive fields, parameter sharing, hierarchical feature learning, translation equivariance, and dimensionality reduction via pooling. Given these merits, we adopt seven representative deep learning architectures as baselines: AlexNet, VGG-19, ResNet-50, EfficientNet, ShuffleNet, Vision Transformer (ViT), and ConvNeXt. Comparative experiments against these baselines demonstrate the advantages of our proposed approach.

4.2 Experiment Setup

To ensure consistency and stability, all experiments were conducted on the same dedicated deep learning machine with identical hardware and software environments, as shown in Table 3.
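The augmentation operations of Section 3.2.3 can be sketched in plain Python on a grayscale image stored as a list of rows. This is an illustrative sketch, not the authors' implementation, and several details are assumptions: the three-way salt/pepper draw (keep a pixel with probability SNR = 0.85, otherwise set it to 255 or 0 with equal probability), reading 0.05 in Eq. (10) as a standard deviation, normalizing the blur kernel to sum to 1 instead of using the continuous 1/(2πσ²) factor, and replicating edge pixels during blurring. Rotation is shown only as the coordinate map of Eq. (7), without resampling, and the affine step (3) is omitted because the units of β are not stated. Note that with σ = 0.15 almost all of the kernel weight falls on the centre pixel, so the blur is very close to an identity operation.

```python
import math
import random
from typing import Callable, List

Image = List[List[int]]  # grayscale image as img[row][col], values 0..255

def clip(v: float) -> int:
    """Round and clamp a value to the 8-bit range [0, 255]."""
    return int(min(255, max(0, round(v))))

def hflip(img: Image, rng: random.Random) -> Image:
    """(1) Mirror every row with probability 0.5, i.e. alpha ~ Bernoulli(0.5)."""
    if rng.random() < 0.5:
        return [row[::-1] for row in img]
    return [row[:] for row in img]

def rotate_point(x: float, y: float, xc: float, yc: float, theta_deg: float):
    """(2) Eq. (7): rotate the point (x, y) about (xc, yc) by theta degrees."""
    t = math.radians(theta_deg)
    dx, dy = x - xc, y - yc
    return (math.cos(t) * dx - math.sin(t) * dy + xc,
            math.sin(t) * dx + math.cos(t) * dy + yc)

def salt_pepper(img: Image, rng: random.Random, snr: float = 0.85) -> Image:
    """(4) Keep each pixel with probability snr; otherwise salt or pepper, 50/50 (assumed)."""
    out = []
    for row in img:
        new_row = []
        for v in row:
            r = rng.random()
            if r < snr:
                new_row.append(v)
            elif r < snr + (1 - snr) / 2:
                new_row.append(255)  # salt
            else:
                new_row.append(0)  # pepper
        out.append(new_row)
    return out

def gaussian_noise(img: Image, rng: random.Random, std: float = 0.05) -> Image:
    """(5) Eq. (10): add eta * 255 with eta ~ N(0, std) (std reading assumed), then clip."""
    return [[clip(v + rng.gauss(0.0, std) * 255) for v in row] for row in img]

def brightness(img: Image, rng: random.Random) -> Image:
    """(6) Eq. (11): multiply by a gain gamma ~ U(0.8, 1.2), then clip."""
    gamma = rng.uniform(0.8, 1.2)
    return [[clip(gamma * v) for v in row] for row in img]

def to_gray(r: float, g: float, b: float) -> float:
    """(7) Eq. (12): luminance of an RGB pixel."""
    return 0.299 * r + 0.587 * g + 0.114 * b

def gaussian_kernel(size: int = 7, sigma: float = 0.15) -> List[List[float]]:
    """(8) Eq. (13) sampled on a size x size grid and normalized to sum to 1."""
    half = size // 2
    k = [[math.exp(-(u * u + v * v) / (2 * sigma ** 2)) for v in range(-half, half + 1)]
         for u in range(-half, half + 1)]
    total = sum(map(sum, k))
    return [[w / total for w in row] for row in k]

def gaussian_blur(img: Image, rng: random.Random) -> Image:
    """(8) Convolve with the kernel; rng is unused but keeps the maybe() signature."""
    k, h, w = gaussian_kernel(), len(img), len(img[0])
    half = len(k) // 2

    def px(r: int, c: int) -> int:
        # Replicate edge pixels when the kernel reaches past the border (assumed).
        return img[min(h - 1, max(0, r))][min(w - 1, max(0, c))]

    return [[clip(sum(k[i][j] * px(r + i - half, c + j - half)
                      for i in range(len(k)) for j in range(len(k))))
             for c in range(w)] for r in range(h)]

def maybe(op: Callable[[Image, random.Random], Image], img: Image,
          rng: random.Random, p: float = 0.3) -> Image:
    """Apply op with probability p; the paper uses P(apply) = 0.3 for steps 4, 5 and 8."""
    return op(img, rng) if rng.random() < p else img

# One possible order (an assumption): flip, brightness, then the probabilistic steps.
rng = random.Random(0)
img = [[0, 128, 255], [255, 128, 0]]
aug = brightness(hflip(img, rng), rng)
for op in (salt_pepper, gaussian_noise, gaussian_blur):
    aug = maybe(op, aug, rng)
```

Passing an explicit `random.Random` instance to every operation keeps a whole augmentation run reproducible from a single seed, which is convenient when comparing the ablation configurations.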
Table 3 Experimental Environment

Hardware environment
Hardware name            Specific model
GPU                      NVIDIA TITAN XP ×4
GPU memory               12 GB
CPU                      Intel E5-2683 v3
Computer memory          32 GB

Software environment
Software name            Specific model
Operating system         Ubuntu 20.04.4 LTS
Deep learning framework  PyTorch 1.7.1
Development environment  Python 3.8 + PyCharm
Graphics driver          NVIDIA 470.53
CUDA version             v11.0

Table 4 summarizes the hyperparameters used for each model. The hyperparameters were selected to ensure effective training. A batch size of 64 was used for all models to balance training efficiency and memory usage. Cross-entropy loss was used for all models because of its effectiveness in classification tasks. All models used the AdamW optimizer, which decouples weight decay from the adaptive gradient update and provides adaptive learning rates, promoting stable and efficient convergence. The learning rate was set to 1e-4 for most models; ShuffleNet used a slightly higher rate of 5e-4 owing to its lightweight architecture and faster convergence. Weight decay was set to 0.05 for all models to prevent overfitting. Each model was trained for 200 epochs to ensure sufficient learning.

Table 4 Hyperparameters for each model

Model         Batch size  Loss function  Optimizer  Learning rate  Weight decay  Epochs
AlexNet       64          Cross-Entropy  AdamW      1e-4           0.05          200
VGG19         64          Cross-Entropy  AdamW      1e-4           0.05          200
ResNet50      64          Cross-Entropy  AdamW      1e-4           0.05          200
ConvNext      64          Cross-Entropy  AdamW      1e-4           0.05          200
EfficientNet  64          Cross-Entropy  AdamW      1e-4           0.05          200
ShuffleNet    64          Cross-Entropy  AdamW      5e-4           0.05          200
ViT           64          Cross-Entropy  AdamW      1e-4           0.05          200

4.3 Results

We applied both the data homogenization and augmentation modules to every evaluated model and then analyzed the resulting performance.
Table 5 presents the comparative results on the mixed dataset for four key metrics: Top-1 accuracy, F1 score, AUC, and Top-5 accuracy. The table contrasts two configurations: (1) Base, the baseline approach in which models are trained solely on the original heterogeneous dataset containing both rubbing and handwritten images without any preprocessing; and (2) CA*, our proposed framework, which combines the data homogenization and augmentation components.

Table 5 Performance comparison on the oracle bone inscriptions dataset (each cell: Base / CA*)

Model         Top-1 Acc ↑    F1 Score ↑     AUC ↑          Top-5 Acc ↑
AlexNet       0.712 / 0.894  0.707 / 0.893  0.989 / 0.999  0.907 / 0.973
VGG19         0.711 / 0.913  0.707 / 0.912  0.985 / 0.999  0.898 / 0.979
ResNet50      0.482 / 0.921  0.477 / 0.920  0.964 / 0.999  0.764 / 0.984
ConvNext      0.548 / 0.909  0.544 / 0.910  0.977 / 0.999  0.808 / 0.986
EfficientNet  0.712 / 0.912  0.710 / 0.919  0.988 / 0.999  0.928 / 0.985
ShuffleNet    0.731 / 0.933  0.732 / 0.931  0.991 / 0.999  0.904 / 0.988
ViT           0.319 / 0.856  0.314 / 0.854  0.894 / 0.998  0.553 / 0.962

On the mixed dataset, the baseline models achieved only moderate performance, with Top-1 accuracy ranging from 0.319 (ViT) to 0.731 (ShuffleNet). With our data homogenization and augmentation modules integrated, results improved substantially. As illustrated in Fig. 5, Top-1 accuracy increased by at least 18 percentage points for every model. Notably, the Vision Transformer (ViT) improved dramatically, from 0.319 to 0.856. These results support the efficacy of our method for oracle bone inscription recognition.

To further assess the method, we evaluated it on the bronze inscriptions dataset. Table 6 shows the performance of the baseline methods and the proposed method.
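The two accuracy-style metrics in Table 5 can be made concrete with a small plain-Python sketch. Top-k accuracy counts a sample as correct when its true class is among the k highest-scoring classes (k = 1 and k = 5 in the tables). The F1 score is computed here as an unweighted (macro) average of per-class F1, which is an assumption, since the paper does not state its averaging scheme; AUC is not shown. The function names and toy scores are hypothetical.

```python
from typing import List, Sequence


def ranked_classes(scores: Sequence[float]) -> List[int]:
    """Class indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)


def topk_accuracy(all_scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = sum(y in ranked_classes(s)[:k] for s, y in zip(all_scores, labels))
    return hits / len(labels)


def macro_f1(preds: Sequence[int], labels: Sequence[int], num_classes: int) -> float:
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    f1s = []
    for c in range(num_classes):
        tp = sum(p == c and y == c for p, y in zip(preds, labels))
        fp = sum(p == c and y != c for p, y in zip(preds, labels))
        fn = sum(p != c and y == c for p, y in zip(preds, labels))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / num_classes


# Toy example: 4 samples, 3 classes (hypothetical scores).
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5], [0.6, 0.3, 0.1]]
labels = [1, 1, 2, 2]
preds = [ranked_classes(s)[0] for s in scores]
print(topk_accuracy(scores, labels, 1))  # 0.5
print(topk_accuracy(scores, labels, 2))  # 0.75
print(macro_f1(preds, labels, 3))        # 0.444... (= 4/9)
```

Top-5 accuracy is always at least as high as Top-1 accuracy, which is why the Top-5 columns in the tables sit above the Top-1 columns for every model.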
Table 6 Performance comparison on the bronze inscriptions dataset (each cell: Base / CA*)

Model         Top-1 Acc ↑    F1 Score ↑     AUC ↑          Top-5 Acc ↑
AlexNet       0.588 / 0.798  0.576 / 0.793  0.960 / 0.991  0.838 / 0.938
VGG19         0.550 / 0.815  0.536 / 0.809  0.951 / 0.990  0.831 / 0.941
ResNet50      0.401 / 0.798  0.379 / 0.790  0.914 / 0.991  0.665 / 0.946
ConvNext      0.496 / 0.750  0.480 / 0.740  0.937 / 0.986  0.785 / 0.918
EfficientNet  0.581 / 0.824  0.568 / 0.819  0.965 / 0.987  0.854 / 0.947
ShuffleNet    0.580 / 0.859  0.577 / 0.854  0.960 / 0.993  0.877 / 0.957
ViT           0.246 / 0.574  0.223 / 0.562  0.838 / 0.958  0.557 / 0.828

The results show that the proposed data homogenization and augmentation blocks improved Top-1 accuracy, F1 score, AUC, and Top-5 accuracy for every model, even though the bronze inscriptions dataset is relatively small. Taking Top-1 accuracy as an example, as shown in Fig. 6, accuracy increased by at least 21 percentage points. In summary, for multi-modal ancient scripts, the proposed method performed notably better than the baseline methods.

4.4 Ablation Study

To rigorously validate the efficacy of our proposed method, we conducted systematic ablation studies. The key contribution of this work lies in integrating the data homogenization block and the augmentation block with the recognition block. Through controlled experiments, we quantitatively assessed the individual impact of each block on recognition performance.

4.4.1 The impact of data homogenization

In the first ablation experiment, we isolated the effect of data homogenization by removing only the data augmentation block while retaining the other components, and evaluated recognition performance with only the homogenization block on both the oracle bone and bronze inscription datasets. Top-1 and Top-5 accuracy were used as evaluation metrics. The detailed results of this configuration are presented in Table 7.
Table 7 Accuracy comparison on the oracle bone inscriptions dataset: baseline models (Base), proposed method with only the data homogenization block (C only), and complete method (CA*) (each cell: Base / C only / CA*)

Model         Top-1 Acc ↑            Top-5 Acc ↑
AlexNet       0.712 / 0.844 / 0.894  0.907 / 0.963 / 0.973
VGG19         0.711 / 0.764 / 0.913  0.898 / 0.911 / 0.979
ResNet50      0.482 / 0.332 / 0.921  0.764 / 0.601 / 0.984
ConvNext      0.548 / 0.725 / 0.909  0.808 / 0.931 / 0.986
EfficientNet  0.712 / 0.836 / 0.912  0.928 / 0.969 / 0.985
ShuffleNet    0.731 / 0.846 / 0.933  0.904 / 0.973 / 0.988
ViT           0.319 / 0.346 / 0.856  0.553 / 0.734 / 0.962

Adding the data homogenization block improved the recognition performance of every baseline model except ResNet50, though the margins were limited. Compared with the complete model, architectures using only the homogenization block had lower Top-1 and Top-5 accuracy in every case. AlexNet, ConvNext, EfficientNet, and ShuffleNet gained relatively more from homogenization, whereas ViT and VGG19 improved only marginally in Top-1 accuracy. In contrast, ResNet50 degraded markedly, with Top-1 accuracy falling from 0.482 to 0.332. The complete model consistently achieved significant accuracy improvements, underscoring the synergy of the integrated modules.

The same pattern appears on the bronze inscriptions. Table 8 shows the Top-1 and Top-5 accuracy of all compared methods.
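The Top-1 changes discussed for Table 7 can be read off the table directly. The short plain-Python sketch below copies the Top-1 columns of Table 7 into a dictionary (the variable and function names are hypothetical) and computes, for each model, the percentage-point change from Base to C only and the remaining gap from C only to CA*.

```python
# Top-1 accuracy (Base, C only, CA*) copied from Table 7 (oracle bone inscriptions).
TABLE7_TOP1 = {
    "AlexNet": (0.712, 0.844, 0.894),
    "VGG19": (0.711, 0.764, 0.913),
    "ResNet50": (0.482, 0.332, 0.921),
    "ConvNext": (0.548, 0.725, 0.909),
    "EfficientNet": (0.712, 0.836, 0.912),
    "ShuffleNet": (0.731, 0.846, 0.933),
    "ViT": (0.319, 0.346, 0.856),
}


def pp(before: float, after: float) -> float:
    """Change from `before` to `after` in percentage points, rounded to 0.1."""
    return round((after - before) * 100, 1)


homogenization_gain = {m: pp(base, c) for m, (base, c, _) in TABLE7_TOP1.items()}
remaining_gap = {m: pp(c, full) for m, (_, c, full) in TABLE7_TOP1.items()}

print(homogenization_gain)
print([m for m, g in homogenization_gain.items() if g < 0])  # ['ResNet50']
```

Every entry of `remaining_gap` is positive, which matches the observation that the homogenization block alone never reaches the accuracy of the complete model.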
Table 8 Accuracy comparison on the bronze inscriptions dataset: baseline models (Base), proposed method with only the data homogenization block (C only), and complete method (CA*) (each cell: Base / C only / CA*)

Model         Top-1 Acc ↑            Top-5 Acc ↑
AlexNet       0.588 / 0.685 / 0.798  0.838 / 0.888 / 0.938
VGG19         0.550 / 0.585 / 0.815  0.831 / 0.846 / 0.941
ResNet50      0.401 / 0.315 / 0.798  0.665 / 0.642 / 0.946
ConvNext      0.496 / 0.573 / 0.750  0.785 / 0.804 / 0.918
EfficientNet  0.581 / 0.642 / 0.824  0.854 / 0.892 / 0.947
ShuffleNet    0.580 / 0.692 / 0.859  0.877 / 0.912 / 0.957
ViT           0.246 / 0.311 / 0.574  0.557 / 0.627 / 0.828

The results in Table 8 are similar to those in Table 7. The data homogenization block improved the performance of every baseline model except ResNet50, but the improvement is limited; ResNet50 again declined, with Top-1 accuracy falling from 0.401 to 0.315. A gap in accuracy relative to the complete model remained.

4.4.2 The impact of data augmentation

To further evaluate the role of data augmentation, we performed comparative experiments analyzing its isolated contribution. Table 9 presents the Top-1 and Top-5 accuracy of three configurations on the oracle bone inscriptions dataset: (1) baseline models without augmentation, (2) models with only the augmentation block, and (3) the complete integrated model.

Table 9 Accuracy comparison on the oracle bone inscriptions dataset: baseline models (Base), proposed method with only the data augmentation block (A only), and complete method (CA*) (each cell: Base / A only / CA*)

Model         Top-1 Acc ↑            Top-5 Acc ↑
AlexNet       0.712 / 0.801 / 0.894  0.907 / 0.942 / 0.973
VGG19         0.711 / 0.885 / 0.913  0.898 / 0.971 / 0.979
ResNet50      0.482 / 0.870 / 0.921  0.764 / 0.969 / 0.984
ConvNext      0.548 / 0.775 / 0.909  0.808 / 0.916 / 0.986
EfficientNet  0.712 / 0.866 / 0.912  0.928 / 0.963 / 0.985
ShuffleNet    0.731 / 0.893 / 0.933  0.904 / 0.972 / 0.988
ViT           0.319 / 0.528 / 0.856  0.553 / 0.690 / 0.962

Table 10 shows the results of the different models on the bronze inscriptions dataset. Both tables indicate that the data augmentation block also improves the performance of every baseline model.
Compared with the complete model, however, the models with only data augmentation achieved lower accuracy. In summary, the ablation experiments indicate that both blocks added to the recognition process improve the accuracy of ancient character recognition. Each block helps on its own for most models, and the complete model improves the performance of the baseline models the most.

Table 10 Accuracy comparison on the bronze inscriptions dataset: baseline models (Base), proposed method with only the data augmentation block (A only), and complete method (CA*) (each cell: Base / A only / CA*)

Model         Top-1 Acc ↑            Top-5 Acc ↑
AlexNet       0.588 / 0.731 / 0.798  0.838 / 0.923 / 0.938
VGG19         0.550 / 0.751 / 0.815  0.831 / 0.918 / 0.941
ResNet50      0.401 / 0.744 / 0.798  0.665 / 0.919 / 0.946
ConvNext      0.496 / 0.698 / 0.750  0.785 / 0.891 / 0.918
EfficientNet  0.581 / 0.776 / 0.824  0.854 / 0.912 / 0.947
ShuffleNet    0.580 / 0.783 / 0.859  0.877 / 0.935 / 0.957
ViT           0.246 / 0.454 / 0.574  0.557 / 0.735 / 0.828

Conclusion

We propose a deep learning-based data processing mechanism for ancient text recognition with two key components: (1) a U-Net-based data normalization block that handles multimodal script variations, and (2) a customized augmentation block that enhances data robustness. The processed data are then fed into CNN models for final recognition. Experimental results on oracle bone and bronze inscription datasets show significant improvements over the baseline models. For instance, ResNet50's Top-1 accuracy increased from 0.482 to 0.921 (about 91% relative improvement) on oracle bone inscriptions and from 0.401 to 0.798 (about 99% relative improvement) on bronze inscriptions. Ablation studies reveal that while each block contributes to performance gains (homogenization: +22.4% avg., augmentation: +18.7% avg.), their combination in the complete method yields the best results (+102.3% avg. improvement).
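The Top-1 gains quoted in Section 4.3 and above can be recomputed from Tables 5 and 6. The plain-Python sketch below copies the Base and CA* Top-1 columns of those tables (the names are hypothetical) and reports, for each model, the absolute gain in percentage points and the relative gain (new - base) / base. For example, it gives ResNet50 a relative gain of about 91% on oracle bone inscriptions and about 99% on bronze inscriptions, and a smallest absolute gain of 18.2 and 21.0 percentage points (both AlexNet) on the two datasets.

```python
# Top-1 accuracy (Base, CA*) copied from Table 5 (oracle bone) and Table 6 (bronze).
TOP1 = {
    "oracle bone": {
        "AlexNet": (0.712, 0.894), "VGG19": (0.711, 0.913), "ResNet50": (0.482, 0.921),
        "ConvNext": (0.548, 0.909), "EfficientNet": (0.712, 0.912),
        "ShuffleNet": (0.731, 0.933), "ViT": (0.319, 0.856),
    },
    "bronze": {
        "AlexNet": (0.588, 0.798), "VGG19": (0.550, 0.815), "ResNet50": (0.401, 0.798),
        "ConvNext": (0.496, 0.750), "EfficientNet": (0.581, 0.824),
        "ShuffleNet": (0.580, 0.859), "ViT": (0.246, 0.574),
    },
}


def gains(base: float, new: float) -> tuple:
    """Absolute gain in percentage points and relative gain in percent, rounded to 0.1."""
    return round((new - base) * 100, 1), round((new - base) / base * 100, 1)


results = {
    dataset: {model: gains(b, n) for model, (b, n) in rows.items()}
    for dataset, rows in TOP1.items()
}

for dataset, table in results.items():
    weakest = min(table, key=lambda m: table[m][0])
    print(f"{dataset}: ResNet50 {table['ResNet50']}, smallest gain {weakest} {table[weakest][0]} pp")
```

Reporting both quantities matters here: relative gains are largest for the weakest baselines (ResNet50, ViT), whereas absolute gains show how much accuracy each model actually recovers.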
This study's primary contribution is a standardized preprocessing pipeline for multimodal ancient scripts. The proposed two-block mechanism provides both a theoretical foundation and a practical methodology for archaeological text analysis, with experimentally validated efficacy.

Declarations

Author contributions
Nan Wang: Conceptualization. Nan Wang and Bang Li: Methodology. Weichen Wang: Software. Weichen Wang and Nan Wang: Validation. Nan Wang: Writing (original draft preparation). Han Zhang: Writing (review and editing). Qingju Jiao and Chaofan Liu: Data preparation.

Declaration of competing interest
The authors declare that there are no conflicts of interest.

Acknowledgments
This study was funded by the Natural Science Foundation of Henan Province (grant number 242300420680) and the Henan Province Science and Technology Research Project (grant number 222102320036).

Data availability statement
The original data used in this paper (HWOBC and OBC306) can be obtained from http://jgw.aynu.edu.cn/DownPage; the complete data can be obtained from https://github.com/Augety88/oracle-jinwen-code-data.

Code availability statement
The underlying code for this study is available at https://github.com/Augety88/oracle-jinwen-code-data.

References

[1] Assael Y, et al. Restoring and attributing ancient texts using deep neural networks. Nature. 2022;603(7900):280–3.
[2] Liu YG, Liu GY. Oracle bone inscription recognition based on SVM. J Anyang Normal Univ. 2017;2:54–6.
[3] Gu ST. Identification of oracle-bone script fonts based on fractal geometry. J Chin Inform Process. 2018;32(10).
[4] Qu HY, Liu JZ, Wu J. Oracle-bone inscriptions recognition based on topological features. Comput Sci Application. 2019;9(6):1111–7.
[5] Zhao RQ, Wang HQ, Wang K, Wang Z, Liu WT. Recognition of bronze inscriptions image based on mixed features of histogram of oriented gradient and gray level co-occurrence matrix. Laser Optoelectron Progress. 2020;57(12):90–6.
Liu MT, Liu GY, Liu YG, Jiao QJ. Oracle bone inscriptions recognition based on deep convolutional neural network. J image graphics. 2020;8(4):114–9. Fujikawa Y, et al. Recognition of oracle bone inscriptions by using two deep learning models. Int J Digit Humanit. 2023;5(2):65–79. Meng L, Kamitoku N, Yamazaki K. Recognition of oracle bone inscriptions using deep learning based on data augmentation. 2018 metrology for archaeology and cultural heritage . IEEE; 2018. pp. 33–8. Mai C, Penava P, Buettner R. Oracle Bone Inscription Character Recognition based on a novel Convolutional Neural Network Architecture[J]. IEEE Access. 2024;12:197021–34. Qiao YG, Xing LZ. Applying Deep Learning Algorithms for Automatic Recognition and Transcription of Texts in Oracle Bones and Golden Texts. Appl Math Nonlinear Sci. 2023;9(1):1–16. Wu XQ, Wang ZY, Ren P. CNN-based Bronze Inscriptions Character Recognition. 2022 5th International Conference on Advanced Electronic Materials, Computers and Software Engineering . IEEE, 514–519 (2022). Zhao HS, Chu HZ, Zhang YY, Yu J. Improvement of ancient Shui character recognition model based on convolutional neural network. IEEE Access. 2020;8:33080–7. Xu Y, Zhang XY, Zhang ZX, Liu CL. Large-scale continual learning for ancient Chinese character recognition. Pattern Recogn. 2024;150:110283. Barucci A, et al. A deep learning approach to ancient egyptian hieroglyphs classification. IEEE Access. 2021;9:123438–47. Wang N, Wang CJ, Jiao QJ. Research on Handwritten Oracle Bone Inscriptions Recognition Based on EasyDL. Electron Technol Softw Eng. 2023;3:184–7. Parmar G et al. Zero-shot image-to-image translation. ACM SIGGRAPH 2023 conference proceedings. ACM, 1–11 (2023). Li YH et al. Gligen: Open-set grounded text-to-image generation. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 22511–22521 (2023). Zhang XB, Zhai DH, Li TR, Zhou YX, Lin Y. Image inpainting based on deep learning: A review. Inform Fusion. 
2023;90:74–94. Kaneko H, Yoshizu Y, Ishibashi R, Meng L. An attempt at zero-shot ancient documents restoration based on diffusion models. 2023 International Conference on Advanced Mechatronic Systems (ICAMechS). IEEE, 1–6 (2023). Chen BZ, Liu YS, Zhang Z, Lu GM, Kong A, Transattunet. Multi-level attention-guided u-net with transformer for medical image segmentation. IEEE Trans Emerg Top Comput Intell. 2023;8(1):55–68. Additional Declarations No competing interests reported. Cite Share Download PDF Status: Published Journal Publication published 15 Oct, 2025 Read the published version in npj Heritage Science → Version 1 posted Editorial decision: Revision requested 29 Jun, 2025 Reviews received at journal 27 Jun, 2025 Reviewers agreed at journal 16 Jun, 2025 Reviews received at journal 16 Jun, 2025 Reviewers agreed at journal 13 Jun, 2025 Reviewers invited by journal 13 Jun, 2025 Editor assigned by journal 12 Jun, 2025 Submission checks completed at journal 12 Jun, 2025 First submitted to journal 11 Jun, 2025 You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. 
© Research Square 2026 | ISSN 2693-5015 (online)
{"props":{"pageProps":{"initialData":{"identity":"rs-6871550","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":471568867,"identity":"a646434a-c290-4c98-be20-85cfe94a5a9f","order_by":0,"name":"Nan Wang","email":"","orcid":"","institution":"Anyang Normal University","correspondingAuthor":true,"prefix":"","firstName":"Nan","middleName":"","lastName":"Wang","suffix":""},{"id":471568868,"identity":"fe67ca54-dcce-45f3-90b5-3576b0e97375","order_by":1,"name":"Weichen Wang","email":"","orcid":"","institution":"Anyang Normal University","correspondingAuthor":false,"prefix":"","firstName":"Weichen","middleName":"","lastName":"Wang","suffix":""},{"id":471568869,"identity":"1f424e77-be07-4801-bcc9-71e66974d9ec","order_by":2,"name":"Bang Li","email":"","orcid":"","institution":"Ministry of Education of China, Anyang Normal University","correspondingAuthor":false,"prefix":"","firstName":"Bang","middleName":"","lastName":"Li","suffix":""},{"id":471568870,"identity":"b0719eeb-73bf-4508-89d6-941e5241bba6","order_by":3,"name":"Han Zhang","email":"","orcid":"","institution":"Anyang Normal
University","correspondingAuthor":false,"prefix":"","firstName":"Han","middleName":"","lastName":"Zhang","suffix":""},{"id":471568871,"identity":"6f16fc2d-0344-43fb-9d5f-49c609075ea5","order_by":4,"name":"Qingju Jiao","email":"","orcid":"","institution":"Anyang Normal University","correspondingAuthor":false,"prefix":"","firstName":"Qingju","middleName":"","lastName":"Jiao","suffix":""},{"id":471568872,"identity":"90ccc352-b852-46ca-bde7-ed4a05d0e4e5","order_by":5,"name":"Chaofan Liu","email":"","orcid":"","institution":"Anyang Normal University","correspondingAuthor":false,"prefix":"","firstName":"Chaofan","middleName":"","lastName":"Liu","suffix":""}],"badges":[],"createdAt":"2025-06-11 11:53:28","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6871550/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6871550/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1038/s40494-025-02095-x","type":"published","date":"2025-10-15T15:57:59+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":84684781,"identity":"7e77ebb3-f465-4c96-b933-68f2897c7abb","added_by":"auto","created_at":"2025-06-16 08:49:14","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":20266,"visible":true,"origin":"","legend":"\u003cp\u003eExamples of oracle bone inscriptions\u003c/p\u003e","description":"","filename":"Onlinefloatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-6871550/v1/798011c35c4c6ae858f4b12c.png"},{"id":84684782,"identity":"2b4c0b7d-96ee-4bea-9164-0d35ff29357e","added_by":"auto","created_at":"2025-06-16 08:49:14","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":15778,"visible":true,"origin":"","legend":"\u003cp\u003eExamples of bronze 
inscriptions\u003c/p\u003e","description":"","filename":"Onlinefloatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-6871550/v1/e0ea98ad0049121c358e08bb.png"},{"id":84686506,"identity":"cbb74712-b8d7-4ef1-a259-38f1cd9d8caf","added_by":"auto","created_at":"2025-06-16 08:57:14","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":209987,"visible":true,"origin":"","legend":"\u003cp\u003eThe framework of the proposed method\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-6871550/v1/cec15d80c38a31240af24673.png"},{"id":84684786,"identity":"e0b0761c-39dd-427f-b99a-705cb5fd74e3","added_by":"auto","created_at":"2025-06-16 08:49:14","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":140536,"visible":true,"origin":"","legend":"\u003cp\u003eGenerator structure\u003c/p\u003e","description":"","filename":"floatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-6871550/v1/ffddf7c00854e98ce3523a32.png"},{"id":84686507,"identity":"86dcd7dc-9f50-46a0-9d76-99de49853a58","added_by":"auto","created_at":"2025-06-16 08:57:15","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":330675,"visible":true,"origin":"","legend":"\u003cp\u003eAccuracy curves of various models on the oracle bone inscriptions dataset: with and without the proposed CA* strategy\u003c/p\u003e","description":"","filename":"floatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-6871550/v1/dcea21ba149fb0da0ed4e320.png"},{"id":84684790,"identity":"2636710b-ba64-4c02-98df-659a9ffa6d0e","added_by":"auto","created_at":"2025-06-16 08:49:15","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":357975,"visible":true,"origin":"","legend":"\u003cp\u003eAccuracy curves of various models on the bronze inscription dataset: with 
and without the proposed CA* strategy\u003c/p\u003e","description":"","filename":"floatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-6871550/v1/20a640d12ac48c5a29ba3b82.png"},{"id":93957048,"identity":"85930cc8-eb9a-4994-b74f-0edc99daceec","added_by":"auto","created_at":"2025-10-20 16:12:51","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2490427,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6871550/v1/0c27cea9-3991-4af0-867d-26b4ede6cc86.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Multi-Modal Ancient Script Recognition via deep learning with Data Homogenization and Augmentation","fulltext":[{"header":"1. Introduction","content":"\u003cp\u003eCharacter recognition is one of the foundational pillars of ancient text research and holds significant importance for exploring ancient civilizations. Ancient Greece, Rome, Egypt, and China each developed unique writing systems, many of which have faded or disappeared over centuries. Today, these texts are preserved in unearthed cultural relics, often fragmented and displaced\u0026mdash;sometimes far from their original locations\u0026mdash;due to natural decay or human activities such as trafficking.\u003c/p\u003e \u003cp\u003eTraditional methods of ancient script recognition rely heavily on access to extensive repositories of information and on the expertise of scholars. This process primarily depends on a researcher\u0026rsquo;s accumulated experience and the corpus they have access to. When specialists study these inscriptions, they must invest considerable effort and care in organizing relevant materials, often engaging in high-threshold tasks such as reconstructing missing texts and conducting comprehensive literature reviews. 
As a result, traditional methods are highly complex and time-consuming and require specialized workflows, and their limitations have become increasingly apparent in recent years.\u003c/p\u003e \u003cp\u003eThe advent of artificial intelligence (AI) and deep learning technologies has opened new avenues for researchers, enabling them to uncover and leverage intricate statistical patterns within vast datasets. A notable example is \u003cem\u003eIthaca\u003c/em\u003e \u003csup\u003e[\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]\u003c/sup\u003e, a groundbreaking tool that has reinforced confidence in this emerging research direction.\u003c/p\u003e \u003cp\u003eTo enhance the application of recognition technology in the study of ancient Chinese script, we propose an ancient script recognition model based on separate studies of oracle bone inscriptions and bronze inscriptions. The model comprises two main components: a data preprocessing module, which includes a cross-modal data homogenization block and a data augmentation block; and a recognition module based on a CNN model. Our method is specifically designed to address the following challenges in existing research:\u003c/p\u003e \u003cp\u003e(1) Limited data scale and highly uneven distribution. Existing data on oracle bone script and bronze inscriptions consist primarily of rubbings and their replicas, and the overall volume of data is relatively limited. For instance, in oracle bone script, some characters appear thousands of times, whereas others occur only once or twice. In deep learning, particularly with large models, the limited data volume hampers the application of related technologies.\u003c/p\u003e \u003cp\u003e(2) Diverse forms of inscriptions. In addition to rubbings, handwritten forms are another primary medium for oracle bone inscriptions and bronze inscriptions. These handwritten forms are frequently encountered in academic papers and monographs. 
When information is retrieved from diverse sources, both rubbings and handwritten forms are involved. Therefore, the recognition system must accommodate hybrid-type data.\u003c/p\u003e \u003cp\u003e(3) High resource demand. Deep learning, particularly large language models, requires substantial hardware resources. This poses significant constraints on the advancement of related research, especially given that the study of ancient Chinese script is a niche field. Therefore, exploring methods that achieve effective recognition results with fewer resources is of critical importance.\u003c/p\u003e \u003cp\u003eOur model can effectively address the problems of small data scale and hybrid inputs, and it uses CNN models to improve the accuracy of ancient script recognition.\u003c/p\u003e"},{"header":"2. Related Works","content":"\u003cp\u003eSeveral studies have addressed the recognition of oracle bone inscriptions and bronze inscriptions. Some early exploratory studies used traditional machine learning methods, in which features of ancient character images are analyzed and transformed into corresponding structural encodings. Classification algorithms such as support vector machines and K-nearest neighbors are then applied to these encodings to recognize the characters.\u003c/p\u003e \u003cp\u003eFor example, Liu et al. proposed a recognition algorithm that employs a support vector machine (SVM) to classify the features of different handwritten oracle bone inscriptions \u003csup\u003e[\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]\u003c/sup\u003e. 
Gu divided the planar glyph of each oracle bone inscription into four quadrants, then applied fractal geometry principles to encode the inscription as a description code, which is matched against a fractal feature library of oracle bone inscriptions for recognition \u003csup\u003e[\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]\u003c/sup\u003e. Qu et al. analyzed the topological structure of oracle bone inscriptions and, through extensive analysis of character structures, constructed topological feature relationships such as topological feature points and connected domains; they then used similarity comparison to recognize oracle bone inscriptions \u003csup\u003e[\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]\u003c/sup\u003e. Zhao et al. fused HOG and GLCM features and used an SVM model to recognize bronze inscriptions \u003csup\u003e[\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eThese traditional methods generally recognize ancient inscriptions with simple line structures robustly. However, challenges remain for characters with complex structures, particularly multi-component characters, for which recognition performance tends to be significantly weaker.\u003c/p\u003e \u003cp\u003eIn recent studies, deep learning models are more commonly applied to ancient script recognition. For example, Liu et al. studied oracle bone inscription recognition based on deep convolutional neural networks and obtained relatively accurate results on a specific dataset \u003csup\u003e[\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]\u003c/sup\u003e. Fujikawa et al. 
proposed a two-step method for oracle bone inscription recognition: YOLO first performs automatic detection and recognition, and MobileNet is then used in the second step to handle manually the oracle bone inscriptions left undetected in the image \u003csup\u003e[\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]\u003c/sup\u003e. Meng et al. (2018) proposed an oracle bone inscription recognition method based on data augmentation and a deep learning model, which achieved good results. Guo et al. (2022) proposed an improved neural network model based on Inception-v3 for oracle bone inscription character recognition and achieved good results \u003csup\u003e[\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]\u003c/sup\u003e. Mai et al. presented a new convolutional neural network architecture for recognizing oracle bone inscriptions, based on Inception modules and residual connections \u003csup\u003e[\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]\u003c/sup\u003e. Qiao et al. used generative adversarial networks for image augmentation, a Pix2Pix model for text restoration, and ResNet50 for robust feature extraction; this method achieves good recognition accuracy on both oracle bone and bronze inscription datasets \u003csup\u003e[\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]\u003c/sup\u003e. Wu et al. 
designed a convolutional neural network (CNN)-based model, enhanced attention mechanisms by introducing a spatial transformer network (STN), incorporated a robust loss function for implicit semantic data augmentation (ISDA), and leveraged a newly constructed large-scale bronze inscription character dataset, achieving an accuracy of 91.21% in automated recognition \u003csup\u003e[\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eIn addition to the aforementioned studies, some scholars have conducted research on other types of ancient characters and obtained good recognition results with different models. For example, Zhao et al. constructed a generative model based on a multilayer adversarial neural network with a Laplacian structure, which can effectively recognize Shui characters \u003csup\u003e[\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e]\u003c/sup\u003e. Xu et al. proposed a large-scale continual learning framework based on a convolutional prototype network, which works without saving the raw data of old classes and classifies all existing classes simultaneously without knowing the incremental batch number \u003csup\u003e[\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]\u003c/sup\u003e. Barucci et al. proposed an improved CNN model that can effectively recognize and classify ancient Egyptian hieroglyphs \u003csup\u003e[\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eExisting research still has some shortcomings:\u003c/p\u003e \u003cp\u003e(1) Research on mixed data is insufficient. Most existing studies target the recognition of a single type of ancient inscription, and only a few focus on recognition methods for mixed data (Zhang et al., 2021). 
However, an analysis of the actual data environment of \u0026ldquo;Yin Qi Wen Yuan\u0026rdquo; (Yin Qi Wen Yuan, 2024) shows that recognizing mixed types of oracle bone inscriptions better matches actual needs, and the same holds for bronze inscription research.\u003c/p\u003e \u003cp\u003e(2) Low recognition accuracy or narrow applicability. Current research results show that the accuracy of most methods remains relatively low. Even some studies on handwritten ancient characters do not achieve good results, and results on rubbings are weaker still, so these methods are difficult to apply in practical research.\u003c/p\u003e \u003cp\u003eTo achieve better results, we put forward a new research idea based on data augmentation. In our previous studies, we found that a method could obtain good results on handwriting recognition \u003csup\u003e[\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]\u003c/sup\u003e. Therefore, we use a generative model that transfers rubbings into handwriting-style images to bypass the problem of rubbing recognition.\u003c/p\u003e \u003cp\u003eImage generation has been a popular research topic in many application areas in recent years. Based on deep learning models, researchers have studied image-to-image translation, text-to-image generation, and image inpainting \u003csup\u003e[\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e][\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e][\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e]\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eRegarding ancient inscriptions, some researchers have applied similar technologies to the generation and restoration of ancient texts. For example, Kaneko et al. 
applied zero-shot restoration based on diffusion models to degraded ancient documents; specifically, they leveraged the inpainting capability of Denoising Diffusion Restoration Models (DDRM) to recover missing ancient characters \u003csup\u003e[\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e]\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eInspired by this method, we focus on U-Net, which is more often used for image segmentation tasks \u003csup\u003e[\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e]\u003c/sup\u003e. Rubbing images are essentially black-and-white images, similar to medical images such as CT scans. Exploiting these properties of rubbings, we took oracle bone inscriptions and bronze inscriptions as the research objects and studied a method to homogenize rubbings and handwritten data.\u003c/p\u003e"},{"header":"3. Materials and Methods","content":"\u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e3.1 Datasets\u003c/h2\u003e \u003cp\u003eThe experimental data include two types of ancient scripts: oracle bone inscriptions and bronze inscriptions. Images of each type fall into two categories: rubbings and handwritings.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe oracle bone inscriptions dataset is collected from OBC306 and HWOBC on the Yin Qiwen Yuan Oracle Big Data Platform (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://jgw.aynu.edu.cn/home/down/index.html\u003c/span\u003e\u003cspan address=\"https://jgw.aynu.edu.cn/home/down/index.html\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). The HWOBC dataset contains 3881 classes of handwritten oracle bone inscriptions with 83245 images. The OBC306 dataset contains 306 classes of oracle bone inscriptions in both handwriting and rubbing forms, as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. 
But as the uneven distribution of the data set some characters only has one or two images, which is too small to use for conducting experiments. Therefore, the original data set had been filtered and finally 165 oracle bone inscriptions are chosen for the experiments which have enough quantity and good image quality in both of the two data set as the experimental data.\u003c/p\u003e \u003cp\u003eUltimately, the data set contains 12000 images, in which there are 8474 training images and 3526 testing images, as shown in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eThe oracle bone inscriptions dataset\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTraining data\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eTesting data\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eTotal\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHandwritings (HWOBC)\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e3007\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1254\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e4261\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRubbings (OBC)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e5467\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2272\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e7739\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMixed data (MSO)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e8474\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e3526\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e12000\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThe bronze inscriptions dataset is collected from \u0026ldquo;Jinwen script compilation\u0026rdquo;. As shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. This dataset is still in the processing and organization stage, Tencent and Key Laboratory of Oracle Bone Inscriptions Information Processing are currently responsible for organizing relevant data. 
But even so we selected 60 characters as the experimental data.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eUltimately, the data set contains 2551 images including rubbings and handwritings, the detailed information is shown in Table \u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eThe bronze inscriptions dataset\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTraining data\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eTesting data\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eTotal\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBronze Inscription Handwriting\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e897\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e452\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e1349\u003c/p\u003e 
\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBronze Inscription Rubbings\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e899\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e303\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e1202\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBronze Inscription Mixed data\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e1796\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e755\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e2551\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003e3.2 Proposed Model\u003c/h2\u003e \u003cdiv id=\"Sec6\" class=\"Section3\"\u003e \u003ch2\u003e3.2.1 Overall Framework\u003c/h2\u003e \u003cp\u003eIn the preliminary research, it was found that imitations have high accuracy in the recognition process, due to its simple image structure \u003csup\u003e[\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]\u003c/sup\u003e. As for the rubbings, due to noise factors like wear and shield patterns in the image, the recognition accuracy of rubbings is relatively low.\u003c/p\u003e \u003cp\u003eTherefore, when facing mixed type data, we proposed an idea which is normalize the input data and then applied the recognition process. 
The normalization is implemented using a U-net based module that converts rubbing data into handwriting data.\u003c/p\u003e \u003cp\u003eWe also proposed a method for data augmentation, specifically targeting pictographic characters such as oracle bone script and bronze inscriptions. The proposed methods are more effective than the general random methods. The relevant work was reflected in the ablation experiments. The architecture of our model is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section3\"\u003e \u003ch2\u003e3.2.2 Data homogenization\u003c/h2\u003e \u003cp\u003eThe data homogenization block is based on the U-net and for better results the spatial attention and channel attention mechanism are introduced in the process. The generator is as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e. For low-order features, a residual block with spatial attention is inserted before the third down sampling step. 
For the high-level features after down-sampling, each channel of a feature map is regarded as a feature detector, so channel attention is introduced into the residual blocks.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eDuring down-sampling, the residual block (ResBlock) can be described as follows:\u003cdiv id=\"Equ1\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ1\" name=\"EquationSource\"\u003e\n$$\\:{F}_{down}\\left(x\\right)=\\sigma\\left({W}_{2}\\bullet\\:\\sigma\\left({W}_{1}\\bullet\\:x\\right)\\right)$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e1\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003ewhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{W}_{1}\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{W}_{2}\\)\u003c/span\u003e\u003c/span\u003e are the weights of the two convolutions and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\sigma\\)\u003c/span\u003e\u003c/span\u003e denotes a nonlinear activation function. The stride \u003cem\u003eS\u003c/em\u003e of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{W}_{1}\\)\u003c/span\u003e\u003c/span\u003e is set to 2. 
The resulting output size is:\u003cdiv id=\"Equ2\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ2\" name=\"EquationSource\"\u003e\n$$\\:{Height}_{out}=\\left\\lfloor\\frac{{Height}_{in}}{2}\\right\\rfloor,\\:{Width}_{out}=\\left\\lfloor\\frac{{Width}_{in}}{2}\\right\\rfloor$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e2\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eThe skip connection must be down-sampled at the same rate as the main path. This is done with the projection:\u003cdiv id=\"Equ3\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ3\" name=\"EquationSource\"\u003e\n$$\\:{W}_{s}\\bullet\\:x$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e3\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003ewhere the convolution \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{W}_{s}\\)\u003c/span\u003e\u003c/span\u003e has stride S\u0026thinsp;=\u0026thinsp;2 and a kernel size of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:1\\times\\:1\\)\u003c/span\u003e\u003c/span\u003e. 
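\u003c/p\u003e \u003cp\u003eAs a consistency check on Eqs. 2 and 3, the Python sketch below applies the standard convolution output-size formula, ⌊(N + 2P − F)/S⌋ + 1, to both branches of the down-sampling residual block. It then applies the transposed-convolution formula of Eq. 5 (with the stated F = 2, S = 2, P = 0) to confirm that one up-sampling step restores the input size. The text does not give the kernel size or padding of W1 and W2, so the sketch assumes 3×3 kernels with padding 1. The helper names are hypothetical, and the code is an illustration rather than the authors' implementation.\u003c/p\u003e

```python
import math


def conv_out(n_in: int, kernel: int, stride: int, padding: int) -> int:
    # Standard output length of a convolution along one spatial axis
    return (n_in + 2 * padding - kernel) // stride + 1


def main_path(n_in: int) -> int:
    # W1: stride 2 as stated; ASSUMED 3x3 kernel with padding 1 (not given in the text)
    n = conv_out(n_in, kernel=3, stride=2, padding=1)
    # W2: ASSUMED 3x3 kernel, stride 1, padding 1, which keeps the size unchanged
    return conv_out(n, kernel=3, stride=1, padding=1)


def skip_path(n_in: int) -> int:
    # Ws: 1x1 kernel with stride 2 and no padding (Eq. 3)
    return conv_out(n_in, kernel=1, stride=2, padding=0)


def transposed_out(n_in: int, kernel: int = 2, stride: int = 2, padding: int = 0) -> int:
    # Eq. 5: N_out = (N_in - 1) * S + F - 2P, here with F = 2, S = 2, P = 0
    return (n_in - 1) * stride + kernel - 2 * padding


for size in (64, 128, 256):
    # Y = F_down(x) + Ws x (Eq. 4) needs both branches to produce the same size,
    # which for even inputs equals floor(size / 2) as in Eq. 2
    assert main_path(size) == skip_path(size) == math.floor(size / 2)
    # One up-sampling step doubles the size again
    assert transposed_out(main_path(size)) == size
    print(size, 'to', main_path(size), 'and back to', transposed_out(main_path(size)))
```

\u003cp\u003eUnder the same assumed settings, an odd size such as 65 gives 33 on both branches. That is the ceiling rather than the floor of 65/2, so Eq. 2 holds exactly for even input sizes.\u003c/p\u003e \u003cp\u003e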
The final output is:\u003cdiv id=\"Equ4\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ4\" name=\"EquationSource\"\u003e\n$$\\:Y=\\:{F}_{down}\\left(x\\right)+({W}_{s}\\bullet\\:x)$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e4\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eUp-sampling uses transposed convolution, whose output size is given by Eq.\u0026nbsp;\u003cspan refid=\"Equ5\" class=\"InternalRef\"\u003e5\u003c/span\u003e:\u003cdiv id=\"Equ5\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ5\" name=\"EquationSource\"\u003e\n$$\\:{N}_{out}=\\left({N}_{in}-1\\right)\\times\\:S+F-2P$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e5\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003ewhere \u003cem\u003eF\u003c/em\u003e is the kernel size (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:2\\times\\:2\\)\u003c/span\u003e\u003c/span\u003e), \u003cem\u003eS\u003c/em\u003e is the stride (2), and \u003cem\u003eP\u003c/em\u003e is the padding (0). Each up-sampling step therefore exactly doubles the spatial size, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{N}_{out}=2{N}_{in}\\)\u003c/span\u003e\u003c/span\u003e.\u003c/p\u003e \u003cp\u003eThrough this facsimile generation, all rubbings can be converted into handwritten form. This yields a unified representation of all types of oracle bone inscription data.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section3\"\u003e \u003ch2\u003e3.2.3 Data augmentation\u003c/h2\u003e \u003cp\u003eBased on the characteristics of ancient Chinese scripts such as oracle bone and bronze inscriptions, we propose the following targeted data augmentation scheme:\u003c/p\u003e \u003cp\u003e(1) Horizontal Flipping\u003c/p\u003e \u003cp\u003eDefine a binary random variable α\u0026sim;Bernoulli(0.5). The flipped image \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{I}_{flip}\\)\u003c/span\u003e\u003c/span\u003e is given by:\u003cdiv 
id=\"Equ6\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ6\" name=\"EquationSource\"\u003e\n$$\\:{I}_{flip}(x,y)=\\left\\{\\begin{array}{ll}I\\left(x,W-y\\right),\u0026amp;\\alpha=1\\\\I\\left(x,y\\right),\u0026amp;\\alpha=0\\end{array}\\right.$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e6\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003ewhere W is the image width and y is the horizontal (column) coordinate.\u003c/p\u003e \u003cp\u003e(2) Rotation\u003c/p\u003e \u003cp\u003eThe image is rotated by a random angle \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\theta\\sim\\text{U}(-20^\\circ,20^\\circ)\\)\u003c/span\u003e\u003c/span\u003e about the center (\u003cem\u003ex\u003c/em\u003e\u003csub\u003e\u003cem\u003ec\u003c/em\u003e\u003c/sub\u003e,\u003cem\u003ey\u003c/em\u003e\u003csub\u003e\u003cem\u003ec\u003c/em\u003e\u003c/sub\u003e), giving the transformed coordinates:\u003cdiv id=\"Equ7\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ7\" name=\"EquationSource\"\u003e\n$$\\:\\left[\\begin{array}{c}{x}^{\\prime}\\\\{y}^{\\prime}\\end{array}\\right]=\\left[\\begin{array}{cc}\\cos\\theta\u0026amp;-\\sin\\theta\\\\\\sin\\theta\u0026amp;\\cos\\theta\\end{array}\\right]\\left[\\begin{array}{c}x-{x}_{c}\\\\y-{y}_{c}\\end{array}\\right]+\\left[\\begin{array}{c}{x}_{c}\\\\{y}_{c}\\end{array}\\right]$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e7\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003e(3) Affine Transformation\u003c/p\u003e \u003cp\u003eScaling factor s\u0026sim;U(0.6, 1.2) and shear factor β\u0026sim;U(5, 13):\u003cdiv id=\"Equ8\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ8\" 
name=\"EquationSource\"\u003e\n$$\\:\\left[\\begin{array}{c}{x}^{\\prime}\\\\{y}^{\\prime}\\end{array}\\right]=\\left[\\begin{array}{cc}s\u0026amp;\\beta\\\\0\u0026amp;s\\end{array}\\right]\\left[\\begin{array}{c}x\\\\y\\end{array}\\right]$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e8\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eEmpty areas are filled with white: \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{I}_{affine}\\left({x}^{\\prime},{y}^{\\prime}\\right)=255\\)\u003c/span\u003e\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e(4) Salt-and-Pepper Noise\u003c/p\u003e \u003cp\u003eNoise mask \u003cem\u003eM\u003c/em\u003e\u0026isin;{0,1}\u003csup\u003e\u003cem\u003eH\u003c/em\u003e\u0026times;\u003cem\u003eW\u003c/em\u003e\u003c/sup\u003e with noise density \u003cem\u003eρ\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.15 (SNR\u0026thinsp;=\u0026thinsp;0.85):\u003cdiv id=\"Equ9\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ9\" name=\"EquationSource\"\u003e\n$$\\:{I}_{noise}\\left(x,y\\right)=\\left\\{\\begin{array}{ll}0,\u0026amp;M\\left(x,y\\right)=1\\;(\\text{pepper})\\\\255,\u0026amp;M\\left(x,y\\right)=0\\;(\\text{salt})\\\\I\\left(x,y\\right),\u0026amp;\\text{otherwise}\\end{array}\\right.$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e9\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eApplication probability: \u003cem\u003eP\u003c/em\u003e(apply)\u0026thinsp;=\u0026thinsp;0.3.\u003c/p\u003e \u003cp\u003e(5) Gaussian Noise\u003c/p\u003e \u003cp\u003eNoise \u003cem\u003eη\u003c/em\u003e\u0026sim;N(0, 0.05):\u003cdiv id=\"Equ10\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ10\" name=\"EquationSource\"\u003e\n$$\\:{I}_{gauss}\\left(x,y\\right)=\\mathrm{clip}\\left(I\\left(x,y\\right)+255\\,\\eta,\\;0,\\;255\\right)$$\u003c/div\u003e\u003cdiv 
class=\"EquationNumber\"\u003e10\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eApplication probability: \u003cem\u003eP\u003c/em\u003e(apply)\u0026thinsp;=\u0026thinsp;0.3.\u003c/p\u003e \u003cp\u003e(6) Brightness/Contrast Adjustment\u003c/p\u003e \u003cp\u003eRandom gain \u003cem\u003eγ\u003c/em\u003e\u0026sim;U(0.8, 1.2):\u003cdiv id=\"Equ11\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ11\" name=\"EquationSource\"\u003e\n$$\\:{I}_{bright}\\left(x,y\\right)=\\mathrm{clip}\\left(\\gamma\\,I\\left(x,y\\right),\\;0,\\;255\\right)$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e11\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003e(7) Grayscale Conversion\u003c/p\u003e \u003cp\u003eLuminance transformation:\u003cdiv id=\"Equ12\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ12\" name=\"EquationSource\"\u003e\n$$\\:{I}_{grey}\\left(x,y\\right)=0.299R\\left(x,y\\right)+0.587G\\left(x,y\\right)+0.114B(x,y)$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e12\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003e(8) Gaussian Blur\u003c/p\u003e \u003cp\u003eKernel \u003cem\u003eK\u003c/em\u003e of size 7\u0026times;7 with \u003cem\u003eσ\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.15:\u003cdiv id=\"Equ13\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ13\" name=\"EquationSource\"\u003e\n$$\\:{I}_{blur}=I*K,\\:\\:\\:\\:\\:\\:K\\left(u,v\\right)=\\frac{1}{2\\pi\\:{\\sigma\\:}^{2}}{e}^{-\\frac{{u}^{2}+{v}^{2}}{2{\\sigma\\:}^{2}}}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e13\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eApplication probability: \u003cem\u003eP\u003c/em\u003e(apply)\u0026thinsp;=\u0026thinsp;0.3.\u003c/p\u003e \u003cp\u003eIn practice, applying the steps above yields an augmented experimental dataset. 
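\u003c/p\u003e \u003cp\u003eThe geometric steps (1) to (3) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation:\u003c/p\u003e \u003cp\u003e(a) A grayscale image is a list of rows, with x indexing rows and y indexing columns.\u003c/p\u003e \u003cp\u003e(b) Pixels are resampled by nearest-neighbor backward warping.\u003c/p\u003e \u003cp\u003e(c) The white fill (255) stated for the affine step is also used for rotation.\u003c/p\u003e \u003cp\u003e(d) The text gives β∼U(5, 13) without a unit. The sketch treats β as a shear angle in degrees and uses tan β as the matrix entry, which is an assumption.\u003c/p\u003e \u003cp\u003eAll function names are hypothetical.\u003c/p\u003e

```python
import math
import random

# A grayscale image is a list of H rows of W values in 0..255.
# x indexes rows and y indexes columns, matching I(x, W - y) in Eq. 6.


def horizontal_flip(img, alpha):
    # Step (1), Eq. 6: mirror each row when alpha = 1 (0-based: column y maps to W - 1 - y)
    return [row[::-1] for row in img] if alpha == 1 else [row[:] for row in img]


def rotate_point(x, y, theta_deg, xc, yc):
    # Step (2), Eq. 7: rotate (x, y) by theta degrees about the center (xc, yc)
    t = math.radians(theta_deg)
    dx, dy = x - xc, y - yc
    return (math.cos(t) * dx - math.sin(t) * dy + xc,
            math.sin(t) * dx + math.cos(t) * dy + yc)


def affine_point(x, y, s, beta):
    # Step (3), Eq. 8: x' = s * x + beta * y and y' = s * y
    return s * x + beta * y, s * y


def warp(img, inverse_map):
    # Backward warping with nearest-neighbor sampling; output pixels whose source
    # falls outside the image are filled with white (255)
    h, w = len(img), len(img[0])
    out = [[255] * w for _ in range(h)]
    for x in range(h):
        for y in range(w):
            sx, sy = inverse_map(x, y)
            i, j = round(sx), round(sy)
            if i in range(h) and j in range(w):
                out[x][y] = img[i][j]
    return out


def rotate_image(img, theta_deg):
    h, w = len(img), len(img[0])
    xc, yc = (h - 1) / 2, (w - 1) / 2
    # Sampling the source at the point rotated by -theta realizes the rotation of Eq. 7
    return warp(img, lambda x, y: rotate_point(x, y, -theta_deg, xc, yc))


def affine_image(img, s, beta):
    # The inverse of [[s, beta], [0, s]] is [[1/s, -beta/s**2], [0, 1/s]]
    return warp(img, lambda x, y: (x / s - beta * y / s ** 2, y / s))


def random_geometric(img, rng):
    img = horizontal_flip(img, rng.randint(0, 1))        # alpha ~ Bernoulli(0.5)
    img = rotate_image(img, rng.uniform(-20.0, 20.0))    # theta ~ U(-20, 20) degrees
    # ASSUMPTION: beta ~ U(5, 13) is read as a shear angle in degrees (unit not stated)
    beta = math.tan(math.radians(rng.uniform(5.0, 13.0)))
    return affine_image(img, rng.uniform(0.6, 1.2), beta)


img = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
augmented = random_geometric(img, random.Random(0))
```

\u003cp\u003eBackward warping is used so that every output pixel receives exactly one value. Mapping source pixels forward through Eqs. 7 and 8 would leave holes wherever the transform stretches the image.\u003c/p\u003e \u003cp\u003e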
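\u003c/p\u003e \u003cp\u003eThe photometric steps (4) to (6) can be sketched in the same style. The text leaves three details open: how corrupted pixels are split between salt and pepper, whether 0.05 in N(0, 0.05) is a standard deviation or a variance, and what application probability applies to step (6). The sketch assumes an even salt/pepper split, a standard deviation of 0.05, and that step (6) is applied to every sample. All function names are hypothetical.\u003c/p\u003e

```python
import random


def clip(v, lo=0, hi=255):
    # Clamp to the 8-bit range and round to an integer pixel value
    return int(round(max(lo, min(hi, v))))


def salt_and_pepper(img, rng, density=0.15):
    # Step (4), Eq. 9: each pixel is corrupted with probability density (rho = 0.15).
    # ASSUMPTION: a corrupted pixel becomes pepper (0) or salt (255) with equal odds.
    return [[rng.choice((0, 255)) if density > rng.random() else v for v in row]
            for row in img]


def gaussian_noise(img, rng, sigma=0.05):
    # Step (5), Eq. 10: add 255 * eta and clip to [0, 255].
    # ASSUMPTION: 0.05 is the standard deviation of eta, not its variance.
    return [[clip(v + 255 * rng.gauss(0.0, sigma)) for v in row] for row in img]


def adjust_brightness(img, rng, low=0.8, high=1.2):
    # Step (6), Eq. 11: a single gain gamma ~ U(0.8, 1.2) scales the whole image
    gamma = rng.uniform(low, high)
    return [[clip(gamma * v) for v in row] for row in img]


def apply_with_prob(img, rng, fn, p=0.3):
    # Apply fn with probability p (P(apply) = 0.3 is stated for steps 4, 5 and 8)
    return fn(img, rng) if p > rng.random() else img


rng = random.Random(0)
img = [[128] * 8 for _ in range(8)]
out = apply_with_prob(img, rng, salt_and_pepper)
out = apply_with_prob(out, rng, gaussian_noise)
out = adjust_brightness(out, rng)   # no application probability is given for step (6)
```

\u003cp\u003eAll randomness is drawn from a single seeded generator, so the same augmented dataset can be regenerated exactly.\u003c/p\u003e \u003cp\u003e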
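\u003c/p\u003e \u003cp\u003eFinally, steps (7) and (8) reduce to a weighted sum of color channels and a convolution with a sampled Gaussian. The text specifies neither how Eq. 13 is discretized nor how image borders are handled. In the sketch below, both choices are assumptions: K is sampled on a 7×7 integer grid and the samples are renormalized to sum to 1, and borders replicate the nearest edge pixel. The function names are hypothetical.\u003c/p\u003e

```python
import math


def to_grey(rgb):
    # Step (7), Eq. 12: luminance with the ITU-R BT.601 weights; rgb is a list of rows of (R, G, B)
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def gaussian_kernel(size=7, sigma=0.15):
    # Step (8), Eq. 13 sampled on a size x size integer grid centered at (0, 0).
    # ASSUMPTION: the samples are renormalized to sum to 1 so the blur keeps mean brightness.
    half = size // 2
    k = [[math.exp(-(u * u + v * v) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)
          for v in range(-half, half + 1)] for u in range(-half, half + 1)]
    total = sum(map(sum, k))
    return [[w / total for w in row] for row in k]


def gaussian_blur(img, kernel):
    # I_blur = I * K; K is symmetric, so correlation and convolution coincide.
    # ASSUMPTION: pixels outside the image take the value of the nearest edge pixel.
    h, w, half = len(img), len(img[0]), len(kernel) // 2

    def px(i, j):
        return img[min(max(i, 0), h - 1)][min(max(j, 0), w - 1)]

    return [[sum(kernel[u + half][v + half] * px(i + u, j + v)
                 for u in range(-half, half + 1) for v in range(-half, half + 1))
             for j in range(w)] for i in range(h)]


grey = to_grey([[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]])
blurred = gaussian_blur(grey, gaussian_kernel())
```

\u003cp\u003eIf σ is measured in pixels, evaluating gaussian_kernel() with the stated σ = 0.15 gives a normalized center weight above 0.9999, so this blur changes the image very little. For comparison, gaussian_kernel(7, 1.0) gives a noticeably smoothing kernel.\u003c/p\u003e \u003cp\u003e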
These randomly perturbed samples reduce the model's reliance on incidental attributes such as orientation, scale, noise, and brightness, thereby improving its generalization ability.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e"},{"header":"4. Experiments and Results","content":"\u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003e4.1 Baseline\u003c/h2\u003e \u003cp\u003eDeep Convolutional Neural Networks (CNNs) have demonstrated exceptional performance in image recognition tasks. They owe this to inherent advantages including local receptive fields, parameter sharing, hierarchical feature learning, translation equivariance, and dimensionality reduction via pooling. Given these merits, we adopt seven representative deep learning architectures as baselines: AlexNet, VGG-19, ResNet-50, EfficientNet, ShuffleNet, Vision Transformer (ViT), and ConvNeXt. Comparative experiments against these baselines demonstrate the advantages of our proposed approach.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003e4.2 Experiment Setup\u003c/h2\u003e \u003cp\u003eTo ensure consistency and stability, all experiments were conducted on the same dedicated deep learning workstation with identical hardware and software environments. 
These environments are listed in Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eExperimental Environment\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"2\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c2\" namest=\"c1\"\u003e \u003cp\u003eHardware Environment\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eComponent\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGPU\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNVIDIA TITAN Xp \u0026times;4\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGPU memory\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e12 GB per GPU\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCPU\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eIntel Xeon E5-2683 v3\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSystem memory\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e 
\u003cp\u003e32 GB\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"2\" nameend=\"c2\" namest=\"c1\"\u003e \u003cp\u003eSoftware Environment\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSoftware\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDetails\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eOperating system\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eUbuntu 20.04.4 LTS\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDeep learning framework\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePyTorch 1.7.1\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDevelopment environment\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePython 3.8\u0026thinsp;+\u0026thinsp;PyCharm\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGraphics driver\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNVIDIA 470.53\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCUDA version\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e11.0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e summarizes the hyperparameters used for each model. 
The hyperparameters were chosen to ensure effective training for every model. A batch size of 64 was used throughout to balance training efficiency against memory usage. The cross-entropy loss, the standard objective for multi-class classification, was used for all models.\u003c/p\u003e \u003cp\u003eAll models used the AdamW optimizer. AdamW combines adaptive learning rates with weight decay decoupled from the gradient update, which promotes stable and efficient convergence.\u003c/p\u003e \u003cp\u003eThe learning rate was set to 1e-4 for all models except ShuffleNet, which used a slightly higher rate of 5e-4 owing to its lightweight architecture and faster convergence. Weight decay was set to 0.05 for all models to mitigate overfitting. Each model was trained for 200 epochs to ensure sufficient training.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eHyperparameters for Each Model\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"7\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" 
colname=\"c1\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eBatch Size\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eLoss Function\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eOptimizer\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eLearning Rate\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eWeight Decay\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eEpochs\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAlexNet\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eCross-Entropy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eAdamW\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e1e-4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e200\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eVGG19\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eCross-Entropy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eAdamW\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e1e-4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c6\"\u003e \u003cp\u003e0.05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e200\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eResNet50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eCross-Entropy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eAdamW\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e1e-4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e200\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eConvNeXt\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eCross-Entropy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eAdamW\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e1e-4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e200\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEfficientNet\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eCross-Entropy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c4\"\u003e \u003cp\u003eAdamW\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e1e-4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e200\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eShuffleNet\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eCross-Entropy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eAdamW\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e5e-4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e200\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eViT\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eCross-Entropy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eAdamW\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e1e-4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e200\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e 
\u003ch2\u003e4.3 Results\u003c/h2\u003e \u003cp\u003eWe applied both the data homogenization and the augmentation modules to every evaluated model and analyzed the resulting performance. Table\u0026nbsp;\u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e presents the comparative results on the oracle bone inscription mixed dataset for four metrics: Top-1 accuracy, F1 score, AUC, and Top-5 accuracy. It contrasts two configurations. (1) Base is the baseline, in which models are trained directly on the original heterogeneous dataset of rubbing and handwritten images without any preprocessing. (2) CA* is our proposed framework, which combines data homogenization and data augmentation.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab5\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 5\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003ePerformance Comparison on the Oracle Bone Inscription Mixed Dataset\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"9\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" 
colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e \u003cp\u003eTop-1 Acc \u0026uarr;\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c5\" namest=\"c4\"\u003e \u003cp\u003eF1 Score \u0026uarr;\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c7\" namest=\"c6\"\u003e \u003cp\u003eAUC \u0026uarr;\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c9\" namest=\"c8\"\u003e \u003cp\u003eTop-5 Acc \u0026uarr;\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eBase\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eCA*\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eBase\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eCA*\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eBase\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eCA*\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eBase\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c9\"\u003e \u003cp\u003eCA*\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAlexNet\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.712\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e0.894\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.707\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.893\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.989\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.999\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.907\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e0.973\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eVGG19\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.711\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e0.913\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.707\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.912\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.985\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.999\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.898\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e0.979\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eResNet50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.482\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e0.921\u003c/b\u003e\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.477\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.920\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.964\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.999\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.764\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e0.984\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eConvNeXt\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.548\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e0.909\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.544\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.910\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.977\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.999\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.808\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e0.986\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEfficientNet\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e 
\u003cp\u003e0.712\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e0.912\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.710\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.919\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.988\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.999\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.928\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e0.985\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eShuffleNet\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.731\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e0.933\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.732\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.931\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.991\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.999\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.904\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e0.988\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e 
\u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eViT\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.319\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e0.856\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.314\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.854\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.894\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.998\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.553\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e0.962\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eOn the Hybrid dataset, the baseline models alone achieved only limited accuracy. With the proposed data homogenization and augmentation modules, however, performance improved substantially. As illustrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e, Top-1 accuracy increased by at least 18 percentage points for every model. The Vision Transformer (ViT) improved most dramatically, rising from 0.319 to 0.856. These results support the effectiveness of our method for oracle bone inscription recognition.\u003c/p\u003e \u003cp\u003eTo further assess the proposed method, we also evaluated it on the bronze inscription dataset. 
Table\u0026nbsp;\u003cspan refid=\"Tab6\" class=\"InternalRef\"\u003e6\u003c/span\u003e shows the performances of the baseline methods and the proposed method.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab6\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 6\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003ePerformance Comparison on Bronze Inscription Dataset\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"9\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e \u003cp\u003eTop-1 Acc \u0026uarr;\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c5\" namest=\"c4\"\u003e \u003cp\u003eF1 Score \u0026uarr;\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c7\" namest=\"c6\"\u003e 
\u003cp\u003eAUC \u0026uarr;\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c9\" namest=\"c8\"\u003e \u003cp\u003eTop-5 Acc \u0026uarr;\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eBase\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eCA*\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eBase\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eCA*\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eBase\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eCA*\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eBase\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c9\"\u003e \u003cp\u003eCA*\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAlexNet\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.588\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e0.798\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.576\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.793\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.960\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.991\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.838\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e0.938\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" 
colname=\"c1\"\u003e \u003cp\u003eVGG19\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.550\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e0.815\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.536\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.809\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.951\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.990\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.831\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e0.941\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eResNet50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.401\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e0.798\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.379\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.790\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.914\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.991\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.665\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e0.946\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eConvNext\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.496\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e0.750\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.480\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.740\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.937\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.986\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.785\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e0.918\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEfficientNet\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.581\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e0.824\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.568\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.819\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.965\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.987\u003c/b\u003e\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.854\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e0.947\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eShuffleNet\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.580\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e0.859\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.577\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.854\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.960\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.993\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.877\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e0.957\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eViT\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.246\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e0.574\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.223\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.562\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e 
\u003cp\u003e0.838\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.958\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.557\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e0.828\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThe results show that adding the data homogenization and augmentation blocks improved the Top-1 accuracy, F1 score, AUC, and Top-5 accuracy of every model, even though the bronze inscription dataset is relatively small. Taking Top-1 accuracy as an example, as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e, accuracy increased by at least 21 percentage points (the smallest gain, for AlexNet, was from 0.588 to 0.798).\u003c/p\u003e \u003cp\u003eIn summary, for multi-modal ancient scripts, the proposed method performed notably better than the baseline methods.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003e4.4 Ablation Study\u003c/h2\u003e \u003cp\u003eTo validate the efficacy of the proposed method, we conducted systematic ablation studies. Because the key contribution of this work is the integration of the data homogenization and augmentation blocks with the recognition block, we used controlled experiments to quantify the individual impact of each block on recognition performance.\u003c/p\u003e \u003cdiv id=\"Sec14\" class=\"Section3\"\u003e \u003ch2\u003e4.4.1 The impact of data homogenization\u003c/h2\u003e \u003cp\u003eIn the first ablation experiment, we isolated the effect of data homogenization by removing only the data augmentation block and retaining all other components. 
We then evaluated recognition performance with only the homogenization block on both the oracle bone inscription and bronze inscription datasets, using Top-1 and Top-5 accuracy as evaluation metrics. The results on the oracle bone inscription dataset are presented in Table\u0026nbsp;\u003cspan refid=\"Tab7\" class=\"InternalRef\"\u003e7\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab7\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 7\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eTop-1 and Top-5 accuracy of the baseline models (Base), the proposed method with only the data homogenization block (C only), and the complete proposed method (CA*) on the oracle bone inscription dataset\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"7\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"3\" nameend=\"c4\" namest=\"c2\"\u003e \u003cp\u003eTop-1 Acc \u0026uarr;\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"3\" nameend=\"c7\" namest=\"c5\"\u003e 
\u003cp\u003eTop-5 Acc \u0026uarr;\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eBase\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eC only\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eCA*\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eBase\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eC only\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eCA*\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAlexNet\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.712\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.844\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.894\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.907\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.963\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.973\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eVGG19\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.711\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.764\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.913\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.898\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.911\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e 
\u003cp\u003e\u003cb\u003e0.979\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eResNet50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.482\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.332\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.921\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.764\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.601\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.984\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eConvNext\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.548\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.725\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.909\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.808\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.931\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.986\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEfficientNet\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.712\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e 
\u003cp\u003e0.836\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.912\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.928\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.969\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.985\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eShuffleNet\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.731\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.846\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.933\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.904\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.973\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.988\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eViT\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.319\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.346\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.856\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.553\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.734\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.962\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThe incorporation of the data homogenization block enhanced the recognition capability of baseline models, though the improvement margin remained limited. When evaluated against the complete model, architectures utilizing only the homogenization block exhibited inferior Top-1 and Top-5 accuracy across all benchmarks. Notably, AlexNet, ConvNext, EfficientNet, and ShuffleNet demonstrated relatively higher performance gains with homogenization, whereas ViT and VGG19 showed marginal improvements. In contrast, ResNet50 experienced a slight performance degradation. The complete model configuration consistently achieved significant accuracy improvements, underscoring the synergistic effect of integrated modules.\u003c/p\u003e \u003cp\u003eThe same situation also occurred on the bronze inscriptions. 
Table\u0026nbsp;\u003cspan refid=\"Tab8\" class=\"InternalRef\"\u003e8\u003c/span\u003e shows the Top-1 accuracy and Top-5 accuracy for all the methods to be compared.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab8\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 8\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eBaseline models vs proposed method with only data homogenization block (marked as C only) and CA* Accuracy Comparison on bronze inscriptions Dataset\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"7\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"3\" nameend=\"c4\" namest=\"c2\"\u003e \u003cp\u003eTop-1 Acc \u0026uarr;\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"3\" nameend=\"c7\" namest=\"c5\"\u003e \u003cp\u003eTop-5 Acc \u0026uarr;\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colspan=\"3\" nameend=\"c4\" namest=\"c2\"\u003e \u003cp\u003eBase C only CA*\u003c/p\u003e \u003c/th\u003e \u003cth 
align=\"left\" colspan=\"3\" nameend=\"c7\" namest=\"c5\"\u003e \u003cp\u003eBase C only CA*\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAlexNet\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.588\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.685\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.798\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.838\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.888\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.938\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eVGG19\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.550\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.585\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.815\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.831\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.846\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.941\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eResNet50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.401\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.315\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.798\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.665\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.642\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.946\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eConvNext\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.496\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.573\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.750\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.785\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.804\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.918\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEfficientNet\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.581\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.642\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.824\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.854\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.892\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.947\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eShuffleNet\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.580\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.692\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.859\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.877\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.912\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.957\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eViT\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.246\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.311\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.574\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.557\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.627\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.828\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThe results shown in 
Table\u0026nbsp;\u003cspan refid=\"Tab8\" class=\"InternalRef\"\u003e8\u003c/span\u003e were similar to these shown in Table\u0026nbsp;\u003cspan refid=\"Tab7\" class=\"InternalRef\"\u003e7\u003c/span\u003e. The data homogenization block could improve the performance of each baseline model but the improvement is limited. There was still a decrease about the result of ResNet50. Compared with the complete model, the gap in accuracy still existed.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section3\"\u003e \u003ch2\u003e4.4.2 The impact of data augmentation\u003c/h2\u003e \u003cp\u003eTo further evaluate the role of data augmentation, we performed comparative experiments analyzing its isolated contribution. Table\u0026nbsp;\u003cspan refid=\"Tab9\" class=\"InternalRef\"\u003e9\u003c/span\u003e presents the Top-1 and Top-5 accuracy metrics for three configurations on the Oracle Bone Inscriptions dataset: (1) baseline models without augmentation, (2) models with only the augmentation module, and (3) the complete integrated model.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab9\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 9\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eBaseline models vs proposed method with only data augmentation block (marked as C only) and CA* Accuracy Comparison on Oracle bone inscriptions Dataset\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"7\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" 
char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"3\" nameend=\"c4\" namest=\"c2\"\u003e \u003cp\u003eTop-1 Acc \u0026uarr;\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"3\" nameend=\"c7\" namest=\"c5\"\u003e \u003cp\u003eTop-5 Acc \u0026uarr;\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eBase\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eA only\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eCA*\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eBase\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eA only\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eCA*\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAlexNet\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.712\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.801\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.894\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.907\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.942\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.973\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eVGG19\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.711\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.885\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.913\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.898\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.971\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.979\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eResNet50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.482\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.870\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.921\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.764\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.969\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.984\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eConvNext\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.548\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.775\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.909\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c5\"\u003e \u003cp\u003e0.808\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.916\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.986\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEfficientNet\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.712\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.866\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.912\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.928\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.963\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.985\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eShuffleNet\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.731\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.893\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.933\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.904\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.972\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.988\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" 
colname=\"c1\"\u003e \u003cp\u003eViT\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.319\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.528\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.856\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.553\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.690\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.962\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab10\" class=\"InternalRef\"\u003e10\u003c/span\u003e shows the corresponding results on the bronze inscriptions dataset. As on the oracle bone inscriptions dataset, the data augmentation block alone improved both Top-1 and Top-5 accuracy for every baseline model (for example, the Top-1 accuracy of ResNet50 rose from 0.401 to 0.744), but all augmentation-only models remained below the complete model (0.798 for ResNet50).\u003c/p\u003e \u003cp\u003eIn summary, the ablation experiments show that the two blocks added to the recognition pipeline both improve the accuracy of ancient character recognition. 
Each block improves accuracy on its own in most cases, and combining both in the complete model yields the highest accuracy for every baseline model.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab10\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 10\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eAccuracy comparison on the bronze inscriptions dataset: baseline models (Base), the proposed method with only the data augmentation block (A only), and the complete model (CA*)\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"7\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"3\" nameend=\"c4\" namest=\"c2\"\u003e \u003cp\u003eTop-1 Acc \u0026uarr;\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"3\" nameend=\"c7\" namest=\"c5\"\u003e \u003cp\u003eTop-5 Acc \u0026uarr;\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eBase\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eA only\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eCA*\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eBase\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eA only\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eCA*\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e 
\u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAlexNet\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.588\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.731\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.798\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.838\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.923\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.938\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eVGG19\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.550\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.751\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.815\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.831\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.918\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.941\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eResNet50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.401\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.744\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.798\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.665\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.919\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.946\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eConvNext\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.496\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.698\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.750\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.785\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.891\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.918\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEfficientNet\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.581\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.776\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.824\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.854\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c6\"\u003e \u003cp\u003e0.912\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.947\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eShuffleNet\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.580\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.783\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.859\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.877\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.935\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.957\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eViT\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.246\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.454\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.574\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.557\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.735\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.828\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e"},{"header":"Conclusion","content":" \u003cp\u003eWe propose a deep 
learning-based data processing mechanism for ancient text recognition that incorporates two key components: (1) a U-Net-based data homogenization block that handles variations across script modalities, and (2) a customized data augmentation block that improves model robustness. The processed data are then fed into CNN models for final recognition.\u003c/p\u003e \u003cp\u003eExperimental results on the oracle bone inscriptions and bronze inscriptions datasets show substantial improvements over the baseline models. For instance, the Top-1 accuracy of ResNet50 increased from 0.482 to 0.921 (a 91.1% relative improvement) on oracle bone inscriptions and from 0.401 to 0.798 (a 99.0% relative improvement) on bronze inscriptions.\u003c/p\u003e \u003cp\u003eThe ablation studies show that each block contributes to the performance gains on its own (homogenization: +22.4% avg.; augmentation: +18.7% avg.), while combining them in the complete method yields the best results (+\u0026thinsp;102.3% avg. improvement).\u003c/p\u003e \u003cp\u003eThe primary contribution of this study is a standardized preprocessing pipeline for multimodal ancient scripts. The proposed two-block mechanism provides both a methodological basis and a practical tool for archaeological text analysis, and its effectiveness has been validated experimentally.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAuthor contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNan Wang: Conceptualization. Nan Wang and Bang Li: Methodology. Weichen Wang: Software. Weichen Wang and Nan Wang: Validation. Nan Wang: Writing \u0026ndash; original draft preparation. Han Zhang: Writing \u0026ndash; review and editing. 
Qingju Jiao and Chaofan Liu: Data preparation.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDeclaration of competing interest\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare that there are no conflicts of interest.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgments\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis study was funded by the Natural Science Foundation of Henan Province (grant number 242300420680) and the Henan Province Science and Technology Research Project (grant number 222102320036).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData Availability Statement\u003c/strong\u003e: The original HWOBC and OBC306 datasets used in this paper can be obtained from http://jgw.aynu.edu.cn/DownPage; the complete data are available at https://github.com/Augety88/oracle-jinwen-code-data.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCode Availability Statement\u003c/strong\u003e: The code underlying this study is available at https://github.com/Augety88/oracle-jinwen-code-data.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eAssael Y, et al. Restoring and attributing ancient texts using deep neural networks. Nature. 2022;603(7900):280\u0026ndash;3.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu YG, Liu GY. Oracle bone inscription recognition based on SVM. J Anyang Normal Univ. 2017;2:54\u0026ndash;6.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGu ST. Identification of oracle-bone script fonts based on fractal geometry. J Chin Inform Process. 2018;32(10).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eQu HY, Liu JZ, Wu J. Oracle-Bone Inscriptions Recognition Based on Topological Features. Comput Sci Application. 2019;9(6):1111\u0026ndash;7.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhao RQ, Wang HQ, Wang K, Wang Z, Liu WT. 
Recognition of bronze inscriptions image based on mixed features of histogram of oriented gradient and gray level co-occurrence matrix. Laser Optoelectron Progress. 2020;57(12):90\u0026ndash;6.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu MT, Liu GY, Liu YG, Jiao QJ. Oracle bone inscriptions recognition based on deep convolutional neural network. J Image Graph. 2020;8(4):114\u0026ndash;9.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFujikawa Y, et al. Recognition of oracle bone inscriptions by using two deep learning models. Int J Digit Humanit. 2023;5(2):65\u0026ndash;79.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMeng L, Kamitoku N, Yamazaki K. Recognition of oracle bone inscriptions using deep learning based on data augmentation. In: \u003cem\u003e2018 Metrology for Archaeology and Cultural Heritage\u003c/em\u003e. IEEE; 2018. pp. 33\u0026ndash;8.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMai C, Penava P, Buettner R. Oracle Bone Inscription Character Recognition based on a novel Convolutional Neural Network Architecture. IEEE Access. 2024;12:197021\u0026ndash;34.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eQiao YG, Xing LZ. Applying Deep Learning Algorithms for Automatic Recognition and Transcription of Texts in Oracle Bones and Golden Texts. Appl Math Nonlinear Sci. 2023;9(1):1\u0026ndash;16.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWu XQ, Wang ZY, Ren P. CNN-based Bronze Inscriptions Character Recognition. In: \u003cem\u003e2022 5th International Conference on Advanced Electronic Materials, Computers and Software Engineering\u003c/em\u003e. IEEE; 2022. pp. 514\u0026ndash;9.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhao HS, Chu HZ, Zhang YY, Yu J. Improvement of ancient Shui character recognition model based on convolutional neural network. IEEE Access. 
2020;8:33080\u0026ndash;7.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXu Y, Zhang XY, Zhang ZX, Liu CL. Large-scale continual learning for ancient Chinese character recognition. Pattern Recogn. 2024;150:110283.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBarucci A, et al. A deep learning approach to ancient Egyptian hieroglyphs classification. IEEE Access. 2021;9:123438\u0026ndash;47.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang N, Wang CJ, Jiao QJ. Research on Handwritten Oracle Bone Inscriptions Recognition Based on EasyDL. Electron Technol Softw Eng. 2023;3:184\u0026ndash;7.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eParmar G, et al. Zero-shot image-to-image translation. In: \u003cem\u003eACM SIGGRAPH 2023 Conference Proceedings\u003c/em\u003e. ACM; 2023. pp. 1\u0026ndash;11.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi YH, et al. GLIGEN: Open-set grounded text-to-image generation. In: \u003cem\u003eProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition\u003c/em\u003e. 2023. pp. 22511\u0026ndash;21.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang XB, Zhai DH, Li TR, Zhou YX, Lin Y. Image inpainting based on deep learning: A review. Inform Fusion. 2023;90:74\u0026ndash;94.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKaneko H, Yoshizu Y, Ishibashi R, Meng L. An attempt at zero-shot ancient documents restoration based on diffusion models. In: \u003cem\u003e2023 International Conference on Advanced Mechatronic Systems (ICAMechS)\u003c/em\u003e. IEEE; 2023. pp. 1\u0026ndash;6.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen BZ, Liu YS, Zhang Z, Lu GM, Kong A. TransAttUnet: Multi-level attention-guided U-Net with transformer for medical image segmentation. IEEE Trans Emerg Top Comput Intell. 
2023;8(1):55\u0026ndash;68.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"npj-heritage-science","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"hsci","sideBox":"Learn more about [Heritage Science](http://heritagesciencejournal.springeropen.com)","snPcode":"40494","submissionUrl":"https://submission.nature.com/new-submission/40494/3","title":"npj Heritage Science","twitterHandle":"@SpringerOpen","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"em","reportingPortfolio":"BMC/SO AJ","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Ancient Script Recognition, Multi-Modal, Data Homogenization, Data Augmentation, Deep Learning","lastPublishedDoi":"10.21203/rs.3.rs-6871550/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6871550/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eAncient scripts provide invaluable insights into ancient societies, and their effective recognition is crucial for cultural relic preservation, textual decipherment, and heritage. Current research primarily focuses on single mode ancient text data recognition such as processing rubbings or handwritten scripts independently, yet ancient scripts exhibit diverse forms across modalities. To address this, we propose a novel multimodal recognition framework capable of processing hybrid inputs like oracle bone rubbings and handwritten scripts. Our method employs two additional modules, a cross-modal data homogenization block to unify heterogeneous data representations and a data augmentation block to enhance model robustness, then achieve the recognition with convolutional neural networks. 
Evaluated on oracle bone and bronze inscription datasets, our approach outperforms baseline methods in recognition accuracy and generalization capability across modalities.\u003c/p\u003e","manuscriptTitle":"Multi-Modal Ancient Script Recognition via deep learning with Data Homogenization and Augmentation","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-06-16 08:49:10","doi":"10.21203/rs.3.rs-6871550/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2025-06-29T21:10:14+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-06-27T14:04:07+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"302880976688234782128771508444642714340","date":"2025-06-16T10:20:34+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-06-16T10:17:18+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"223064469251002606281555102516601207237","date":"2025-06-13T04:43:27+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-06-13T04:12:57+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-06-12T05:23:56+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-06-12T05:23:25+00:00","index":"","fulltext":""},{"type":"submitted","content":"npj heritage science","date":"2025-06-11T11:49:16+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"npj-heritage-science","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"hsci","sideBox":"Learn more about [Heritage Science](http://heritagesciencejournal.springeropen.com)","snPcode":"40494","submissionUrl":"https://submission.nature.com/new-submission/40494/3","title":"npj Heritage Science","twitterHandle":"@SpringerOpen","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"em","reportingPortfolio":"BMC/SO AJ","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"cc86d876-ccea-44c2-89fb-dc3a2d5fdd79","owner":[],"postedDate":"June 16th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[],"tags":[],"updatedAt":"2025-10-20T16:11:28+00:00","versionOfRecord":{"articleIdentity":"rs-6871550","link":"https://doi.org/10.1038/s40494-025-02095-x","journal":{"identity":"npj-heritage-science","isVorOnly":false,"title":"npj Heritage Science"},"publishedOn":"2025-10-15 15:57:59","publishedOnDateReadable":"October 15th, 2025"},"versionCreatedAt":"2025-06-16 08:49:10","video":"","vorDoi":"10.1038/s40494-025-02095-x","vorDoiUrl":"https://doi.org/10.1038/s40494-025-02095-x","workflowStages":[]},"version":"v1","identity":"rs-6871550","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6871550","identity":"rs-6871550","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
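The Top-1/Top-5 metrics in Tables 9 and 10 and the relative improvements quoted in the Conclusion come down to simple arithmetic. The Python sketch below is illustrative only: the helper names (`top_k_accuracy`, `relative_gain`, `mean_gain`) are hypothetical and are not taken from the authors' repository, the accuracy values are copied from Tables 9 and 10, and the unweighted mean over the seven backbones is an assumed averaging scheme that may differ from the one behind the averages reported in the Conclusion.

```python
"""Top-k accuracy and relative-gain arithmetic for the ablation tables.

Hypothetical helpers for illustration; not the authors' code.
Accuracy values are copied from Tables 9 and 10.
"""

# Top-1 accuracy per backbone as (Base, A only, CA*).
TOP1 = {
    "oracle": {  # Table 9, oracle bone inscriptions dataset
        "AlexNet": (0.712, 0.801, 0.894),
        "VGG19": (0.711, 0.885, 0.913),
        "ResNet50": (0.482, 0.870, 0.921),
        "ConvNext": (0.548, 0.775, 0.909),
        "EfficientNet": (0.712, 0.866, 0.912),
        "ShuffleNet": (0.731, 0.893, 0.933),
        "ViT": (0.319, 0.528, 0.856),
    },
    "bronze": {  # Table 10, bronze inscriptions dataset
        "AlexNet": (0.588, 0.731, 0.798),
        "VGG19": (0.550, 0.751, 0.815),
        "ResNet50": (0.401, 0.744, 0.798),
        "ConvNext": (0.496, 0.698, 0.750),
        "EfficientNet": (0.581, 0.776, 0.824),
        "ShuffleNet": (0.580, 0.783, 0.859),
        "ViT": (0.246, 0.454, 0.574),
    },
}


def top_k_accuracy(scores: list[list[float]], labels: list[int], k: int) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)


def relative_gain(base: float, new: float) -> float:
    """Relative improvement of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


def mean_gain(rows: dict[str, tuple[float, float, float]], col: int) -> float:
    """Unweighted mean relative gain over all backbones (assumed scheme).

    col=1 compares "A only" with Base; col=2 compares CA* with Base.
    """
    gains = [relative_gain(acc[0], acc[col]) for acc in rows.values()]
    return sum(gains) / len(gains)


if __name__ == "__main__":
    for dataset, rows in TOP1.items():
        base, _, full = rows["ResNet50"]
        print(f"{dataset}: ResNet50 Top-1 {base:.3f} -> {full:.3f} "
              f"({relative_gain(base, full):.1f}% relative)")
        print(f"  mean gain, A only: {mean_gain(rows, 1):.1f}%, "
              f"CA*: {mean_gain(rows, 2):.1f}%")
```

Run as a script, it reports a 91.1% relative Top-1 gain for ResNet50 on the oracle bone inscriptions dataset (0.482 to 0.921) and a 99.0% gain on the bronze inscriptions dataset (0.401 to 0.798). Ranking classes with `sorted` keeps the metric dependency-free; in a deep learning framework the same quantity would normally be computed from a top-k operation over the model's output scores.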