A bone to pick with ancient Chinese: AI Analysis of Handedness in Bone Inscriptions | Research Square

Research Article

Hefei Wang, Zhenhao Li, Zihuan Feng, Xun Liang

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-5142859/v1 This work is licensed under a CC BY 4.0 License. Status: Posted (Version 1).

Abstract

Tracing the evolution of human habits and cognition through ancient artifacts offers unique insights into our past. This paper explores the handedness of ancient Chinese individuals through the study of oracle bone inscriptions, some of the earliest forms of Chinese writing, dating back to the Shang Dynasty approximately 3,000 years ago. Our research utilizes manually scanned real images of genuine oracle bone rubbings provided by National Museum of Chinese Writing.
We have constructed the largest genuine oracle bone inscriptions dataset currently used in the field of computer technology, which presents unique challenges due to their variable and pictographic nature. Employing unsupervised deep learning techniques, we analyze the subtle stylistic differences in these images to discern whether the inscriptions were crafted by left-handed (sinistromanual) or right-handed (dextromanual) individuals. Our novel computational method, Bone2Vec, treats each pixel of the oracle bone image as a word in text, enabling us to embed and cluster these images to determine handedness patterns. Our findings not only advance our understanding of early Chinese script and its creators but also contribute to anthropological research by providing new evidence of handedness in ancient civilizations. This interdisciplinary approach underscores the potential of artificial intelligence in historical linguistics and archaeology, offering a fresh perspective on the cognitive behaviors of ancient societies.

Keywords: Image Recognition; Artificial Intelligence in Archaeology; Historical Linguistics; Ancient Writing Systems; Oracle Bone Inscriptions

1 Introduction

Handedness, reflecting the asymmetry of brain function, provides a window into the cognitive and social realms of ancient civilizations. Deciphering whether our ancestors were predominantly right-handed (dextromanual) or left-handed (sinistromanual) enhances our understanding of their daily activities and broader anthropological and neuroscientific contexts. This research employs deep learning to investigate handedness through ancient oracle bone inscriptions, some of the earliest forms of Chinese writing, created during the Shang Dynasty about 3,000 years ago. Ancient Chinese oracle words, alongside ancient Mayan glyphs and Sumerian cuneiform, continue to reveal profound historical mysteries to the modern world.
These oracle inscriptions, engraved on animal bones, originated approximately 3,000 years ago during the Shang Dynasty in China and marked a significant transition. From that point, Chinese history began to be documented through these engraved words. Serving as the foundation for the evolution and development of Chinese characters, oracle words are regarded as one of the remarkable symbols of Chinese civilization. Since the discovery of the first oracle bone inscription in 1899, scholars have unearthed many animal bones bearing these inscriptions, which were primarily used for divination by the Shang Dynasty's royal family, with content often related to weather, agriculture, hunting, and instruments. To date, over 4,000 different oracle bone characters have been manually categorized, with more than 1,000 identified alongside their modern Chinese counterparts. These inscriptions, as early hieroglyphics, highlight the artistic and symbolic complexity of ancient Chinese writing practices. The creators, known as Zhenren, who specialized in divination, are crucial for understanding the sociopolitical and cultural landscape of their times. Their work varies significantly across different reigns, offering insights into the chronological staging of historical events. The lack of standardization in these inscriptions, with their homonymies and stylistic variations, poses unique challenges. For instance, the same character can appear in multiple forms, reflecting the arbitrary nature of ancient writing styles, which complicates modern interpretation and recognition efforts. Figure 1 shows six oracle bone inscriptions of the character “bird.” The writing style of oracle characters is highly arbitrary, presenting a difficulty in oracle character revaluation. Our interest in oracle recognition is driven by the question of whether humans were inherently right-handed (dextromanual) or if right-handedness became predominant as human societies evolved. 
Evidence from the fossil record bears on this question. A study led by David Frayer, a professor emeritus at the University of Kansas, analyzed striations on the teeth of a Homo habilis fossil found in Tanzania and presented evidence suggesting right-handedness dating back 1.8 million years. This finding is considered significant for understanding brain lateralization and the development of handedness in early human ancestors (Frayer et al., 2016). Moreover, his colleagues in Croatia, Italy, and Spain have linked specific markings on these fossils to the right-handed or left-handed tendencies of prehistoric individuals. Such markings suggest that human hands and mouths worked in close coordination, especially when using stone tools, which could leave accidental scratches on the lip-facing surfaces of the teeth. In everyday life, we observe no preferential use of forelimbs in cats and dogs, whereas studies indicate that upright gorillas show a distinct preference for using one hand over the other. This suggests that dextromanuality may be a distinctive characteristic of humans. Anthropological and statistical analyses show that right-handed individuals significantly outnumber left-handed ones, with a ratio of 9 to 1.

The enigmatic writing style of the ancient scribes who crafted oracle bone inscriptions thousands of years ago continues to fascinate researchers. With the naked eye, it is challenging to discern whether these glyphs were carved by the left or right hand. By harnessing the capabilities of deep learning to detect subtle features, we aim to ascertain the handedness of the authors of these ancient texts. In the field of computer vision, oracle bone inscriptions are considered both textual characters and pictorial images. The stylistic freedom and variety of configurations present in these inscriptions make their machine recognition challenging.
These inscriptions feature a complex mix of homographs, combinations, and pictographic elements, as well as attributes like positive and negative coexistence, bilateral symmetry, vertical symmetry, rotation, and semantic shifts. Traditional character recognition systems struggle with the noise inherent in oracle variant characters, necessitating alternative computational approaches for their identification and analysis. In supervised learning, deep neural networks (DNNs) have achieved remarkable accuracy in image recognition tasks, often surpassing human performance. For instance, while a person wearing a mask might not be recognized by acquaintances, a machine can still identify them accurately. The development of deep learning convolutional neural networks (CNNs) has significantly enhanced recognition capabilities, achieving accuracies up to 99%. However, labeling data for supervised learning can be impractical, especially with ongoing studies of oracle bone inscriptions that lack definitive categorical labels. Therefore, unsupervised learning methods become essential for analyzing these inscriptions. Unsupervised learning, while less precise due to its weaker feature extraction capabilities, is crucial due to the absence of annotated data, particularly in tasks requiring fine-grained text image segmentation. In analyzing ancient oracle bone handwriting, where it is not possible to label images as left or right-handed, unsupervised learning is indispensable. By treating each pixel in an oracle bone image as a word within a text, this approach allows for the mapping of images to high-dimensional feature vectors, proposing a novel method for deep mining and fine-grained classification within unsupervised learning frameworks. 
In this paper, we employ an unsupervised learning approach, using a novel method named Bone2Vec, which treats each pixel of the oracle bone image as a word in text, embedding these 'words' into high-dimensional vectors that are then clustered to discern patterns that may indicate handedness. This technique, inspired by the distribution hypothesis commonly employed in natural language processing, adapts it to the visual complexity of oracle bone scripts. Our findings suggest a predominance of dextromanuality among the scribes of these ancient texts. This not only aligns with other anthropological evidence from prehistoric sites but also offers a new perspective on the cognitive behaviors associated with early writing practices. The Bone2Vec method, designed specifically for this study, has broader implications for the fields of image recognition and unsupervised machine learning, providing a robust framework for addressing similar challenges in other domains. The contributions of this paper are threefold: (1) We have manually scanned and processed all the available genuine oracle bone inscription images provided by National Museum of Chinese Writing that have been recognized so far, containing more than 1,000 recognized oracle bone images, and constructed the largest genuine oracle bone inscriptions dataset used in the research field of computer technology at present. (2) We proposed Bone2Vec model that extracts the horizontal and vertical features of the image from the image pixels and uses a completely unsupervised algorithm for image embedding and clustering. This model is versatile, applicable to various fields requiring image analysis without predefined labels, such as handwriting identification in legal contexts, authenticity assessments in art, and aging of textual materials. 
(3) Archaeologists have found evidence of handedness in ancient teeth; our study of handedness in oracle bone inscriptions provides new evidence for anthropological research from the perspective of computer technology.

2 Literature Review

The scope of oracle bone inscription research is extensive. In the traditional literature, only oracle experts can competently recognize oracle characters, and the confirmation of an oracle character must be studied from many aspects. The common goal of all basic research is to interpret more undeciphered oracle bone inscriptions and so unravel the history and culture of the Shang Dynasty 3,000 years ago. Unlike the traditional Optical Character Recognition (OCR) workflow of digitization, pre-processing, and post-correction (Nguyen et al., 2021), Oracle Bone Inscriptions (OBI) present distinct challenges due to their linguistic complexity, unique physical form, and handwriting pattern recognition demands. In linguistics, oracle bone inscriptions are glyph drawings and the rudiment of the modern Chinese character system. In computer science, Li and Zhou (1996) took the stroke direction, curvature, and number of bends of the oracle-shaped drawings as secondary features and encoded them, but only experts can apply this method skillfully. Gu (2018) calculated the fractal dimension of oracle bone inscriptions and matched it against a feature database of oracle bone inscriptions to identify them. Liu et al. (2020) developed an oracle character recognition method based on SVM. However, due to the coexistence of complex and simplified oracle characters, the recognition accuracy can still be improved. Gao and Liang (2020) presented a two-stage method of variant character recognition for oracle bone inscriptions, which used VGG16 to recognize variant characters.
In the deep learning field, CNNs have revolutionized traditional image recognition to an unprecedented degree. The integration of the Region Proposal Network (Ren et al., 2015), residual blocks (He et al., 2016), the Fully Convolutional Network with Region of Interest (He et al., 2017), and efficient building blocks (Tan and Le, 2019) into CNNs represents benchmark developments in image recognition. Furthermore, by deploying an attention mechanism over input grid points, an attention-based deformable convolutional network performs better in Chinese character recognition (Zhuo and Zhang, 2024). In image embedding, Liang (2020) attempted an unsupervised method that binarizes the image content and combines the vectors of each direction into a row vector as the feature representation of the image content, but the recognition accuracy is low. Instead of treating an image as a sequence of pixels, the Swin Transformer constructs a hierarchical representation of the patched image through a Shifted Window Self-Attention mechanism: it partitions an image into fixed-size windows and then shifts the windows to cover global regions (Liu et al., 2021). Researchers not only rely mainly on traditional image processing ideas and supervised network models (Hinton et al., 2006; Cevikalp et al., 2019; Marin et al., 2018) but also combine different neural networks in two stages. Faghri et al. (2017) obtained the features of images and texts through convolutional and sequence networks and used a GRU to map text and image to the same subspace. Trinh et al. (2019) combined the idea of the masked language model with a transformer to perform an unsupervised image learning task, with the goal of improving ResNet-50.
Chen and Li (2020) used the manifold relation to learn image feature embeddings and manifold learning to refine the distance measurement. Ren et al. (2019) used Node2Vec to generate graph embeddings of nodes and calculated user-news relevance using cosine similarity. For the recognition of glyph images, statistical analysis of datasets is often used to extract features and classify them. Thinning, binarization, normalization, and other methods are used in image preprocessing. For feature extraction, researchers generally extract global static features (Abdi and Khemakhem, 2012), such as the writing center of the text and font size and spacing, in addition to moment features (Darwish and Auda, 1994) and texture features (Said et al., 2000). Support Vector Machines and nearest-neighbor algorithms were generally used as the initial classifiers, but neural networks are mostly used at present. As noted in three studies of handwriting optical character recognition (Memon et al., 2020; Preetha et al., 2020; Vashist et al., 2020), handwriting recognition for Oracle Bone Inscriptions requires multiple classification methodologies: structural pattern recognition with grammars and graphs, non-parametric statistical approaches, and kernel-based approaches such as Support Vector Machine and Artificial Neural Network. Combining ResNet with a Bidirectional Long Short-Term Memory - Deep Neural Network model (Bi-LSTM-DNN), generally in two stages, is also a solution to handwriting recognition (Rao and Babu, 2024).

3 Experimental Design

We study the problem of oracle font recognition based on unsupervised learning. Earlier work on unsupervised font feature extraction used TF-IDF and subtractive clustering methods (Singh and Sundaram, 2015; Shivram et al., 2013; Taghavi Sangdehi and Faez, 2009). The success rate of font feature recognition based on unsupervised learning has been found to be very low when training samples are few.
We extract the horizontal and vertical features of the image from the image pixels and convert them into text. Using the internal correlation of the image, we realize unsupervised image recognition with high accuracy. In this study, we proposed the model Bone2Vec (Fig. 2 ) to enable the recognition of oracle bone scripts and to test our conjecture as to whether left-handedness existed among the carvers of oracle bones more than 3,000 years ago. The model Bone2Vec is divided into 3 stages. Through the experimental results, our model preliminarily empowers the identification of font structure of oracle bone inscriptions. This method can be used in many unlabeled classification scenarios, such as the classification of ancient Mayan characters with hieroglyphic characteristics and Sumerian cuneiform characters. Stage 1 - Image Enhancement: Considering the ancient and data-scarce nature of oracle inscriptions, we employ image enhancement techniques to generate numerous images. These images reflect the diverse characteristics of oracle characters, thereby expanding our training sample set and enhancing the model's generalization capabilities. This approach also improves the recognition of oracle structure. By referring to oracle bone inscriptions that exhibit multiple morphological features, we generate a variety of images through targeted image enhancements. Stage 2 - Image Textualization: The image content mapping method converts the gray value of pixels into text to achieve image textualization. In this approach, each pixel's gray value in an oracle bone image is treated as a word in text. The variability in gray values across different images, and the frequency of each gray value, mirrors the variation and frequency of words in different documents within natural language processing. These variations form the distinct characteristics of the images or documents. 
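The textualization idea in Stage 2 can be illustrated with a minimal sketch. The paper's actual procedure is Algorithm 1 in its Appendix, which is not reproduced here, so everything below is an assumption made for illustration: the function names, the quantization of gray values into 16 bins, the `g<bin>` token format, and the choice to append column-wise (vertical) readings after row-wise (horizontal) ones.

```python
from collections import Counter


def quantize(value, levels=16):
    """Map an 8-bit gray value (0-255) to one of `levels` bins (assumed step)."""
    return value * levels // 256


def image_to_sentences(gray, levels=16):
    """Turn a grayscale image (a list of pixel rows) into word sequences.

    Each pixel becomes a token such as "g3". Rows give the horizontal
    reading of the image and columns give the vertical reading.
    """
    rows = [[f"g{quantize(v, levels)}" for v in row] for row in gray]
    cols = [list(col) for col in zip(*rows)]
    return rows, cols


def image_to_document(gray, levels=16):
    """Concatenate the horizontal and vertical readings into one token list."""
    rows, cols = image_to_sentences(gray, levels)
    return [tok for line in rows + cols for tok in line]


# A tiny 2x3 toy "image": a dark stroke on the left, lighter pixels elsewhere.
toy = [
    [0, 255, 255],
    [0, 128, 255],
]
doc = image_to_document(toy)
freq = Counter(doc)  # token frequencies, analogous to word counts in a document
```

A token list of this kind is the sort of input a document-embedding model such as Doc2Vec expects, which is how Stage 3 is described; the toy image and its values are invented for the example.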
Stage 3 – Image Vectorization: The image text is vectorized using the unsupervised deep learning model, Doc2Vec. By introducing prior knowledge and clustering methods, we identify the structural patterns of the oracle font. The high-dimensional vectors are then reduced to two dimensions using a nonlinear method, facilitating the visualization of clustering results. We have constructed four datasets for this study. Dataset 1 consists of the digit '5' written by ten individuals using both left and right hands, totaling 115 images each. Dataset 2 features modern imitations of oracle bone inscriptions, comprising 770 images across 381 types, focusing solely on the shapes without considering stroke thickness or style. Dataset 3, the largest in this field, includes photocopied data from all identified real oracle rubbings, meticulously scanned and processed into single characters by the authors, amounting to approximately 10GB with 4,634 images of 1,108 unique oracle bone characters. This dataset preserves details such as stroke thickness and edge smoothness. Dataset 4 contains 64 images of original oracle bone topographies, replicated by four left-handed and four right-handed individuals, split evenly between the two groups. 3.1 Recognition of Handwritten Digit '5' by Left and Right Hands We empirically investigate the four datasets using our proposed model to assess its effectiveness in recognizing oracle scripts and determining left or right-handedness. Initially, we utilized Datasets 1 and 2 to validate our model's effectiveness. Subsequently, we applied the model to identify and cluster Datasets 3 and 4. The details of the model are outlined in the Methodology section in the Appendix. For Dataset 1, we constructed the Bone2Vec model. This dataset comprises images of the number '5' written by 10 modern individuals. 
Given that modern individuals have greater finger flexibility than ancient people, we chose a simple character like '5' to test the model's ability to distinguish between left- and right-handed writing. The images in Dataset 1 are processed according to Algorithm 1 (see Appendix) to generate image text. The optimal dimension and optimal window are used to generate the document embedding vectors, which are normalized to obtain the document embedding matrix. Manifold learning is applied to the normalized vector of each document, using the t-SNE method to reduce the dimension for visualization. With left hand and right hand as categorical labels, we used 60% of the dataset as the training set and 40% as the test set and classified with logistic regression. The classification results (Table 1, Experiment 1) show that the classification accuracy reaches 83%. The clustering visualization (Fig. 3) and the classification results show that the model can preliminarily distinguish the number 5 written by the left or right hand. To rule out a chance outcome, we carried out 10 repeated experiments, each time selecting one third of the data from the original dataset for processing. The clustering result (Fig. 4) for the number 5 written with the left and right hands is good, and the Bone2Vec model can be used to distinguish characters written by the left or right hand.

3.2 Modern oracle characters recognition

To verify the effectiveness of the model on the oracle dataset, we extracted two oracle characters, “Bao” and “Bing”, from Dataset 2 and used image enhancement to expand the dataset to 230 images, of which “Bao” has 120 images and “Bing” has 130 images. Using Bone2Vec, the results are visualized in Fig. 5.
With the Chinese characters corresponding to the oracle bone inscriptions as categorical labels, we used 60% of the dataset as the training set and 40% as the test set and classified with logistic regression. The classification results (Table 1, Experiment 2) show that the classification accuracy reaches 88%. The algorithm gathers similar oracle characters together, which indicates that it is effective in recognizing oracle font structure. Again, to rule out a chance outcome, we ran 10 replicates, each with a third of the data from the original dataset. The clustering result for the oracle characters (Fig. 6) is good, and the Bone2Vec model can be used to distinguish oracle characters.

3.3 Structure recognition of genuine oracle characters

Through Experiments 1 and 2, we verified the effectiveness of Bone2Vec in recognizing the handwritten digit 5 and oracle bone inscriptions. Next, we applied the Bone2Vec model to oracle rubbings, which come from genuine oracle bones. We have shown that Bone2Vec can extract features of oracle bone inscriptions copied by modern people. To show that the method is still effective on original images of oracle bone inscriptions, we selected several groups of oracle bone inscriptions with different features. In the dimensionality reduction visualization, the complex and simple glyphs (Fig. 7a) are nonlinearly separable, so our model can distinguish complex and simple oracle images. Similar glyphs are clustered together, while the glyphs containing circle and cross structures (Fig. 7b) are still nonlinearly separable. The experimental results show that our algorithm gathers oracle bone inscription images with common features together. However, the characteristics of oracle bone inscriptions may be determined by many factors, such as the structure of the glyph, the style of the writer, the writing period, and the hand used by the writer.
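The evaluation protocol used in Experiments 1 and 2 (normalized embedding vectors, a 60%/40% train/test split, and a logistic regression classifier) can be sketched as follows. This is only an illustrative stand-in under stated assumptions: the embeddings are synthetic two-dimensional points rather than Bone2Vec output, the solver is plain full-batch gradient descent because the paper does not specify one, and all function names, the learning rate, and the random seeds are hypothetical.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def split_60_40(samples, labels, seed=0):
    """Shuffle indices and split them 60% train / 40% test."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    cut = int(0.6 * len(idx))

    def pick(ids):
        return [samples[i] for i in ids], [labels[i] for i in ids]

    return pick(idx[:cut]), pick(idx[cut:])


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Binary logistic regression trained by full-batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, X):
    """Predict class 1 when the linear score is non-negative."""
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else 0 for xi in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic stand-in for embeddings: label 0 = "left", label 1 = "right".
rng = random.Random(42)
embeddings, hands = [], []
for label, (cx, cy) in ((0, (-1.0, 0.5)), (1, (1.0, 0.5))):
    for _ in range(50):
        embeddings.append(l2_normalize([cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3)]))
        hands.append(label)

(train_X, train_y), (test_X, test_y) = split_60_40(embeddings, hands)
w, b = fit_logistic(train_X, train_y)
acc = accuracy(test_y, predict(w, b, test_X))
```

In practice a library classifier would normally replace the hand-written solver; the dependency-free version is used here only so the protocol is visible end to end.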
To test our initial conjecture about whether there were sinistromanual individuals among the carvers of oracle bone inscriptions, we copied several groups of oracle bone inscriptions with left and right hands respectively, and carried out several groups of experiments combined with real oracle bone images. We made 64 copies of “Bao” and expanded the dataset to 512 images after data enhancement. First, we constructed the image embedding vectors from the “Bao” characters copied by our left and right hands. The clustering result (Fig. 8a) shows that the model can preliminarily distinguish the left- and right-hand written text. With left hand and right hand as categorical labels, we used 60% of the dataset as the training set and 40% as the test set and classified with logistic regression. The classification results (Table 1, Experiment 3) show that the classification accuracy reaches 81%, which indicates that the model has extracted the features of the oracle bone images well. Next, we added real oracle rubbing images to the images we copied. After data enhancement, a total of 18 images were obtained. The real rubbings of oracle bone inscriptions are mixed with the left and right handwritten characters (Fig. 8b): some fall among the left-handed characters and some among the right-handed characters. One of the genuine oracle characters fell among the left-handed images, accounting for 5%.

Table 1 Classification results of three sets of experiments.

Experiment    Class     precision  recall  f1-score  support
Experiment 1  Left      0.85       0.82    0.84      50
              Right     0.80       0.83    0.81      42
              Accuracy  -          -       0.83      92
Experiment 2  Bao       0.89       0.85    0.87      48
              Bing      0.88       0.91    0.89      56
              Accuracy  -          -       0.88      104
Experiment 3  Left      0.79       0.86    0.81      109
              Right     0.84       0.75    0.79      96
              Accuracy  -          -       0.81      205

There is evidence suggesting that the scribes of oracle bone inscriptions were predominantly right-handed (dextromanual).
Utilizing an unsupervised adaptive graph embedding learning algorithm, we can employ computers to analyze ancient character images, enabling us to infer the production methods and lifestyle of ancient human beings. 4 Conclusion Oracle bone inscriptions represent the earliest mature writing system discovered in China and are among the most recognized word systems globally. Despite many inscriptions remaining undeciphered, withholding their stories from modern understanding, the challenges in labeling oracle words make supervised deep learning infeasible. Consequently, our research employs unsupervised learning, which remains challenging due to the presumed dominance of right-handed inscriptions. By incorporating artificially created data from both left and right hands, our experiments with both real and simulated oracle rubbings demonstrate a significant clustering of ancient human images in the right-handed group, while our created data populate both handedness clusters. Given the right-handedness prevalent among most primates, this might suggest a biological rather than social evolutionary trait. Further exploration into Zhenren authors, who are linked to specific kings and eras, provides new insights into the behaviors of these legendary figures from 3,000 years ago. The robustness of our method to image size and language type suggests its applicability beyond oracle bones to ancient Mayan and Sumerian cuneiform characters, reflecting the deep-seated significance of these symbols in understanding ancient civilizations. Our study significantly advances the field of historical linguistics and computer vision by employing a unique dataset of over 1,000 manually scanned images of genuine oracle bone inscriptions. This dataset, the largest of its kind, enables a level of analysis previously unattainable in deciphering ancient scripts. 
The use of these authentic inscriptions is crucial, as it ensures the fidelity and accuracy of our research, allowing us to draw more precise conclusions about ancient writing practices. Furthermore, the introduction of the Bone2Vec model represents a novel contribution to the field. This unsupervised learning algorithm successfully handles the complex variability inherent in oracle bone scripts, which traditional models struggle to process. By treating each pixel of the scanned images as textual data, Bone2Vec allows for an innovative approach to pattern recognition in ancient scripts, bridging the gap between traditional linguistic analysis and modern computational techniques. This combination of a unique, high-quality dataset and pioneering analytical techniques not only enhances our understanding of ancient Chinese civilizations but also sets a precedent for future studies involving other ancient scripts. Our findings offer new insights into the behavioral patterns of ancient societies and underscore the importance of technological innovation in unraveling historical mysteries. Declarations During the preparation of this work the authors used OpenAI/ChatGPT4.0 in order to check on the grammar and refine the English writing for the abstract and sections 1 and 4 of the paper. After using this tool/service, the authors carefully reviewed and edited the content as needed and take full responsibility for the content of the publication. Conflict of interest The authors have no relevant financial or non-financial interests to disclose. Ethics approval The submitted work is original and has not been published elsewhere in any form or language (partially or in full). Funding No external fund is used for this research. Data availability Data will be made available upon reasonable requests. 
Author contributions
Conceptualization: Hefei Wang, Zihuan Feng; Methodology: Xun Liang, Zihuan Feng, Zhenhao Li; Formal analysis and investigation: Xun Liang, Zihuan Feng, Zhenhao Li; Writing - original draft preparation: Hefei Wang, Zihuan Feng; Writing - review and editing: Hefei Wang, Zhenhao Li

Acknowledgement
The authors are grateful to Mr. Kevin McSpadden of the South China Morning Post for featuring our research to raise public awareness of the topic. The article is accessible online at: https://www.scmp.com/news/people-culture/article/3278486/recent-study-ancient-chinese-oracle-bones-highlights-how-ai-changing-archaeology?module=perpetual_scroll_0&pgtype=article

Human Participants and/or Animals
No animals were involved in this work. Human participants provided the handwriting samples used to form the training set for distinguishing left-handed from right-handed writers; no individuals were harmed during the study.

References
Abdi, M. N., & Khemakhem, M. (2012, October). Arabic writer identification and verification using template matching analysis of texture. In 2012 IEEE 12th International Conference on Computer and Information Technology (pp. 592-597). IEEE.
Bengio, Y., Ducharme, R., & Vincent, P. (2000). A neural probabilistic language model. Advances in Neural Information Processing Systems, 13.
Bloice, M. D., Stocker, C., & Holzinger, A. (2017). Augmentor: An image augmentation library for machine learning. arXiv preprint arXiv:1708.04680.
Cevikalp, H., Benligiray, B., Gerek, Ö. N., & Saribas, H. (2019, June). Semi-supervised robust deep neural networks for multi-label classification. In CVPR Workshops (pp. 9-17).
Chen, X., & Li, Y. (2020). Deep feature learning with manifold embedding for robust image retrieval. Algorithms, 13(12), 318.
Chen, T. (2010). Restudy on the glyph system of Shang oracle bone inscriptions. Shanghai People's Publishing House. (In Chinese)
Darwish, A. M., & Auda, G. A. (1994, April). A new composite feature vector for Arabic handwritten signature recognition. In Proceedings of ICASSP'94. IEEE International Conference on Acoustics, Speech and Signal Processing (Vol. 2, pp. II-613). IEEE.
Faghri, F., Fleet, D. J., Kiros, J. R., & Fidler, S. (2017). VSE++: Improving visual-semantic embeddings with hard negatives. arXiv preprint arXiv:1707.05612.
Frayer, D. W., Clarke, R. J., Fiore, I., Blumenschine, R. J., Pérez-Pérez, A., Martinez, L. M., Estebaranz, F., Holloway, R., & Bondioli, L. (2016). OH-65: The earliest evidence for right-handedness in the fossil record. Journal of Human Evolution, 100, 65–72. https://doi.org/10.1016/j.jhevol.2016.07.002
Finch, G. (1941). Chimpanzee handedness. Science, 94(2431), 117-118.
Gao, J., & Liang, X. (2020). Distinguishing oracle variants based on the isomorphism and symmetry invariances of oracle-bone inscriptions. IEEE Access, 8, 152258-152275.
Gu, S. T. (2018). A method of oracle character recognition based on fractal geometry. J. Chin. Inf. Process, 32(10), 138-142.
Hamilton, W. L., Leskovec, J., & Jurafsky, D. (2016). Diachronic word embeddings reveal statistical laws of semantic change. arXiv preprint arXiv:1605.09096.
He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2961-2969).
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770-778).
Hinton, G. E., Osindero, S., & Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527-1554.
Le, Q., & Mikolov, T. (2014, June). Distributed representations of sentences and documents. In International Conference on Machine Learning (pp. 1188-1196). PMLR.
Liang, X. (2020). Social Computing with Artificial Intelligence (1st ed.). Springer Singapore.
Li, F., & Zhou, X. L. (1996). The graph theory method of oracle bone inscriptions automatic recognition. J. Electron, 18(1), 41-47.
Lin, J., & Li, J. (1996). Feature extraction and preprocessing of offline Chinese signature verification. J. Shanghai Jiaotong Univ., 30, 40–45. (In Chinese)
Liu, M., Liu, G., Liu, Y., & Jiao, Q. (2020). Oracle bone inscriptions recognition based on deep convolutional neural network. Journal of Image and Graphics, 8(4), 114-119.
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., ... & Guo, B. (2021). Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 10012-10022).
Maaten, L. V. D. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov), 2579.
Marin, J., Biswas, A., Ofli, F., Hynes, N., Salvador, A., Aytar, Y., ... & Torralba, A. (2018). Recipe1M+: A dataset for learning cross-modal embeddings for cooking recipes and food images. arXiv preprint arXiv:1810.06553.
Memon, J., Sami, M., Khan, R. A., & Uddin, M. (2020). Handwritten optical character recognition (OCR): A comprehensive systematic literature review (SLR). IEEE Access, 8, 142642-142668.
Nguyen, T. T. H., Jatowt, A., Coustaty, M., & Doucet, A. (2021). Survey of post-OCR processing approaches. ACM Computing Surveys (CSUR), 54(6), 1-37.
Preetha, S., Afrid, I. M., & Nishchay, S. K. (2020). Machine learning for handwriting recognition. International Journal of Computer (IJC), 38(1), 93-101.
Rao N, S., & Babu C, N. K. (2024). Enhanced ResNet-151-based fused features for optimized Bi-LSTM-DNN-aided handwritten character and digits recognition. Expert Systems With Applications, 244. https://doi.org/10.1016/j.eswa.2023.122860
Ren, J., Long, J., & Xu, Z. (2019). Financial news recommendation based on graph embeddings. Decision Support Systems, 125, 113115.
Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems, 28.
Said, H. E., Tan, T. N., & Baker, K. D. (2000). Personal identification based on handwriting. Pattern Recognition, 33(1), 149-160.
Shivram, A., Ramaiah, C., & Govindaraju, V. (2013). A hierarchical Bayesian approach to online writer identification. IET Biometrics, 2(4), 191-198.
Singh, G., & Sundaram, S. (2015, August). A subtractive clustering scheme for text-independent online writer identification. In 2015 13th International Conference on Document Analysis and Recognition (ICDAR) (pp. 311-315). IEEE.
Taghavi Sangdehi, S. A., & Faez, K. (2009). Writer identification using super paramagnetic clustering and spatio temporal neural network. In Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications: 14th Iberoamerican Conference on Pattern Recognition, CIARP 2009, Guadalajara, Jalisco, Mexico, November 15-18, 2009. Proceedings 14 (pp. 669-676). Springer Berlin Heidelberg.
Tan, M., & Le, Q. (2019, May). EfficientNet: Rethinking model scaling for convolutional neural networks. In International Conference on Machine Learning (pp. 6105-6114). PMLR.
Trinh, T. H., Luong, M. T., & Le, Q. V. (2019). Selfie: Self-supervised pretraining for image embedding. arXiv preprint arXiv:1906.02940.
Tsai, L. S., & Maurer, S. (1930). "Right-handedness" in white rats. Science, 72(1869), 436-438.
Vashist, P. C., Pandey, A., & Tripathi, A. (2020, January). A comparative study of handwriting recognition techniques. In 2020 International Conference on Computation, Automation and Knowledge Management (ICCAKM) (pp. 456-461). IEEE.
Wang, B. (2019). Oracle Bone Inscriptions Copybook. Beijing, China: Beijing Arts and Crafts Publishing House. (In Chinese)
Wang, J., & Qin, F. (2004). Design and application of adaptive binary filtering algorithm for gray text image. J. Hefei Univ. Technol., 509–512. (In Chinese)
Warren, J. M. (1953). Handedness in the rhesus monkey. Science, 118(3073), 622-623.
Yin, Z., & Shen, Y. (2018). On the dimensionality of word embedding. Advances in Neural Information Processing Systems, 31.
Zhou, X., Li, F., & Hua, X. (1996). Study on computer identification method of oracle. J. Fudan Univ., 5, 481–486. (In Chinese)
Zhuo, S., & Zhang, J. (2024). Attention-based deformable convolutional network for Chinese various dynasties character recognition. Expert Systems With Applications, 238(Part B).

Supplementary Files
Appendix.docx
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-5142859","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":470829316,"identity":"e26f58a2-b9cb-441c-9719-8d41d9a370e5","order_by":0,"name":"Hefei Wang","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABBklEQVRIiWNgGAWjYBACgwOMDxgSGA4DmcwHGCQYLMCCBLQwG0C1sCUwSCRIEKmFAayFB8ggSsvxZsYHDxgOy5vzr/kmYflDIrGBvXmbBEPNHZxa7M8cZjZIYLhtuHPG280GQIclNvAcK5NgOPYMty038o9JALUwbrhxduMDsBaJHDMJxobDuLXcf8z+A6jFfsONMw8OgLXIvyGg5QYzGzDE/iduON/DCLWFh4CWM8nMEgkGz5M33GAzNpBIkzBu40krtkg4hkfL8cOMH39UHLbdcP7wM2kJGxvZfvbDG298qMGtBaoRiIGBwAyKFTaQQAIBDRDAf4CB8QNRKkfBKBgFo2CkAQCpvVtfts5v9AAAAABJRU5ErkJggg==","orcid":"https://orcid.org/0000-0001-8211-3903","institution":"Xi'an Jiaotong-Liverpool University","correspondingAuthor":true,"prefix":"","firstName":"Hefei","middleName":"","lastName":"Wang","suffix":""},{"id":470829317,"identity":"d3de972b-9329-4068-bbb5-ec94bd74fde8","order_by":1,"name":"Zhenhao Li","email":"","orcid":"","institution":"Xi'an Jiaotong-Liverpool University","correspondingAuthor":false,"prefix":"","firstName":"Zhenhao","middleName":"","lastName":"Li","suffix":""},{"id":470829318,"identity":"4eba55d8-b23e-4361-827b-f792cd719abe","order_by":2,"name":"Zihuan Feng","email":"","orcid":"","institution":"Renmin University of China","correspondingAuthor":false,"prefix":"","firstName":"Zihuan","middleName":"","lastName":"Feng","suffix":""},{"id":470829319,"identity":"fb0909ab-6cc3-46b4-94d1-531533255527","order_by":3,"name":"Xun 
Liang","email":"","orcid":"","institution":"Renmin University of China","correspondingAuthor":false,"prefix":"","firstName":"Xun","middleName":"","lastName":"Liang","suffix":""}],"badges":[],"createdAt":"2024-09-24 07:44:19","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-5142859/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-5142859/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":84870448,"identity":"284b4661-bb8b-4dfb-a4ac-7d427ba5199e","added_by":"auto","created_at":"2025-06-18 08:54:05","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":186947,"visible":true,"origin":"","legend":"\u003cp\u003eDifferent forms of Chinese character “bird”. Image provided by National Museum of Chinese Writing .\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-5142859/v1/aaa02c7c2ddc6da8687e8322.png"},{"id":84868998,"identity":"8f44880f-fa45-4687-9267-bb13141f91f8","added_by":"auto","created_at":"2025-06-18 08:46:05","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":59051,"visible":true,"origin":"","legend":"\u003cp\u003eThree-stage Bone2Vec model.\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-5142859/v1/52f26926e976e3f62662b823.png"},{"id":84869005,"identity":"2d047cec-e024-4338-847f-42b81a6926ef","added_by":"auto","created_at":"2025-06-18 08:46:05","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":49340,"visible":true,"origin":"","legend":"\u003cp\u003eVisualization of left and right handwritten number 5 (blue is left handed, and green is right 
handed).\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-5142859/v1/13d7c46843ab921d37ab6b4a.png"},{"id":84869003,"identity":"32a9621c-7c8f-4302-8c7f-71dafcc2d976","added_by":"auto","created_at":"2025-06-18 08:46:05","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":65135,"visible":true,"origin":"","legend":"\u003cp\u003eResults of 10 repeated experiments (blue is left handed, and red is right handed).\u003c/p\u003e","description":"","filename":"4.png","url":"https://assets-eu.researchsquare.com/files/rs-5142859/v1/d41f683d25bddaa4f92c72af.png"},{"id":84870452,"identity":"c7bc1ee0-3683-4693-a28a-5e33e9de52c2","added_by":"auto","created_at":"2025-06-18 08:54:06","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":159833,"visible":true,"origin":"","legend":"\u003cp\u003eVisualization results of two types of oracle characters (blue is “Bao”, and green is “Bing”).\u003c/p\u003e","description":"","filename":"5.png","url":"https://assets-eu.researchsquare.com/files/rs-5142859/v1/bf1336f57ef742de7e54971b.png"},{"id":84870449,"identity":"4dfe96ab-32dc-4c6b-bee7-0e2c9f9ee2a0","added_by":"auto","created_at":"2025-06-18 08:54:06","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":66985,"visible":true,"origin":"","legend":"\u003cp\u003eVisualization diagram of 10 repeated experiments of two types of oracle characters (blue is “Bao”, and red is “Bing”).\u003c/p\u003e","description":"","filename":"6.png","url":"https://assets-eu.researchsquare.com/files/rs-5142859/v1/0c07f5821ccf94fc0a458071.png"},{"id":84869016,"identity":"b41d9396-6b2a-44ed-aef5-c2c174b84376","added_by":"auto","created_at":"2025-06-18 08:46:06","extension":"png","order_by":7,"title":"Figure 
7","display":"","copyAsset":false,"role":"figure","size":229078,"visible":true,"origin":"","legend":"\u003cp\u003eVisual results of partial text clustering on real oracle bone. aThe complex and simple glyphs are non-linear separable. b Symbols of circular and cross structures are non-linear and separable.\u003c/p\u003e","description":"","filename":"7.png","url":"https://assets-eu.researchsquare.com/files/rs-5142859/v1/03270c762bee57674b92647d.png"},{"id":84870456,"identity":"58dd053a-d7c2-4cfe-95ba-e451f6ff854f","added_by":"auto","created_at":"2025-06-18 08:54:06","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":218002,"visible":true,"origin":"","legend":"\u003cp\u003eClustering results (blue is written by the left hand, green is written by the right hand, and red is the genuine oracle characters). a - Only left and right handwritten “Bao”. b - The mixture of handwritten and genuine oracle characters “Bao”.\u003c/p\u003e","description":"","filename":"8.png","url":"https://assets-eu.researchsquare.com/files/rs-5142859/v1/7207d88c582a0cf104ea4efc.png"},{"id":86068316,"identity":"704cb35c-0b50-410a-9574-d0e20cc3cfbf","added_by":"auto","created_at":"2025-07-05 12:45:36","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1444303,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-5142859/v1/e5954503-4c64-4e29-8db8-66527c997030.pdf"},{"id":84869002,"identity":"e4f77224-d45e-4024-bf81-14cda8b608f1","added_by":"auto","created_at":"2025-06-18 08:46:05","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":229365,"visible":true,"origin":"","legend":"","description":"","filename":"Appendix.docx","url":"https://assets-eu.researchsquare.com/files/rs-5142859/v1/c69dac675a15faa2de5d2aae.docx"}],"financialInterests":"","formattedTitle":"A bone 
to pick with ancient Chinese: AI Analysis of Handedness in Bone Inscriptions","fulltext":[{"header":"1 Introduction","content":"\u003cp\u003eHandedness, reflecting the asymmetry of brain function, provides a window into the cognitive and social realms of ancient civilizations. Deciphering whether our ancestors were predominantly right-handed (dextromanual) or left-handed (sinistromanual) enhances our understanding of their daily activities and broader anthropological and neuroscientific contexts. This research employs deep learning to investigate handedness through ancient oracle bone inscriptions, some of the earliest forms of Chinese writing, created during the Shang Dynasty about 3,000 years ago.\u003c/p\u003e \u003cp\u003eAncient Chinese oracle words, alongside ancient Mayan glyphs and Sumerian cuneiform, continue to reveal profound historical mysteries to the modern world. These oracle inscriptions, engraved on animal bones, originated approximately 3,000 years ago during the Shang Dynasty in China and marked a significant transition. From that point, Chinese history began to be documented through these engraved words. Serving as the foundation for the evolution and development of Chinese characters, oracle words are regarded as one of the remarkable symbols of Chinese civilization.\u003c/p\u003e \u003cp\u003eSince the discovery of the first oracle bone inscription in 1899, scholars have unearthed many animal bones bearing these inscriptions, which were primarily used for divination by the Shang Dynasty's royal family, with content often related to weather, agriculture, hunting, and instruments. To date, over 4,000 different oracle bone characters have been manually categorized, with more than 1,000 identified alongside their modern Chinese counterparts. These inscriptions, as early hieroglyphics, highlight the artistic and symbolic complexity of ancient Chinese writing practices. 
The creators, known as Zhenren, who specialized in divination, are crucial for understanding the sociopolitical and cultural landscape of their times. Their work varies significantly across different reigns, offering insights into the chronological staging of historical events.\u003c/p\u003e \u003cp\u003eThe lack of standardization in these inscriptions, with their homonymies and stylistic variations, poses unique challenges. For instance, the same character can appear in multiple forms, reflecting the arbitrary nature of ancient writing styles, which complicates modern interpretation and recognition efforts. Figure\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e shows six oracle bone inscriptions of the character \u0026ldquo;bird.\u0026rdquo; The writing style of oracle characters is highly arbitrary, presenting a difficulty in oracle character revaluation.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eOur interest in oracle recognition is driven by the question of whether humans were inherently right-handed (dextromanual) or if right-handedness became predominant as human societies evolved. David Frayer, a professor emeritus at the University of Kansas. The study, which analyzed striations on the teeth of a Homo habilis fossil found in Tanzania, presented evidence suggesting right-handedness dating back 1.8\u0026nbsp;million years. This finding is considered significant in understanding brain lateralization and the development of handedness in early human ancestors (Frayer et al., \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2016\u003c/span\u003e). Moreover, his colleagues in Croatia, Italy, and Spain have linked specific markings on these fossils to the right-handed or left-handed tendencies of prehistoric individuals. 
Such markings suggest that human hands and mouths worked in close coordination, especially when using stone tools, which could leave accidental scratches on the lips.\u003c/p\u003e \u003cp\u003eYet, in everyday life, we observe no preferential use of forelimbs in cats and dogs; however, studies indicate that upright gorillas show a distinct preference for using one hand over the other. This suggests that dextromanuality may be a distinctive characteristic of humans. Anthropological and statistical analyses show that right-handed individuals significantly outnumber left-handed ones, with a ratio of 9 to 1. The enigmatic writing style of the ancient scribes who crafted oracle bone inscriptions thousands of years ago continues to fascinate researchers. With the naked eye, it is challenging to discern whether these glyphs were carved by the left or right hand. By harnessing the capabilities of deep learning to detect subtle features, we aim to ascertain the handedness of the authors of these ancient texts.\u003c/p\u003e \u003cp\u003eIn the field of computer vision, oracle bone inscriptions are considered both textual characters\u003c/p\u003e \u003cp\u003eand pictorial images. The stylistic freedom and variety of configurations present in these inscriptions make their machine recognition challenging. These inscriptions feature a complex mix of homographs, combinations, and pictographic elements, as well as attributes like positive and negative coexistence, bilateral symmetry, vertical symmetry, rotation, and semantic shifts. Traditional character recognition systems struggle with the noise inherent in oracle variant characters, necessitating alternative computational approaches for their identification and analysis.\u003c/p\u003e \u003cp\u003eIn supervised learning, deep neural networks (DNNs) have achieved remarkable accuracy in image recognition tasks, often surpassing human performance. 
For instance, while a person wearing a mask might not be recognized by acquaintances, a machine can still identify them accurately. The development of deep learning convolutional neural networks (CNNs) has significantly enhanced recognition capabilities, achieving accuracies up to 99%. However, labeling data for supervised learning can be impractical, especially with ongoing studies of oracle bone inscriptions that lack definitive categorical labels. Therefore, unsupervised learning methods become essential for analyzing these inscriptions.\u003c/p\u003e \u003cp\u003eUnsupervised learning, while less precise due to its weaker feature extraction capabilities, is crucial due to the absence of annotated data, particularly in tasks requiring fine-grained text image segmentation. In analyzing ancient oracle bone handwriting, where it is not possible to label images as left or right-handed, unsupervised learning is indispensable. By treating each pixel in an oracle bone image as a word within a text, this approach allows for the mapping of images to high-dimensional feature vectors, proposing a novel method for deep mining and fine-grained classification within unsupervised learning frameworks.\u003c/p\u003e \u003cp\u003eIn this paper, we employ an unsupervised learning approach, using a novel method named Bone2Vec, which treats each pixel of the oracle bone image as a word in text, embedding these 'words' into high-dimensional vectors that are then clustered to discern patterns that may indicate handedness. This technique, inspired by the distribution hypothesis commonly employed in natural language processing, adapts it to the visual complexity of oracle bone scripts.\u003c/p\u003e \u003cp\u003eOur findings suggest a predominance of dextromanuality among the scribes of these ancient texts. This not only aligns with other anthropological evidence from prehistoric sites but also offers a new perspective on the cognitive behaviors associated with early writing practices. 
The Bone2Vec method, designed specifically for this study, has broader implications for the fields of image recognition and unsupervised machine learning, providing a robust framework for addressing similar challenges in other domains.\u003c/p\u003e \u003cp\u003eThe contributions of this paper are threefold:\u003c/p\u003e \u003cp\u003e(1) We have manually scanned and processed all the available genuine oracle bone inscription images provided by National Museum of Chinese Writing that have been recognized so far, containing more than 1,000 recognized oracle bone images, and constructed the largest genuine oracle bone inscriptions dataset used in the research field of computer technology at present.\u003c/p\u003e \u003cp\u003e(2) We proposed Bone2Vec model that extracts the horizontal and vertical features of the image from the image pixels and uses a completely unsupervised algorithm for image embedding and clustering. This model is versatile, applicable to various fields requiring image analysis without predefined labels, such as handwriting identification in legal contexts, authenticity assessments in art, and aging of textual materials.\u003c/p\u003e \u003cp\u003e(3) Archaeologists found evidence of left and right hands from ancient teeth, and the research on the left and right hand of oracle bone inscriptions provides a new evidence for anthropological research from the perspective of computer technology.\u003c/p\u003e"},{"header":"2 Literature Review","content":"\u003cp\u003eThe scope of oracle bone inscriptions research is extensive. In the traditional literature, only oracle experts can competently recognize oracle characters. The confirmation of oracle characters must be studied in many aspects. The common goal of all basic research works is to interpret more deracinated oracle bone inscriptions to unravel the history and culture of the Shang Dynasty 3,000 years ago. 
Differs from traditional Optical Character Recognition (OCR) workflow in digitization, pre-processing and post correction (Nguyen et al. \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2021\u003c/span\u003e), Oracle Bone Inscriptions (OBI) present manifested challenges due to their linguistic complexity, unique physical form, and handwriting pattern recognition demands.\u003c/p\u003e \u003cp\u003eIn the field of linguistics, oracle bone inscriptions are glyph drawings and the rudiment of the modern Chinese character system. In the field of computer science, Li and Zhou (\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e1996\u003c/span\u003e) took the stroke direction, curvature, and bending times of the oracle shaped drawings as secondary features and encoded them. But only experts can master this method skillfully. Gu (\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e2018\u003c/span\u003e) calculated the fractal dimension of oracle bone inscriptions, and matched it with the characteristic database of oracle bone inscriptions to identify oracle bone inscriptions. Liu et al. (\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2020\u003c/span\u003e) developed the oracle character recognition method based on SVM. However, due to the coexistence of complex and simplified oracle characters, the recognition accuracy can still be improved. Gao and Liang (\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2020\u003c/span\u003e) presented a two-stage method of variant character recognition for oracle bone inscriptions, which used VGG16 to recognize oracle bone inscriptions variant characters.\u003c/p\u003e \u003cp\u003eIn the deep learning field, the presence of CNNs has revolutionized traditional image recognition in unprecedented levels. 
Integration of the Region Proposal Network (Ren et al., \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2015\u003c/span\u003e), residual blocks (He et al., \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2016\u003c/span\u003e), Fully Convolutional Network with Region of Interest (He et al., \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2017\u003c/span\u003e), and efficient building blocks (Tan and Le, \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e2019\u003c/span\u003e) into CNNs represents benchmark development in the field of image recognition. Furthermore, by deploying attention-mechanism in input grid points, attention-based deformable convolutional network performs better in Chinese character recognition (Zhuo and Zhang, \u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). In the aspect of image embedding, Liang (\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2020\u003c/span\u003e) attempted to use an unsupervised method to process the image content by binarization, combining the vectors of each direction into a row vector as the feature representation of the image content, but the recognition accuracy is low. Instead of treating image as sequence of pixels, Swin Transformer constructs hierarchical representation of patched image through Shifted Window Self-Attention mechanism. The approach can partitions an image into fixed-size windows then shifts the windows to cover global regions (Liu et al., \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). 
Researchers not only mainly use traditional image processing ideas and introduces the supervised network model (Hinton et al., \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2006\u003c/span\u003e, Cevikalp et al., \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2019\u003c/span\u003e, Marin et al., \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2018\u003c/span\u003e) but also combine different neural networks in two stages. Obtaining the features of images and texts through convolution and sequence networks and used GRU to map the text and image to the same subspace (Faghri et al.,2017). Combining the idea of mask language model and a transformer to achieve the unsupervised learning image task, with the goal of improving resnet-50 (Trinh et al., \u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). Using the manifold relation to learn image feature embedding and manifold learning to refine the distance measurement (Chen and Li, \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2020\u003c/span\u003e).Using Node2Vecto generate graph embeddings of nodes and calculates user-news relevance using cosine similarity (Ren et al., \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e2019\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eFor the recognition of glyph images, statistical analysis of data sets is often used to extract features and classify them. Thinning, binarization, normalization and other methods are used in image preprocessing. 
In the aspect of feature extraction, researchers generally extract global static features (Abdi and Khemakhem, \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2012\u003c/span\u003e), such as the writing center of text, font size spacing, etc., in addition to moment features (Darwish and Auda, \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e1994\u003c/span\u003e) and texture features (Said et al., \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2000\u003c/span\u003e). Support Vector Machine and nearest neighbor algorithm are generally used in the initial classifier, but neural network is mostly used at present. As mentioned in three studies of handwriting optical character recognition (Memon et al., \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Preetha et al. \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Vashist et al., \u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e2020\u003c/span\u003e), Oracle Bone Inscriptions handwriting recognition requires multiple classification methodologies with structural pattern recognition in grammar and graphs, non-parametric statistical approach, kernel based approach such as Support Vector Machine and Artificial Neural Network. Combining ResNet and Bidirectional Long Short-Term Memory - Deep Neural Network model (Bi-LSTM-DNN) in generally two stages is also a solution to handwriting recognition (Rao and Babu, \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2024\u003c/span\u003e).\u003c/p\u003e"},{"header":"3 Experimental Design","content":"\u003cp\u003eWe study the problem of oracle font recognition based on unsupervised learning method. 
There were TF-IDF and subtractive clustering methods (Singh and Sundaram, \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2015\u003c/span\u003e; Shivram et al., \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e2013\u003c/span\u003e; Taghavi Sangdehi and Faez, \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e2009\u003c/span\u003e) for font feature extraction based on unsupervised learning. It is found that the success rate of unsupervised learning based font feature recognition is very low with few training samples. We extract the horizontal and vertical features of the image from the image pixels and convert them into text. Using the internal correlation of the image, we realize unsupervised image recognition with high accuracy.\u003c/p\u003e \u003cp\u003eIn this study, we proposed the model Bone2Vec (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e) to enable the recognition of oracle bone scripts and to test our conjecture as to whether left-handedness existed among the carvers of oracle bones more than 3,000 years ago. The model Bone2Vec is divided into 3 stages. Through the experimental results, our model preliminarily empowers the identification of font structure of oracle bone inscriptions. This method can be used in many unlabeled classification scenarios, such as the classification of ancient Mayan characters with hieroglyphic characteristics and Sumerian cuneiform characters.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eStage 1 - Image Enhancement: Considering the ancient and data-scarce nature of oracle inscriptions, we employ image enhancement techniques to generate numerous images. These images reflect the diverse characteristics of oracle characters, thereby expanding our training sample set and enhancing the model's generalization capabilities. This approach also improves the recognition of oracle structure. 
By referring to oracle bone inscriptions that exhibit multiple morphological features, we generate a variety of images through targeted image enhancements.\u003c/p\u003e \u003cp\u003eStage 2 - Image Textualization: The image content mapping method converts the gray value of pixels into text to achieve image textualization. In this approach, each pixel's gray value in an oracle bone image is treated as a word in text. The variability in gray values across different images, and the frequency of each gray value, mirrors the variation and frequency of words in different documents within natural language processing. These variations form the distinct characteristics of the images or documents.\u003c/p\u003e \u003cp\u003eStage 3 \u0026ndash; Image Vectorization: The image text is vectorized using the unsupervised deep learning model, Doc2Vec. By introducing prior knowledge and clustering methods, we identify the structural patterns of the oracle font. The high-dimensional vectors are then reduced to two dimensions using a nonlinear method, facilitating the visualization of clustering results.\u003c/p\u003e \u003cp\u003eWe have constructed four datasets for this study. Dataset 1 consists of the digit '5' written by ten individuals using both left and right hands, totaling 115 images each. Dataset 2 features modern imitations of oracle bone inscriptions, comprising 770 images across 381 types, focusing solely on the shapes without considering stroke thickness or style. Dataset 3, the largest in this field, includes photocopied data from all identified real oracle rubbings, meticulously scanned and processed into single characters by the authors, amounting to approximately 10GB with 4,634 images of 1,108 unique oracle bone characters. This dataset preserves details such as stroke thickness and edge smoothness. 
Dataset 4 contains 64 images of original oracle bone topographies, replicated by four left-handed and four right-handed individuals, split evenly between the two groups.\u003c/p\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e3.1 Recognition of Handwritten Digit '5' by Left and Right Hands\u003c/h2\u003e \u003cp\u003eWe empirically investigate the four datasets using our proposed model to assess its effectiveness in recognizing oracle scripts and determining left- or right-handedness. We first used Datasets 1 and 2 to validate the model's effectiveness. We then applied the model to identify and cluster Datasets 3 and 4. The details of the model are outlined in the Methodology section in the Appendix.\u003c/p\u003e \u003cp\u003eFor Dataset 1, we constructed the Bone2Vec model. This dataset comprises images of the number '5' written by 10 modern individuals. Given that modern individuals have greater finger flexibility than ancient people, we chose a simple character like '5' to test the model's ability to distinguish between left- and right-handed writing.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe images in Dataset 1 are processed according to Algorithm 1 (see Appendix) to generate image text. The optimal embedding dimension and window size are used to generate the document embedding vectors, which are then normalized to form the document embedding matrix. Manifold learning is applied to the normalized vector of each document, and t-SNE is used to reduce the dimensionality for visualization. Taking left hand and right hand as categorical labels, we use 60% of the data as the training set and 40% as the test set, and classify with logistic regression. The classification results (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, Experiment 1) show that the classification accuracy reaches 83%. 
The clustering visualization (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e) and the classification results show that the model can, at a preliminary level, distinguish the number 5 written with the left hand from the number 5 written with the right hand.\u003c/p\u003e \u003cp\u003eTo rule out a chance result, we repeated the experiment 10 times, each time selecting one third of the data from the original dataset for processing. The number 5 written with the left and right hands clusters well (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e), so the Bone2Vec model can be used to distinguish writing produced by the left hand from writing produced by the right hand.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003e3.2 Modern oracle characters recognition\u003c/h2\u003e \u003cp\u003eTo verify the effectiveness of the model on the oracle dataset, we extracted two oracle characters, \u0026ldquo;Bao\u0026rdquo; and \u0026ldquo;Bing\u0026rdquo;, from Dataset 2 and used image enhancement to expand the dataset to 230 images, in which the character \u0026ldquo;Bao\u0026rdquo; has 120 images and \u0026ldquo;Bing\u0026rdquo; has 130 images. The Bone2Vec results are visualized in Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e. Taking the Chinese characters corresponding to the oracle bone inscriptions as categorical labels, we use 60% of the data as the training set and 40% as the test set, and classify with logistic regression. The classification results (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, Experiment 2) show that the classification accuracy reaches 88%. 
The algorithm groups similar oracle characters together, which indicates that it is effective at recognizing oracle font structure.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eAgain, to rule out a chance result, we ran 10 replicates, each with a third of the data from the original dataset. The oracle characters cluster well (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e), so the Bone2Vec model can be used to distinguish oracle characters.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003e3.3 Structure recognition of genuine oracle characters\u003c/h2\u003e \u003cp\u003eThrough Experiments 1 and 2, we verified the effectiveness of Bone2Vec in the recognition of the handwritten digit 5 and of oracle bone inscriptions. Next, we applied the Bone2Vec model to rubbings taken from genuine oracle bones.\u003c/p\u003e \u003cp\u003eWe have shown that Bone2Vec can extract features of oracle bone inscriptions copied by modern people. To test whether the method remains effective for original images of oracle bone inscriptions, we selected several groups of oracle bone inscriptions with different features. In the dimensionality reduction visualization, the complex and simple glyphs (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003ea) are nonlinearly separable, so our model can distinguish complex oracle images from simple ones. Similar glyphs are clustered together, while the glyphs (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003eb) containing circles and cross structures are still nonlinearly separable.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe experimental results show that our algorithm groups together oracle bone inscription images with common features. 
However, the characteristics of oracle bone inscriptions may be determined by many factors, such as the structure of the glyph, the style of the writer, the writing period, and the hand used by the writer.\u003c/p\u003e \u003cp\u003eTo test our initial conjecture as to whether there were sinistromanual individuals among the carvers of oracle bone inscriptions, we copied several groups of oracle bone inscriptions with the left and right hands respectively, and carried out several groups of experiments that combined these copies with real oracle bone images. We made 64 copies of \u0026ldquo;Bao\u0026rdquo; and expanded the dataset to 512 images after data enhancement.\u003c/p\u003e \u003cp\u003eFirst, we constructed the image embedding vectors from the character \u0026ldquo;Bao\u0026rdquo; data copied with our left and right hands. The clustering result (Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003ea) shows that the model can, at a preliminary level, distinguish left-hand from right-hand writing. Taking left hand and right hand as categorical labels, we use 60% of the data as the training set and 40% as the test set, and classify with logistic regression. The classification results (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, Experiment 3) show that the classification accuracy reaches 81%, which indicates that the model has extracted the features of the oracle bone images well.\u003c/p\u003e \u003cp\u003eNext, we added the real oracle rubbing images to the images we copied. After data enhancement, a total of 18 images were obtained. The real rubbings of oracle bone inscriptions are mixed with the left- and right-handwritten characters (Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003eb). Some of them are mixed with the left-handed characters, and some are mixed with the right-handed characters. 
One of the genuine oracle characters fell in the left-handed images, accounting for 5%.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eClassification results of three sets of experiments.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eprecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003erecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003ef1-score\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003esupport\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eExperiment 1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLeft\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c3\"\u003e \u003cp\u003e0.85\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.82\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.84\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e50\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRight\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.80\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.83\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.81\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e42\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAccuracy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.83\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e92\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eExperiment 2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eBao\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.89\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.85\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.87\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e 
\u003cp\u003e48\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eBing\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.88\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.91\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.89\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e56\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAccuracy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.88\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e104\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eExperiment 3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLeft\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.79\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.86\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.81\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e109\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRight\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.84\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e 
\u003cp\u003e0.75\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.79\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e96\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAccuracy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.81\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e205\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThere is evidence suggesting that the scribes of oracle bone inscriptions were predominantly right-handed (dextromanual). Utilizing an unsupervised adaptive graph embedding learning algorithm, we can employ computers to analyze ancient character images, enabling us to infer the production methods and lifestyle of ancient human beings.\u003c/p\u003e \u003c/div\u003e"},{"header":"4 Conclusion","content":"\u003cp\u003eOracle bone inscriptions represent the earliest mature writing system discovered in China and are among the most recognized word systems globally. Despite many inscriptions remaining undeciphered, withholding their stories from modern understanding, the challenges in labeling oracle words make supervised deep learning infeasible. Consequently, our research employs unsupervised learning, which remains challenging due to the presumed dominance of right-handed inscriptions. 
By incorporating artificially created data from both left and right hands, our experiments with both real and simulated oracle rubbings demonstrate a significant clustering of ancient human images in the right-handed group, while our created data populate both handedness clusters. Given the right-handedness prevalent among most primates, this might suggest a biological rather than social evolutionary trait. Further exploration into Zhenren authors, who are linked to specific kings and eras, provides new insights into the behaviors of these legendary figures from 3,000 years ago. The robustness of our method to image size and language type suggests its applicability beyond oracle bones to ancient Mayan and Sumerian cuneiform characters, reflecting the deep-seated significance of these symbols in understanding ancient civilizations.\u003c/p\u003e \u003cp\u003eOur study significantly advances the field of historical linguistics and computer vision by employing a unique dataset of over 1,000 manually scanned images of genuine oracle bone inscriptions. This dataset, the largest of its kind, enables a level of analysis previously unattainable in deciphering ancient scripts. The use of these authentic inscriptions is crucial, as it ensures the fidelity and accuracy of our research, allowing us to draw more precise conclusions about ancient writing practices.\u003c/p\u003e \u003cp\u003eFurthermore, the introduction of the Bone2Vec model represents a novel contribution to the field. This unsupervised learning algorithm successfully handles the complex variability inherent in oracle bone scripts, which traditional models struggle to process. 
By treating each pixel of the scanned images as textual data, Bone2Vec allows for an innovative approach to pattern recognition in ancient scripts, bridging the gap between traditional linguistic analysis and modern computational techniques.\u003c/p\u003e \u003cp\u003eThis combination of a unique, high-quality dataset and pioneering analytical techniques not only enhances our understanding of ancient Chinese civilizations but also sets a precedent for future studies involving other ancient scripts. Our findings offer new insights into the behavioral patterns of ancient societies and underscore the importance of technological innovation in unraveling historical mysteries.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003eDuring the preparation of this work the authors used OpenAI/ChatGPT4.0 in order to check on the grammar and refine the English writing for the abstract and sections 1 and 4 of the paper. After using this tool/service, the authors carefully reviewed and edited the content as needed and take full responsibility for the content of the publication.\u003c/p\u003e\n\u003cp\u003eConflict of interest\u003c/p\u003e\n\u003cp\u003eThe authors have no relevant financial or non-financial interests to disclose.\u003c/p\u003e\n\u003cp\u003eEthics approval\u003c/p\u003e\n\u003cp\u003eThe submitted work is original and has not been published elsewhere in any form or language\u0026nbsp;(partially or in full).\u003c/p\u003e\n\u003cp\u003eFunding\u003c/p\u003e\n\u003cp\u003eNo external fund is used for this research.\u003c/p\u003e\n\u003cp\u003eData availability\u003c/p\u003e\n\u003cp\u003eData will be made available upon reasonable requests.\u003c/p\u003e\n\u003cp\u003eAuthors contribution\u003c/p\u003e\n\u003cp\u003eConceptualization: [Hefei Wang], [Zihuan Feng]; Methodology: [Xun Liang], [Zihuan Feng]; [Zhenhao Li]; Formal analysis and investigation: [Xun Liang], [Zihuan Feng], [Zhenhao Li]; Writing - original draft preparation: [Hefei Wang], [Zihuan 
Feng]; Writing - review and editing: [Hefei Wang], [Zhenhao Li]\u003c/p\u003e\n\u003cp\u003eAcknowledgement\u003c/p\u003e\n\u003cp\u003eThe authors are grateful for Mr. Kevin McSpadden at South China Morning Post for featuring our research for public awareness of the topic. The article is accessible online at: https://www.scmp.com/news/people-culture/article/3278486/recent-study-ancient-chinese-oracle-bones-highlights-how-ai-changing-archaeology?module=perpetual_scroll_0\u0026amp;pgtype=article\u003c/p\u003e\n\u003cp\u003eHuman Participants and/or Animals\u003c/p\u003e\n\u003cp\u003eThere are no animals in our work. This research involved human subjects in forming the training set to distinguish between left-handed and right-handed writers; however, no individuals were harmed during the study.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eAbdi, M. N., \u0026amp; Khemakhem, M. (2012, October). Arabic writer identification and verification using template matching analysis of texture. In 2012 IEEE 12th International Conference on Computer and Information Technology (pp. 592-597). IEEE.\u003c/li\u003e\n\u003cli\u003eBengio, Y., Ducharme, R., \u0026amp; Vincent, P. (2000). A neural probabilistic language model. Advances in neural information processing systems, 13.\u003c/li\u003e\n\u003cli\u003eBloice, M. D., Stocker, C., \u0026amp; Holzinger, A. (2017). Augmentor: an image augmentation library for machine learning. arXiv preprint arXiv:1708.04680.\u003c/li\u003e\n\u003cli\u003eCevikalp, H., Benligiray, B., Gerek, \u0026Ouml;. N., \u0026amp; Saribas, H. (2019, June). Semi-Supervised Robust Deep Neural Networks for Multi-Label Classification. In CVPR workshops (pp. 9-17).\u003c/li\u003e\n\u003cli\u003eChen, X., \u0026amp; Li, Y. (2020). Deep feature learning with manifold embedding for robust image retrieval. Algorithms, 13(12), 318.\u003c/li\u003e\n\u003cli\u003eChen, T. 
Restudy on the glyph system of Shang oracle bone inscriptions (Shanghai People\u0026rsquo;s Publishing House, 2010). (In Chinese)\u003c/li\u003e\n\u003cli\u003eDarwish, A. M., \u0026amp; Auda, G. A. (1994, April). A new composite feature vector for Arabic handwritten signature recognition. In Proceedings of ICASSP\u0026apos;94. IEEE International Conference on Acoustics, Speech and Signal Processing (Vol. 2, pp. II-613). IEEE.\u003c/li\u003e\n\u003cli\u003eFaghri, F., Fleet, D. J., Kiros, J. R., \u0026amp; Fidler, S. (2017). Vse++: Improving visual-semantic embeddings with hard negatives. arXiv preprint arXiv:1707.05612.\u003c/li\u003e\n\u003cli\u003eFrayer, D. W., Clarke, R. J., Fiore, I., Blumenschine, R. J., P\u0026eacute;rez-P\u0026eacute;rez, A., Martinez, L. M., Estebaranz, F., Holloway, R., \u0026amp; Bondioli, L. (2016). OH-65: The earliest evidence for right-handedness in the fossil record. Journal of human evolution, 100, 65\u0026ndash;72. https://doi.org/10.1016/j.jhevol.2016.07.002\u003c/li\u003e\n\u003cli\u003eFinch, G. (1941). Chimpanzee handedness. Science, 94(2431), 117-118.\u003c/li\u003e\n\u003cli\u003eGao, J., \u0026amp; Liang, X. (2020). Distinguishing oracle variants based on the isomorphism and symmetry invariances of oracle-bone inscriptions. IEEE Access, 8, 152258-152275.\u003c/li\u003e\n\u003cli\u003eGu, S. T. (2018). A method of oracle character recognition based on fractal geometry. J. Chin. Inf. Process, 32(10), 138-142.\u003c/li\u003e\n\u003cli\u003eHamilton, W. L., Leskovec, J., \u0026amp; Jurafsky, D. (2016). Diachronic word embeddings reveal statistical laws of semantic change. arXiv preprint arXiv:1605.09096.\u003c/li\u003e\n\u003cli\u003eHe, K., Gkioxari, G., Doll\u0026aacute;r, P., \u0026amp; Girshick, R. (2017). Mask r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).\u003c/li\u003e\n\u003cli\u003eHe, K., Zhang, X., Ren, S., \u0026amp; Sun, J. (2016). 
Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).\u003c/li\u003e\n\u003cli\u003eHinton, G. E., Osindero, S., \u0026amp; Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural computation, 18(7), 1527-1554.\u003c/li\u003e\n\u003cli\u003eLe, Q., \u0026amp; Mikolov, T. (2014, June). Distributed representations of sentences and documents. In International conference on machine learning (pp. 1188-1196). PMLR.\u003c/li\u003e\n\u003cli\u003eLiang, X. (2020). Social Computing with Artificial Intelligence. [electronic resource] (1st ed. 2020.). Springer Singapore.\u003c/li\u003e\n\u003cli\u003eLi, F., \u0026amp; Zhou, X. L. (1996). The graph theory method of oracle bone inscriptions automatic recognition. J. Electron, 18(1), 41-47.\u003c/li\u003e\n\u003cli\u003eLin, J. \u0026amp; Li, J. Feature extraction and preprocessing of offline Chinese signature verification. J. Shanghai Jiaotong Univ. 30, 40\u0026ndash;45 (1996). (In Chinese)\u003c/li\u003e\n\u003cli\u003eLiu, M., Liu, G., Liu, Y., \u0026amp; Jiao, Q. (2020). Oracle bone inscriptions recognition based on deep convolutional neural network. Journal of image and graphics, 8(4), 114-119.\u003c/li\u003e\n\u003cli\u003eLiu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., ... \u0026amp; Guo, B. (2021). Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 10012-10022).\u003c/li\u003e\n\u003cli\u003eMaaten, L. V. D. (2008). Visualizing data using t-SNE. Journal of machine learning research, 9(Nov), 2579.\u003c/li\u003e\n\u003cli\u003eMarin, J., Biswas, A., Ofli, F., Hynes, N., Salvador, A., Aytar, Y., ... \u0026amp; Torralba, A. (2018). Recipe1M+: a dataset for learning cross-modal embeddings for cooking recipes and food images. 
arXiv preprint arXiv:1810.06553.\u003c/li\u003e\n\u003cli\u003eMemon, J., Sami, M., Khan, R. A., \u0026amp; Uddin, M. (2020). Handwritten optical character recognition (OCR): A comprehensive systematic literature review (SLR). IEEE Access, 8, 142642-142668.\u003c/li\u003e\n\u003cli\u003eNguyen, T. T. H., Jatowt, A., Coustaty, M., \u0026amp; Doucet, A. (2021). Survey of post-OCR processing approaches. ACM Computing Surveys (CSUR), 54(6), 1-37.\u003c/li\u003e\n\u003cli\u003ePreetha, S., Afrid, I. M., \u0026amp; Nishchay, S. K. (2020). Machine learning for handwriting recognition. International Journal of Computer (IJC), 38(1), 93-101.\u003c/li\u003e\n\u003cli\u003eRao N, S., \u0026amp; Babu C, N. K. (2024). Enhanced ResNet-151-based fused features for optimized Bi-LSTM-DNN-aided handwritten character and digits recognition. Expert Systems With Applications, 244. https://doi.org/10.1016/j.eswa.2023.122860\u003c/li\u003e\n\u003cli\u003eRen, J., Long, J., \u0026amp; Xu, Z. (2019). Financial news recommendation based on graph embeddings. Decision Support Systems, 125, 113115.\u003c/li\u003e\n\u003cli\u003eRen, S., He, K., Girshick, R., \u0026amp; Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in neural information processing systems, 28.\u003c/li\u003e\n\u003cli\u003eSaid, H. E., Tan, T. N., \u0026amp; Baker, K. D. (2000). Personal identification based on handwriting. Pattern Recognition, 33(1), 149-160.\u003c/li\u003e\n\u003cli\u003eShivram, A., Ramaiah, C., \u0026amp; Govindaraju, V. (2013). A hierarchical Bayesian approach to online writer identification. IET Biometrics, 2(4), 191-198.\u003c/li\u003e\n\u003cli\u003eSingh, G., \u0026amp; Sundaram, S. (2015, August). A subtractive clustering scheme for text-independent online writer identification. In 2015 13th International Conference on Document Analysis and Recognition (ICDAR) (pp. 311-315). 
IEEE.\u003c/li\u003e\n\u003cli\u003eTaghavi Sangdehi, S. A., \u0026amp; Faez, K. (2009). Writer Identification Using Super Paramagnetic Clustering and Spatio Temporal Neural Network. In Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications: 14th Iberoamerican Conference on Pattern Recognition, CIARP 2009, Guadalajara, Jalisco, Mexico, November 15-18, 2009. Proceedings 14 (pp. 669-676). Springer Berlin Heidelberg.\u003c/li\u003e\n\u003cli\u003eTan, M., \u0026amp; Le, Q. (2019, May). Efficientnet: Rethinking model scaling for convolutional neural networks. In International conference on machine learning (pp. 6105-6114). PMLR.\u003c/li\u003e\n\u003cli\u003eTrinh, T. H., Luong, M. T., \u0026amp; Le, Q. V. (2019). Selfie: Self-supervised pretraining for image embedding. arXiv preprint arXiv:1906.02940.\u003c/li\u003e\n\u003cli\u003eTsai, L. S., \u0026amp; Maurer, S. (1930). \u0026quot;Right-handedness\u0026quot; in white rats. Science, 72(1869), 436-438.\u003c/li\u003e\n\u003cli\u003eVashist, P. C., Pandey, A., \u0026amp; Tripathi, A. (2020, January). A comparative study of handwriting recognition techniques. In 2020 International Conference on Computation, Automation and Knowledge Management (ICCAKM) (pp. 456-461). IEEE.\u003c/li\u003e\n\u003cli\u003eWang, B. Oracle Bone Inscriptions Copybook (Beijing Arts and crafts Publishing House, Beijing, China, 2019). (In Chinese)\u003c/li\u003e\n\u003cli\u003eWang, J. \u0026amp; Qin, F. Design and application of adaptive binary filtering algorithm for gray text image. J. Hefei Univ. Technol. 509\u0026ndash;512 (2004).(In Chinese)\u003c/li\u003e\n\u003cli\u003eWarren, J. M. (1953). Handedness in the rhesus monkey. Science, 118(3073), 622-623.\u003c/li\u003e\n\u003cli\u003eYin, Z., \u0026amp; Shen, Y. (2018). On the dimensionality of word embedding. Advances in neural information processing systems, 31.\u003c/li\u003e\n\u003cli\u003eZhou, X., Li, F. \u0026amp; Hua, X. 
Study on computer identification method of oracle. J. Fudan Univ. 5, 481\u0026ndash;486 (1996). (In Chinese)\u003c/li\u003e\n\u003cli\u003eZhuo, S., \u0026amp; Zhang, J. (2024). Attention-based deformable convolutional network for Chinese various dynasties character recognition. Expert Systems With Applications, 238(Part B).\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":false,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Image Recognition, Artificial Intelligence in Archaeology, Historical Linguistics, Ancient Writing Systems, Oracle Bone Inscriptions","lastPublishedDoi":"10.21203/rs.3.rs-5142859/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-5142859/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eTracing the evolution of human habits and cognition through ancient artifacts offers unique insights into our past. This paper explores the handedness of ancient Chinese individuals through the study of oracle bone inscriptions\u0026mdash;some of the earliest forms of Chinese writing, dating back to the Shang Dynasty approximately 3,000 years ago. Our research utilizes manually scanned real images of genuine oracle bone rubbings provided by National Museum of Chinese Writing. We have constructed the largest genuine oracle bone inscriptions dataset currently used in the field of computer technology, which presents unique challenges due to their variable and pictographic nature. Employing unsupervised deep learning techniques, we analyze the subtle stylistic differences in these images to discern whether the inscriptions were crafted by left-handed (sinistromanual) or right-handed (dextromanual) individuals. Our novel computational method, Bone2Vec, treats each pixel of the oracle bone image as a word in text, enabling us to embed and cluster these images to determine handedness patterns. 
Our findings not only advance our understanding of early Chinese script and its creators but also contribute to anthropological research by providing new evidence of handedness in ancient civilizations. This interdisciplinary approach underscores the potential of artificial intelligence in historical linguistics and archaeology, offering a fresh perspective on the cognitive behaviors of ancient societies.\u003c/p\u003e","manuscriptTitle":"A bone to pick with ancient Chinese: AI Analysis of Handedness in Bone Inscriptions","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-06-18 08:46:01","doi":"10.21203/rs.3.rs-5142859/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"f7b10e69-18e6-45ef-99d1-89672a4788eb","owner":[],"postedDate":"June 18th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2025-07-05T12:37:29+00:00","versionOfRecord":[],"versionCreatedAt":"2025-06-18 08:46:01","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-5142859","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-5142859","identity":"rs-5142859","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}