Nondestructive detection and classification of traditional handmade paper using near-infrared hyperspectral imaging and machine learning

Yong Ju Lee, Seo Young Won, Seong Bin Park, Tai-Ju Lee, Hyoung Jin Kim

Research Square preprint. This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-6397784/v1. This work is licensed under a CC BY 4.0 License. Status: Posted (Version 1).

Abstract

Traditional handmade papers such as Hanji, Washi, and Xuan paper hold substantial cultural and historical value across East Asia. However, their classification and authentication remain challenging due to variations in raw materials and manufacturing techniques. In this study, we propose a nondestructive approach using near-infrared (NIR) hyperspectral imaging combined with machine learning to classify traditional handmade papers from China, Japan, and Korea.
NIR spectra (900–1,700 nm) were extracted from hyperspectral images of 26 paper samples and preprocessed using first derivatives. Dimensionality reduction and clustering were performed using principal component analysis (PCA) and density-based spatial clustering of applications with noise (DBSCAN), which also identified spectral outliers. Multiple classification models, including support vector machine (SVM), feedforward neural network (FNN), and XGBoost, were trained and evaluated, with SVM achieving the highest F1-score (1.000). Feature importance derived from XGBoost highlighted key spectral regions relevant to classification. Additionally, the spectral angle mapper (SAM) enabled pixel-wise visualization, revealing spectral heterogeneity among the samples. This study demonstrates the effectiveness of NIR hyperspectral imaging and machine learning for the rapid, interpretable, and noninvasive classification of traditional handmade papers, providing valuable tools for heritage conservation and authenticity verification.

Subject areas: Physical sciences/Chemistry; Physical sciences/Mathematics and computing

Keywords: Xuan paper, Washi, Hanji, DBSCAN, support vector machine (SVM), spectral angle mapper (SAM)

Introduction

Handmade paper has long served as a vital medium for preserving and transmitting human knowledge. Today, traditional hand papermaking is recognized as an intangible cultural heritage and continues to hold cultural, historical, and artistic value[1–3]. In East Asia, distinct traditions such as Chinese Xuan paper[4–6], Japanese Washi[7, 8], and Korean Hanji[2, 9, 10] have evolved based on regionally available plant fibers, tools, and techniques. These practices were often passed down through families or guilds and adapted over time to reflect local environments and cultural identities[1, 11–15]. The use of various plant-based fibers has resulted in papers with unique physical and esthetic properties[3, 16–18].
While these papers were traditionally used for writing, painting, and calligraphy, their applications have expanded into other areas, including material studies[19–21]. However, accurately identifying and classifying traditional papers remains challenging due to their complex and diverse origins. This presents a substantial obstacle for researchers in cultural heritage, conservation, and forensic science[3, 22–24].

Handmade papers are primarily composed of cellulose, with varying levels of residual lignin depending on the fiber source and processing methods. Common raw materials include bamboo, reed, hemp, bark, grasses, and paper mulberry[1, 24, 25]. In Korea, the production of Hanji, a traditional handmade paper, is carried out through a culturally significant and well-preserved process. Typically, one-year-old stems of paper mulberry are steamed and manually peeled to isolate the inner white bark[26]. This bark is then boiled in an alkaline solution made from plant-derived ash (e.g., burned soybean, chili, or buckwheat stalks), which facilitates the removal of noncellulosic substances such as lignin and pectin. After repeated rinsing, the cleaned fibers are blended with mucilage extracted from Hibiscus manihot to control the dispersibility during sheet formation, a key factor in producing evenly layered paper. The resulting sheets are traditionally sun-dried on wooden panels[24, 27, 28]. In recent years, however, the increased use of imported mulberry and synthetic chemicals, such as sodium hydroxide (NaOH) and polyacrylamide (PAM), has been reported, driven by cost and supply constraints[24]. Nevertheless, the core techniques of Hanji production continue to be preserved by skilled artisans and remain an integral part of Korea’s intangible cultural heritage.
Comparable practices are also found in neighboring countries, where regionally adapted techniques have produced handmade papers with similar functional qualities but distinct cultural identities[1]. The diversity and complexity of traditional papers make it difficult to determine their precise origin or composition through visual inspection alone[3]. Misinterpretations regarding fiber sources were already noted in the 19th century, when the absence of scientific methodology limited historical analysis to rudimentary botanical observations. It was not until the early 20th century that material-based investigations were recognized as essential to the study of paper history[3], marking the emergence of an interdisciplinary approach that integrated natural science with cultural heritage studies.

In recent decades, scientific characterization techniques have been increasingly applied to handmade papers to better understand their composition and provenance. Conventional methods such as optical microscopy[29, 30], physical property measurements[18, 31], pyrolysis gas chromatography/mass spectrometry (Py-GC/MS)[3, 18], elemental analysis[32, 33], and size-exclusion chromatography coupled with multi-angle light scattering (SEC-MALS)[26] have provided valuable insights into fiber types and quality. However, these techniques are inherently destructive and pose risks to fragile or culturally significant specimens. To address this limitation, nondestructive alternatives have gained prominence, particularly vibrational spectroscopic techniques such as Raman[17, 34], near-infrared (NIR)[24], and infrared (IR)[22, 35, 36] spectroscopy. These methods offer rapid, chemical-specific information without damaging the sample, making them highly suitable for the classification, authentication, and conservation of traditional handmade papers.
Hyperspectral imaging (HSI) is an advanced, noninvasive technique that captures detailed spectral information across numerous contiguous wavelengths[37]. By recording a complete spectral profile for each pixel[38], HSI facilitates the precise identification of materials based on their characteristic absorption features[39]. The NIR region, in particular, is highly informative for detecting cellulose and other polysaccharides, which exhibit distinct spectral signatures[39–42]. As such, HSI serves as a powerful tool for assessing key structural components in organic materials without the need for direct sampling.

The combination of NIR HSI and machine learning is gaining traction across various fields, including food[43], packaging[44], agriculture[45, 46], heritage science[23], and material science[47]. Recently, advanced computational algorithms have helped overcome the time-consuming and labor-intensive nature of spectral data analysis in nondestructive studies of organic matter[22–24, 39–41, 48, 49].

This study aims to develop machine learning models using NIR HSI for the accurate classification of traditional handmade papers from East Asia. To this end, full NIR spectral data extracted from HSI were used to construct classification models. To enhance the interpretability and transparency of the model in the decision-making process, feature importance scores were computed. Another objective of this study is to visualize the spectral heterogeneity among the samples through pixel-wise mapping with the spectral angle mapper (SAM). This article presents the results of a machine learning-based classification of the manufacturing origin of traditional handmade paper, contributing valuable insights to ongoing research in cultural heritage and material identification.
Materials and methods

Traditional handmade paper samples

A summary of the traditional handmade paper samples used in this study is provided in Table 1. Samples containing any form of coloration were excluded to ensure consistency in chemical composition and to minimize interference in spectroscopic analysis. Preference was given to papers obtained from manufacturers actively engaged in traditional production practices. It was assumed that variations in fiber species and production techniques could influence the chemical and structural properties of the papers. Accordingly, the sample selection was designed to reflect differences in commonly used plant materials, as well as to capture variability in key processing steps such as fiber cooking, bleaching, and mucilage preparation methods.

Table 1. List of traditional handmade paper samples analyzed. Blank cells indicate that the entry above applies (merged cells).

| Code | Country | Product name | Pulp fiber | Pulping agents | Dispersant | Additive |
|---|---|---|---|---|---|---|
| China (No. 01) | China | Dakji (构皮紙/楮皮紙) | Broussonetia papyrifera (paper mulberry, 构屬), China | Plant ash lye | Actinidia chinensis stem mucilage 杨桃藤汁 (中华猕猴桃茎液) | - |
| China (No. 02) | | | | | | |
| China (No. 03) | | Mulberry bark paper (桑皮紙) | Morus (mulberry, 桑屬), China | | | |
| China (No. 04) | | | | | | |
| China (No. 05) | | Bast fibers with grass mixed paper (綿料紙) | Pteroceltis tatarinowii (blue sandalwood or wingceltis, 青檀屬) + Oryza sativa (rice straw) | | | Glue-alum sizing |
| China (No. 06) | | | | | | |
| China (No. 07) | | Traditional bamboo paper (竹紙) | Bambusoideae (bamboo), China | | | - |
| China (No. 08) | | | | | | |
| Japan (No. 09) | Japan | Sekishu washi | Broussonetia papyrifera (paper mulberry), Japan | Soda ash | Abelmoschus manihot root mucilage | - |
| Japan (No. 10) | | Mino washi | | | | |
| Japan (No. 11) | | | | | | |
| Japan (No. 12) | | Misu washi | | Caustic soda | | White clay |
| Japan (No. 13) | | | | | | |
| Japan (No. 14) | | Uda washi | | Plant ash lye | Panicle hydrangea root mucilage | White clay (limestone) |
| Japan (No. 15) | | | | | | |
| Korea (No. 16) | Korea | Pulp Dakji | Broussonetia papyrifera (paper mulberry), Korea + wood pulp | Sodium hydroxide (NaOH) | PAM | - |
| Korea (No. 17) | | Dakji | Broussonetia papyrifera (paper mulberry), Thailand | | | |
| Korea (No. 18) | | Olbal Hanji | Broussonetia papyrifera (paper mulberry), Korea | | | |
| Korea (No. 19) | | | | Plant ash lye | Abelmoschus manihot root mucilage | |
| Korea (No. 20) | | Ssangbal Hanji | | | | |
| Korea (No. 21) | | Hanji–Choksae | | Sodium hydroxide (NaOH) | PAM | |
| Korea (No. 22) | | | | | | |
| Korea (No. 23) | | Eumyungji | | Plant ash lye | Abelmoschus manihot root mucilage | |
| Korea (No. 24) | | | | | | |
| Korea (No. 25) | | | | | | |
| Korea (No. 26) | | | | | | |

Hyperspectral image acquisition and NIR spectral dataset

NIR–HSI images of each traditional handmade paper sample were acquired using a Resonon Pika NIR-320 hyperspectral camera (Resonon Inc., USA), which covers a spectral range of 900–1,700 nm with a spectral resolution of 4.9 nm. Illumination was provided by 120-W halogen light sources. For each paper type, 10 measurements were conducted to generate mean reflectance spectra, resulting in a database of 260 NIR spectra. Each spectrum contained 168 variables corresponding to wavelengths ranging from 900 to 1,700 nm. For classification modeling, the original NIR spectra and their first derivatives were used. The first derivative spectra were calculated using the Savitzky–Golay filter[50] to enhance spectral features and suppress baseline effects. Before model construction, root mean square normalization (RMSNorm) was applied to minimize intensity variations across the spectra.

Partitioning of the NIR spectral dataset for classification modeling

The dataset was split into training and test sets in a 7:3 ratio to construct and evaluate the classification models. Stratified random sampling was used to maintain a balanced class distribution across both sets. Additionally, threefold cross-validation was conducted exclusively on the training set to optimize hyperparameters and assess model stability while mitigating the risk of overfitting.

Principal component analysis (PCA) and density-based spatial clustering of applications with noise (DBSCAN)

PCA was performed to evaluate differences in the spectral patterns of traditional handmade papers.
PCA transformed the original 168-dimensional NIR spectral data into a set of principal components (PCs), reducing dimensionality while preserving the majority of the variance. In this study, seven PCs were retained, and each data point was projected onto these new orthogonal axes. The first two PCs were used to visualize the relationships among paper types. PC loading plots were further analyzed to interpret the contribution of specific wavelengths to each PC, providing insights into how spectral variations influence projections in the reduced coordinate space.

DBSCAN[51–54] was used to cluster paper types with similar NIR spectral characteristics. A secondary objective of using DBSCAN was to identify potential outliers in the PC-transformed space. The DBSCAN parameters were empirically set with an epsilon (eps) value of 0.005 and a minimum number of points (minPts) of 3. A data point was considered a core point if it had at least three neighboring points within a radius of 0.005 in PC space. Clusters were thus formed from regions containing three or more closely located points, while data points outside this density threshold were classified as noise or outliers.

Machine learning models for the classification of traditional handmade paper using NIR spectra

To evaluate classification performance while balancing computational cost, model interpretability, and robustness, various machine learning algorithms were implemented for the identification of traditional handmade papers. A schematic overview of the classification workflow, which combines hyperspectral image analysis and machine learning, is presented in Fig. 1.

The k-nearest neighbors (k-NN) algorithm, a distance-based nonparametric method, was employed as a simple and interpretable classifier. It assigns class labels based on the majority vote among the k closest training instances without requiring model training.
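The majority-vote rule just described can be sketched in a few lines. This is a minimal illustration and not the authors' implementation (the study used R): the Euclidean distance metric, the function name `knn_predict`, and the three-band toy spectra are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=1):
    """Return the majority label among the k training spectra nearest to query."""
    # Euclidean distance is an assumption; the source does not name the metric.
    ranked = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy three-band "spectra" (hypothetical values, not measured data).
train_X = [
    (0.10, 0.20, 0.30),
    (0.12, 0.21, 0.29),
    (0.80, 0.70, 0.60),
    (0.82, 0.69, 0.61),
]
train_y = ["Hanji", "Hanji", "Washi", "Washi"]

# With k = 3, two Hanji neighbors outvote one Washi neighbor.
prediction = knn_predict(train_X, train_y, (0.11, 0.20, 0.31), k=3)
```

Using an odd k avoids tied votes in the two-class case; with more classes, ties remain possible and would need an explicit rule.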
Odd values of k ranging from 1 to 9 were tested, and the optimal number of neighbors was determined through a grid search.

A support vector machine (SVM) was also implemented to construct a nonlinear decision boundary by maximizing the margin between classes. A radial basis function (RBF) kernel was applied to map the input data into a higher-dimensional feature space[55]. The SVM hyperparameters, namely the penalty parameter (C) and the kernel coefficient (gamma), were optimized using a grid search across logarithmic scales: C from 2⁻⁵ to 2⁵ and gamma from 10⁻¹ to 10⁻⁵. These parameters control the trade-off between margin maximization and training error, as well as the influence of individual data points within the kernel function.

For artificial neural networks (ANNs), a multilayer feedforward neural network (FNN) architecture with backpropagation was adopted. The rectified linear unit (ReLU) was used as the activation function, and cross-entropy was selected as the loss function. Two solvers, stochastic gradient descent (SGD) and Adam, were evaluated for optimization. Various network configurations, including those with one or two hidden layers and differing numbers of neurons, were evaluated to determine the most effective structure. Initial learning rates of 0.0001, 0.001, 0.01, and 0.1 were examined, with a maximum of 1,000 training iterations.

Extreme gradient boosting (XGBoost)[56], an ensemble learning algorithm based on boosting, was also employed due to its high accuracy and flexibility. The base learners of XGBoost are decision trees (DTs)[57]. A grid search was performed to optimize key hyperparameters, including the learning rate (lr, 0.01–0.1), maximum tree depth (max_depth, 3–7), number of boosting rounds (n_rounds, 50–300), and subsample ratio (subsample, 0.3–0.7). The column sampling ratio (colsample_bytree) was fixed at 0.3, while gamma and min_child_weight were set to 1.0 and 1, respectively.
Separate optimizations were conducted for models trained on the original spectra and those using first-derivative preprocessing.

For visualization purposes, the spectral angle mapper (SAM) algorithm[58] was applied. SAM calculates spectral similarity by measuring the angle between the spectral vector of each pixel and a reference spectrum, known as an endmember. This results in one raster layer per endmember, where each pixel value represents the spectral angle. Smaller angles indicate higher similarity to the corresponding endmember class. In the next step, a classification map was generated by assigning each pixel to the endmember with the smallest spectral angle, referred to as minimum angle classification. Optionally, this classification can be refined by applying a maximum angle threshold to exclude pixels with low spectral similarity.

Spectral feature importance measures

Feature importance scores were derived from the XGBoost classifier to identify key spectral variables contributing to the classification of traditional handmade papers. As an ensemble method based on boosted decision trees, XGBoost evaluates the relevance of each input feature by quantifying its contribution to reducing prediction error during node splits. This importance is typically estimated through the mean decrease in node impurity (MDI)[59], also known as information gain. When a split is made at a parent node \(P_j\) using a specific feature \(f_i\), resulting in two child nodes \(L_j\) and \(R_j\), the gain from that split is computed as follows:

$$G_{P_j} = w_{P_j} M_{P_j} - w_{L_j} M_{L_j} - w_{R_j} M_{R_j}. \tag{1}$$

Here, \(w\) represents the relative number of samples at each node, and \(M\) denotes the impurity of the node. This value captures how effectively feature \(f_i\) separates the data.
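As a worked instance of Eq. 1, the sketch below computes the gain of a single split on toy labels. Gini impurity is assumed for \(M\) purely for illustration, since the text does not fix the impurity measure, and XGBoost's internal gain is computed from gradient statistics of the loss rather than from Gini. The function names and labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a node (assumed choice for M in Eq. 1)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gain(parent, left, right, n_total):
    """Eq. 1: G = w_P * M_P - w_L * M_L - w_R * M_R, with w = node size / n_total."""
    def weight(node):
        return len(node) / n_total

    return (
        weight(parent) * gini(parent)
        - weight(left) * gini(left)
        - weight(right) * gini(right)
    )


# Toy root node holding two classes (hypothetical labels).
parent = ["A", "A", "B", "B"]

# A split that separates the classes perfectly recovers all parent impurity.
pure_gain = split_gain(parent, ["A", "A"], ["B", "B"], n_total=4)   # 0.5

# A split that leaves both children as mixed as the parent gains nothing.
mixed_gain = split_gain(parent, ["A", "B"], ["A", "B"], n_total=4)  # 0.0
```

A wavelength that repeatedly produces splits like the first one accumulates a large total gain, which is what the importance scores summarize.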
The overall contribution of feature \(f_i\) in a single decision tree is calculated as the total gain accumulated from all splits in which \(f_i\) is used:

$$I_{DT}(f_i) = \sum_{j \in \{\text{nodes split by } f_i\}} G_{P_j}. \tag{2}$$

In XGBoost, the final importance of a feature is determined by aggregating the normalized importance of that feature across all DTs in the ensemble. First, the importance score of each feature is scaled relative to the total importance of all features:

$$I_{norm}(f_i) = \frac{I_{DT}(f_i)}{\sum_{j} I_{DT}(f_j)}. \tag{3}$$

Then, the average across all base learners yields the ensemble-level importance:

$$I_{XGBoost}(f_i) = \frac{1}{N_T} \sum_{t=1}^{N_T} I_{norm}^{(t)}(f_i), \tag{4}$$

where \(N_T\) denotes the number of boosting rounds or trees and the superscript \((t)\) indicates the normalized importance in the \(t\)-th tree.

Evaluation metric

In binary classification, prediction outcomes are typically categorized based on correctness. A correctly predicted instance from the positive class is referred to as a true positive (TP), while a correctly predicted instance from the negative class is known as a true negative (TN). Conversely, a positive instance mistakenly predicted as negative is termed a false negative (FN), and a negative instance incorrectly classified as positive is referred to as a false positive (FP)[60]. To evaluate the performance of classification models, particularly in the presence of imbalanced datasets, the F1-score is commonly used[61]. Unlike overall accuracy, which can be misleading when one class dominates, the F1-score provides a more balanced and informative assessment.
It is defined as the harmonic mean of precision and recall:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \tag{5}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}, \tag{6}$$

$$F1 = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}. \tag{7}$$

To evaluate the performance of the classification models in the presence of class imbalance, the weighted F1-score was employed. This metric accounts for class distribution by assigning weights to each class based on their relative frequencies (Eq. 8) and then incorporating these weights into the corresponding class-specific F1-scores (Eq. 9). This approach provides a comprehensive assessment that reflects both individual class performance and overall effectiveness, even when the dataset is imbalanced[49]:

$$w_i = \frac{N_i}{T}, \tag{8}$$

where \(w_i\) is the weight of class \(i\), \(N_i\) is the number of samples in class \(i\), and \(T\) is the total number of samples across all classes.

$$\mathrm{Weighted\ F1} = \sum_{i=1}^{K} w_i \times F1_i, \tag{9}$$

where \(K\) is the number of classes and \(F1_i\) is the F1-score of class \(i\).

All data processing and classification modeling were performed using R statistical software (R Core Team, version 4.4.2, Auckland, New Zealand).

Results and discussion

The DBSCAN clustering results, based on both the original and first derivative NIR spectra of the paper samples, were visualized using PC score plots in Fig. 2. In general, the formulation of raw materials in traditional handmade paper is relatively simple compared to that of modern machine-made paper. Modern papermaking typically involves a complex furnish composed of refined pulp, internal sizing agents, retention aids, fillers, defoamers, optical brighteners, and dry-strength additives in an aqueous solution[62]. In contrast, traditional handmade paper production primarily uses pulp and small amounts of natural mucilage as raw materials[24].
As a result, the primary differences in chemical composition among traditional papers are attributed mainly to variations in the pulp, which are influenced by factors such as plant species and cooking methods[2, 22, 24, 63, 64]. The clustering results reflect these similarities, as plant species and cooking agents directly affect the composition of cellulose, hemicellulose, and lignin. Common fiber sources used in traditional handmade papers include hemp, bark, bamboo, grasses, and paper mulberry[1, 2, 24, 25].

The effect of spectral preprocessing using the first derivative was also examined. For instance, the number of data points from 14 different data classes assigned to cluster 1 remained the same for original and first derivative spectra. In the original spectra (Fig. 2a), cluster 1 consisted of broad, overlapping subgroups. In contrast, the first derivative spectra (Fig. 2b) showed the individual data points more distinctly positioned and better separated. Similarly, cluster 5 contained only a single data class in both Figs. 2a and 2b; however, the clustering result based on the first derivative spectra showed a more consistent and compact distribution along the first two PCs. Although this comparison is indirect, it indicates that preprocessing with the first derivative may enhance the separability of spectral features and contribute to improved classification accuracy in subsequent machine learning models. A direct comparison of classification performance is provided in the classification modeling subsection.

Additionally, DBSCAN[51–54] proved effective in detecting outliers among data points projected onto the PC coordinate space. Using the first derivative spectra, DBSCAN identified six outlier data points. As shown in Fig. 2b, these outliers did not align with their original cluster locations and were spatially separated from the main groupings.
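The density rule behind this outlier detection can be sketched with a small pure-Python DBSCAN. This is an illustrative re-implementation under stated assumptions, not the R routine used in the study: the neighborhood is taken to include the point itself, the two-dimensional PC scores are invented toy values, and only eps = 0.005 and minPts = 3 come from the text.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    def neighbors(i):
        # The neighborhood includes the point itself (an assumption).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels


# Hypothetical PC1/PC2 scores: two tight groups and one isolated spectrum.
scores = [
    (0.000, 0.000), (0.002, 0.001), (0.001, 0.003),
    (0.050, 0.050), (0.052, 0.051), (0.051, 0.053),
    (0.200, -0.100),
]
labels = dbscan(scores, eps=0.005, min_pts=3)
# labels == [0, 0, 0, 1, 1, 1, -1]; the last spectrum is flagged as an outlier.
```

Points labeled -1 correspond to the kind of spectra that would be treated as outliers and set aside before model training.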
Figures 3a and 3b display the original NIR spectra and their first derivatives within the 900–1,700 nm range, grouped according to the cluster assignments shown in Fig. 2. The spectral region between 900 and 970 nm contains overlapping reflectance bands attributed to starch, cellulose, and sucrose (900–920 nm), as well as water (960–970 nm)[65]. The region between 1,000 and 1,050 nm exhibits subtle spectral features, though these are difficult to assign to specific components[66]. The 1,075–1,250 nm range corresponds to the second overtones of aromatic and aliphatic C–H vibrations, typically associated with lignin[66]. A band at 1,256 nm is attributed to the second overtone of C–H stretching vibrations in cellulose[67]. The 1,390 nm feature reflects the variability in the intensity of adsorbed water[68]. The 1,420–1,600 nm region corresponds to the first overtone of O–H stretching vibrations[69]; within this range, bands at 1,537 and 1,576 nm are likely related to mucilage or hydrogen-bonded structures in polysaccharides[24, 68, 70]. Finally, the bands at 1,646 and 1,672 nm are associated with the first overtone of aliphatic C–H vibrations[71].

Figures 3c and 3d present the PC1 and PC2 loading plots derived from the PCA of the original and first derivative NIR spectra (900–1,700 nm). These loading plots illustrate the contribution of individual wavelengths to the respective PCs and highlight the most influential spectral regions for differentiating among clusters. In the PC1 loading plot (Fig. 3c), both the original and first derivative spectra exhibit pronounced negative loadings between 1,400 and 1,500 nm, corresponding to the first overtone of the O–H stretching vibration, an indicator of cellulose and adsorbed water. Notably, the first derivative spectrum reveals greater variation and sharper features in this region, indicating enhanced sensitivity to subtle spectral differences.
Additionally, the first derivative PC1 loading displays clear peaks at approximately 1,537 and 1,576 nm, likely linked to mucilage or hydrogen-bonded polysaccharide structures. The PC2 loading plot (Fig. 3d) reveals more subtle contributions across most wavelengths. However, in the short-wavelength region (900–970 nm), a distinguishable signal is observed, especially in the first derivative spectrum. This region includes overlapping features associated with cellulose and water. Overall, the first derivative spectra enhance the differentiation of chemically meaningful spectral regions at the cluster level, even though some clusters still contain a mixture of data classes. These findings support previous studies that highlight the effectiveness of spectral preprocessing techniques in improving classification performance[72–76].

Classification models for traditional handmade paper

Various classifiers were tested to discriminate among traditional handmade papers. The classification performance and corresponding optimal hyperparameters are summarized in Table 2. During model construction, outliers identified by DBSCAN were removed from the dataset. Previous studies have shown that eliminating outliers during the training phase can substantially improve classification accuracy[22, 23, 41]. Accordingly, a total of 254 NIR spectral samples, representing 26 paper classes, were used for training, validation, and testing. Given the class imbalance in the dataset, an evaluation metric that accounts for unequal class distribution was necessary for a reliable assessment of model performance. Therefore, weighted F1-scores were used, providing a more comprehensive evaluation by considering both individual class performance and overall model effectiveness, particularly under imbalanced conditions[49].

Table 2. Performance of the machine learning models in traditional handmade paper classification and their optimal hyperparameters.

| Model | Preproc. | Hyperparameters | F1-score (training) | F1-score (test) |
|---|---|---|---|---|
| k-NN | Original | k = 1 | 1.000 | 0.851 |
| k-NN | First deriv. | k = 1 | 1.000 | 0.901 |
| DT | Original | cp = 0.03 | 0.824 | 0.634 |
| DT | First deriv. | cp = 0.03 | 0.824 | 0.683 |
| SVM | Original | gamma = 10⁻², C = 2⁴ | 0.989 | 0.958 |
| SVM | First deriv. | gamma = 10⁻², C = 2³ | 1.000 | 1.000 |
| FNN | Original | hl_size = (16), lr = 0.1, optimizer = SGD | 0.978 | 0.934 |
| FNN | First deriv. | hl_size = (16, 16), lr = 0.001, optimizer = Adam | 0.989 | 0.974 |
| XGBoost | Original | lr = 0.07, max_depth = 5, gamma = 1, colsample_bytree = 0.3, min_child_weight = 1, subsample = 0.5, n_rounds = 75 | 1.000 | 0.900 |
| XGBoost | First deriv. | lr = 0.1, max_depth = 3, gamma = 1, colsample_bytree = 0.3, min_child_weight = 1, subsample = 0.7, n_rounds = 50 | 1.000 | 0.963 |

Preproc., preprocessing; First deriv., first derivative; hl_size, hidden layer sizes; lr, learning rate.

As shown in Table 2, among the models trained on the original spectra, the SVM achieved the highest test performance, with a weighted F1-score of 0.958. Although FNN and XGBoost are generally regarded as high-performing models[56, 77–79], their F1-scores in this study were lower than that of the SVM. This result aligns with Occam’s razor[80], which indicates that when multiple models yield comparable performance, the simplest should be preferred for better generalization. The SVM, which seeks to find an optimal hyperplane that maximizes the margin between classes, is inherently simpler than FNN and XGBoost. Its performance was enhanced by the use of the RBF kernel, which allows for nonlinear decision boundaries. Another possible reason for the superior performance of the SVM is that the structure of the dataset may not have been complex enough to fully leverage the advantages of neural networks (e.g., FNN) or ensemble-based methods (e.g., XGBoost). The optimized FNN architecture in this study was relatively simple, with one or two hidden layers of 16 neurons each (Table 2).
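The weighted F1-scores compared in Table 2 follow Eqs. 5–9. The sketch below shows that calculation on a small made-up label set; it is not the authors' R code, and the function name `weighted_f1`, the class labels, and the convention of scoring 0 when a denominator is zero are assumptions.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Weighted F1: per-class F1 (Eqs. 5-7) averaged with weights N_i / T (Eqs. 8-9)."""
    total = len(y_true)
    pairs = list(zip(y_true, y_pred))
    score = 0.0
    for cls, n_i in Counter(y_true).items():
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        # Zero denominators are scored as 0.0 (an assumed convention).
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        score += (n_i / total) * f1
    return score


# Hypothetical imbalanced example: 3 samples of A, 2 of B, 1 of C.
y_true = ["A", "A", "A", "B", "B", "C"]
y_pred = ["A", "A", "B", "B", "B", "C"]

# Per-class F1: A = 0.8, B = 0.8, C = 1.0; weights 3/6, 2/6, 1/6.
score = weighted_f1(y_true, y_pred)  # 5/6, about 0.833
```

Because the weights are class frequencies, a misclassified spectrum in a large class lowers the score more than one in a small class.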
Spectral preprocessing using the first derivative transformation improved the test performance of all models. The F1-scores for FNN and XGBoost increased from 0.934 to 0.974 and from 0.900 to 0.963, respectively. Notably, the SVM trained on first derivative spectra achieved perfect classification accuracy (F1-score = 1.000). Even simpler models such as k-NN and DT, which initially exhibited relatively low performance, showed improvement after first derivative transformation, though their accuracy remained lower than that of the more advanced models. This is likely due to their simpler learning mechanisms. For instance, k-NN, a nonparametric algorithm, classifies instances based purely on distance metrics and lacks a formal training phase, making it less robust for high-dimensional datasets. Given the 26 classes and 168 spectral variables per sample in this study, models with greater computational and modeling capacity, such as SVM, FNN, and XGBoost, were better suited to the classification tasks. Among these, SVM emerges as the most suitable model, offering a balance between computational efficiency and high classification accuracy, which is consistent with the principle of Occam’s razor.

Feature importance of the NIR spectra for the classification of traditional handmade paper

The model comparison presented in Table 2 confirms the strong performance of the SVM. However, unlike XGBoost, the SVM lacks interpretability, making it difficult to explain the decision-making process. Similarly, the FNN does not offer logical transparency in how classifications are derived. Therefore, no single model can fully replace the unique advantages of others, as each possesses distinct strengths and limitations. Feature importance based on MDI[59] was calculated using the XGBoost algorithm, and the results are shown in Fig. 4.
Feature importance metrics provide valuable interpretability for machine learning models by identifying the spectral variables that most strongly influence classification decisions[23, 48]. Figure 4a presents the first-derivative NIR spectra (900–1,700 nm) for five representative samples, each selected from a different cluster identified by DBSCAN in Fig. 2. The overlaid feature importance values reflect the contribution of individual spectral variables to the separation of these five clusters. Notably, the spectral region between 1,075 and 1,250 nm, associated with the second overtones of aromatic and aliphatic C–H vibrations in lignin, exhibited high importance scores, indicating that lignin-related signals played a key role in cluster differentiation. Figure 4b illustrates the first-derivative NIR spectra of representative samples from the selected classes (China, No. 3; Japan, No. 9; and Korea, Nos. 16, 17, and 23) along with the feature importance values computed from the 26-class classification model (Table 1). In contrast to the five-cluster model, which enabled clearer identification of discriminative spectral regions, the 26-class classification model did not yield a sharply localized feature importance profile. This is attributed to the increased number of classes and the associated variability among samples, which makes it more difficult to isolate a narrow set of key wavelengths. Nevertheless, several spectral regions consistently exhibited high information gain in the 26-class model: 920–1,265, 1,290–1,440, 1,460–1,580, and 1,660–1,680 nm. These regions appear to be particularly informative for distinguishing traditional handmade papers.
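Restricting a spectrum to the regions listed above could be sketched as follows. The region boundaries come from the text; the evenly spaced 163-point wavelength grid between 900 and 1,700 nm is an assumption (the actual sensor grid may differ), and the helper names are hypothetical. Variable selection was not part of this study, so the sketch only illustrates what such a step might look like.

```python
# Informative regions (nm) reported for the 26-class model.
REGIONS_NM = [(920, 1265), (1290, 1440), (1460, 1580), (1660, 1680)]


def wavelength_grid(start=900.0, stop=1700.0, n=163):
    """Evenly spaced grid; the real sensor grid is an assumption here."""
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


def in_regions(wavelength, regions=REGIONS_NM):
    """True if the wavelength lies inside any of the given regions."""
    return any(lo <= wavelength <= hi for lo, hi in regions)


def select_bands(spectrum, wavelengths, regions=REGIONS_NM):
    """Keep only the spectral variables that fall inside the regions."""
    return [v for v, w in zip(spectrum, wavelengths) if in_regions(w, regions)]


wavelengths = wavelength_grid()
spectrum = [0.0] * len(wavelengths)  # placeholder spectrum, not real data
reduced = select_bands(spectrum, wavelengths)
print(len(wavelengths), len(reduced))
```

Because the four regions cover most of the 900–1,700 nm range, a mask like this removes only a modest share of the variables; a stricter importance threshold would be needed for a substantial reduction.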
Differences in carbohydrate composition, arising from the use of diverse raw materials such as wood and nonwood sources[3] and from variations in pulping and bleaching methods, are known to influence the contents of cellulose, hemicellulose, and lignin[81–85], which likely accounts for the spectral diversity captured by the model. Although variable selection was not applied in this study, selecting highly informative spectral regions for model construction has been shown to enhance model accuracy while reducing computational cost, because narrowing the spectral range decreases the number of input variables the model must process. Our previous research, along with other studies, has demonstrated the effectiveness of this approach in improving model robustness[22, 23, 48, 49]. These informative regions may also serve to enhance the interpretability and transparency of the model’s decision-making process.

Visualization

Figure 5 presents a comparison of hyperspectral images for representative traditional handmade paper samples. For this visualization analysis, the same classes used in the feature importance plots were selected: China (No. 03), Japan (No. 09), and Korea (Nos. 16, 17, and 23). Color mapping was performed using classification outputs, where each pixel in the hyperspectral image was assigned a color corresponding to the class with the highest spectral similarity, as determined by the trained model. In the five-class composite visualization (left panel), the colors appear highly intermixed, making class separation visually challenging. However, in the pairwise binary visualizations (right matrix), clearer spatial distinctions are observed between individual samples. Each panel in the matrix displays the result of minimum angle classification using the SAM, with colored regions indicating spectral uniqueness for each paper type.
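The minimum-angle rule used by the SAM can be sketched in a few lines. The spectral angle between a pixel spectrum and a reference spectrum is the arccosine of their normalized dot product, and each pixel takes the label of the reference with the smallest angle. The reference spectra, band count, and function names below are hypothetical placeholders, not values from the study.

```python
import math


def spectral_angle(a, b):
    """Angle in radians between two non-zero spectra treated as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos = dot / (norm_a * norm_b)
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding


def classify_pixel(pixel, references):
    """Return the reference label with the smallest spectral angle."""
    return min(references, key=lambda name: spectral_angle(pixel, references[name]))


# Hypothetical 4-band reference spectra; not values from this study.
references = {
    "China_03": [0.50, 0.48, 0.40, 0.35],
    "Japan_09": [0.30, 0.45, 0.55, 0.60],
}
pixel = [0.26, 0.40, 0.50, 0.53]  # roughly a dimmer copy of Japan_09
print(classify_pixel(pixel, references))
```

Because the angle ignores vector length, a pixel that is uniformly darker or brighter than a reference still matches it, which is one reason the method suits pixel-wise comparison under uneven illumination.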
These findings indicate that while simultaneous classification across all classes may be limited by spectral overlap, pixel-level comparisons using hyperspectral data enable effective identification of traditional handmade paper types. Moreover, spectral image-based color mapping serves as a valuable tool for assessing authenticity and compositional similarity, as it visualizes how well unknown samples match known references. Future research will focus on developing convolutional neural network-based models capable of processing high-dimensional tensor data, such as hyperspectral images composed of numerous spectral channels. These multichannel datasets, which integrate both spatial and spectral information, require specialized network architectures to learn complex spectral–spatial features effectively. The goal is to enable accurate defect detection, classification, and instance-level segmentation for automated quality evaluation and assessment. Ongoing efforts include the design of deep learning frameworks tailored to hyperspectral tensor data, as well as the development of an automated imaging system optimized for high-resolution hyperspectral acquisition.

Conclusions

PCA and DBSCAN were effectively utilized to explore chemical composition similarities among traditional handmade paper samples based on NIR spectra extracted from hyperspectral images. DBSCAN also identified and excluded special outliers that could negatively affect model training and decision-making. The application of first-derivative preprocessing substantially improved model performance, yielding higher F1-scores compared to models trained on raw spectral data. Classification models, including SVM, FNN, and XGBoost, demonstrated strong performance when trained on the first derivative spectra, achieving F1-scores of 1.00, 0.97, and 0.96, respectively. Among them, SVM provided an optimal balance between classification accuracy and computational efficiency, though it lacks interpretability.
In contrast, XGBoost provides feature importance metrics that enhance transparency and offer insight into the model’s decision-making process. For visualization, the SAM method was effectively employed to highlight spectral differences between paper samples through pixel-wise color mapping. Overall, this study demonstrates the potential of HSI combined with machine learning for accurately classifying traditional East Asian handmade papers, enabling reference-based spectral matching across diverse material origins. These results contribute to the classification of traditional handmade paper using machine learning and NIR hyperspectral imaging and offer insights for future research in cultural heritage science and material identification.

Declarations

Additional Information

Competing interests: The authors declare no competing interests.

Author Contribution

Y.J.L. conceptualized the study, performed the investigation, and wrote the original manuscript draft. Y.J.L., S.Y.W., and S.B.P. developed the methodology. S.B.P. contributed to data acquisition and preprocessing. H.J.K. acquired funding and supervised the project together with T.J.L. H.J.K. and T.J.L. reviewed and edited the manuscript. All authors reviewed and approved the final version of the manuscript.

Acknowledgements

The authors gratefully acknowledge the financial support provided by the Ministry of Science and ICT of the Korean government and the National Research Foundation of Korea (Grant No. RS-2023-00301889).

Data Availability

The datasets generated and/or analyzed during the current study are available from the corresponding author upon reasonable request.

References

1. Hubbe, M. A. & Bowden, C. Handmade paper: a review of its history, craft, and science. BioResources 4, 1736-1792 (2009).
2. Han, B. et al. Characterization of Korean handmade papers collected in a Hanji reference book. Herit. Sci. 9, 1-12 (2021).
3. Han, B., Vial, J., Sakamoto, S. & Sablier, M. Identification of traditional East Asian handmade papers through the multivariate data analysis of pyrolysis-GC/MS data. Analyst 144, 1230-1244 (2019).
4. Mullock, H. J. T. P. C. Xuan paper. 19, 23-30 (1995).
5. Luo, Y., Cigić, I. K., Wei, Q. & Strlič, M. Characterisation and durability of contemporary unsized Xuan paper. Cellulose 28, 1011-1023 (2021).
6. Tang, Y. & Smith, G. J. Fluorescence and photodegradation of Xuan paper: the photostability of traditional Chinese handmade paper. J. Cult. Herit. 14, 464-470 (2013).
7. Inaba, M. & Sugisita, R. Permanence of washi (Japanese paper). Stud. Conserv. 33, 1-4 (1988).
8. Prestowitz, B., Katayama, Y. J. Washi: understanding Japanese paper as a material of culture and conservation. Book Paper Group Annual 37, 77-91 (2018).
9. Lee, O.-K., Kim, S. & Lee, H. W. Evolution of the Hanji-making technology, from ancient times to the present. J. Korean Wood Sci. Technol. 51, 509-525 (2023).
10. Jeong, M.-J. et al. Deterioration of ancient Korean paper (Hanji), treated with beeswax: a mechanistic study. Carbohyd. Polym. 101, 1249-1254 (2014).
11. Tindale, T. K. & Tindale, H. R. J. The handmade papers of Japan (1952).
12. Goto, S. Japanese hand-made paper (1958).
13. Barrett, T. J. & Lutz, W. Japanese papermaking: traditions, tools, and techniques (1983).
14. Lee, S. C. Hanji: everything you need to know about traditional Korean paper (Hyeonamsa Publishing Co., Ltd., 2012).
15. Cheon, C., Kim, S.-J. & Jin, Y.-M. Properties of indigenous Korean paper (Hanji)-Classification of Oebal (single frame) papermaking methods. J. Korean Wood Sci. Technol. 27, 88-104 (1999).
16. Tsien, T.-H. Raw materials for old papermaking in China. J. Am. Oriental Soc. 93, 510-519 (1973).
17. Shi, J. & Li, T. Technical investigation of 15th and 19th century Chinese paper currencies: fiber use and pigment identification. J. Raman Spectrosc. 44, 892-898 (2013).
18. Han, B. et al. Characterization of Korean handmade papers collected in a Hanji reference book. Herit. Sci. 9, 96 (2021).
19. Dong, L.-Y. & Zhu, Y.-J. Fire-Resistant inorganic analogous xuan paper with thousands of years’ super-durability. ACS Sustain. Chem. Eng. 6, 17239-17251 (2018).
20. Kim, Y. J., Yoon, S., Cho, Y.-H., Kim, G. & Kim, H.-K. Paintable and writable electrodes using black conductive ink on traditional Korean paper (Hanji). RSC Adv. 10, 24631-24641 (2020).
21. Choi, Y. et al. Enhancing Li-S battery performance with porous carbon from Hanji. Batteries 11, 4 (2024).
22. Lee, Y. J., Won, S. Y., Park, S. B. & Kim, H.-J. Chemometric approaches for discriminating manufacturers of Korean handmade paper using infrared spectroscopy. Herit. Sci. 12, 10.1186/s40494-024-01460-6 (2024).
23. Lee, Y. J., Kweon, S. W., Jeong, C. W. & Kim, H. J. Evaluating the performance of machine learning and variable selection methods to identify document paper using infrared spectral data. Spectrochim. Acta A Mol. Biomol. Spectrosc. 327, 125299 (2025).
24. Jang, K. J., Heo, T. Y. & Jeong, S. H. Classification option for Korean traditional paper based on type of raw materials, using near-infrared spectroscopy and multivariate statistical methods. BioResources 15, 9045-9058 (2020).
25. Wang, J. Papermaking raw materials of China: an atlas of micrographs and the characteristics of fibers (China Light Industry Press, 1999).
26. Jeong, M.-J. et al. Deterioration of ancient cellulose paper, Hanji: evaluation of paper permanence. Cellulose 21, 4621-4632 (2014).
27. Lee, S. H. Adhesives used in conservation treatment in cultural properties: paintings and written artifacts (Conservation of Papers and Textiles, National Research Institute of Cultural Heritage, 2011).
28. Choi, T. J. F. Development of a natural dispersant for Korean traditional papermaking (Ⅰ)-Viscosity and papermaking characteristics of Hydrangea paniculata mucilage. Forest Bioenergy 23, 38-44 (2004).
29. Ilvessalo-Pfäffli, M.-S. Fiber atlas: identification of papermaking fibers (Springer Science & Business Media, 1995).
30. Dragojević, A., Gregor-Svetec, D., Vodopivec Tomažič, J. & Lozo, B. Characterization of seventeenth century papers from Valvasor's collection of the Zagreb Archdiocese. Herit. Sci. 9, 35 (2021).
31. Grant, J. The role of paper in questioned document work. J. Forensic Sci. Soc. 13, 91-95 (1973).
32. Spence, L. D., Baker, A. T. & Byrne, J. P. Characterization of document paper using elemental compositions determined by inductively coupled plasma mass spectrometry. J. Anal. At. Spectrom. 15, 813-819 (2000).
33. Spence, L. D., Francis, R. B. & Tinggi, U. Comparison of the elemental composition of office document paper: evidence in a homicide case. J. Forensic Sci. 47, 648-651 (2002).
34. Yan, C. et al. Analysis of handmade paper by Raman spectroscopy combined with machine learning. J. Raman Spectrosc. 53, 260-271 (2021).
35. Yan, Y. et al. FTIR Spectroscopy in cultural heritage studies: non-destructive analysis of Chinese handmade papers. Chem. Res. Chin. Univ. 35, 586-591 (2019).
36. Wertz, J. H., McClelland, A. A., Mayer, D. D. & Knipe, P. Modeling chemical tests and fiber identification of paper materials using principal component analysis and specular reflection FTIR data. Heritage 5, 1960-1973 (2022).
37. Kellicut, D. C. et al. Emerging technology: hyperspectral imaging. Perspect. Vasc. Surg. Endovasc. Ther. 16, 53-57 (2004).
38. Schultz, R. A. et al. Hyperspectral imaging: a novel approach for microscopic analysis. Cytometry 43, 239-247 (2001).
39. Wu, Y. et al. Non-destructive prediction and pixel-level visualization of polysaccharide-based properties in ancient paper using SWNIR hyperspectral imaging and machine learning. Carbohyd. Polym. 352, 123198 (2025).
40. Hwang, S.-W. et al. NIR-chemometric approaches for evaluating carbonization characteristics of hydrothermally carbonized lignin. Sci. Rep. 11, 16979 (2021).
41. Hwang, S.-W. et al. Investigation of NIR spectroscopy and electrical resistance-based approaches for moisture determination of logging residues and sweet sorghum. BioResources 18, 2064-2082 (2023).
42. Sun, B., Liu, J., Liu, S. & Yang, Q. Application of FT-NIR-DR and FT-IR-ATR spectroscopy to estimate the chemical composition of bamboo (Neosinocalamus affinis Keng). Holzforschung 65 (2011).
43. Mendez, J., Mendoza, L., Cruz-Tirado, J. P., Quevedo, R. & Siche, R. Trends in application of NIR and hyperspectral imaging for food authentication. Sci. Agropecu. 10, 143-161 (2019).
44. Medus, L. D., Saban, M., Francés-Víllora, J. V., Bataller-Mompeán, M. & Rosado-Muñoz, A. Hyperspectral image classification using CNN: application to industrial food packaging. Food Control 125, 107962 (2021).
45. Mahesh, S., Jayas, D. S., Paliwal, J. & White, N. D. G. Hyperspectral imaging to classify and monitor quality of agricultural materials. J. Stored Prod. Res. 61, 17-26 (2015).
46. Agilandeeswari, L., Prabukumar, M., Radhesyam, V., Phaneendra, K. L. N. B. & Farhan, A. Crop classification for agricultural applications in hyperspectral remote sensing images. Appl. Sci. 12, 1670 (2022).
47. Tatzer, P., Wolf, M. & Panner, T. Industrial application for inline material sorting using hyperspectral imaging in the NIR range. Real-Time Imaging 11, 99-107 (2005).
48. Hwang, S.-W. et al. Feature importance measures from random forest regressor using near-infrared spectra for predicting carbonization characteristics of kraft lignin-derived hydrochar. J. Wood Sci. 69 (2023).
49. Hwang, S.-W., Park, G., Kim, J., Kang, K.-H. & Lee, W.-H. One-dimensional convolutional neural networks with infrared spectroscopy for classifying the origin of printing paper. BioResources 19, 1633-1651 (2024).
50. Savitzky, A. & Golay, M. J. E. Smoothing and differentiation of data by simplified least squares procedures. Anal. Chem. 36, 1627-1639 (1964).
51. Hahsler, M., Piekenbrock, M. & Doran, D. dbscan: fast density-based clustering with R. J. Stat. Soft. 91, 1-30 (2019).
52. Ester, M., Kriegel, H.-P., Sander, J. & Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise in Proceedings of the second international conference on knowledge discovery and data mining 226-231 (AAAI Press, 1996).
53. Campello, R. J., Moulavi, D. & Sander, J. Density-based clustering based on hierarchical density estimates in Pacific-Asia conference on knowledge discovery and data mining 160-172 (Springer Berlin Heidelberg, 2013).
54. Sander, J., Ester, M., Kriegel, H.-P. & Xu, X. Density-based clustering in spatial databases: the algorithm GDBSCAN and its applications. Data Min. Knowl. Discov. 2, 169-194 (1998).
55. Vert, J.-P., Tsuda, K. & Schölkopf, B. A primer on the kernel methods. Kernel Methods Comput. Biol. 47, 35-70 (2004).
56. Chen, T. & Guestrin, C. Xgboost: a scalable tree boosting system in Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining 785-794 (Association for Computing Machinery, 2016).
57. Breiman, L., Friedman, J., Olshen, R. & Stone, C. Classification and regression trees (Routledge, 1984).
58. Kruse, F. A. et al. The spectral image processing system (SIPS): interactive visualization and analysis of imaging spectrometer data. Remote Sens. Environ. 44, 145-163 (1993).
59. Louppe, G., Wehenkel, L., Sutera, A. & Geurts, P. J. A. Understanding the variable importances in forests of randomized trees in Proceedings of the international conference on neural information processing systems 431-439 (Curran Associates Inc., 2013).
60. Altman, D. G. & Bland, J. M. Diagnostic tests 1: sensitivity and specificity. BMJ 308, 1552-1552 (1994).
61. Velez, D. R. et al. A balanced accuracy function for epistasis modeling in imbalanced datasets using multifactor dimensionality reduction. Genet. Epidemiol. 31, 306-315 (2007).
62. Smook, G. A. Handbook for pulp & paper technologists (A. Wilde, 2002).
63. Yan, C. et al. Analysis of handmade paper by Raman spectroscopy combined with machine learning. J. Raman Spectrosc. 53, 260-271 (2022).
64. Han, B., Yang, Y., Wang, B., Jiang, H. & Sablier, M. Rapid identification of bast fibers in ancient handmade papers based on improved characterization of lignin monomers by Py-GCxGC/MS. Cellulose 30, 575-590 (2023).
65. Travers, S., Bertelsen, M. G., Petersen, K. K. & Kucheryavskiy, S. V. Predicting pear (cv. Clara Frijs) dry matter and soluble solids content with near infrared spectroscopy. LWT - Food Sci. Technol. 59, 1107-1113 (2014).
66. Kelley, S., Rials, T., Snell, R., Groom, L. & Sluiter, A. Use of near infrared spectroscopy to measure the chemical and mechanical properties of solid wood. Wood Sci. Technol. 38, 257-276 (2004).
67. Schwanninger, M., Rodrigues, J. C. & Fackler, K. A review of band assignments in near infrared spectra of wood and wood components. J. Near Infrared Spec. 19, 287-308 (2011).
68. Quintero Balbas, D., Lanterna, G., Cirrincione, C., Fontana, R. & Striova, J. Non-invasive identification of textile fibers using near-infrared fiber optics reflectance spectroscopy and multivariate classification techniques. Eur. Phys. J. Plus 137, 85 (2022).
69. Osborne, B. G. & Fearn, T. Near-infrared spectroscopy in food analysis (Longman Scientific & Technical New York, 1986).
70. Zhang, X. & Wyeth, P. Moisture sorption as a potential condition marker for historic silks: noninvasive determination by near-infrared spectroscopy. Appl. Spectrosc. 61, 218-222 (2007).
71. Huang, A., Li, G., Fu, F. & Fei, B. Use of visible and near infrared spectroscopy to predict the klason lignin content of bamboo, Chinese fir, Paulownia, and Poplar. J. Wood Chem. Technol. 28, 194-206 (2008).
72. Zhou, C. et al. Rapid determination of cellulose content in pulp using near infrared modeling technique. BioResources 13, 6122-6132 (2018).
73. Horikawa, Y. Assessment of cellulose structural variety from different origins using near infrared spectroscopy. Cellulose 24, 5313-5325 (2017).
74. Huang, C., Han, L., Liu, X. & Ma, L. The rapid estimation of cellulose, hemicellulose, and lignin contents in rice straw by near infrared spectroscopy. Energy Sources A: Recovery Util. Environ. Eff. 33, 114-120 (2010).
75. Gouveia, C. S. S., Lebot, V. & Pinheiro de Carvalho, M. NIRS estimation of drought stress on chemical quality constituents of taro (Colocasia esculenta L.) and sweet potato (Ipomoea batatas L.) flours. Appl. Sci. 10, 8724 (2020).
76. Font, R., del Río-Celestino, M., Luna, D., Gil, J. & de Haro-Bailón, A. Rapid and cost-effective assessment of the neutral and acid detergent fiber fractions of chickpea (Cicer arietinum L.) by combining modified PLS and visible with near-infrared spectroscopy. Agronomy 11, 666 (2021).
77. Bebis, G. & Georgiopoulos, M. Feed-forward neural networks. IEEE Potentials 13, 27-31 (1994).
78. Svozil, D., Kvasnicka, V. & Pospichal, J. Introduction to multi-layer feed-forward neural networks. Chemometr. Intell. Lab. 39, 43-62 (1997).
79. Nielsen, D. Tree boosting with xgboost-why does xgboost win "every" machine learning competition? (NTNU, 2016).
80. Domingos, P. Occam's two razors: the sharp and the blunt in KDD 37-43 (Artificial Intelligence Group, 1998).
81. Gümüskaya, E., Usta, M. & Kirci, H. The effects of various pulping conditions on crystalline structure of cellulose in cotton linters. Polym. Degrad. Stab. 81, 559-564 (2003).
82. Abd El-Sayed, E. S., El-Sakhawy, M. & El-Sakhawy, M. A.-M. Non-wood fibers as raw material for pulp and paper industry. Nord. Pulp Paper Res. J. 35, 215-230 (2020).
83. Liu, Z., Wang, H. & Hui, L. Pulping and papermaking of non-wood fibers. Pulp Paper Process. 1, 4-31 (2018).
84. Areej, F., Ashadie, K. M., Zakiah, S. & Ainun, Z. Pulping process for nonwoody plants in Pulping and papermaking of nonwood plant fibers 17-32 (Academic Press, 2023).
85. Weng, J. & Chen, G. The influence of papermaking process on the properties of Chinese handmade bamboo paper. Restaurator 46, 59-83 (2025).

Additional Declarations

No competing interests reported.
Author affiliations: Yong Ju Lee, Seo Young Won, Tai-Ju Lee, and Hyoung Jin Kim (corresponding author), Kookmin University; Seong Bin Park, Korea High Tech Textile Research Institute.

Figure 1. Diagram for the classification of traditional handmade paper.
Figure 2. Pair plots of PC scores showing DBSCAN-based clustering on the original (a) and first derivative (b) NIR spectra, with data class information assigned to each cluster. The percentage values in parentheses in the axis titles represent the variance explained by each PC.
Figure 3. Original NIR spectra (a), first derivative spectra (b), and their corresponding PC1 (c) and PC2 (d) loadings in the 900–1,700 nm region for clustering of traditional handmade paper.
Figure 4. First derivative NIR spectra and spectral feature importance derived from XGBoost models for two classification levels: (a) clustering results based on DBSCAN and (b) classification of 26 traditional handmade paper classes. In (b), one representative spectrum from each cluster is displayed, while the feature importance values reflect the contribution of each wavelength to the full multiclass model.
Figure 5. Hyperspectral image-based classification and visualization of traditional handmade paper samples from China (No. 03), Japan (No. 09), and Korea (Nos. 16, 17, and 23). The leftmost composite image shows five-class classification using pixel-wise color mapping based on the SAM, where each pixel is colored according to the class with the smallest spectral angle. The matrix on the right displays binary classification results for each pairwise comparison, with white boxes indicating self-comparisons.
To address this limitation, nondestructive alternatives have gained prominence\u0026mdash;particularly vibrational spectroscopic techniques such as Raman[\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e, \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e], near-infrared (NIR)[\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e], and infrared (IR)[\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e, \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e, \u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e] spectroscopy. These methods offer rapid, chemical-specific information without damaging the sample, making them highly suitable for the classification, authentication, and conservation of traditional handmade papers.\u003c/p\u003e \u003cp\u003eHyperspectral imaging (HSI) is an advanced, noninvasive technique that captures detailed spectral information across numerous contiguous wavelengths[\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e]. By recording a complete spectral profile for each pixel[\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e], HSI facilitates the precise identification of materials based on their characteristic absorption features[\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e]. The NIR region, in particular, is highly informative for detecting cellulose and other polysaccharides, which exhibit distinct spectral signatures[\u003cspan additionalcitationids=\"CR40 CR41\" citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e]. As such, HSI serves as a powerful tool for assessing key structural components in organic materials without the need for direct sampling. 
The combination of NIR HSI and machine learning is gaining traction across various fields, including food[\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e], packaging[\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e], agriculture[\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e, \u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e], heritage science[\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e], and material science[\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e]. Recently, advanced computational algorithms have helped overcome challenges associated with nondestructive, time-consuming, and labor-intensive analyses of spectral data in the study of organic matter[\u003cspan additionalcitationids=\"CR23\" citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e, \u003cspan additionalcitationids=\"CR40\" citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e, \u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e, \u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThis study aims to develop machine learning models using NIR HSI for the accurate classification of traditional handmade papers from East Asia. To this end, full NIR spectral data extracted from HSI were used to construct classification models. To enhance the interpretability and transparency of the model in the decision-making process, feature importance scores were computed. Another objective of this study is to visualize the spectral heterogeneity among the samples through pixel-wise mapping with the spectral angle mapper (SAM). 
This article presents the results of a machine learning-based classification of the manufacturing origin of traditional handmade paper, contributing valuable insights to the ongoing research in the field of cultural heritage and material identification.\u003c/p\u003e"},{"header":"Materials and methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eTraditional handmade paper samples\u003c/h2\u003e \u003cp\u003eA summary of the traditional handmade paper samples used in this study is provided in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. Samples containing any form of coloration were excluded to ensure consistency in chemical composition and to minimize interference in spectroscopic analysis. Preference was given to papers obtained from manufacturers actively engaged in traditional production practices. It was assumed that variations in fiber species and production techniques could influence the chemical and structural properties of the papers. 
Accordingly, the sample selection was designed to reflect differences in commonly used plant materials, as well as to capture variability in key processing steps such as fiber cooking, bleaching, and mucilage preparation methods.\u003c/p\u003e \n\u003cp\u003e\u003cstrong\u003eTable 1.\u003c/strong\u003e List of traditional handmade paper samples analyzed.\u003c/p\u003e\n\u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 54px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eCode.\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 58px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eCountry\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 71px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eProduct name\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 151px;\"\u003e\n \u003cp\u003e\u003cstrong\u003ePulp fiber\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 91px;\"\u003e\n \u003cp\u003e\u003cstrong\u003ePulping agents\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 99px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eDispersant\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 77px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eadditive\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003eChina (No. 
01)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"8\" style=\"width: 58px;\"\u003e\n \u003cp\u003eChina\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 71px;\"\u003e\n \u003cp\u003eDakji(构皮紙/楮皮紙)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 151px;\"\u003e\n \u003cp\u003e\u003cem\u003eBroussonetia papyrifera (paper mulberry,\u0026nbsp;\u003c/em\u003e构屬),\u0026nbsp;China\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"8\" valign=\"top\" style=\"width: 91px;\"\u003e\n \u003cp\u003ePlant Ash lye\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"8\" valign=\"top\" style=\"width: 99px;\"\u003e\n \u003cp\u003e\u003cem\u003eActinidia chinensis\u003c/em\u003e stem mucilage\u0026nbsp;杨桃藤汁(中华猕猴桃茎液)\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"4\" valign=\"top\" style=\"width: 77px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003eChina (No. 02)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003eChina (No. 03)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 71px;\"\u003e\n \u003cp\u003eMulberry bark paper(桑皮紙)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 151px;\"\u003e\n \u003cp\u003e\u003cem\u003eMorus (mulberry,\u0026nbsp;\u003c/em\u003e桑屬), China\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003eChina (No. 04)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003eChina (No. 
05)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 71px;\"\u003e\n \u003cp\u003eBast fibers with grass mixed paper\u003c/p\u003e\n \u003cp\u003e(綿料紙)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 151px;\"\u003e\n \u003cp\u003e\u003cem\u003ePteroceltis tatarinowii\u003c/em\u003e (blue sandalwood or wingceltis,\u0026nbsp;青檀屬) + \u0026nbsp;\u003cem\u003eOryza sativa\u003c/em\u003e (rice straw)\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" valign=\"top\" style=\"width: 77px;\"\u003e\n \u003cp\u003eglue-alum sizing\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003eChina (No. 06)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003eChina (No. 07)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 71px;\"\u003e\n \u003cp\u003eTraditional Bamboo Paper (竹紙)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 151px;\"\u003e\n \u003cp\u003e\u003cem\u003eBambusoideae\u003c/em\u003e (bamboo), China\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" valign=\"top\" style=\"width: 77px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003eChina (No. 08)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003eJapan (No. 
09)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"7\" style=\"width: 58px;\"\u003e\n \u003cp\u003eJapan\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 71px;\"\u003e\n \u003cp\u003eSekishu washi\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"7\" style=\"width: 151px;\"\u003e\n \u003cp\u003e\u003cem\u003eBroussonetia papyrifera\u0026nbsp;\u003c/em\u003e(paper mulberry), Japan\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"3\" valign=\"top\" style=\"width: 91px;\"\u003e\n \u003cp\u003esoda ash\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"5\" valign=\"top\" style=\"width: 99px;\"\u003e\n \u003cp\u003e\u003cem\u003eAbelmoschus manihot\u0026nbsp;\u003c/em\u003eroot mucilage\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"3\" valign=\"top\" style=\"width: 77px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003eJapan (No.10)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 71px;\"\u003e\n \u003cp\u003eMino washi\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003eJapan (No.11)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003eJapan (No.12)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 71px;\"\u003e\n \u003cp\u003eMisu washi\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" valign=\"top\" style=\"width: 91px;\"\u003e\n \u003cp\u003ecaustic soda\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" valign=\"top\" style=\"width: 77px;\"\u003e\n \u003cp\u003eWhite clay\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003eJapan (No.13)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003eJapan 
(No.14)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 71px;\"\u003e\n \u003cp\u003eUda washi\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" valign=\"top\" style=\"width: 91px;\"\u003e\n \u003cp\u003ePlant Ash lye\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" valign=\"top\" style=\"width: 99px;\"\u003e\n \u003cp\u003e\u003cem\u003ePanicle hydrangea\u0026nbsp;\u003c/em\u003eroot mucilage\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" valign=\"top\" style=\"width: 77px;\"\u003e\n \u003cp\u003eWhite clay (Limestone)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003eJapan (No.15)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003eKorea (No. 16)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"11\" style=\"width: 58px;\"\u003e\n \u003cp\u003eKorea\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 71px;\"\u003e\n \u003cp\u003ePulp Dakji\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 151px;\"\u003e\n \u003cp\u003e\u003cem\u003eBroussonetia papyrifera\u0026nbsp;\u003c/em\u003e(paper mulberry), Korea + wood pulp\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"3\" valign=\"top\" style=\"width: 91px;\"\u003e\n \u003cp\u003eSodium hydroxide(NaOH)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"3\" valign=\"top\" style=\"width: 99px;\"\u003e\n \u003cp\u003ePAM\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"11\" valign=\"top\" style=\"width: 77px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003eKorea (No. 
17)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 71px;\"\u003e\n \u003cp\u003eDakji\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 151px;\"\u003e\n \u003cp\u003e\u003cem\u003eBroussonetia papyrifera\u0026nbsp;\u003c/em\u003e(paper mulberry), Thailand\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003eKorea (No. 18)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 71px;\"\u003e\n \u003cp\u003eOlbal Hanji\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"9\" style=\"width: 151px;\"\u003e\n \u003cp\u003e\u003cem\u003eBroussonetia papyrifera\u0026nbsp;\u003c/em\u003e(paper mulberry), Korea\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003eKorea (No. 19)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" valign=\"top\" style=\"width: 91px;\"\u003e\n \u003cp\u003ePlant Ash lye\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" valign=\"top\" style=\"width: 99px;\"\u003e\n \u003cp\u003e\u003cem\u003eAbelmoschus manihot\u0026nbsp;\u003c/em\u003eroot mucilage\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003eKorea (No. 20)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 71px;\"\u003e\n \u003cp\u003eSsangbal Hanji\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003eKorea (No. 
21)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 71px;\"\u003e\n \u003cp\u003eHanji\u0026ndash;Choksae\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" valign=\"top\" style=\"width: 91px;\"\u003e\n \u003cp\u003eSodium hydroxide(NaOH)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" valign=\"top\" style=\"width: 99px;\"\u003e\n \u003cp\u003ePAM\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003eKorea (No. 22)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003eKorea (No. 23)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"4\" style=\"width: 71px;\"\u003e\n \u003cp\u003eEumyungji\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"4\" valign=\"top\" style=\"width: 91px;\"\u003e\n \u003cp\u003ePlant Ash lye\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"4\" valign=\"top\" style=\"width: 99px;\"\u003e\n \u003cp\u003e\u003cem\u003eAbelmoschus manihot\u0026nbsp;\u003c/em\u003eroot mucilage\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003eKorea (No. 24)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003eKorea (No. 25)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003eKorea (No. 26)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003ch3\u003eHyperspectral image acquisition and NIR spectral dataset\u003c/h3\u003e\n\u003cp\u003eNIR\u0026ndash;HSI images of each traditional handmade paper sample were acquired using a Resonon Pika NIR-320 hyperspectral camera (Resonon Inc., USA), which covers a spectral range of 900\u0026ndash;1,700 nm with a spectral resolution of 4.9 nm. 120-W halogen light sources provided illumination. 
For each paper type, 10 measurements were conducted to generate mean reflectance spectra, resulting in a database of 260 NIR spectra. Each spectrum contained 168 variables corresponding to wavelengths ranging from 900 to 1,700 nm. For classification modeling, the original NIR spectra and their first derivatives were used. The first derivative spectra were calculated using the Savitzky\u0026ndash;Golay filter[\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e] to enhance spectral features and suppress baseline effects. Before model construction, root mean square normalization (RMSNorm) was applied to minimize intensity variations across the spectra.\u003c/p\u003e\n\u003ch3\u003e\u003c/h3\u003e\n\u003cdiv class=\"Heading\"\u003e\u003cb\u003ePartitioning of the NIR spectral dataset for classification modeling\u003c/b\u003e\u003c/div\u003e \u003cp\u003eThe dataset was split into training and test sets in a 7:3 ratio to construct and evaluate the classification models. Stratified random sampling was used to maintain a balanced class distribution across both sets. Additionally, threefold cross-validation was conducted exclusively on the training set to optimize hyperparameters and assess model stability while mitigating the risk of overfitting.\u003c/p\u003e\n\u003ch3\u003ePrincipal component analysis (PCA) and density-based spatial clustering of applications with noise (DBSCAN)\u003c/h3\u003e\n\u003cp\u003ePCA was performed to evaluate differences in the spectral patterns of traditional handmade papers. PCA transformed the original 168-dimensional NIR spectral data into a set of principal components (PCs), reducing dimensionality while preserving the majority of the variance. In this study, seven PCs were retained, and each data point was projected onto these new orthogonal axes. The first two PCs were used to visualize the relationships among paper types. 
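The normalization and PCA steps described above can be illustrated with a minimal sketch. It assumes that RMS normalization means dividing each spectrum by its own root-mean-square intensity and that PCA is computed on mean-centered spectra; the function names `rms_normalize` and `pca_project` and the synthetic 260 x 168 matrix are hypothetical stand-ins, not the study's code or data (the study itself reports using R).

```python
import numpy as np


def rms_normalize(spectra):
    """Scale each spectrum (row) to unit root-mean-square intensity (assumed definition)."""
    rms = np.sqrt(np.mean(spectra ** 2, axis=1, keepdims=True))
    return spectra / rms


def pca_project(spectra, n_components=7):
    """Project mean-centered spectra onto the leading principal components."""
    centered = spectra - spectra.mean(axis=0)
    # Rows of vt are the principal axes (loadings), ordered by singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, vt[:n_components], explained[:n_components]


# Synthetic stand-in for the 260 x 168 spectral matrix (not the study's data).
rng = np.random.default_rng(0)
spectra = rng.uniform(0.2, 0.9, size=(260, 168))

normalized = rms_normalize(spectra)
scores, loadings, explained = pca_project(normalized, n_components=7)
```

Here `scores[:, :2]` would give the coordinates for a PC1 versus PC2 scatter plot, and each row of `loadings` shows how strongly each of the 168 wavelength variables contributes to the corresponding PC.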
PC loading plots were further analyzed to interpret the contribution of specific wavelengths to each PC, providing insights into how spectral variations influence projections in the reduced coordinate space.\u003c/p\u003e \u003cp\u003eDBSCAN[\u003cspan additionalcitationids=\"CR52 CR53\" citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e] was used to cluster paper types with similar NIR spectral characteristics. A secondary objective of using DBSCAN was to identify potential outliers in the PC-transformed space. The DBSCAN parameters were empirically set with an epsilon (eps) value of 0.005 and a minimum number of points (minPts) of 3. A data point was considered a core point if it had at least three neighboring points within a radius of 0.005 in PC space. Clusters were thus formed from regions containing three or more closely located points, while data points outside this density threshold were classified as noise or outliers.\u003c/p\u003e\n\u003ch3\u003eMachine learning models for the classification of traditional handmade paper using NIR spectra\u003c/h3\u003e\n\u003cp\u003eTo evaluate classification performance while balancing computational cost, model interpretability, and robustness, various machine learning algorithms were implemented for the identification of traditional handmade papers. A schematic overview of the classification workflow, which combines hyperspectral image analysis and machine learning, is presented in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe k-nearest neighbors (k-NN) algorithm, a distance-based nonparametric method, was employed as a simple and interpretable classifier. It assigns class labels based on the majority vote among the \u003cem\u003ek\u003c/em\u003e closest training instances without requiring model training. 
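The majority-vote rule can be sketched in a few lines. This is only an illustration under stated assumptions: Euclidean distance is assumed as the distance measure (the text does not name one), and the function `knn_predict` and the toy two-dimensional data are hypothetical.

```python
from collections import Counter

import numpy as np


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Euclidean distance from the query to every training instance (assumed metric).
    distances = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy two-feature data standing in for spectra (hypothetical values).
train_x = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                    [1.0, 1.0], [1.1, 1.0], [1.0, 1.1]])
train_y = ["Hanji", "Hanji", "Hanji", "Washi", "Washi", "Washi"]

label_a = knn_predict(train_x, train_y, np.array([0.05, 0.05]), k=3)
label_b = knn_predict(train_x, train_y, np.array([0.9, 1.0]), k=3)
```

No model is fitted: the training spectra are simply stored and compared against each query at prediction time.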
Odd values of k ranging from 1 to 9 were tested, and the optimal number of neighbors was determined through a grid search.\u003c/p\u003e \u003cp\u003eA support vector machine (SVM) was also implemented to construct a nonlinear decision boundary by maximizing the margin between classes. A radial basis function (RBF) kernel was applied to map the input data into a higher-dimensional feature space[\u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e55\u003c/span\u003e]. The SVM hyperparameters\u0026mdash;penalty parameter (C) and kernel coefficient (gamma)\u0026mdash;were optimized using a grid search across logarithmic scales: C from 2\u003csup\u003e\u0026ndash;5\u003c/sup\u003e to 2\u003csup\u003e5\u003c/sup\u003e and gamma from 10\u003csup\u003e\u0026ndash;1\u003c/sup\u003e to 10\u003csup\u003e\u0026ndash;5\u003c/sup\u003e. These parameters control the trade-off between margin maximization and training error, as well as the influence of individual data points within the kernel function.\u003c/p\u003e \u003cp\u003eFor artificial neural networks (ANNs), a multilayer feedforward architecture with backpropagation was adopted. The rectified linear unit was used as the activation function, and cross-entropy was selected as the loss function. Two solvers\u0026mdash;stochastic gradient descent (SGD) and Adam\u0026mdash;were evaluated for optimization. Various network configurations, including those with one or two hidden layers and differing numbers of neurons, were evaluated to determine the most effective structure. Initial learning rates of 0.0001, 0.001, 0.01, and 0.1 were examined, with a maximum of 1,000 training iterations.\u003c/p\u003e \u003cp\u003eExtreme gradient boosting (XGBoost)[\u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e56\u003c/span\u003e], an ensemble learning algorithm based on boosting, was also employed due to its high accuracy and flexibility. 
The base learners of XGBoost are decision trees (DTs)[\u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e57\u003c/span\u003e]. A grid search was performed to optimize key hyperparameters, including the learning rate (lr, 0.01\u0026ndash;0.1), maximum tree depth (max_depth, 3\u0026ndash;7), number of boosting rounds (n_rounds, 50\u0026ndash;300), and subsample ratio (subsample, 0.3\u0026ndash;0.7). The column sampling ratio (colsample_bytree) was fixed at 0.3, while gamma and min_child_weight were set to 1.0 and 1, respectively. Separate optimizations were conducted for models trained on the original spectra and those using first-derivative preprocessing.\u003c/p\u003e \u003cp\u003eFor visualization purposes, the spectral angle mapper (SAM) algorithm[\u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e] was applied. SAM calculates spectral similarity by measuring the angle between the spectral vector of each pixel and a reference spectrum, known as an endmember. This results in one raster layer per end member, where each pixel value represents the spectral angle. Smaller angles indicate higher similarity to the corresponding endmember class. In the next step, a classification map was generated by assigning each pixel to the endmember with the smallest spectral angle\u0026mdash;referred to as minimum angle classification. Optionally, this classification can be refined by applying a maximum angle threshold to exclude pixels with low spectral similarity.\u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eSpectral feature importance measures\u003c/h2\u003e \u003cp\u003eFeature importance scores were derived from the XGBoost classifier to identify key spectral variables contributing to the classification of traditional handmade papers. 
As an ensemble method based on boosted decision trees, XGBoost evaluates the relevance of each input feature by quantifying its contribution to reducing prediction error during node splits. This importance is typically estimated through the mean decrease in node impurity (MDI)[\u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e], also known as information gain.\u003c/p\u003e \u003cp\u003eWhen a split is made at a parent node \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{P}_{j}\\)\u003c/span\u003e\u003c/span\u003e using a specific feature \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{f}_{i}\\)\u003c/span\u003e\u003c/span\u003e, resulting in two child nodes \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{L}_{j}\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{R}_{j}\\)\u003c/span\u003e\u003c/span\u003e, the gain from that split is computed as follows:\u003cdiv id=\"Equ1\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ1\" name=\"EquationSource\"\u003e\n$$\\:{G}_{Pj}={w}_{Pj}{M}_{Pj}-{w}_{Lj}{M}_{Lj}-{w}_{Rj}{M}_{Rj}.$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e1\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eHere, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:w\\)\u003c/span\u003e\u003c/span\u003e represents the relative number of samples at each node, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:M\\)\u003c/span\u003e\u003c/span\u003e denotes the impurity of the node. 
This value captures how effectively feature \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{f}_{i}\\)\u003c/span\u003e\u003c/span\u003e separates the data.\u003c/p\u003e \u003cp\u003eThe overall contribution of feature \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{f}_{i}\\)\u003c/span\u003e\u003c/span\u003e in a single decision tree is calculated as the total gain accumulated from all splits where \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{f}_{i}\\)\u003c/span\u003e\u003c/span\u003e is used:\u003cdiv id=\"Equ2\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ2\" name=\"EquationSource\"\u003e\n$$\\:{I}_{DT}\\left({f}_{i}\\right)=\\sum\\:_{j\\in\\:\\left\\{nodes\\:split\\:by\\:{f}_{i}\\right\\}}{G}_{Pj}.$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e2\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eIn XGBoost, the final importance of a feature is determined by aggregating the normalized importance of that feature across all DTs in the ensemble. 
First, the importance score of each feature is scaled relative to the total importance of all features:\u003cdiv id=\"Equ3\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ3\" name=\"EquationSource\"\u003e\n$$\\:{I}_{norm}\\left({f}_{i}\\right)=\\frac{{I}_{DT}\\left({f}_{i}\\right)}{\\sum\\:_{j}{I}_{DT}\\left({f}_{j}\\right)}.$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e3\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eThen, the average across all base learners yields the ensemble-level importance:\u003cdiv id=\"Equ4\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ4\" name=\"EquationSource\"\u003e\n$$\\:{I}_{XGBoost}\\left({f}_{i}\\right)=\\frac{1}{{N}_{T}}\\sum\\:_{t=1}^{{N}_{T}}{I}_{norm}^{\\left(t\\right)}\\left({f}_{i}\\right),$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e4\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003ewhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{N}_{T}\\)\u003c/span\u003e\u003c/span\u003e denotes the number of boosting rounds or trees and the superscript (\u003cem\u003et\u003c/em\u003e) indexes the individual trees.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eEvaluation metric\u003c/h3\u003e\n\u003cp\u003eIn binary classification, prediction outcomes are typically categorized based on correctness. A correctly predicted instance from the positive class is referred to as a true positive (TP), while an accurately predicted instance from the negative class is known as a true negative (TN). 
Conversely, a positive instance mistakenly predicted as negative is termed a false negative (FN), and a negative instance incorrectly classified as positive is referred to as a false positive (FP)[\u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e60\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eTo evaluate the performance of classification models\u0026mdash;particularly in the presence of imbalanced datasets\u0026mdash;the F1-score is commonly used[\u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e]. Unlike overall accuracy, which can be misleading when one class dominates, the F1-score provides a more balanced and informative assessment. It is defined as the harmonic mean of precision and recall:\u003cdiv id=\"Equ5\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ5\" name=\"EquationSource\"\u003e\n$$\\:Precision\\:=\\:\\frac{TP}{TP\\:+\\:FP}.$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e5\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equ6\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ6\" name=\"EquationSource\"\u003e\n$$\\:Recall\\:=\\:\\frac{TP}{TP\\:+\\:FN}.$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e6\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equ7\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ7\" name=\"EquationSource\"\u003e\n$$\\:F1\\:=\\:2\\:\\times\\:\\:\\frac{Precision\\:\\times\\:\\:Recall}{Precision\\:+\\:Recall}.$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e7\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eBecause the dataset in this study is imbalanced across classes, the weighted F1-score was employed. 
This metric accounts for class distribution by assigning a weight to each class based on its relative frequency (as shown in Eq.\u0026nbsp;\u003cspan refid=\"Equ8\" class=\"InternalRef\"\u003e8\u003c/span\u003e), and then applying these weights to the corresponding class-specific F1-scores (Eq.\u0026nbsp;\u003cspan refid=\"Equ9\" class=\"InternalRef\"\u003e9\u003c/span\u003e). This approach provides a comprehensive assessment that reflects both individual class performance and overall effectiveness, even when the dataset is imbalanced[\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e]:\u003cdiv id=\"Equ8\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ8\" name=\"EquationSource\"\u003e\n$$\\:{w}_{i}\\:=\\:\\frac{{N}_{i}}{T},$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e8\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003ewhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{w}_{i}\\)\u003c/span\u003e\u003c/span\u003e is the weight of class \u003cem\u003ei\u003c/em\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{N}_{i}\\)\u003c/span\u003e\u003c/span\u003e is the number of samples in class \u003cem\u003ei\u003c/em\u003e, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:T\\)\u003c/span\u003e\u003c/span\u003e is the total number of samples across all classes.\u003cdiv id=\"Equ9\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ9\" name=\"EquationSource\"\u003e\n$$\\:Weighted\\:F1\\:=\\:{\\sum\\:}_{i\\:=\\:1}^{N}{w}_{i}\\:\\times\\:\\:{F1}_{i},$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e9\u003c/div\u003e\u003c/div\u003ewhere \u003cem\u003eN\u003c/em\u003e is the number of classes and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{F1}_{i}\\)\u003c/span\u003e\u003c/span\u003e is the F1-score of class \u003cem\u003ei\u003c/em\u003e.\u003c/p\u003e \u003cp\u003eAll data processing and classification modeling were performed using R statistical software (R Core Team, version 4.4.2, Auckland, New 
Zealand).\u003c/p\u003e"},{"header":"Results and discussion","content":"\u003cp\u003eThe DBSCAN clustering results, based on both the original and first derivative NIR spectra of the biomass materials, were visualized using PC score plots in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. In general, the formulation of raw materials in traditional handmade paper is relatively simple compared to that of modern machine-made paper. Modern papermaking typically involves a complex furnish composed of refined pulp, internal sizing agents, retention aids, fillers, defoamers, optical brighteners, and dry-strength additives in an aqueous solution[\u003cspan citationid=\"CR62\" class=\"CitationRef\"\u003e62\u003c/span\u003e]. In contrast, traditional handmade paper production primarily uses pulp and small amounts of natural mucilage as raw materials[\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e]. As a result, the primary differences in chemical composition among traditional papers are attributed mainly to variations in the pulp, which are influenced by factors such as wood species and cooking methods[\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e, \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e, \u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e63\u003c/span\u003e, \u003cspan citationid=\"CR64\" class=\"CitationRef\"\u003e64\u003c/span\u003e]. The clustering results reflect these similarities, as wood species and cooking agents directly affect the composition of cellulose, hemicellulose, and lignin. 
Common fiber sources used in traditional handmade papers include hemp, bark, bamboo, grasses, and paper mulberry[\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e, \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThe effect of spectral preprocessing using the first derivative was also examined. For instance, the number of data points from 14 different data classes assigned to cluster 1 remained the same for the original and first derivative spectra. In the original spectra (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ea), cluster 1 consisted of broad, overlapping subgroups. In contrast, the first derivative spectra (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eb) showed the individual data points more distinctly positioned and better separated. Similarly, cluster 5 contained only a single data class in both Figs.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ea and \u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eb; however, the clustering result based on the first derivative spectra showed a more consistent and compact distribution along the first two PCs. Although this comparison is indirect, it indicates that preprocessing with the first derivative may enhance the separability of spectral features and contribute to improved classification accuracy in subsequent machine learning models. 
A direct comparison of classification performance is provided in the classification modeling subsection.\u003c/p\u003e \u003cp\u003eAdditionally, DBSCAN[\u003cspan additionalcitationids=\"CR52 CR53\" citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e] proved effective in detecting outliers among data points projected onto the PC coordinate space. Using the first derivative spectra, DBSCAN identified six outlier data points. As shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eb, these outliers did not align with their original cluster locations and were spatially separated from the main groupings.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFigures \u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ea and \u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eb display the original NIR spectra and their first derivatives within the 900\u0026ndash;1,700 nm range, grouped according to the cluster assignments shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. The spectral region between 900 and 970 nm contains overlapping reflectance bands attributed to starch, cellulose, and sucrose (900\u0026ndash;920 nm), as well as water (960\u0026ndash;970 nm)[\u003cspan citationid=\"CR65\" class=\"CitationRef\"\u003e65\u003c/span\u003e]. The region between 1,000 and 1,050 nm exhibits subtle spectral features, though these are difficult to assign to specific components[\u003cspan citationid=\"CR66\" class=\"CitationRef\"\u003e66\u003c/span\u003e]. The 1,075\u0026ndash;1,250 nm range corresponds to the second overtones of aromatic and aliphatic C\u0026ndash;H vibrations, typically associated with lignin[\u003cspan citationid=\"CR66\" class=\"CitationRef\"\u003e66\u003c/span\u003e]. 
A band at 1,256 nm is attributed to the second overtone of C\u0026ndash;H stretching vibrations in cellulose[\u003cspan citationid=\"CR67\" class=\"CitationRef\"\u003e67\u003c/span\u003e]. The 1,390 nm feature reflects variability in the intensity of the adsorbed-water band[\u003cspan citationid=\"CR68\" class=\"CitationRef\"\u003e68\u003c/span\u003e]. The 1,420\u0026ndash;1,600 nm region corresponds to the first overtone of O\u0026ndash;H stretching vibrations[\u003cspan citationid=\"CR69\" class=\"CitationRef\"\u003e69\u003c/span\u003e]; within this range, bands at 1,537 and 1,576 nm are likely related to mucilage or hydrogen-bonded structures in polysaccharides[\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e, \u003cspan citationid=\"CR68\" class=\"CitationRef\"\u003e68\u003c/span\u003e, \u003cspan citationid=\"CR70\" class=\"CitationRef\"\u003e70\u003c/span\u003e]. Finally, the bands at 1,646 and 1,672 nm are associated with the first overtone of aliphatic C\u0026ndash;H vibrations[\u003cspan citationid=\"CR71\" class=\"CitationRef\"\u003e71\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eFigures \u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ec and \u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ed present the PC1 and PC2 loading plots derived from the PCA of the original and first derivative NIR spectra (900\u0026ndash;1,700 nm). These loading plots illustrate the contribution of individual wavelengths to the respective PCs and highlight the most influential spectral regions for differentiating among clusters.\u003c/p\u003e \u003cp\u003eIn the PC1 loading plot (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ec), both the original and first derivative spectra exhibit pronounced negative loadings between 1,400 and 1,500 nm, corresponding to the first overtone of the O\u0026ndash;H stretching vibration\u0026mdash;an indicator of cellulose and adsorbed water. 
Notably, the first derivative spectrum reveals greater variation and sharper features in this region, indicating enhanced sensitivity to subtle spectral differences. Additionally, the first derivative PC1 loading displays clear peaks at approximately 1,537 and 1,576 nm, likely linked to mucilage or hydrogen-bonded polysaccharide structures.\u003c/p\u003e \u003cp\u003eThe PC2 loading plot (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ed) reveals more subtle contributions across most wavelengths. However, in the short-wavelength region (900\u0026ndash;970 nm), a distinguishable signal is observed, especially in the first derivative spectrum. This region includes overlapping features associated with cellulose and water.\u003c/p\u003e \u003cp\u003eOverall, the first derivative spectra enhance the differentiation of chemically meaningful spectral regions at the cluster level, even though some clusters still contain a mixture of data classes. These findings support previous studies that highlight the effectiveness of spectral preprocessing techniques in improving classification performance[\u003cspan additionalcitationids=\"CR73 CR74 CR75\" citationid=\"CR72\" class=\"CitationRef\"\u003e72\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR76\" class=\"CitationRef\"\u003e76\u003c/span\u003e].\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003eClassification models for traditional handmade paper\u003c/h2\u003e \u003cp\u003eVarious classifiers were tested to discriminate among traditional handmade papers. The classification performance and corresponding optimal hyperparameters are summarized in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. During model construction, outliers identified by DBSCAN were removed from the dataset. 
Previous studies have shown that eliminating outliers during the training phase can substantially improve classification accuracy [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e, \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e, \u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e]. Accordingly, a total of 254 NIR spectral samples, representing 26 paper classes, were used for training, validation, and testing. Given the class imbalance in the dataset, an evaluation metric that accounts for unequal class distribution was necessary for a reliable assessment of model performance. Therefore, weighted F1 scores were used, providing a more comprehensive evaluation by considering both individual class performance and overall model effectiveness, particularly under imbalanced conditions[\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e].\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003ePerformance of the machine learning models in traditional handmade paper classification and their optimal hyperparameters\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" 
rowspan=\"2\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003ePreproc.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eHyperparameters\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colspan=\"2\" nameend=\"c5\" namest=\"c4\"\u003e \u003cp\u003eF1 score\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eTraining\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eTest\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003ek-NN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eOriginal\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ek\u0026thinsp;=\u0026thinsp;1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1.000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.851\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFirst deriv.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ek\u0026thinsp;=\u0026thinsp;1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1.000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.901\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eDT\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eOriginal\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e 
\u003cp\u003ecp\u0026thinsp;=\u0026thinsp;0.03\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.824\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.634\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFirst deriv.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ecp\u0026thinsp;=\u0026thinsp;0.03\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.824\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.683\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eSVM\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eOriginal\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003egamma\u0026thinsp;=\u0026thinsp;10\u003csup\u003e\u0026minus;\u0026thinsp;2\u003c/sup\u003e, C\u0026thinsp;=\u0026thinsp;2\u003csup\u003e4\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.989\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.958\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFirst deriv.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003egamma\u0026thinsp;=\u0026thinsp;10\u003csup\u003e\u0026minus;\u0026thinsp;2\u003c/sup\u003e, C\u0026thinsp;=\u0026thinsp;2\u003csup\u003e3\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1.000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e1.000\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd 
align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eFNN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eOriginal\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ehl_size = (16), lr\u0026thinsp;=\u0026thinsp;0.1, optimizer\u0026thinsp;=\u0026thinsp;SGD\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.978\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.934\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFirst deriv.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ehl_size = (16, 16), lr\u0026thinsp;=\u0026thinsp;0.001, optimizer\u0026thinsp;=\u0026thinsp;Adam\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.989\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.974\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eXGBoost\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eOriginal\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003elr\u0026thinsp;=\u0026thinsp;0.07, max_depth\u0026thinsp;=\u0026thinsp;5, gamma\u0026thinsp;=\u0026thinsp;1, colsample_bytree\u0026thinsp;=\u0026thinsp;0.3, min_child_weight\u0026thinsp;=\u0026thinsp;1, subsample\u0026thinsp;=\u0026thinsp;0.5, n_rounds\u0026thinsp;=\u0026thinsp;75\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1.000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.900\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFirst 
deriv.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003elr\u0026thinsp;=\u0026thinsp;0.1, max_depth\u0026thinsp;=\u0026thinsp;3, gamma\u0026thinsp;=\u0026thinsp;1, colsample_bytree\u0026thinsp;=\u0026thinsp;0.3, min_child_weight\u0026thinsp;=\u0026thinsp;1, subsample\u0026thinsp;=\u0026thinsp;0.7, n_rounds\u0026thinsp;=\u0026thinsp;50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1.000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.963\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"5\"\u003ePreproc., preprocessing; First deriv., first derivative; hl_size, hidden layer sizes; lr, learning rate\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eAs shown in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, among the models trained on the original spectra, the SVM achieved the highest test performance, with a weighted F1-score of 0.958. Although FNN and XGBoost are generally regarded as high-performing models[\u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e56\u003c/span\u003e, \u003cspan additionalcitationids=\"CR78\" citationid=\"CR77\" class=\"CitationRef\"\u003e77\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR79\" class=\"CitationRef\"\u003e79\u003c/span\u003e], their F1-scores in this study were lower than that of the SVM. This result is consistent with the principle of Occam\u0026rsquo;s razor[\u003cspan citationid=\"CR80\" class=\"CitationRef\"\u003e80\u003c/span\u003e], which holds that when multiple models yield comparable performance, the simplest should be preferred for better generalization. The SVM, which seeks to find an optimal hyperplane that maximizes the margin between classes, is inherently simpler than FNN and XGBoost. 
Its performance was enhanced by the use of the RBF kernel, which allows for nonlinear decision boundaries. Another possible reason for the superior performance of the SVM is that the structure of the dataset may not have been complex enough to fully leverage the advantages of deep neural networks (e.g., FNN) or ensemble-based methods (e.g., XGBoost). The optimized FNN architecture in this study was relatively simple, consisting of only one or two hidden layers of 16 units.\u003c/p\u003e \u003cp\u003eSpectral preprocessing using the first derivative transformation substantially improved test performance across all models. The F1-scores for FNN and XGBoost increased from 0.934 to 0.974 and from 0.900 to 0.963, respectively. Notably, the SVM trained on first derivative spectra achieved perfect classification accuracy (F1-score\u0026thinsp;=\u0026thinsp;1.000). Even simpler models such as k-NN and DT, which initially exhibited relatively low performance, showed improvement after first derivative transformation (from 0.851 to 0.901 and from 0.634 to 0.683, respectively)\u0026mdash;though their accuracy remained lower than that of the more advanced models. This is likely due to their simpler learning mechanisms. For instance, k-NN, a nonparametric algorithm, classifies instances based purely on distance metrics and lacks a formal training phase, making it less robust for high-dimensional datasets. Given the 26 classes and 163 spectral variables per sample in this study, models with greater computational and modeling capacity\u0026mdash;such as SVM, FNN, and XGBoost\u0026mdash;were better suited to the classification task. 
Among these, SVM emerges as the most suitable model, offering a balance between computational efficiency and high classification accuracy, which is consistent with the principle of Occam\u0026rsquo;s razor.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003eFeature importance of the NIR spectra for the classification of traditional handmade paper\u003c/h2\u003e \u003cp\u003eThe model comparison presented in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e confirms the strong performance of the SVM. However, unlike XGBoost, the SVM lacks interpretability, making it difficult to explain the decision-making process. Similarly, the FNN does not offer logical transparency in how classifications are derived. Therefore, no single model can fully replace the unique advantages of others, as each possesses distinct strengths and limitations.\u003c/p\u003e \u003cp\u003eFeature importance based on MDI[\u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e] was calculated using the XGBoost algorithm, and the results are shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e. Feature importance metrics provide valuable interpretability for machine learning models by identifying the spectral variables that most strongly influence classification decisions[\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e, \u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ea presents the first-derivative NIR spectra (900\u0026ndash;1,700 nm) for five representative samples, each selected from a different cluster identified by DBSCAN in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. 
The overlaid feature importance values reflect the contribution of individual spectral variables to the separation of these five clusters. Notably, the spectral region between 1,075 and 1,250 nm, associated with the second overtones of aromatic and aliphatic C\u0026ndash;H vibrations in lignin, exhibited high importance scores, indicating that lignin-related signals played a key role in cluster differentiation.\u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eb illustrates the first-derivative NIR spectra of representative samples from the selected classes\u0026mdash;China (No. 3), Japan (No. 9), and Korea (Nos. 16, 17, and 23)\u0026mdash;along with the feature importance values computed from the 26-class classification model (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). In contrast to the clustering model, which enabled clearer identification of discriminative spectral regions through five groups, the 26-class classification model did not yield a sharply localized feature importance profile. This is attributed to the increased number of classes and the associated variability among samples, which makes it more difficult to isolate a narrow set of key wavelengths.\u003c/p\u003e \u003cp\u003eNevertheless, several spectral regions consistently exhibited high information gain across the 26-class model: 920\u0026ndash;1,265, 1,290\u0026ndash;1,440, 1,460\u0026ndash;1,580, and 1,660\u0026ndash;1,680 nm. These regions appear to be particularly informative for distinguishing traditional handmade papers. 
Differences in carbohydrate composition\u0026mdash;arising from the use of diverse raw materials such as wood and nonwood sources[\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e] and variations in pulping and bleaching methods\u0026mdash;are known to influence the contents of cellulose, hemicellulose, and lignin[\u003cspan additionalcitationids=\"CR82 CR83 CR84\" citationid=\"CR81\" class=\"CitationRef\"\u003e81\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR85\" class=\"CitationRef\"\u003e85\u003c/span\u003e], which likely accounts for the spectral diversity captured by the model.\u003c/p\u003e \u003cp\u003eAlthough variable selection was not applied in this study, selecting highly informative spectral regions for model construction has been shown to enhance model accuracy while reducing computational cost. Narrowing the spectral range decreases the number of input variables the model must process. Our previous research, along with other studies, has demonstrated the effectiveness of this approach in improving model robustness[\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e, \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e, \u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e, \u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e]. Alternatively, these informative regions may also serve to enhance the interpretability and transparency of the decision-making process of the model.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003eVisualization\u003c/h2\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e presents a comparison of hyperspectral images for representative traditional handmade paper samples. 
For this visualization analysis, the same classes used in the feature importance plots were selected: China (No. 3), Japan (No. 9), and Korea (Nos. 16, 17, and 23). Color mapping was performed using classification outputs, where each pixel in the hyperspectral image was assigned a color corresponding to the class with the highest spectral similarity, as determined by the trained model.\u003c/p\u003e \u003cp\u003eIn the five-class composite visualization (left panel), the colors appear highly intermixed, making class separation visually challenging. However, in the pairwise binary visualizations (right matrix), clearer spatial distinctions are observed between individual samples. Each panel in the matrix displays the result of minimum angle classification using the SAM, with colored regions indicating spectral uniqueness for each paper type.\u003c/p\u003e \u003cp\u003eThese findings indicate that while simultaneous classification across all classes may be limited by spectral overlap, pixel-level comparisons using hyperspectral data enable effective identification of traditional handmade paper types. Moreover, spectral image-based color mapping serves as a valuable tool for assessing authenticity and compositional similarity, as it visualizes how well unknown samples match known references.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFuture research will focus on developing convolutional neural network-based models capable of processing high-dimensional tensor data, such as hyperspectral images composed of numerous spectral channels. These multichannel datasets, which integrate both spatial and spectral information, require specialized network architectures to learn complex spectral\u0026ndash;spatial features effectively. The goal is to enable accurate defect detection, classification, and instance-level segmentation for automated quality evaluation and assessment. 
Ongoing efforts include the design of deep learning frameworks tailored to hyperspectral tensor data, as well as the development of an automated imaging system optimized for high-resolution hyperspectral acquisition.\u003c/p\u003e \u003c/div\u003e"},{"header":"Conclusions","content":"\u003cp\u003e \u003col\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003ePCA and DBSCAN were effectively utilized to explore chemical composition similarities among traditional handmade paper samples based on NIR spectra extracted from hyperspectral images. DBSCAN also identified and excluded special outliers that could negatively affect model training and decision-making.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eThe application of first-derivative preprocessing substantially improved model performance, yielding higher F1-scores compared to models trained on raw spectral data.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eClassification models, including SVM, FNN, and XGBoost, demonstrated strong performance when trained on the first derivative spectra, achieving F1-scores of 1.00, 0.97, and 0.96, respectively. Among them, SVM provided an optimal balance between classification accuracy and computational efficiency, though it lacks interpretability. In contrast, XGBoost provides feature importance metrics that enhance transparency and offer insight into the model\u0026rsquo;s decision-making process.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eFor visualization, the SAM method was effectively employed to highlight spectral differences between paper samples through pixel-wise color mapping. 
Together, these results demonstrate the potential of HSI combined with machine learning for accurately classifying traditional East Asian handmade papers, enabling reference-based spectral matching across diverse material origins.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e \u003cp\u003eOverall, this study provides meaningful contributions to the classification of traditional handmade paper using machine learning and NIR hyperspectral imaging, offering valuable insights for future research in cultural heritage science and material identification.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003e \u003cb\u003eAdditional Information\u003c/b\u003e \u003c/h2\u003e \u003cp\u003e \u003cstrong\u003eCompeting interests:\u003c/strong\u003e \u003cp\u003eThe authors declare no competing interests.\u003c/p\u003e \u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eY.J.L. conceptualized the study, performed the investigation, and wrote the original manuscript draft. Y.J.L., S.Y.W., and S.B.P. developed the methodology. S.B.P. contributed to data acquisition and preprocessing. H.J.K. acquired funding and supervised the project together with T.J.L. H.J.K. and T.J.L. reviewed and edited the manuscript. All authors reviewed and approved the final version of the manuscript.\u003c/p\u003e\u003ch2\u003eAcknowledgements\u003c/h2\u003e \u003cp\u003eThe authors gratefully acknowledge the financial support provided by the Ministry of Science and ICT of the Korean government and the National Research Foundation of Korea (Grant No. RS-2023-00301889).\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eThe datasets generated and/or analyzed during the current study are available from the corresponding author upon reasonable request.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eHubbe, M. A. \u0026amp; Bowden, C. 
Handmade paper: a review of its history, craft, and science. \u003cem\u003eBioResources\u003c/em\u003e \u003cstrong\u003e4\u003c/strong\u003e, 1736-1792 (2009).\u003c/li\u003e\n\u003cli\u003eHan, B. et al. Characterization of Korean handmade papers collected in a Hanji reference book. \u003cem\u003eHerit. Sci.\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 1-12 (2021).\u003c/li\u003e\n\u003cli\u003eHan, B., Vial, J., Sakamoto, S. \u0026amp; Sablier, M. Identification of traditional East Asian handmade papers through the multivariate data analysis of pyrolysis-GC/MS data. \u003cem\u003eAnalyst\u003c/em\u003e \u003cstrong\u003e144\u003c/strong\u003e, 1230-1244 (2019).\u003c/li\u003e\n\u003cli\u003eMullock, H. J. T. P. C. Xuan paper. 19, 23-30 (1995).\u003c/li\u003e\n\u003cli\u003eLuo, Y., Cigić, I. K., Wei, Q. \u0026amp; Strlič, M. Characterisation and durability of contemporary unsized Xuan paper. \u003cem\u003eCellulose\u003c/em\u003e \u003cstrong\u003e28\u003c/strong\u003e, 1011-1023 (2021).\u003c/li\u003e\n\u003cli\u003eTang, Y. \u0026amp; Smith, G. J. Fluorescence and photodegradation of Xuan paper: the photostability of traditional Chinese handmade paper. \u003cem\u003eJ. Cult. Herit.\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, 464-470 (2013).\u003c/li\u003e\n\u003cli\u003eInaba, M. \u0026amp; Sugisita, R. Permanence of washi (Japanese paper). \u003cem\u003eStud. Conserv.\u003c/em\u003e \u003cstrong\u003e33\u003c/strong\u003e, 1-4 (1988).\u003c/li\u003e\n\u003cli\u003ePrestowitz, B., Katayama, Y. J. Washi: understanding Japanese paper as a material of culture and conservation. \u003cem\u003eBook Paper Group Annual\u003c/em\u003e \u003cstrong\u003e37\u003c/strong\u003e, 77-91 (2018).\u003c/li\u003e\n\u003cli\u003eLee, O.-K., Kim, S. \u0026amp; Lee, H. W. Evolution of the Hanji-making technology, from ancient times to the present. 
\u003cem\u003eJ. Korean Wood Sci. Technol.\u003c/em\u003e \u003cstrong\u003e51\u003c/strong\u003e, 509-525 (2023).\u003c/li\u003e\n\u003cli\u003eJeong, M.-J. et al. Deterioration of ancient Korean paper (Hanji), treated with beeswax: a mechanistic study. \u003cem\u003eCarbohyd. Polym.\u003c/em\u003e \u003cstrong\u003e101\u003c/strong\u003e, 1249-1254 (2014).\u003c/li\u003e\n\u003cli\u003eTindale, T. K. \u0026amp; Tindale, H. R. J. \u003cem\u003eThe handmade papers of Japan\u003c/em\u003e (1952).\u003c/li\u003e\n\u003cli\u003eGoto, S. \u003cem\u003eJapanese hand-made paper\u003c/em\u003e (1958).\u003c/li\u003e\n\u003cli\u003eBarrett, T. J. \u0026amp; Lutz, W. \u003cem\u003eJapanese papermaking: traditions, tools, and techniques\u003c/em\u003e (1983).\u003c/li\u003e\n\u003cli\u003eLee, S. C. \u003cem\u003eHanji: everything you need to know about traditional Korean paper\u003c/em\u003e (Hyeonamsa Publishing Co., Ltd., 2012).\u003c/li\u003e\n\u003cli\u003eCheon, C., Kim, S.-J., Jin, Y.-M. Properties of indigenous Korean paper (Hanji)-Classification of Oebal (single frame) papermaking methods. \u003cem\u003eJ. Korean Wood Sci. Technol.\u003c/em\u003e \u003cstrong\u003e27\u003c/strong\u003e, 88-104 (1999).\u003c/li\u003e\n\u003cli\u003eTsien, T.-H. Raw materials for old papermaking in China. \u003cem\u003eJ. Am. Oriental Soc.\u003c/em\u003e \u003cstrong\u003e93\u003c/strong\u003e, 510-519 (1973).\u003c/li\u003e\n\u003cli\u003eShi, J. \u0026amp; Li, T. Technical investigation of 15th and 19th century Chinese paper currencies: fiber use and pigment identification. \u003cem\u003eJ. Raman Spectrosc.\u003c/em\u003e \u003cstrong\u003e44\u003c/strong\u003e, 892-898 (2013).\u003c/li\u003e\n\u003cli\u003eHan, B. et al. Characterization of Korean handmade papers collected in a Hanji reference book. 
\u003cem\u003eHerit\u003c/em\u003e\u003cem\u003e.\u003c/em\u003e\u003cem\u003e Sci\u003c/em\u003e\u003cem\u003e.\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 96 (2021).\u003c/li\u003e\n\u003cli\u003eDong, L.-Y. \u0026amp; Zhu, Y.-J. Fire-Resistant inorganic analogous xuan paper with thousands of years\u0026rsquo; super-durability. \u003cem\u003eACS Sustain.\u003c/em\u003e\u003cem\u003e Chem. Eng.\u003c/em\u003e \u003cstrong\u003e6\u003c/strong\u003e, 17239-17251 (2018).\u003c/li\u003e\n\u003cli\u003eKim, Y. J., Yoon, S., Cho, Y.-H., Kim, G. \u0026amp; Kim, H.-K. Paintable and writable electrodes using black conductive ink on traditional Korean paper (Hanji). \u003cem\u003eRSC Adv.\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, 24631-24641 (2020).\u003c/li\u003e\n\u003cli\u003eChoi, Y. et al. Enhancing Li-S battery performance with porous carbon from Hanji. \u003cem\u003eBatteries\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 4 (2024).\u003c/li\u003e\n\u003cli\u003eLee, Y. J., Won, S. Y., Park, S. B. \u0026amp; Kim, H.-J. Chemometric approaches for discriminating manufacturers of Korean handmade paper using infrared spectroscopy. \u003cem\u003eHerit Sci\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, 10.1186/s40494-024-01460-6 (2024).\u003c/li\u003e\n\u003cli\u003eLee, Y. J., Kweon, S. W., Jeong, C. W. \u0026amp; Kim, H. J. Evaluating the performance of machine learning and variable selection methods to identify document paper using infrared spectral data. \u003cem\u003eSpectrochim\u003c/em\u003e\u003cem\u003e.\u003c/em\u003e\u003cem\u003e Acta A Mol\u003c/em\u003e\u003cem\u003e.\u003c/em\u003e\u003cem\u003e Biomol\u003c/em\u003e\u003cem\u003e.\u003c/em\u003e\u003cem\u003e Spectrosc\u003c/em\u003e\u003cem\u003e.\u003c/em\u003e \u003cstrong\u003e327\u003c/strong\u003e, 125299 (2025).\u003c/li\u003e\n\u003cli\u003eJang, K. J., Heo, T. Y. \u0026amp; Jeong, S. H. 
Classification option for Korean traditional paper based on type of raw materials, using near-infrared spectroscopy and multivariate statistical methods. \u003cem\u003eBioRes\u003c/em\u003e\u003cem\u003eources\u003c/em\u003e \u003cstrong\u003e15\u003c/strong\u003e, 9045-9058 (2020).\u003c/li\u003e\n\u003cli\u003eWang, J. \u003cem\u003ePapermaking raw materials of China: an atlas of micrographs and the characteristics of fibers\u003c/em\u003e (China Light Industry Press, 1999).\u003c/li\u003e\n\u003cli\u003eJeong, M.-J. et al. Deterioration of ancient cellulose paper, Hanji: evaluation of paper permanence. \u003cem\u003eCellulose\u003c/em\u003e \u003cstrong\u003e21\u003c/strong\u003e, 4621-4632 (2014).\u003c/li\u003e\n\u003cli\u003eLee, S. H. \u003cem\u003eAdhesives used in conservation treatment in cultural properties: paintings and written artifacts\u003c/em\u003e (Conservation of Papers and Textiles, National Research Institute of Cultural Heritage, 2011).\u003c/li\u003e\n\u003cli\u003eChoi, T. J. F. Development of a natural dispersant for Korean traditional papermaking (Ⅰ)-Viscosity and papermaking characteristics of Hydrangea paniculata mucilage. \u003cem\u003eForest Bioenergy \u003c/em\u003e\u003cstrong\u003e23\u003c/strong\u003e, 38-44 (2004).\u003c/li\u003e\n\u003cli\u003eIlvessalo-Pf\u0026auml;ffli, M.-S. \u003cem\u003eFiber atlas: identification of papermaking fibers\u003c/em\u003e (Springer Science \u0026amp; Business Media, 1995).\u003c/li\u003e\n\u003cli\u003eDragojević, A., Gregor-Svetec, D., Vodopivec Tomažič, J. \u0026amp; Lozo, B. Characterization of seventeenth century papers from Valvasor\u0026apos;s collection of the Zagreb Archdiocese. \u003cem\u003eHerit\u003c/em\u003e\u003cem\u003e.\u003c/em\u003e\u003cem\u003e Sci\u003c/em\u003e\u003cem\u003e.\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 35 (2021).\u003c/li\u003e\n\u003cli\u003eGrant, J. The role of paper in questioned document work. \u003cem\u003eJ. Forensic Sci. 
Soc.\u003c/em\u003e \u003cstrong\u003e13\u003c/strong\u003e, 91-95 (1973).\u003c/li\u003e\n\u003cli\u003eSpence, L. D., Baker, A. T. \u0026amp; Byrne, J. P. Characterization of document paper using elemental compositions determined by inductively coupled plasma mass spectrometry. \u003cem\u003eJ. Anal. At. Spectrom.\u003c/em\u003e \u003cstrong\u003e15\u003c/strong\u003e, 813-819 (2000).\u003c/li\u003e\n\u003cli\u003eSpence, L. D., Francis, R. B. \u0026amp; Tinggi, U. Comparison of the elemental composition of office document paper: evidence in a homicide case. \u003cem\u003eJ. Forensic Sci. \u003c/em\u003e\u003cstrong\u003e47\u003c/strong\u003e, 648-651 (2002).\u003c/li\u003e\n\u003cli\u003eYan, C. et al. Analysis of handmade paper by Raman spectroscopy combined with machine learning. \u003cem\u003eJ. Raman Spectrosc.\u003c/em\u003e \u003cstrong\u003e53\u003c/strong\u003e, 260-271 (2021).\u003c/li\u003e\n\u003cli\u003eYan, Y. et al. FTIR Spectroscopy in cultural heritage studies: non-destructive analysis of Chinese handmade papers. \u003cem\u003eChem. Res. Chin. Univ.\u003c/em\u003e \u003cstrong\u003e35\u003c/strong\u003e, 586-591 (2019).\u003c/li\u003e\n\u003cli\u003eWertz, J. H., McClelland, A. A., Mayer, D. D. \u0026amp; Knipe, P. Modeling chemical tests and fiber identification of paper materials using principal component analysis and specular reflection FTIR data. \u003cem\u003eHeritage\u003c/em\u003e \u003cstrong\u003e5\u003c/strong\u003e, 1960-1973 (2022).\u003c/li\u003e\n\u003cli\u003eKellicut, D. C. et al. Emerging technology: hyperspectral imaging. \u003cem\u003ePerspect. Vasc. Surg. Endovasc. Ther.\u003c/em\u003e \u003cstrong\u003e16\u003c/strong\u003e, 53-57 (2004).\u003c/li\u003e\n\u003cli\u003eSchultz, R. A. et al. Hyperspectral imaging: a novel approach for microscopic analysis. \u003cem\u003eCytometry\u003c/em\u003e \u003cstrong\u003e43\u003c/strong\u003e, 239-247 (2001).\u003c/li\u003e\n\u003cli\u003eWu, Y. et al. 
Non-destructive prediction and pixel-level visualization of polysaccharide-based properties in ancient paper using SWNIR hyperspectral imaging and machine learning. \u003cem\u003eCarbohyd. Polym.\u003c/em\u003e \u003cstrong\u003e352\u003c/strong\u003e, 123198 (2025).\u003c/li\u003e\n\u003cli\u003eHwang, S.-W. et al. NIR-chemometric approaches for evaluating carbonization characteristics of hydrothermally carbonized lignin. \u003cem\u003eSci. Rep.\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 16979 (2021).\u003c/li\u003e\n\u003cli\u003eHwang, S.-W. et al. Investigation of NIR spectroscopy and electrical resistance-based approaches for moisture determination of logging residues and sweet sorghum. \u003cem\u003eBioRes\u003c/em\u003e\u003cem\u003eources\u003c/em\u003e \u003cstrong\u003e18\u003c/strong\u003e, 2064-2082 (2023).\u003c/li\u003e\n\u003cli\u003eSun, B., Liu, J., Liu, S. \u0026amp; Yang, Q. Application of FT-NIR-DR and FT-IR-ATR spectroscopy to estimate the chemical composition of bamboo (Neosinocalamus affinis Keng). \u003cem\u003eHolzforschung \u003c/em\u003e\u003cstrong\u003e65\u003c/strong\u003e, (2011).\u003c/li\u003e\n\u003cli\u003eMendez, J., Mendoza, L., Cruz-Tirado, J. P., Quevedo, R. \u0026amp; Siche, R. Trends in application of NIR and hyperspectral imaging for food authentication. \u003cem\u003eSci.\u003c/em\u003e\u003cem\u003e A\u003c/em\u003e\u003cem\u003egropecu\u003c/em\u003e\u003cem\u003e.\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, 143-161 (2019).\u003c/li\u003e\n\u003cli\u003eMedus, L. D., Saban, M., Franc\u0026eacute;s-V\u0026iacute;llora, J. V., Bataller-Mompe\u0026aacute;n, M. \u0026amp; Rosado-Mu\u0026ntilde;oz, A. Hyperspectral image classification using CNN: application to industrial food packaging. \u003cem\u003eFood Control\u003c/em\u003e \u003cstrong\u003e125\u003c/strong\u003e, 107962 (2021).\u003c/li\u003e\n\u003cli\u003eMahesh, S., Jayas, D. S., Paliwal, J. \u0026amp; White, N. D. G. 
Hyperspectral imaging to classify and monitor quality of agricultural materials. \u003cem\u003eJ. Stored Prod. Res.\u003c/em\u003e \u003cstrong\u003e61\u003c/strong\u003e, 17-26 (2015).\u003c/li\u003e\n\u003cli\u003eAgilandeeswari, L., Prabukumar, M., Radhesyam, V., Phaneendra, K. L. N. B. \u0026amp; Farhan, A. Crop classification for agricultural applications in hyperspectral remote sensing images. \u003cem\u003eAppl. Sci.\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, 1670 (2022).\u003c/li\u003e\n\u003cli\u003eTatzer, P., Wolf, M. \u0026amp; Panner, T. Industrial application for inline material sorting using hyperspectral imaging in the NIR range. \u003cem\u003eReal-Time Imaging\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 99-107 (2005).\u003c/li\u003e\n\u003cli\u003eHwang, S.-W. et al. Feature importance measures from random forest regressor using near-infrared spectra for predicting carbonization characteristics of kraft lignin-derived hydrochar. \u003cem\u003eJ. Wood Sci.\u003c/em\u003e \u003cstrong\u003e69\u003c/strong\u003e, (2023).\u003c/li\u003e\n\u003cli\u003eHwang, S.-W., Park, G., Kim, J., Kang, K.-H. \u0026amp; Lee, W.-H. One-dimensional convolutional neural networks with infrared spectroscopy for classifying the origin of printing paper. \u003cem\u003eBioResources\u003c/em\u003e \u003cstrong\u003e19\u003c/strong\u003e, 1633-1651 (2024).\u003c/li\u003e\n\u003cli\u003eSavitzky, A. \u0026amp; Golay, M. J. E. Smoothing and differentiation of data by simplified least squares procedures. \u003cem\u003eAnal. Chem.\u003c/em\u003e \u003cstrong\u003e36\u003c/strong\u003e, 1627-1639 (1964).\u003c/li\u003e\n\u003cli\u003eHahsler, M., Piekenbrock, M. \u0026amp; Doran, D. dbscan: fast density-based clustering with R. \u003cem\u003eJ. Stat. Soft.\u003c/em\u003e \u003cstrong\u003e91\u003c/strong\u003e, 1-30 (2019).\u003c/li\u003e\n\u003cli\u003eEster, M., Kriegel, H.-P., Sander, J. \u0026amp; Xu, X. 
A density-based algorithm for discovering clusters in large spatial databases with noise in \u003cem\u003eProceedings of the second international conference on knowledge discovery and data mining\u003c/em\u003e 226-231 (AAAI Press, 1996).\u003c/li\u003e\n\u003cli\u003eCampello, R. J., Moulavi, D. \u0026amp; Sander, J. Density-based clustering based on hierarchical density estimates in \u003cem\u003ePacific-Asia conference on knowledge discovery and data mining\u003c/em\u003e 160-172 (Springer Berlin Heidelberg, 2013).\u003c/li\u003e\n\u003cli\u003eSander, J., Ester, M., Kriegel, H.-P. \u0026amp; Xu, X. Density-based clustering in spatial databases: the algorithm GDBSCAN and its applications. \u003cem\u003eData Min. Knowl. Discov.\u003c/em\u003e \u003cstrong\u003e2\u003c/strong\u003e, 169-194 (1998).\u003c/li\u003e\n\u003cli\u003eVert, J.-P., Tsuda, K. \u0026amp; Sch\u0026ouml;lkopf, B. A primer on kernel methods. \u003cem\u003eKernel Methods Comput. Biol.\u003c/em\u003e \u003cstrong\u003e47\u003c/strong\u003e, 35-70 (2004).\u003c/li\u003e\n\u003cli\u003eChen, T. \u0026amp; Guestrin, C. XGBoost: a scalable tree boosting system in \u003cem\u003eProceedings of the ACM SIGKDD international conference on knowledge discovery and data mining\u003c/em\u003e 785-794 (Association for Computing Machinery, 2016).\u003c/li\u003e\n\u003cli\u003eBreiman, L., Friedman, J., Olshen, R. \u0026amp; Stone, C. \u003cem\u003eClassification and regression trees\u003c/em\u003e (Routledge, 1984).\u003c/li\u003e\n\u003cli\u003eKruse, F. A. et al. The spectral image processing system (SIPS)\u0026mdash;interactive visualization and analysis of imaging spectrometer data. \u003cem\u003eRemote Sens. Environ.\u003c/em\u003e \u003cstrong\u003e44\u003c/strong\u003e, 145-163 (1993).\u003c/li\u003e\n\u003cli\u003eLouppe, G., Wehenkel, L., Sutera, A. \u0026amp; Geurts, P. J. A. 
Understanding the variable importances in forests of randomized trees in \u003cem\u003eProceedings of the international conference on neural information processing systems\u003c/em\u003e 431-439 (Curran Associates Inc., 2013).\u003c/li\u003e\n\u003cli\u003eAltman, D. G. \u0026amp; Bland, J. M. Diagnostic tests 1: sensitivity and specificity. \u003cem\u003eBMJ\u003c/em\u003e \u003cstrong\u003e308\u003c/strong\u003e, 1552-1552 (1994).\u003c/li\u003e\n\u003cli\u003eVelez, D. R. et al. A balanced accuracy function for epistasis modeling in imbalanced datasets using multifactor dimensionality reduction. \u003cem\u003eGenet. Epidemiol.\u003c/em\u003e \u003cstrong\u003e31\u003c/strong\u003e, 306-315 (2007).\u003c/li\u003e\n\u003cli\u003eSmook, G. A. \u003cem\u003eHandbook for pulp \u0026amp; paper technologists\u003c/em\u003e (A. Wilde, 2002).\u003c/li\u003e\n\u003cli\u003eYan, C. et al. Analysis of handmade paper by Raman spectroscopy combined with machine learning. \u003cem\u003eJ. Raman Spectrosc.\u003c/em\u003e \u003cstrong\u003e53\u003c/strong\u003e, 260-271 (2022).\u003c/li\u003e\n\u003cli\u003eHan, B., Yang, Y., Wang, B., Jiang, H. \u0026amp; Sablier, M. Rapid identification of bast fibers in ancient handmade papers based on improved characterization of lignin monomers by Py-GCxGC/MS. \u003cem\u003eCellulose\u003c/em\u003e \u003cstrong\u003e30\u003c/strong\u003e, 575-590 (2023).\u003c/li\u003e\n\u003cli\u003eTravers, S., Bertelsen, M. G., Petersen, K. K. \u0026amp; Kucheryavskiy, S. V. Predicting pear (cv. Clara Frijs) dry matter and soluble solids content with near infrared spectroscopy. \u003cem\u003eLWT - Food Sci. Technol.\u003c/em\u003e \u003cstrong\u003e59\u003c/strong\u003e, 1107-1113 (2014).\u003c/li\u003e\n\u003cli\u003eKelley, S., Rials, T., Snell, R., Groom, L. \u0026amp; Sluiter, A. Use of near infrared spectroscopy to measure the chemical and mechanical properties of solid wood. \u003cem\u003eWood Sci. 
Technol.\u003c/em\u003e \u003cstrong\u003e38\u003c/strong\u003e, 257-276 (2004).\u003c/li\u003e\n\u003cli\u003eSchwanninger, M., Rodrigues, J. C. \u0026amp; Fackler, K. A review of band assignments in near infrared spectra of wood and wood components. \u003cem\u003eJ. Near Infrared Spec.\u003c/em\u003e \u003cstrong\u003e19\u003c/strong\u003e, 287-308 (2011).\u003c/li\u003e\n\u003cli\u003eQuintero Balbas, D., Lanterna, G., Cirrincione, C., Fontana, R. \u0026amp; Striova, J. Non-invasive identification of textile fibers using near-infrared fiber optics reflectance spectroscopy and multivariate classification techniques. \u003cem\u003eEur. Phys. J.\u003c/em\u003e\u003cem\u003e Plus\u003c/em\u003e \u003cstrong\u003e137\u003c/strong\u003e, 85 (2022).\u003c/li\u003e\n\u003cli\u003eOsborne, B. G. Fearn, T. \u003cem\u003eNear-infrared spectroscopy in food analysis\u003c/em\u003e (Longman Scientific \u0026amp; Technical New York, 1986).\u003c/li\u003e\n\u003cli\u003eZhang, X. \u0026amp; Wyeth, P. Moisture sorption as a potential condition marker for historic silks: noninvasive determination by near-infrared spectroscopy. \u003cem\u003eAppl. Spectrosc.\u003c/em\u003e \u003cstrong\u003e61\u003c/strong\u003e, 218-222 (2007).\u003c/li\u003e\n\u003cli\u003eHuang, A., Li, G., Fu, F. \u0026amp; Fei, B. Use of visible and near infrared spectroscopy to predict the klason lignin content of bamboo, Chinese fir, Paulownia, and Poplar. \u003cem\u003eJ. Wood Chem. Technol.\u003c/em\u003e \u003cstrong\u003e28\u003c/strong\u003e, 194-206 (2008).\u003c/li\u003e\n\u003cli\u003eZhou, C. et al. Rapid determination of cellulose content in pulp using near infrared modeling technique. \u003cem\u003eBioRes\u003c/em\u003e\u003cem\u003eources\u003c/em\u003e \u003cstrong\u003e13\u003c/strong\u003e, 6122-6132 (2018).\u003c/li\u003e\n\u003cli\u003eHorikawa, Y. Assessment of cellulose structural variety from different origins using near infrared spectroscopy. 
\u003cem\u003eCellulose\u003c/em\u003e \u003cstrong\u003e24\u003c/strong\u003e, 5313-5325 (2017).\u003c/li\u003e\n\u003cli\u003eHuang, C., Han, L., Liu, X. \u0026amp; Ma, L. The rapid estimation of cellulose, hemicellulose, and lignin contents in rice straw by near infrared spectroscopy. \u003cem\u003eEnergy Sources A: Recovery Util. Environ. Eff.\u003c/em\u003e \u003cstrong\u003e33\u003c/strong\u003e, 114-120 (2010).\u003c/li\u003e\n\u003cli\u003eGouveia, C. S. S., Lebot, V. \u0026amp; Pinheiro de Carvalho, M. NIRS estimation of drought stress on chemical quality constituents of taro (Colocasia esculenta L.) and sweet potato (Ipomoea batatas L.) flours. \u003cem\u003eAppl. Sci.\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, 8724 (2020).\u003c/li\u003e\n\u003cli\u003eFont, R., del R\u0026iacute;o-Celestino, M., Luna, D., Gil, J. \u0026amp; de Haro-Bail\u0026oacute;n, A. Rapid and cost-effective assessment of the neutral and acid detergent fiber fractions of chickpea (Cicer arietinum L.) by combining modified PLS and visible with near-infrared spectroscopy. \u003cem\u003eAgronomy\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 666 (2021).\u003c/li\u003e\n\u003cli\u003eBebis, G. \u0026amp; Georgiopoulos, M. Feed-forward neural networks. \u003cem\u003eIEEE Potentials\u003c/em\u003e \u003cstrong\u003e13\u003c/strong\u003e, 27-31 (1994).\u003c/li\u003e\n\u003cli\u003eSvozil, D., Kvasnicka, V. \u0026amp; Pospichal, J. Introduction to multi-layer feed-forward neural networks. \u003cem\u003eChemometr. Intell. Lab.\u003c/em\u003e \u003cstrong\u003e39\u003c/strong\u003e, 43-62 (1997).\u003c/li\u003e\n\u003cli\u003eNielsen, D. \u003cem\u003eTree boosting with xgboost-why does xgboost win\u0026quot; every\u0026quot; machine learning competition?\u003c/em\u003e (NTNU, 2016).\u003c/li\u003e\n\u003cli\u003eDomingos, P. 
Occam\u0026apos;s two razors: the sharp and the blunt in \u003cem\u003eKDD\u003c/em\u003e 37-43 (Artificial Intelligence Group, 1998).\u003c/li\u003e\n\u003cli\u003eG\u0026uuml;m\u0026uuml;skaya, E., Usta, M. \u0026amp; Kirci, H. The effects of various pulping conditions on crystalline structure of cellulose in cotton linters. \u003cem\u003ePolym. Degrad. Stab.\u003c/em\u003e \u003cstrong\u003e81\u003c/strong\u003e, 559-564 (2003).\u003c/li\u003e\n\u003cli\u003eAbd El-Sayed, E. S., El-Sakhawy, M. \u0026amp; El-Sakhawy, M. A.-M. Non-wood fibers as raw material for pulp and paper industry. \u003cem\u003eNord. Pulp Paper Res. J.\u003c/em\u003e \u003cstrong\u003e35\u003c/strong\u003e, 215-230 (2020).\u003c/li\u003e\n\u003cli\u003eLiu, Z., Wang, H. \u0026amp; Hui, L. Pulping and papermaking of non-wood fibers. \u003cem\u003ePulp Paper Process.\u003c/em\u003e 1, 4-31 (2018).\u003c/li\u003e\n\u003cli\u003eAreej, F., Ashadie, K. M., Zakiah, S. \u0026amp; Ainun, Z. Pulping process for nonwoody plants in \u003cem\u003ePulping and papermaking of nonwood plant fibers\u003c/em\u003e 17-32 (Academic Press, 2023).\u003c/li\u003e\n\u003cli\u003eWeng, J. \u0026amp; Chen, G. The influence of papermaking process on the properties of Chinese handmade bamboo paper. 
\u003cem\u003eRestaurator \u003c/em\u003e\u003cstrong\u003e46\u003c/strong\u003e, 59-83 (2025).\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Xuan paper, Washi, Hanji, DBSCAN, support vector machine (SVM), spectral angle mapper (SAM)","lastPublishedDoi":"10.21203/rs.3.rs-6397784/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6397784/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eTraditional handmade papers such as Hanji, Washi, and Xuan paper hold substantial cultural and historical value across East Asia. However, their classification and authentication remain challenging due to variations in raw materials and manufacturing techniques. In this study, we propose a nondestructive approach using near-infrared (NIR) hyperspectral imaging combined with machine learning to classify traditional handmade papers from China, Japan, and Korea. NIR spectra (900\u0026ndash;1,700 nm) were extracted from hyperspectral images of 26 paper samples and preprocessed using first derivatives. 
Dimensionality reduction and clustering were performed using principal component analysis (PCA) and density-based spatial clustering of applications with noise (DBSCAN), which also identified outliers of spectra. Multiple classification models, including support vector machine (SVM), FNN, and XGBoost, were trained and evaluated, with SVM achieving the highest F1-score (1.000). Feature importance derived from XGBoost highlighted key spectral regions relevant to classification. Additionally, the spectral angle mapper (SAM) enabled pixel-wise visualization, revealing spectral heterogeneity among the samples. This study demonstrates the effectiveness of NIR hyperspectral imaging and machine learning for the rapid, interpretable, and noninvasive classification of traditional handmade papers, providing valuable tools for heritage conservation and authenticity verification.\u003c/p\u003e","manuscriptTitle":"Nondestructive detection and classification of traditional handmade paper using near-infrared hyperspectral imaging and machine learning","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-05-07 06:57:13","doi":"10.21203/rs.3.rs-6397784/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"03f80a1a-5c8d-4968-931d-4bd4dc70eed5","owner":[],"postedDate":"May 7th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":47672074,"name":"Physical sciences/Chemistry"},{"id":47672075,"name":"Physical 
sciences/Mathematics and computing"}],"tags":[],"updatedAt":"2025-08-01T05:53:14+00:00","versionOfRecord":[],"versionCreatedAt":"2025-05-07 06:57:13","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-6397784","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6397784","identity":"rs-6397784","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
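
The Discussion describes the SAM color maps as minimum-angle classification: each pixel spectrum is compared with reference spectra and assigned to the reference with the smallest spectral angle. The following is a minimal sketch of that idea, not the authors' implementation. The function names, the reference labels `paper_A` and `paper_B`, and the four-band toy spectra are all hypothetical and chosen only for illustration.

```python
import math


def spectral_angle(a, b):
    """Angle (radians) between two spectra treated as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cos_theta)


def classify_pixels(pixels, references):
    """Assign each pixel spectrum to the reference with the smallest angle."""
    labels = []
    for spectrum in pixels:
        best = min(references,
                   key=lambda name: spectral_angle(spectrum, references[name]))
        labels.append(best)
    return labels


# Hypothetical reference spectra (toy data, four bands each).
references = {
    "paper_A": [0.2, 0.4, 0.6, 0.8],
    "paper_B": [0.8, 0.6, 0.4, 0.2],
}

# Hypothetical pixel spectra: the first and third are scaled copies of paper_A.
pixels = [
    [0.1, 0.2, 0.3, 0.4],
    [0.9, 0.7, 0.5, 0.3],
    [0.4, 0.8, 1.2, 1.6],
]

labels = classify_pixels(pixels, references)
print(labels)
```

Because the angle depends only on the direction of a spectrum, the first and third pixels receive the same label even though their overall intensities differ fourfold. This insensitivity to brightness is one reason SAM is often used for pixel-wise mapping.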

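
The Conclusions report that first-derivative preprocessing improved the F1-scores relative to raw spectra. One common explanation is that differentiation removes additive baseline offsets between measurements. The sketch below illustrates only that property, and it swaps in a plain central-difference derivative. The reference list cites Savitzky and Golay, but the preprocessing settings are not given in this section, so the finite-difference scheme, the function name, and the five-band toy spectrum are assumptions.

```python
def first_derivative(spectrum, wavelengths):
    """Finite-difference first derivative of a spectrum.

    Uses central differences at interior bands and one-sided differences
    at the two ends (an illustrative stand-in for a smoothing derivative).
    """
    n = len(spectrum)
    deriv = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        slope = (spectrum[hi] - spectrum[lo]) / (wavelengths[hi] - wavelengths[lo])
        deriv.append(slope)
    return deriv


# Hypothetical band centers (nm) and reflectance values, toy data only.
wavelengths = [900.0, 1100.0, 1300.0, 1500.0, 1700.0]
raw = [0.30, 0.34, 0.42, 0.40, 0.36]

# The same spectrum with a constant baseline offset added.
shifted = [r + 0.25 for r in raw]

d_raw = first_derivative(raw, wavelengths)
d_shifted = first_derivative(shifted, wavelengths)
print(d_raw)
print(d_shifted)
```

The two derivative spectra agree to within floating-point rounding, so a classifier trained on them is not affected by the constant offset that separates the two raw spectra.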