Controllable Generation of Single Blue Calico Patterns Using Stable Diffusion Model

doi:10.21203/rs.3.rs-6422102/v1

2025 · doi:10.21203/rs.3.rs-6422102/v1

preprint OA: closed

Research Square preprint (Article)
Author: Hanlin Jin
This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6422102/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)

Abstract

Artificial intelligence (AI) technologies are increasingly permeating various industries, and technological advancements have also brought new opportunities to traditional craftsmanship. This study focuses on the application of artificial intelligence-generated content (AIGC) to assist traditional printing and dyeing techniques, aiming to address the generation of new patterns for blue calico by constructing an end-to-end low-loss pattern generation framework.
By analyzing the stylistic characteristics of blue calico patterns and comparing how the Low-Rank Adaptation (LoRA) model optimizes the Stable Diffusion base model, as well as the structural control that ControlNet exerts on images, this paper establishes a single-pattern generation pathway for blue calico. This pathway integrates multi-model control based on LoRA and ControlNet with the generation mechanism of the Stable Diffusion model. Validation results demonstrate that patterns generated using this approach expand design content while preserving the intrinsic features of blue calico, thereby providing a digitally innovative solution that balances cultural heritage and technical innovation for the preservation of intangible cultural heritage in traditional printing and dyeing craftsmanship.

Subject areas: Physical sciences/Mathematics and computing/Computer science; Physical sciences/Mathematics and computing/Information technology

Keywords: Artificial Intelligence, Blue Calico, Stable Diffusion, LoRA Model, ControlNet

I. Introduction

The bidirectional driving effect of technological innovation and design innovation has opened up new avenues for the dynamic inheritance of traditional craftsmanship. As an intangible cultural heritage of China, blue calico was widely used in historical production and daily life owing to the simple and elegant aesthetics of its blue-and-white contrast. Current academic studies of blue calico primarily focus on humanistic dimensions such as symbolic semantic analysis of patterns, historical tracing, and textual research on craftsmanship. However, significant research gaps remain in technological innovation, such as pattern generation.
The design and practice of blue calico patterns face dual challenges: professional designers often have an insufficient understanding of the cultural essence of traditional printing and dyeing, and inheritors of intangible cultural heritage are limited in commercial design. These factors lead to the prevalent approach of fragmenting and stacking traditional elements in contemporary blue calico production. This method is not only limited by material resources but also struggles to keep pace with rapid changes in market demand. Breakthrough advancements in AI image generation, however, offer a new technical path to address this issue.

With the continuous iteration of image generation technologies based on diverse training logics, the applicability of artificial intelligence-generated content (AIGC) has steadily broadened. Within AIGC, Stable Diffusion has emerged as one of the iconic models marking the AI industry's transition from the era of traditional deep learning to the AIGC era, owing to its superior image generation capabilities. As a large-scale model, Stable Diffusion exhibits distinctive strengths in generating diverse content and can provide ample inspirational resources for the design field. However, the model's intricate training procedures and massive dataset make it difficult to focus on a specific domain. Hu et al. proposed Low-Rank Adaptation (LoRA) for large language models; the technique was subsequently applied to image generation models to add specific features to generated content with minimal data [1]. Zhang et al. developed the fine-grained control method "ControlNet," which is applied to image generation tasks to enable precise control over image semantics and structure [2].
Building on this research, this study analyzes the core features of blue calico to identify the essential information of the craft, constructs a LoRA model that adheres to the style constraints of blue calico, and integrates text-to-image generation technology to achieve digital style transfer for traditional printing and dyeing. This approach provides technical support for the profound integration of technology and craftsmanship.

II. Introduction to the Characteristics of Blue Calico

As a significant branch of traditional Chinese resist dyeing, blue calico employs natural indigo extract as its coloring medium. It uses a resist paste formulated from soybean powder and lime powder, combined with a paper-cut stencil technique, which together form a printing and dyeing craft with distinctly Oriental aesthetic attributes. The craft emerged in the Pre-Qin period, evolved through breakthroughs in printing technology and advancements in immersion dyeing, and had developed into a complete craftsmanship system by the Sui and Tang dynasties. During the Southern Song Dynasty, with the invention and popularization of oil-paper, printing and dyeing artisans creatively applied it to stencil making, completely replacing traditional wooden stencils with engraved oil-paper stencils. Concurrently, improvements in alkali agent formulas propelled blue calico craftsmanship to leapfrog development [3], gradually increasing its share in folk production and daily life and earning it widespread popularity.

In ancient Chinese society, the regulatory framework that governed permissible apparel colors according to social hierarchy profoundly influenced the developmental trajectory of textile art.
According to the record on dress colors in the Yuzao chapter of the Book of Rites, the five-color hierarchy (green, red, yellow, white, and xuan) established since the Zhou Dynasty built a strict ethics of dress color. Under this institutional regulation, commoners developed plant-based indigo dyeing as a technical way around color taboos, allowing blue calico products to gradually emerge as significant material manifestations of folk clothing culture during the Song and Yuan dynasties. As shown in Fig. 1, during the Southern Song Dynasty the large-scale use of oil-paper replaced traditional engraved wooden stencils. Song Dynasty craftsmen improved alkali agents into a soybean-and-lime resist paste to block dye penetration, and used hand-carved paper stencils to filter and print the paste. After the fabric was dyed and dried, the resist paste was removed, leaving blue-and-white decorative patterns on the cloth surface.

As a representative craft of traditional printing and dyeing, blue calico exhibits a style distinct from other traditional dyeing techniques. As shown in Fig. 2, blue calico uses stencils to transfer resist paste into patterns. During stencil engraving, the "segmented cutting" technique is employed to break long lines into sections, creating characteristic discontinuous visual effects. As a core technique of blue calico, "segmented cutting" involves systematic planning of dots, lines, and planes by stencil engravers, combined with precise control over blade trajectories. This process ultimately creates decorative patterns characterized by a distinctive dot-line-plane combination and an aesthetic appeal of "disconnected cuts with continuous intent."
As described in National Intangible Cultural Heritage of Jingchu by Zuo Shanghong et al., the broken knife ("segmented cutting") technique addresses linear and planar pattern details characterized by elongated or highly curved lines (e.g., plant stems and leaves, long flower petals, ribbons, tree branches, and wave patterns). Because of limitations in craftsmanship and tools, the engraved patterns are segmented into multiple sections that are connected sequentially through "bridging segments" (过桥) [9]. The "disconnected cuts with continuous intent" stencil-engraving technique shortens line lengths while maintaining pattern continuity, keeping lines compact during paste application to prevent leakage. This process creates the combination of continuous dots and broken lines that is unique to traditional blue calico.

Patterns are another artistic characteristic of blue calico. They predominantly feature scattered flowers, twining flowers, and checked patterns. Layouts typically employ either a scattered dot arrangement or an all-over four-directional continuous design. Depending on fabric structure, additional motifs may include rare birds and mythical beasts, landscapes and figures, geometric patterns, and auspicious characters [11]. The formation of blue calico's pattern system was influenced by economic foundations, local customs, and living habits, culminating in a distinctive artistic style. The Jiangnan region, relying on a mature commercial economy and refined craftsmanship traditions, developed diverse forms and exquisite stencil-engraving techniques, resulting in elegant shapes and intricate decorative motifs. Blue calico from the Hunan and Hubei region exhibits distinct Chu cultural characteristics, using dots as the basis to outline patterns or form blocky surfaces, resulting in visually rich and decorative effects.
Meanwhile, northern Chinese patterns primarily feature leek flower and eggplant flower motifs, characterized by simple and bold designs.

Blue calico takes natural indigo as its foundational hue, blending simple yet elegant colors with patterns featuring "interlacing voids and solids, disconnected cuts with continuous intent." This synthesis not only exemplifies the implicit and introverted aesthetic charm of Oriental artistry but also carries millennia-old cultural heritage. Through its unique visual attributes and stylistic features, blue calico offers novel insights for contemporary printing and dyeing design.

III. Method

3.1 Foundation Model Construction

The Latent Diffusion Model (LDM), which serves as the core architecture for generative image models, demonstrates significant advantages in the diversity and creativity of image content. Developed by the CompVis research group at the University of Munich, the Stable Diffusion model operates on a progressive noise transformation mechanism, iteratively approximating the target data distribution from a normal distribution via Markov chains [6]. Its core architecture is depicted in Fig. 3. First, the encoder (E) of the perceptual compression model maps input images into a latent space. Random Gaussian noise is then added to the latent representation over a Markov chain of length T. Next, conditional denoising autoencoders driven by text-image cross-attention perform iterative noise reduction. Finally, the decoder (D) reconstructs pixel-level images. The perceptual compression model removes high-frequency details that are perceptually insignificant to human vision, yielding a low-dimensional latent space that substantially improves training efficiency [7].
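The noising stage described above can be illustrated with the closed-form sampling rule of standard denoising diffusion models, z_t = sqrt(ᾱ_t) z_0 + sqrt(1 - ᾱ_t) ε. The following is only a minimal sketch under stated assumptions: the linear β schedule (1e-4 to 0.02, T = 1000), the 16-value list standing in for an encoded latent E(x), and all function names are illustrative choices, not the scheduler that Stable Diffusion or this study actually uses.

```python
import math
import random


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_1..beta_T (linear schedule; an assumption)."""
    if T == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]


def alpha_bars(betas):
    """Cumulative products of (1 - beta_t): share of signal variance left after t steps."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def noise_latent(z0, t, abar, rng):
    """Sample z_t directly from z_0 (t is 1-based): sqrt(abar_t) z_0 + sqrt(1 - abar_t) eps."""
    a = abar[t - 1]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in z0]


rng = random.Random(0)
T = 1000
abar = alpha_bars(linear_beta_schedule(T))

z0 = [1.0] * 16                          # hypothetical stand-in for a latent E(x)
z_mid = noise_latent(z0, 500, abar, rng) # partially noised latent
z_T = noise_latent(z0, T, abar, rng)     # almost pure Gaussian noise

print(f"signal share after   1 step : {abar[0]:.6f}")
print(f"signal share after {T} steps: {abar[-1]:.2e}")
```

Under this schedule the signal share shrinks monotonically toward zero, which is why generation can start from pure noise and denoise step by step back to a latent that the decoder D turns into an image.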
The underlying logic of diffusion models and the broad generality of large models lead to high randomness in generated content, so outputs differ even when the same semantic prompts and generation parameters are used. This stochastic property, however, also makes it possible to generate diverse and inspirational reference images for blue calico patterns from identical prompts. The Stable Diffusion 1.5 architecture adopted in this experiment has robust general generation capabilities that meet basic pattern creation needs, but it shows significant limitations when addressing the high specificity and unconventional expression requirements of traditional craftsmanship. To overcome this bottleneck, a solution is needed that uses image generation technology to transfer the core features of blue calico patterns into generated content.

3.2 Transfer Model Construction

The core structure of the Stable Diffusion model is made up of two main components: the text encoder and the noise predictor. Among current optimization approaches, Textual Inversion, which is used to fine-tune large models for downstream applications, makes only limited adjustments to the text encoder [8]. In contrast, methods such as DreamBooth adopt a full-parameter joint optimization strategy. This requires simultaneous modification of the parameter spaces of both the text encoder and the noise predictor, so the training cost increases sharply. Traditional full-parameter fine-tuning methods require global adjustments to the original weight matrices, whereas LoRA (Low-Rank Adaptation) provides a more economical approach to parameter optimization. LoRA freezes the pre-trained model weights and injects trainable layers, in the form of rank-decomposed matrices, into each Transformer block.
This approach reduces the number of parameters that must be trained while maintaining learning efficacy. As shown in Fig. 4, the architecture establishes a parallel branch alongside the original pre-trained language model (PLM), simulating the so-called intrinsic rank through dimensionality reduction followed by reconstruction. Specifically, if the pre-trained weights are denoted as W₀, the weights during the fine-tuning phase are:

$$W = W_{0} + \Delta W$$

In the formula, ΔW represents the weight update learned during fine-tuning. For LoRA, only ΔW needs to be fine-tuned, and it is factorized into the matrix product BA. Here, B has as many rows as the original weight matrix and r columns, while A has r rows and as many columns as the original weight matrix, where r is the rank. Because r is much smaller than the dimensions of the original matrix, B and A together contain far fewer parameters than the weights they update. During the forward pass, both W₀ and ΔW are multiplied by the same input x, and the outputs are added:

$$h = W_{0}x + \Delta W x = W_{0}x + BAx$$

LoRA introduces minor modifications to the critical cross-attention layers of the Stable Diffusion model. With the pre-trained model weights frozen, it injects trainable factorized matrices into the Transformer blocks. The factorized matrices contain significantly fewer parameters than the original model's matrices. With this approach, adjusting domain-specific features of the latent diffusion model in practical applications requires a training dataset of only a dozen or so images. This enables high-quality and efficient training of blue calico style models using limited data resources, thus facilitating rapid and flexible implementation of targeted style transfer.
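The LoRA update above can be checked numerically on a toy layer. The sketch below is an illustration only: the layer sizes (8 × 6, rank 2), the helper names, and the "trained" values of B are made up, and the zero initialization of B follows the original LoRA paper rather than anything stated in this study. It shows that the low-rank branch leaves the output unchanged at initialization, that applying the branch is equivalent to using the merged weight W₀ + BA, and how the trainable parameter count compares with full fine-tuning.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def add(A, B):
    """Element-wise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def lora_forward(W0, B, A, x):
    """h = W0 x + B (A x); x is a column vector stored as a d_in x 1 matrix."""
    return add(matmul(W0, x), matmul(B, matmul(A, x)))


def lora_param_count(d_out, d_in, r):
    """Trainable parameters in B (d_out x r) plus A (r x d_in)."""
    return r * (d_out + d_in)


rng = random.Random(0)
d_out, d_in, r = 8, 6, 2   # toy sizes, chosen only for illustration

W0 = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen weights
A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]    # trainable, r x d_in
B = [[0.0] * r for _ in range(d_out)]                                 # trainable, zero init
x = [[rng.gauss(0, 1)] for _ in range(d_in)]

# At initialization BA = 0, so the adapted layer reproduces the frozen layer.
h_init = lora_forward(W0, B, A, x)

# Pretend training moved B away from zero (values are made up).
B_trained = [[0.5] * r for _ in range(d_out)]
h_branch = lora_forward(W0, B_trained, A, x)            # W0 x + B A x
h_merged = matmul(add(W0, matmul(B_trained, A)), x)     # (W0 + B A) x

print("full fine-tuning params:", d_out * d_in)
print("LoRA params:            ", lora_param_count(d_out, d_in, r))
```

For a hypothetical 768 × 768 projection with r = 8, the same count gives 12,288 trainable values against 589,824 in the full matrix, which is the kind of saving that makes training on a small dataset and a consumer GPU practical.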
3.3 Control Model Construction

During Stable Diffusion image generation, the high generalization capacity of large models often produces a high proportion of stylistically divergent outputs under identical prompts, making precise control over image composition or object morphology difficult. To address this, a research team at Stanford University proposed the "ControlNet" architecture, an end-to-end neural network framework designed to condition diffusion models on task-specific inputs. ControlNet controls the generative process by introducing additional conditioning factors [4]. In style transfer applications, it can use a depth map preprocessor to extract depth maps as control inputs for enforcing predefined shapes. The architecture modulates individual neural network blocks (Fig. 5). Without ControlNet, a block F of the diffusion model's native network maps input x to output y using parameters Θ:

$$y = F(x; \Theta)$$

In ControlNet (Fig. 6), the weights of the Stable Diffusion U-Net encoder are kept in two submodules, a "frozen copy" and a "trainable copy," which are connected via zero convolutions. The frozen copy keeps its parameters fixed throughout training, whereas the trainable copy is fine-tuned conditionally on the applied control signals. This architecture confines adaptive training to the trainable module, effectively mitigating the catastrophic forgetting commonly observed in traditional fine-tuning approaches. The core operation can be formalized as:

$$y_{c} = F(x; \Theta) + Z\big(F\big(x + Z(c; \Theta_{z1}); \Theta_{c}\big); \Theta_{z2}\big)$$

In this formulation, Z denotes a zero convolution: a 1×1 convolution whose weights and biases are initialized to zero.
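The formulation for y_c can be traced on a toy example. The sketch below is a simplified illustration, not ControlNet's implementation: the block F is a hypothetical scale-then-ReLU function, feature maps are plain lists of pixels with two channels, and the "trained" zero-convolution weights are made up. It shows that with both zero convolutions at their initial values the controlled output equals the original output, and that the control branch only contributes once those weights move away from zero.

```python
def conv1x1(feat, weight, bias):
    """1x1 convolution: mix channels with the same matrix at every pixel.

    feat is a list of pixels; each pixel is a list of channel values.
    """
    return [
        [sum(w * v for w, v in zip(row, px)) + b for row, b in zip(weight, bias)]
        for px in feat
    ]


def zero_conv(channels):
    """Weights and bias of a zero convolution: every value starts at 0."""
    return [[0.0] * channels for _ in range(channels)], [0.0] * channels


def block(feat, theta):
    """Toy stand-in for a network block F(x; theta): scale, then ReLU."""
    return [[max(0.0, theta * v) for v in px] for px in feat]


def add(a, b):
    """Element-wise sum of two feature maps of the same shape."""
    return [[u + v for u, v in zip(pa, pb)] for pa, pb in zip(a, b)]


def controlled_block(x, c, theta, theta_c, z1, z2):
    """y_c = F(x; theta) + Z(F(x + Z(c; z1); theta_c); z2)."""
    frozen_out = block(x, theta)                    # frozen copy, parameters theta
    trainable_in = add(x, conv1x1(c, *z1))          # inject the control signal c
    trainable_out = block(trainable_in, theta_c)    # trainable copy, parameters theta_c
    return add(frozen_out, conv1x1(trainable_out, *z2))


x = [[1.0, -2.0], [0.5, 3.0]]   # two pixels, two channels
c = [[0.2, 0.4], [0.6, 0.8]]    # control features, e.g. derived from a depth map
theta = 1.5
theta_c = theta                 # the trainable copy starts as a clone

y = block(x, theta)
y_c_init = controlled_block(x, c, theta, theta_c, zero_conv(2), zero_conv(2))

# After training, the zero convolutions are no longer zero (values made up here).
z2_trained = ([[0.1, 0.0], [0.0, 0.1]], [0.0, 0.0])
y_c_later = controlled_block(x, c, theta, theta_c, zero_conv(2), z2_trained)

print("unchanged at init:", y_c_init == y)
print("changed after training:", y_c_later != y)
```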
In the formulation for y_c, Θz1 and Θz2 are the parameters of the two zero-convolution layers, Θc denotes the parameters of the trainable copy, and c is the input control signal. The control signal is transformed by the first zero-convolution layer, and the result is added to the original input before being fed into ControlNet's "trainable copy" module. The module output passes through the second zero-convolution layer and is then added to the output features of the original network. At initialization, all weights of the zero-convolution layers are zero, so the feature stream passing through the two zero convolutions is all zeros and y_c equals y. This design not only enables flexible integration of control signals but also strictly preserves the knowledge of the original network, laying a reliable foundation for subsequent progressive fine-tuning.

IV. Results

4.1 LoRA Model Training and Data Preparation

4.1.1 Experimental Environment and Training Parameter Setup

The experimental environment for model training and image generation is as follows. CPU: 12th Gen Intel® Core™ i5-12490F; GPU: NVIDIA GeForce RTX 4060 Ti 8G; system: Windows 10; programming language: Python. SD1.5 is selected as the base model. The training image set consists of 60 images with a resolution of 512×512 pixels. The batch size is 1, and the maximum number of training epochs (Max Train Epochs) is 30. The learning rate for the U-Net (Unet_lr) is 1e-4 (0.0001), and the learning rate for the text encoder (Text Encoder lr) is 1e-5 (0.00001). The optimizer type is AdamW8. The trained model is saved every 3 epochs.

4.1.2 Model Training Dataset Material Preparation

The preceding analysis shows that blue calico is characterized by its blue-and-white color scheme and by the "disconnected cuts with continuous intent" expression that results from the "segmented cutting" technique.
Therefore, when collecting training dataset materials, image resources that align with these characteristics should be prioritized. To support training precision, the selected images also need to be preprocessed. Using Photoshop, the blue calico patterns are preprocessed by segmenting visual elements to minimize interference from unnecessary image information and ensure pattern accuracy. For prompt composition, tags centered on visual effects, pattern content, and craft characteristics are written for the training images using a dataset tag editor. Prompts are derived by reverse inference from the image materials, with manual adjustment applied to specific tags based on visual analysis.

4.2 Model Validation and Result Analysis

4.2.1 Repeat and Epochs Impact on Transferability

During LoRA model training, the relationship between image repetition in the training dataset and model performance is non-linear. In principle, increased repetition enhances model sensitivity to specific samples, while more training cycles (Epochs) drive the loss toward its minimum and improve the fit to the training data. However, in LoRA models using the mean squared error loss function, the risk of overfitting increases significantly once the number of training epochs exceeds a critical threshold. As shown in Fig. 7, when the Repeat setting for dataset images is increased to 20, the model's learning of the blue calico style continues to improve with increasing Epochs. At Epochs = 3, generated images are dominated by the descriptive content of the prompt words, with minimal blue calico style information. As training epochs increase, the proportion of pattern style information gradually increases. At Epochs = 15, the generated patterns primarily exhibit the blue calico style.
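To make the scale of these settings concrete, the training schedule can be worked out arithmetically. The sketch below rests on an assumption that the text does not confirm: that the trainer shows each image Repeat times per epoch and takes one optimizer step per batch. The class and property names are hypothetical; only the numeric defaults (60 images, Repeat = 20, 30 epochs, batch size 1, a checkpoint every 3 epochs) come from Sections 4.1.1 and 4.2.1.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraTrainingPlan:
    """Hypothetical summary of the reported training setup."""
    num_images: int = 60
    repeats: int = 20
    max_epochs: int = 30
    batch_size: int = 1
    save_every_n_epochs: int = 3

    @property
    def steps_per_epoch(self) -> int:
        # Assumption: every image is seen `repeats` times per epoch, one step per batch.
        return math.ceil(self.num_images * self.repeats / self.batch_size)

    @property
    def total_steps(self) -> int:
        return self.steps_per_epoch * self.max_epochs

    @property
    def checkpoint_epochs(self) -> list:
        # Epochs at which a model file is saved and can be compared.
        return list(range(self.save_every_n_epochs, self.max_epochs + 1,
                          self.save_every_n_epochs))


plan = LoraTrainingPlan()
print("steps per epoch:", plan.steps_per_epoch)
print("total steps:    ", plan.total_steps)
print("checkpoints at epochs:", plan.checkpoint_epochs)
```

Under this assumption, the epochs compared in this section (3, 15, and later ones) are all multiples of 3, matching the checkpoints that a save interval of 3 epochs would produce.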
However, when Epochs is increased to 24, the model's recognition of prompt words weakens. At Epochs = 30, the model's generalization capability fails, and the appearance of images duplicated from the training dataset confirms that excessive training leads to overfitting in the LoRA model, so that training regresses into replicating training content. Comparative experiments show that, for the blue calico style model, setting Repeat = 20 for each image and Epochs between 12 and 21 fully preserves the "blue-and-white alternation" color characteristics and the "disconnected cuts with continuous intent" craftsmanship feature, while still allowing prompt words to be interpreted so that the inspiration samples in the blue calico style are enriched.

4.2.2 Influence of Label Input on Style Transfer

This study uses controlled variables to verify how the keyword input method affects the rendering of generated images, and uses the results to identify keywords that make generated images better reflect blue calico characteristics. During training, three model groups (Lan A, Lan B, and Lan C) were established. The LoRA A model in Lan A added the label word "huabu" to all training images and removed all other label words, so that "huabu" was the only label in the training dataset. The LoRA B model in Lan B added "huabu" to the original label words. The LoRA C model in Lan C, which served as the experimental control, added no label words to the original labels. In the generation experiment, the shared prompt was: "A bird, spread wings, blue and white color scheme, clear structure, indigo cotton background, retro blue printed fabric, hand cut template texture, no human presence." In addition, all three model groups generated images with the "huabu" trigger label word appended to the shared prompt.

As shown in Fig. 8, when the Lan A and Lan B models were given the "huabu" trigger label word, the visual effects and craftsmanship features of blue calico patterns were significantly enhanced in the generated images. The main elements partially replicated the sample characteristics of the dataset, and they also followed the aesthetic principle of "disconnected cuts with continuous intent" inherent to this craft. In contrast, although the results generated without the trigger label word retained some blue calico features, they inevitably reflected semantic information from the other prompt words, which diluted the craft's distinctive aesthetic appeal. The control group Lan C did not respond to the "huabu" trigger label word. Although its generated images retained certain blue calico features in color and elements, they reflected the descriptive semantics of the other keywords more flexibly.

The results indicate that adding a trigger label word during LoRA model training significantly affects the style of generated images, and that whether the trigger is the only label also has a notable influence on generated content. Adopting a uniform labeling system enhances the expression of blue calico style features but reduces the model's generalization flexibility. This approach yields better output in scenarios that call for strict style control.

4.2.3 Influence of the Control Model on Style Transfer

The LoRA model for blue calico varies in effectiveness across image generation tasks. It represents traditional patterns with simple structures, such as birds, animals, and flowers, well, with generated results closely matching the intended objectives. However, when generating complex multi-element patterns, the model shows insufficient decoupling capability.
Take the pattern of a human figure holding a lotus as an example. When the "huabu" trigger label word and three primary prompt words ("a person, stand, hold a lotus") are input, the image generated after LoRA fine-tuning is as shown in Fig. 9. Although the generated images retain blue calico features and the basic forms are recognizable, detail rendering is significantly deficient. The boundaries among the three motifs fuse, which reduces subject recognition, and the aesthetic appeal and precision of detail are markedly lower than for simple objects.

To address these issues, ControlNet is introduced for further control over generated patterns. By using depth information extracted from reference images, it regulates the elemental structure and local detail features during blue calico pattern generation. The ControlNet depth control model offers four depth preprocessors (leres, leres++, midas, and zoe), which differ in how they emphasize image depth information. As shown in Fig. 10, leres and leres++ yield richer details, while midas and zoe exhibit higher contrast. Higher detail levels correspond to denser structural frameworks, which require more content to be reproduced and may generate extraneous output that is less relevant to the current validation. Therefore, the high-contrast depth_zoe preprocessor was selected for this validation. As shown in Fig. 11, using "huabu, a person, stand, hold a lotus" and the depth map as joint conditional inputs makes it possible to generate patterns that adhere to the blue calico style while achieving more precise element boundaries and structures. This result validates the effectiveness of ControlNet in controlling image structure and enhancing blue calico style transfer.

V. Conclusion

As a significant representative of China's intangible cultural heritage, blue calico (Huixie blue printing) embodies dual value in cultural heritage preservation and technological innovation through research on its digital inheritance and style transfer. Addressing the digital conservation needs of traditional craftsmanship, this study presents an approach to blue calico style transfer based on LoRA model fine-tuning. By applying LoRA, originally designed for fine-tuning large language models, to the Stable Diffusion model, and integrating text prompts, LoRA models, and ControlNet depth constraints, the proposed method enables stable generation of digital works that harmonize traditional blue calico stylistic features with modern design aesthetics. This provides a replicable technical framework for the dynamic inheritance of intangible cultural heritage techniques. Although current intelligent generation systems still face limitations in reconstructing pattern details, iterative advancements in deep learning are expected to enable higher precision in cultural element extraction and style control. The "AI + intangible cultural heritage" approach is emerging as an effective pathway for the profound integration of technology and craftsmanship, and it will play an increasingly significant role in the digital inheritance of intangible cultural heritage.

Declarations

Author Contributions:
Yu Yong and Jin Hanlin. "Research on Lacquer Art Molding Techniques under the Influence of Digital Technology." Journal of Chinese Lacquer 43.01 (2024): 23-26+52. doi:10.19334/j.cnki.issn.1000-7067.2024.01.006.
Jin Hanlin. "Image Analysis of Yangjiabu 'Theatrical Scenes' New Year Pictures: Taking *Borrowing Arrows with Straw Boats* as an Example." Couplets 29.06 (2023): 40-43.

Data Availability Statement: All data generated or analyzed during this study are included in this published article.
Conflict of Interest Statement: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Author: JIN Hanlin*

References

Hu, E. J. et al. LoRA: Low-Rank Adaptation of Large Language Models. (2021).
Zhang, L. et al. Adding Conditional Control to Diffusion Models with Reinforcement Learning. (2024).
Chen et al. Ancient Printing Technology of Hanfu. J. Cloth. Res. 3 (1), 7 (2018).
Karras, T., Laine, S. & Aila, T. A Style-Based Generator Architecture for Generative Adversarial Networks. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, pp. 4396–4405 (2019). doi:10.1109/CVPR.2019.00453
Duthé, G. et al. Flexible Multi-Fidelity Framework for Load Estimation of Wind Farms through Graph Neural Networks and Transfer Learning. Data-Centric Eng. 5, e29 (2024).
Li Guo, Z., Tiandu & Zhiwei, X. On the Eve of the Technological Revolution: Architectural and Scene Design Innovation under the Wave of Generative AI Tools. Chin. Overseas Archit. 9, 24–28 (2023).
Rombach, R. et al. High-Resolution Image Synthesis with Latent Diffusion Models. (2021).
Ruiz, N. et al. DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation. (2022).
Tang Youcheng et al. Application analysis of automatic batching system for intelligent leather production. Leather Sci. Eng. 34 (2), 30–36 (2024).
Zuo Shanghong et al. National Intangible Cultural Heritage of Jingchu. (Hubei People's Publishing House, 2008), 6.
Ye You-xin. Shandong Folk Indigo Printed Cloth. Ji'nan, 19 (Shandong Fine Arts Publishing House, 1986).
Dosovitskiy, A., Springenberg, J. T. & Brox, T. Learning to generate chairs with convolutional neural networks. CoRR (2014).

Additional Declarations

No competing interests reported.
{"props":{"pageProps":{"initialData":{"identity":"rs-6422102","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":459563556,"identity":"f65a08f6-20f7-49f8-9359-55ab0db68707","order_by":0,"name":"Hanlin Jin","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAAyElEQVRIiWNgGAWjYBACfoYDCYd//rHh4WdvIFKLZOOBh4cZG9LkJHsOEKnF4PDBx0Ath40NbiQQ67JjhxMOF+44nDhz5uONNxhqbKIJ6mDsOZZweOaZ9MR+6bRiC4ZjabkNhLQwS5xJOMDDZp04c3aOmQTQhYS1sMm//wDUwpy44eYZIrXwgAKZt80Z6H0eIrVIALUcnHEGFMhAvyQQ4xf7AweSP3yoAEXl4Y03PtTYENaCDAwkEkhRDtFCqo5RMApGwSgYGQAAxIZKrjN5dQ0AAAAASUVORK5CYII=","orcid":"","institution":"Qilu University of Technology","correspondingAuthor":true,"prefix":"","firstName":"Hanlin","middleName":"","lastName":"Jin","suffix":""}],"badges":[],"createdAt":"2025-04-10 
16:53:19","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6422102/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6422102/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":83334540,"identity":"6ffda572-0226-4257-88cc-2a333489b99c","added_by":"auto","created_at":"2025-05-23 08:49:13","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":478617,"visible":true,"origin":"","legend":"\u003cp\u003eBlue Calico Production Process\u003c/p\u003e","description":"","filename":"image1.png","url":"https://assets-eu.researchsquare.com/files/rs-6422102/v1/375abd70ccbecfddae332c3a.png"},{"id":83335352,"identity":"fed26493-c7ee-4e5b-9b44-db2e120f65e2","added_by":"auto","created_at":"2025-05-23 08:57:13","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":375938,"visible":true,"origin":"","legend":"\u003cp\u003eBroken Knife Technique and Pattern Effect\u003c/p\u003e","description":"","filename":"image2.png","url":"https://assets-eu.researchsquare.com/files/rs-6422102/v1/fba7cdb83608cd7d3857750d.png"},{"id":83334539,"identity":"6fae2a23-3fe4-4179-bf49-d4d1c87c923a","added_by":"auto","created_at":"2025-05-23 08:49:13","extension":"jpeg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":20672,"visible":true,"origin":"","legend":"\u003cp\u003eStable Diffusion Model Framework\u003c/p\u003e","description":"","filename":"image3.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-6422102/v1/e59a4bcdacc7e3339eecff64.jpeg"},{"id":83334543,"identity":"e4c78268-f33d-4083-8c1d-3d581d8a1f9a","added_by":"auto","created_at":"2025-05-23 08:49:13","extension":"jpeg","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":15682,"visible":true,"origin":"","legend":"\u003cp\u003eLora Model Architecture 
Diagram\u003c/p\u003e","description":"","filename":"image4.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-6422102/v1/c314dc140cfa25aab5623eb5.jpeg"},{"id":83334545,"identity":"b019ee55-9b41-4cdd-8293-14f7cb485e37","added_by":"auto","created_at":"2025-05-23 08:49:13","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":53568,"visible":true,"origin":"","legend":"\u003cp\u003eControlNet Network Architecture\u003c/p\u003e","description":"","filename":"image5.png","url":"https://assets-eu.researchsquare.com/files/rs-6422102/v1/8064db36ba55bdfc31d6ae78.png"},{"id":83335354,"identity":"a03d8c27-ab46-486f-89f1-aca8111a1fd4","added_by":"auto","created_at":"2025-05-23 08:57:13","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":172958,"visible":true,"origin":"","legend":"\u003cp\u003eControlNet Network Architecture Implementation in Stable Diffusion\u003c/p\u003e","description":"","filename":"image6.png","url":"https://assets-eu.researchsquare.com/files/rs-6422102/v1/fefcb9df8de9c384e04cdb3d.png"},{"id":83334546,"identity":"68240e4b-1005-4f5a-80bd-46edef947497","added_by":"auto","created_at":"2025-05-23 08:49:13","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":849123,"visible":true,"origin":"","legend":"\u003cp\u003eComparison of Generated Content under Repeat and Epochs\u003c/p\u003e","description":"","filename":"image7.png","url":"https://assets-eu.researchsquare.com/files/rs-6422102/v1/571f18dadd5dd2a75538d4e3.png"},{"id":83334553,"identity":"1df470a8-a623-4a74-8f06-c4c07a29deb5","added_by":"auto","created_at":"2025-05-23 08:49:13","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":618507,"visible":true,"origin":"","legend":"\u003cp\u003eComparison of Label Writing 
Methods\u003c/p\u003e","description":"","filename":"image8.png","url":"https://assets-eu.researchsquare.com/files/rs-6422102/v1/70e1578bd23e040970435b06.png"},{"id":83335461,"identity":"281440da-07a9-423c-91ec-b93249e9c4ce","added_by":"auto","created_at":"2025-05-23 09:05:13","extension":"png","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":467944,"visible":true,"origin":"","legend":"\u003cp\u003eMulti-Element Generation Effect After LoRA Fine-Tuning\u003c/p\u003e","description":"","filename":"image9.png","url":"https://assets-eu.researchsquare.com/files/rs-6422102/v1/0179ccd34b03b25d3028993d.png"},{"id":83334549,"identity":"bcc9a94c-5ca0-4a9d-93df-2e793210a2de","added_by":"auto","created_at":"2025-05-23 08:49:13","extension":"png","order_by":10,"title":"Figure 10","display":"","copyAsset":false,"role":"figure","size":98991,"visible":true,"origin":"","legend":"\u003cp\u003eFour Depth Modes in ControlNet\u003c/p\u003e","description":"","filename":"image10.png","url":"https://assets-eu.researchsquare.com/files/rs-6422102/v1/d71bdc55ff1569dc572c508f.png"},{"id":83334554,"identity":"8fa05dc7-6069-4dee-a7b9-3cc0d2f41374","added_by":"auto","created_at":"2025-05-23 08:49:13","extension":"png","order_by":11,"title":"Figure 11","display":"","copyAsset":false,"role":"figure","size":461245,"visible":true,"origin":"","legend":"\u003cp\u003eComparison of ControlNet Generation Results\u003c/p\u003e","description":"","filename":"image11.png","url":"https://assets-eu.researchsquare.com/files/rs-6422102/v1/3018182cb5776ab832d00e2c.png"},{"id":85653535,"identity":"792f7306-0ada-4335-a43f-f342c2a96aff","added_by":"auto","created_at":"2025-06-30 
09:54:02","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":4485906,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6422102/v1/ad1f7c4d-255a-4d56-8d93-4c9d76c988fa.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Controllable Generation of Single Blue Calico Patterns Using Stable Diffusion Model","fulltext":[{"header":"I. Introduction","content":"\u003cp\u003eThe bidirectional driving effect of technological innovation and design innovation has opened up new avenues for the dynamic inheritance of traditional craftsmanship. As an intangible cultural heritage of China, blue calico has a wide application foundation in ancient production and life fields due to its simple and elegant aesthetics of blue and white contrast.\u003c/p\u003e\n\u003cp\u003eCurrent studies in the academic community on blue calico primarily focus on humanistic dimensions such as symbolic semantic analysis of patterns, historical tracing, and craftsmanship textual research. However, significant research gaps remain in technological innovation aspects like pattern generation. The design and practice of blue calico patterns face dual challenges: insufficient understanding of the cultural essence of traditional printing and dyeing among professional designers, and limitations of intangible cultural heritage inheritors in commercial design. These factors lead to the prevalent approach of fragmenting and stacking traditional elements in contemporary blue calico production. This method is not only limited by material resources but also struggles to meet the rapid changes in market demands. 
However, breakthrough advancements in artificial intelligence (AI) technology for image generation have provided a novel technical path to address this issue.\u003c/p\u003e\n\u003cp\u003eWith the continuous iteration of image generation technologies based on diverse training logics, the applicability of Artificial Intelligence Generated Content (AIGC) has been increasingly broadened. In the field of AI-generated content, Stable Diffusion has emerged as one of the iconic models marking the AI industry's transition from the era of traditional deep learning to the AIGC era, leveraging its superior image generation capabilities. Stable Diffusion, a large-scale model, exhibits distinctive strengths in generating diverse content, capable of providing ample inspirational resources for the design field. However, the model's intricate training procedures and massive dataset make it challenging to focus on a specific domain. Therefore, Hu et al. proposed the concept of Low-Rank Adaptation (LoRA) for large language models and introduced it into image generation models, aiming to add specific features to generated content with minimal data\u003csup\u003e[1]\u003c/sup\u003e. Zhang et al. developed the fine-grained control method \"ControlNet,\" which was applied to image generation tasks to enable precise control over image semantics and structure\u003csup\u003e[2]\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003eBuilding upon the aforesaid research, this study analyzed the core features of blue calico to identify the essential information of the craft, constructed a LoRA model adhering to the style constraints of blue calico, and integrated text-to-image generation technology to achieve digital style transfer for traditional printing and dyeing. This approach provides technical support for the profound integration of technology and craftsmanship.\u003c/p\u003e"},{"header":"II. 
Introduction to the Characteristics of Blue Calico","content":"\u003cp\u003eAs a significant branch of traditional Chinese resist dyeing craftsmanship, this craft employs natural indigo extract as its coloring medium. It uses a resist paste formulated from soybean powder and lime powder, combined with a paper-cut stencil technique, collectively forming a printing and dyeing craft with distinct Oriental aesthetic attributes. As a quintessential representative of traditional Chinese resist dyeing craftsmanship, blue calico craftsmanship emerged in the Pre-Qin period, evolved through breakthroughs in printing technology and advancements in immersion dyeing, and culminated in the establishment of a complete craftsmanship system by the Sui and Tang dynasties. During the Southern Song Dynasty, with the invention and popularization of oil-paper, printing and dyeing artisans creatively applied it to stencil-making techniques, completely replacing traditional wooden stencils with oil-paper engraved stencils. Concurrently, improvements in alkali agent formulas propelled the blue calico craftsmanship to achieve leapfrog development\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e3\u003c/span\u003e]\u003c/sup\u003e, gradually increasing its proportion in folk production and daily life and gaining widespread popularity among the public.\u003c/p\u003e\n\u003cp\u003eIn ancient Chinese social systems, the regulatory framework governing permissible apparel colors based on social hierarchy profoundly influenced the developmental trajectory of textile art. According to the record of \"dress color, dress color\" in the book of rites \u0026middot; yuzao, the five color hierarchy (green, red, yellow, white and Xuan) formed since the Zhou Dynasty has built a strict dress color ethics. 
Under this institutional regulation, commoners developed plant-based indigo dyeing techniques as a technical approach to circumvent color taboos, allowing blue calico products to gradually emerge as significant material manifestations of folk clothing culture during the Song and Yuan dynasties. As shown in Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e, during the Southern Song Dynasty, the large-scale application of oil-paper medium replaced traditional wooden engraved stencils. Craftsmen of the Song Dynasty improved alkali agents into a soybean-and-lime resist paste to block dye penetration, using hand-carved paper stencils for filtering and printing the paste. After the fabric was dyed and dried, the resist paste was removed, resulting in blue-and-white decorative patterns on the cloth surface.\u003c/p\u003e\n\u003cp\u003eAs a representative craft of traditional printing and dyeing, blue calico exhibits a unique style distinct from other traditional dyeing techniques. As shown in Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e, blue calico utilizes stencils to transfer resist paste into patterns. During stencil engraving, the \"segmented cutting\" technique is employed to break long lines into sections, creating characteristic discontinuous visual effects. As a core technique of blue calico, \"segmented cutting\" involves systematic planning of dots, lines, and planes by stencil engravers, combined with precise control over blade trajectories. 
This process ultimately creates decorative patterns characterized by a distinctive dot-line-plane combination and an aesthetic appeal of \"disconnected cuts with continuous intent.\" As described in \u003cem\u003eNational Intangible Cultural Heritage of Jingchu\u003c/em\u003e by Zuo Shanghong et al., the broken knife technique addresses linear and planar pattern details characterized by elongated or highly curved lines (e.g., plant stems/leaves, long flower petals, ribbons, tree branches, and wave patterns). Due to limitations in craftsmanship and tools, this unique method involves segmenting the engraved patterns into multiple sections and connecting them sequentially through \"bridging segments\" (过桥)\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e9\u003c/span\u003e]\u003c/sup\u003e. The \"disconnected cuts with continuous intent\" stencil-engraving technique requires shortening line lengths while maintaining pattern continuity, ensuring compact lines during paste application to prevent leakage. This process thereby creates the characteristic combination of continuous dots and broken lines unique to traditional blue calico.\u003c/p\u003e\n\u003cp\u003eAdditionally, patterns are also one of the artistic characteristics of blue calico. Patterns on blue calico predominantly feature scattered flowers, twining flowers, and checked patterns. Layouts typically employ either a scattered dot arrangement or an all-over four-directional continuous design. Depending on fabric structure, additional motifs may include rare birds and mythical beasts, landscapes and figures, geometric patterns, and auspicious characters\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e11\u003c/span\u003e]\u003c/sup\u003e. The formation of blue calico's pattern system is influenced by economic foundations, local customs, and living habits, culminating in a distinctive artistic style. 
The Jiangnan region, relying on a mature commercial economy and refined craftsmanship traditions, developed diverse forms and exquisite stencil-engraving techniques, resulting in elegant shapes and intricate decorative motifs. Blue calico from the Hunan and Hubei region exhibits distinct Chu cultural characteristics, using dots as the basis to outline patterns or form blocky surfaces, resulting in visually rich and decorative effects. Meanwhile, northern Chinese patterns primarily feature leek flower and eggplant flower motifs, characterized by simple and bold designs.\u003c/p\u003e\n\u003cp\u003eBlue calico takes natural indigo as its foundational hue, blending simple yet elegant colors with patterns featuring \"interlacing voids and solids, disconnected cuts with continuous intent.\" This synthesis not only exemplifies the implicit and introverted aesthetic charm of Oriental artistry but also carries millennia-old cultural heritage. Through its unique visual attributes and stylistic features, blue calico offers novel insights for contemporary printing and dyeing design.\u003c/p\u003e"},{"header":"III. Method","content":"\u003cp\u003e\u003cstrong\u003e3.1 Foundation Model Construction\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe Latent Diffusion Model (LDM), serving as the core architecture for generative image content models, demonstrates significant advantages in terms of image content diversity and creativity. Developed by the CompVis research group at the University of Munich, the Stable Diffusion model operates based on a progressive noise transformation mechanism, implementing iterative approximation from normal distribution to target data distribution via Markov chains\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e6\u003c/span\u003e]\u003c/sup\u003e. Its core architecture is depicted in Fig.\u0026nbsp;3: first, the encoder (E) of the perceptual compression model maps input images into a latent space. 
Gaussian noise is then added to the latent representation step by step over a Markov chain of length \"T\". Next, a conditional denoising autoencoder, guided by text-image cross-attention, removes the noise iteratively. Finally, the decoder (D) reconstructs the pixel-level image. The perceptual compression model discards high-frequency details that contribute little to human visual perception, yielding a low-dimensional latent space that substantially improves training efficiency\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e7\u003c/span\u003e]\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003eBecause sampling starts from random noise and the base model is highly general, generated content is highly variable: the same prompt and generation parameters yield different outputs whenever the random seed changes. This stochasticity is also useful, since identical prompts can produce diverse and inspirational reference images for blue calico patterns. The Stable Diffusion 1.5 model adopted in this experiment has robust general generation capabilities that meet basic pattern creation needs, but it shows significant limitations when faced with the highly specific and unconventional visual language of traditional craftsmanship. Overcoming this bottleneck requires a method that transfers the core features of blue calico patterns into the generated content.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e3.2 Transfer Model Construction\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe core structure of the Stable Diffusion model consists of two main components: the text encoder and the noise predictor. 
Among current fine-tuning approaches, Textual Inversion, which adapts large models to downstream applications, makes only limited adjustments on the text encoder side\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e8\u003c/span\u003e]\u003c/sup\u003e. In contrast, methods such as DreamBooth adopt a full-parameter joint optimization strategy. This requires modifying the parameters of both the text encoder and the noise predictor at the same time, so the training cost rises sharply.\u003c/p\u003e\n\u003cp\u003eTraditional full-parameter fine-tuning adjusts the original weight matrices globally, whereas LoRA (Low-Rank Adaptation) offers a more economical way to optimize parameters. LoRA freezes the pre-trained model weights and injects trainable rank-decomposition matrices into each Transformer block. This reduces the number of trainable parameters while maintaining learning efficacy. As shown in Fig.\u0026nbsp;4, the architecture adds a parallel branch alongside the frozen pre-trained weights and approximates the so-called intrinsic rank by projecting down to a low dimension and then back up. Specifically, if the pre-trained weights are denoted W0, the weights during the fine-tuning phase are:\u003c/p\u003e\n\u003cdiv id=\"Equa\" class=\"Equation\"\u003e\n\u003cdiv id=\"FileID_Equa\" class=\"mathdisplay\"\u003e$$W={W}_{0}+\\varDelta W$$\u003c/div\u003e\n\u003c/div\u003e\n\u003cp\u003eIn the formula, \u0026Delta;W represents the weight update learned during fine-tuning. For LoRA, only \u0026Delta;W is trained, and it is factorized into the matrix product BA. Here, B has as many rows as the original weight matrix and r columns, while A has r rows and as many columns as the original weight matrix, where r is the rank. 
The rank r is much smaller than the dimensions of the original weight matrix, so B and A together contain far fewer parameters than a full update would. During the forward pass, both W₀ and \u0026Delta;W are multiplied by the same input x, and the two results are added:\u003c/p\u003e\n\u003cdiv id=\"Equb\" class=\"Equation\"\u003e\n\u003cdiv id=\"FileID_Equb\" class=\"mathdisplay\"\u003e$$h={W}_{0}x+\\varDelta Wx={W}_{0}x+BAx$$\u003c/div\u003e\n\u003c/div\u003e\n\u003cp\u003eLoRA introduces minor modifications to the critical cross-attention layers of the Stable Diffusion model. With the pre-trained weights frozen, it injects trainable factorized matrices into the Transformer blocks. The factorized matrices contain significantly fewer parameters than the original model's matrices. As a result, adjusting domain-specific features of the latent diffusion model in practical applications requires a training dataset of only a dozen or so images. This enables high-quality and efficient training of blue calico style models using limited data resources, thus facilitating rapid and flexible implementation of targeted style transfer.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e3.3 Control Model Construction\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eDuring the image generation process of Stable Diffusion, the high generalization capacity of large models often results in a high proportion of stylistically divergent image outputs under identical prompts, making precise control over image composition or object morphology challenging. 
To address this, a research team at Stanford University proposed the \"ControlNet\" architecture, an end-to-end neural network framework that conditions diffusion models on task-specific inputs.\u003c/p\u003e\n\u003cp\u003eControlNet controls the generative process by introducing additional conditioning inputs\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e4\u003c/span\u003e]\u003c/sup\u003e. In style transfer applications, it can use a depth map preprocessor to extract depth maps as control inputs that enforce predefined shapes. The architecture modulates individual neural network blocks (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e5\u003c/span\u003e). Without ControlNet, the diffusion model's native neural network block F maps input x to output y using parameters \u0026Theta;:\u003c/p\u003e\n\u003cdiv id=\"Equc\" class=\"Equation\"\u003e\n\u003cdiv id=\"FileID_Equc\" class=\"mathdisplay\"\u003e$$y=F(x;\\varTheta)$$\u003c/div\u003e\n\u003c/div\u003e\n\u003cp\u003eIn ControlNet (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e6\u003c/span\u003e), the original Stable Diffusion U-Net weights are locked as a \"frozen copy\", the encoder weights are cloned into a \"trainable copy\", and the two are connected via \"zero convolution\" layers. The frozen copy keeps its parameters unchanged throughout training, whereas the trainable copy is fine-tuned on the applied control signals. This architecture confines all adaptation to the trainable module, effectively mitigating the catastrophic forgetting commonly observed in traditional fine-tuning approaches. 
The core operation can be formalized as:\u003c/p\u003e\n\u003cdiv id=\"Equd\" class=\"Equation\"\u003e\n\u003cdiv id=\"FileID_Equd\" class=\"mathdisplay\"\u003e$${y}_{c}=F(x;\\varTheta)+Z\\left(F\\left(x+Z\\left(c;{\\varTheta}_{z1}\\right);{\\varTheta}_{c}\\right);{\\varTheta}_{z2}\\right)$$\u003c/div\u003e\n\u003c/div\u003e\n\u003cp\u003eIn this formulation, \u0026lsquo;Z\u0026rsquo; denotes zero convolution, a 1\u0026times;1 convolution whose weights and biases are initialized to zero. Here, \u0026Theta;z1 and \u0026Theta;z2 are the parameters of the two zero convolution layers, \u0026Theta;c denotes the parameters of the trainable copy, and \u0026lsquo;c\u0026rsquo; is the input control signal. The control signal is transformed by the first zero convolution layer, and the result is added to the original input before being fed into ControlNet's \"trainable copy\" module. The output of that module passes through the second zero convolution layer and is then added to the output features of the original network.\u003c/p\u003e\n\u003cp\u003eAt initialization, all weights and biases of the zero convolution layers are zero, so both zero convolution terms output zero and the ControlNet branch contributes nothing: the output y\u003csub\u003ec\u003c/sub\u003e equals the original output y. This design not only enables flexible integration of control signals but also strictly preserves the knowledge integrity of the original network, laying a reliable foundation for subsequent progressive fine-tuning.\u003c/p\u003e"},{"header":"IV. 
Results","content":"\u003cp\u003e\u003cstrong\u003e4.1 LoRA Model Training and Data Preparation\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e4.1.1 Experimental Environment and Training Parameter Setup\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe experimental environment for model training and image generation is as follows: CPU: 12th Gen Intel\u0026reg; Core\u0026trade; i5-12490F; GPU: NVIDIA GeForce RTX 4060 Ti 8G; System: Windows 10; Programming language: Python. The SD1.5 model is selected as the base model. The training set consists of 60 images with a resolution of 512\u0026times;512 pixels. The batch size is 1, and the maximum number of training epochs (Max Train Epochs) is 30. The learning rate for the U-Net (Unet_lr) is 1e-4 (0.0001), and the learning rate for the text encoder (Text Encoder lr) is 1e-5 (0.00001). The optimizer type is AdamW8. The trained model is saved every 3 epochs.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e4.1.2 Model Training Dataset Material Preparation\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe preceding analysis shows that blue calico is characterized by its blue-and-white color scheme and by the \"disconnected cuts with continuous intent\" expression produced by the \"segmented cutting\" technique. Therefore, when collecting training dataset materials, images that match these characteristics should be prioritized. To improve training precision, the selected images must also be preprocessed. 
Using Photoshop software, the blue calico patterns are preprocessed by segmenting visual elements to minimize interference from unnecessary image information, ensuring pattern accuracy.\u003c/p\u003e\n\u003cp\u003eIn the prompt composition section, centered on aspects such as visual effects, pattern content, and craft characteristics, tags are written for training dataset images using the software dataset tag editor. Prompts are derived through reverse engineering of image materials, with manual adjustment applied to specific tags based on visual analysis.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e4.2 Model Validation and Result Analysis\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e4.2.1 Repeat and Epochs Impact on Transferability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eDuring LoRA model training, the relationship between image repetition in the training dataset and model performance exhibits a non-linear association. Basic theory posits that increased repetition typically enhances model sensitivity to specific samples, while prolonging training cycles (Epochs) drives loss function values toward the minimum, thereby enhancing model robustness.\u003c/p\u003e\n\u003cp\u003eHowever, in LoRA models using the mean squared error loss function, when training epochs exceed the critical threshold, the risk of overfitting increases significantly. As shown in Fig.\u0026nbsp;7, when the Repeat setting for in-dataset images is increased to 20 iterations, the model's learning of blue calico style continues to improve with increasing Epochs. At the Epochs\u0026thinsp;=\u0026thinsp;3 stage, generated images are dominated by the descriptive content of prompt words, with minimal blue calico style information. As training epochs increase, the proportion of pattern style information gradually increases. At Epochs\u0026thinsp;=\u0026thinsp;15, the generated patterns primarily exhibit blue calico style. 
However, when the Repeat setting is increased to 24, the model's recognition of prompt words weakens. At Repeat\u0026thinsp;=\u0026thinsp;30, the model's generalization capability fails, and generated images begin to duplicate training dataset images. This confirms that excessive training overfits the LoRA model, so that further training only pushes it toward replicating its training content.\u003c/p\u003e\n\u003cp\u003eComparative experiments show that, when training the blue calico style model with Repeat\u0026thinsp;=\u0026thinsp;20 for each image, setting Epochs between 12 and 21 fully preserves the \"blue-and-white alternation\" color characteristics and the \"disconnected cuts with continuous intent\" craftsmanship feature, while the model still responds to prompt words and thus enriches the range of blue calico style inspiration samples.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e4.2.2 Influence of Label Input on Style Transfer\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis study uses controlled variables to verify how the keyword input method affects the rendering of generated images, and on that basis identifies keywords that produce images better reflecting blue calico characteristics.\u003c/p\u003e\n\u003cp\u003eDuring the training process, three control model groups (Lan A, Lan B, and Lan C) were established:\u003c/p\u003e\n\u003cp\u003eThe LoRA A model in Lan A added the label word \"huabu\" to all training images while removing other label words, ensuring the uniqueness of \"huabu\" in the training dataset labels; the LoRA B model in Lan B added \"huabu\" to the original label words; and the LoRA C model in Lan C, which served as the experimental control, did not add any label words.\u003c/p\u003e\n\u003cp\u003eIn the generation experiment, the shared prompt words were: \"A bird, spread wings, 
blue and white color scheme, clear structure, indigo cotton background, retro blue printed fabric, hand cut template texture, no human presence." Additionally, all three model groups generated images with the "huabu" trigger label word appended to the shared prompt.

As shown in Fig. 8, when the "huabu" trigger label word was appended for the Lan A and Lan B models, the visual effects and craftsmanship features of blue calico patterns were significantly enhanced in the generated images. The main elements partially replicated the sample characteristics of the dataset and also followed the aesthetic principle of "disconnected lines with continuous meaning" inherent to this craft. In contrast, images generated without the trigger label word retained some blue calico features but also reflected semantic information from the other prompt words, which diluted the craft's distinctive aesthetic appeal.

The control group Lan C did not respond to the "huabu" trigger label word. Although its generated images retained certain blue calico features in color and elements, they reflected the descriptive semantics of the other keywords more flexibly. The experimental results indicate that adding a trigger label word during LoRA training significantly affects the style of generated images, and that whether the label is kept semantically unique also has a notable influence on the generated content. A uniform labeling system strengthens the expression of blue calico style features but reduces the model's generalization flexibility.
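The three labeling schemes compared in this subsection can be sketched as a small captioning helper. This is only an illustration under stated assumptions: the function name `caption_for`, the scheme letters, and the example tags are hypothetical, and Lan C is read here as "original labels kept, no trigger word added", which is one interpretation of the description above.

```python
from typing import Sequence


def caption_for(scheme: str, original_tags: Sequence[str], trigger: str = "huabu") -> str:
    """Build a training caption for one image under a labeling scheme.

    Assumed reading of the three groups (not confirmed by the paper):
      "A": the trigger word is the only label
      "B": the trigger word is added in front of the original labels
      "C": the original labels are left unchanged (no trigger word)
    """
    if scheme == "A":
        tags = [trigger]
    elif scheme == "B":
        tags = [trigger, *original_tags]
    elif scheme == "C":
        tags = list(original_tags)
    else:
        raise ValueError(f"unknown scheme: {scheme!r}")
    return ", ".join(tags)


if __name__ == "__main__":
    example_tags = ["bird", "flower"]  # hypothetical labels for one image
    for scheme in ("A", "B", "C"):
        print(scheme, "->", caption_for(scheme, example_tags))
```

Keeping the scheme as an explicit parameter makes it easy to regenerate all caption files when comparing the three groups on the same image set.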
Such uniform labeling yields better results in specific style-controlled scenarios.

4.2.3 Influence of the Control Model on Style Transfer

The blue calico LoRA model varies in effectiveness across image generation tasks. It represents traditional patterns with simple structures, such as birds, animals, and flowers, well, and the generated results align closely with the intended targets. When generating complex multi-element patterns, however, the model shows insufficient decoupling capability.

Take the pattern of a human figure holding a lotus as an example. When the "huabu" trigger label word and three primary prompt words ("a person, stand, hold a lotus") are input, the image generated after LoRA fine-tuning is shown in Fig. 9. Although the generated images retain blue calico features and basic recognizable forms, detail rendering is significantly deficient. The boundaries of the three motifs fuse together, which reduces subject recognition, and both the aesthetic appeal and the precision of detail are significantly lower than in simple object generation.

To address these issues, ControlNet is introduced to further control the generated patterns. Using depth information extracted from reference images, it regulates the elemental structure and local detail features during blue calico pattern generation. The ControlNet-depth control model offers four computational methods (leres, leres++, midas, and zoe), each emphasizing different aspects of image depth information.
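As a rough illustration of depth-conditioned generation with a fine-tuned LoRA, the sketch below uses the Hugging Face diffusers library. It is not the author's pipeline: the paper's section does not name the toolchain or checkpoints, so `base_model`, `controlnet_model`, `lora_dir`, and `lora_file` are hypothetical parameters, and the depth map is assumed to have been computed beforehand by a depth preprocessor.

```python
from typing import Sequence


def build_prompt(trigger: str, tags: Sequence[str]) -> str:
    """Join the LoRA trigger word and the descriptive prompt words."""
    return ", ".join([trigger, *tags])


def generate_with_depth(prompt, depth_map, base_model, controlnet_model,
                        lora_dir, lora_file, steps=30, device="cuda"):
    """Generate one image conditioned on a text prompt and a depth map.

    depth_map is a PIL image produced by any depth preprocessor.
    All model identifiers and paths are placeholders supplied by the caller.
    """
    # Imported here so build_prompt works without torch/diffusers installed.
    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

    controlnet = ControlNetModel.from_pretrained(
        controlnet_model, torch_dtype=torch.float16)
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        base_model, controlnet=controlnet, torch_dtype=torch.float16)
    # Attach the style LoRA on top of the base Stable Diffusion weights.
    pipe.load_lora_weights(lora_dir, weight_name=lora_file)
    pipe = pipe.to(device)

    result = pipe(prompt, image=depth_map, num_inference_steps=steps,
                  controlnet_conditioning_scale=1.0)
    return result.images[0]


if __name__ == "__main__":
    print(build_prompt("huabu", ["a person", "stand", "hold a lotus"]))
```

The prompt helper reproduces the "huabu, a person, stand, hold a lotus" input used in this subsection; lowering `controlnet_conditioning_scale` would loosen how strictly the depth map constrains the structure.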
As shown in Fig. 10, leres and leres++ yield richer details, while midas and zoe exhibit higher contrast. Higher detail levels correspond to denser structural frameworks, which require more content to be reproduced and may generate extraneous output that is less relevant to the current validation. Therefore, the high-contrast depth_zoe was selected for this validation. As shown in Fig. 11, using "huabu, a person, stand, hold a lotus" and the depth map as joint conditional inputs generates patterns that adhere to the blue calico style while achieving more precise element boundaries and structures. This result validates the effectiveness of ControlNet in controlling image structure and enhancing blue calico style transfer.

V. Conclusion

As a significant representative of China's intangible cultural heritage, Huixie Blue Printing has dual value: research on its digital inheritance and style transfer serves both cultural heritage preservation and technological innovation. Addressing the digital conservation needs of traditional craftsmanship, this study presents an approach to Huixie Blue Printing style transfer via LoRA model fine-tuning. By adapting LoRA (originally designed for large language model fine-tuning) to the Stable Diffusion model, and integrating text prompts, LoRA models, and ControlNet-depth constraints, the proposed method enables stable generation of digital works that harmonize traditional Huixie Blue Printing stylistic features with modern design aesthetics.
This provides a replicable technical framework for the dynamic inheritance of intangible cultural heritage techniques.

Although current intelligent generation systems still face limitations in pattern detail reconstruction, iterative advancements in deep learning will enable higher precision in cultural element extraction and style control. The "AI + intangible cultural heritage" approach is emerging as an effective pathway for the deep integration of technology and craftsmanship, and it will play an increasingly significant role in the digital inheritance of intangible cultural heritage.

Declarations

Author Contributions:

Yu Yong and Jin Hanlin. "Research on Lacquer Art Molding Techniques under the Influence of Digital Technology." Journal of Chinese Lacquer 43.01 (2024): 23-26+52. doi:10.19334/j.cnki.issn.1000-7067.2024.01.006.

Jin Hanlin. "Image Analysis of Yangjiabu 'Theatrical Scenes' New Year Pictures: Taking *Borrowing Arrows with Straw Boats* as an Example." Couplets 29.06 (2023): 40-43.

Data Availability Statement:

All data generated or analyzed during this study are included in this published article.

Conflict of Interest Statement:

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Author:

JIN Hanlin* [email protected]

References

1. Hu, E. J. et al.
LoRA: Low-Rank Adaptation of Large Language Models (2021).
2. Zhang, L. et al. Adding Conditional Control to Diffusion Models with Reinforcement Learning. (2024).
3. CHEN et al. Ancient Printing Technology of Hanfu. J. Cloth. Res. 3 (1), 7 (2018).
4. Karras, T., Laine, S. & Aila, T. A Style-Based Generator Architecture for Generative Adversarial Networks. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, pp. 4396–4405 (2019). doi:10.1109/CVPR.2019.00453
5. Duthé, G. et al. Flexible Multi-Fidelity Framework for Load Estimation of Wind Farms through Graph Neural Networks and Transfer Learning. Data-Centric Eng. 5, e29 (2024).
6. Li Guo, Z., Tiandu & Zhiwei, X. On the Eve of the Technological Revolution: Architectural and Scene Design Innovation under the Wave of Generative AI Tools. Chin. Overseas Archit. 9, 24–28 (2023).
7. Rombach, R. et al. High-Resolution Image Synthesis with Latent Diffusion Models (2021).
8. Ruiz, N. et al. DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation.
(2022).
9. Youcheng, T. A. N. G. et al. Application analysis of automatic batching system for intelligent leather production. Leather Sci. Eng. 34 (2), 30–36 (2024).
10. Zuo Shanghong et al. Natl. Intangible Cult. Herit. Jingchu. Hubei People's Publishing House, 6 (2008).
11. YE You-xin. Shandong Folk Indigo Printed Cloth. Ji'nan, 19 (Shandong Fine Arts Publishing House, 1986).
12. Dosovitskiy, A., Springenberg, J. T. & Brox, T. Learning to generate chairs with convolutional neural networks. CoRR (2014).

Keywords: Artificial Intelligence, Blue Calico, Stable Diffusion, LoRA Model, ControlNet

Posted: May 23rd, 2025
