Fast Tracing Method for Electricity Marketing Sensitive Data Based on ProVOC

Jinkai Sun, Yulu Ren, Junwei Zhang, Xiaofang Chen

Research Article, Research Square. This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6338948/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)

Abstract

The power marketing business system contains a large amount of sensitive data, such as customer information. These data may be leaked or misused as data applications deepen and data are shared. Existing data security auditing schemes each have shortcomings and cannot comprehensively rule out the risk of data leakage.
This paper proposes a fast traceability method for sensitive power marketing data based on the ProVOC model. The method identifies sensitive power marketing data in the data flowing through the network, designs a structured storage model for sensitive data based on China's ProVOC data traceability model standard, and then uses blockchain technology to build a private Ethereum network that generates a blockchain of data flows, reducing storage space and improving the speed of contract generation. In addition, a light node data traceability and trusted verification method based on the Merkle Mountain Range (MMR) is designed to enable fast and reliable verification of sensitive data and accurate querying of complex verification information. The experimental traceability system developed with this method has been demonstrated and verified in a provincial subsidiary of the State Grid Corporation of China. Results show that the proposed method enables accurate and comprehensive traceability and auditing of sensitive business data in a non-intrusive manner, without impacting the business system, achieving an accuracy rate of up to 100%. It also reduces storage pressure on light nodes and significantly improves processing efficiency, meeting the practical demands of production.

Keywords: Sensitive Data, electricity marketing, Blockchain, smart contracts, ProVOC

1. Introduction

The customer-oriented power marketing business system contains a substantial amount of sensitive data, such as customer information, electricity consumption, electricity tariffs, and meter asset details. These sensitive data play a crucial role in the daily operation of electric power enterprises and must be shared across various business departments.
However, in the process of marketing data sharing, the absence of effective access tracking mechanisms, coupled with prevalent information security risks, increases the likelihood of data leakage or misuse. Several traffic-audit solutions for data security are currently on the market, such as database auditing systems, web application auditing systems, and API security auditing systems. Although these systems have improved data security protection to a certain extent, they cannot cover all protocol types, which leaves auditing and monitoring blind spots and a residual risk of data leakage. Furthermore, although solutions based on Network Traffic Analysis (NTA) analyze protocol types comprehensively, they are ineffective at organizing business assets, capture a lot of low-value traffic, and show minimal correlation between their core capabilities and enterprise operations [1–4]. Network Data Loss Protection (NDLP) focuses on behavioral monitoring and analysis, but the protocol types it identifies are not comprehensive enough, and it does not treat data as the core of data security governance [5–7]. These shortcomings prevent interoperability between different security solutions and the formation of a unified data security management capability, which in turn weakens overall security protection. To address the shortcomings of the above approaches, this paper proposes a fast traceability method for sensitive power marketing data based on the Provenance Vocabulary Model (ProVOC). The method addresses missing access tracking and information security risks during marketing data sharing and enables full-link control and tracking of sensitive data, ensuring both security and integrity throughout data transmission.
Since its development in the 1990s, data traceability technology has gradually become an important research direction in data management. Its core value lies in recording the evolution of data and the specific processes applied to it throughout its life cycle, ensuring its completeness, accuracy, and credibility [8–10]. The method used to describe data traceability information is the foundation of data traceability management. Internationally, the W3 model proposed by P. Buneman et al. [11] and the W7 model further extended by S. Ram et al. [12] provide the theoretical basis for expressing data traceability information. China is at the forefront of data traceability research and has established the national standard "Information Technology - Data Traceability Description Model" (GB/T 34945-2017), which proposes a data traceability description model, ProVOC, for standardizing the expression of data traceability information and providing a structured framework for the practice of data traceability [13]. However, traditional data traceability methods continue to face significant challenges in practical applications, for example high storage and management costs for traceability information, low efficiency in traceability queries and verification, and insufficient standardization and interoperability. To address these problems, researchers are exploring new technologies and methodologies, and blockchain technology has become a popular choice for data traceability because of its decentralization, immutability, and inherent traceability. Blockchain technology enhances data security and integrity during transmission and storage through distributed ledgers and consensus mechanisms [14]. In data traceability, blockchain technology can record every data operation, creating an immutable data chain that makes the source, transmission path, and final destination of the data clearly visible.
These attributes make blockchain one of the ideal technologies for implementing data traceability.

2. A fast traceability method for sensitive business data based on ProVOC modeling

2.1. Processing (methodology)

Building on existing research on blockchain-based data traceability methods, and taking into account the characteristics of sensitive business data in the electric power marketing system and its flow across business departments, this paper designs the Fast Tracing Method for Electricity Marketing Sensitive Data Based on the ProVOC Model. This method enables full-link control and traceability of sensitive marketing business data. The process is shown in Fig. 1. Sensitive data within the marketing business application system, such as customer information, electricity volume, electricity bills, OEI, and meter asset information, are captured, parsed, and identified, and a structured model based on the ProVOC model is proposed. This model establishes a Power Marketing Sensitive Information Storage Model that aligns with the ProVOC standard and stores the traceability information. To enhance the efficiency of trusted verification, we introduce a Light Node Data Traceability Trusted Verification Method based on the Merkle Mountain Range (MMR). This method improves verification efficiency, provides traceability functions in the event of data security risks, and enables comprehensive auditing of sensitive data operations within the marketing department.

2.2. Structured Model Based on ProVOC

The volume of power marketing business data that needs to be shared is so large that it is impractical to apply blockchain technology to all of it, hashing every item and saving its credentials. Although blockchain can ensure traceability and enhance security, the cost is very high.
A substantial portion of the data in power marketing is non-sensitive information. It accounts for a large share of the total data and increases the complexity of technical processing, which lengthens processing times and rapidly expands storage requirements. Therefore, this paper proposes a structured storage model based on the ProVOC model (Structured Model Based on ProVOC). Integrated with the marketing business, it performs sensitivity analysis on the shared marketing data to identify business-sensitive data. Based on the ProVOC model standard, it then designs a data traceability description model for the sensitive business data of electric power marketing and stores the traceability information.

2.3. Data Traceability Trusted Verification Method with Light Node Based on MMR (Merkle Mountain Range)

For the traceability description information generated by the structured storage model based on the ProVOC model, this paper applies blockchain technology to design a Data Traceability Trusted Verification Method with Light Node Based on the Merkle Mountain Range (MMR). The method establishes a private Ethereum network, constructs light nodes, and introduces the Merkle Mountain Range commitment mechanism into the traditional Ethereum framework [15, 16]. This reduces the storage space required for querying and verifying the traceability information, improves the storage efficiency of the data traceability model, and makes the traceability model more lightweight. The method also establishes a Data Traceability Information Verification Mechanism, which enables rapid tracking and validation of sensitive data in the marketing business and ensures data security. Light nodes are a widely used practical technique in blockchain.
Building a traceability light node independently of the blockchain network removes the need to maintain blockchain information locally. The light node accesses the blockchain network for information synchronization only when querying, uploading, or verifying the final traceability information. Traditional light nodes must maintain complete block header information locally when querying and verifying the trustworthiness and integrity of data traceability information, so they occupy a large amount of storage and memory. To overcome this drawback, the method proposed above (Data Traceability Trusted Verification Method with Light Node Based on MMR) allows light nodes to verify the final traceability information stored on the blockchain without maintaining detailed block header data, greatly reducing the storage pressure on light nodes. To enable light nodes to verify data traceability programmatically and locally, Merkle mountain range proofs are introduced. MMR is a variant of the Merkle tree proposed by Peter Todd. As in a Merkle tree, the value of every non-leaf node in a Merkle mountain range is a hash of its left and right child nodes. A Merkle mountain range can therefore provide Merkle mountain range proofs (MMR proofs), which, like Merkle proofs, prove whether a leaf node exists in the structure. However, unlike a Merkle tree, which is a perfect binary tree, a Merkle mountain range consists of multiple perfect binary trees, so the overall structure need not be perfect. MMR is append-only: once nodes are inserted they remain unchanged, and new insertions do not require reconstructing the entire structure. This makes MMR particularly suitable for committing to all the block headers of a blockchain, as blockchain blocks are also appended sequentially.
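The append-only behaviour described above can be illustrated with a minimal sketch. It is not the authors' implementation: SHA-256 stands in for Ethereum's hash function, the class name `MerkleMountainRange`, the domain-separation prefixes, and the right-to-left "bagging" of peaks into one root are all assumptions chosen for illustration.

```python
import hashlib
from typing import List, Tuple


def h(data: bytes) -> bytes:
    """SHA-256 is used here only as a stand-in hash function."""
    return hashlib.sha256(data).digest()


class MerkleMountainRange:
    """Append-only MMR that keeps only the current peaks (hypothetical sketch)."""

    def __init__(self) -> None:
        # Peaks from left to right as (height, hash); heights strictly decrease.
        self.peaks: List[Tuple[int, bytes]] = []
        self.leaf_count = 0

    def append(self, leaf: bytes) -> None:
        node = (0, h(b"\x00" + leaf))
        # Two neighbouring mountains of equal height merge into a taller one.
        while self.peaks and self.peaks[-1][0] == node[0]:
            height, left = self.peaks.pop()
            node = (height + 1, h(b"\x01" + left + node[1]))
        self.peaks.append(node)
        self.leaf_count += 1

    def root(self) -> bytes:
        """Fold the peaks from right to left into a single commitment."""
        if not self.peaks:
            return h(b"")
        acc = self.peaks[-1][1]
        for _, peak in reversed(self.peaks[:-1]):
            acc = h(b"\x01" + peak + acc)
        return acc


mmr = MerkleMountainRange()
for i in range(5):
    mmr.append(f"block-header-{i}".encode())
# Five leaves form one mountain of four leaves plus a single-leaf mountain.
print([height for height, _ in mmr.peaks])  # [2, 0]
print(mmr.root().hex())
```

Appending a leaf touches only the rightmost peaks, so earlier mountains are never rebuilt, which matches the sequential growth of a block header chain.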
By constructing an MMR whose leaf nodes are all the block headers, it is possible to prove that a block exists in the blockchain using the MMR proof mechanism. In this paper, light node verification of traceability information is divided into two parts: the transaction inclusion proof and the block inclusion proof. The block inclusion proof uses the Merkle mountain range to commit to the blocks in the blockchain, so that whether a specific block exists in the blockchain can be verified efficiently. The transaction inclusion proof uses a Merkle tree to commit to the transactions within a block, proving whether a transaction exists in that block. In a data traceability system, the transaction inclusion proof uses Merkle proofs to determine whether a piece of traceability data exists in a Merkle tree, and thus whether it exists within the block that contains that Merkle tree. The purpose of a block inclusion proof is to prove that a block is included in the blockchain. In traditional Simplified Payment Verification (SPV), block inclusion is proven by downloading the complete block header information from the genesis block to the latest block. The verification process then checks whether the block at the specified height matches the block to be verified. If they are equal, the block to be verified is included in the blockchain. However, this verification method requires maintaining the complete block header information and consumes too many resources. To address this issue, this paper introduces the Merkle mountain range commitment method to reduce storage requirements. Specifically, an MMR root hash is added to the Ethereum block header.
All the block headers from the genesis block up to the block preceding the latest block are combined to form the MMR, and the root hash of this commitment is written into the root hash field of the latest block header. As a result, any change in the content of any previous block header changes the root hash. Once the block headers are committed in this way, MMR proofs can be used to efficiently verify the existence of a block within the blockchain. The query verification process of the MMR-based trusted verification method for light node data traceability is shown in Fig. 2. The method ensures that a light node does not need to store all the block header information when verifying the credibility and integrity of data traceability information. Instead, it only needs to send a lookup request to the blockchain network. Upon receiving the lookup request from the light node, a full node in the blockchain network searches for the corresponding traceability information and generates both a transaction existence proof and a block existence proof. The full node then packages the data traceability information, the block existence proof, and the transaction existence proof into a data traceability packet, which is returned to the user who initiated the request. When initiating the lookup, the user must synchronize the latest block header from the blockchain network to obtain the latest MMR root hash, which is used for the final block inclusion validation. Upon receiving the traceability packet from the full node, the light node evaluates the transaction existence proof and the block existence proof it contains.
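The verification step performed by the light node can be sketched as follows. This is an illustrative reading of the process, not the authors' code: the `TracePacket` fields, the proof encoding as (sibling hash, sibling-is-left) pairs, the use of SHA-256, and the toy rule that a header begins with its transaction root are all assumptions.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional, Tuple

Path = List[Tuple[bytes, bool]]  # (sibling hash, True if the sibling is on the left)


def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def climb(leaf: bytes, path: Path) -> bytes:
    """Recompute the tree root (or mountain peak) above a leaf from its sibling path."""
    acc = leaf_hash(leaf)
    for sibling, sibling_is_left in path:
        acc = node_hash(sibling, acc) if sibling_is_left else node_hash(acc, sibling)
    return acc


@dataclass
class TracePacket:  # hypothetical packet layout
    record: bytes               # serialized traceability information
    tx_path: Path               # Merkle path from the record to the transaction root
    tx_root: bytes              # transaction root committed in the block header
    header: bytes               # serialized header of the block holding the record
    header_path: Path           # path from the header to the peak of its mountain
    left_peaks: List[bytes]     # peaks to the left of that mountain, left to right
    right_bag: Optional[bytes]  # folded hash of all peaks to the right, if any


def verify_packet(packet: TracePacket, trusted_mmr_root: bytes) -> bool:
    # 1. Transaction inclusion: the record must hash up to the transaction root.
    if climb(packet.record, packet.tx_path) != packet.tx_root:
        return False
    # Toy binding between header and transaction root (assumption of this sketch).
    if not packet.header.startswith(packet.tx_root):
        return False
    # 2. Block inclusion: the header must hash up to the trusted MMR root.
    acc = climb(packet.header, packet.header_path)
    if packet.right_bag is not None:
        acc = node_hash(acc, packet.right_bag)
    for peak in reversed(packet.left_peaks):
        acc = node_hash(peak, acc)
    return acc == trusted_mmr_root


# Toy chain: block 1 holds two records; three headers are committed in the MMR.
records = [b'{"doc":"trace-1"}', b'{"doc":"trace-2"}']
tx_root = node_hash(leaf_hash(records[0]), leaf_hash(records[1]))
headers = [b"header-0", tx_root + b"|height=1", b"header-2"]
peak0 = node_hash(leaf_hash(headers[0]), leaf_hash(headers[1]))
peak1 = leaf_hash(headers[2])
mmr_root = node_hash(peak0, peak1)  # obtained from the latest synchronized header

packet = TracePacket(
    record=records[1],
    tx_path=[(leaf_hash(records[0]), True)],
    tx_root=tx_root,
    header=headers[1],
    header_path=[(leaf_hash(headers[0]), True)],
    left_peaks=[],
    right_bag=peak1,
)
print(verify_packet(packet, mmr_root))  # True
```

The light node needs only the packet and one trusted root hash; the proof sizes grow with the logarithm of the number of transactions and blocks rather than with the full header chain.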
Once the light node receives the traceability packet, it can verify the trustworthiness and integrity of the data using the provided proofs, without storing the full blockchain.

3. Application Scenario Building

The above methodology can be demonstrated by constructing a corresponding system within an electric utility and conducting a pilot validation.

3.1. System architecture

The system follows a layered design, as shown in Fig. 3. It is divided into three layers: the data traceability information storage layer, the ProVOC layer, and the data traceability transaction layer.

3.1.1. Data Traceability Storage Layer

In the data traceability system, the traceability information must be stored in both the blockchain and a relational database. The storage layer provides a persistent storage environment for the data traceability information. Serialized traceability information is stored on the blockchain to support integrity and trustworthiness verification. The system uses Ethereum, which is widely adopted in industry, as its underlying blockchain technology and adds a Merkle mountain range commitment mechanism to it, thus reducing the storage space light nodes need to verify the traceability information. The storage layer is designed around a blockchain-based data traceability storage model. The structure is shown in Fig. 4.

Ethereum

The Merkle Mountain Range (MMR) commitment mechanism is introduced into the Ethereum framework to reduce the storage resources required by light nodes for verifying data traceability information. To achieve this, the system employs two proof mechanisms: MMR proof and Merkle proof.
Integrating the Merkle Mountain Range commitment mechanism into Ethereum requires changes in three key areas: (1) adding a root hash (Root) field to the block header for MMR integration; (2) modifying the block encapsulation process so that miners calculate and include the latest Root field when generating new blocks; and (3) adding a Root field validity check to Ethereum's block validation mechanism.

Database

The system provides a fast and efficient query service for data traceability information. It establishes database tables based on the ProVOC model and stores the traceability information described by the ProVOC model in a structured format within a relational database. This approach supports the query requirements of the data traceability business. The database stores the basic components of the ProVOC model, including relationships, traceability documents, and the storage locations of traceability data in the blockchain, along with other relevant metadata.

3.1.2. ProVOC Layer

At this layer, the ProVOC model transforms the data traceability information into serialized traceability information and structured traceability information. Structured traceability information is stored in the database, and serialized traceability information is stored in the blockchain. The traceability system integrates the structured modeling and serialization schemes of the existing ProVOC model into the storage model. First, the traceability information uploaded by the user is converted to JSON format by the serialization module of the ProVOC model. This data is then uploaded to the blockchain, while the system monitors the blockchain in real time for new traceability records.
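A minimal sketch of the serialization step follows. The paper does not list the fields of its ProVOC documents, so the class names `Component`, `Relation`, and `TraceDocument`, the `kind` labels, and the attribute keys are hypothetical; the only point illustrated is a deterministic JSON encoding whose digest can be anchored on chain.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class Component:
    id: str
    kind: str  # hypothetical label, e.g. "entity", "activity", "agent"
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Relation:
    kind: str    # hypothetical label, e.g. "used", "generated_by"
    source: str  # id of the source component
    target: str  # id of the target component


@dataclass
class TraceDocument:
    doc_id: str
    components: List[Component] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)

    def to_json(self) -> str:
        # Sorted keys and fixed separators make the encoding deterministic,
        # so the same document always yields the same digest.
        return json.dumps(asdict(self), sort_keys=True,
                          separators=(",", ":"), ensure_ascii=False)

    def digest(self) -> str:
        return hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()


doc = TraceDocument(
    doc_id="doc-001",
    components=[
        Component("e1", "entity", {"field": "customer_name"}),
        Component("a1", "activity", {"operation": "query"}),
    ],
    relations=[Relation("used", "a1", "e1")],
)
print(doc.to_json())
print(doc.digest())
```

A canonical encoding matters here because the database replica and the on-chain record can only be compared reliably if both sides serialize a document to identical bytes.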
Once the system detects and confirms that new traceability information has been packaged into a block and written to the blockchain, the structured modeling module of the ProVOC model converts the traceability information into structured data, which is then stored in a relational database. The following describes how data traceability information is uploaded to the chain and how it is entered into the database.

Data traceability information on the chain

Uploading traceability information means writing the information obtained from data traceability to the blockchain for anchoring. Essentially, this process writes the traceability information into blockchain blocks as part of the transaction data. The upload proceeds as follows. First, the traceability information is serialized into JSON format using the ProVOC model. Then, it is recorded on the blockchain through a series of smart contracts. Given the lightweight requirements, this paper uses the blockchain as a trusted storage environment for traceability information and does not use smart contracts to handle complex business logic. The smart contracts therefore contain only the logic for putting traceability information on the chain.

Data Traceability Information Entry

The data traceability information is also stored in a database to facilitate complex operations, with the database serving as a replica of the traceability information recorded on the blockchain. The process of storing data traceability information in the database is shown in Fig. 5. When the data traceability system detects new traceability information written to the blockchain network, it first retrieves the newly recorded data. The system then structures this information using the ProVOC model and stores it in the database.
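The database entry step can be sketched with an in-memory SQLite database. The table layout, column names, and the JSON document shape below are assumptions for illustration; the paper only states that components, relationships, traceability documents, and on-chain storage locations are stored.

```python
import json
import sqlite3

# Hypothetical schema: one row per document, component, and relation.
SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    doc_id TEXT PRIMARY KEY,
    block_number INTEGER NOT NULL,
    tx_hash TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS components (
    doc_id TEXT NOT NULL,
    component_id TEXT NOT NULL,
    kind TEXT NOT NULL,
    attributes TEXT NOT NULL,
    PRIMARY KEY (doc_id, component_id)
);
CREATE TABLE IF NOT EXISTS relations (
    doc_id TEXT NOT NULL,
    kind TEXT NOT NULL,
    source TEXT NOT NULL,
    target TEXT NOT NULL
);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store_document(conn: sqlite3.Connection, raw_json: str,
                   block_number: int, tx_hash: str) -> None:
    """Structure one serialized document read from the chain and store it."""
    doc = json.loads(raw_json)
    doc_id = doc["doc_id"]
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("INSERT OR REPLACE INTO documents VALUES (?, ?, ?)",
                     (doc_id, block_number, tx_hash))
        # Clearing old rows first makes repeated synchronization idempotent.
        conn.execute("DELETE FROM components WHERE doc_id = ?", (doc_id,))
        conn.execute("DELETE FROM relations WHERE doc_id = ?", (doc_id,))
        conn.executemany(
            "INSERT INTO components VALUES (?, ?, ?, ?)",
            [(doc_id, c["id"], c["kind"],
              json.dumps(c.get("attributes", {}), sort_keys=True))
             for c in doc.get("components", [])])
        conn.executemany(
            "INSERT INTO relations VALUES (?, ?, ?, ?)",
            [(doc_id, r["kind"], r["source"], r["target"])
             for r in doc.get("relations", [])])


conn = open_store()
raw = json.dumps({
    "doc_id": "doc-001",
    "components": [{"id": "e1", "kind": "entity"},
                   {"id": "a1", "kind": "activity"}],
    "relations": [{"kind": "used", "source": "a1", "target": "e1"}],
})
store_document(conn, raw, block_number=7, tx_hash="0xabc")  # placeholder hash
print(conn.execute("SELECT doc_id, block_number FROM documents").fetchall())
```

Keeping the block number and transaction hash next to each document lets a later query point back to the exact on-chain record used for trusted verification.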
The database records the complete traceability information, including all traceability components and their relationships, to support complex user queries.

3.1.3. Data Traceability Transaction Layer

This layer provides essential business functions to end users by encapsulating on-chain upload, query, and verification operations within dedicated functional modules. These modules are exposed to the upper-level data traceability system through APIs. The system offers four primary functional modules: the login module, the data traceability submission module, the data traceability query module, and the user management module. Among these, the data traceability submission and query modules are the core of the system.

Data Traceability Submission Module

The main function of this module is to upload data traceability information to the blockchain and synchronize it to the database of the traceability system. The submission process works as follows. A data traceability record is a traceability document composed of ProVOC model components and relationships.
When submitting data traceability information, the user first creates a traceability document as a container for ProVOC model components and relationships, and then uploads the components and relationships in turn. The system serializes the traceability information into JSON (a lightweight data interchange format that is easy to exchange between programming languages) and returns it to the user. Next, the user logs into an Ethereum wallet, authorizes the traceability application, and invokes the smart contract through the wallet account to submit the traceability information. When the system detects that the traceability information has been written to the blockchain, it synchronizes the information to the database, which facilitates later generation of the final data traceability verification package.

Data Traceability Query Module

The main function of this module is to return the data traceability information that meets the query conditions. Because the blockchain is not structurally suited to complex lookup operations, queries are served from a database, which stores the structured form of the on-chain traceability information. Whenever the system receives new traceability data, the module synchronizes it with the database. However, because of network delays and other factors, the traceability data in the database may not always be current. To ensure that query results are up to date, the module verifies before each query that the latest traceability information in the database matches that in the blockchain. If the local database is found to be inconsistent, the most recent traceability data is retrieved from the blockchain and synchronized to the database.
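The check-then-query behaviour of the query module can be sketched as below. The paper does not say how consistency is tested, so comparing the last synchronized block number with the chain head is an assumption, and `ChainClient`, `InMemoryChain`, `LocalStore`, `sync`, and `query` are hypothetical names; `InMemoryChain` exists only to make the sketch runnable without a node.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol


@dataclass(frozen=True)
class ChainRecord:
    block_number: int
    doc_id: str
    payload: dict


class ChainClient(Protocol):
    def latest_block(self) -> int: ...
    def records_after(self, block_number: int) -> List[ChainRecord]: ...


class InMemoryChain:
    """Stand-in for a blockchain node connection."""

    def __init__(self) -> None:
        self.records: List[ChainRecord] = []

    def add(self, record: ChainRecord) -> None:
        self.records.append(record)

    def latest_block(self) -> int:
        return max((r.block_number for r in self.records), default=0)

    def records_after(self, block_number: int) -> List[ChainRecord]:
        return [r for r in self.records if r.block_number > block_number]


class LocalStore:
    """Stand-in for the relational database replica."""

    def __init__(self) -> None:
        self.by_id: Dict[str, ChainRecord] = {}
        self.synced_block = 0


def sync(store: LocalStore, chain: ChainClient) -> int:
    """Bring the replica up to the chain head; return the number of records pulled."""
    head = chain.latest_block()
    if store.synced_block >= head:
        return 0
    missing = chain.records_after(store.synced_block)
    for record in missing:
        store.by_id[record.doc_id] = record
    store.synced_block = head
    return len(missing)


def query(store: LocalStore, chain: ChainClient,
          predicate: Callable[[dict], bool]) -> str:
    """Synchronize first, then filter the replica and return JSON."""
    sync(store, chain)
    hits = [r.payload for r in store.by_id.values() if predicate(r.payload)]
    return json.dumps(hits, sort_keys=True)


chain = InMemoryChain()
store = LocalStore()
chain.add(ChainRecord(1, "doc-001", {"dept": "billing", "field": "tariff"}))
chain.add(ChainRecord(2, "doc-002", {"dept": "metering", "field": "meter_id"}))
print(query(store, chain, lambda p: p["dept"] == "billing"))
```

Running the synchronization inside `query` means a stale replica is repaired before any result is returned, at the cost of one extra round trip to the chain per query.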
Once data consistency is ensured, the query is performed based on the user's criteria, and the results are converted into JSON format and returned to the user. The process is illustrated in Fig. 6.

3.2. Experiment

The developed traceability system for sensitive power marketing business data is an efficient, non-intrusive solution that captures marketing-related traffic by deploying physical switches to mirror traffic at network exit nodes. This approach avoids interference with business systems and databases, so the network structure and business operations remain unaffected. Using traffic filtering and content parsing technologies, the system captures only network traffic related to the marketing business, reducing the burden of processing non-essential data. Through deep packet inspection, the system analyzes the captured packets to automatically identify and extract sensitive power marketing data, such as customer information, power consumption, electricity tariffs, and meter asset information, achieving 100% identification of sensitive information. Based on the system's ProVOC model, traceability information compliant with China's ProVOC standard is generated with a 100% qualification rate. The generated information is securely stored in both the blockchain and the database, giving a 100% traceability accuracy rate in experiments on sensitive data that had undergone multiple transmissions. In addition, a customizable query function for traceability information is implemented, combining the credibility of data traceability with functional richness. With respect to the frequent, large-scale maintenance of block header information required by traditional Ethereum, practical verification shows that the system developed with the method proposed in this paper outperforms traditional Ethereum processing.
However, the extent of the performance improvement depends on factors such as the number of distributed nodes, the transaction frequency, and the volume of sensitive data, and therefore varies considerably. Overall, this comprehensive data security monitoring system offers non-intrusive deployment, intelligent identification and parsing, real-time warning and response, and full traceability and auditing, providing robust data security for the marketing department. It effectively reduces the risk of data leakage while improving the response speed and processing efficiency for security incidents, thereby laying a strong foundation for enterprise data security.

4. Conclusion

This paper proposes a fast traceability method for sensitive power marketing data based on the ProVOC model, addressing the questions of which data traceability information needs to be recorded and in what format it should be stored. The developed storage model stores traceability information in both the blockchain and a relational database. The blockchain stores data traceability information for trusted verification, while the database handles complex traceability queries, giving a hybrid approach that offers different forms of data traceability for different scenarios. This method is better aligned with the practical needs of production. Traditional light nodes, when querying and verifying the trustworthiness and integrity of data traceability information, must locally maintain complete block header information, consuming significant storage resources. The method proposed in this paper removes the need for light nodes to store complete block header information, thereby significantly reducing storage pressure and improving the efficiency of the overall system.

Declarations

Funding Declaration

This work was supported by the Science and Technology Project of the State Grid Shanxi Electric Power Company (Grant No.
52051L230005), titled "Research and Application of Full Link Control and Tracking Technology for Sensitive Business Data Based on Fluorescent Labeling". The authors declare no competing financial interests or personal relationships that could influence the work reported in this paper.

Author Contribution

J.S., Y.R., and J.Z. wrote the main manuscript text, and X.C. prepared Figures 1-7. All authors reviewed the manuscript.

Acknowledgment

This paper is supported by the Science and Technology Project of the State Grid Shanxi Electric Power Company under Grant No. 52051L230005, named "Research and Application of Full Link Control and Tracking Technology for Sensitive Business Data Based on Fluorescent Labeling".

Data Availability

Our data are currently unavailable for public use because they contain a large amount of private user information.

References

[1] Jiachen Zhang, Daoqi Han, Zhaoxuan Lv, Yueming Lu, Junke Duan, Yang Liu & Xinyu Zhang. (2025). Bag2image: a multi-instance network traffic representation for network security event prediction. Cybersecurity, 8(1), 31.
[2] Dinglin Gu, Jian Zhang, Zhangguo Tang, Qizhen Li, Min Zhu, Hao Yan & Huanzhou Li. (2024). IoT device identification based on network traffic. Wireless Networks, 31(2), 1-17.
[3] Olusola Olabanjo, Ashiribo Wusu, Edwin Aigbokhan, Olufemi Olabanjo, Oseni Afisi & Boluwaji Akinnuwesi. (2024). A novel graph convolutional networks model for an intelligent network traffic analysis and classification. International Journal of Information Technology, (prepublish), 1-13.
[4] Muneeb Hassan Khan, Abdul Rehman Javed, Zafar Iqbal, Muhammad Asim & Ali Ismail Awad. (2024). DivaCAN: Detecting in-vehicle intrusion attacks on a controller area network using ensemble learning. Computers & Security, 139, 103712.
[5] Bhanu Priyanka Valluri & Nitin Sharma. (2024). Trusted head node for Node Behaviour Analysis for malicious node detection in wireless sensor networks. Measurement: Sensors, 36, 101159.
[6] Niala den Braber, Carlijn I R Braem, Miriam M R Vollenbroek Hutten, Hermie J Hermens, Thomas Urgert, Utku S Yavuz... & Gozewijn D Laverman. (2024). Consequences of Data Loss on Clinical Decision-Making in Continuous Glucose Monitoring: Retrospective Cohort Study. Interactive Journal of Medical Research, 13, e50849.
[7] Xiao Wang, Jianwei Xia, Jun e Feng & Shihua Fu. (2024). Robust stability of Boolean networks with data loss and disturbance inputs. Neural Networks, 179, 106504.
[8] Zhiyuan Wang, Mingan Gao & Gehao Lu. (2024). Research on Oracle Technology Based on Multi-Threshold Aggregate Signature Algorithm and Enhanced Trustworthy Oracle Reputation Mechanism. Sensors, 24(2).
[9] Muhammad Bin Saif, Sara Migliorini & Fausto Spoto. (2024). Efficient and Secure Distributed Data Storage and Retrieval Using Interplanetary File System and Blockchain. Future Internet, 16(3), 98.
[10] Zhihua Cheng, Lingchao Gao, Zhouchun Lei, Zhenyu Chen, Xiangzhou Chen, Jiakai Wang & Jiasong Sun. (2020). Data LifeCycle Management Method Research Based on Traceability Technology.
[11] A. Ohori & P. Buneman. (1989). Static type inference for parametric classes. Department of Computer and Information Science, University of Pennsylvania, 200 South 33rd Street, Philadelphia, PA.
[12] S. Ram & J. Liu. (2009). A New Perspective on Semantics of Data Provenance. International Conference on Semantic Web in Provenance Management. CEUR-WS.org.
[13] Russell Miller, Harvey Whelan, Michael Chrubasik, David Whittaker, Paul Duncan & João Gregório. (2024). A Framework for Current and New Data Quality Dimensions: An Overview. Data, 9(12), 151.
[14] Vipina Valsan, Naga Sushanth Kumar Vuppala, Sri Sai Harshith Koganti, Likhit Sai Eswar Kalla, Kumar Aditya Pappala, Kanakasabapathy P. & Maneesha V. Ramesh. (2025). Conceptual study: Artificial intelligence-integrated blockchain micromarkets for sustainable energy. Renewable and Sustainable Energy Reviews, 214, 115482.
[15] Tran Thai Hoa, Thanh Manh Le & Cuong H. Nguyen Dinh. (2025). Hybrid model of 1D-CNN and LSTM for forecasting Ethereum closing prices: a case study of temporal analysis. International Journal of Information Technology, (prepublish), 1-13.
[16] Oleksandr Kuznetsov, Emanuele Frontoni, Kateryna Kuznetsova & Marco Arnesano. (2025). Optimizing Merkle Proof Size Through Path Length Analysis: A Probabilistic Framework for Efficient Blockchain State Verification. Future Internet, 17(2), 72.

Additional Declarations

No competing interests reported.
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-6338948","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":453242367,"identity":"dd5ebdf4-b426-482b-8dc2-3f51b186de6c","order_by":0,"name":"Jinkai Sun","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA/UlEQVRIiWNgGAWjYDADAyA+8KHinxwDMwlaGB/OOHPAmCQtzMa8bQcSGwipNGc/fOwxD4OdvblE7jNpHrY76fPbeQ9+YKixicalxbInLd2YhyE5ceeMdDPJOTzPcjcc5kuWYDiWlovLOoMbPGbSPAzMCQY30tgk3kgw525g5jGQYGw4jEcL/zeglnp7sBYeA+Z0+WYe4x/4tfCwAbUcZtxwI43ZkCfhcALDYR4z/LacSTM3nMNwPHHDmWfAQD6QZrgBqMUiAZ9fjh9+9uANQ7W9wfE0hgMf/9nIy/efMb7xocYGpxYgYGNg/IculoBbOUTLKBgFo2AUjAK8AAC+0FXdeQv2agAAAABJRU5ErkJggg==","orcid":"","institution":"Marketing Service Center of State Grid Shanxi Electric Power Company","correspondingAuthor":true,"prefix":"","firstName":"Jinkai","middleName":"","lastName":"Sun","suffix":""},{"id":453242368,"identity":"f185e798-b87b-463a-be89-6287214a2be3","order_by":1,"name":"Yulu Ren","email":"","orcid":"","institution":"Marketing Service Center of State Grid Shanxi Electric Power Company","correspondingAuthor":false,"prefix":"","firstName":"Yulu","middleName":"","lastName":"Ren","suffix":""},{"id":453242369,"identity":"457d450d-b563-4ec6-8ead-8c0546bd17a2","order_by":2,"name":"Junwei Zhang","email":"","orcid":"","institution":"Marketing Service Center of State Grid Shanxi Electric Power 
Company","correspondingAuthor":false,"prefix":"","firstName":"Junwei","middleName":"","lastName":"Zhang","suffix":""},{"id":453242371,"identity":"3cee52f0-6702-45bc-bc6a-962231fc3f77","order_by":3,"name":"Xiaofang Chen","email":"","orcid":"","institution":"Marketing Service Center of State Grid Shanxi Electric Power Company","correspondingAuthor":false,"prefix":"","firstName":"Xiaofang","middleName":"","lastName":"Chen","suffix":""}],"badges":[],"createdAt":"2025-03-30 14:23:16","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6338948/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6338948/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":82532894,"identity":"e84bb101-bc21-4238-82e5-a3df56b3ff98","added_by":"auto","created_at":"2025-05-12 15:05:03","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":103700,"visible":true,"origin":"","legend":"\u003cp\u003eFast Tracing Method for Electricity Marketing Sensitive Data Based on ProVOC Processing Flowchart\u003c/p\u003e","description":"","filename":"image1.png","url":"https://assets-eu.researchsquare.com/files/rs-6338948/v1/ce62d1881865d96defc45b10.png"},{"id":82534287,"identity":"eebde83b-cefb-464b-a8ef-0ee9677f4228","added_by":"auto","created_at":"2025-05-12 15:21:03","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":78249,"visible":true,"origin":"","legend":"\u003cp\u003eData Traceability Trusted Query Program Flow\u003c/p\u003e","description":"","filename":"image2.png","url":"https://assets-eu.researchsquare.com/files/rs-6338948/v1/968fd59f6f20a7cea6a56834.png"},{"id":82532896,"identity":"d623efa3-a2b6-41a0-83c4-220e5ff956c2","added_by":"auto","created_at":"2025-05-12 15:05:03","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":120283,"visible":true,"origin":"","legend":"\u003cp\u003eSystem 
Functional Architecture\u003c/p\u003e","description":"","filename":"image3.png","url":"https://assets-eu.researchsquare.com/files/rs-6338948/v1/229c25247335197f59617042.png"},{"id":82532899,"identity":"a760179c-6d9f-4960-9cf4-3b7302131b8e","added_by":"auto","created_at":"2025-05-12 15:05:03","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":110882,"visible":true,"origin":"","legend":"\u003cp\u003eEthereum structural improvements\u003c/p\u003e","description":"","filename":"image4.png","url":"https://assets-eu.researchsquare.com/files/rs-6338948/v1/78820e4428b81abf8c5177bd.png"},{"id":82533988,"identity":"34c2494d-6921-4c02-b969-8e2c244f4269","added_by":"auto","created_at":"2025-05-12 15:13:03","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":37986,"visible":true,"origin":"","legend":"\u003cp\u003eData Traceability Final Information Entry\u003c/p\u003e","description":"","filename":"image5.png","url":"https://assets-eu.researchsquare.com/files/rs-6338948/v1/25df13daf16be68d89a750cc.png"},{"id":82534288,"identity":"e6ac360c-990b-485e-b245-f342dff29a75","added_by":"auto","created_at":"2025-05-12 15:21:03","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":106092,"visible":true,"origin":"","legend":"\u003cp\u003eFlowchart of traceability information query\u003c/p\u003e","description":"","filename":"image6.png","url":"https://assets-eu.researchsquare.com/files/rs-6338948/v1/f9ef8fb9de55babb496c84e5.png"},{"id":82532916,"identity":"a59094bc-5be6-40af-a719-74d4eaf73b00","added_by":"auto","created_at":"2025-05-12 15:05:04","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":50976,"visible":true,"origin":"","legend":"\u003cp\u003eDeployment 
diagram\u003c/p\u003e","description":"","filename":"image7.png","url":"https://assets-eu.researchsquare.com/files/rs-6338948/v1/43b3a16444908a9f784d533c.png"},{"id":105190294,"identity":"eecd30a2-f59f-4dc8-8671-a916e18f5d97","added_by":"auto","created_at":"2026-03-23 09:14:15","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1194092,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6338948/v1/b386a4fe-1b30-4465-b1a9-a75559e226df.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Fast Tracing Method for Electricity Marketing Sensitive Data Based on ProVOC","fulltext":[{"header":"1. Introduction","content":"\u003cp\u003eThe customer-oriented power marketing business system contains a substantial amount of sensitive data, such as customer information, electricity consumption, electricity tariffs, and meter asset details. These sensitive data are crucial role in the daily operation of electric power enterprises and must be shared across various business departments. However, in the process of marketing data sharing, the absence of effective access tracking mechanisms, coupled with the prevalence of information security risks, increases the likelihood of data leakage or misuse. Currently, there are several technical solutions in the market for data security issues in the traffic audit category, such as database auditing systems, web application auditing systems, and API security auditing systems, etc. 
Although these systems have improved data security protection capabilities to a certain extent, they are unable to comprehensively cover all protocol types, leading to auditing and monitoring blind spots and the risk of data leakage.\u003c/p\u003e \u003cp\u003eFurthermore, although the solution based on Network Traffic Analysis (NTA) comprehensively analyze protocol types, they are ineffective in organizing business assets, has a lot of low-value traffic, and exhibit minimal correlation between core capabilities and enterprise operations \u003csup\u003e[1\u0026ndash;4]\u003c/sup\u003e. While Network Data Loss Protection (NDLP) focuses on behavioral monitoring and analysis, the protocol types it identifies are not comprehensive enough, and it is unable to focus on data as the core of data security governance \u003csup\u003e[5\u0026ndash;7]\u003c/sup\u003e. These shortcomings prevent interoperability between different security solutions, cannot form a unified data security management capability, which in turn affects the overall security protection effectiveness.\u003c/p\u003e \u003cp\u003eTo address the shortcomings of the above approaches, this paper proposes a fast traceability method for sensitive power marketing data based on ProVOC (Provenance Vocabulary Model) model, which solves the issues related to missing access trace tracking and information security risks during marketing data sharing, enables full-link control and tracking of sensitive data, ensuring both security and integrity throughout the data transmission process.\u003c/p\u003e \u003cp\u003eSince its development in the 1990s, data traceability technology has gradually become an important research direction in data management. Its core value lies in recording the evolutionary information and specific processes of data throughout its life cycle ensuring its completeness, accuracy, and credibility \u003csup\u003e[8\u0026ndash;10]\u003c/sup\u003e. 
The method used to describe data traceability information serves as the foundation of data traceability management. Internationally, for example, the W3 model proposed by P. Buneman et al [11] and the W7 model further extended by S. Ram et al [12] provide the theoretical basis for the expression data traceability information. China is at the forefront of data traceability research and has established the national standard \u0026ldquo;Information Technology- Data Traceability Description Model\u0026rdquo; (GB/T 34945\u0026thinsp;\u0026minus;\u0026thinsp;2017), which proposes a data traceability description model, ProVOC, for standardizing the expression of data traceability information and providing a structured framework for the practice of data traceability \u003csup\u003e[13]\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eHowever, traditional data traceability methods continue to face significant challenges in practical applications. For example, high storage and management costs for traceability information, low efficiency in traceability queries and verification, and insufficient standardization and interoperability. To address these problems, researchers are exploring new technologies and methodologies, with blockchain technology has become a new favorite in data traceability due to its unique characteristics of decentralization, immutability, and inherent traceability.\u003c/p\u003e \u003cp\u003eBlockchain technology enhances data security and integrity during transmission and storage through distributed ledgers and consensus mechanisms \u003csup\u003e[14]\u003c/sup\u003e. In data traceability, blockchain technology can record every data operation creating an immutable data chain, that makes the source, transmission path and final destination of the data clearly visible. These attributes establish blockchain one of the ideal technologies to implement data traceability.\u003c/p\u003e"},{"header":"2. 
A fast traceability method for sensitive business data based on ProVOC modeling","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e2.1. Processing(methodology)\u003c/h2\u003e \u003cp\u003eBuilding on existing research on blockchain-based data traceability methods, designed to address the unique characteristics of sensitive business data in the electric power marketing system and its flow across various business departments, Fast Tracing Method for Electricity Marketing Sensitive Data Based on ProVOC Model. This method enables the full-link control and traceability of marketing business sensitive data. The process is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eBy capturing, parsing, and identifying sensitive data\u0026mdash;such as customer information, electricity volume, electricity bills, OEI, and meter asset information\u0026mdash;within the marketing business application system, we propose a structured model based on the ProVOC Model. This model establishes a Power Marketing Sensitive Information Storage Model that aligns with ProVOC standards and enables the storage of traceability information. To enhance the efficiency of trusted verification, we introduce a Light Node Data Traceability Trusted Verification Method based on the Merkle Mountain Range (MMR). This method improves verification efficiency, provides traceability functions in the event of data security risks, and enables comprehensive auditing of sensitive data operations within the marketing department.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e2.2. 
Structured Model Based on ProVOC Model(Structured Model Based on ProVOC)\u003c/h2\u003e \u003cp\u003eThe vast volume of power marketing business data that needs to be shared is so huge that it is impractical to apply blockchain technology for all the data, hash it and save its credentials. Although blockchain can ensure absolute traceability and enhance security, the cost is huge. A substantial portion of the data in power marketing consists of non-sensitive information, which not only accounts for a large share of the total data, but also increases the complexity of technical processing. This leads to expand processing times and an exponential expansion of storage requirements. Therefore, this paper proposes a structured storage model based on ProVOC model (Structured Model Based on ProVOC), which is integrated with the marketing business, performs sensitivity analysis on the shared marketing data to analyzes the business-sensitive data. Based on the ProVOC model standard, it designs the data traceability descriptive model for the sensitive business data of electric power marketing and stores the Traceability information.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003e2.3. Data Traceability Trusted Verification Method with Light Node Based on MMR(Merkle Mountain Range, MMR)\u003c/h2\u003e \u003cp\u003eFor the traceability description information generated by the structured storage model based on the ProVOC model, this paper applies blockchain technology to design a Data Traceability Trusted Verification Method with Light Node Based on Merkle Mountain Range (MMR), establish a private domain Ethereum network, construct light nodes, and introduce the Merkle Mountain Range commitment mechanism into the traditional Ethernet mechanism framework [15], which reduces querying. 
Data Traceability Trusted Verification Method with Light Node Based on MMR implements the Merkle Mountain Range commitment mechanism on the basis of the traditional Ethereum mechanism[16], reduce the storage space required for querying and verifying the traceability information, improve the storage efficiency of the data traceability model, and make the traceability model more lightweight; establishes a Data Traceability Information Verification Mechanism, which enables rapid tracking and validation of sensitive data in marketing business and ensures data security.\u003c/p\u003e \u003cp\u003eLight nodes are a widely used practical technique in blockchain. Building a traceability light node independently from the blockchain network eliminates the need to maintain blockchain information locally. The light node only accesses the blockchain network for information synchronization when querying, uploading, or verifying the final traceability information. Traditional light nodes must to maintain complete block header information locally when querying and verifying the trustworthiness and integrity values of data traceability information, resulting in light nodes seriously occupying a large amount of storage resources and increasing the memory occupancy rate. To overcome this drawback, this paper proposes the above method(Data Traceability Trusted Verification Method with Light Node Based on MMR), which allows light nodes to verify the final traceability information stored on the blockchain without maintaining detailed block header data, greatly reducing the storage pressure of light nodes.\u003c/p\u003e \u003cp\u003eTo enable programmatic verification of light nodes for data traceability locally, Merkle mountain range proofs are introduced. MMR is a variant of Merkle tree proposed by Peter Tod. 
Similar to Merkle trees, the values of Merkle mountain nodes\u0026mdash;except for leaf nodes\u0026mdash;are hashes of their left and right child nodes, This allows Merkle mountains can also provide Merkle mountain range proofs (MMR Proofs), which similar to Merkle proofs for proving whether a leaf node exists in a Merkle mountain range. However, unlike a Merkle tree, which is a perfect binary tree, a Merkle mountain range consists of multiple perfect binary trees and allows for structural imperfections. MMR is designed to be in append-only mode, meaning that once nodes are inserted, they remain unchanged, and dynamic insertions do not require reconstructing the entire structure. This feature of the Merkle mountain makes MMR particularly suitable for committing to the full block header of a blockchain, as blockchain blocks are also appended sequentially. By constructing an MMR where the leaf nodes consist of all blockheads, it is possible to prove that a block exists in the blockchain using the MMR proof mechanism.\u003c/p\u003e \u003cp\u003eIn this paper, the light node verification of traceability information is divided into two parts: transaction inclusion proof and block inclusion proof. The block inclusion proof utilizes the Merkle mountain range to commit blocks in the blockchain, and a block can be possible to verify whether a specific block exists in the blockchain efficiently.\u003c/p\u003e \u003cp\u003eTransaction inclusion proofs utilize Merkle trees to commit transactions within a block to prove whether a transaction exists in the block. When applied to a data traceability system, transaction inclusion proofs then utilize Merkle tree-based Merkle proofs to determine whether a piece of traceability data exists in a Merkle tree, and thus whether the traceability data exists within the block that contains the corresponding Merkle tree.\u003c/p\u003e \u003cp\u003eThe purpose of a block inclusion proof is to prove that a block is included in the blockchain. 
In traditional Simplified Payment Verification(SPV) block containment inclusion is proven by downloading the complete block header information from the genesis block to the latest block. The verification process then checks whether the block at the specified height matches the block to be verified. If they are equal, then the block to be verified is included in the blockchain. However this verification method requires maintaining the complete block header information and occupies too much resources. To address this issue, this paper introduces the unique Merkle Mountain Commitment method to reduce storage efficiency. Specifically, a root hash of Merkle mountains is added to the block header of Ethereum block header. All the block headers from the genesis block to the previous block of the latest block are stitched together to form the MMR, and the root hash of this commitment mechanism is written into the hash value field of the latest block header. As a result, in this new generation mechanism, any change in the content of any previous block header will cause the value of the root hash value to change. After the block header is committed using the commitment mechanism, the Merkle Mountain Proofs can be employed to efficiently verify the existence of a block within the blockchain.\u003c/p\u003e \u003cp\u003eThe query verification process of the Merkle Mountain Range (MMR) based trusted verification method for light node data traceability is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e shown\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe method proposed in this paper ensures that a light node does not need to store all the block header information when verifying the credibility and integrity of the data traceability information. Instead, it only needs to send a lookup request to the blockchain network. 
Upon receiving the lookup command from the light node, a full node in the blockchain network searches for the corresponding traceability information and generates both a transaction existence proof and a block existence proof; Then the full node packages the data traceability information, block existence proof and transaction existence proof into a data traceability packet, which is returned to the user who initiated the request; the user must to synchronize the latest block header by connecting to the blockchain network while initiating the lookup command in order to ensure the latest root hash of the Merkle mountain range, which is used for final block containment validation; the light node, upon receiving the traceability data packet from the full node, will generate the corresponding transaction existence proof and block existence proof. Once the light node receives the traceability packet, it can complete the verification of the trustworthiness and integrity of the data using the provided proofs, without requiring full blockchain storage.\u003c/p\u003e \u003c/div\u003e"},{"header":"3. Application Scenario Building","content":"\u003cp\u003eThe application of the above methodology can be demonstrated by constructing a corresponding system within an electric utility and conducting a pilot validation.\u003c/p\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003e3.1. System architecture\u003c/h2\u003e \u003cp\u003eA layered design approach can be constructed as shown in the following figure.3.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe whole system is divided into three layers: data traceability information storage layer, ProVOC layer and data traceability transaction layer.\u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section3\"\u003e \u003ch2\u003e3.1.1. 
Data Traceability Storage Layer\u003c/h2\u003e \u003cp\u003eIn the data traceability system, the traceability information must be stored in both the blockchain and a relational database. The storage layer plays a role in a sustainable storage environment for the data traceability information.\u003c/p\u003e \u003cp\u003eSerialized traceability information is stored on the blockchain to ensure data integrity and trustworthiness verification for data traceability information. The system utilizes the widely adopted Ethereum technology in the industry as the underlying blockchain technology of the system, and adds a Merkle mountain range commitment mechanism into Ethereum, thus reducing the storage space required for light nodes to verify the traceability information.\u003c/p\u003e \u003cp\u003eThe storage layer is designed based on a blockchain-based data traceability storage model to store traceability information. The structure is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eEthereum\u003c/b\u003e \u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eThe Merkle Mountain Range (MMR) Commitment Mechanism is introduced into the Ethereum framework to reduce the storage resources required by light nodes for verifying data traceability information. To achieve this, the system employs two proof mechanisms: MMR proof and Merkle proof. 
Integrating the Merkle Mountain Range Commitment Mechanism into Ethereum requires enhancements in three key areas: (1) Adding a Root Hash (Root) field to the block header for MMR integration, (2) Modifying the block encapsulation process so that miners calculate and include the latest Root field when generating new blocks, and (3) Incorporating a Root field legitimacy verification step into Ethereum\u0026rsquo;s block validation mechanism.\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eDatabase\u003c/b\u003e \u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eThe system provides fast and efficient query service for data traceability information. It establishes database tables based on the ProVOC model, stores the traceability information described by the ProVOC model, storing traceability information in a structured format within a relational database. This approach supports the query requirements of the patent traceability business. The database stores the basic components of the ProVOC model, including relationships, traceability documents, and storage locations of traceability data in the blockchain, along with other relevant metadata.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section3\"\u003e \u003ch2\u003e3.1.2. ProVOC Layer\u003c/h2\u003e \u003cp\u003eThe data traceability information is transformed into the finally obtained serialized traceability information and structured traceability information through the ProVOC model at this layer. Structured traceability information is stored in the database and serialized traceability information is stored in the blockchain.\u003c/p\u003e \u003cp\u003eThe traceability system integrates structured modeling and serialization scheme of the existing ProVOC model into the storage model. 
First, the traceability information uploaded by the user is converted to JSON format by the serialization module of the ProVOC model; This data is then uploaded to the blockchain, while the system monitors the blockchain in real time for new traceability records. Once the system detects and confirms that new traceability information has been packaged into a block and written to the blockchain, the structured modeling module of the ProVOC model converts the traceability information into structured data, which is subsequently stored in a relational database.\u003c/p\u003e \u003cp\u003eThe following mainly introduces the process of uploading data traceability information and data traceability information into the database.\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eData traceability information on the chain\u003c/b\u003e \u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eTraceability information uploading is uploading the information obtained from data traceability to the blockchain for anchoring. Essentially, his process writes the traceability information into blockchain blocks as part of the transaction data.\u003c/p\u003e \u003cp\u003eThe process of uploading data traceability information proceeds as follows: First, the traceability information is serialized into JSON format using the ProVOC model. Then, it is recorded on the blockchain through a series of smart protocols. considering the lightweight requirements, this paper utilizes the blockchain technology as a trusted storage environment for traceability information, without leveraging smart contracts to handle complex business logic. 
So the blockchain intelligent protocols also only contain the the logic of traceability information on the chain.\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eData Traceability Information Entry\u003c/b\u003e \u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eThe data traceability information is stored in a database to facilitate complex operations, with the database serving as a replica of the traceability information recorded on the blockchain.\u003c/p\u003e \u003cp\u003eThe process of storing data traceability information in the database is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eWhen the data traceability system detects new traceability information written into the blockchain network, it first retrieves the newly recorded data. Subsequently, the system processes and structures this information using the ProVOC model before storing it in the database. The database records the complete traceability information, including all traceability components and their relationships, to support complex user queries.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec10\" class=\"Section3\"\u003e \u003ch2\u003e3.1.3. Data Traceability Transaction Layer\u003c/h2\u003e \u003cp\u003eThis layer provides essential business functions to end users by encapsulating up linking, querying, and verification operations within dedicated functional modules. These modules are exposed to the upper-level data traceability system through API interfaces. The system offers four primary functional modules: login module, data traceability submission module, data traceability query module, and user management module. 
Among these, the data traceability submission and query modules provide the core functionality of the system.\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eData Traceability Submission Module\u003c/b\u003e \u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eThe main function of this module is to upload data traceability information to the blockchain and synchronize the information to the database of the traceability system.\u003c/p\u003e \u003cp\u003eThe data traceability submission process is as follows. Each piece of data traceability information is a traceability document composed of ProVOC model components and relationships. To submit traceability information, the user first creates a traceability document as a container for the ProVOC model components and relationships, and then uploads the components and relationships in turn. Meanwhile, the system serializes the traceability information into JSON, a lightweight data-interchange format that is easy to exchange between programming languages, and returns it to the user. Next, the user logs in to the Ethereum wallet, authorizes the traceability application, and invokes the smart contract through the Ethereum wallet account to submit the traceability information. When the system detects that the traceability information has been written to the blockchain, it synchronizes the information to the database to facilitate the subsequent generation of the final data traceability information verification package.\u003c/p\u003e 
\u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eData Traceability Query Module\u003c/b\u003e \u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eThe main function of this module is to return the data traceability information that meets the query conditions.\u003c/p\u003e \u003cp\u003eBecause the blockchain is not structurally suited to complex lookup operations, queries are implemented with the help of a database.\u003c/p\u003e \u003cp\u003eThe structured data of the traceability information recorded on the chain is stored in the database. Whenever the system receives new traceability data, the module synchronizes the information with the database. However, due to network delays and other factors, the traceability data in the database may not always be current. To keep query results up to date, the module verifies before each query operation that the latest traceability information in the database matches that on the blockchain. If the local database is found to be inconsistent, the most recent traceability data is retrieved from the blockchain and synchronized to the database. Once data consistency is ensured, the query is performed based on the user's criteria, and the results are converted into JSON format and returned to the user. The process is illustrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003e3.2. 
Experiment\u003c/h2\u003e \u003cp\u003eThe developed power marketing sensitive business data traceability system is an efficient and non-intrusive solution that accurately captures marketing-related traffic by deploying physical switches to mirror traffic at network exit nodes. This approach avoids interference with business systems and databases, ensuring that the network structure and business operations remain unaffected.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eUsing traffic filtering and content parsing technologies, the system captures only network traffic related to marketing business, which avoids the burden of processing non-essential data. Through deep packet inspection, the system analyzes the captured packets to automatically identify and extract sensitive power marketing data, such as customer information, power consumption, electricity tariffs, and meter asset information, achieving 100% identification of sensitive information. Based on the system's ProVOC model, traceability information compliant with China's ProVOC standard is generated with a 100% qualification rate. The generated information is securely stored in both the blockchain and the database, achieving a 100% traceability accuracy rate in experiments on sensitive data that had undergone multiple transmissions. Additionally, a customizable query function for traceability information is implemented, combining the credibility of data traceability with functional richness.\u003c/p\u003e \u003cp\u003eRegarding the frequent and large-scale maintenance of block header information required by traditional Ethereum technology, practical verification shows that the system developed with the method proposed in this paper outperforms traditional Ethereum processing. 
However, the extent of the performance enhancement depends on factors such as the number of distributed nodes, transaction frequency, the volume of sensitive data, and other variables, resulting in significant variability.\u003c/p\u003e \u003cp\u003eOverall, this comprehensive data security monitoring system offers non-intrusive deployment, intelligent identification and parsing, real-time warning and response, and full traceability and auditing, ensuring robust data security for the marketing department. It effectively reduces the risk of data leakage while enhancing the response speed and processing efficiency of security incidents, thereby laying a strong foundation for enterprise data security.\u003c/p\u003e \u003c/div\u003e"},{"header":"4. Conclusion","content":"\u003cp\u003eThis paper proposes a fast traceability method for power marketing sensitive data based on the ProVOC model, addressing the challenges of determining which data traceability information needs to be recorded and the format in which it should be stored. The developed storage model stores traceability information in both the blockchain and a type-specific database. The blockchain stores data traceability information for trusted verification, while the database handles complex query requirements for traceability, providing a hybrid approach that offers different forms of data traceability tailored to specific scenarios. This method is better aligned with the practical needs of production.\u003c/p\u003e \u003cp\u003eMeanwhile, traditional light nodes, when querying and verifying the trustworthiness and integrity of data traceability information, must locally maintain complete block header information, consuming significant storage resources. 
The method proposed in this paper eliminates the need for light nodes to store block header information, thereby significantly reducing storage pressure and improving overall system efficiency.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eFunding Declaration\u003c/h2\u003e \u003cp\u003eThis work was supported by the Science and Technology Project of the State Grid Shanxi Electric Power Company (Grant No. 52051L230005), titled \"Research and Application of Full Link Control and Tracking Technology for Sensitive Business Data Based on Fluorescent Labeling.\" The authors declare no competing financial interests or personal relationships that could influence the work reported in this paper.\u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eJ.S., Y.R., and J.Z. wrote the main manuscript text, and X.C. prepared Figures 1-7. All authors reviewed the manuscript.\u003c/p\u003e\u003ch2\u003eAcknowledgment\u003c/h2\u003e \u003cp\u003eThis work was supported by the Science and Technology Project of the State Grid Shanxi Electric Power Company, \"Research and Application of Full Link Control and Tracking Technology for Sensitive Business Data Based on Fluorescent Labeling,\" under Grant No. 
52051L230005.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eOur data is currently unavailable for public use as it contains a large amount of private user information.\u003c/p\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Sensitive Data, electricity marketing, Blockchain, smart contracts, ProVOC","lastPublishedDoi":"10.21203/rs.3.rs-6338948/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6338948/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eThe power marketing business system contains a large amount of sensitive data such as customer information. These data may be leaked or misused when data deepening applications and sharing. The current existence of a variety of data security auditing programs is more or less flawed, unable to comprehensively rule out the risk of data leakage. This paper proposes a fast traceability method for power marketing sensitive data based on ProVOC model, identifying power marketing sensitive data from the data flowing through the network, designing a structured storage model for sensitive data based on China's ProVOC data traceability model standard, and then adopting blockchain technology to build a private Ether, generating a blockchain for data flow, reducing the storage space, and improving the speed of contract generation. Meanwhile, a light node data traceability and trusted verification method based on the Merkle Mountain Range (MMR) is designed to enable fast and reliable verification of sensitive data and accurate querying of complex verification information. The traceability experimental system, developed using this method, has been demonstrated and verified in a provincial subsidiary of the State Grid Corporation of China. 
Results show that the proposed method enables accurate and comprehensive traceability and auditing of sensitive business data in a non-intrusive manner, without impacting the business system, achieving an accuracy rate of up to 100%. Additionally, it reduces storage pressure on light nodes and significantly enhances processing efficiency, meeting the practical demands of production.\u003c/p\u003e","manuscriptTitle":"Fast Tracing Method for Electricity Marketing Sensitive Data Based on ProVOC","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-05-12 15:04:59","doi":"10.21203/rs.3.rs-6338948/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"a548cb58-e00b-424d-87b3-c54ee4230eef","owner":[],"postedDate":"May 12th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2026-03-23T09:12:19+00:00","versionOfRecord":[],"versionCreatedAt":"2025-05-12 15:04:59","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-6338948","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6338948","identity":"rs-6338948","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}