The Role of Smart Electricity Meter Data Analysis in Driving Sustainable Development

Research Square | Research Article

Archana Y. Chaudhari, Shrada Chavan, Preeti Mulay

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-4837042/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)

Abstract

The sustainability of the electricity system is closely related to the analysis of smart electricity meter data, which plays an important role in enhancing energy management and overall grid operation. The widespread use of household smart meters generates a substantial volume of data, offering an opportunity to enhance overall energy management by analyzing household electricity usage data. However, when faced with an influx of new data, traditional clustering methods require re-clustering all the data from scratch, which can be computationally intensive.
To address the challenge of handling ever-increasing data, an incremental clustering algorithm is a suitable choice. Incremental learning, accomplished through incremental clustering, provides a straightforward and effective approach. In this research, the proposed Closeness-based Gaussian Mixture Incremental Clustering (CGMIC) algorithm updates load patterns without re-clustering the overall set of daily load curves. The CGMIC algorithm first extracts load patterns from new data and then either integrates them into the existing load patterns or forms new ones. Real-world smart electricity meter data, namely the IITB Indian Residential Energy Dataset, are used to validate the proposed system. The effectiveness of the proposed system is assessed using metrics such as the silhouette score and the Davies-Bouldin index, employing the incremental K-means algorithm. The insights gained from the proposed system contribute directly to sustainable development goals: the system identifies changes in residential electricity consumption behavior, providing valuable insights for utility companies to optimize electricity load management.

Keywords: incremental learning, smart meter data, pattern recognition, electricity management, sustainable development

1. INTRODUCTION

The world is rapidly shifting towards sustainable practices, and energy consumption is a key focal point. The Smart Grid (SG) [1], a powerful tool in this transition, is an advanced version of the electrical grid that delivers electricity from power plants to households and businesses in a smart and controlled manner [2]. One of its key features is two-way communication between utility companies and their customers, with sensing technology integrated along transmission lines enabling the grid to operate intelligently.
Through SG, consumers can adjust their energy consumption patterns and behaviors based on the information, incentives, and disincentives they receive [3]. The benefits of SG include precise tracking of energy consumption, interpretable explanations for anomalies, and swift restoration of electricity following power disturbances. Furthermore, SG benefits utility providers by reducing the need for excess electricity generation and minimizing the costs of installing new utility base stations. Efficient operation of the SG depends on the integration and collaboration of various components, including smart sensors, communication systems, devices, and specialized processors, to name a few [4]. These components form the foundation of the SG infrastructure, enabling seamless communication, intelligent decision-making, and effective management of electricity distribution throughout the grid.

In SG, the Smart Electricity Meter (SEM) is one of the key technologies. SEMs record the fine-grained energy consumption (electricity load) of customers and provide the recorded information to the utility company for advanced measurement and control applications. Compared with a conventional electricity meter [3], an SEM has control, automation, and communication units that give consumers better information and automatically report outages [4]. As SEM deployment increases, it generates a wealth of fine-grained incremental data that can benefit various stakeholders of power systems (customers, generation units, and transmission and distribution units). However, such data are not useful without analytical power: analytics solutions are needed to obtain valuable insight from the large volume of data generated by SEMs. In the power system domain, various data mining techniques are employed to analyze load data.
Among these techniques, clustering stands out as a widely used unsupervised method for organizing vast amounts of data in a meaningful way. The purpose is to enable decision-makers to use the organized data for activities such as forecasting, assessment, and planning. Clustering applies to both numeric and text-based data, which are now generated in real time and on a large scale owing to the advent of the Internet of Things (IoT) and related technologies [5]. However, a traditional clustering algorithm re-clusters all of the data from scratch whenever new data arrive. Incremental clustering is therefore an essential way to cluster growing data. By supporting more reliable and efficient electricity delivery, such an incremental clustering (algorithmic analysis) system can greatly reduce the frequency and duration of power outages [6].

The rest of the article is structured as follows. Section 2 reviews related work by various researchers and identifies research gaps. Section 3 details the proposed methodology. Section 4 presents the results, followed by a comprehensive analysis. Section 5 evaluates system performance, and Section 6 concludes the study with some final reflections.

2. LITERATURE REVIEW

To carry out Smart Electricity Meter Data Analysis (SEMDA) effectively, it is necessary to understand smart meter datasets with their attributes and types, the various types of incremental clustering, cluster evaluation methods, and so on.

2.1. Datasets for Research

The main challenge in SEMDA research is the availability of datasets, although many authentic datasets are now available online. Table 1 summarizes the publicly available SEM datasets at the national and international levels.

Table 1 Summary of Several Open Load Datasets

| Dataset | Brief Description | No. of houses (buildings) | Data Resolution | Duration | Ref. |
|---|---|---|---|---|---|
| CBTs | Data from smart meter readings and pre- and post-trial surveys (Ireland) | 6445 | Every 30 min | 2009/9 - 2011/1 | [9] |
| LCL | Smart meter readings, electricity price data, and survey data on appliances and attitudes (London) | 5567 | Every 30 min | 2013/1 - 2013/12 | [10] |
| Umass Smart | Residential apartment electricity consumption data (USA) | 114 apartments | Every 1 min | 2014/10 - 2016/12 | [11] |
| ENERTALK | Electricity consumption data: aggregate and appliance-level (Korea) | 22 | 15 Hz | 2016/11 - 2017/01 | [12] |
| I-BLEND | Commercial and residential buildings of an academic institute campus (Delhi, India) | 7 buildings | Every 10 min | 2013/8 - 2017/12 | [13] |
| Prayas Energy | Smart meter consumer data (Pune) | 70 | Every 1 min | 2018/1 - 2018/2 | [14] |
| IITB Smart Energy Informatics | Residential building electricity consumption data of IIT Bombay campus (Mumbai) | 60 | Every 60 min | 2016/12 - 2018/1 | [15] |

The LCL, CBT, and Umass datasets are often used in the existing literature. In addition, IITB, ENERTALK, and I-BLEND are newly released datasets.

2.2. Literature on Incremental Clustering Algorithms

This literature survey focuses on incremental clustering algorithms that have been proposed to address the challenges of processing streaming and evolving residential electricity consumption data [16]. The authors in [17] construct a hierarchical clustering structure that can capture both global and local consumption patterns. It enables efficient clustering of large-scale smart meter datasets and provides insights into consumption behavior at different levels of granularity. The study in [6] uses nearness-factor incremental clustering techniques to identify clusters of similar energy consumption patterns. By adaptively updating the clusters as new data arrive, incremental clustering with a closeness factor can handle concept drift and evolving consumption patterns in smart meter data. NFICA [18] is an incremental clustering algorithm developed for smart meter data analysis.
It employs a nearness-factor-based approach to dynamically group similar consumption profiles. NFICA handles streaming data by continuously updating cluster centers and adapting to changing energy usage patterns. In [19], the author proposed a Log Likelihood-based Gradational Clustering Algorithm to identify consumer consumption patterns. However, that approach faces the challenge of order sensitivity. To address this issue, it is hypothesized that an incremental clustering approach [20, 21] is crucial in mitigating the challenges associated with clustering large and growing datasets from smart electricity meters. The study in [22] proposed an Android application called "PowerStats" that provides users with statistics on their mobile phone charging patterns. The application collects data such as the phone model, the battery percentage when the device is plugged in or unplugged, the timestamps of plugging in and unplugging, and voltage and current information. With PowerStats, users can gain insights into their charging behavior and better understand the usage patterns of their mobile devices. The author in [23] incorporates the fuzzy C-means approach to handle uncertain and imprecise consumption patterns; it adaptively updates cluster centers and membership degrees to capture evolving energy usage trends and variations.

2.3. Research Gaps

Through an extensive literature survey, the researchers identified the following research gaps in the field of SEMDA:

- Leveraging incremental clustering algorithms can significantly enhance load profiling, leading to reduced electricity consumption.
- SEMDA provides energy-saving recommendations with detailed consumption and appliance transition timestamps.
- SEMDA involves descriptive, predictive, and prescriptive analytics to provide insights to both energy providers and consumers.

3. PROPOSED METHODOLOGY

The proposed research aims to enhance the Expectation Maximization (EM) algorithm [24]. The EM algorithm relies heavily on the initial estimate of the number of Gaussian components in the mixture model [25]. Choosing the correct attributes (the initial guess) is potentially the most important aspect of successful clustering. The present study proposes a Closeness-based Gaussian Mixture Incremental Clustering (CGMIC) algorithm that extends the Closeness Factor-Based Algorithm (CFBA) [26] with the EM algorithm. An informed initial guess reduces the number of iterations required for convergence, making the EM algorithm efficient for handling incremental datasets. Furthermore, an accurate informed initial guess enhances the quality and accuracy of the EM output [24]. Figure 1.2 illustrates the system framework. The essential elements of the framework are data collection, data preprocessing, the proposed CGMIC algorithm, and performance evaluation. The subsequent sections detail the extraction of hidden electricity consumption patterns through cluster analysis.

3.1. Data Collection

The Indian residential power consumption dataset was acquired from IIT Bombay. The final dataset contains the Unix timestamp (Indian Standard Time, GMT+5:30); the apartment id; the voltage of phases 1, 2, and 3; the active power, reactive power, current, and power factor of all three phases; and the phase angle. Figure 1.3 depicts the overall electricity consumption pattern of residential customers for the year 2017. The majority of electricity usage is concentrated in the evening, specifically between 6:30 pm and 8:00 pm, with the highest peak occurring at 7:00 pm.

3.2. Data Preprocessing

Preprocessing the dataset is essential so that the data are represented accurately.
There are various methods to handle missing values, including ignoring them entirely, replacing them with a numeric value, using the most frequent value of the feature, or substituting the mean value of the attribute. Note, however, that the smart meter dataset used for this research has no missing values.

3.3. Proposed Closeness-based Gaussian Mixture Incremental Clustering (CGMIC) Algorithm

The proposed Algorithm 1 handles incremental data and the analysis of hidden patterns. With this predictive analytics, energy generators can make precise decisions about commissioning more solar panels or reducing the number of coal generators in their portfolio. The novel features of the proposed algorithm are as follows:

- The CGMIC algorithm is parameter-free, i.e., it does not require the number of clusters to be selected at initialization.
- Clusters are formed first; the algorithm is less complex to implement, and convergence is guaranteed.
- Clusters are ranked during iterations for outlier detection.
- It learns from the influx of new data without discarding previously acquired knowledge.
- Incremental learning is achieved for automatically suggesting groups of clients for specific actions, such as commercial offers for energy reduction.
- It is a log-likelihood-based, order-independent statistical incremental clustering (IC) algorithm.

Algorithm 1 Proposed Closeness-based Gaussian Mixture Incremental Clustering (CGMIC) Algorithm

Input: I_x = {I_x1, I_x2, ..., I_xn}, a set of n d-dimensional time-series smart meter raw datasets; M_iter, the maximum number of iterations; the convergence criterion (ε) for the log-likelihood.
Output: A series of clusters stored in clusterdb.
Outcome: Incremental learning of load shedding patterns day-wise, time-wise, and season-wise.

Phase I: Formation of Basic Clusters
1) WHILE the change in log-likelihood (llh) is greater than ε and M_iter has not been reached DO:
   a) for i = 1 to n
        i. Consider every two time series I_x1 and I_x2; I_xn(l) is the point l in series n. Sum(l) is the total of the corresponding parameters of the series considered.
        ii. The Relationship Probability (RP) of I_x1 is calculated as the ratio of the first series to the sum of the corresponding parameters.
        iii. The Closeness (CN) between the series is
             $$CN = \frac{\left(Er(l)\right)^{2}\,\sqrt{Sum(l)}}{\sqrt{Sum(l)}}, \qquad Er(l) = \frac{RP \cdot Sum(l) - I_{xn}(l)}{\sqrt{Sum(l) \cdot RP \cdot (1 - RP)}}$$
        iv. The number of clusters (k), mean (µ), and variance (Σ) are stored in clusterdb.
      endfor
   b) Initialize: set µ, Σ, k, and the prior probability (Π) using the output of step (a); llh = −∞.
   c) for i = 1 to n
        for j = 1 to k
          Posterior Relationship Probability:
          $$PRP(I_{x_i} \mid \Pi, \mu, \Sigma) = \Pi_j \, PP(I_{x_i} \mid \mu_j, \Sigma_j)$$
          where
          $$PP(I_{x_i} \mid \mu_j, \Sigma_j) = \frac{\exp\left(-\frac{1}{2}\,(I_{x_i} - \mu_j)^{T}\,\Sigma_j^{-1}\,(I_{x_i} - \mu_j)\right)}{(2\pi)^{d/2}\,|\Sigma_j|^{1/2}}$$
        endfor
      endfor
   d) for i = 1 to k
        $$\mu_i = \frac{\sum_{j=1}^{n} I_{x_j}\,PRP(C_i \mid I_{x_j})}{\sum_{j=1}^{n} PRP(C_i \mid I_{x_j})}$$
        $$\Sigma_i = \frac{\sum_{j=1}^{n} (I_{x_j} - \mu_i)^{2}\,PRP(C_i \mid I_{x_j})}{\sum_{j=1}^{n} PRP(C_i \mid I_{x_j})}$$
        llh = llh + log( \(PRP(C_i)\,PRP(I_{x_i} \mid C_i)\) )
      endfor

Phase II: On influx of new data, either update the existing cluster(s) or form new cluster(s).

4. RESULTS AND DISCUSSION

This section presents the results of the CGMIC algorithm on the IIT Bombay smart meter dataset.

4.1. Dataset Description

The IIT Bombay Indian Residential Energy Dataset [15] is a valuable resource for energy consumption analysis. Its key features include:

- Region and country: IIT Bombay, India
- Data period considered for research: January 2017 to December 2017
- Apartment type: 3BHK (3 bedrooms, 1 hall, 1 kitchen)
- Sampling rate: originally 5 seconds, downsampled to 1 hour
- Attributes: timestamp, voltage, and energy consumption for each phase
- Format: CSV files

4.2. Load Profiling via CGMIC

The CGMIC algorithm analyzes data from smart electricity meters in Indian homes; the result is shown in Fig. 1.4.
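
As a rough illustration of steps (c) and (d) of Algorithm 1, the following Python sketch runs the expectation and maximization updates for a one-dimensional Gaussian mixture. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`gaussian_pdf`, `em_step`, `fit`), the restriction to scalar readings, the variance floor, and the sample values are all hypothetical, and the initial priors, means, and variances are assumed to come from the closeness-based step (a).

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_step(data, priors, means, variances):
    """One E-step and one M-step.

    Returns the updated (priors, means, variances) and the log-likelihood
    of the data under the parameters that were passed in.
    Assumes no point has zero density under every component.
    """
    k = len(means)
    resp = []  # resp[i][j]: posterior probability of cluster j given point i
    llh = 0.0
    for x in data:
        weighted = [priors[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
        total = sum(weighted)
        llh += math.log(total)
        resp.append([w / total for w in weighted])

    new_priors, new_means, new_vars = [], [], []
    for j in range(k):
        nj = sum(r[j] for r in resp)  # effective number of points in cluster j
        mu = sum(r[j] * x for r, x in zip(resp, data)) / nj
        var = sum(r[j] * (x - mu) ** 2 for r, x in zip(resp, data)) / nj
        new_priors.append(nj / len(data))
        new_means.append(mu)
        new_vars.append(max(var, 1e-6))  # variance floor: an assumption of this sketch
    return new_priors, new_means, new_vars, llh


def fit(data, priors, means, variances, max_iter=100, eps=1e-6):
    """Repeat em_step until the log-likelihood gain is at most eps or max_iter is hit."""
    prev = -math.inf
    llh = prev
    for _ in range(max_iter):
        priors, means, variances, llh = em_step(data, priors, means, variances)
        if llh - prev <= eps:
            break
        prev = llh
    return priors, means, variances, llh


# Hypothetical hourly loads forming a low-use and a high-use group.
loads = [1.0, 1.2, 0.9, 5.0, 5.2, 4.8]
priors, means, variances, llh = fit(loads, [0.5, 0.5], [1.0, 5.0], [1.0, 1.0])
print(means)
```

In this sketch the starting means play the role of the informed initial guess; with a poor guess the same loop would typically need more iterations before the log-likelihood gain drops below `eps`.
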
In Fig. 1.4(a), the clusters obtained by applying the CGMIC algorithm to the IIT Bombay dataset are illustrated. This is the initial run of the CGMIC algorithm, which forms four primary clusters. Figure 1.4(b) shows how newly arriving data fit seamlessly into the existing clusters. However, in Fig. 1.4(c) and 1.4(d), certain new data points do not align with the existing clusters because their closeness values differ. Consequently, the CGMIC algorithm automatically forms two new clusters, labeled cluster five and cluster six, respectively. As a result, the CGMIC algorithm divides the SEM data into a total of six clusters, grouping consumers based on their electricity usage patterns and behaviors. Figure 1.5 illustrates the consumption patterns of residential customers organized by cluster. The data reveal that cluster 4 consists of consumers with the highest electricity consumption. Clusters 1 and 5 contain customers with average electricity consumption, which is why these two clusters are updated frequently. As shown in Table 2, a total of 1 lakh (100,000) smart meter records were grouped into six clusters: 57.9% of the data fall in cluster 1 and 4.2% in cluster 4.

Table 2 Descriptive analysis of Clusters generated from CGMIC

| Clusters | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Percentage of Data | 57.9% | 6.3% | 10.4% | 4.2% | 16.4% | 4.8% |

4.3. Data Insights with Cluster Analysis for Sustainable Development

CGMIC clustering can reveal various distinct consumption patterns among the daily load curves of customers, as shown in Fig. 1.6. The cluster analysis is given below:

- Cluster 1: Comprises 70 percent of the population. These residents are ordinary consumers who use average amounts of electricity. The electricity demand tends to peak around noon.
  This indicates that most residents in Cluster 1 stay at home during the day.
- Cluster 2: Comprising 12 percent of the population, Cluster 2 consists of high-demand consumers. Their electricity consumption is elevated during the morning, remains stable in the afternoon, and peaks around 4:00 PM.
- Cluster 3: Representing 7 percent of the population, Cluster 3 includes average electricity consumers. Consumption patterns indicate an afternoon peak, suggesting frequent use of electric appliances such as air conditioners.
- Cluster 4: Accounting for 6 percent of the population, Cluster 4 contains the highest-demand consumers. These users consume electricity without price considerations, resulting in unstable consumption. Peaks occur during the afternoon.
- Cluster 5: Constituting 5 percent of the population, Cluster 5 comprises low-demand consumers. Their daily electricity consumption remains stable, with a peak around 1:00 PM.

The insights from the cluster analysis contribute directly to sustainable development goals by informing energy efficiency programs targeted at different customer segments.

5. SYSTEM PERFORMANCE EVALUATION

For validation, the clusters generated by the proposed CGMIC algorithm are compared with those of the DBSCAN algorithm using clustering validity indexes. Generally, the final decision is made based on the values of the validity indexes. The DBSCAN algorithm [31] finds the number of clusters from the estimated density distribution of the data points. In the power system literature, the Davies-Bouldin Index (DBI), the Dunn validity index (DVI), and the Silhouette Criterion (SC) are among the most popular validity indexes [29], and this study uses them as well. Table 3 reports DBI and SC together with precision, recall, and F1-measure for the two methods.

Table 3 Evaluations of the Proposed System

| Parameters | CGMIC | DBSCAN |
|---|---|---|
| DBI | 0.78 | 1.59 |
| SC | 0.91 | 0.90 |
| Precision | 0.97 | 0.87 |
| Recall | 0.94 | 0.84 |
| F1-Measure | 0.95 | 0.85 |

The F1-score of CGMIC (0.95) is higher than that of DBSCAN (0.85).
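
The two validity indexes in Table 3 follow standard definitions and can be sketched in plain Python as below. This is an illustrative sketch, not the evaluation code used in the study: the function names, the choice of Euclidean distance, and the toy clusters are assumptions made here for illustration.

```python
import math


def dist(p, q):
    """Euclidean distance between two points given as sequences of floats."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def davies_bouldin(clusters):
    """Davies-Bouldin index: mean over clusters of the worst
    (scatter_i + scatter_j) / centroid_distance ratio. Lower is better."""
    cents = [centroid(c) for c in clusters]
    scatter = [sum(dist(p, cents[i]) for p in c) / len(c) for i, c in enumerate(clusters)]
    k = len(clusters)
    worst = [
        max((scatter[i] + scatter[j]) / dist(cents[i], cents[j]) for j in range(k) if j != i)
        for i in range(k)
    ]
    return sum(worst) / k


def silhouette(clusters):
    """Mean silhouette coefficient over all points. Closer to 1 is better."""
    scores = []
    for i, own in enumerate(clusters):
        for p in own:
            if len(own) == 1:
                scores.append(0.0)  # convention for singleton clusters
                continue
            # a: mean distance to the other members of the same cluster
            a = sum(dist(p, q) for q in own) / (len(own) - 1)
            # b: smallest mean distance to the members of any other cluster
            b = min(
                sum(dist(p, q) for q in other) / len(other)
                for j, other in enumerate(clusters)
                if j != i
            )
            denom = max(a, b)
            scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Two hypothetical, well-separated clusters of 2-D points.
toy = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
print(davies_bouldin(toy), silhouette(toy))
```

A lower `davies_bouldin` value and a `silhouette` value closer to 1 both indicate compact, well-separated clusters, which is the reading applied to the DBI and SC rows of Table 3.
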
A smaller DBI value implies that the clustering algorithm separates the data set better. CGMIC attains a lower DBI than DBSCAN (0.78 versus 1.59) and a slightly higher SC (0.91 versus 0.90). Hence, the proposed method generates better clustering results.

6. CONCLUSIVE SUMMARY

The proposed Closeness-based Gaussian Mixture Incremental Clustering (CGMIC) algorithm accommodates the influx of new data seamlessly for accurate analysis. The CGMIC algorithm does not require prior knowledge of the number of clusters. This research found that the proposed system is order-independent and parameter-free. The proposed system learns from new labelled or unlabelled instances of data without discarding previously acquired knowledge. This study used the real-world IIT Bombay Indian Residential Energy dataset, which contains SEM data of households. The findings inform several stakeholders about sustainability, namely customers, utility providers, policymakers, and the environment, as explained below:

- The implemented system informs household customers in advance of the specific times when their electricity tariff will be higher or lower than the normal price.
- Household electricity consumers can monitor and improve their consumption patterns.
- Utility providers can find appropriate customer groups for effective demand response programs, improve load forecasting accuracy, and estimate the electricity consumption pattern of new customers.
- Policymakers can make policies for effective energy reduction with the help of insight into customers' electricity consumption habits.
- This study also helps reduce pollution from vehicles driven to individual customers' meters for manual meter reading and for resolving errors, if any.

By motivating consumers, utilities, and stakeholders, the proposed system encourages more sustainable behavior on a voluntary basis.
The benefits of SEMDA extend beyond technical improvement: by showing and warning in advance when and how frequently electricity shutdowns and cut-offs will occur, this research encourages people to behave more sustainably. In upcoming phases of research, the proposed system can be applied to the analysis, pattern recognition, and forecasting of energy capture data from floating solar plants.

Declarations

Author Contribution: All authors contributed equally to ideation, implementation, and drafting.

Data Availability: https://figshare.com/s/a4aedbe2ae2aadc5618b?file=13842431

Author Affiliations:
- Archana Y. Chaudhari (corresponding author): Symbiosis Institute of Technology (SIT Pune), Symbiosis International (Deemed University)
- Shrada Chavan: Symbiosis Skill and Professional University
- Preeti Mulay: Founder CEO, Weekend Forever

Figure Captions

Fig. 1.2. Logical Flow of the Proposed System
Fig. 1.3. Residential customer electricity consumption pattern for the year 2017
Fig. 1.4. CGMIC Clustering Results on IIT Bombay Dataset
Fig. 1.5. Cluster wise consumption analysis
Fig. 1.6. Data Insights with Cluster Analysis

References

[1] S. M. Amin, "Smart grid: overview, issues and opportunities. Advances and challenges in sensing, modeling, simulation, optimization and control," Eur J Control, vol. 17, 2011.
[2] P. Siano, "Demand response and smart grids - a survey," Renew Sustain Energy Rev, vol. 30, 2014.
[3] MSEB, "SCADA Data: Maharashtra Generation, Exchange and Demand overview and Load shedding data," 2018.
[4] B. Yildiz, J. I. Bilbao, J. Dore, and A. B. Sproul, "Recent advances in the analysis of residential electricity consumption and applications of smart meter data," Applied Energy, vol. 208, pp. 402-427, 2017.
[5] P. Cichosz, Data mining algorithms: explained using R. John Wiley & Sons, 2014.
[6] A. Y. Chaudhari and P. Mulay, "Unleashing analytics to reduce electricity consumption using incremental clustering algorithm," International Journal of Energy Sector Management, vol. ahead-of-print, 2021.
[7] P. Mulay, "Threshold Computation to Discover Cluster Structure, a New Approach," International Journal of Electrical and Computer Engineering (IJECE), vol. 6, pp. 275-282, 2016.
[8] P. A. Kulkarni and P. Mulay, "Evolve systems using incremental clustering approach," Evolving Systems, vol. 4, pp. 71-85, 2013.
[9] ISSDA, "Irish Social Science Data Archive: Commission for Energy Regulation (CER) smart metering project," 2018.
[10] R. C. J. R. Schofield, S. H. Tindemans, M. Bilton, M. Woolf, and G. Strbac, "Low carbon london project: Data from the dynamic time-of-use electricity pricing trial, 2013," 2015.
[11] S. Barker, A. Mishra, D. Irwin, E. Cecchet, P. Shenoy, and J. Albrecht, "Smart*: An Open Data Set and Tools for Enabling Research in Sustainable Homes," in Workshop on Data Mining Applications in Sustainability (SustKDD 2012).
[12] C. Shin, E. Lee, J. Han, J. Yim, W. Rhee, and H. Lee, "The ENERTALK dataset, 15 Hz electricity consumption data from 22 houses in Korea," Scientific Data, vol. 6, pp. 1-13, 2019.
[13] H. Rashid, P. Singh, and A. Singh, "I-BLEND, a campus-scale commercial and residential buildings electrical energy dataset," Scientific Data, vol. 6, pp. 1-12, 2019.
[14] P. E. Group, "Smart meter consumer data (Pune)," 2018.
[15] M. P. Mary, K. Hareesh, R. Krithi, and H. Rashid, "Want to Reduce Energy Consumption, Whom should we call?," in Proceedings of the Ninth International Conference on Future Energy Systems, 2018, pp. 12-20.
[16] S. Kuralkar, P. Mulay, and A. Chaudhari, "Smart Energy Meter: Applications, Bibliometric Reviews and Future Research Directions," Science & Technology Libraries, vol. 39, pp. 165-188, 2020.
[17] A. M. Alonso, F. J. Nogales, and C. Ruiz, "Hierarchical clustering for smart meter electricity loads based on quantile autocovariances," IEEE Transactions on Smart Grid, vol. 11, pp. 4522-4530, 2020.
[18] A. Y. Chaudhari and P. Mulay, "Cloud4NFICA-Nearness Factor-Based Incremental Clustering Algorithm Using Microsoft Azure for the Analysis of Intelligent Meter Data," International Journal of Information Retrieval Research (IJIRR), vol. 10, pp. 21-39, 2020.
[19] A. Chaudhari and P. Mulay, "Algorithmic analysis of intelligent electricity meter data for reduction of energy consumption and carbon emission," The Electricity Journal, vol. 32, p. 106674, 2019.
[20] A. Chaudhari and P. Mulay, "A bibliometric survey on incremental clustering algorithm for electricity smart meter data analysis," Iran Journal of Computer Science, July 25, 2019.
[21] A. Chaudhari, R. R. Joshi, P. Mulay, K. Kotecha, and P. Kulkarni, "Bibliometric Survey on Incremental Clustering Algorithms," Library Philosophy and Practice (e-journal), vol. 2762, pp. 1-25, 2019.
[22] S. Kuralkar, P. Mulay, and A. Chaudhari, "Mobile Phone Charging: Power Statistics & Energy Consumption Pattern Analysis Using Developed "Powerstats" Android Application," International Journal of Modern Agriculture, vol. 9, pp. 1682-1710, 2020.
[23] J. L. Viegas, S. M. Vieira, and J. M. Sousa, "Fuzzy clustering and prediction of electricity demand based on household characteristics," in 2015 Conference of the International Fuzzy Systems Association and the European Society for Fuzzy Logic and Technology (IFSA-EUSFLAT-15), 2015, pp. 1040-1046.
[24] F. Melzi, A. Same, M. Zayani, and L. Oukhellou, "A dedicated mixture model for clustering smart meter data: identification and analysis of electricity consumption behaviors," Energies, vol. 10, p. 1446, 2017.
[25] J. Li and A. Nehorai, "Gaussian mixture learning via adaptive hierarchical clustering," Signal Processing, vol. 150, pp. 116-121, 2018.
[26] P. Mulay and P. Kulkarni, "Evolving Systems using incremental clustering approach," Evolving Systems, vol. 4, pp. 70-85, 2013.
[27] G. Bradski and A. Kaehler, "OpenCV," Dr. Dobb's journal of software tools, vol. 3, 2000.
[28] A. Revathi and N. A. Modi, "Comparative analysis of text extraction from color images using tesseract and opencv," in 2021 8th International Conference on Computing for Sustainable Global Development (INDIACom), 2021, pp. 931-936.
[29] A. Rajabi, M. Eskandari, M. Jabbari Ghadi, S. Ghavidel, L. Li, J. Zhang, et al., "A pattern recognition methodology for analyzing residential customers load data and targeting demand response applications," Energy and Buildings, vol. 203, p. 109455, 2019.
[30] M. Sakthi, "Effective Methods to Improve the Performance of K Means Clustering Algorithm," Department of Computer Science, Mother Teresa Womens University, 2017.
[31] A. Bryant and K. Cios, "RNN-DBSCAN: A Density-based Clustering Algorithm using Reverse Nearest Neighbor Density Estimates," IEEE Transactions on Knowledge and Data Engineering, vol. 5, pp. 1-14, 2017.

Additional Declarations: No competing interests reported.
Furthermore, SG can benefit utility providers by reducing the need for excess electricity generation and minimizing the cost of installing new utility base stations.\u003c/p\u003e \u003cp\u003eEfficient operation of the SG depends on integrating its components so that they work together. These include smart sensors, communication systems, devices, and specialized processors, to name a few [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. These components form the foundation of the SG infrastructure, enabling seamless communication, intelligent decision-making, and effective management of electricity distribution throughout the grid. The Smart Electricity Meter (SEM) is one of the key SG technologies. SEMs record the fine-grained energy consumption (electricity load) of customers and report it to the utility company for advanced measurement and control applications. Compared with a conventional electricity meter [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e], an SEM has control, automation, and communication units that give consumers better information and automatically report outages [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. As SEM deployment increases, it generates a wealth of fine-grained incremental data that can benefit the various stakeholders of power systems (customers, generation units, and transmission and distribution units). However, such data are of little use without analysis. Analytics solutions can extract valuable insight from the large volumes of data generated by SEMs.\u003c/p\u003e \u003cp\u003eIn the power system domain, various data mining techniques are employed to analyze load data. 
Among these techniques, clustering stands out as a widely used unsupervised method for organizing vast amounts of data in a meaningful way. Its purpose is to let decision-makers use the organized data for activities such as forecasting, assessment, and planning. Clustering applies to both numeric and text-based data, which are now generated in real time and on a large scale owing to the Internet of Things (IoT) and related technologies [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eHowever, a traditional clustering algorithm reclusters all of the data from scratch whenever new data arrive. Incremental clustering is an essential way to cluster growing data. By supporting such algorithmic analysis, incremental clustering helps deliver electricity more reliably and efficiently and can greatly reduce the frequency and duration of power outages [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThe rest of the article is structured as follows. Section 2 reviews related work by various researchers and identifies research gaps. Section 3 details the proposed methodology. Section 4 presents the results, followed by a comprehensive analysis. Section 5 offers concluding remarks and some final reflections.\u003c/p\u003e"},{"header":"2. LITERATURE REVIEW","content":"\u003cp\u003eEffective Smart Electricity Meter Data Analysis (SEMDA) requires an understanding of smart meter datasets, including their attributes and types, the various types of incremental clustering, and cluster evaluation methods.\u003c/p\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e2.1. 
Datasets for Research\u003c/h2\u003e \u003cp\u003eThe main challenge to do research on SEMDA is the availability of datasets but many authentic datasets are made available online. Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e summarized the publicly available SEM datasets at National and International levels.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eSummary of Several Open Load Datasets\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDataset\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eBrief Description\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNo. 
of\u003c/p\u003e \u003cp\u003ehouses\u003c/p\u003e \u003cp\u003e(buildings)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eData\u003c/p\u003e \u003cp\u003eResolution\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eDuration\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eRef.\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCBTs\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eData from smart meter readings and pre- and post-trial surveys;(Ireland\u0026rsquo;s)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e6445\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eEvery 30\u003c/p\u003e \u003cp\u003emin\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e2009/9 -\u003c/p\u003e \u003cp\u003e2011/1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e[\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLCL\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eData encompasses smart meter readings, electricity price data, and data from surveys related to appliances and attitudes (London)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e5567\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eEvery 30\u003c/p\u003e \u003cp\u003emin\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e2013/1-\u003c/p\u003e \u003cp\u003e2013/12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c6\"\u003e \u003cp\u003e[\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eUmass\u003c/p\u003e \u003cp\u003eSmart\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eResidential Apartment\u003c/p\u003e \u003cp\u003eelectricity\u003c/p\u003e \u003cp\u003econsumption data;\u003c/p\u003e \u003cp\u003e(USA)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e114\u003c/p\u003e \u003cp\u003eApartments\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eEvery 1 min\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e2014/10-\u003c/p\u003e \u003cp\u003e2016/12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e[\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eENERTALK\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eElectricity consumption\u003c/p\u003e \u003cp\u003edata: aggregate\u003c/p\u003e \u003cp\u003eand appliance-level;\u003c/p\u003e \u003cp\u003e(Korea)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e22\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e15Hz\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e2016/11-\u003c/p\u003e \u003cp\u003e2017/01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e[\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e 
\u003cp\u003eIBLEND\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCommercial and residential\u003c/p\u003e \u003cp\u003ebuildings of\u003c/p\u003e \u003cp\u003ean academic institute\u003c/p\u003e \u003cp\u003ecampus(Delhi,India)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e7 buildings\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eEvery 10\u003c/p\u003e \u003cp\u003emin\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e2013/8-\u003c/p\u003e \u003cp\u003e2017/12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e[\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePrayas\u003c/p\u003e \u003cp\u003eEnergy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSmart meter consumer\u003c/p\u003e \u003cp\u003edata (Pune)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e70\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eEvery 1\u003c/p\u003e \u003cp\u003emin\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e2018/1-\u003c/p\u003e \u003cp\u003e2018/2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e[\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eIITBSmart\u003c/p\u003e \u003cp\u003eEnergy\u003c/p\u003e \u003cp\u003eInformatics\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eResidential building\u003c/p\u003e \u003cp\u003eelectricity consumption\u003c/p\u003e 
\u003cp\u003edata of IIT\u003c/p\u003e \u003cp\u003eBombay campus\u003c/p\u003e \u003cp\u003e(Mumbai)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e60\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eEvery 60\u003c/p\u003e \u003cp\u003emin\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e2016/12\u003c/p\u003e \u003cp\u003eto 2018/1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e[\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThe LCL, CBT, and Umass datasets often used in the existing literature. In addition, IITB, ENERTALK, and I-BLEND are newly released datasets.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e2.2. Literature on Incremental Clustering Algorithms\u003c/h2\u003e \u003cp\u003eThis literature survey focuses on incremental clustering algorithms that have been proposed to address the challenges of processing residential electricity consumption streaming and evolving data [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e]. Authors in [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e] constructs a hierarchical clustering structure that can capture both global and local consumption patterns. It enables efficient clustering of large-scale smart meter datasets and provides insights into consumption behavior at different levels of granularity. The study by [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e] utilizes nearness factor incremental clustering techniques to identify clusters of similar energy consumption patterns. 
By adaptively updating the clusters as new data arrive, incremental clustering with a closeness factor can handle concept drift and evolving consumption patterns in smart meter data. NFICA [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e] is an incremental clustering algorithm developed for smart meter data analysis. It employs a nearness-factor-based approach to dynamically group similar consumption profiles. NFICA handles streaming data by continuously updating cluster centers and adapting to changing energy usage patterns. The authors of [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e] proposed a Log Likelihood-based Gradational Clustering Algorithm to identify consumer consumption patterns. However, that approach is sensitive to the order in which data are presented. To address this issue, it is hypothesized that an incremental clustering [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e, \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e] approach is crucial for mitigating the challenges of clustering large and growing datasets from smart electricity meters. The study in [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e] proposed an Android application called \"PowerStats\" that provides users with statistics on their mobile phone charging patterns. The application collects data such as the phone model, the battery percentage when the device is plugged in or unplugged, the timestamps of plugging in and unplugging, and voltage and current information. With PowerStats, users can gain insight into their charging behavior and better understand the usage patterns of their mobile devices. The authors of [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e] incorporate the fuzzy C-means approach to handle uncertain and imprecise consumption patterns. 
It adaptively updates cluster centers and membership degrees to capture evolving energy usage trends and variations.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003e2.3. Research Gaps\u003c/h2\u003e \u003cp\u003eThrough an extensive literature survey, the researchers identified the following research gaps in the field of SEMDA:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eLeveraging incremental clustering algorithms can significantly enhance load profiling, leading to reduced electricity consumption.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eSEMDA provides energy-saving recommendations with detailed consumption and appliance transition timestamps.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eSEMDA involves descriptive, predictive, and prescriptive analytics to provide insights to both energy providers and consumers.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"3. PROPOSED METHODOLOGY","content":"\u003cp\u003eThe proposed research aims to enhance the Expectation Maximization (EM) algorithm [\u003cspan class=\"CitationRef\"\u003e24\u003c/span\u003e]. The EM algorithm relies heavily on the initial estimate of the number of Gaussian components in the mixture model [\u003cspan class=\"CitationRef\"\u003e25\u003c/span\u003e]. Choosing the correct attributes (initial guess) is potentially the most important aspect of successful clustering. The present study proposes a Closeness based Gaussian Mixture Incremental Clustering (CGMIC) algorithm that extends the Closeness Factor-Based Algorithm (CFBA) [\u003cspan class=\"CitationRef\"\u003e26\u003c/span\u003e] with the EM algorithm. An informed initial guess reduces the number of iterations required for convergence, making the EM algorithm efficient for handling incremental datasets. 
Furthermore, an accurate, informed initial guess enhances the quality and accuracy of the EM output [\u003cspan class=\"CitationRef\"\u003e24\u003c/span\u003e]. Figure \u003cspan class=\"InternalRef\"\u003e1.2\u003c/span\u003e illustrates the system framework. The essential elements of the framework are data collection, data preprocessing, the proposed CGMIC algorithm, and performance evaluation. Subsequent sections detail the extraction of hidden patterns of electricity consumption through cluster analysis.\u003c/p\u003e\n\u003cdiv id=\"Sec7\" class=\"Section2\"\u003e\n \u003ch2\u003e3.1. Data Collection\u003c/h2\u003e\n \u003cp\u003eThe Indian residential power consumption dataset was acquired from IIT Bombay. The final dataset contains the Unix timestamp (Indian Standard Time, GMT\u0026thinsp;+\u0026thinsp;5:30); apartment id; voltage of phases 1, 2, and 3; active power of all three phases; reactive power of all three phases; current for all three phases; power factor of all three phases; and phase angle. Figure \u003cspan class=\"InternalRef\"\u003e1.3\u003c/span\u003e depicts the overall electricity consumption pattern of residential customers for the year 2017. The majority of electricity usage is concentrated in the evening, specifically between 6:30 pm and 8:00 pm, with the highest peak occurring at 7:00 pm.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\n \u003ch2\u003e3.2. Data Preprocessing\u003c/h2\u003e\n \u003cp\u003ePreprocessing the dataset is essential to ensure data quality. There are various methods for handling missing values, including ignoring them entirely, replacing them with a numeric value, using the most frequent value for the feature, or substituting the mean value of the attribute. 
Note that the smart meter dataset used in this research has no missing values.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec9\" class=\"Section2\"\u003e\n \u003ch2\u003e3.3. Proposed Closeness based Gaussian Mixture Incremental Clustering (CGMIC) Algorithm\u003c/h2\u003e\n \u003cp\u003eThe proposed Algorithm \u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e handles incremental data and the analysis of hidden patterns. With this predictive analytics, energy generators can make precise decisions about commissioning more solar panels or reducing the number of coal generators in their portfolio. The novel features of the proposed algorithm are as follows:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003eThe CGMIC algorithm is parameter-free, i.e., the number of clusters need not be selected at initialization\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eCluster formation first, less complex to implement, and convergence guaranteed\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eCluster ranking during iterations for outlier detection\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eLearns from the influx of new data without discarding previously acquired knowledge\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eIncremental learning that automatically suggests groups of clients for specific actions, such as commercial offers for energy reduction\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eLog-likelihood-based, order-independent statistical IC algorithm\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cp\u003e\u003cstrong\u003eAlgorithm 1\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003eProposed Closeness based Gaussian Mixture Incremental Clustering (CGMIC) Algorithm\u003c/p\u003e\n \u003cdiv class=\"BlockQuote\"\u003e\n \u003cp\u003e\u003cstrong\u003eInput\u003c/strong\u003e: 
I\u003csub\u003ex\u003c/sub\u003e = {I\u003csub\u003ex1\u003c/sub\u003e, I\u003csub\u003ex2\u003c/sub\u003e, ..., I\u003csub\u003exn\u003c/sub\u003e}: a set of n d-dimensional time series of raw smart meter data; M\u003csub\u003eiter\u003c/sub\u003e: the maximum number of iterations; \u0026epsilon;: the convergence criterion for the log-likelihood.\u003c/p\u003e\n \u003c/div\u003e\n \u003cp\u003e\u003cstrong\u003eOutput\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003eA series of clusters stored in clusterdb\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003eOutcome\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003eIncremental learning of load shedding patterns day-wise, time-wise, and season-wise\u003c/p\u003e\n \u003cdiv class=\"BlockQuote\"\u003e\n \u003cp\u003ePhase I: Formation of Basic Clusters\u003c/p\u003e\n \u003c/div\u003e\u003cspan\u003e\n \u003cp\u003e1) WHILE the change in log-likelihood (llh) is greater than \u0026epsilon; and M\u003csub\u003eiter\u003c/sub\u003e has not been reached DO:\u003c/p\u003e\u003cspan\u003e\n \u003cp\u003ea) for i\u0026thinsp;=\u0026thinsp;1 to n\u003c/p\u003e\n \u003cp\u003e\u003cbr\u003e\u003c/p\u003e\u003cspan\u003e\n \u003cp\u003ei. Consider each pair of time series I\u003csub\u003ex1\u003c/sub\u003e and I\u003csub\u003ex2\u003c/sub\u003e, where I\u003csub\u003exn\u003c/sub\u003e(l) is point l in series n. Sum(l) is the total of the corresponding parameters of the series considered.\u003c/p\u003e\n \u003c/span\u003e \u003cspan\u003e\n \u003cp\u003eii. The Relationship Probability (RP) of I\u003csub\u003ex1\u003c/sub\u003e is calculated as the ratio of the first series to the sum of the corresponding parameters\u003c/p\u003e\n \u003c/span\u003e \u003cspan\u003e\n \u003cp\u003eiii. 
The Closeness (CN) between series is CN = [(Er(l))\u003csup\u003e2\u003c/sup\u003e * sqrt(Sum(l))] [sqrt(Sum(l))]\u003csup\u003e-1\u003c/sup\u003e, wherein Er(l) = [RP*Sum(l) \u0026ndash; I\u003csub\u003exn\u003c/sub\u003e(l)][sqrt(Sum(l) * RP * (1-RP))]\u003csup\u003e-1\u003c/sup\u003e\u003c/p\u003e\n \u003c/span\u003e \u003cspan\u003e\n \u003cp\u003eiv. The number of clusters (k), mean (\u0026micro;), and variance (\u0026sum;) are stored in clusterdb\u003c/p\u003e\n \u003c/span\u003e\n \u003cp\u003e\u003c/p\u003e\n \u003c/span\u003e\n \u003cp\u003e\u003c/p\u003e\n \u003c/span\u003e\n \u003cdiv class=\"BlockQuote\"\u003e\n \u003cp\u003eendfor\u003c/p\u003e\n \u003c/div\u003e\u003cspan\u003e\n \u003cp\u003eb. Initialize: Set \u0026micro;, \u0026sum;, k, and the prior probability (\u0026Pi;) using the output of the previous step (a); llh = -\u221e\u003c/p\u003e\n \u003c/span\u003e \u003cspan\u003e\n \u003cp\u003ec. for i\u0026thinsp;=\u0026thinsp;1 to n\u003c/p\u003e\n \u003c/span\u003e\n \u003cdiv class=\"BlockQuote\"\u003e\n \u003cp\u003efor j\u0026thinsp;=\u0026thinsp;1 to k\u003c/p\u003e\n \u003cp\u003ePosterior Relationship Probability PRP(Ix\u003csub\u003ei\u003c/sub\u003e | \u0026Pi;, \u0026micro;, \u0026sum;) = \u0026Pi;\u003csub\u003ej\u003c/sub\u003e * PP(Ix\u003csub\u003ei\u003c/sub\u003e | \u0026micro;\u003csub\u003ej\u003c/sub\u003e, \u0026sum;\u003csub\u003ej\u003c/sub\u003e)\u003c/p\u003e\n \u003cp\u003eWherein, PP(Ix\u003csub\u003ei\u003c/sub\u003e | \u0026micro;\u003csub\u003ej\u003c/sub\u003e, \u0026sum;\u003csub\u003ej\u003c/sub\u003e) = [exp(-1/2 (Ix\u003csub\u003ei\u003c/sub\u003e-\u0026micro;\u003csub\u003ej\u003c/sub\u003e)\u003csup\u003et\u003c/sup\u003e \u0026sum;\u003csub\u003ej\u003c/sub\u003e\u003csup\u003e-1\u003c/sup\u003e (Ix\u003csub\u003ei\u003c/sub\u003e-\u0026micro;\u003csub\u003ej\u003c/sub\u003e))]/[(2\u0026pi;)\u003csup\u003ed/2\u003c/sup\u003e |\u0026sum;\u003csub\u003ej\u003c/sub\u003e|\u003csup\u003e1/2\u003c/sup\u003e]\u003c/p\u003e\n \u003cp\u003eendfor\u003c/p\u003e\n 
\u003cp\u003eendfor\u003c/p\u003e\n \u003c/div\u003e\u003cspan\u003e\n \u003cp\u003ed. for i\u0026thinsp;=\u0026thinsp;1 to k\u003c/p\u003e\n \u003c/span\u003e\n \u003cdiv id=\"Equa\" class=\"Equation\"\u003e\n \u003cdiv class=\"mathdisplay\" id=\"FileID_Equa\" name=\"EquationSource\"\u003e$$\\mu_{i} = \\frac{\\sum_{j=1}^{n} I_{Xj}\\,PRP\\left(C_{i}/I_{Xj}\\right)}{\\sum_{j=1}^{n} PRP\\left(C_{i}/I_{Xj}\\right)}$$\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equb\" class=\"Equation\"\u003e\u003cdiv class=\"mathdisplay\" id=\"FileID_Equb\" name=\"EquationSource\"\u003e$$\\Sigma_{i} = \\frac{\\sum_{j=1}^{n} (I_{Xj}-\\mu_{i})^{2}\\,PRP\\left(C_{i}/I_{Xj}\\right)}{\\sum_{j=1}^{n} PRP\\left(C_{i}/I_{Xj}\\right)}$$\u003c/div\u003e\u003c/div\u003e\u003cdiv class=\"BlockQuote\"\u003e\u003cp\u003ellh\u0026thinsp;=\u0026thinsp;llh\u0026thinsp;+\u0026thinsp;log(\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(PRP\\left(C_{i}\\right)PRP\\left(I_{Xi}/C_{i}\\right)\\)\u003c/span\u003e\u003c/span\u003e)\u003c/p\u003e\u003cp\u003eendfor\u003c/p\u003e\u003c/div\u003e\u003cp\u003ePhase II: On an influx of new data, either update the existing cluster(s) or form new cluster(s)\u003c/p\u003e\u003c/div\u003e"},{"header":"4. RESULTS AND DISCUSSION","content":"\u003cdiv class=\"BlockQuote\"\u003e\n \u003cp\u003eThis section presents the results of the CGMIC algorithm on the IIT Bombay smart meter dataset.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e\n \u003ch2\u003e4.1. Dataset Description\u003c/h2\u003e\n \u003cp\u003eThe IIT Bombay Indian Residential Energy Dataset [\u003cspan class=\"CitationRef\"\u003e15\u003c/span\u003e] is a valuable resource for energy consumption analysis. 
Its key features include:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003eRegion and Country: IIT Bombay, India\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eData Period considered for this research: January 2017 to December 2017\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eApartment Type: 3BHK (3 Bedrooms, 1 Hall, 1 Kitchen)\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eSampling Rate: Originally 5 seconds, downsampled to 1 hour\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eAttributes: Timestamp, voltage, and energy consumption for each phase\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eFormat: CSV files\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e\n \u003ch2\u003e4.2. Load Profiling via CGMIC\u003c/h2\u003e\n \u003cp\u003eThe CGMIC algorithm analyzes data from smart electricity meters in Indian homes, and the result is shown in Fig.\u0026nbsp;1.4. Fig.\u0026nbsp;1.4(a) illustrates the clusters obtained by applying the CGMIC algorithm to the IIT Bombay dataset. This is the initial run of the CGMIC algorithm, which forms four primary clusters. Figure\u0026nbsp;1.4(b) demonstrates how newly arriving data fit into the existing clusters. However, in Fig.\u0026nbsp;1.4(c) and 1.4(d), we observe that as new data arrive, certain data points do not align with the existing clusters because their closeness values differ. Consequently, the CGMIC algorithm automatically forms two new clusters, labeled cluster five and cluster six, respectively. 
As a result, the CGMIC algorithm divides the SEM data into a total of six clusters, grouping consumers by their electricity usage patterns and behaviors.\u003c/p\u003e\n \u003cp\u003eFigure \u003cspan class=\"InternalRef\"\u003e1.5\u003c/span\u003e illustrates the consumption patterns of residential customers organized by cluster. The data reveals that cluster 4 consists of the consumers with the highest electricity consumption. Clusters 1 and 5 contain customers with average electricity consumption, which is why these two clusters are updated frequently. As shown in Table \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e, a total of 1 lakh (100,000) smart meter records were grouped into 6 clusters: 57.9% were assigned to cluster 1 and 4.2% to cluster 4.\u003c/p\u003e\n \u003cp\u003e\u003c/p\u003e\u0026nbsp;\u003ctable id=\"Tab2\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eDescriptive analysis of clusters generated by CGMIC\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eClusters\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003e4\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003e5\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003e6\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePercentage of Data\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n 
\u003cp\u003e57.9%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e6.3%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e10.4%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e4.2%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e16.4%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e4.8%\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003cp\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e\n \u003ch2\u003e4.3. Data Insights with Cluster Analysis for Sustainable Development\u003c/h2\u003e\n \u003cp\u003eThe CGMIC clustering can reveal various distinct consumption patterns among the daily load curves of the customer as shown in Fig. 1.6. The cluster analysis is given below:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003eCluster1:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003eComprises 70 percent of the population. 
These residents fall into the category of ordinary consumers who use average amounts of electricity.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eThe electricity demand tends to peak around noon.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eCluster1 indicates that most residents stay at home during the day\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cp\u003e\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eCluster 2:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003eComprising 12 percent of the population, Cluster 2 consists of high-demand consumers.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eTheir electricity consumption is elevated during the morning, remains stable in the afternoon, and peaks around 4:00 PM.\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cp\u003e\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eCluster 3:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003eRepresenting 7 percent of the population, Cluster 3 includes average electricity consumers.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eConsumption patterns indicate an afternoon peak, suggesting frequent use of electric appliances like air conditioners.\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cp\u003e\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eCluster 4:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003eAccounting for 6 percent of the population, Cluster 4 contains the highest-demand consumers.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eThese users consume electricity without price considerations, resulting in unstable consumption.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003ePeaks occur during the afternoon.\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cp\u003e\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eCluster 
5:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003eConstituting 5 percent of the population, Cluster 5 comprises low-demand consumers.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eTheir daily electricity consumption remains stable, with a peak around 1:00 PM.\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cp\u003e\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cp\u003eThe insights gained from the cluster analysis contribute directly to sustainable development goals by informing energy efficiency programs targeted at different customer segments.\u003c/p\u003e\n\u003c/div\u003e"},{"header":"5. SYSTEM PERFORMANCE EVALUATION","content":"\u003cp\u003eFor validation, the clusters generated by the proposed CGMIC algorithm are compared with those of the DBSCAN algorithm using clustering validity indexes. Generally, the final decision is made based on the result of the validity indexes. The DBSCAN algorithm [\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e] finds the number of clusters from the estimated node density distribution. In the power system literature, the Davies-Bouldin Index (DBI), the Dunn validity index (DVI), and the Silhouette Criterion (SC) [\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e] are among the most popular indexes and are therefore used in this study too. 
Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e compares the DBI, SC, precision, recall, and F1-measure of the two methods.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eEvaluation of the Proposed System\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eParameters\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCGMIC\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eDBSCAN\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDBI\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.78\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.59\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSC\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.91\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.90\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e 
\u003cp\u003e0.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.87\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.94\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.84\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eF1-Measure\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.85\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThe F1-score of CGMIC is higher than that of DBSCAN. A smaller DBI value indicates that the clustering algorithm separates the data set better, and CGMIC has the lower DBI (0.78 versus 1.59). Hence, the proposed method generates better clustering results.\u003c/p\u003e"},{"header":"6. CONCLUSIVE SUMMARY","content":"\u003cp\u003eThe proposed Closeness-based Gaussian Mixture Incremental Clustering (CGMIC) algorithm accommodates the influx of new data seamlessly for accurate analysis. The CGMIC algorithm does not require prior knowledge of the number of clusters. This research found that the proposed system is order-independent and parameter-free. The proposed system learns from new labelled or unlabelled instances of data without discarding previously acquired knowledge. This study used the real-world IIT Bombay Indian Residential Energy Dataset, which contains SEM data of households. 
The entities that learn about sustainability through the findings of this research are customers, utility providers, policymakers, and the environment, as explained below:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eThe implemented system informs household customers in advance of the specific times when their electricity tariff will be higher or lower than the normal price.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eHousehold electricity consumers can monitor and improve their consumption patterns.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eUtility providers can identify appropriate customer groups for effective demand response programs, improve load forecasting accuracy, and estimate the electricity consumption patterns of new customers.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003ePolicymakers can design policies for effective energy reduction using insights into customers' electricity consumption habits.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eThis study also helps reduce pollution from the vehicles driven to individual customers\u0026rsquo; meters for manual readings and for checking errors, if any.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eBy motivating consumers, utilities, and stakeholders, the proposed system encourages more sustainable behavior on a voluntary basis. 
The benefits of SEMDA extend beyond technical improvement: this research promotes sustainable behavior by showing and warning in advance when and how frequently electricity shutdowns and cut-offs will occur.\u003c/p\u003e \u003cp\u003eIn upcoming phases of research, the proposed system can be applied to energy-capture data from floating solar plants for data analysis, pattern recognition, and forecasting.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eAll authors contributed equally to ideation, implementation, and drafting.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003ehttps://figshare.com/s/a4aedbe2ae2aadc5618b?file=13842431\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eS. M. Amin, \u0026quot;Smart grid: overview, issues and opportunities. Advances and challenges in sensing, modeling, simulation, optimization and control,\u0026quot; \u003cem\u003eEur J Control, \u003c/em\u003evol. 17, 2011.\u003c/li\u003e\n\u003cli\u003eP. Siano, \u0026quot;Demand response and smart grids\u0026mdash;a survey,\u0026quot; \u003cem\u003eRenew Sustain Energy Rev, \u003c/em\u003evol. 30, 2014.\u003c/li\u003e\n\u003cli\u003eMseb, \u0026quot;SCADA Data: Maharashtra Generation, Exchange and Demand overview and Load shedding data,\u0026quot; ed, 2018.\u003c/li\u003e\n\u003cli\u003eB. Yildiz, J. I. Bilbao, J. Dore, and A. B. Sproul, \u0026quot;Recent advances in the analysis of residential electricity consumption and applications of smart meter data,\u0026quot; \u003cem\u003eApplied Energy, \u003c/em\u003evol. 208, pp. 402-427, 2017.\u003c/li\u003e\n\u003cli\u003eP. Cichosz, \u003cem\u003eData mining algorithms: explained using R\u003c/em\u003e: John Wiley \u0026amp; Sons, 2014.\u003c/li\u003e\n\u003cli\u003eA. Y. Chaudhari and P. 
Mulay, \u0026quot;Unleashing analytics to reduce electricity consumption using incremental clustering algorithm,\u0026quot; \u003cem\u003eInternational Journal of Energy Sector Management, \u003c/em\u003evol. ahead-of-print, 2021.\u003c/li\u003e\n\u003cli\u003eP. Mulay, \u0026quot;Threshold Computation to Discover Cluster Structure, a New Approach,\u0026quot; \u003cem\u003eInternational Journal of Electrical and Computer Engineering (IJECE), \u003c/em\u003evol. 6, pp. 275-282, 2016.\u003c/li\u003e\n\u003cli\u003eP. A. Kulkarni and P. Mulay, \u0026quot;Evolve systems using incremental clustering approach,\u0026quot; \u003cem\u003eEvolving Systems, \u003c/em\u003evol. 4, pp. 71\u0026ndash;85, 2013.\u003c/li\u003e\n\u003cli\u003eIssda, \u0026quot;Irish Social Science Data Archive: Commission for Energy Regulation (CER) smart metering project,\u0026quot; ed, 2018.\u003c/li\u003e\n\u003cli\u003eR. C. J. R. Schofield, S. H. Tindemans, M. Bilton, M. Woolf, and G. Strbac, \u0026quot;Low carbon london project: Data from the dynamic time-of-use electricity pricing trial, 2013,\u0026quot; 2015.\u003c/li\u003e\n\u003cli\u003eS. Barker, A. Mishra, D. Irwin, E. Cecchet, P. Shenoy, and J. Albrecht, \u0026quot;Smart*: An Open Data Set and Tools for Enabling Research in Sustainable Homes,\u0026quot; in \u003cem\u003eWorkshop on Data Mining Applications in Sustainability (SustKDD 2012)\u003c/em\u003e.\u003c/li\u003e\n\u003cli\u003eC. Shin, E. Lee, J. Han, J. Yim, W. Rhee, and H. Lee, \u0026quot;The ENERTALK dataset, 15 Hz electricity consumption data from 22 houses in Korea,\u0026quot; \u003cem\u003eScientific Data, \u003c/em\u003evol. 6, pp. 1-13, 2019.\u003c/li\u003e\n\u003cli\u003eH. Rashid, P. Singh, and A. Singh, \u0026quot;I-BLEND, a campus-scale commercial and residential buildings electrical energy dataset,\u0026quot; \u003cem\u003eScientific Data, \u003c/em\u003evol. 6, pp. 1-12, 2019.\u003c/li\u003e\n\u003cli\u003eP. E. 
Group, \u0026quot;Smart meter consumer data (Pune),\u0026quot; ed, 2018.\u003c/li\u003e\n\u003cli\u003eM. P. Mary, K. Hareesh, R. Krithi, and H. Rashid, \u0026quot;Want to Reduce Energy Consumption, Whom should we call?,\u0026quot; in \u003cem\u003eProceedings of the Ninth International Conference on Future Energy Systems\u003c/em\u003e, 2018, pp. 12\u0026ndash;20.\u003c/li\u003e\n\u003cli\u003eS. Kuralkar, P. Mulay, and A. Chaudhari, \u0026quot;Smart Energy Meter: Applications, Bibliometric Reviews and Future Research Directions,\u0026quot; \u003cem\u003eScience \u0026amp; Technology Libraries, \u003c/em\u003evol. 39, pp. 165-188, 2020/04/02 2020.\u003c/li\u003e\n\u003cli\u003eA. M. Alonso, F. J. Nogales, and C. Ruiz, \u0026quot;Hierarchical clustering for smart meter electricity loads based on quantile autocovariances,\u0026quot; \u003cem\u003eIEEE Transactions on Smart Grid, \u003c/em\u003evol. 11, pp. 4522-4530, 2020.\u003c/li\u003e\n\u003cli\u003eA. Y. Chaudhari and P. Mulay, \u0026quot;Cloud4NFICA-Nearness Factor-Based Incremental Clustering Algorithm Using Microsoft Azure for the Analysis of Intelligent Meter Data,\u0026quot; \u003cem\u003eInternational Journal of Information Retrieval Research (IJIRR), \u003c/em\u003evol. 10, pp. 21-39, 2020.\u003c/li\u003e\n\u003cli\u003eA. Chaudhari and P. Mulay, \u0026quot;Algorithmic analysis of intelligent electricity meter data for reduction of energy consumption and carbon emission,\u0026quot; \u003cem\u003eThe Electricity Journal, \u003c/em\u003evol. 32, p. 106674, 2019/12/01/ 2019.\u003c/li\u003e\n\u003cli\u003eA. Chaudhari and P. Mulay, \u0026quot;A bibliometric survey on incremental clustering algorithm for electricity smart meter data analysis,\u0026quot; \u003cem\u003eIran Journal of Computer Science, \u003c/em\u003eJuly 25 2019.\u003c/li\u003e\n\u003cli\u003eA. Chaudhari, R. R. Joshi, P. Mulay, K. Kotecha, and P. 
Kulkarni, \u0026quot;Bibliometric Survey on Incremental Clustering Algorithms,\u0026quot; \u003cem\u003eLibrary Philosophy and Practice (e-journal), \u003c/em\u003evol. 2762, pp. 1-25, 2019.\u003c/li\u003e\n\u003cli\u003eSaloni Kuralkar, Preeti Mulay, and A. Chaudhari, \u0026quot;Mobile Phone Charging: Power Statistics \u0026amp; Energy Consumption Pattern Analysis Using Developed \u0026ldquo;Powerstats\u0026rdquo; Android Application,\u0026quot; \u003cem\u003eInternational Journal of Modern Agriculture, \u003c/em\u003evol. 9, pp. 1682 - 1710, 09/30 2020.\u003c/li\u003e\n\u003cli\u003eJ. L. Viegas, S. M. Vieira, and J. M. Sousa, \u0026quot;Fuzzy clustering and prediction of electricity demand based on household characteristics,\u0026quot; in \u003cem\u003e2015 Conference of the International Fuzzy Systems Association and the European Society for Fuzzy Logic and Technology (IFSA-EUSFLAT-15)\u003c/em\u003e, 2015, pp. 1040-1046.\u003c/li\u003e\n\u003cli\u003eF. Melzi, A. Same, M. Zayani, and L. Oukhellou, \u0026quot;A dedicated mixture model for clustering smart meter data: identification and analysis of electricity consumption behaviors,\u0026quot; \u003cem\u003eEnergies, \u003c/em\u003evol. 10, p. 1446, 2017.\u003c/li\u003e\n\u003cli\u003eJ. Li and A. Nehorai, \u0026quot;Gaussian mixture learning via adaptive hierarchical clustering,\u0026quot; \u003cem\u003eSignal Processing, \u003c/em\u003evol. 150, pp. 116\u0026ndash;121, 2018.\u003c/li\u003e\n\u003cli\u003ePreeti Mulay and P. Kulkarni, \u0026quot;Evolving Systems using incremental clustering approach,\u0026quot;\u003cem\u003e Evolving Systems, \u003c/em\u003evol. 4, pp. 70-85, 2013.\u003c/li\u003e\n\u003cli\u003eG. Bradski and A. Kaehler, \u0026quot;OpenCV,\u0026quot; \u003cem\u003eDr. Dobb\u0026rsquo;s journal of software tools, \u003c/em\u003evol. 3, 2000.\u003c/li\u003e\n\u003cli\u003eA. Revathi and N. A. 
Modi, \u0026quot;Comparative analysis of text extraction from color images using tesseract and opencv,\u0026quot; in \u003cem\u003e2021 8th International Conference on Computing for Sustainable Global Development (INDIACom)\u003c/em\u003e, 2021, pp. 931-936.\u003c/li\u003e\n\u003cli\u003eA. Rajabi, M. Eskandari, M. Jabbari Ghadi, S. Ghavidel, L. Li, J. Zhang\u003cem\u003e, et al.\u003c/em\u003e, \u0026quot;A pattern recognition methodology for analyzing residential customers load data and targeting demand response applications,\u0026quot; \u003cem\u003eEnergy and Buildings, \u003c/em\u003evol. 203, p. 109455, 2019/11/15/ 2019.\u003c/li\u003e\n\u003cli\u003eM. Sakthi, \u0026quot;Effective Methods to Improve the Performance of K Means Clustering Algorithm,\u0026quot; Department of Computer Science, Mother Teresa Womens University, 2017.\u003c/li\u003e\n\u003cli\u003eAvory Bryant and K. Cios, \u0026quot; RNN-DBSCAN: A Density-based Clustering Algorithm using Reverse Nearest Neighbor Density Estimates,\u0026quot; \u003cem\u003eIEEE Transaction On Knowledge and Data Engineering, \u003c/em\u003evol. 5, pp. 
1-14, 2017.\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"incremental learning, smart meter data, pattern recognition, electricity management, sustainable development","lastPublishedDoi":"10.21203/rs.3.rs-4837042/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-4837042/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eThe sustainability of the electricity system is closely related to the analysis of smart electricity meter data, which plays an important role in enhancing energy management and overall grid operation. The widespread use of household smart meters generates a substantial volume of data, offering an opportunity to enhance overall energy management by analyzing household electricity usage data. However, when faced with an influx of new data, traditional clustering methods require re-clustering all the data from scratch, which can be computationally intensive. To address the challenge of handling the ever-increasing data, an incremental clustering algorithm proves to be the most suitable choice. 
Incremental learning, accomplished through incremental clustering, provides a straightforward and effective approach. In this research, the proposed Closeness-based Gaussian Mixture Incremental Clustering (CGMIC) Algorithm updates load patterns without relying on overall daily load curve clustering. The CGMIC algorithm first extracts load patterns from new data and then either integrates them into the existing load patterns or forms new ones. Real-world electricity smart meter data, namely the IITB Indian Residential Energy Dataset, is utilized to validate the proposed system. The effectiveness of the proposed system is assessed using metrics like the silhouette score and Davies-Bouldin index, employing the incremental K-means algorithm. The insights gained from the proposed system contribute directly to sustainable development goals: the system effectively identifies changes in residential electricity consumption behavior, providing valuable insights for utility companies to optimize electricity load management.\u003c/p\u003e","manuscriptTitle":"The Role of Smart Electricity Meter Data Analysis in Driving Sustainable Development","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-08-28 06:35:03","doi":"10.21203/rs.3.rs-4837042/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"b918cf0f-311f-4444-9222-2afda32f144f","owner":[],"postedDate":"August 28th, 
2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2024-09-08T15:21:27+00:00","versionOfRecord":[],"versionCreatedAt":"2024-08-28 06:35:03","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-4837042","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-4837042","identity":"rs-4837042","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
