Generating Attribute Similarity Graphs: A User Behavior-Based Approach from Real-Time Microblogging Data on Platform X

doi:10.21203/rs.3.rs-4132627/v1


2024 · doi:10.21203/rs.3.rs-4132627/v1

preprint OA: closed


Research Article

Md Ahsan Ul Hasan, Azuraliza Abu Bakar, Mohd Ridzwan Yaakub

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-4132627/v1

This work is licensed under a CC BY 4.0 License. Status: Posted. Version 1 posted.

Abstract

Social network analysis is a powerful tool for understanding various phenomena, but it requires data with explicit connections among users. However, such data is hard to obtain in real-time, especially from platforms like X, commonly known as Twitter, where users share topic-related content rather than personal connections. Therefore, this paper tackles a new problem of building a social network graph in real-time where explicit connections are unavailable.
Our methodology is centred on user similarity as the basis for establishing connections, on the premise that users with similar characteristics are more likely to connect. To implement this concept, we extracted easily accessible attributes from the Twitter platform and proposed a novel graph model based on similarity. We also introduce an Attribute-Weighted Euclidean Distance (AWED) to calculate user similarities. We compare the proposed graph with synthetic graphs based on network properties, online social network characteristics, and predictive analysis. The results suggest that the AWED graph provides a more precise representation of the dynamic connections that exist in real-world online social networks, surpassing the inherent constraints of synthetic graphs. We demonstrate that the proposed method of graph construction is simple, flexible, and effective for network analysis tasks.

Introduction

Capturing and analysing real-time data is vital for Online Social Network (OSN) analysis because it enables immediate insight into social interactions and lets researchers observe user behaviour as it happens (Bazzaz Abkenar et al., 2021). Researchers collect real-time information from OSNs using diverse techniques and tools. These include crawling OSN platforms through Application Programming Interfaces (APIs), web scraping, pre-collected network dataset repositories, data sharing agreements, and digital trace data collection, among others (Ohme et al., 2023; Stark, 2018). The most popular technique for collecting real-time data is the API, which lets researchers retrieve data in real time and analyse it as soon as it enters the system (Ohme et al., 2023; Stark, 2018; Venturini & Rogers, 2019; Weber et al., 2021).

Microblogging site X, commonly known as Twitter, is a popular OSN platform that functions as a hub for news updates, information sharing, and marketing campaigns (Kanavos et al., 2023). Politicians, journalists, businesses, and celebrities use Twitter to influence public opinion and shape political discourse. The platform has also been used to gauge public opinion and sentiment on specific subjects. Hence, analysing real-time Twitter data can yield a useful understanding of users' behavioural patterns, interests, and preferences, and the platform has gained popularity as a medium for academic study. Although most OSN platforms restricted data access after the Cambridge Analytica data breach, Twitter remained an exception and continued to offer its data through many APIs (Markos et al., 2023; Venturini & Rogers, 2019).

However, the sheer volume of data generated by users makes it nearly impossible to collect explicit OSN data in the real world (Myers & Leskovec, 2010; Toraman et al., 2022). The APIs provided by Twitter restrict the number of queries allowed within a given time frame. For example, the Twitter API allows 15 calls per 15 minutes, each delivering 1000 IDs, to retrieve an account's followers and followings. Retrieving the complete list of followers therefore requires $\frac{\text{Number of Followers}}{1000}$ requests (De Nicola et al., 2021). Furthermore, the Twitter API has recently undergone changes that include new price structures and access levels (Rothwell, 2023).
These modifications have further limited the number of tweets, requests, and data that developers and researchers can access. Moreover, Twitter differs from other OSNs in that its users focus on real-time communication, sharing thoughts and ideas about specific topics rather than prioritising personal connections (Jain et al., 2021; Logan et al., 2023; Masrom et al., 2021). For example, we selected 150 primary users who had the highest number of likes on their tweets related to a recent topic. We then used the Twitter API to obtain 100 friends for each of these users in order to examine potential friendships among the 150 users. As shown in Fig. 1, few discernible connections existed between them. This suggests that the significant network structure within Twitter cannot be fully understood by relying only on the traditional structure of friends and followers.

Link prediction and imputation are among the most common techniques for addressing missing data in OSNs. Nevertheless, the efficacy of these methods may be constrained because they depend heavily on the interactions or connections between nodes to estimate the missing data (Alam et al., 2023; Aziz et al., 2023; Mariani et al., 2020). Another widely adopted approach is to create a synthetic social network when the real-world network is unavailable. Synthetic data modelling generates data that replicates the characteristics of real-world data (Agrawal et al., 2024; Faez et al., 2022; Jiang et al., 2022; Nettleton, 2016; O’Neil & Petty, 2019). This allows researchers to examine and assess information without compromising confidentiality or being constrained by the unavailability of data.
However, synthetic social networks are generated algorithmically rather than obtained empirically, and so lack the authentic contextual complexities present in real-world networks (Agrawal et al., 2024; Lim & Bentley, 2022; O’Neil & Petty, 2019).

The difficulty of deriving significant network structures from real-time Twitter data has prompted the investigation of alternative approaches in this study. Consequently, this study introduces a novel user-attribute-based similarity graph model that uses publicly available Twitter data to generate connections between users according to their degree of similarity. The formation of relationships in OSNs is significantly affected by the level of similarity among members, so users who share similar backgrounds or other attributes are more likely to connect (Block & Grund, 2014; David-Barrett, 2020; Schwyck et al., 2023; Zareie & Sakellariou, 2020). Commonly employed methods for evaluating user similarity include Jaccard similarity, cosine similarity, the Pearson correlation coefficient, and Euclidean distance (Bodaghi & Oliveira, 2022; Kerrache et al., 2020; Shoeibi et al., 2022). However, these conventional approaches do not intrinsically account for the varying importance of user attributes in the formation of relationships. It is widely acknowledged that not all user attributes are equally important or relevant when relationships are established (Alghobiri, 2023; de Andrade & Rêgo, 2018; Md Ahsan Ul Hasan et al., 2024; Li et al., 2019). Thus, this study introduces an Attribute-Weighted Euclidean Distance (AWED) method to assess the similarity coefficients for generating an OSN graph. Principal Component Analysis (PCA) is employed to determine the importance of the attributes, and weights are assigned according to their respective importance.
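
As a rough illustration of the attribute-weighting idea, the sketch below computes a weighted Euclidean distance between two attribute vectors. It assumes the common form $\sqrt{\sum_k w_k (x_k - y_k)^2}$, which is an assumption made for illustration and not a formula taken from this text. The function name `weighted_euclidean`, the sample vectors, and the weights are all hypothetical.

```python
import math


def weighted_euclidean(x, y, weights):
    """Weighted Euclidean distance sqrt(sum_k w_k * (x_k - y_k)^2).

    Assumed form for illustration only; the weights would come from an
    attribute-importance step such as PCA.
    """
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


# Hypothetical users described by three attributes scaled to [0, 1].
user_a = [0.0, 0.5, 1.0]
user_b = [1.0, 0.5, 0.0]

equal_weights = [1.0, 1.0, 1.0]
skewed_weights = [4.0, 1.0, 0.0]  # first attribute dominates, third is ignored

d_equal = weighted_euclidean(user_a, user_b, equal_weights)
d_skewed = weighted_euclidean(user_a, user_b, skewed_weights)
```

With equal weights the measure reduces to the ordinary Euclidean distance; changing the weights changes which attribute differences contribute most to the result.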

The paper's argument rests on a comprehensive comparison between the proposed user-attribute-based similarity graph and three well-known synthetic graphs: Erdős-Rényi (Piccardi, 2023; Shahraeini, 2023; Tantardini et al., 2019), Barabási–Albert (Kubina et al., 2017; Wei et al., 2022), and Stochastic Block Model graphs (Block & Grund, 2014; Hu et al., 2022; Lee & Wilkinson, 2019). These graphs are evaluated using metrics for network structural properties, OSN characteristics, and predictive performance. The main contributions of this study are:

(1) An OSN graph generation method, based on user-attribute similarity, for settings where explicit relationships are absent. The attributes selected for this study are easily derivable from the Twitter platform in real time using the Twitter API.
(2) An Attribute-Weighted Euclidean Distance metric to quantify the similarity between users. The weights assigned to attributes are determined by assessing the importance of each attribute.
(3) A comprehensive comparison between the proposed user-attribute-based similarity graph and three well-known synthetic graphs, which provides insight into the efficacy of the proposed model against established synthetic graph models.
(4) Adherence to ethical standards and data confidentiality by strictly following Twitter guidelines, which ensures user privacy and upholds ethical integrity throughout the study.

Preliminaries

2.1 Problem definition: Traditionally, an OSN such as Twitter can be modelled as a graph $G\left(V, E\right)$, where $V=\left\{{v}_{1}, {v}_{2}, \dots , {v}_{n}\right\}$ is the set of users (nodes) and $n=\left|V\right|$ is the number of users. The set $E\subset V\times V$ is the set of edges, which represent the connections or relationships between users.
In the absence of explicit relationships within a network ($E=\varnothing$), the objective is to construct a method that creates relationships between users. Given that each user ${v}_{i}$ is associated with a set of attributes $A=\left\{{a}_{1}, {a}_{2}, \dots , {a}_{m}\right\}$ with varying weights $W=\left\{{w}_{1}, {w}_{2}, \dots , {w}_{m}\right\}$, edges are constructed between nodes based on behavioural similarities, quantified using the Attribute-Weighted Euclidean Distance (AWED). The formation of relationships in OSNs is highly influenced by the degree of similarity among users. Therefore, an edge is created between nodes ${v}_{i}$ and ${v}_{j}$ if the AWED reaches or exceeds the threshold $T$:

$$\text{Edge}\left({v}_{i},{v}_{j}\right)=\begin{cases}1, & \text{if } \text{AWED}\left({v}_{i}, {v}_{j}\right)\ge T\\ 0, & \text{otherwise}\end{cases}\tag{1}$$

Even though Twitter is a directed graph, the constructed graph is undirected because of the well-established principle of symmetry in mathematics and social network analysis (Evkoski et al., 2023; Shoeibi et al., 2022). That is,

$$\text{AWED}\left({v}_{i}, {v}_{j}\right)\ge T\Rightarrow \text{AWED}\left({v}_{j}, {v}_{i}\right)\ge T\tag{2}$$

2.2 Attributes: The Twitter API provides a rich and accessible source of data that can support academic research in a variety of ways. It gives access to user-based information as well as user-generated content, that is, tweet-based information. User profile information includes the number of accounts followed (friends), the number of followers, and the number of tweets, as well as profile descriptions and locations. Tweet-based information includes the content of the tweet, its creation time, and the number of likes, retweets, comments, and quotes.

Number of Friends: The number of accounts that a user is currently following. These accounts are displayed on the user's timeline, enabling them to view the tweets and updates from these accounts.
The number of friends a user has is the out-degree of the corresponding node in the graph, which yields significant insights into the structure and dynamics of social networks (Guan et al., 2022).

Number of Followers: The number of accounts that are currently following a specific individual. Followers receive notifications from the user, and their interaction with the user's tweets can enhance the user's reach and influence. The number of a user's followers is the in-degree of the corresponding node in the graph, which can be used as a measure of the spread of information within online social networks (Panchendrarajan & Saxena, 2023).

Number of Tweets: The total number of tweets published by a user. A tweet is a concise post on the social networking platform Twitter, with a character limit of 280. It can contain text, hyperlinks, visuals, or multimedia components. The aggregate count of tweets serves as an indicator of the user's extent of participation and involvement on the site (Fu & Shen, 2014).

Engagement: Engagement covers the various forms of user involvement with a post, including likes, retweets, comments, and link clicks. Post resonance is a metric that gauges the level of audience involvement and may be determined by dividing the number of engagements by the number of impressions (Asadi & Agah, 2018; Iqbal et al., 2021). As shown in Eq. 3, the engagement of a user is calculated by summing the number of likes, retweets, replies, and quotes that each of the user's $n$ tweets receives. A high engagement rate signifies that the material is captivating and pertinent to the audience.

$$\text{Engagement}=\sum _{i=1}^{n}\left({\text{Likes}}_{i}+{\text{Retweets}}_{i}+{\text{Replies}}_{i}+{\text{Quotes}}_{i}\right)\tag{3}$$

Active: Being active on Twitter refers to the regularity and constancy with which a user interacts with the platform.
This encompasses activities such as posting material, responding to comments, and engaging with others (Asadi & Agah, 2018; Iqbal et al., 2021). Eq. 4 gives the user's level of activity over a specific time period and includes actions such as tweeting, replying, retweeting, and quoting other users' tweets.

$$\text{Active}=\frac{\sum \left(\text{Tweets posted}+\text{Retweets}+\text{Replies}+\text{Quotes}\right)}{\text{Time}}\tag{4}$$

Tweets Impact: In Twitter, tweets impact represents the number of retweets that a user's post or tweet has received (Huynh et al., 2022). Eq. 5 denotes the impact of tweets, where $n$ indicates the total number of tweets.

$$\text{Tweets Impact}=\sum _{i=1}^{n}{\text{Tweet}}_{i}\times \log\left({\text{Retweets}}_{i}\right)\tag{5}$$

Growth: In Twitter, growth refers to the increase in the number of followers over time (Mahmoudi et al., 2018). It may be assessed by monitoring the number of newly acquired followers. Eq. 6 gives a user's growth.

$$\text{Growth}=\frac{{\text{Followers}}_{final}-{\text{Followers}}_{starting}}{\text{Time}}\tag{6}$$

2.3 Principal Component Analysis (PCA): Principal Component Analysis (PCA) is a statistical technique used to reduce the number of variables in a dataset. It transforms a group of correlated variables into a smaller group of uncorrelated variables known as principal components (Saarela & Jauhiainen, 2021; Zhou et al., 2022). Despite the reduction in variables, PCA preserves most of the information present in the original data. It can also be used to rank attributes according to their significance in capturing variability in the data. This is accomplished by analysing the eigenvalues of the covariance matrix. Each eigenvalue indicates how much of the variance its principal component accounts for: the larger the eigenvalue, the more variation the corresponding component captures. Thus, attributes linked to principal components with higher eigenvalues are deemed more significant in capturing the variability in the data.
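
Returning briefly to the attribute definitions of Section 2.2, the sketch below shows one way Eqs. 3, 4, and 6 might be computed from collected tweet and profile data. The dictionary keys, function names, and the choice of days as the time unit are assumptions made for illustration; Tweets Impact (Eq. 5) is left out because the text does not specify the ${\text{Tweet}}_{i}$ term.

```python
def engagement(tweets):
    """Eq. 3: sum of likes, retweets, replies and quotes over a user's tweets."""
    return sum(
        t["likes"] + t["retweets"] + t["replies"] + t["quotes"] for t in tweets
    )


def activity_rate(posts, retweets, replies, quotes, days):
    """Eq. 4: number of actions per unit of time (days is an assumed unit)."""
    if days <= 0:
        raise ValueError("days must be positive")
    return (posts + retweets + replies + quotes) / days


def growth(followers_start, followers_end, days):
    """Eq. 6: change in follower count per unit of time."""
    if days <= 0:
        raise ValueError("days must be positive")
    return (followers_end - followers_start) / days


# Hypothetical sample data for a single user.
sample_tweets = [
    {"likes": 10, "retweets": 2, "replies": 1, "quotes": 0},
    {"likes": 5, "retweets": 0, "replies": 3, "quotes": 1},
]

user_engagement = engagement(sample_tweets)
user_activity = activity_rate(posts=6, retweets=3, replies=2, quotes=1, days=4)
user_growth = growth(followers_start=1000, followers_end=1120, days=30)
```

Each function returns a single number per user, so the results can be placed side by side as columns of the attribute matrix before normalisation.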
The procedure for ranking attributes involves the following steps:

(1) Normalize the data to ensure uniform scaling of all variables.
(2) Calculate the covariance matrix to determine the relationships between variables.
(3) Calculate the eigenvectors and eigenvalues of the covariance matrix to determine the principal components.
(4) Rank the principal components according to their eigenvalues, which indicate the variance explained by each component.

The first principal component captures the highest amount of variability in the data, while each successive component captures the greatest remaining variability in a direction orthogonal to the previous components. Eq. 7 gives the percentage of variance accounted for by each component.

$$\text{Percentage of variance}=\frac{{\lambda }_{i}}{{\sum }_{j=1}^{p}{\lambda }_{j}}\times 100\tag{7}$$

where $p$ denotes the number of variables, ${\lambda }_{i}$ is the eigenvalue associated with the $i$th principal component, and $j$ is the summation index over the principal components.

2.4 Synthetic Graph: Synthetic graphs are simulated networks created using mathematical models to replicate the structural characteristics of actual social networks. Synthetic graph generators address the lack of datasets for evaluating graph learning algorithms, allowing for more thorough analysis of their performance in various scenarios. They are useful for comparing graph learning methods and modelling network dynamics (Nikolentzos et al., 2023; Piccardi, 2023; Verstraaten et al., 2017). The Erdős-Rényi (ER), Barabási-Albert (BA), and Stochastic Block Model (SBM) are common synthetic graph models used by researchers in OSN analysis (O’Neil & Petty, 2019; Piccardi, 2023).

The Erdős-Rényi (ER) model is a classical graph modelling approach in which every pair of nodes is linked by an edge with the same fixed probability (Hu et al., 2022; Roux et al., 2023).
The likelihood of a connection between two nodes is not influenced by the presence of other connections in the graph. An ER graph $G\left(n, p\right)$ has $n$ vertices, and each possible edge is present with probability $p$, independently of every other edge. The probability of a graph $G$ with edge set $E$ under this model is shown in Eq. 8.

$$P\left(G\right)={p}^{\left|E\right|}{\left(1-p\right)}^{\frac{n\left(n-1\right)}{2}-\left|E\right|}\tag{8}$$

Here, $\frac{n\left(n-1\right)}{2}$ is the total number of possible edges. The Erdős-Rényi model works well for analyses that call for a straightforward, general-purpose random graph model in which each edge has a fixed probability of existing, regardless of the other edges (Hu et al., 2022; Shahraeini, 2023).

The Barabási-Albert (BA) model is another popular method for creating synthetic graphs. The BA model generates random scale-free networks based on the principles of growth and preferential attachment (Kubina et al., 2017; Wei et al., 2022). Growth is the continuous addition of new nodes to the network, whereas preferential attachment means that new nodes are more inclined to link to existing nodes that have a high degree (number of connections). Eq. 9 shows the degree distribution of the BA model.

$$P\left(k\right)=\frac{2m\left(m+1\right)}{k\left(k+1\right)\left(k+2\right)}\tag{9}$$

Here, $P\left(k\right)$ denotes the probability that a node has degree $k$, and $m$ denotes the number of connections formed by each newly added node. The BA model follows a power law distribution, meaning that a small number of nodes have high degree (hubs) and a large number of nodes have low degree. BA graphs can be used to simulate realistic networks that display scale-free characteristics, including the World Wide Web, social networks, and biological networks (Kubina et al., 2017; Piccardi, 2023; Tantardini et al., 2019).

The Stochastic Block Model (SBM) is another common model used in OSN analysis.
The SBM is a probabilistic model for random graphs that generates graphs with communities, that is, groups of nodes linked to each other with specific probabilities (Altenburger & Ugander, 2018; Hu et al., 2022; Lee & Wilkinson, 2019; Piccardi, 2023). The SBM is defined by the community sizes and a block probability matrix $P\in {\mathbb{R}}^{n\times n}$, where $n$ here is the number of communities and each element ${P}_{ij}$ is the probability of a connection between a node from community $i$ and a node from community $j$. The SBM aims to incorporate more realistic features of real-world networks, including varying degree distributions, nested communities, and edge weights (Lee & Wilkinson, 2019).

2.5 Network Properties: The properties of a network are the attributes and metrics that describe its arrangement, behaviour, and purpose (Jain et al., 2021; Masrom et al., 2021; Piccardi, 2023; Talaga & Nowak, 2022; Tantardini et al., 2019). These properties facilitate the comparison of graphs by bringing to light the resemblances and disparities between them. Table shows network properties and their descriptions.

Degree Distribution: This describes how frequently nodes exhibit each level of connectivity. The degree distribution of a network can be uniform, normal, or skewed, depending on how evenly connections are spread among the nodes. A skewed degree distribution may suggest the existence of hubs or outliers within the network (McMillan et al., 2022; Piccardi, 2023; Tantardini et al., 2019). The degree distribution $P\left(k\right)$ for an undirected graph can be calculated as shown in Eq. 10.

$$P\left(k\right)=\frac{\text{Number of nodes with degree } k}{N}\tag{10}$$

Here, $N$ is the total number of nodes and $k$ is the node degree.
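
To make Eq. 10 concrete, the following minimal sketch computes the empirical degree distribution of an undirected graph given as an edge list, and samples a small ER graph $G(n, p)$ to apply it to. The function names, the star-graph example, and the seed are illustrative assumptions, and only the Python standard library is used.

```python
import random
from collections import Counter


def erdos_renyi_edges(n, p, seed=0):
    """Sample an undirected G(n, p): each pair is linked independently with probability p."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]


def degree_distribution(n_nodes, edges):
    """Eq. 10: P(k) = (number of nodes with degree k) / N, including isolated nodes."""
    degree = [0] * n_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree)
    return {k: c / n_nodes for k, c in sorted(counts.items())}


# Star graph on four nodes: node 0 has degree 3, the other three have degree 1.
star_edges = [(0, 1), (0, 2), (0, 3)]
p_k = degree_distribution(4, star_edges)

# The same function applied to a sampled ER graph.
er_p_k = degree_distribution(50, erdos_renyi_edges(50, 0.1, seed=1))
```

For the star graph, three of the four nodes have degree 1 and one node has degree 3, so `p_k` is `{1: 0.75, 3: 0.25}`.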

Edge Density: Edge density is the ratio of the number of actual edges in a network to the total number of potential edges. It shows how closely linked the nodes of a graph are to one another (Wills & Meyer, 2020). Eq. 11 gives the edge density of a network.

$$\text{Edge Density}=\frac{2\times \text{Number of Edges}}{\text{Number of Nodes}\times \left(\text{Number of Nodes}-1\right)}\tag{11}$$

Average Node Connectivity: This is the average number of node-independent paths that connect each pair of nodes in a graph, and it serves as a measure of a network's robustness (Beineke et al., 2002). It can be expressed as follows:

$$\text{Average Node Connectivity}=\frac{1}{n\left(n-1\right)}{\sum }_{s\ne t}k\left(s,t\right)\tag{12}$$

Here, $n$ is the number of nodes in the graph, and $k\left(s,t\right)$ is the number of node-independent paths between nodes $s$ and $t$.

Transitivity: This is the fraction of connected triples of nodes that close into triangles, a global counterpart of the local clustering coefficient. It indicates how likely nodes are to form triangle linkages or clusters. High transitivity indicates more communities or groupings in the graph, whereas low transitivity indicates more bridges or gaps (McMillan et al., 2022; Vasques Filho & O'Neale, 2020). Eq. 13 gives the calculation of network transitivity.

$$\text{Transitivity}=\frac{3\times \text{Number of triangles in the network}}{\text{Number of connected triples of nodes}}\tag{13}$$

Assortativity: This is the correlation between the degrees of nodes that are linked by an edge, reflecting the tendency of nodes to connect with others of similar or dissimilar degree. High assortativity indicates a greater degree of homophily or similarity in the graph, whereas low assortativity indicates a higher level of heterogeneity or diversity (Al Musawi et al., 2022; McMillan et al., 2022).
The assortativity coefficient, commonly denoted $r$, is calculated as follows:

$$r=\frac{{\sum }_{i}{e}_{ii}-{\sum }_{i}{a}_{i}{b}_{i}}{1-{\sum }_{i}{a}_{i}^{2}}\tag{14}$$

Here, ${e}_{ii}$ is the fraction of edges connecting nodes of degree $i$ to other nodes of degree $i$. The term ${a}_{i}$ is the proportion of edges that are connected to nodes of degree $i$, and ${b}_{i}$ is the proportion of edges that would connect to nodes of degree $i$ if the edges were assigned randomly across the graph.

2.6 OSN Characteristics: In OSNs, the characteristics of the network's structure are described using the notions of "power law distribution," "scale-free network," and "small-world phenomena" (Mislove et al., 2007; Weber et al., 2021).

Power Law Distribution: OSNs are widely believed to follow a power law distribution. This relates to the network's degree distribution, in which the degree of a node is the number of connections it maintains (Broido & Clauset, 2019).

$$P\left(k\right)\sim {k}^{-\gamma }\tag{15}$$

where $P\left(k\right)$ is the proportion of nodes in the network with $k$ connections; for high values of $k$, $P\left(k\right)$ follows a power law. $\gamma$ is the exponent of the power law that characterises the degree distribution of the network. The network has scale-free properties if $\gamma$ lies in the range $2<\gamma <3$. The phrase "scale-free" indicates that the network lacks a specific scale or size. To evaluate whether a graph shows power law characteristics, we consider the following quantities: the alpha value, which is the exponent of the fitted power law distribution; Xmin, the lower bound of x above which the power law is fitted; the KS p-value, which is the result of the Kolmogorov-Smirnov test; and the likelihood ratio (power law vs exponential), which compares the goodness of fit of the two models.
The Xmin process refers to the estimation of the optimal lower cutoff using a goodness-of-fit based approach (Bhattacharya et al., 2020).

Small World Phenomena: This is a core issue in social networks, highlighting the prevalence of short paths in a graph where nodes represent individuals and links indicate mutual acquaintance. It relates to the probability that two randomly selected individuals from the population share a common friend, and is popularly associated with "six degrees of separation" (Bhattacharya et al., 2020). The small world phenomenon is usually measured using two quantities.

Average Path Length: This is the average number of steps on the shortest paths between every pair of nodes in a network. It quantifies how efficiently information or mass moves through the network. In small-world networks, the average path length usually scales with the logarithm of the number of nodes (Neal, 2017):

$$L\sim \log\left(N\right)\tag{16}$$

Clustering coefficient: This quantity is the mean of the local clustering coefficients computed for every node in the network. The local clustering coefficient of a node is the ratio of the number of existing triangles involving that node to the total number of possible triangles. A larger clustering coefficient indicates a greater presence of clusters or communities within the network, while a smaller one indicates a higher occurrence of bridges or gaps (Kanavos et al., 2023; Piccardi, 2023; Tantardini et al., 2019).

$$C=\frac{3\times \text{Number of triangles}}{\text{Number of connected triples of nodes}}\tag{17}$$

2.7 Predictive Analysis: Predictive analysis for OSNs applies algorithms to network data to generate predictions about potential network interactions, behaviours, or trends.
We used two common approaches, link prediction and community discovery, to conduct a predictive analysis of OSNs.

Link Prediction: Link prediction in OSN analysis is essential for forecasting future user connections and understanding network growth patterns. By examining existing connections and network structure, this process identifies potential links between nodes, providing valuable insights into the network's evolution and the likelihood of new connections (Yilmaz et al., 2023; Yuliansyah et al., 2023). A common way of performing link prediction is to combine Node2Vec with a random forest classifier. Node2Vec produces embeddings for the nodes of the graph. These embeddings capture the graph's structural characteristics and serve as feature vectors for the individual nodes. The random forest classifier is then trained on these embeddings to estimate the probability of an edge between two nodes in the network. This improves our understanding of the dynamics and structure of the network (Yilmaz et al., 2023).

Various metrics are available for assessing a graph's link predictability. Commonly employed metrics include the following.

Precision and recall quantify the balance between the quality and the quantity of the predicted links. Precision is the number of true positives divided by the total number of positive predictions (TP + FP), whereas recall is the number of true positives divided by the total number of actual positives (TP + FN) (Jia et al., 2022).

The F1-score measures the overall quality of the predicted connections by combining precision and recall. It is computed as the harmonic mean of precision and recall and ranges from 0 to 1, with 1 representing the highest quality and 0 the lowest (Gui, 2024; Jia et al., 2022).
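
The precision, recall, and F1 definitions above can be written directly from the confusion counts. The sketch below uses the standard definition of F1 as the harmonic mean of precision and recall; the function name and the example counts are hypothetical, and returning 0.0 for an empty denominator is a convention chosen here for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical link-prediction outcome: 30 correctly predicted links,
# 10 predicted links that do not exist, and 20 existing links that were missed.
precision, recall, f1 = precision_recall_f1(tp=30, fp=10, fn=20)
```

Because the harmonic mean is pulled towards the smaller of its two inputs, a model cannot obtain a high F1 by maximising only precision or only recall.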
The AUC-ROC metric measures the model's proficiency in appropriately prioritizing a randomly chosen positive instance (representing an extant link) over a randomly selected negative instance representing a non-existent link (Gui, 2024 ; Yilmaz et al., 2023 ). The AUC-ROC values span from 0 to 1, with a higher value signifying enhanced accuracy of the algorithm when predicting links. Community Detection : The identification of communities is another fundamental task in the field of network analysis, with the objective of dividing a network into distinct sub-structures or communities (Xu et al., 2022 ; Zhang et al., 2020 ). The Louvain algorithm, a well-established technique for community detection in networks, was employed in this study to detect communities within graph structures. We assessed the identified communities by employing three well-established criteria in the domain of network analysis and community detection: modularity, silhouette, and conductance. These metrics offered a thorough evaluation of the quality and structure of the identified communities (M. A. U. Hasan et al., 2024 ; Hromic & Hayes, 2019 ; Kim et al., 2022 ; Wang et al., 2018 ; Zhao et al., 2018 ). Modularity evaluates the degree to which a network has been separated into communities. A higher modularity value signifies a more robust community structure (M. A. U. Hasan et al., 2024 ; Kim et al., 2022 ). The Silhouette Score is a metric used to assess the level of clarity and distinctiveness exhibited by clusters or communities within a specific clustering or community detection process. The Silhouette Score is a numerical value that varies between − 1 and 1, with a higher score indicating more distinct groupings (Wang et al., 2018 ; Zhao et al., 2018 ). The measurement of Conductance involves the evaluation of the proportion of edges within communities in relation to the edges that connect different communities. Lower conductance values indicate a more optimal community structure (M. A. 
U. Hasan et al., 2024; Kim et al., 2022).

Proposed methodology

In this research, we introduce an approach for building a novel social network graph from information gathered from the Twitter network. Relationships among users in the social network are created based on the similarity of their attributes. The processing pipeline is presented in Fig. 2 and is divided into five stages: Data Collection & Primary User Selection, Preprocessing, Attribute Selection & Weight Assignment, Graph Generation, and Evaluation.

3.1 Data Collection & Primary User Selection: The initial step is to select primary users from the Twitter network. To do so, tweets were collected using a search query with specific keywords during a specified time window. Following the initial collection, a filtration process categorised the tweets by the number of likes they received. The authors of the most popular tweets were chosen as the primary users. Subsequently, two datasets were created: one comprising profile data of the primary users, and the other containing their most recent tweets and tweet metrics.

3.2 Preprocessing: Data collected directly from Twitter comes from a varied and unfiltered flow of content, and therefore contains irrelevant or undesired records. Various preprocessing methods were used to eliminate anomalies and duplicates, improving the quality and integrity of the user and tweet datasets. Subsequently, the pre-processed datasets were merged to form a unified dataset. Numerical data was scaled to the range 0 to 1 using Max-Min normalisation.

3.3 Attribute Selection and Weight Assignment: Selecting attributes and assigning weights involves multiple steps. First, we transform the pre-processed data discussed in Section 3.2 into attributes. 
Afterward, we performed a comprehensive examination of existing literature to pinpoint essential attributes such as spreadability, engagement, activity, growth, and impact, as well as their evaluation criteria, with a specific emphasis on their operational feasibility via the Twitter API. We used Principal Component Analysis (PCA) to rank these attributes by importance. Initially, we ensured uniform scaling across attributes with different units. Following this, the covariance matrix was calculated for the dataset:

$$\Sigma = \frac{1}{n-1}\left(X-\bar{X}\right)^{T}\left(X-\bar{X}\right) \tag{18}$$

where $n$ is the number of observations, $X$ is the data matrix, and $\bar{X}$ is the mean vector of the attributes. Subsequently, the eigenvalues and their associated eigenvectors were computed from the eigenvalue equation:

$$\Sigma v = \lambda v \tag{19}$$

where $\lambda$ is an eigenvalue and $v$ is the corresponding eigenvector. The eigenvalues reveal how much information (variance) each principal component carries, and we used them to decide how many principal components $k$ to keep. The chosen components were used to transform the initial attribute space, producing a set of orthogonal features referred to as principal components. The transformed data matrix $Z$ is obtained by multiplying the original data matrix $X$ by the matrix $V_{k}$ containing the first $k$ eigenvectors:

$$Z = X V_{k} \tag{20}$$

We ranked the original attributes by analysing their loadings on the principal components. The loading of attribute $j$ on principal component $i$ is given by:

$$\mathrm{Loading}_{ij} = V_{ij} \tag{21}$$

Once the loadings are calculated, the attribute with the highest loading gets rank 1, the attribute with the second highest loading gets rank 2, and so on. After completing this ranking process, the order of ranks is reversed, resulting in a descending sequence in which the most important attribute carries the largest rank value.

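The ranking procedure above can be sketched with NumPy as follows. This is an illustrative sketch, not the authors' implementation. It assumes that loadings are aggregated as the sum of absolute loadings over the retained components (the aggregation described in Section 4.2) and that $k$ is the smallest number of components reaching a cumulative variance target. The function name, the `variance_target` parameter, and the toy data are hypothetical.

```python
import numpy as np


def rank_attributes_by_pca(X, names, variance_target=0.95):
    """Rank attributes by summed absolute loadings on the leading PCs."""
    X = np.asarray(X, dtype=float)
    # Standardise so attributes with different units are comparable
    # (assumes no attribute is constant).
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Covariance matrix (Eq. 18); rows are observations, columns attributes.
    cov = np.cov(Xs, rowvar=False)
    # eigh handles symmetric matrices and returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Smallest k whose cumulative explained variance reaches the target.
    explained = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(explained, variance_target) + 1), len(eigvals))
    # Column i of eigvecs holds the loadings of every attribute on PC i.
    score = np.abs(eigvecs[:, :k]).sum(axis=1)
    by_score = np.argsort(-score)  # most important attribute first
    ranks = {names[j]: pos + 1 for pos, j in enumerate(by_score)}
    # Reverse so the most important attribute carries the largest value.
    n = len(names)
    reversed_ranks = {name: n + 1 - r for name, r in ranks.items()}
    return k, ranks, reversed_ranks


# Toy data with hypothetical attribute names (not the paper's dataset).
rng = np.random.default_rng(0)
toy = rng.normal(size=(50, 4))
k, ranks, reversed_ranks = rank_attributes_by_pca(
    toy, ["followers", "following", "engagement", "tweets"]
)
print(k, ranks, reversed_ranks)
```

Using `numpy.linalg.eigh` rather than `eig` is a design choice for this sketch: the covariance matrix is symmetric, so `eigh` returns real eigenvalues in a predictable order.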
A normalization procedure is then applied to obtain weights within the range of 0.1 to 1.0. This keeps the weights proportional to the attribute ranks, preserving data integrity and facilitating more efficient computation. To do so, we applied the following formula:

$$W_{i} = 0.1 + 0.9 \times \frac{R_{i}-\min(R)}{\max(R)-\min(R)} \tag{22}$$

where $W_{i}$ is the weight of the $i$-th attribute, $R_{i}$ is the reversed rank of the $i$-th attribute, and $\min(R)$ and $\max(R)$ are the minimum and maximum of the reversed ranks.

3.4 Graph Construction Based on Weighted Similarity: The next step of similarity graph construction is to identify the similarities between users based on their attributes. To calculate the similarity coefficient between users, we introduce a novel method called Attribute-Weighted Euclidean Distance (AWED). The significance of weighted features in OSN analysis is well acknowledged (Wang & Ma, 2016). Feature weights have a substantial impact on similarity calculations: they regulate the significance of each attribute in determining the overall similarity between users (Cheng & Yan, 2023; Shantal et al., 2023). For instance, suppose the attribute "Number of followers" has a weight of 0.5, while "Number of tweets" has a weight of 0.2. For the same normalised difference, the follower count then contributes 2.5 times as much to the squared distance as the tweet count, so differences in followers have the greater impact on the similarity calculation. Our proposed method integrates PCA to assign weights to user attributes based on their importance. Suppose two users ${U}_{1}$ and ${U}_{2}$ have attributes $A= \left\{{a}_{1}, {a}_{2}, \dots , {a}_{m}\right\}$ and weights $W=\left\{{w}_{1}, {w}_{2}, \dots , {w}_{m}\right\}$ assigned from PCA. 
The normalization factors are $mf_{1}, mf_{2}, \dots, mf_{m}$. The AWED can then be calculated as:

$$AWED\left(U_{1},U_{2}\right) = \sqrt{\sum_{i=1}^{m} w_{i}\left(\frac{a_{1i}}{mf_{1i}} - \frac{a_{2i}}{mf_{2i}}\right)^{2}} \tag{23}$$

Here, $a_{1i}$ and $a_{2i}$ are the values of attribute $i$ for users $U_{1}$ and $U_{2}$, respectively, $w_{i}$ is the importance weight of attribute $i$, and $mf_{1i}$ and $mf_{2i}$ are the normalisation factors of attribute $i$ for users $U_{1}$ and $U_{2}$, respectively. Algorithm 1 presents the pair-wise similarity calculation as pseudo-code.

Algorithm 1: Calculating Pair-wise Users AWED Similarity Coefficient

Inputs: Dataset with Users, Users Attributes
    attributeColumn: columns containing attribute values
    normalizationFactorColumn: columns containing normalization factor values
    rankColumn: feature importance ranks based on PCA  // reversed order
Output: Pairwise Users Similarity coefficient

    attributes = dataset.drop(columns=['User_Id'])
    normalizationFactors = dataset[normalizationFactorColumn]
    attributeImportanceRanks = dataset[rankColumn]
    min_rank = min(attributeImportanceRanks)
    max_rank = max(attributeImportanceRanks)

    // Weight assignment (Eq. 22)
    def calculate_weight(rank, min_rank, max_rank):
        weight = ((rank - min_rank) / (max_rank - min_rank)) * 0.9 + 0.1
        return weight

    weights = [calculate_weight(rank, min_rank, max_rank) for rank in attributeImportanceRanks]

    // Similarity calculation (Eq. 23)
    def AWED(User1, User2, weights, normalizationFactors):
        weighted_sum_squared = 0
        for attribute in attributes:
            weight = weights[attribute]
            normalized_value1 = User1[attribute] / normalizationFactors[attribute]
            normalized_value2 = User2[attribute] / normalizationFactors[attribute]
            difference = normalized_value1 - normalized_value2
            // the weight multiplies the squared difference, as in Eq. 23
            weighted_sum_squared += weight * difference ** 2
        similarity_coefficient = square_root(weighted_sum_squared)
        return similarity_coefficient

    similarity_df = []
    for i in range(len(dataset)):
        for j in range(i + 1, len(dataset)):
            User1 = dataset.iloc[i, :].to_dict()
            User2 = dataset.iloc[j, :].to_dict()
            similarity = AWED(User1, User2, weights, normalizationFactors)
            similarity_df.append({'user1': User1, 'user2': User2, 'AWED': similarity})

Similarity Graph Construction

After acquiring the users' similarities based on their weighted attributes, we analyse the distribution of the similarity coefficients and, based on the share of user pairs that would be connected, determine the threshold value. This threshold is the smallest similarity required for two users to be connected in the graph. To generate a graph that depicts user relationships according to a similarity threshold, an empty graph $G$ is initialised as $G=(V,E)$, where $V$ is the set of user nodes and $E$ is the initially empty set of edges. At the outset, there are no connections among users in the graph. We then iterate over the AWED similarity matrix, which contains the pairwise similarity coefficients among users. If the similarity coefficient ${AWED}_{ij}$ between users $i$ and $j$ is greater than the predefined threshold $T$, an edge $(i,j)$ is added to the graph with an edge weight equal to the similarity coefficient:

$$E=\left\{\left(i, j, AWED_{ij}\right) \mid AWED_{ij}>T\right\} \tag{24}$$

3.5 Evaluation: The evaluation phase of our proposed methodology involves comparing the AWED graph with synthetic graphs. 
The objective is to show that the proposed graph displays more authentic characteristics than the synthetic alternatives in the field of social network analysis. The comparison is based on several criteria, including structural properties and characteristics of OSNs such as the power-law distribution and the small-world phenomenon. Prediction analyses are conducted to evaluate the effectiveness of the graph in prediction tasks. This evaluation approach aims to determine the authenticity and usefulness of the AWED graph in the field of social network analysis.

Experimental Validation

In this study, we conduct a two-fold experiment: 1) we construct a social network based on user similarity, and 2) we evaluate the performance of the network by comparing it with three synthetic graphs that have different characteristics. We employ network properties, OSN characteristics, and predictive analysis as the metrics for comparison. This section begins with a description of the collected datasets before moving on to the actual results.

4.1 Datasets: For this study, we followed the methodology described in Section 3.1 and generated three datasets based on tweets that correspond to three different topics: War and Conflict, Environment and Global Warming, and Racism and Hate Speech. Each dataset contains the information of the primary users, their tweets, and the tweets' information. We intentionally selected distinct primary users, timestamps, and durations for each dataset. 
The datasets are:

Table 1 Summary of Twitter datasets

| Datasets | Duration | Time stamps | Total Tweets | Total User Information (rows) | Final user # (after preprocessing) |
|---|---|---|---|---|---|
| War & Conflict | 52 days | 17 | 14070 | 2521 | 64 |
| Environment & Global Warming | 8 days | 4 | 57355 | 1998 | 245 |
| Racism & Hate speech | 12 days | 6 | 51442 | 2986 | 464 |

4.2 Attribute Selections: One of the main objectives of this study is to gather real-time data readily accessible via the Twitter API for the purpose of constructing an Online Social Network (OSN). The attributes selected for this study, as discussed in Section 3.2, are those most frequently addressed in the literature and were chosen with consideration for their ease of obtainability. Not all features carry equal significance, and the importance of features can vary. Consequently, Principal Component Analysis (PCA) was employed to identify the features of greatest importance. To decide how many Principal Components (PCs) to consider for the final ranking, we selected the number of PCs that together explain a substantial portion of the variance in the data; the first four principal components together explain at least 95% of the variance in the datasets. Figure shows the loading vectors of attributes for the first 4 PCs. Each PC in PCA is a linear combination of the original attributes, with the loadings denoting the weights assigned to each original feature in that combination. The aggregate absolute loadings of a feature across all PCs yield an estimate of that attribute's overall contribution to the variance accounted for by the PCs (Zhou et al., 2022). This provides a comprehensive assessment of the significance of each attribute in the dataset. Table 2 shows the overall contribution of each attribute to the variance explained by the PCs. 
The attribute "Following Count" has the highest total absolute loading, suggesting that it has the most effect on the variance among all principal components. Next are "Engagement" and "Followers Count", which both exhibit large absolute loadings. Conversely, "Impact", "Tweets Count", "Growth", and "Active" have lower total loadings, indicating that they are less significant in explaining the variance.

Table 2 Sum of the Absolute Loadings Value and Rank

| Attributes | Sum of Absolute Loadings | Rank |
|---|---|---|
| Following Count | 1.729724 | 1 |
| Engagement | 1.712006 | 2 |
| Followers Count | 1.552863 | 3 |
| Impact | 0.972462 | 4 |
| Tweets Count | 0.901511 | 5 |
| Growth | 0.832458 | 6 |
| Active | 0.831984 | 7 |

4.3 Similarity coefficient between users: The similarity coefficient between two users, $U_{1}$ and $U_{2}$, is calculated using their attribute values and weights. Suppose there are three attributes and $U_{1}$ has $(a_{11}=10, a_{12}=5, a_{13}=8)$ with normalization factors $(mf_{11}=2, mf_{12}=1, mf_{13}=4)$, while $U_{2}$ has $(a_{21}=8, a_{22}=4, a_{23}=7)$ with normalization factors $(mf_{21}=3, mf_{22}=1.5, mf_{23}=5)$. The importance weights of these attributes obtained from PCA are $(w_{1}=0.5, w_{2}=0.3, w_{3}=0.2)$. With the attribute values and weights in place, the final similarity between $U_{1}$ and $U_{2}$ follows from Eq. 23:

$$AWED\left(U_{1},U_{2}\right)= \sqrt{\left(\frac{10}{2}-\frac{8}{3}\right)^{2}\cdot 0.5+\left(\frac{5}{1}-\frac{4}{1.5}\right)^{2}\cdot 0.3+\left(\frac{8}{4}-\frac{7}{5}\right)^{2}\cdot 0.2} \tag{25}$$

$$AWED\left(U_{1},U_{2}\right)=\sqrt{\left(5-2.67\right)^{2}\cdot 0.5+\left(5-2.67\right)^{2}\cdot 0.3+\left(2-1.4\right)^{2}\cdot 0.2} \tag{26}$$

$$AWED\left(U_{1},U_{2}\right)\approx\sqrt{2.722+1.633+0.072}=\sqrt{4.427}\approx 2.10 \tag{27}$$

4.4 Threshold determination: Once the similarity coefficients are calculated using attribute values and weights, the following step is to choose a threshold value that decides whether a link should be established between two users. To determine the best threshold value, we analyse the distribution of similarities using the descriptive statistics of the similarity coefficients, displayed in Table 3. The table shows the standard deviation, minimum, maximum, mean, and percentiles of the similarity distribution. The mean value indicates the average similarity coefficient, whereas the standard deviation quantifies the dispersion of the similarity coefficients. In this study, we choose the mean value as the threshold on the similarity coefficients across users.

Table 3 Similarity Coefficients Distribution

| Statistic | Similarity Coefficient |
|---|---|
| Min | 0.422809601 |
| p1 | 0.500131607 |
| p5 | 0.714326859 |
| p10 | 0.811151505 |
| p25 | 0.925210953 |
| p50 | 0.970602036 |
| p75 | 0.987081528 |
| p90 | 0.992170334 |
| p95 | 0.993883133 |
| p99 | 0.996419907 |
| p100 | 0.998277664 |
| max | 0.998277664 |
| mean | 0.931131892 |
| stdDev | 0.096937947 |

4.5 Graph Generation: Once we had calculated the AWED coefficients among users and established the threshold value, we proceeded to construct the network. We utilised the AWED method to generate a similarity matrix for the primary users, containing the pairwise similarity coefficients among them.

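The chain from ranks to weights (Eq. 22), pairwise coefficients (Eq. 23), and thresholded edges (Eq. 24) can be sketched in plain Python as below. This is a sketch under stated assumptions rather than the authors' code: the function names, the three toy users, and their attribute values are hypothetical, and the comparison direction (`coefficient > threshold`) follows Eq. 24 as written. When no threshold is supplied, the mean coefficient is used, as in Section 4.4.

```python
import math
from itertools import combinations


def rank_to_weight(rank, min_rank, max_rank):
    """Eq. 22: map a reversed rank linearly onto [0.1, 1.0]."""
    return 0.1 + 0.9 * (rank - min_rank) / (max_rank - min_rank)


def awed(u1, u2, weights, mf1, mf2):
    """Eq. 23: weighted Euclidean distance over normalised attributes."""
    total = 0.0
    for name, w in weights.items():
        diff = u1[name] / mf1[name] - u2[name] / mf2[name]
        total += w * diff ** 2
    return math.sqrt(total)


def build_graph(users, weights, factors, threshold=None):
    """Eq. 24: keep the pairs whose coefficient exceeds the threshold."""
    coeffs = {
        (a, b): awed(users[a], users[b], weights, factors[a], factors[b])
        for a, b in combinations(sorted(users), 2)
    }
    if threshold is None:  # Section 4.4 uses the mean coefficient
        threshold = sum(coeffs.values()) / len(coeffs)
    edges = {pair: c for pair, c in coeffs.items() if c > threshold}
    return edges, threshold


# Hypothetical reversed ranks and toy users (values already scaled to [0, 1]).
reversed_ranks = {"followers": 3, "engagement": 2, "tweets": 1}
lo, hi = min(reversed_ranks.values()), max(reversed_ranks.values())
weights = {name: rank_to_weight(r, lo, hi) for name, r in reversed_ranks.items()}
users = {
    "u1": {"followers": 0.90, "engagement": 0.80, "tweets": 0.1},
    "u2": {"followers": 0.10, "engagement": 0.20, "tweets": 0.1},
    "u3": {"followers": 0.85, "engagement": 0.75, "tweets": 0.2},
}
unit = {"followers": 1.0, "engagement": 1.0, "tweets": 1.0}
factors = {name: unit for name in users}  # per-user normalisation factors

edges, threshold = build_graph(users, weights, factors)
for (a, b), c in edges.items():
    print(a, b, round(c, 3))
```

The edges are stored as a dictionary keyed by user pair with the coefficient as the edge weight, which maps directly onto the weighted edge set of Eq. 24.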
The final graph was created by adding edges with AWED coefficients exceeding the selected threshold. We then applied weights to the network edges using these similarity coefficients. Figure 4 shows the graph creation process with threshold $T=0.931$.

Results and Discussions

We constructed similarity graphs for three distinct datasets. Each of these graphs was subsequently compared with three categories of synthetic graphs, namely Erdős-Rényi (ER), Barabási-Albert (BA), and Stochastic Block Model (SBM). To ensure a fair evaluation, we maintained an equivalent number of nodes in the synthetic graphs as present in each of the datasets. Table 4 presents the number of nodes and edges in each dataset, along with their respective synthetic graphs.

Table 4 Number of Nodes and Edges of each Graph

| Datasets | Properties | AWED Similarity | ER | BA | SBM |
|---|---|---|---|---|---|
| War & Conflict (W & C) | Nodes | 64 | 64 | 64 | 64 |
| | Edges | 197 | 427 | 183 | 269 |
| Environment & Global Warming (E & GW) | Nodes | 245 | 245 | 245 | 244 |
| | Edges | 2944 | 7544 | 6450 | 8074 |
| Hate Speech & Racism (HS & R) | Nodes | 449 | 449 | 449 | 449 |
| | Edges | 12368 | 25420 | 12570 | 21237 |

5.1 Network properties: The structural aspects of a network are fundamental characteristics that offer valuable insights into its topology, connectivity, and overall structure. As presented in Table 5, the graphs in this study are evaluated using transitivity, assortativity, average clustering, edge density, average node connectivity, average degree, and total triangles.

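To illustrate how a synthetic baseline and several of these properties can be computed, the sketch below generates an Erdős-Rényi $G(n,p)$ graph and evaluates edge density, average degree, total triangles, transitivity, and average clustering on an adjacency dictionary. It uses textbook definitions, which are an assumption here: the paper does not state which library or exact definitions produced Table 5, so the numbers need not match. Assortativity and average node connectivity are omitted for brevity, and the values of `n`, `p`, and `seed` are hypothetical.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, seed=None):
    """G(n, p): each of the n*(n-1)/2 possible edges appears with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def structural_properties(adj):
    """Edge density, average degree, triangles, transitivity, average clustering."""
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) // 2
    # closed[v] counts connected neighbour pairs, i.e. triangles through v.
    closed = {
        v: sum(1 for a, b in combinations(nb, 2) if b in adj[a])
        for v, nb in adj.items()
    }
    triangles = sum(closed.values()) // 3  # each triangle is seen from 3 nodes
    triples = sum(len(nb) * (len(nb) - 1) // 2 for nb in adj.values())
    local = [
        closed[v] / (len(nb) * (len(nb) - 1) / 2) if len(nb) > 1 else 0.0
        for v, nb in adj.items()
    ]
    return {
        "nodes": n,
        "edges": m,
        "edge_density": 2 * m / (n * (n - 1)) if n > 1 else 0.0,
        "average_degree": 2 * m / n if n else 0.0,
        "total_triangles": triangles,
        "transitivity": 3 * triangles / triples if triples else 0.0,
        "average_clustering": sum(local) / n if n else 0.0,
    }


# Hypothetical baseline with as many nodes as the smallest dataset.
er = erdos_renyi(n=64, p=0.2, seed=1)
print(structural_properties(er))
```

The same `structural_properties` function can be applied to any graph stored as a dictionary of neighbour sets, so a similarity graph and its synthetic counterparts can be compared on identical definitions.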
Table 5 Network properties of graphs

| Datasets | Graphs | Transitivity | Assortativity | Average Clustering | Edge Density | Average Node Connectivity | Average Degree | Total Triangles |
|---|---|---|---|---|---|---|---|---|
| W & C | AWED | 0.762 | 0.514 | 0.521 | 0.175 | 5.073 | 11.031 | 4866 |
| | BA | 0.177 | -0.099 | 0.194 | 0.119 | 5.229 | 7.5 | 396 |
| | ER | 0.209 | 0.036 | 0.207 | 0.216 | 11.828 | 13.625 | 1218 |
| | SBM | 0.235 | -0.063 | 0.251 | 0.182 | 9.557 | 11.302 | 930 |
| E & GW | AWED | 0.853 | 0.661 | 0.713 | 0.141 | 13.287 | 34.343 | 254331 |
| | BA | 0.0704 | -0.091 | 0.092 | 0.032 | 5.117 | 7.869 | 858 |
| | ER | 0.202 | -0.019 | 0.202 | 0.201 | 45.786 | 49.102 | 59289 |
| | SBM | 0.213 | -0.002 | 0.214 | 0.173 | 39.052 | 42.147 | 45804 |
| HS & R | AWED | 0.762 | 0.371 | 0.399 | 0.112 | 18.391 | 50.316 | 1006215 |
| | BA | 0.045 | -0.074 | 0.066 | 0.018 | 5.103 | 7.929 | 1224 |
| | ER | 0.199 | -0.010 | 0.200 | 0.200 | 84.916 | 89.621 | 359559 |
| | SBM | 0.216 | 0.001 | 0.217 | 0.174 | 73.310 | 77.977 | 295041 |

The proposed AWED Similarity graph demonstrates significant advantages in capturing the complex structural characteristics of OSNs compared to synthetically constructed graphs. This is evident in the examination of network properties across the three datasets. In each dataset, the AWED Similarity graph consistently exhibits higher transitivity, assortativity, and average clustering coefficients than its synthetic counterparts (BA, ER, and SBM). This indicates that the AWED Similarity graph fosters local connectivity, strengthens the formation of tightly-knit clusters, and allows assortative interactions among nodes that share similar characteristics. Moreover, the moderate edge density of the AWED Similarity graph indicates a well-balanced representation of OSN connections. The AWED Similarity graph's significantly higher number of total triangles highlights its ability to capture complex triadic relationships inside the network structure. This capacity is essential to understanding how cohesive communities and linked subgroups emerge inside OSNs. 
However, the low average node connectivity and average degree of the AWED Similarity graph indicate a network in which nodes tend to form distinct communities with strong internal connections but relatively few connections to nodes outside their immediate clusters. This implies that the AWED graph provides a more precise representation of the dynamic nature of social ties in real-world OSNs, transcending the constraints of synthetic graphs.

5.2 OSN Characteristics: As the power-law distribution, scale-free characteristics, and small-world features are observed in real-world OSNs, we also compare our proposed AWED Similarity graphs with the BA, ER, and SBM graphs on these characteristics. Table 6 displays the power-law characteristics of the graphs, which are characterised by metrics including Alpha, Xmin, the Kolmogorov-Smirnov p-value, and the likelihood ratio. The combination of these metrics provides a thorough picture of the power-law distribution and scale-free characteristics.

Table 6 Power Law Distribution

| Datasets | Graphs | Alpha | Xmin | KS p-value | Likelihood ratio (power law vs. exponential) |
|---|---|---|---|---|---|
| W & C | AWED Similarity | 24.165 | 23.0 | 0.160 | (2.202, 0.028) |
| | BA | 3.049 | 4.0 | 0.085 | (1.975, 0.048) |
| | ER | 6.381 | 12.0 | 0.140 | (-0.374, 0.708) |
| | SBM | 14.528 | 14.0 | 0.169 | (0.568, 0.569) |
| E & GW | AWED Similarity | 21.660 | 75.0 | 0.122 | (2.767, 0.005) |
| | BA | 3.107 | 4.0 | 0.059 | (6.552, 5.669) |
| | ER | 9.854 | 47.0 | 0.131 | (-4.895, 9.801) |
| | SBM | 20.049 | 47.0 | 0.126 | (-0.610, 0.541) |
| HS & R | AWED Similarity | 26.361 | 157.0 | 0.095 | (0.307, 0.758) |
| | BA | 3.197 | 9.0 | 0.066 | (2.540, 0.011) |
| | ER | 23.388 | 101.0 | 0.073 | (0.547, 0.584) |
| | SBM | 17.823 | 81.0 | 0.078 | (-2.012, 0.044) |

A network is regarded as scale-free when it exhibits a heavy-tailed distribution (represented by a high alpha), a wide range of node degrees (represented by a high Xmin), and a likelihood ratio that supports the fit of a power-law distribution (Broido & Clauset, 2019; Roux et al., 2023).

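For readers who want to see how an Alpha value of this kind is obtained for a given Xmin, the sketch below implements the usual maximum-likelihood estimator of the exponent in $p(x) \propto x^{-\alpha}$ for the tail $x \ge x_{min}$. It is an illustrative sketch, not the fitting procedure used for Table 6: Xmin is taken as given rather than selected by minimising the KS distance, the likelihood-ratio test is not included, and the function name, the `discrete` flag, and the degree sequence are hypothetical.

```python
import math


def powerlaw_alpha(values, xmin, discrete=False):
    """MLE of alpha for p(x) ~ x^(-alpha), fitted to the tail x >= xmin."""
    tail = [x for x in values if x >= xmin]
    # For integer data such as degrees, shifting xmin by 0.5 is a common
    # approximation to the exact discrete estimator.
    base = xmin - 0.5 if discrete else xmin
    log_sum = sum(math.log(x / base) for x in tail)
    if log_sum == 0:
        raise ValueError("no tail values above xmin; alpha is undefined")
    return 1.0 + len(tail) / log_sum


# Hypothetical degree sequence (not taken from the paper's graphs).
degrees = [2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 9, 12, 15, 21, 40]
print(round(powerlaw_alpha(degrees, xmin=3, discrete=True), 3))
```

Because the estimate depends on which values fall in the tail, the same degree sequence can yield quite different exponents for different choices of Xmin.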
The AWED Similarity graph consistently demonstrates greater alpha values than the synthetic graphs across all three datasets. The observed data in AWED Similarity exhibits a heavier-tailed distribution, which is indicative of a network structure comprising influential nodes. It also has higher Xmin values and positive likelihood ratios, which collectively suggest that the AWED Similarity graphs exhibit scale-free characteristics. Furthermore, the AWED Similarity graph's positive likelihood ratios across all datasets suggest adherence to a power-law distribution. The Barabási-Albert (BA) graph also exhibits positive likelihood ratios across all datasets, consistent with its established reputation as a robust model for generating scale-free networks. However, the AWED Similarity graph's elevated alpha values and wide Xmin range imply a superior representation of the heavy-tailed connectivity patterns characteristic of real-world social networks. This suggests that the AWED Similarity graph may offer a more accurate model for studying the complex dynamics of online social networks.

As for small-world characteristics, a small-world network is characterised by a high clustering coefficient and a short average path length. Table 7 provides the clustering coefficient and average path length of the graphs in all three datasets. The ER and SBM graphs consistently display small-world properties across all datasets. On the other hand, the BA graph consistently exhibits the lowest clustering coefficient, indicating a lower tendency for nodes to form local clusters. The AWED Similarity graph also displays small-world characteristics consistently for all three datasets. This is supported by its moderate average path lengths and high clustering coefficients, which indicate a balance between global connectivity and local clustering. 
This indicates that local connectivity and community formation may be prioritised over the efficiency of global information flow in the AWED Similarity graph. This characteristic is representative of real-world social networks, where users often engage in conversations within their immediate communities while also maintaining connections with users from other communities. Consequently, the AWED Similarity graph potentially provides a more precise depiction of the intricate dynamics that are intrinsic to real-world social networks.

Table 7 Small World Phenomenon

| Datasets | Graphs | Clustering coefficient | Average path length |
|---|---|---|---|
| W & C | AWED Similarity | 0.521 | 2.647 |
| | BA | 0.194 | 2.218 |
| | ER | 0.207 | 1.828 |
| | SBM | 0.252 | 1.972 |
| E & GW | AWED Similarity | 0.717 | 4.593 |
| | BA | 0.089 | 2.721 |
| | ER | 0.202 | 1.797 |
| | SBM | 0.214 | 1.828 |
| HS & R | AWED Similarity | 0.399 | 3.003 |
| | BA | 0.057 | 2.923 |
| | ER | 0.200 | 1.800 |
| | SBM | 0.217 | 1.825 |

5.3 Predictive Analysis: The goal of the predictive analysis is to assess how the graphs perform when the same model is applied to each of them. As suggested by Kumari et al. (2022) and Pulipati et al. (2021), graphs that perform better in predictive tasks may have a clear and consistent structure. Therefore, for predictive analysis, we performed both link prediction and community detection. In this study, Node2Vec embeddings with the Random Forest Classifier were used to perform a link prediction task on the AWED similarity graph and the synthetic graphs. The model's parameter was set to 10 in order to ensure uniformity across all graphs. Figure 5 shows the accuracy, precision, recall, F1-score, and AUC-ROC of all the graphs in all three datasets. The empirical findings are consistent across all three datasets, with the AWED similarity graphs distinctly surpassing the BA, ER, and SBM graphs on accuracy, precision, recall, F1-score, and AUC-ROC. 
This observation substantiates the assertion that AWED similarity graphs exhibit superior accuracy, comprehensiveness, balance, and efficacy in the prediction of links compared to their synthetic counterparts. Furthermore, this evidence implies that AWED similarity graphs possess an enhanced capability to represent the intricate and dynamic interconnections among nodes in real-world networks, thereby facilitating a broad spectrum of network analysis tasks.

In our study, we also used community detection as an approach to compare graphs, as it provides valuable insights into the structure and organisation of a graph. The Louvain algorithm was employed to detect communities, and the detected communities were assessed using modularity, silhouette, and conductance. Figure 6 shows the detected communities within each graph for the respective datasets. Table 8 reports the evaluation metrics of the detected communities; the results show a consistent pattern across all three datasets. Modularity evaluates the robustness of a network's division into communities, with higher values indicating a more pronounced community organisation. The SBM and BA graphs exhibit the highest modularity, whilst the AWED Similarity graphs display a reasonable level of modularity throughout the datasets. Conversely, the ER graphs display the lowest modularity scores. The silhouette score, which compares similarity within clusters to similarity between clusters, is better at higher values. The AWED Similarity graph is the only graph with a positive silhouette score in every dataset, indicating a strong and coherent identification of communities and the best performance on this metric.

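Two of the three community metrics, modularity and conductance, can be computed directly from a graph and a given partition. The sketch below assumes the common definitions, $Q=\sum_c\left[L_c/m-(d_c/2m)^2\right]$ for modularity and cut$(S)$ / min(vol$(S)$, vol$(\bar{S})$) for the conductance of a community $S$; the paper does not spell out its exact formulas, so these are assumptions. The partition is supplied by hand rather than detected with Louvain, and the function names and toy graph are hypothetical.

```python
def modularity(adj, communities):
    """Newman modularity Q = sum_c [L_c / m - (d_c / (2m))^2]."""
    m = sum(len(nb) for nb in adj.values()) / 2
    q = 0.0
    for members in communities:
        members = set(members)
        # Each internal edge is seen from both endpoints, hence the / 2.
        internal = sum(1 for u in members for v in adj[u] if v in members) / 2
        degree = sum(len(adj[u]) for u in members)
        q += internal / m - (degree / (2 * m)) ** 2
    return q


def conductance(adj, members):
    """cut(S, rest) / min(vol(S), vol(rest)) for one community S."""
    members = set(members)
    cut = sum(1 for u in members for v in adj[u] if v not in members)
    vol_in = sum(len(adj[u]) for u in members)
    vol_out = sum(len(nb) for nb in adj.values()) - vol_in
    smaller = min(vol_in, vol_out)
    return cut / smaller if smaller else 0.0


# Toy graph: two triangles joined by a single edge (2-3).
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
partition = [{0, 1, 2}, {3, 4, 5}]
print(round(modularity(toy, partition), 4))
print([round(conductance(toy, c), 4) for c in partition])
```

A single conductance figure per graph, as in Table 8, would additionally require aggregating the per-community values, for example by averaging; that aggregation is not specified in the text and is left out here.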
Conductance, which evaluates the quality of a community by comparing the number of edges within the community to the number of edges between different communities, is better at lower values. The minimal conductance of the AWED Similarity graph suggests that this network exhibits a higher level of internal interconnectedness within its communities, while having fewer connections between communities, in contrast to the other graph models.

Table 8 Community Detection Evaluation

| Datasets | Graphs | Modularity | Silhouette Score | Conductance | Number of Communities |
|---|---|---|---|---|---|
| W & C | AWED Similarity | 0.248 | 0.044 | 0.137 | 4 |
| | BA | 0.284 | -0.503 | 0.258 | 6 |
| | ER | 0.186 | -0.213 | 0.302 | 6 |
| | SBM | 0.315 | -0.114 | 0.216 | 4 |
| E & GW | AWED Similarity | 0.243 | 0.179 | 0.102 | 4 |
| | BA | 0.311 | -0.436 | 0.276 | 12 |
| | ER | 0.107 | -0.169 | 0.380 | 7 |
| | SBM | 0.313 | -0.051 | 0.218 | 4 |
| HS & R | AWED Similarity | 0.216 | 0.151 | 0.170 | 5 |
| | BA | 0.315 | -0.326 | 0.285 | 10 |
| | ER | 0.079 | -0.107 | 0.397 | 7 |
| | SBM | 0.320 | -0.057 | 0.215 | 4 |

The AWED graph has a moderate level of modularity, indicating that its community structure possesses a certain degree of flexibility and interconnection, while avoiding an excessive degree of rigidity or fragmentation. This observation suggests the presence of a complex network structure. In addition, AWED demonstrates a well-balanced community structure, emphasised by its positive silhouette score and minimal conductance. The silhouette score indicates a clear and cohesive internal structure within the identified communities, while the low conductance indicates a lack of extensive outward linkages. Overall, the metrics indicate that AWED exhibits a significant and well-structured community organisation, maintaining a balance between internal consistency and differentiation from other areas of the graph.

Conclusion

The challenges associated with extracting significant network structures from real-time Twitter data motivated us to investigate alternative approaches in this study. As a result, we presented a novel model for generating a user-attribute-based similarity graph. 
This model employs publicly available Twitter data to connect OSN users based on their attribute’s similarities. The attributes used for this study are readily derivable from the Twitter platform in real-time through the use of the Twitter API. To measure the similarity coefficient between users, a novel method termed Attribute-Weighted Euclidean Distance (AWED) is introduced. In order to evaluate the efficacy of our method, we compared the proposed graphs with synthetically generated graphs, considering network properties, OSN characteristics, and predictive analyses. The AWED Similarity graph demonstrates superior performance in terms of local connectivity, cluster formation, and assortative interactions when compared to synthetic graphs. It displays scale-free characteristics with significant nodes and robust community structures. In the context of link prediction, AWED surpasses synthetic graphs in terms of accuracy, precision, recall, F1-score, and AUC-ROC, thereby exhibiting a higher level of predictive accuracy. Moreover, it effectively maintains a balance between community detection and inter-community connections, while also possessing a more pronounced degree of definition in comparison to synthetic graphs. These findings indicate that the AWED graph provides a more precise depiction of the dynamic characteristics of social connections in real-world OSNs, overcoming the constraints of synthetic graphs. This enhancement facilitates a wide range of network analysis tasks. Future study will focus on the implementation of OSN applications on the generated graph, with the aim of facilitating the real-time detection and analysis of real-world events. 
Declarations

Author Contribution

Md Ahsan Ul Hasan: Conceptualization, Methodology, Investigation, Data curation, Writing – Original Draft. Azuraliza Abu Bakar: Conceptualization, Funding acquisition, Methodology, Supervision, Project administration, Writing – Reviewing and Editing. Mohd Ridzwan Yaakub: Conceptualization, Methodology, Supervision, Resources, Validation, Writing – Reviewing and Editing.

Acknowledgement

This research is supported by the Fundamental Research Grant Scheme (FRGS/1/2020/ICT02/UKM/01/2) of the Ministry of Higher Education Malaysia.

References

Agrawal, G., Kaur, A., & Myneni, S. (2024). A Review of Generative Models in Generating Synthetic Attack Data for Cybersecurity. Electronics , 13 (2), 322. https://www.mdpi.com/2079-9292/13/2/322 Al Musawi, A. F., Roy, S., & Ghosh, P. (2022). Identifying accurate link predictors based on assortativity of complex networks. Sci Rep , 12 (1), 18107. https://doi.org/10.1038/s41598-022-22843-4 Alam, S., Ayub, M. S., Arora, S., & Khan, M. A. (2023). An investigation of the imputation techniques for missing values in ordinal data enhancing clustering and classification analysis validity. Decision Analytics Journal , 9 , 100341. https://doi.org/https://doi.org/10.1016/j.dajour.2023.100341 Alghobiri, M. (2023). Exploring the attributes of influential users in social networks using association rule mining. Social Network Analysis and Mining , 13 (1), 118. https://doi.org/10.1007/s13278-023-01118-4 Altenburger, K. M., & Ugander, J. (2018). Monophily in social networks introduces similarity among friends-of-friends. Nat Hum Behav , 2 (4), 284-290. https://doi.org/10.1038/s41562-018-0321-8 Asadi, M., & Agah, A. (2018). Characterizing user influence within twitter. In Lecture Notes on Data Engineering and Communications Technologies (Vol. 13, pp. 122-132). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-319-69835-9_11 Aziz, F., Slater, L. 
T., Bravo-Merodio, L., Acharjee, A., & Gkoutos, G. V. (2023). Link prediction in complex network using information flow. Sci Rep , 13 (1), 14660. https://doi.org/10.1038/s41598-023-41476-9 Bazzaz Abkenar, S., Haghi Kashani, M., Mahdipour, E., & Jameii, S. M. (2021). Big data analytics meets social media: A systematic review of techniques, open issues, and future directions. Telematics and Informatics , 57 , 101517. https://doi.org/10.1016/j.tele.2020.101517 Beineke, L. W., Oellermann, O. R., & Pippert, R. E. (2002). The average connectivity of a graph. Discrete Mathematics , 252 (1), 31-45. https://doi.org/10.1016/S0012-365X(01)00180-7 Bhattacharya, S., Sinha, S., Roy, S., & Gupta, A. (2020). Towards finding the best-fit distribution for OSN data. The Journal of Supercomputing , 76 (12), 9882-9900. https://doi.org/10.1007/s11227-020-03232-y Block, P. E. R., & Grund, T. (2014). Multidimensional homophily in friendship networks. Network Science , 2 (2), 189-212. https://doi.org/10.1017/nws.2014.17 Bodaghi, A., & Oliveira, J. (2022). The theater of fake news spreading, who plays which role? A study on real graphs of spreading on Twitter. Expert Systems with Applications , 189 . https://doi.org/10.1016/j.eswa.2021.116110 Broido, A. D., & Clauset, A. (2019). Scale-free networks are rare. Nat Commun , 10 (1), 1017. https://doi.org/10.1038/s41467-019-08746-5 Cheng, Z., & Yan, A. (2023). A case weighted similarity deep measurement method based on a self-attention Siamese neural network. Industrial Artificial Intelligence , 1 (1), 2. https://doi.org/10.1007/s44244-022-00002-y David-Barrett, T. (2020). Herding Friends in Similarity-Based Architecture of Social Networks. Scientific Reports , 10 (1), 4859. https://doi.org/10.1038/s41598-020-61330-6 De Nicola, R., Petrocchi, M., & Pratelli, M. (2021). On the efficacy of old features for the detection of new bots. Information Processing & Management , 58 (6), 102685. 
https://doi.org/10.1016/j.ipm.2021.102685 de Andrade, R. L., & Rêgo, L. C. (2018). The use of nodes attributes in social network analysis with an application to an international trade network. Physica A: Statistical Mechanics and its Applications , 491 , 249-270. https://doi.org/10.1016/j.physa.2017.08.126 Evkoski, B., Kralj Novak, P., & Ljubešić, N. (2023). Content-based comparison of communities in social networks: Ex-Yugoslavian reactions to the Russian invasion of Ukraine. Applied Network Science , 8 (1), 40. https://doi.org/10.1007/s41109-023-00561-8 Faez, F., Hashemi Dijujin, N., Soleymani Baghshah, M., & Rabiee, H. R. (2022). SCGG: A deep structure-conditioned graph generative model. PLoS One , 17 (11), e0277887. https://doi.org/10.1371/journal.pone.0277887 Fu, X., & Shen, Y. (2014). Study of collective user behaviour in Twitter: a fuzzy approach. Neural Computing and Applications , 25 (7), 1603-1614. https://doi.org/10.1007/s00521-014-1642-9 Guan, L., Liu, X. F., Sun, W., Liang, H., & Zhu, J. J. H. (2022). Census of Twitter users: Scraping and describing the national network of South Korea. PLoS One , 17 (11), e0277549. https://doi.org/10.1371/journal.pone.0277549 Gui, C. (2024). Link prediction based on spectral analysis. PLoS One , 19 (1), e0287385. https://doi.org/10.1371/journal.pone.0287385 Hasan, M. A. U., Bakar, A. A., & Yaakub, M. R. (2024, 3-5 Jan. 2024). Detecting Community Through User Similarity Analysis on Twitter. 2024 18th International Conference on Ubiquitous Information Management and Communication (IMCOM). Hasan, M. A. U., Bakar, A. A., & Yaakub, M. R. (2024). Measuring User Influence in Real-Time on Twitter Using Behavioural Features. Physica A: Statistical Mechanics and its Applications , 129662. https://doi.org/10.1016/j.physa.2024.129662 Hromic, H., & Hayes, C. (2019). Characterising and evaluating dynamic online communities from live microblogging user interactions. 
Social Network Analysis and Mining , 9 (1), 30. https://doi.org/10.1007/s13278-019-0576-8 Hu, Y., Wang, W., & Yu, Y. (2022). Graph matching beyond perfectly-overlapping Erdős–Rényi random graphs. Statistics and Computing , 32 (1), 19. https://doi.org/10.1007/s11222-022-10079-1 Huynh, T., Nguyen, H. D., Zelinka, I., Pham, X. H., Pham, V. T., Selamat, A., & Krejcar, O. (2022). A method to detect influencers in social networks based on the combination of amplification factors and content creation. PLoS One , 17 (10), e0274596. https://doi.org/10.1371/journal.pone.0274596 Iqbal, S., Khan, H. U., Ishfaq, U., Alghobiri, M., & Iqbal, S. (2021). Finding influential users in social networks based on novel features & link-based analysis. J. Intell. Fuzzy Syst. , 40 (1), 1623–1637. https://doi.org/10.3233/jifs-201036 Jain, A. K., Sahoo, S. R., & Kaubiyal, J. (2021). Online social networks security and privacy: comprehensive review and analysis. Complex & Intelligent Systems , 7 (5), 2157-2177. https://doi.org/10.1007/s40747-021-00409-7 Jia, W., Ma, R., Yan, L., Niu, W., & Ma, Z. (2022). TT-graph: A new model for building social network graphs from texts with time series. Expert Systems with Applications , 192 , 116405. https://doi.org/10.1016/j.eswa.2021.116405 Jiang, N., Crooks, A. T., Kavak, H., Burger, A., & Kennedy, W. G. (2022). A method to create a synthetic population with social networks for geographically-explicit agent-based models. Computational Urban Science , 2 (1), 7. https://doi.org/10.1007/s43762-022-00034-1 Kanavos, A., Karamitsos, I., & Mohasseb, A. (2023). Exploring Clustering Techniques for Analyzing User Engagement Patterns in Twitter Data. Computers , 12 (6). https://doi.org/10.3390/computers12060124 Kerrache, S., Alharbi, R., & Benhidour, H. (2020). A Scalable Similarity-Popularity Link Prediction Method. Scientific Reports , 10 (1), 6394. https://doi.org/10.1038/s41598-020-62636-1 Kim, J., Jeong, S., & Lim, S. (2022). 
Link Pruning for Community Detection in Social Networks. Applied Sciences , 12 (13). https://doi.org/10.3390/app12136811 Kubina, R. M., Kostewicz, D. E., Brennan, K. M., & King, S. A. (2017). A Critical Review of Line Graphs in Behavior Analytic Journals. Educational Psychology Review , 29 (3), 583-598. https://doi.org/10.1007/s10648-015-9339-x Kumari, A., Behera, R. K., Sahoo, B., & Sahoo, S. P. (2022). Prediction of link evolution using community detection in social network. Computing , 104 (5), 1077-1098. https://doi.org/10.1007/s00607-021-01035-4 Lee, C., & Wilkinson, D. J. (2019). A review of stochastic block models and extensions for graph clustering. Applied Network Science , 4 (1), 122. https://doi.org/10.1007/s41109-019-0232-2 Li, Y., Yang, L., Xu, B., Wang, J., & Lin, H. (2019). Improving User Attribute Classification with Text and Social Network Attention. Cognitive Computation , 11 (4), 459-468. https://doi.org/10.1007/s12559-019-9624-y Lim, S. L., & Bentley, P. J. (2022). Opinion amplification causes extreme polarization in social networks. Scientific Reports , 12 (1), 18131. https://doi.org/10.1038/s41598-022-22856-z Logan, A. P., LaCasse, P. M., & Lunday, B. J. (2023). Social network analysis of Twitter interactions: a directed multilayer network approach. Soc Netw Anal Min , 13 (1), 65. https://doi.org/10.1007/s13278-023-01063-2 Mahmoudi, A., Yaakub, M. R., & Abu Bakar, A. (2018). New time-based model to identify the influential users in online social networks. Data Technologies and Applications , 52 (2), 278-290. https://doi.org/10.1108/DTA-08-2017-0056 Mariani, P., Marletta, A., Mussini, M., Zenga, M., & Grammatica, E. (2020). A missing value approach to social network data: “Dislike” or “Nothing”? Computational Management Science , 17 (4), 569-583. https://doi.org/10.1007/s10287-020-00381-6 Markos, E., Peña, P., Labrecque, L. I., & Swani, K. (2023). Are data breaches the new norm? 
Exploring data breach trends, consumer sentiment, and responses to security invasions. Journal of Consumer Affairs , 57 (3), 1089-1119. https://doi.org/10.1111/joca.12554 Masrom, M. B., Busalim, A. H., Abuhassna, H., & Mahmood, N. H. N. (2021). Understanding students’ behavior in online social networks: a systematic literature review. International Journal of Educational Technology in Higher Education , 18 (1), 6. https://doi.org/10.1186/s41239-021-00240-7 McMillan, C., Felmlee, D., & Ashford, J. R. (2022). Reciprocity, transitivity, and skew: Comparing local structure in 40 positive and negative social networks. PLoS One , 17 (5), e0267886. https://doi.org/10.1371/journal.pone.0267886 Mislove, A., Marcon, M., Gummadi, K. P., Druschel, P., & Bhattacharjee, B. (2007). Measurement and analysis of online social networks. Proceedings of the 7th ACM SIGCOMM conference on Internet measurement, San Diego, California, USA. https://doi.org/10.1145/1298306.1298311 Myers, S. A., & Leskovec, J. (2010). On the convexity of latent social network inference. Proceedings of the 23rd International Conference on Neural Information Processing Systems - Volume 2, Vancouver, British Columbia, Canada. Neal, Z. P. (2017). How small is it? Comparing indices of small worldliness. Network Science , 5 (1), 30-44. https://doi.org/10.1017/nws.2017.5 Nettleton, D. F. (2016). A synthetic data generator for online social network graphs. Social Network Analysis and Mining , 6 (1), 44. https://doi.org/10.1007/s13278-016-0352-y Nikolentzos, G., Vazirgiannis, M., Xypolopoulos, C., Lingman, M., & Brandt, E. G. (2023). Synthetic electronic health records generated with variational graph autoencoders. npj Digital Medicine , 6 (1), 83. https://doi.org/10.1038/s41746-023-00822-x O’Neil, D. A., & Petty, M. D. (2019). Heuristic methods for synthesizing realistic social networks based on personality compatibility. Applied Network Science , 4 (1). 
https://doi.org/10.1007/s41109-019-0117-4 Ohme, J., Araujo, T., Boeschoten, L., Freelon, D., Ram, N., Reeves, B. B., & Robinson, T. N. (2023). Digital Trace Data Collection for Social Media Effects Research: APIs, Data Donation, and (Screen) Tracking. Communication Methods and Measures , 1-18. https://doi.org/10.1080/19312458.2023.2181319 Panchendrarajan, R., & Saxena, A. (2023). Topic-based influential user detection: a survey. Applied Intelligence , 53 (5), 5998-6024. https://doi.org/10.1007/s10489-022-03831-7 Piccardi, C. (2023). Metrics for network comparison using egonet feature distributions. Sci Rep , 13 (1), 14657. https://doi.org/10.1038/s41598-023-40938-4 Pulipati, S., Somula, R., & Parvathala, B. R. (2021). Nature inspired link prediction and community detection algorithms for social networks: a survey. International Journal of System Assurance Engineering and Management . https://doi.org/10.1007/s13198-021-01125-8 Rothwell, L. (2023, Jul 13, 2023). Understanding the Recent Changes to Twitter API: A complete guide . Blaze. Retrieved January 2, 2024 from https://www.withblaze.app/blog/understanding-the-recent-changes-to-twitter-api-a-complete-guide Roux, J., Bez, N., Rochet, P., Joo, R., & Mahevas, S. (2023). Graphlet correlation distance to compare small graphs. PLoS One , 18 (2), e0281646. https://doi.org/10.1371/journal.pone.0281646 Saarela, M., & Jauhiainen, S. (2021). Comparison of feature importance measures as explanations for classification models. SN Applied Sciences , 3 (2), 272. https://doi.org/10.1007/s42452-021-04148-9 Schwyck, M. E., Du, M., Li, Y., Chang, L. J., & Parkinson, C. (2023). Similarity Among Friends Serves as a Social Prior: The Assumption That “Birds of a Feather Flock Together” Shapes Social Decisions and Relationship Beliefs. Personality and Social Psychology Bulletin , 0 (0), 01461672221140269. https://doi.org/10.1177/01461672221140269 Shahraeini, M. (2023). 
Modified Erdős–Rényi Random Graph Model for Generating Synthetic Power Grids. IEEE Systems Journal , 1-12. https://doi.org/10.1109/JSYST.2023.3339664 Shantal, M., Othman, Z., & Bakar, A. A. (2023). A Novel Approach for Data Feature Weighting Using Correlation Coefficients and Min–Max Normalization. Symmetry , 15 (12), 2185. https://www.mdpi.com/2073-8994/15/12/2185 Shoeibi, N., Shoeibi, N., Chamoso, P., Alizadehsani, Z., & Corchado, J. M. (2022). A Hybrid Model for the Measurement of the Similarity between Twitter Profiles. Sustainability , 14 (9), 4909. https://www.mdpi.com/2071-1050/14/9/4909 Stark, T. H. (2018). Collecting Social Network Data. In D. L. Vannette & J. A. Krosnick (Eds.), The Palgrave Handbook of Survey Research (pp. 241-254). Springer International Publishing. https://doi.org/10.1007/978-3-319-54395-6_31 Talaga, S., & Nowak, A. (2022). Structural measures of similarity and complementarity in complex networks. Sci Rep , 12 (1), 16580. https://doi.org/10.1038/s41598-022-20710-w Tantardini, M., Ieva, F., Tajoli, L., & Piccardi, C. (2019). Comparing methods for comparing networks. Sci Rep , 9 (1), 17557. https://doi.org/10.1038/s41598-019-53708-y Toraman, C., Şahinuç, F., Yilmaz, E. H., & Akkaya, I. B. (2022). Understanding social engagements: A comparative analysis of user and text features in Twitter. Social Network Analysis and Mining , 12 (1), 47. https://doi.org/10.1007/s13278-022-00872-1 Vasques Filho, D., & O'Neale, D. R. J. (2020). Transitivity and degree assortativity explained: The bipartite structure of social networks. Physical Review E , 101 (5), 052305. https://doi.org/10.1103/PhysRevE.101.052305 Venturini, T., & Rogers, R. (2019). “API-Based Research” or How can Digital Sociology and Journalism Studies Learn from the Facebook and Cambridge Analytica Data Breach. Digital Journalism , 7 (4), 532-540. https://doi.org/10.1080/21670811.2019.1591927 Verstraaten, M., Varbanescu, A. L., & de Laat, C. (2017, 2017//). 
Synthetic Graph Generation for Systematic Exploration of Graph Structural Properties. Euro-Par 2016: Parallel Processing Workshops, Cham. Wang, M., & Ma, J. (2016). A novel recommendation approach based on users’ weighted trust relations and the rating similarities. Soft Computing , 20 (10), 3981-3990. https://doi.org/10.1007/s00500-015-1734-1 Wang, T., Brede, M., Ianni, A., & Mentzakis, E. (2018). Social interactions in online eating disorder communities: A network perspective. PLoS One , 13 (7), e0200800. https://doi.org/10.1371/journal.pone.0200800 Weber, D., Nasim, M., Mitchell, L., & Falzon, L. (2021). Exploring the effect of streamed social media data variations on social network analysis. Social Network Analysis and Mining , 11 (1), 62. https://doi.org/10.1007/s13278-021-00770-y Wei, X., Zhao, J., Liu, S., & Wang, Y. (2022). Identifying influential spreaders in complex networks for disease spread and control. Scientific Reports , 12 (1), 5550. https://doi.org/10.1038/s41598-022-09341-3 Wills, P., & Meyer, F. G. (2020). Metrics for graph comparison: A practitioner’s guide. PLoS One , 15 (2), e0228728. https://doi.org/10.1371/journal.pone.0228728 Xu, Y., Ren, T., & Sun, S. (2022). Community Detection Based on Node Influence and Similarity of Nodes. Mathematics , 10 (6). https://doi.org/10.3390/math10060970 Yilmaz, E. A., Balcisoy, S., & Bozkaya, B. (2023). A link prediction-based recommendation system using transactional data. Sci Rep , 13 (1), 6905. https://doi.org/10.1038/s41598-023-34055-5 Yuliansyah, H., Othman, Z. A., & Bakar, A. A. (2023). A new link prediction method to alleviate the cold-start problem based on extending common neighbor and degree centrality. Physica A: Statistical Mechanics and its Applications , 616 , 128546. https://doi.org/10.1016/j.physa.2023.128546 Zareie, A., & Sakellariou, R. (2020). Similarity-based link prediction in social networks using latent relationships between the users. Sci Rep , 10 (1), 20137. 
https://doi.org/10.1038/s41598-020-76799-4 Zhang, S., Zhang, Y., Zhou, M., & Peng, L. (2020). Community detection based on similarities of communication behavior in IP networks. Journal of Ambient Intelligence and Humanized Computing , 13 (3), 1451-1461. https://doi.org/10.1007/s12652-020-02681-w Zhao, S., Sun, J., Shimizu, K., & Kadota, K. (2018). Silhouette Scores for Arbitrary Defined Groups in Gene Expression Data and Insights into Differential Expression Results. Biological Procedures Online , 20 (1), 5. https://doi.org/10.1186/s12575-018-0067-8 Zhou, H. J., Li, L., Li, Y., Li, W., & Li, J. J. (2022). PCA outperforms popular hidden variable inference methods for molecular QTL mapping. Genome Biology , 23 (1), 210. https://doi.org/10.1186/s13059-022-02761-4 Additional Declarations No competing interests reported. 
© Research Square 2026 | ISSN 2693-5015 (online) {"props":{"pageProps":{"initialData":{"identity":"rs-4132627","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":282323652,"identity":"1777e89e-59bb-43d9-8bc0-df582605a98f","order_by":0,"name":"Md Ahsan Ul Hasan","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA+ElEQVRIiWNgGAWjYFCCBAZmMC0BJm2AmLHxACla0kBaGkjSchhM4tWi257A+Lngj10eg3Tz4w8/d5y3W9t+GGhLjU00Li1mZx4wS89sSy5mkDlmJtl75nbytjOJQC3H0nIbcGm5kcAgzdvAnNggkWDGwNt2O9nsAFALY8NhfFqYf/P8qQdqSf/88W/buWSz8w8JamGT5mE7DNSSYyDN23bAzuwGIVvOPGyz5m07ntgmkVMmLduWnGB2A2hLAj6/HE8+fJvnT3Viv0T65o9v2+zszc6nP3zwocYGpxZQxIEpNig3EcxNwKkcC7AnRfEoGAWjYBSMDAAAzx9jCa8PDgoAAAAASUVORK5CYII=","orcid":"","institution":"University Kebangsaan Malaysia","correspondingAuthor":true,"prefix":"","firstName":"Md","middleName":"Ahsan Ul","lastName":"Hasan","suffix":""},{"id":282323653,"identity":"a6e74eaf-79bf-4b15-bddd-d2da6716f852","order_by":1,"name":"Azuraliza Abu Bakar","email":"","orcid":"","institution":"University Kebangsaan Malaysia","correspondingAuthor":false,"prefix":"","firstName":"Azuraliza","middleName":"Abu","lastName":"Bakar","suffix":""},{"id":282323654,"identity":"39622184-8285-46f5-9c22-c1d36033a4fe","order_by":2,"name":"Mohd Ridzwan Yaakub","email":"","orcid":"","institution":"University Kebangsaan Malaysia","correspondingAuthor":false,"prefix":"","firstName":"Mohd","middleName":"Ridzwan","lastName":"Yaakub","suffix":""}],"badges":[],"createdAt":"2024-03-19 
19:42:19","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-4132627/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-4132627/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":53236282,"identity":"576d61a6-0e8c-4827-abfc-3465b2426441","added_by":"auto","created_at":"2024-03-22 08:52:55","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":127643,"visible":true,"origin":"","legend":"\u003cp\u003eInterconnections between selected users in their friendship network\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-4132627/v1/92acc4732988a6bf5fa86401.png"},{"id":53236284,"identity":"b4f67792-5a00-4b57-821a-01d010a8679f","added_by":"auto","created_at":"2024-03-22 08:52:55","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":169464,"visible":true,"origin":"","legend":"\u003cp\u003eThe proposed approach for the AWED graph\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-4132627/v1/26ca21260ceed29db65e4716.png"},{"id":53236283,"identity":"5bf0adeb-81b9-4c72-b805-df638856ed49","added_by":"auto","created_at":"2024-03-22 08:52:55","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":24224,"visible":true,"origin":"","legend":"\u003cp\u003eLoading Vectors of Attributes for the Principal Components\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-4132627/v1/057620c7d24b76efe2578521.png"},{"id":53236286,"identity":"5be3aaf1-5553-49e6-a3ab-7c902fd80074","added_by":"auto","created_at":"2024-03-22 08:52:55","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":310251,"visible":true,"origin":"","legend":"\u003cp\u003eGraph Creation 
Process\u003c/p\u003e","description":"","filename":"4.png","url":"https://assets-eu.researchsquare.com/files/rs-4132627/v1/8c7b43daf96b013a4427106b.png"},{"id":53236798,"identity":"62b6a1e0-b4e3-43f9-afb6-069624b4d380","added_by":"auto","created_at":"2024-03-22 09:00:55","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":177768,"visible":true,"origin":"","legend":"\u003cp\u003eLink Prediction Evaluation Results\u003c/p\u003e","description":"","filename":"5.png","url":"https://assets-eu.researchsquare.com/files/rs-4132627/v1/67ec5ad696bd3dbd4e7c3934.png"},{"id":53236287,"identity":"455a7483-d911-4ea3-ae5b-d216f728b001","added_by":"auto","created_at":"2024-03-22 08:52:55","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":2564793,"visible":true,"origin":"","legend":"\u003cp\u003eVisual representation of detected communities within each network\u003c/p\u003e","description":"","filename":"6.png","url":"https://assets-eu.researchsquare.com/files/rs-4132627/v1/b7e401d0a115b9cfd4934176.png"},{"id":56510582,"identity":"6c453105-c608-485f-a977-8de01219f1cf","added_by":"auto","created_at":"2024-05-15 06:30:05","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":6774418,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4132627/v1/4eca7d3c-dcd3-4210-ae1d-eb0e94c3c72f.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Generating Attribute Similarity Graphs: A User Behavior-Based Approach from Real- Time Microblogging Data on Platform X","fulltext":[{"header":"Introduction","content":"\u003cp\u003eCapturing and analysing real-time data is vital for Online Social Network (OSN) analysis as it enables immediate insights into social interactions, facilitating the observation of user behaviour as it happens 
(Bazzaz Abkenar et al., \u003cspan class=\"CitationRef\"\u003e2021\u003c/span\u003e). Researchers collect real-time information from OSNs using diverse techniques and tools. These include crawling OSN platforms using Application Programming Interfaces (APIs), web scraping, network dataset repositories (pre-collected), data sharing agreements, and digital trace data collection, among others (Ohme et al., \u003cspan class=\"CitationRef\"\u003e2023\u003c/span\u003e; Stark, \u003cspan class=\"CitationRef\"\u003e2018\u003c/span\u003e). The most popular technique for collecting real-time data is the use of APIs, which let researchers retrieve data in real time and analyse it as soon as it enters the system (Ohme et al., \u003cspan class=\"CitationRef\"\u003e2023\u003c/span\u003e; Stark, \u003cspan class=\"CitationRef\"\u003e2018\u003c/span\u003e; Venturini \u0026amp; Rogers, \u003cspan class=\"CitationRef\"\u003e2019\u003c/span\u003e; Weber et al., \u003cspan class=\"CitationRef\"\u003e2021\u003c/span\u003e).\u003c/p\u003e\n\u003cp\u003eThe microblogging site X, commonly known as Twitter, is a popular OSN platform that functions as a hub for news updates, sharing information, and conducting marketing campaigns (Kanavos et al., \u003cspan class=\"CitationRef\"\u003e2023\u003c/span\u003e). Politicians, journalists, businesses, and celebrities have been using Twitter as a means to influence public opinion and impact political discourse. The platform has also been used to gauge public opinion and sentiment regarding specific subjects. Hence, the analysis of real-time Twitter data can yield a useful understanding of users\u0026apos; behavioural patterns, interests, and preferences. Twitter has also gained popularity as a medium for academic study. 
Despite the widespread limitations imposed on data access by most OSN platforms following the consequences of Cambridge Analytica\u0026apos;s data breach, Twitter remained an exception by continuing to offer its data through many APIs (Markos et al., \u003cspan class=\"CitationRef\"\u003e2023\u003c/span\u003e; Venturini \u0026amp; Rogers, \u003cspan class=\"CitationRef\"\u003e2019\u003c/span\u003e).\u003c/p\u003e\n\u003cp\u003eHowever, the sheer volume of data generated by users makes it nearly impossible to collect explicit OSN data in the real world (Myers \u0026amp; Leskovec, \u003cspan class=\"CitationRef\"\u003e2010\u003c/span\u003e; Toraman et al., \u003cspan class=\"CitationRef\"\u003e2022\u003c/span\u003e). The APIs provided by the Twitter platform restrict the number of queries allowed within a given time frame. For example, the Twitter API allows 15 calls per 15 minutes, each delivering 1000 IDs, to get a Twitter account\u0026apos;s followers and followings. Therefore, retrieving the complete list of followers would necessitate \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\left(\\frac{\\text{Number of Followers}}{1000}\\right)\$\u003c/span\u003e\u003c/span\u003e requests (De Nicola et al., \u003cspan class=\"CitationRef\"\u003e2021\u003c/span\u003e). Furthermore, the Twitter API has recently undergone changes that include the introduction of new pricing structures and access levels (Rothwell, \u003cspan class=\"CitationRef\"\u003e2023\u003c/span\u003e). 
These modifications have led to further limitations on the number of tweets, requests, and data accessibility that developers and researchers can access.\u003c/p\u003e\n\u003cp\u003eMoreover, Twitter distinguishes itself from other OSNs as its user base exhibits a distinct focus on real-time communication by sharing thoughts and ideas related to specific topics rather than prioritizing personal connections (Jain et al., \u003cspan class=\"CitationRef\"\u003e2021\u003c/span\u003e; Logan et al., \u003cspan class=\"CitationRef\"\u003e2023\u003c/span\u003e; Masrom et al., \u003cspan class=\"CitationRef\"\u003e2021\u003c/span\u003e). For example, we selected 150 primary users who had the highest number of likes on their tweets related to a recent topic. Consequently, we employed the Twitter API to obtain 100 friends for each of these users. The aim was to examine the potential friendship between these 150 users. Nevertheless, as shown in \u003cstrong\u003eFig.\u0026nbsp;1\u003c/strong\u003e, there were not many discernible connections between them. This discrepancy suggests that fully comprehending the significant network structure within Twitter may not be entirely possible by relying only on the traditional structure of friends and followers.\u003c/p\u003e\n\u003cdiv\u003eLink prediction and imputation are among the most common techniques for addressing missing data in OSNs. Nevertheless, the efficacy of these methods may be constrained due to their heavy dependence on the interactions or connections between nodes for the estimation of missing data (Alam et al., \u003cspan class=\"CitationRef\"\u003e2023\u003c/span\u003e; Aziz et al., \u003cspan class=\"CitationRef\"\u003e2023\u003c/span\u003e; Mariani et al., \u003cspan class=\"CitationRef\"\u003e2020\u003c/span\u003e). Another widely adopted approach entails the creation of synthetic social networks if the real-world network is unavailable. 
Synthetic data modelling involves the generation of synthetic data that replicates the characteristics of real-world data (Agrawal et al., \u003cspan class=\"CitationRef\"\u003e2024\u003c/span\u003e; Faez et al., \u003cspan class=\"CitationRef\"\u003e2022\u003c/span\u003e; Jiang et al., \u003cspan class=\"CitationRef\"\u003e2022\u003c/span\u003e; Nettleton, \u003cspan class=\"CitationRef\"\u003e2016\u003c/span\u003e; O\u0026rsquo;Neil \u0026amp; Petty, \u003cspan class=\"CitationRef\"\u003e2019\u003c/span\u003e). This allows researchers to examine and assess information without compromising confidentiality or being constrained by the unavailability of data. However, synthetic social networks are generated algorithmically instead of being obtained by empirical methods, hence lacking authentic contextual complexities present in real-world networks (Agrawal et al., \u003cspan class=\"CitationRef\"\u003e2024\u003c/span\u003e; Lim \u0026amp; Bentley, \u003cspan class=\"CitationRef\"\u003e2022\u003c/span\u003e; O\u0026rsquo;Neil \u0026amp; Petty, \u003cspan class=\"CitationRef\"\u003e2019\u003c/span\u003e).\u003c/div\u003e\n\u003cp\u003eThe difficulties of deriving significant network structures from real-time data on Twitter have prompted the investigation of alternative approaches in this study. Consequently, this study introduces a novel user-attribute-based similarity graph model by employing publicly available Twitter data to generate connections according to their degree of similarity. 
The formation of relationships in OSNs is significantly impacted by the level of similarity among members, resulting in a higher probability of connections between users who share similar backgrounds or other shared attributes (Block \u0026amp; Grund, \u003cspan class=\"CitationRef\"\u003e2014\u003c/span\u003e; David-Barrett, \u003cspan class=\"CitationRef\"\u003e2020\u003c/span\u003e; Schwyck et al., \u003cspan class=\"CitationRef\"\u003e2023\u003c/span\u003e; Zareie \u0026amp; Sakellariou, \u003cspan class=\"CitationRef\"\u003e2020\u003c/span\u003e). Commonly employed methods for evaluating user similarity include Jaccard similarity, Cosine Similarity, Pearson Correlation Coefficient, and Euclidean Distance (Bodaghi \u0026amp; Oliveira, \u003cspan class=\"CitationRef\"\u003e2022\u003c/span\u003e; Kerrache et al., \u003cspan class=\"CitationRef\"\u003e2020\u003c/span\u003e; Shoeibi et al., \u003cspan class=\"CitationRef\"\u003e2022\u003c/span\u003e). However, these conventional approaches do not intrinsically account for the varying importance of user attributes in the formation of relationships. It is widely acknowledged that not all attributes of users hold the same level of importance or relevance in the context of establishing relationships (Alghobiri, \u003cspan class=\"CitationRef\"\u003e2023\u003c/span\u003e; de Andrade \u0026amp; R\u0026ecirc;go, \u003cspan class=\"CitationRef\"\u003e2018\u003c/span\u003e; Md Ahsan Ul Hasan et al., \u003cspan class=\"CitationRef\"\u003e2024\u003c/span\u003e; Li et al., \u003cspan class=\"CitationRef\"\u003e2019\u003c/span\u003e). Thus, this study introduces an \u003cstrong\u003eA\u003c/strong\u003ettribute-\u003cstrong\u003eW\u003c/strong\u003eeighted \u003cstrong\u003eE\u003c/strong\u003euclidean \u003cstrong\u003eD\u003c/strong\u003eistance (AWED) method to assess the similarity coefficients for generating an OSN graph. 
Principal Component Analysis (PCA) is employed to determine the importance of the attributes, and weights are assigned according to their respective importance.\u003c/p\u003e\n\u003cp\u003eThe paper compares the proposed user-attribute-based similarity graph with three well-known synthetic graphs: Erdős-R\u0026eacute;nyi (Piccardi, \u003cspan class=\"CitationRef\"\u003e2023\u003c/span\u003e; Shahraeini, \u003cspan class=\"CitationRef\"\u003e2023\u003c/span\u003e; Tantardini et al., \u003cspan class=\"CitationRef\"\u003e2019\u003c/span\u003e), Barab\u0026aacute;si\u0026ndash;Albert (Kubina et al., \u003cspan class=\"CitationRef\"\u003e2017\u003c/span\u003e; Wei et al., \u003cspan class=\"CitationRef\"\u003e2022\u003c/span\u003e), and Stochastic Block Model graphs (Block \u0026amp; Grund, \u003cspan class=\"CitationRef\"\u003e2014\u003c/span\u003e; Hu et al., \u003cspan class=\"CitationRef\"\u003e2022\u003c/span\u003e; Lee \u0026amp; Wilkinson, \u003cspan class=\"CitationRef\"\u003e2019\u003c/span\u003e). The graphs are evaluated using metrics pertaining to network structural properties, OSN characteristics, and predictive performance.\u003c/p\u003e\n\u003cp\u003eThe main contributions of this study are summarized as follows:\u003c/p\u003e\n\u003cul\u003e\n \u003cli\u003e\n \u003cp\u003eThe proposal of an OSN graph generation method, based on user-attribute similarity, for settings in which explicit relationships are absent. The attributes selected for this study are easily derivable from the Twitter platform in real time using the Twitter API.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eThe implementation of an Attribute-Weighted Euclidean Distance (AWED) metric to quantify the similarity between users. 
The weights assigned to attributes are determined by assessing the importance of each attribute.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eA comprehensive comparison between the proposed user-attribute-based similarity graph and three well-known synthetic graphs, which provides insight into the efficacy of the proposed model against established synthetic graph models.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eThe proposed method adheres to ethical standards and maintains data confidentiality by strictly following Twitter guidelines. This commitment ensures user privacy and upholds ethical integrity throughout the study.\u003c/p\u003e\n \u003c/li\u003e\n\u003c/ul\u003e"},{"header":" Preliminaries","content":"\u003cp\u003e \u003cb\u003e2.1 Problem definition\u003c/b\u003e: Traditionally, an OSN such as Twitter can be modelled as a graph \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$G\\left(V, E\\right)\$\u003c/span\u003e\u003c/span\u003e, where \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$V=\\left\\{{v}_{1}, {v}_{2}, \\dots , {v}_{n}\\right\\}\$\u003c/span\u003e\u003c/span\u003e represents the set of users (nodes) and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$n=\\left|V\\right|\$\u003c/span\u003e\u003c/span\u003e represents the number of users. The set \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$E\\subset V\\times V\$\u003c/span\u003e\u003c/span\u003e is the set of edges, which represent the connections or relationships between users. In the absence of explicit relationships within a network (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$E=\\varnothing\$\u003c/span\u003e\u003c/span\u003e), the objective is to construct a method that creates relationships between users. 
Given that each user \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\${v}_{i}\$\u003c/span\u003e\u003c/span\u003e is associated with a set of attributes \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$A= \\left\\{{a}_{1}, {a}_{2}, \\dots , {a}_{m}\\right\\}\$\u003c/span\u003e\u003c/span\u003e with varying weights \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$W=\\left\\{{w}_{1}, {w}_{2}, \\dots , {w}_{m}\\right\\}\$\u003c/span\u003e\u003c/span\u003e, edges are constructed between nodes based on behavioral similarities, quantified using Attribute-Weighted Euclidean Distance (AWED). The formation of relationships in OSNs is highly influenced by the degree of similarity among users. Therefore, an edge is created between nodes \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\${v}_{i}\$\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\${v}_{j}\$\u003c/span\u003e\u003c/span\u003e if the AWED value meets or exceeds the threshold \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$T\$\u003c/span\u003e\u003c/span\u003e:\u003cdiv id=\"Equ1\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ1\" name=\"EquationSource\"\u003e\n$$Edge\\left({v}_{i},{v}_{j}\\right)=\\left\\{\\begin{array}{c}1, \\text{ if } AWED\\left({v}_{i}, {v}_{j}\\right)\\ge T\\\\ 0, \\text{ otherwise}\\end{array}\\right.$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e1\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eEven though the Twitter follower network is a directed graph, the constructed graph is undirected because the distance between two users is symmetric, in line with the well-established principle of symmetry in mathematics and social network analysis (Evkoski et al., \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Shoeibi et al., \u003cspan citationid=\"CR62\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). 
That is,\u003cdiv id=\"Equ2\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ2\" name=\"EquationSource\"\u003e\n$$\\left(AWED\\left({v}_{i}, {v}_{j}\\right)\\ge T\\right)\\Rightarrow \\left(AWED\\left({v}_{j}, {v}_{i}\\right)\\ge T\\right)$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e2\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003e \u003cb\u003e2.2 Attributes\u003c/b\u003e: The Twitter API provides a rich and accessible source of data that can be used in a variety of ways to support academic research. It provides access to user-based information as well as user-generated content, that is, tweet-based information. User profile information includes the number of accounts followed (friends), the number of followers, the number of tweets, as well as profile descriptions and locations. Tweet-based information, on the other hand, includes the content of the tweet, the time the tweet was created, and the number of likes, retweets, comments, and quotes.\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eNumber of Friends\u003c/b\u003e: The number of accounts that a user is currently following. These accounts are displayed on the user's timeline, enabling the user to view their tweets and updates. The number of friends a user has corresponds to the out-degree of the corresponding node in the graph, which yields significant insights into the structure and dynamics of social networks (Guan et al., \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2022\u003c/span\u003e).\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eNumber of Followers\u003c/b\u003e: The number of accounts that are currently following a specific individual. Followers receive notifications from the user, and their interaction with the user's tweets can enhance the user's reach and influence. 
The number of a user's followers is equivalent to the in-degree of the corresponding node in a graph, which can be used as a measure to comprehend the spread of information within online social networks (Panchendrarajan \u0026amp; Saxena, \u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e2023\u003c/span\u003e).\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eNumber of Tweets\u003c/b\u003e: The total number of tweets published by a user. A tweet is a concise communication or post on the social networking platform Twitter, with a character limit of 280. It can contain various elements such as text, hyperlinks, visuals, or multimedia components. The aggregate count of tweets serves as an indicator of the user's extent of participation and involvement on the site (Fu \u0026amp; Shen, \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2014\u003c/span\u003e).\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eEngagement\u003c/b\u003e: Engagement covers the various forms of user involvement with a post, including likes, retweets, comments, and link clicks. Post resonance is a metric that gauges the level of audience involvement and may be determined by dividing the number of engagements by the number of impressions (Asadi \u0026amp; Agah, \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Iqbal et al., \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). As shown in Eq.\u0026nbsp;\u003cspan refid=\"Equ3\" class=\"InternalRef\"\u003e3\u003c/span\u003e, a user's engagement is calculated by summing the numbers of likes, retweets, replies, and quotes that each of the user's tweets receives. 
A high engagement rate signifies that the material is captivating and pertinent to the audience.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003cdiv id=\"Equ3\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ3\" name=\"EquationSource\"\u003e\n$$Engagement=\\sum _{i=1}^{n}\\left({Likes}_{i}+{Retweets}_{i}+{Replies}_{i}+{Quotes}_{i}\\right)$$\u003c/div\u003e \u003cdiv class=\"EquationNumber\"\u003e3\u003c/div\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eActive\u003c/b\u003e: Being active on Twitter refers to the regularity and constancy with which a user interacts with the platform. This encompasses activities such as posting material, responding to comments, and engaging with others (Asadi \u0026amp; Agah, \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Iqbal et al., \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). Eq.\u0026nbsp;\u003cspan refid=\"Equ4\" class=\"InternalRef\"\u003e4\u003c/span\u003e gives the user's activity level over a specific time period; it counts actions such as tweeting, replying, retweeting, and quoting other users' tweets.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003cdiv id=\"Equ4\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ4\" name=\"EquationSource\"\u003e\n$$Active= \\frac{\\sum \\left(Tweets+Retweets+Replies+Quotes\\right)}{Time}$$\u003c/div\u003e \u003cdiv class=\"EquationNumber\"\u003e4\u003c/div\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eTweets Impact\u003c/b\u003e: On Twitter, tweet impact reflects the number of retweets that a user's tweets have received (Huynh et al., \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). 
Eq.\u0026nbsp;\u003cspan refid=\"Equ5\" class=\"InternalRef\"\u003e5\u003c/span\u003e denotes the impact of tweets, where n indicates the total number of tweets.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003cdiv id=\"Equ5\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ5\" name=\"EquationSource\"\u003e\n$$Tweets Impact= \\sum _{i=1}^{n}{Tweet}_{i} \\times \\log\\left({Retweets}_{i}\\right)$$\u003c/div\u003e \u003cdiv class=\"EquationNumber\"\u003e5\u003c/div\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eGrowth\u003c/b\u003e: On Twitter, growth refers to the increase in the number of followers over time (Mahmoudi et al., \u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). This may be assessed by monitoring the number of newly acquired followers. Eq.\u0026nbsp;\u003cspan refid=\"Equ6\" class=\"InternalRef\"\u003e6\u003c/span\u003e indicates a user\u0026rsquo;s growth.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003cdiv id=\"Equ6\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ6\" name=\"EquationSource\"\u003e\n$$Growth=\\frac{{Followers}_{final}-{Followers}_{starting}}{Time}$$\u003c/div\u003e \u003cdiv class=\"EquationNumber\"\u003e6\u003c/div\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003e2.3 Principal Component Analysis (PCA)\u003c/b\u003e: Principal Component Analysis (PCA) is a statistical technique used to reduce the number of variables in a dataset. It transforms a group of correlated variables into a smaller group of uncorrelated variables, known as principal components (Saarela \u0026amp; Jauhiainen, \u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Zhou et al., \u003cspan citationid=\"CR81\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). 
Despite the reduction in variables, PCA preserves most of the information present in the original data. It can also be used to rank attributes according to their significance in capturing variability in the data. This is accomplished by analysing the eigenvalues of the covariance matrix. The eigenvalues indicate the extent to which each principal component accounts for the variance: the larger the eigenvalue, the more variation the corresponding principal component captures. Thus, attributes linked to principal components with higher eigenvalues are deemed more significant in capturing the variability in the data. The procedure for ranking attributes involves the following steps:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eNormalize the data to ensure uniform scaling of all variables.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eCalculate the covariance matrix in order to determine the relationships between variables.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eCalculate the eigenvectors and eigenvalues of the covariance matrix in order to determine the principal components.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eRank the principal components according to their eigenvalues, which indicate the variance explained by each component.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eThe first principal component captures the highest amount of variability in the data, while each successive component captures the greatest remaining variability subject to being orthogonal to the previous components. 
Eq.\u0026nbsp;\u003cspan refid=\"Equ7\" class=\"InternalRef\"\u003e7\u003c/span\u003e shows the calculation of the percentage of variance accounted for by each component.\u003cdiv id=\"Equ7\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ7\" name=\"EquationSource\"\u003e\n$$Percentage of variance= \\frac{{\\lambda }_{i}}{{\\sum }_{j=1}^{p}{\\lambda }_{j}} \\times 100$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e7\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eHere, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$p\$\u003c/span\u003e\u003c/span\u003e denotes the number of variables, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\${\\lambda }_{i}\$\u003c/span\u003e\u003c/span\u003e represents the eigenvalue associated with the \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$i\$\u003c/span\u003e\u003c/span\u003eth principal component, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$j\$\u003c/span\u003e\u003c/span\u003e is the summation index running over all principal components.\u003c/p\u003e \u003cp\u003e \u003cb\u003e2.4 Synthetic Graph\u003c/b\u003e: Synthetic graphs are simulated networks created using mathematical models to replicate the structural characteristics of actual social networks. Synthetic graph generators address the lack of datasets for evaluating graph learning algorithms, allowing for more thorough analysis of their performance in various scenarios. They are beneficial for comparing graph learning methods and modelling network dynamics (Nikolentzos et al., \u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Piccardi, \u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Verstraaten et al., \u003cspan citationid=\"CR69\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). 
The Erdős-R\u0026eacute;nyi (ER), Barab\u0026aacute;si-Albert (BA), and Stochastic Block Model (SBM) are common synthetic graph models used by researchers in Online Social Network (OSN) analysis (O\u0026rsquo;Neil \u0026amp; Petty, \u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Piccardi, \u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e2023\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003cb\u003eThe Erdős-R\u0026eacute;nyi (ER)\u003c/b\u003e model is a classical graph modelling approach where every pair of nodes is linked by an edge with the same fixed probability (Hu et al., \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Roux et al., \u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). The likelihood of a connection between two nodes is not influenced by the presence of other connections in the graph. In an ER graph \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$G\\left(n, p\\right)\$\u003c/span\u003e\u003c/span\u003e there are \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$n\$\u003c/span\u003e\u003c/span\u003e vertices, and each possible edge is present with probability \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$p\$\u003c/span\u003e\u003c/span\u003e, independently of every other edge. 
The probability of a graph \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$G\$\u003c/span\u003e\u003c/span\u003e with edge set \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$E\$\u003c/span\u003e\u003c/span\u003e in this model is shown in Eq.\u0026nbsp;\u003cspan refid=\"Equ8\" class=\"InternalRef\"\u003e8\u003c/span\u003e.\u003cdiv id=\"Equ8\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ8\" name=\"EquationSource\"\u003e\n$$P\\left(G\\right)= {p}^{\\left|E\\right|}{(1-p)}^{\\frac{n\\left(n-1\\right)}{2}-\\left|E\\right|}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e8\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eHere, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\left|E\\right|\$\u003c/span\u003e\u003c/span\u003e is the number of edges in the graph and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\frac{n\\left(n-1\\right)}{2}\$\u003c/span\u003e\u003c/span\u003e represents the total number of possible edges. The Erdős-R\u0026eacute;nyi model works well for analyses that call for a straightforward and universal random graph model in which each edge has a defined probability of existing or not, regardless of the other edges (Hu et al., \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Shahraeini, \u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e2023\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003cb\u003eThe Barab\u0026aacute;si-Albert (BA)\u003c/b\u003e model is another popular method for creating synthetic graphs. The BA model generates random scale-free networks based on growth and preferential attachment principles (Kubina et al., \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e2017\u003c/span\u003e; Wei et al., \u003cspan citationid=\"CR73\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). Growth involves the continuous addition of new nodes to the network, whereas preferential attachment indicates that new nodes are more inclined to connect to existing nodes that have a high degree (number of connections). 
Eq.\u0026nbsp;\u003cspan refid=\"Equ9\" class=\"InternalRef\"\u003e9\u003c/span\u003e shows the degree distribution of BA model.\u003cdiv id=\"Equ9\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ9\" name=\"EquationSource\"\u003e\n$$P\\left(k\\right)= \\frac{2m(m+1)}{k(k+1)(k+2)}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e9\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eHere, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$P\\left(k\\right)\$\u003c/span\u003e\u003c/span\u003e denotes the probability that a node possesses degree k, and m denotes the number of connections formed by each newly added node to the network. This BA adheres to a power law distribution, indicating the presence of a small number of nodes with high degree (hubs) and a large number of nodes with low degree. BA graphs can be utilised to simulate and create realistic networks that display scale-free characteristics, including the World Wide Web, social networks, and biological networks (Kubina et al., \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e2017\u003c/span\u003e; Piccardi, \u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Tantardini et al., \u003cspan citationid=\"CR65\" class=\"CitationRef\"\u003e2019\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003cb\u003eThe Stochastic Block Model (SBM)\u003c/b\u003e is another common model used in OSN analysis. 
The SBM is a probabilistic model for random graphs that generates graphs with communities, that is, groups of nodes linked to each other with specific probabilities (Altenburger \u0026amp; Ugander, \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Hu et al., \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Lee \u0026amp; Wilkinson, \u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Piccardi, \u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). The SBM is defined by the number of communities \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$n\$\u003c/span\u003e\u003c/span\u003e, the number of nodes in each community, and a block probability matrix \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$P\\in {\\mathbb{R}}^{n\\times n}\$\u003c/span\u003e\u003c/span\u003e. Each element \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\${P}_{ij}\$\u003c/span\u003e\u003c/span\u003e represents the probability of a connection between a node from community \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$i\$\u003c/span\u003e\u003c/span\u003e and a node from community \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$j\$\u003c/span\u003e\u003c/span\u003e, so the diagonal elements give the within-community edge probabilities. 
The SBM aims to incorporate more realistic features of real-world networks, including varying degree distributions, nested communities, and edge weights (Lee \u0026amp; Wilkinson, \u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e2019\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003cb\u003e2.5 Network Properties\u003c/b\u003e: The properties of a network are the attributes and metrics that describe its structure, behavior, and function (Jain et al., \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Masrom et al., \u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Piccardi, \u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Talaga \u0026amp; Nowak, \u003cspan citationid=\"CR64\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Tantardini et al., \u003cspan citationid=\"CR65\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). These properties facilitate the comparison of graphs by bringing to light the resemblances and disparities between them. Table shows network properties and their descriptions.\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eDegree Distribution\u003c/b\u003e: This describes how frequently nodes exhibit each level of connectivity. The distribution of degrees within a network can be categorized as uniform, normal, or skewed, depending on how evenly connections are spread among the nodes. A skewed degree distribution may suggest the existence of hubs or outliers within the network (McMillan et al., \u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Piccardi, \u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Tantardini et al., \u003cspan citationid=\"CR65\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). 
The degree distribution \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$P\\left(k\\right)\$\u003c/span\u003e\u003c/span\u003e for an undirected graph can be calculated as shown in Eq.\u0026nbsp;\u003cspan refid=\"Equ10\" class=\"InternalRef\"\u003e10\u003c/span\u003e.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003cdiv id=\"Equ10\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ10\" name=\"EquationSource\"\u003e\n$$P\\left(k\\right)= \\frac{Number of nodes with degree k}{N}$$\u003c/div\u003e \u003cdiv class=\"EquationNumber\"\u003e10\u003c/div\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eHere, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$N\$\u003c/span\u003e\u003c/span\u003e is the total number of nodes and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$k\$\u003c/span\u003e\u003c/span\u003e is the degree of nodes.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eEdge Density\u003c/b\u003e: The ratio of the number of actual edges in a network to the total number of potential edges in the network is measured as edge density. It shows how closely linked the nodes are to one another in a graph (Wills \u0026amp; Meyer, \u003cspan citationid=\"CR74\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). 
Eq.\u0026nbsp;\u003cspan refid=\"Equ11\" class=\"InternalRef\"\u003e11\u003c/span\u003e calculates the edge density of a network.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003cdiv id=\"Equ11\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ11\" name=\"EquationSource\"\u003e\n$$Edge Density= \\frac{2\\times Number of Edges}{Number of Nodes\\times (Number of Nodes-1)}$$\u003c/div\u003e \u003cdiv class=\"EquationNumber\"\u003e11\u003c/div\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eAverage Node Connectivity\u003c/b\u003e: This is the average, over all pairs of nodes in a graph, of the number of node-independent paths that connect the pair. It serves as a measure of a network's robustness (Beineke et al., \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2002\u003c/span\u003e). It can be expressed as follows:\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003cdiv id=\"Equ12\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ12\" name=\"EquationSource\"\u003e\n$$Average Node Connectivity= \\frac{1}{n(n-1)} {\\sum }_{s\\ne t}k(s,t)$$\u003c/div\u003e \u003cdiv class=\"EquationNumber\"\u003e12\u003c/div\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eHere, n is the number of nodes in the graph, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$k(s,t)\$\u003c/span\u003e\u003c/span\u003e denotes the number of node-independent paths between nodes \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$s\$\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$t\$\u003c/span\u003e\u003c/span\u003e.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e 
\u003cb\u003eTransitivity\u003c/b\u003e: This is the global clustering coefficient of a network, that is, the fraction of connected triples of nodes that close into triangles. It indicates how likely nodes are to form triangle linkages or clusters. High transitivity indicates more communities or groupings in the graph, whereas low transitivity indicates more bridges or gaps (McMillan et al., \u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Vasques Filho \u0026amp; O'Neale, \u003cspan citationid=\"CR67\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). Eq.\u0026nbsp;\u003cspan refid=\"Equ13\" class=\"InternalRef\"\u003e13\u003c/span\u003e demonstrates the calculation of network transitivity.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003cdiv id=\"Equ13\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ13\" name=\"EquationSource\"\u003e\n$$\\text{Transitivity}= \\frac{3\\times \\text{Number of triangles in the network}}{\\text{Number of connected triples of nodes}}$$\u003c/div\u003e \u003cdiv class=\"EquationNumber\"\u003e13\u003c/div\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eAssortativity\u003c/b\u003e: This is the measure of the correlation between the degrees of nodes that are linked by an edge, reflecting the tendency of nodes to connect with others with similar or dissimilar degrees. High assortativity indicates a greater degree of homophily or similarity in the graph, whereas low assortativity indicates a higher level of heterogeneity or diversity (Al Musawi et al., \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; McMillan et al., \u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). 
The assortativity coefficient, commonly represented as \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$r\$\u003c/span\u003e\u003c/span\u003e, is calculated as follows:\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003cdiv id=\"Equ14\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ14\" name=\"EquationSource\"\u003e\n$$r= \\frac{{\\sum }_{i}{e}_{ii}-{\\sum }_{i}{a}_{i}{b}_{i}}{1-{\\sum }_{i}{a}_{i}{b}_{i}}$$\u003c/div\u003e \u003cdiv class=\"EquationNumber\"\u003e14\u003c/div\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eHere, the fraction of edges connecting nodes of degree \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$i\$\u003c/span\u003e\u003c/span\u003e to other nodes of degree \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$i\$\u003c/span\u003e\u003c/span\u003e is represented as \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\${e}_{ii}\$\u003c/span\u003e\u003c/span\u003e. The term \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\${a}_{i}\$\u003c/span\u003e\u003c/span\u003e represents the proportion of edges that are connected to nodes with a degree of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$i\$\u003c/span\u003e\u003c/span\u003e. 
The term \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\${b}_{i}\$\u003c/span\u003e\u003c/span\u003e denotes the proportion of edges that would connect to nodes of degree \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$i\$\u003c/span\u003e\u003c/span\u003e if the edges were assigned randomly across the graph.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003e2.6 OSN Characteristics\u003c/b\u003e: In OSNs, the characteristics of the network's structure are commonly described in terms of \"power law distribution,\" \"scale-free network,\" and \"small-world phenomena\" (Mislove et al., \u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e2007\u003c/span\u003e; Weber et al., \u003cspan citationid=\"CR72\" class=\"CitationRef\"\u003e2021\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003ePower Law Distribution\u003c/b\u003e: OSNs are widely believed to follow a power law distribution. 
This relates to the network's degree distribution, in which the degree of a node is the number of connections it maintains (Broido \u0026amp; Clauset, \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2019\u003c/span\u003e).\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003cdiv id=\"Equ15\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ15\" name=\"EquationSource\"\u003e\n$$P\\left(k\\right)\\sim {k}^{-\\gamma }$$\u003c/div\u003e \u003cdiv class=\"EquationNumber\"\u003e15\u003c/div\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003ewhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$P\\left(k\\right)\$\u003c/span\u003e\u003c/span\u003e represents the proportion of nodes in the network with \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$k\$\u003c/span\u003e\u003c/span\u003e connections; for high values of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$k\$\u003c/span\u003e\u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$P\\left(k\\right)\$\u003c/span\u003e\u003c/span\u003e adheres to a power law. \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\gamma\$\u003c/span\u003e\u003c/span\u003e is the exponent of the power law that characterises the degree distribution of the network.\u003c/p\u003e \u003cp\u003eThe network will have \u003cb\u003eScale-Free properties\u003c/b\u003e if \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\gamma\$\u003c/span\u003e\u003c/span\u003e lies in the range \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$2\u0026lt;\\gamma \u0026lt;3\$\u003c/span\u003e\u003c/span\u003e. 
The phrase \"scale-free\" indicates that the network lacks a specific scale or size.\u003c/p\u003e \u003cp\u003eTo evaluate whether a graph shows power law characteristics, we consider several quantities: the alpha value, which is the exponent in the power law distribution equation; Xmin, the lower bound of x; and the KS p-value, which is the result of the Kolmogorov-Smirnov test. The likelihood ratio (Power Law vs Exponential) compares the goodness of fit of the two models, and the Xmin process refers to the estimation of the optimal lower cutoff using a goodness-of-fit based approach (Bhattacharya et al., \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2020\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eSmall World Phenomena\u003c/b\u003e: This is a core issue in social networks, highlighting the prevalence of short paths in a graph where nodes represent individuals connected by links indicating mutual acquaintance. It refers to the probability that two randomly selected individuals from the population share a common friend, commonly known as \"six degrees of separation\" (Bhattacharya et al., \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). The Small World Phenomenon is usually quantified using two measures:\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eAverage Path Length\u003c/strong\u003e \u003cp\u003eThis represents the average number of steps along the shortest paths between every pair of nodes in a network. It quantifies how efficiently information or mass can move through a network. In small-world networks, the average path length usually scales in proportion to the logarithm of the network's node count (Neal, \u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). 
It is expressed as\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv id=\"Equ16\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ16\" name=\"EquationSource\"\u003e\n$$L\\sim \\log\\left(N\\right)$$\u003c/div\u003e \u003cdiv class=\"EquationNumber\"\u003e16\u003c/div\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eClustering coefficient\u003c/strong\u003e \u003cp\u003eThis quantity represents the mean value of the local clustering coefficients computed for every node in the network. The local clustering coefficient of a given node is defined as the ratio of the number of existing triangles involving that node to the total number of possible triangles. A larger clustering coefficient indicates a greater presence of clusters or communities within the network, while a smaller clustering coefficient indicates a higher occurrence of bridges or gaps (Kanavos et al., \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Piccardi, \u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Tantardini et al., \u003cspan citationid=\"CR65\" class=\"CitationRef\"\u003e2019\u003c/span\u003e).\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv id=\"Equ17\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ17\" name=\"EquationSource\"\u003e\n$$C=\\frac{3 \\times \\text{number of triangles}}{\\text{number of connected triples of nodes}}$$\u003c/div\u003e \u003cdiv class=\"EquationNumber\"\u003e17\u003c/div\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003e2.7 Predictive Analysis\u003c/b\u003e: Predictive analysis for OSNs applies algorithms and data to generate predictions about potential network interactions, behaviours, or trends. 
We utilised two commonly used approaches, link prediction and community discovery, to conduct a predictive analysis of OSNs.\u003c/p\u003e \u003cp\u003e \u003cstrong\u003eLink Prediction\u003c/strong\u003e \u003cp\u003eLink prediction in OSN analysis is essential for forecasting future user connections and understanding network growth patterns. By examining existing connections and network structure, this process identifies potential links between nodes, providing valuable insights into the network's evolution and the likelihood of new connections (Yilmaz et al., \u003cspan citationid=\"CR76\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Yuliansyah et al., \u003cspan citationid=\"CR77\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). A common approach to link prediction combines Node2Vec with a random forest classifier. Node2Vec first produces embeddings for the nodes of the graph. These embeddings capture the graph's structural characteristics and serve as feature vectors for the individual nodes. The random forest classifier is then trained on these embeddings to estimate the probability of an edge between two nodes in the network. This functionality improves our understanding of the dynamics and structure of the network (Yilmaz et al., \u003cspan citationid=\"CR76\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). Various metrics are available for assessing the link predictability of a graph. Commonly employed metrics include\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003ePrecision and recall\u003c/b\u003e quantify the balance between the quality and the quantity of the predicted links. 
Precision is the number of true positives divided by the total number of positive predictions (TP\u0026thinsp;+\u0026thinsp;FP), whereas recall is the number of true positives divided by the total number of actual positives (TP\u0026thinsp;+\u0026thinsp;FN) (Jia et al., \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2022\u003c/span\u003e).\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eF1-score\u003c/b\u003e measures the overall quality of the predicted connections by combining precision and recall. It is computed as the harmonic mean of precision and recall and ranges from 0 to 1, with 1 representing the highest quality and 0 representing the lowest (Gui, \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Jia et al., \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2022\u003c/span\u003e).\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eThe \u003cb\u003eAUC-ROC\u003c/b\u003e metric measures the model's ability to rank a randomly chosen positive instance (an existing link) above a randomly chosen negative instance (a non-existent link) (Gui, \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Yilmaz et al., \u003cspan citationid=\"CR76\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). The AUC-ROC values span from 0 to 1, with a higher value signifying better link prediction performance.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eCommunity Detection\u003c/b\u003e: The identification of communities is another fundamental task in the field of network analysis, with the objective of dividing a network into distinct sub-structures or communities (Xu et al., \u003cspan citationid=\"CR75\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Zhang et al., \u003cspan citationid=\"CR79\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). 
The Louvain algorithm, a well-established technique for community detection in networks, was employed in this study to detect communities within graph structures. We assessed the identified communities by employing three well-established criteria in the domain of network analysis and community detection: modularity, silhouette, and conductance. These metrics offered a thorough evaluation of the quality and structure of the identified communities (M. A. U. Hasan et al., \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Hromic \u0026amp; Hayes, \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Kim et al., \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Wang et al., \u003cspan citationid=\"CR71\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Zhao et al., \u003cspan citationid=\"CR80\" class=\"CitationRef\"\u003e2018\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eModularity\u003c/b\u003e evaluates the degree to which a network has been separated into communities. A higher modularity value signifies a more robust community structure (M. A. U. Hasan et al., \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Kim et al., \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e2022\u003c/span\u003e).\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eThe Silhouette Score\u003c/b\u003e is a metric used to assess the level of clarity and distinctiveness exhibited by clusters or communities within a specific clustering or community detection process. 
The Silhouette Score is a numerical value that varies between \u0026minus;\u0026thinsp;1 and 1, with a higher score indicating more distinct groupings (Wang et al., \u003cspan citationid=\"CR71\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Zhao et al., \u003cspan citationid=\"CR80\" class=\"CitationRef\"\u003e2018\u003c/span\u003e).\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eConductance\u003c/b\u003e compares the edges that connect a community to the rest of the network with the edges that lie within it. Lower conductance values indicate a better community structure (M. A. U. Hasan et al., \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Kim et al., \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e2022\u003c/span\u003e).\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e"},{"header":"Proposed methodology","content":"\u003cp\u003eIn this research, we introduced an approach for building a novel social network graph using information gathered from the Twitter network. The relationships among users in the social network were created based on their attribute similarities. The process, shown in Fig. \u003cspan\u003e2\u003c/span\u003e, is divided into five stages: Data Collection \u0026amp; Primary User Selection, Preprocessing, Attribute Selection \u0026amp; Weight Assign, Graph Generation, and Evaluation.\u003c/p\u003e\n\u003cp\u003e\u003cspan\u003e\u003cstrong\u003e3.1 Data Collection \u0026amp; Primary User Selection\u003c/strong\u003e: The initial step is to select primary users from the Twitter network. To do so, tweets were collected using a search query with specific keywords during a specified time window. Following the initial collection, a filtration process was carried out to categorise tweets based on the number of likes they received. 
The authors of the most popular tweets were chosen as the primary users. Subsequently, two datasets were systematically created: one comprising profile data of the primary users, and the other containing their most recent tweets and tweet metrics.\u003cbr\u003e\u003c/span\u003e\u003cspan\u003e\u003cstrong\u003e3.2 Preprocessing\u003c/strong\u003e: Collecting data straight from Twitter typically requires extracting information from a varied and unfiltered flow of content, which brings in irrelevant or undesired data. Various preprocessing methods are used to eliminate abnormalities and duplications, improving the quality and integrity of the data in the user and tweet datasets. Subsequently, the pre-processed datasets were merged to form a unified dataset. Numerical data were scaled to the range 0 to 1 using Min-Max normalisation.\u003cbr\u003e\u003c/span\u003e\u003cspan\u003e\u003cstrong\u003e3.3 Attribute Selection and Weight Assign\u003c/strong\u003e: Selecting attributes and assigning weights involves multiple steps. First, we transform the pre-processed data described in Section 3.2 into attributes. Afterward, we reviewed the existing literature to identify essential attributes such as spreadability, engagement, activity, growth, and impact, as well as their evaluation criteria, with a specific emphasis on their operational feasibility via the Twitter API. We used Principal Component Analysis (PCA) to rank these attributes based on their importance. Initially, we ensured uniform scaling across attributes with different units. 
Following this, the covariance matrix was calculated for the dataset.\u003cbr\u003e\u003c/span\u003e\u003c/p\u003e\n\u003cdiv id=\"Equ18\"\u003e\n \u003cdiv id=\"FileID_Equ18\" name=\"EquationSource\"\u003e$$\\text{Covariance matrix: } \\varSigma = \\frac{1}{n-1}{\\left(X-\\bar{X}\\right)}^{T}\\left(X-\\bar{X}\\right)$$\u003c/div\u003e\n \u003cdiv\u003e18\u003c/div\u003e\n\u003c/div\u003e\n\u003cp\u003ewhere \u003cspan\u003e\u003cspan\u003e\$n\$\u003c/span\u003e\u003c/span\u003e is the number of observations, \u003cspan\u003e\u003cspan\u003e\$X\$\u003c/span\u003e\u003c/span\u003e represents the data matrix, and \u003cspan\u003e\u003cspan\u003e\$\\bar{X}\$\u003c/span\u003e\u003c/span\u003e denotes the mean vector of the attributes. Subsequently, the eigenvalues and their associated eigenvectors were computed.\u003c/p\u003e\n\u003cdiv id=\"Equ19\"\u003e\n \u003cdiv id=\"FileID_Equ19\" name=\"EquationSource\"\u003e$$\\text{Eigenvalue equation: } \\varSigma v=\\lambda v$$\u003c/div\u003e\n \u003cdiv\u003e19\u003c/div\u003e\n\u003c/div\u003e\n\u003cp\u003ewhere \u0026lambda; is an eigenvalue and \u003cspan\u003e\u003cspan\u003e\$v\$\u003c/span\u003e\u003c/span\u003e is the corresponding eigenvector. The eigenvalues indicate how much of the variance in the data each principal component captures. We used the eigenvalues to decide how many principal components \u003cspan\u003e\u003cspan\u003e\$\\left(k\\right)\$\u003c/span\u003e\u003c/span\u003e to keep. The chosen components were used to transform the initial attribute space, producing a set of orthogonal features referred to as principal components. 
The transformed data matrix, \u003cspan\u003e\u003cspan\u003e\$Z\$\u003c/span\u003e\u003c/span\u003e, is derived by multiplying the original data matrix, \u003cspan\u003e\u003cspan\u003e\$X\$\u003c/span\u003e\u003c/span\u003e, by the matrix containing the first \u003cspan\u003e\u003cspan\u003e\$k\$\u003c/span\u003e\u003c/span\u003e eigenvectors, \u003cspan\u003e\u003cspan\u003e\${V}_{k}\$\u003c/span\u003e\u003c/span\u003e:\u003c/p\u003e\n\u003cdiv id=\"Equ20\"\u003e\n \u003cdiv id=\"FileID_Equ20\" name=\"EquationSource\"\u003e$$Z=X{V}_{k}$$\u003c/div\u003e\n \u003cdiv\u003e20\u003c/div\u003e\n\u003c/div\u003e\n\u003cp\u003eWe ranked the original attributes by analysing their loadings on the principal components. The loading of attribute \u003cspan\u003e\u003cspan\u003e\$j\$\u003c/span\u003e\u003c/span\u003e on principal component \u003cspan\u003e\u003cspan\u003e\$i\$\u003c/span\u003e\u003c/span\u003e is given by:\u003c/p\u003e\n\u003cdiv id=\"Equ21\"\u003e\n \u003cdiv id=\"FileID_Equ21\" name=\"EquationSource\"\u003e$${\\text{Loading}}_{ij}= {V}_{ij}$$\u003c/div\u003e\n \u003cdiv\u003e21\u003c/div\u003e\n\u003c/div\u003e\n\u003cp\u003eOnce the loadings are computed, the attribute with the highest loading receives rank 1, the attribute with the second-highest loading receives rank 2, and so on. The order of the ranks is then reversed, so that the most important attribute holds the largest rank value. A normalization procedure is then applied to obtain weights within the range of 0.1 to 1.0. This ensures that the weights are proportionally representative of the original attribute values, preserving data integrity and facilitating more efficient computational processes. 
To do so, we applied the following formula:\u003c/p\u003e\n\u003cdiv id=\"Equ22\"\u003e\n \u003cdiv id=\"FileID_Equ22\" name=\"EquationSource\"\u003e$${W}_{i}= 0.1+0.9\\times \\frac{{R}_{i}-\\min\\left(R\\right)}{\\max\\left(R\\right)-\\min\\left(R\\right)}$$\u003c/div\u003e\u003cdiv\u003e22\u003c/div\u003e\u003c/div\u003e\u003cp\u003ewhere \u003cspan\u003e\u003cspan\u003e\${W}_{i}\$\u003c/span\u003e\u003c/span\u003e is the weight of the \u003cspan\u003e\u003cspan\u003e\$i\$\u003c/span\u003e\u003c/span\u003e-th attribute, \u003cspan\u003e\u003cspan\u003e\${R}_{i}\$\u003c/span\u003e\u003c/span\u003e is the rank of the \u003cspan\u003e\u003cspan\u003e\$i\$\u003c/span\u003e\u003c/span\u003e-th attribute, and \u003cspan\u003e\u003cspan\u003e\$\\min\\left(R\\right)\$\u003c/span\u003e\u003c/span\u003e and \u003cspan\u003e\u003cspan\u003e\$\\max\\left(R\\right)\$\u003c/span\u003e\u003c/span\u003e are the minimum and maximum of the reversed ranks.\u003c/p\u003e\u003cp\u003e\u003cstrong\u003e3.4 Graph Construction Based on Weighted Similarity\u003c/strong\u003e: The next step of similarity graph construction is to identify the similarities between users based on their attributes. To calculate the similarity coefficient between users, we introduce a novel method called Attribute-Weighted Euclidean Distance (AWED). The significance of weighted features in OSN analysis is well acknowledged (Wang \u0026amp; Ma, \u003cspan\u003e2016\u003c/span\u003e). Feature weights have a substantial impact on similarity calculations. They regulate the impact or significance of each attribute in determining the overall similarity between users (Cheng \u0026amp; Yan, \u003cspan\u003e2023\u003c/span\u003e; Shantal et al., 2023). For instance, suppose the attribute \u0026ldquo;Number of followers\u0026rdquo; has a weight of 0.5, while \u0026ldquo;Number of tweets\u0026rdquo; has a weight of 0.2. 
A small difference in the number of followers (e.g., 100 followers) can then have a greater impact on the similarity calculation than a substantial difference in the number of tweets (e.g., 500 tweets), because the follower count carries the larger weight.\u003c/p\u003e\u003cp\u003eOur proposed method integrates PCA to assign weights to user attributes based on attribute importance. Suppose two users \u003cspan\u003e\u003cspan\u003e\${U}_{1}\$\u003c/span\u003e\u003c/span\u003e and \u003cspan\u003e\u003cspan\u003e\${U}_{2}\$\u003c/span\u003e\u003c/span\u003e have attributes \u003cspan\u003e\u003cspan\u003e\$A= \\left\\{{a}_{1}, {a}_{2}, \\dots , {a}_{m}\\right\\}\$\u003c/span\u003e\u003c/span\u003e, with weights \u003cspan\u003e\u003cspan\u003e\$W=\\left\\{{w}_{1}, {w}_{2}, \\dots , {w}_{m}\\right\\}\$\u003c/span\u003e\u003c/span\u003e assigned from PCA, and let the normalization factors be \u003cspan\u003e\u003cspan\u003e\$m{f}_{1}, m{f}_{2}, \\dots , m{f}_{m}\$\u003c/span\u003e\u003c/span\u003e. The AWED can then be calculated as:\u003c/p\u003e\u003cdiv id=\"Equ23\"\u003e\u003cdiv id=\"FileID_Equ23\" name=\"EquationSource\"\u003e$$AWED\\left({U}_{1},{U}_{2}\\right)= \\sqrt{{\\sum }_{i=1}^{m}{\\left(\\frac{{a}_{1i}}{m{f}_{1i}}- \\frac{{a}_{2i}}{m{f}_{2i}}\\right)}^{2}\\cdot {w}_{i}}$$\u003c/div\u003e\n \u003cdiv\u003e23\u003c/div\u003e\n\u003c/div\u003e\n\u003cp\u003eHere, \u003cspan\u003e\u003cspan\u003e\${a}_{1i}\$\u003c/span\u003e\u003c/span\u003e and \u003cspan\u003e\u003cspan\u003e\${a}_{2i}\$\u003c/span\u003e\u003c/span\u003e are the attribute values of users \u003cspan\u003e\u003cspan\u003e\${U}_{1}\$\u003c/span\u003e\u003c/span\u003e and \u003cspan\u003e\u003cspan\u003e\${U}_{2}\$\u003c/span\u003e\u003c/span\u003e, respectively. 
\u003cspan\u003e\u003cspan\u003e\${w}_{i}\$\u003c/span\u003e\u003c/span\u003e represents the importance of attribute \u003cspan\u003e\u003cspan\u003e\$i\$\u003c/span\u003e\u003c/span\u003e, and \u003cspan\u003e\u003cspan\u003e\$m{f}_{1i}\$\u003c/span\u003e\u003c/span\u003e and \u003cspan\u003e\u003cspan\u003e\$m{f}_{2i}\$\u003c/span\u003e\u003c/span\u003e are the normalisation factors for attribute \u003cspan\u003e\u003cspan\u003e\$i\$\u003c/span\u003e\u003c/span\u003e for users \u003cspan\u003e\u003cspan\u003e\${U}_{1}\$\u003c/span\u003e\u003c/span\u003e and \u003cspan\u003e\u003cspan\u003e\${U}_{2}\$\u003c/span\u003e\u003c/span\u003e, respectively. \u003cstrong\u003eAlgorithm 1\u003c/strong\u003e presents the pair-wise similarity calculation as pseudo-code.\u003c/p\u003e\n\u003cdiv\u003e\n \u003ctable id=\"Tabb\" border=\"1\"\u003e\n \u003ccolgroup cols=\"3\"\u003e\u003c/colgroup\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\" colspan=\"2\"\u003e\n \u003cp\u003eAlgorithm 1: Calculating the Pair-wise User AWED Similarity Coefficient\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colspan=\"1\"\u003e\u0026nbsp;\u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colspan=\"2\"\u003e\n \u003cp\u003e\u003cstrong\u003eInputs\u003c/strong\u003e: Dataset with users and user attributes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colspan=\"1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colspan=\"2\"\u003e\n \u003cp\u003eattributeColumn: columns containing attribute values\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colspan=\"1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colspan=\"2\"\u003e\n \u003cp\u003enormalizationFactorColumn: columns containing normalization factor values\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colspan=\"1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n 
\u003ctr\u003e\n \u003ctd align=\"left\" colspan=\"2\"\u003e\n \u003cp\u003erankColumn: feature importance ranks based on PCA (in reversed order)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colspan=\"1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colspan=\"2\"\u003e\n \u003cp\u003e\u003cstrong\u003eOutput\u003c/strong\u003e: Pairwise user similarity coefficients\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colspan=\"1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colspan=\"2\"\u003e\n \u003cp\u003eattributes = dataset.drop(columns=[\u0026apos;User_Id\u0026apos;])\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colspan=\"1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colspan=\"2\"\u003e\n \u003cp\u003enormalizationFactors = dataset[normalizationFactorColumn]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colspan=\"1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colspan=\"2\"\u003e\n \u003cp\u003eattributeImportanceRanks = dataset[rankColumn]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colspan=\"1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colspan=\"2\"\u003e\n \u003cp\u003emin_rank = min(attributeImportanceRanks)\u003c/p\u003e\n \u003cp\u003emax_rank = max(attributeImportanceRanks)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colspan=\"1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n 
\u003ctr\u003e\n \u003ctd align=\"left\" colspan=\"2\"\u003e\n \u003cp\u003e# Weight assignment: rescale the reversed ranks to the range 0.1 to 1.0 (Eq. 22)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colspan=\"1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colspan=\"2\"\u003e\n \u003cp\u003edef calculate_weight(rank, min_rank, max_rank):\u003c/p\u003e\n \u003cp\u003e\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;weight = ((rank - min_rank) / (max_rank - min_rank)) * 0.9 + 0.1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colspan=\"1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colspan=\"2\"\u003e\n \u003cp\u003e\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;return weight\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colspan=\"1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colspan=\"2\"\u003e\n \u003cp\u003e# One weight per attribute; assumes the ranks are listed in the same order as the attribute columns\u003c/p\u003e\n \u003cp\u003eweights = {attribute: calculate_weight(rank, min_rank, max_rank) for attribute, rank in zip(attributes, attributeImportanceRanks)}\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colspan=\"1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colspan=\"2\"\u003e\n \u003cp\u003e# Similarity calculation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colspan=\"1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n 
\u003ctr\u003e\n \u003ctd align=\"left\" colspan=\"2\"\u003e\n \u003cp\u003edef AWED(User1, User2, weights, normalizationFactors):\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colspan=\"1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colspan=\"2\"\u003e\n \u003cp\u003e\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;weighted_sum_squared = 0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colspan=\"1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colspan=\"2\"\u003e\n \u003cp\u003e\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;for attribute in attributes:\u003c/p\u003e\n \u003cp\u003e\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;weight = weights[attribute]\u003c/p\u003e\n \u003cp\u003e\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;normalized_value1 = User1[attribute] / normalizationFactors[attribute]\u003c/p\u003e\n \u003cp\u003e\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;normalized_value2 = User2[attribute] / normalizationFactors[attribute]\u003c/p\u003e\n \u003cp\u003e\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;difference = abs(normalized_value1 - normalized_value2)\u003c/p\u003e\n \u003cp\u003e\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;# Weight applied to the squared difference, as in Eq. 23\u003c/p\u003e\n \u003cp\u003e\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;weighted_sum_squared += weight * difference ** 2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colspan=\"1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n 
\u003ctr\u003e\n \u003ctd align=\"left\" colspan=\"2\"\u003e\n \u003cp\u003e\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;similarity_coefficient = weighted_sum_squared ** 0.5\u003c/p\u003e\n \u003cp\u003e\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;return similarity_coefficient\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colspan=\"1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colspan=\"2\"\u003e\n \u003cp\u003esimilarity_df = []\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colspan=\"1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colspan=\"2\"\u003e\n \u003cp\u003efor i in range(len(dataset)):\u003c/p\u003e\n \u003cp\u003e\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;for j in range(i + 1, len(dataset)):\u003c/p\u003e\n \u003cp\u003e\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;User1 = dataset.iloc[i, :].to_dict()\u003c/p\u003e\n \u003cp\u003e\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;User2 = dataset.iloc[j, :].to_dict()\u003c/p\u003e\n \u003cp\u003e\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;similarity = AWED(User1, User2, weights, normalizationFactors)\u003c/p\u003e\n \u003cp\u003e\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;similarity_df.append({\u0026apos;user1\u0026apos;: User1, \u0026apos;user2\u0026apos;: User2, \u0026apos;AWED\u0026apos;: similarity})\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colspan=\"1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n 
\u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003e\u003cstrong\u003eSimilarity Graph Construction\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAfter computing the users\u0026rsquo; pairwise similarities from their weighted attributes, we analysed the distribution of the similarity values and, based on the resulting percentage distribution of relationships, determined the threshold value. This threshold is the smallest measure of similarity necessary for two users to be connected in the graph.\u003c/p\u003e\n\u003cp\u003eTo generate a graph that depicts user relationships according to a similarity threshold, an empty graph \u003cspan\u003e\u003cspan\u003e\$G\$\u003c/span\u003e\u003c/span\u003e is generated and initialised as \u003cspan\u003e\u003cspan\u003e\$G=(V,E)\$\u003c/span\u003e\u003c/span\u003e, where \u003cspan\u003e\u003cspan\u003e\$V\$\u003c/span\u003e\u003c/span\u003e represents the set of user nodes and \u003cspan\u003e\u003cspan\u003e\$E\$\u003c/span\u003e\u003c/span\u003e represents the set of edges, which is initially empty. At the outset, there are no connections established among users in the graph. Following this, we iterate over the similarity matrix based on \u003cspan\u003e\u003cspan\u003e\$AWED\$\u003c/span\u003e\u003c/span\u003e, which comprises pairwise similarity coefficients among users. 
If the similarity coefficient \u003cspan\u003e\u003cspan\u003e\${AWED}_{ij}\$\u003c/span\u003e\u003c/span\u003e between users \u003cspan\u003e\u003cspan\u003e\$i\$\u003c/span\u003e\u003c/span\u003e and \u003cspan\u003e\u003cspan\u003e\$j\$\u003c/span\u003e\u003c/span\u003e is greater than the predefined threshold \u003cspan\u003e\u003cspan\u003e\$T\$\u003c/span\u003e\u003c/span\u003e, an edge \u003cspan\u003e\u003cspan\u003e\$(i,j)\$\u003c/span\u003e\u003c/span\u003e is added to the graph with an edge weight equal to the similarity coefficient.\u003c/p\u003e\n\u003cdiv id=\"Equ24\"\u003e\n \u003cdiv id=\"FileID_Equ24\" name=\"EquationSource\"\u003e$$E=\\left\\{\\left(i, j,{AWED}_{ij}\\right)|{AWED}_{ij}\u0026gt;T\\right\\}$$\u003c/div\u003e\n \u003cdiv\u003e24\u003c/div\u003e\n\u003c/div\u003e\n\u003cp\u003e\u003cstrong\u003e3.5 Evaluation\u003c/strong\u003e: The evaluation phase of our proposed methodology compares the AWED graph with synthetic graphs. The objective is to show that the proposed graph displays more realistic characteristics than the synthetic alternatives. The comparison is based on several criteria, including structural properties and characteristics of OSNs such as the power-law distribution and the small-world phenomenon. Prediction analyses are also conducted to evaluate the effectiveness of the graph in prediction tasks. Together, these evaluations aim to determine the authenticity and usefulness of the AWED graph for social network analysis.\u003c/p\u003e"},{"header":"Experimental Validation","content":"\u003cp\u003eIn this study, we conduct a two-fold experiment: 1) we construct a social network based on user similarity, and 2) we evaluate the performance of the network by comparing it with three synthetic graphs that have different characteristics. We employ network properties, OSN characteristics, and predictive analysis as the metrics for comparison. 
This section begins with a description of the collected datasets before moving on to the actual results.\u003c/p\u003e \u003cp\u003e \u003cb\u003e4.1 Datasets\u003c/b\u003e: For this study, we followed the methodology described in Section 3.1 and generated three datasets based on tweets that correspond to three different topics: War and Conflict, Environment and Global Warming, and Racism and Hate Speech. Each dataset contains the information of the primary users, their tweets, and the tweets\u0026rsquo; information. We deliberately selected distinct primary users, timestamps, and durations for each dataset. The datasets are:\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eSummary of Twitter datasets\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDatasets\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDuration\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eTimestamps\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e 
\u003cp\u003eTotal Tweets\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eTotal User Information\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eFinal user # (after preprocessing)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eWar \u0026amp; Conflict\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e52 days\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e17\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e14070 tweets\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e2521 rows of users\u0026rsquo; information\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEnvironment \u0026amp; Global Warming\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e8 days\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e57355 tweets\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e1998 rows of users\u0026rsquo; information\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e245\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRacism \u0026amp; Hate speech\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e12 days\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e6\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c4\"\u003e \u003cp\u003e51442 tweets\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e2986 rows of users\u0026rsquo; information\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e464\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003e4.2 Attribute Selections\u003c/b\u003e: One of the main objectives of this study is to construct an Online Social Network (OSN) from real-time data that is readily accessible via the Twitter API. The attributes selected for this study, as discussed in Section 3.2, are those most frequently addressed in the literature and were chosen for their ease of collection. Not all features carry equal significance, so Principal Component Analysis (PCA) was employed to identify the features of greatest importance.\u003c/p\u003e \u003cp\u003eTo decide how many Principal Components (PCs) to use for the final ranking, we retained the number of PCs that together explain a substantial portion of the variance in the data; the first four PCs together explain at least 95% of the variance in the datasets. The accompanying figure shows the loading vectors of the attributes for the first four PCs.\u003c/p\u003e \u003cp\u003eEach PC in PCA is a linear combination of the original attributes, with the loadings denoting the weights assigned to each original feature in that combination. The sum of a feature's absolute loadings across the retained PCs gives an estimate of that attribute's overall contribution to the variance accounted for by the PCs (Zhou et al., \u003cspan citationid=\"CR81\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). 
This provides an overall measure of the importance of each attribute in the dataset. Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e shows the overall contribution of each attribute to the variance explained by the PCs. The attribute \"Following Count\" has the highest sum of absolute loadings (1.73), suggesting that it contributes most to the variance captured by the retained principal components. Next are \"Engagement\" (1.71) and \"Followers Count\" (1.55), which also have large sums of absolute loadings. Conversely, \"Impact,\" \"Tweets Count,\" \"Growth,\" and \"Active\" have lower sums (all below 1.0), indicating that they are less significant in explaining the variance.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eSum of Absolute Loadings and Rank\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAttributes\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSum of Absolute Loadings\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eRank\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFollowing Count\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1.729724\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEngagement\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1.712006\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFollowers Count\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1.552863\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eImpact\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.972462\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e4\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTweets Count\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.901511\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGrowth\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.832458\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e6\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eActive\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.831984\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e7\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e 
\u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003col\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003e4.3 Similarity coefficient between users\u003c/b\u003e: The similarity coefficient between two users, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\${U}_{1}\$\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\${U}_{2}\$\u003c/span\u003e\u003c/span\u003e, is calculated using their attribute values and ranks. Suppose there are three attributes, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\${U}_{1}\$\u003c/span\u003e\u003c/span\u003e has attribute values \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$({a}_{11}=10, {a}_{12}=5,{a}_{13}=8)\$\u003c/span\u003e\u003c/span\u003e and normalization factors \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\left(n{f}_{11}=2,n{f}_{12}=1,n{f}_{13}=4\\right)\$\u003c/span\u003e\u003c/span\u003e. Similarly, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\${U}_{2}\$\u003c/span\u003e\u003c/span\u003e has attribute values \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$({a}_{21}=8, {a}_{22}=4,{a}_{23}=7)\$\u003c/span\u003e\u003c/span\u003e and normalization factors \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\left(n{f}_{21}=3,n{f}_{22}=1.5,n{f}_{23}=5 \\right)\$\u003c/span\u003e\u003c/span\u003e. The importance ranks of these attributes obtained from PCA, which serve as the weights, are \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$({r}_{1}=0.5, {r}_{2}=0.3, {r}_{3}=0.2)\$\u003c/span\u003e\u003c/span\u003e. 
With the attribute values and weights in place, the final similarity between \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\${U}_{1}\$\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\${U}_{2}\$\u003c/span\u003e\u003c/span\u003e is computed as follows:\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"No\" id=\"Tabc\" border=\"1\"\u003e \u003ccolgroup cols=\"2\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$AWED\\left({U}_{1},{U}_{2}\\right)= \\sqrt{{\\left(\\frac{10}{2}-\\frac{8}{3}\\right)}^{2}\\bullet 0.5+{\\left(\\frac{5}{1}-\\frac{4}{1.5}\\right)}^{2}\\bullet 0.3+{\\left(\\frac{8}{4}-\\frac{7}{5}\\right)}^{2}\\bullet 0.2 }\$\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003e(25)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$AWED\\left({U}_{1},{U}_{2}\\right)=\\sqrt{{\\left(5-2.67\\right)}^{2}\\bullet 0.5+{\\left(5-2.67\\right)}^{2}\\bullet 0.3+{\\left(2-1.4\\right)}^{2}\\bullet 0.2}\$\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e(26)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$AWED\\left({U}_{1},{U}_{2}\\right)\\approx \\sqrt{2.722+1.633+0.072}\\approx \\sqrt{4.43}\\approx 2.10\$\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e(27)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003col\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003e4.4 Threshold determination\u003c/b\u003e: Once the similarity coefficients are calculated from the attribute values and ranks, the next step is to choose a threshold value that decides whether a link should be established between two users. To determine a suitable threshold, we analyse the distribution of similarities using the descriptive statistics of the similarity coefficients, displayed in Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e. The table reports the minimum, maximum, mean, standard deviation, and percentiles of the similarity distribution. The mean value indicates the average similarity coefficient, whereas the standard deviation quantifies the dispersion of the similarity coefficients. 
In this study, we use the mean similarity coefficient (0.931 in Table 3) as the threshold for connecting two users.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eDistribution of Similarity Coefficients\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"2\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSimilarity Coefficient\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMin\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.422809601\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ep1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.500131607\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ep5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.714326859\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ep10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.811151505\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e 
\u003cp\u003ep25\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.925210953\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ep50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.970602036\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ep75\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.987081528\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ep90\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.992170334\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ep95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.993883133\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ep99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.996419907\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ep100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.998277664\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003emax\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.998277664\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003emean\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.931131892\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" 
colname=\"c1\"\u003e \u003cp\u003estdDev\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.096937947\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003e4.5 Graph Generation\u003c/b\u003e: Once the AWED coefficients among users were calculated and the threshold value was established, we constructed the network. We used the AWED method to generate a similarity matrix for the primary users, containing their pairwise similarity coefficients. The final graph was created by adding an edge for each pair of users whose AWED coefficient exceeds the selected threshold, and each edge was weighted by that coefficient. Figure\u0026nbsp;4 shows the graph creation process with threshold \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$T=0.931\$\u003c/span\u003e\u003c/span\u003e.\u003c/p\u003e "},{"header":"Results and Discussions","content":"\u003cp\u003eWe constructed similarity graphs for three distinct datasets. Each of these graphs was subsequently compared with three categories of synthetic graphs, namely, Erd\u0026#337;s-R\u0026eacute;nyi (ER), Barab\u0026aacute;si-Albert (BA), and Stochastic Block Model (SBM). To ensure a fair evaluation, we kept the number of nodes in each synthetic graph equivalent to that of the corresponding dataset. 
Table\u0026nbsp;\u003cspan\u003e4\u003c/span\u003e presents the number of nodes and edges in the AWED graph of each dataset, along with those of the corresponding synthetic graphs.\u003c/p\u003e\n\u003cdiv\u003e\n \u003ctable id=\"Tab4\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 4\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eNumber of Nodes and Edges of Each Graph\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003ccolgroup cols=\"6\"\u003e\u003c/colgroup\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eDatasets\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eProperties\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eAWED Similarity\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eER\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eBA\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eSBM\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" rowspan=\"2\"\u003e\n \u003cp\u003eWar \u0026amp; Conflict (W \u0026amp; C)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eNodes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e64\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e64\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e64\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e64\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eEdges\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e197\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e427\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
align=\"left\"\u003e\n \u003cp\u003e183\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e269\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" rowspan=\"2\"\u003e\n \u003cp\u003eEnvironment \u0026amp; Global Warming (E \u0026amp; GW)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eNodes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e245\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e245\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e245\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e244\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eEdges\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2944\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e7544\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e6450\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e8074\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" rowspan=\"2\"\u003e\n \u003cp\u003eHate Speech \u0026amp; Racism (HS \u0026amp; R)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eNodes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e449\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e449\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e449\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e449\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eEdges\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e12368\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e25420\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e12570\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e21237\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003e\u003cstrong\u003e5.1 Network properties\u003c/strong\u003e: The structural properties of a network are fundamental characteristics that offer valuable insights into its topology, connectivity, and overall structure. As presented in Table\u0026nbsp;\u003cspan\u003e5\u003c/span\u003e, the graphs in this study are evaluated using transitivity, assortativity, average clustering, edge density, average node connectivity, average degree, and total triangles.\u003c/p\u003e\n\u003cdiv\u003e\n \u003ctable id=\"Tab5\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 5\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eNetwork properties of graphs\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003ccolgroup cols=\"9\"\u003e\u003c/colgroup\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eDatasets\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eGraphs\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eTransitivity\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eAssortativity\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eAverage Clustering\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eEdge Density\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eAverage Node Connectivity\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eAverage Degree\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eTotal 
Triangles\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" rowspan=\"4\"\u003e\n \u003cp\u003eW \u0026amp; C\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAWED\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.762\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.514\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.521\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.175\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e5.073\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e11.031\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e4866\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eBA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.177\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e-0.099\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.194\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.119\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e5.229\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e7.5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e396\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eER\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.209\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n 
\u003cp\u003e0.036\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.207\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.216\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e11.828\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e13.625\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1218\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSBM\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.235\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e-0.063\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.251\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.182\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e9.557\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e11.302\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e930\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" rowspan=\"4\"\u003e\n \u003cp\u003eE \u0026amp; GW\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAWED\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.853\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.661\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.713\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.141\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
align=\"left\"\u003e\n \u003cp\u003e13.287\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e34.343\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e254331\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eBA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.0704\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e-0.091\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.092\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.032\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e5.117\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e7.869\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e858\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eER\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.202\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e-0.019\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.202\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.201\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e45.786\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e49.102\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e59289\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSBM\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n 
\u003cp\u003e0.213\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e-0.002\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.214\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.173\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e39.052\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e42.147\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e45804\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" rowspan=\"4\"\u003e\n \u003cp\u003eHS \u0026amp; R\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAWED\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.762\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.371\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.399\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.112\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e18.391\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e50.316\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e1006215\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eBA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.045\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e-0.074\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.066\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.018\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd align=\"left\"\u003e\n \u003cp\u003e5.103\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e7.929\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1224\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eER\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.199\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e-0.010\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.200\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.200\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e84.916\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e89.621\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e359559\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSBM\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.216\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.217\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.174\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e73.310\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e77.977\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e295041\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003eThe proposed AWED Similarity graph demonstrates significant advantages in capturing the complex structural 
characteristics of OSNs compared to synthetically constructed graphs. This is evident in the network properties across the three datasets. In each dataset, the AWED Similarity graph consistently exhibits higher transitivity, assortativity, and average clustering coefficients than its synthetic counterparts (BA, ER, and SBM). This indicates that the AWED Similarity graph fosters local connectivity, supports the formation of tightly knit clusters, and allows assortative interactions among nodes that share similar characteristics. Moreover, the moderate edge density of the AWED Similarity graph indicates a well-balanced representation of OSN connections.\u003c/p\u003e\n\u003cp\u003eThe AWED Similarity graph\u0026apos;s significantly higher triangle count highlights its ability to capture complex triadic relationships within the network structure. This capacity is essential to understanding how cohesive communities and linked subgroups emerge in OSNs. However, the low average node connectivity and average degree of the AWED Similarity graph indicate a network where nodes tend to form distinct communities with strong internal connections but relatively few connections to nodes outside their immediate clusters. This suggests that the AWED graph provides a more precise representation of the dynamic nature of social ties in real-world OSNs than synthetic graphs do.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e5.2 OSN Characteristics\u003c/strong\u003e: Because power-law distributions, scale-free characteristics and small-world features are observed in real-world OSNs, we also compare our proposed AWED Similarity graphs with BA, ER and SBM graphs on these properties. Table\u0026nbsp;\u003cspan\u003e6\u003c/span\u003e displays the power-law characteristics of the graphs, which are characterised by metrics including Alpha, Xmin, the Kolmogorov-Smirnov p-value, and the likelihood ratio. 
The combination of these metrics provides a thorough comprehension of the power-law distribution and scale-free characteristics.\u003c/p\u003e\n\u003cdiv\u003e\n \u003ctable id=\"Tab6\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 6\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003ePower Law Distribution\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003ccolgroup cols=\"6\"\u003e\u003c/colgroup\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eDatasets\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eGraphs\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eAlpha\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eXmin\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eKS p-value\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eLikelihood ratio (power law vs. exponential)\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" rowspan=\"4\"\u003e\n \u003cp\u003eW \u0026amp; C\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAWED Similarity\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e24.165\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e23.0\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.160\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e(2.202, 0.028)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eBA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e3.049\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
align=\"left\"\u003e\n \u003cp\u003e4.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.085\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e(1.975, 0.048)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eER\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e6.381\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e12.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.140\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e(-0.374, 0.708)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSBM\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e14.528\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e14.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.169\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e(0.568, 0.569)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" rowspan=\"4\"\u003e\n \u003cp\u003eE \u0026amp; GW\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAWED Similarity\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e21.660\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e75.0\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.122\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e(2.767, 0.005)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eBA\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e3.107\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e4.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.059\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e(\u003cstrong\u003e6.552, 5.669\u003c/strong\u003e)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eER\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e9.854\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e47.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.131\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e(-4.895, 9.801)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSBM\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e20.049\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e47.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.126\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e(-0.610, 0.541)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" rowspan=\"4\"\u003e\n \u003cp\u003eHS \u0026amp; R\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAWED Similarity\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e26.361\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e157.0\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.095\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e(0.307, 
0.758)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eBA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e3.197\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e9.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.066\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e(\u003cstrong\u003e2.540, 0.011\u003c/strong\u003e)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eER\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e23.388\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e101.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.073\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e(0.547, 0.584)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSBM\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e17.823\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e81.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.078\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e(-2.012, 0.044)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003eA network is considered scale-free when it exhibits a heavy-tailed degree distribution (represented by a high alpha), a wide range of node degrees (represented by a high Xmin), and a likelihood ratio that supports the fit of a power-law distribution (Broido \u0026amp; Clauset, \u003cspan\u003e2019\u003c/span\u003e; Roux et al., \u003cspan\u003e2023\u003c/span\u003e). 
The AWED Similarity graph consistently shows higher alpha values than the synthetic graphs across all three datasets, which we interpret as a heavier-tailed distribution indicative of a network structure containing influential nodes. It also has higher Xmin values and positive likelihood ratios in every dataset, although the ratio for the HS \u0026amp; R dataset is small (0.307, 0.758). Taken together, these results suggest that the AWED Similarity graphs exhibit scale-free characteristics and adhere to a power-law distribution.\u003c/p\u003e\n\u003cp\u003eThe Barab\u0026aacute;si-Albert (BA) graph also exhibits positive likelihood ratios across all datasets, consistent with its established reputation as a robust model for generating scale-free networks. However, the AWED Similarity graph\u0026rsquo;s higher alpha and Xmin values imply a better representation of the heavy-tailed connectivity patterns characteristic of real-world social networks. This suggests that the AWED Similarity graph may offer a more accurate model for studying the complex dynamics of online social networks.\u003c/p\u003e\n\u003cp\u003eA small-world network is characterized by a high clustering coefficient and a short average path length. Table\u0026nbsp;\u003cspan\u003e7\u003c/span\u003e provides the clustering coefficient and average path length of the graphs in all three datasets. The ER and SBM graphs consistently display small-world properties across all datasets. The BA graph, in contrast, consistently exhibits the lowest clustering coefficient, indicating a lower tendency for nodes to form local clusters. The AWED Similarity graph also displays small-world characteristics in all three datasets. 
This is supported by its high clustering coefficients and its short average path lengths (between 2.6 and 4.6), which are nonetheless longer than those of the synthetic graphs. This suggests that local connectivity and community formation are prioritized over the efficiency of global information flow in the AWED Similarity graph. This characteristic is representative of real-world social networks, where users often engage in conversations within their immediate communities while also maintaining connections with users from other communities. Consequently, the AWED Similarity graph potentially provides a more precise depiction of the dynamics intrinsic to real-world social networks.\u003c/p\u003e\n\u003cdiv\u003e\n \u003ctable id=\"Tab7\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 7\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eSmall World Phenomenon\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003ccolgroup cols=\"4\"\u003e\u003c/colgroup\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eDatasets\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eGraphs\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eClustering coefficient\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eAverage path length\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" rowspan=\"4\"\u003e\n \u003cp\u003eW \u0026amp; C\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAWED Similarity\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.521\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2.647\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n 
\u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eBA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.194\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2.218\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eER\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.207\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e1.828\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSBM\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.252\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1.972\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" rowspan=\"4\"\u003e\n \u003cp\u003eE \u0026amp; GW\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAWED Similarity\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.717\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e4.593\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eBA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.089\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2.721\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eER\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.202\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e1.797\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd 
align=\"left\"\u003e\n \u003cp\u003eSBM\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.214\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1.828\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" rowspan=\"4\"\u003e\n \u003cp\u003eHS \u0026amp; R\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAWED Similarity\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.399\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e3.003\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eBA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.057\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2.923\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eER\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.200\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e1.800\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSBM\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.217\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1.825\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003e\u003cstrong\u003e5.3 Predictive Analysis\u003c/strong\u003e: The goal of the predictive analysis is to assess how each graph performs when the same predictive model is applied to it. As suggested by Kumari et al. (\u003cspan\u003e2022\u003c/span\u003e) and Pulipati et al. 
(\u003cspan\u003e2021\u003c/span\u003e), graphs that perform better in predictive tasks may have a clear and consistent structure. Therefore, for predictive analysis, we performed both link prediction and community detection.\u003c/p\u003e\n\u003cp\u003eIn this study, Node2Vec embeddings with a Random Forest classifier were used to perform a link prediction task on the AWED Similarity graph and the synthetic graphs. The model\u0026rsquo;s parameter was set to 10 to ensure uniformity across all graphs. Figure\u0026nbsp;5 shows the accuracy, precision, recall, F1-score and AUC-ROC of all the graphs in all three datasets.\u003c/p\u003e\n\u003cdiv\u003e\n\u003c/div\u003e\n\u003cp\u003eThe results are consistent across all three datasets: the AWED Similarity graphs clearly surpass the BA, ER, and SBM graphs on accuracy, precision, recall, F1-score, and AUC-ROC. This supports the claim that links in AWED Similarity graphs can be predicted more accurately, completely, and reliably than links in their synthetic counterparts. Furthermore, it suggests that AWED Similarity graphs are better able to represent the intricate and dynamic interconnections among nodes in real-world networks, which in turn supports a broad range of network analysis tasks.\u003c/p\u003e\n\u003cp\u003eWe also used community detection to compare the graphs, as it can provide valuable insights into the structure and organization of a graph. The Louvain algorithm was employed to detect communities, and the quality of the detected communities was assessed using modularity, silhouette score, and conductance. Figure\u0026nbsp;6 shows the detected communities within each graph for the respective datasets. 
Table\u0026nbsp;\u003cspan\u003e8\u003c/span\u003e shows the evaluation metrics of the detected communities.\u003c/p\u003e\n\u003cdiv\u003e\n\u003c/div\u003e\n\u003cp\u003eThe results in Table\u0026nbsp;\u003cspan\u003e8\u003c/span\u003e show a consistent pattern across all three datasets. Modularity evaluates the strength of a network\u0026apos;s division into communities, with higher values indicating a more pronounced community organization. The SBM and BA graphs exhibit the highest modularity, whilst the AWED Similarity graphs display a moderate level of modularity throughout the datasets. Conversely, the ER graphs display the lowest modularity scores.\u003c/p\u003e\n\u003cp\u003eThe silhouette score measures how similar a node is to its own cluster compared with other clusters; higher values are better. The AWED Similarity graph is the only graph with a positive silhouette score in every dataset (0.044 to 0.179), indicating more coherent communities than in the synthetic graphs, whose scores are all negative. Conductance evaluates the quality of a community by comparing the number of edges within the community to the number of edges between different communities; lower values indicate better-separated communities. 
The minimal conductance of the AWED Similarity graph suggests that this network exhibits a higher level of internal interconnectedness among its communities, while having fewer connections between communities, in contrast to the other graph models.\u003c/p\u003e\n\u003cdiv\u003e\n \u003ctable id=\"Tab8\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 8\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eCommunity Detection Evaluation\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003ccolgroup cols=\"6\"\u003e\u003c/colgroup\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eDatasets\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eGraphs\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eModularity\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eSilhouette Score\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eConductance\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eNumber of Communities\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" rowspan=\"4\"\u003e\n \u003cp\u003eW\u0026amp; C\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAWED Similarity\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.248\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.044\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.137\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e4\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eBA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
align=\"left\"\u003e\n \u003cp\u003e0.284\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e-0.503\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.258\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e6\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eER\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.186\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e-0.213\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.302\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e6\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSBM\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.315\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e-0.114\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.216\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e4\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" rowspan=\"4\"\u003e\n \u003cp\u003eE \u0026amp; GW\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAWED Similarity\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.243\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.179\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.102\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e4\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n 
\u003cp\u003eBA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.311\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e-0.436\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.276\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e12\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eER\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.107\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e-0.169\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.380\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e7\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSBM\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.313\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e-0.051\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.218\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e4\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" rowspan=\"4\"\u003e\n \u003cp\u003eHS \u0026amp; R\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAWED Similarity\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.216\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.151\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.170\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e5\u003c/p\u003e\n \u003c/td\u003e\n 
\u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eBA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.315\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e-0.326\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.285\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e10\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eER\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.079\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e-0.107\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.397\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e7\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSBM\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.320\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e-0.057\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.215\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e4\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003eThe AWED graph has a moderate level of modularity, indicating that its community structure possesses a certain degree of flexibility and interconnection, while avoiding an excessive degree of rigidity or fragmentation. This observation suggests the presence of a complex network structure. In addition, AWED demonstrates a harmonious community structure, emphasized by its positive silhouette score and minimal conductance. 
The positive silhouette score indicates a cohesive internal structure within the identified communities, while the low conductance indicates that few edges lead out of each community. Overall, the metrics indicate that the AWED graph has a meaningful, well-structured community organization that balances internal consistency with differentiation from other areas of the graph.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eThe difficulty of extracting meaningful network structures from real-time Twitter data motivated us to investigate alternative approaches in this study. We presented a novel model for generating a user-attribute-based similarity graph. The model uses publicly available Twitter data to connect OSN users based on the similarity of their attributes. These attributes can be derived from the Twitter platform in real time through the Twitter API. To measure the similarity between users, we introduced a novel measure termed Attribute-Weighted Euclidean Distance (AWED). To evaluate the method, we compared the proposed graphs with synthetically generated graphs in terms of network properties, OSN characteristics, and predictive analyses. The AWED Similarity graph outperforms the synthetic graphs in local connectivity, cluster formation, and assortative interactions. It displays scale-free characteristics with significant nodes and robust community structures. In link prediction, AWED surpasses the synthetic graphs in accuracy, precision, recall, F1-score, and AUC-ROC. Moreover, it balances well-defined communities with inter-community connections, and its community structure is more clearly defined than that of the synthetic graphs. 
These findings indicate that the AWED graph depicts the dynamic characteristics of social connections in real-world OSNs more precisely than synthetic graphs do, which supports a wide range of network analysis tasks. Future work will focus on implementing OSN applications on the generated graph, with the aim of enabling real-time detection and analysis of real-world events.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eMd Ahsan Ul Hasan: Conceptualization, Methodology, Investigation, Data curation, Writing \u0026ndash; Original Draft. Azuraliza Abu Bakar: Conceptualization, Funding acquisition, Methodology, Supervision, Project administration, Writing \u0026ndash; Reviewing and Editing. Mohd Ridzwan Yaakub: Conceptualization, Methodology, Supervision, Resources, Validation, Writing \u0026ndash; Reviewing and Editing.\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eAcknowledgement\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis research is supported by the Fundamental Research Grant Scheme (FRGS/1/2020/ICT02/UKM/01/2) of the Ministry of Higher Education Malaysia.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eAgrawal, G., Kaur, A., \u0026amp; Myneni, S. (2024). A Review of Generative Models in Generating Synthetic Attack Data for Cybersecurity. \u003cem\u003eElectronics\u003c/em\u003e,\u003cem\u003e 13\u003c/em\u003e(2), 322. https://www.mdpi.com/2079-9292/13/2/322 \u003c/li\u003e\n\u003cli\u003eAl Musawi, A. F., Roy, S., \u0026amp; Ghosh, P. (2022). Identifying accurate link predictors based on assortativity of complex networks. \u003cem\u003eSci Rep\u003c/em\u003e,\u003cem\u003e 12\u003c/em\u003e(1), 18107. https://doi.org/10.1038/s41598-022-22843-4 \u003c/li\u003e\n\u003cli\u003eAlam, S., Ayub, M. S., Arora, S., \u0026amp; Khan, M. A. (2023). 
An investigation of the imputation techniques for missing values in ordinal data enhancing clustering and classification analysis validity. \u003cem\u003eDecision Analytics Journal\u003c/em\u003e,\u003cem\u003e 9\u003c/em\u003e, 100341. https://doi.org/10.1016/j.dajour.2023.100341 \u003c/li\u003e\n\u003cli\u003eAlghobiri, M. (2023). Exploring the attributes of influential users in social networks using association rule mining. \u003cem\u003eSocial Network Analysis and Mining\u003c/em\u003e,\u003cem\u003e 13\u003c/em\u003e(1), 118. https://doi.org/10.1007/s13278-023-01118-4 \u003c/li\u003e\n\u003cli\u003eAltenburger, K. M., \u0026amp; Ugander, J. (2018). Monophily in social networks introduces similarity among friends-of-friends. \u003cem\u003eNat Hum Behav\u003c/em\u003e,\u003cem\u003e 2\u003c/em\u003e(4), 284-290. https://doi.org/10.1038/s41562-018-0321-8 \u003c/li\u003e\n\u003cli\u003eAsadi, M., \u0026amp; Agah, A. (2018). Characterizing user influence within twitter. In \u003cem\u003eLecture Notes on Data Engineering and Communications Technologies\u003c/em\u003e (Vol. 13, pp. 122-132). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-319-69835-9_11 \u003c/li\u003e\n\u003cli\u003eAziz, F., Slater, L. T., Bravo-Merodio, L., Acharjee, A., \u0026amp; Gkoutos, G. V. (2023). Link prediction in complex network using information flow. \u003cem\u003eSci Rep\u003c/em\u003e,\u003cem\u003e 13\u003c/em\u003e(1), 14660. https://doi.org/10.1038/s41598-023-41476-9 \u003c/li\u003e\n\u003cli\u003eBazzaz Abkenar, S., Haghi Kashani, M., Mahdipour, E., \u0026amp; Jameii, S. M. (2021). Big data analytics meets social media: A systematic review of techniques, open issues, and future directions. \u003cem\u003eTelematics and Informatics\u003c/em\u003e,\u003cem\u003e 57\u003c/em\u003e, 101517. https://doi.org/10.1016/j.tele.2020.101517 \u003c/li\u003e\n\u003cli\u003eBeineke, L. W., Oellermann, O. 
R., \u0026amp; Pippert, R. E. (2002). The average connectivity of a graph. \u003cem\u003eDiscrete Mathematics\u003c/em\u003e,\u003cem\u003e 252\u003c/em\u003e(1), 31-45. https://doi.org/10.1016/S0012-365X(01)00180-7 \u003c/li\u003e\n\u003cli\u003eBhattacharya, S., Sinha, S., Roy, S., \u0026amp; Gupta, A. (2020). Towards finding the best-fit distribution for OSN data. \u003cem\u003eThe Journal of Supercomputing\u003c/em\u003e,\u003cem\u003e 76\u003c/em\u003e(12), 9882-9900. https://doi.org/10.1007/s11227-020-03232-y \u003c/li\u003e\n\u003cli\u003eBlock, P. E. R., \u0026amp; Grund, T. (2014). Multidimensional homophily in friendship networks. \u003cem\u003eNetwork Science\u003c/em\u003e,\u003cem\u003e 2\u003c/em\u003e(2), 189-212. https://doi.org/10.1017/nws.2014.17 \u003c/li\u003e\n\u003cli\u003eBodaghi, A., \u0026amp; Oliveira, J. (2022). The theater of fake news spreading, who plays which role? A study on real graphs of spreading on Twitter. \u003cem\u003eExpert Systems with Applications\u003c/em\u003e,\u003cem\u003e 189\u003c/em\u003e. https://doi.org/10.1016/j.eswa.2021.116110 \u003c/li\u003e\n\u003cli\u003eBroido, A. D., \u0026amp; Clauset, A. (2019). Scale-free networks are rare. \u003cem\u003eNat Commun\u003c/em\u003e,\u003cem\u003e 10\u003c/em\u003e(1), 1017. https://doi.org/10.1038/s41467-019-08746-5 \u003c/li\u003e\n\u003cli\u003eCheng, Z., \u0026amp; Yan, A. (2023). A case weighted similarity deep measurement method based on a self-attention Siamese neural network. \u003cem\u003eIndustrial Artificial Intelligence\u003c/em\u003e,\u003cem\u003e 1\u003c/em\u003e(1), 2. https://doi.org/10.1007/s44244-022-00002-y \u003c/li\u003e\n\u003cli\u003eDavid-Barrett, T. (2020). Herding Friends in Similarity-Based Architecture of Social Networks. \u003cem\u003eScientific Reports\u003c/em\u003e,\u003cem\u003e 10\u003c/em\u003e(1), 4859. 
https://doi.org/10.1038/s41598-020-61330-6 \u003c/li\u003e\n\u003cli\u003eDe Nicola, R., Petrocchi, M., \u0026amp; Pratelli, M. (2021). On the efficacy of old features for the detection of new bots. \u003cem\u003eInformation Processing \u0026amp; Management\u003c/em\u003e,\u003cem\u003e 58\u003c/em\u003e(6), 102685. https://doi.org/10.1016/j.ipm.2021.102685 \u003c/li\u003e\n\u003cli\u003ede Andrade, R. L., \u0026amp; R\u0026ecirc;go, L. C. (2018). The use of nodes attributes in social network analysis with an application to an international trade network. \u003cem\u003ePhysica A: Statistical Mechanics and its Applications\u003c/em\u003e,\u003cem\u003e 491\u003c/em\u003e, 249-270. https://doi.org/10.1016/j.physa.2017.08.126 \u003c/li\u003e\n\u003cli\u003eEvkoski, B., Kralj Novak, P., \u0026amp; Ljube\u0026scaron;ić, N. (2023). Content-based comparison of communities in social networks: Ex-Yugoslavian reactions to the Russian invasion of Ukraine. \u003cem\u003eApplied Network Science\u003c/em\u003e,\u003cem\u003e 8\u003c/em\u003e(1), 40. https://doi.org/10.1007/s41109-023-00561-8 \u003c/li\u003e\n\u003cli\u003eFaez, F., Hashemi Dijujin, N., Soleymani Baghshah, M., \u0026amp; Rabiee, H. R. (2022). SCGG: A deep structure-conditioned graph generative model. \u003cem\u003ePLoS One\u003c/em\u003e,\u003cem\u003e 17\u003c/em\u003e(11), e0277887. https://doi.org/10.1371/journal.pone.0277887 \u003c/li\u003e\n\u003cli\u003eFu, X., \u0026amp; Shen, Y. (2014). Study of collective user behaviour in Twitter: a fuzzy approach. \u003cem\u003eNeural Computing and Applications\u003c/em\u003e,\u003cem\u003e 25\u003c/em\u003e(7), 1603-1614. https://doi.org/10.1007/s00521-014-1642-9 \u003c/li\u003e\n\u003cli\u003eGuan, L., Liu, X. F., Sun, W., Liang, H., \u0026amp; Zhu, J. J. H. (2022). Census of Twitter users: Scraping and describing the national network of South Korea. \u003cem\u003ePLoS One\u003c/em\u003e,\u003cem\u003e 17\u003c/em\u003e(11), e0277549. 
https://doi.org/10.1371/journal.pone.0277549 \u003c/li\u003e\n\u003cli\u003eGui, C. (2024). Link prediction based on spectral analysis. \u003cem\u003ePLoS One\u003c/em\u003e,\u003cem\u003e 19\u003c/em\u003e(1), e0287385. https://doi.org/10.1371/journal.pone.0287385 \u003c/li\u003e\n\u003cli\u003eHasan, M. A. U., Bakar, A. A., \u0026amp; Yaakub, M. R. (2024, January 3-5). Detecting Community Through User Similarity Analysis on Twitter. 2024 18th International Conference on Ubiquitous Information Management and Communication (IMCOM). \u003c/li\u003e\n\u003cli\u003eHasan, M. A. U., Bakar, A. A., \u0026amp; Yaakub, M. R. (2024). Measuring User Influence in Real-Time on Twitter Using Behavioural Features. \u003cem\u003ePhysica A: Statistical Mechanics and its Applications\u003c/em\u003e, 129662. https://doi.org/10.1016/j.physa.2024.129662 \u003c/li\u003e\n\u003cli\u003eHromic, H., \u0026amp; Hayes, C. (2019). Characterising and evaluating dynamic online communities from live microblogging user interactions. \u003cem\u003eSocial Network Analysis and Mining\u003c/em\u003e,\u003cem\u003e 9\u003c/em\u003e(1), 30. https://doi.org/10.1007/s13278-019-0576-8 \u003c/li\u003e\n\u003cli\u003eHu, Y., Wang, W., \u0026amp; Yu, Y. (2022). Graph matching beyond perfectly-overlapping Erdős\u0026ndash;R\u0026eacute;nyi random graphs. \u003cem\u003eStatistics and Computing\u003c/em\u003e,\u003cem\u003e 32\u003c/em\u003e(1), 19. https://doi.org/10.1007/s11222-022-10079-1 \u003c/li\u003e\n\u003cli\u003eHuynh, T., Nguyen, H. D., Zelinka, I., Pham, X. H., Pham, V. T., Selamat, A., \u0026amp; Krejcar, O. (2022). A method to detect influencers in social networks based on the combination of amplification factors and content creation. \u003cem\u003ePLoS One\u003c/em\u003e,\u003cem\u003e 17\u003c/em\u003e(10), e0274596. https://doi.org/10.1371/journal.pone.0274596 \u003c/li\u003e\n\u003cli\u003eIqbal, S., Khan, H. U., Ishfaq, U., Alghobiri, M., \u0026amp; Iqbal, S. (2021). 
Finding influential users in social networks based on novel features \u0026amp; link-based analysis. \u003cem\u003eJ. Intell. Fuzzy Syst.\u003c/em\u003e,\u003cem\u003e 40\u003c/em\u003e(1), 1623\u0026ndash;1637. https://doi.org/10.3233/jifs-201036 \u003c/li\u003e\n\u003cli\u003eJain, A. K., Sahoo, S. R., \u0026amp; Kaubiyal, J. (2021). Online social networks security and privacy: comprehensive review and analysis. \u003cem\u003eComplex \u0026amp; Intelligent Systems\u003c/em\u003e,\u003cem\u003e 7\u003c/em\u003e(5), 2157-2177. https://doi.org/10.1007/s40747-021-00409-7 \u003c/li\u003e\n\u003cli\u003eJia, W., Ma, R., Yan, L., Niu, W., \u0026amp; Ma, Z. (2022). TT-graph: A new model for building social network graphs from texts with time series. \u003cem\u003eExpert Systems with Applications\u003c/em\u003e,\u003cem\u003e 192\u003c/em\u003e, 116405. https://doi.org/10.1016/j.eswa.2021.116405 \u003c/li\u003e\n\u003cli\u003eJiang, N., Crooks, A. T., Kavak, H., Burger, A., \u0026amp; Kennedy, W. G. (2022). A method to create a synthetic population with social networks for geographically-explicit agent-based models. \u003cem\u003eComputational Urban Science\u003c/em\u003e,\u003cem\u003e 2\u003c/em\u003e(1), 7. https://doi.org/10.1007/s43762-022-00034-1 \u003c/li\u003e\n\u003cli\u003eKanavos, A., Karamitsos, I., \u0026amp; Mohasseb, A. (2023). Exploring Clustering Techniques for Analyzing User Engagement Patterns in Twitter Data. \u003cem\u003eComputers\u003c/em\u003e,\u003cem\u003e 12\u003c/em\u003e(6). https://doi.org/10.3390/computers12060124 \u003c/li\u003e\n\u003cli\u003eKerrache, S., Alharbi, R., \u0026amp; Benhidour, H. (2020). A Scalable Similarity-Popularity Link Prediction Method. \u003cem\u003eScientific Reports\u003c/em\u003e,\u003cem\u003e 10\u003c/em\u003e(1), 6394. https://doi.org/10.1038/s41598-020-62636-1 \u003c/li\u003e\n\u003cli\u003eKim, J., Jeong, S., \u0026amp; Lim, S. (2022). Link Pruning for Community Detection in Social Networks. 
\u003cem\u003eApplied Sciences\u003c/em\u003e,\u003cem\u003e 12\u003c/em\u003e(13). https://doi.org/10.3390/app12136811 \u003c/li\u003e\n\u003cli\u003eKubina, R. M., Kostewicz, D. E., Brennan, K. M., \u0026amp; King, S. A. (2017). A Critical Review of Line Graphs in Behavior Analytic Journals. \u003cem\u003eEducational Psychology Review\u003c/em\u003e,\u003cem\u003e 29\u003c/em\u003e(3), 583-598. https://doi.org/10.1007/s10648-015-9339-x \u003c/li\u003e\n\u003cli\u003eKumari, A., Behera, R. K., Sahoo, B., \u0026amp; Sahoo, S. P. (2022). Prediction of link evolution using community detection in social network. \u003cem\u003eComputing\u003c/em\u003e,\u003cem\u003e 104\u003c/em\u003e(5), 1077-1098. https://doi.org/10.1007/s00607-021-01035-4 \u003c/li\u003e\n\u003cli\u003eLee, C., \u0026amp; Wilkinson, D. J. (2019). A review of stochastic block models and extensions for graph clustering. \u003cem\u003eApplied Network Science\u003c/em\u003e,\u003cem\u003e 4\u003c/em\u003e(1), 122. https://doi.org/10.1007/s41109-019-0232-2 \u003c/li\u003e\n\u003cli\u003eLi, Y., Yang, L., Xu, B., Wang, J., \u0026amp; Lin, H. (2019). Improving User Attribute Classification with Text and Social Network Attention. \u003cem\u003eCognitive Computation\u003c/em\u003e,\u003cem\u003e 11\u003c/em\u003e(4), 459-468. https://doi.org/10.1007/s12559-019-9624-y \u003c/li\u003e\n\u003cli\u003eLim, S. L., \u0026amp; Bentley, P. J. (2022). Opinion amplification causes extreme polarization in social networks. \u003cem\u003eScientific Reports\u003c/em\u003e,\u003cem\u003e 12\u003c/em\u003e(1), 18131. https://doi.org/10.1038/s41598-022-22856-z \u003c/li\u003e\n\u003cli\u003eLogan, A. P., LaCasse, P. M., \u0026amp; Lunday, B. J. (2023). Social network analysis of Twitter interactions: a directed multilayer network approach. \u003cem\u003eSoc Netw Anal Min\u003c/em\u003e,\u003cem\u003e 13\u003c/em\u003e(1), 65. https://doi.org/10.1007/s13278-023-01063-2 \u003c/li\u003e\n\u003cli\u003eMahmoudi, A., Yaakub, M. 
R., \u0026amp; Abu Bakar, A. (2018). New time-based model to identify the influential users in online social networks. \u003cem\u003eData Technologies and Applications\u003c/em\u003e,\u003cem\u003e 52\u003c/em\u003e(2), 278-290. https://doi.org/10.1108/DTA-08-2017-0056 \u003c/li\u003e\n\u003cli\u003eMariani, P., Marletta, A., Mussini, M., Zenga, M., \u0026amp; Grammatica, E. (2020). A missing value approach to social network data: \u0026ldquo;Dislike\u0026rdquo; or \u0026ldquo;Nothing\u0026rdquo;? \u003cem\u003eComputational Management Science\u003c/em\u003e,\u003cem\u003e 17\u003c/em\u003e(4), 569-583. https://doi.org/10.1007/s10287-020-00381-6 \u003c/li\u003e\n\u003cli\u003eMarkos, E., Pe\u0026ntilde;a, P., Labrecque, L. I., \u0026amp; Swani, K. (2023). Are data breaches the new norm? Exploring data breach trends, consumer sentiment, and responses to security invasions. \u003cem\u003eJournal of Consumer Affairs\u003c/em\u003e,\u003cem\u003e 57\u003c/em\u003e(3), 1089-1119. https://doi.org/10.1111/joca.12554 \u003c/li\u003e\n\u003cli\u003eMasrom, M. B., Busalim, A. H., Abuhassna, H., \u0026amp; Mahmood, N. H. N. (2021). Understanding students\u0026rsquo; behavior in online social networks: a systematic literature review. \u003cem\u003eInternational Journal of Educational Technology in Higher Education\u003c/em\u003e,\u003cem\u003e 18\u003c/em\u003e(1), 6. https://doi.org/10.1186/s41239-021-00240-7 \u003c/li\u003e\n\u003cli\u003eMcMillan, C., Felmlee, D., \u0026amp; Ashford, J. R. (2022). Reciprocity, transitivity, and skew: Comparing local structure in 40 positive and negative social networks. \u003cem\u003ePLoS One\u003c/em\u003e,\u003cem\u003e 17\u003c/em\u003e(5), e0267886. https://doi.org/10.1371/journal.pone.0267886 \u003c/li\u003e\n\u003cli\u003eMislove, A., Marcon, M., Gummadi, K. P., Druschel, P., \u0026amp; Bhattacharjee, B. (2007). 
\u003cem\u003eMeasurement and analysis of online social networks\u003c/em\u003e Proceedings of the 7th ACM SIGCOMM conference on Internet measurement, San Diego, California, USA. https://doi.org/10.1145/1298306.1298311\u003c/li\u003e\n\u003cli\u003eMyers, S. A., \u0026amp; Leskovec, J. (2010). \u003cem\u003eOn the convexity of latent social network inference\u003c/em\u003e Proceedings of the 23rd International Conference on Neural Information Processing Systems - Volume 2, Vancouver, British Columbia, Canada. \u003c/li\u003e\n\u003cli\u003eNeal, Z. P. (2017). How small is it? Comparing indices of small worldliness. \u003cem\u003eNetwork Science\u003c/em\u003e,\u003cem\u003e 5\u003c/em\u003e(1), 30-44. https://doi.org/10.1017/nws.2017.5 \u003c/li\u003e\n\u003cli\u003eNettleton, D. F. (2016). A synthetic data generator for online social network graphs. \u003cem\u003eSocial Network Analysis and Mining\u003c/em\u003e,\u003cem\u003e 6\u003c/em\u003e(1), 44. https://doi.org/10.1007/s13278-016-0352-y \u003c/li\u003e\n\u003cli\u003eNikolentzos, G., Vazirgiannis, M., Xypolopoulos, C., Lingman, M., \u0026amp; Brandt, E. G. (2023). Synthetic electronic health records generated with variational graph autoencoders. \u003cem\u003enpj Digital Medicine\u003c/em\u003e,\u003cem\u003e 6\u003c/em\u003e(1), 83. https://doi.org/10.1038/s41746-023-00822-x \u003c/li\u003e\n\u003cli\u003eO\u0026rsquo;Neil, D. A., \u0026amp; Petty, M. D. (2019). Heuristic methods for synthesizing realistic social networks based on personality compatibility. \u003cem\u003eApplied Network Science\u003c/em\u003e,\u003cem\u003e 4\u003c/em\u003e(1). https://doi.org/10.1007/s41109-019-0117-4 \u003c/li\u003e\n\u003cli\u003eOhme, J., Araujo, T., Boeschoten, L., Freelon, D., Ram, N., Reeves, B. B., \u0026amp; Robinson, T. N. (2023). Digital Trace Data Collection for Social Media Effects Research: APIs, Data Donation, and (Screen) Tracking. \u003cem\u003eCommunication Methods and Measures\u003c/em\u003e, 1-18. 
https://doi.org/10.1080/19312458.2023.2181319 \u003c/li\u003e\n\u003cli\u003ePanchendrarajan, R., \u0026amp; Saxena, A. (2023). Topic-based influential user detection: a survey. \u003cem\u003eApplied Intelligence\u003c/em\u003e,\u003cem\u003e 53\u003c/em\u003e(5), 5998-6024. https://doi.org/10.1007/s10489-022-03831-7 \u003c/li\u003e\n\u003cli\u003ePiccardi, C. (2023). Metrics for network comparison using egonet feature distributions. \u003cem\u003eSci Rep\u003c/em\u003e,\u003cem\u003e 13\u003c/em\u003e(1), 14657. https://doi.org/10.1038/s41598-023-40938-4 \u003c/li\u003e\n\u003cli\u003ePulipati, S., Somula, R., \u0026amp; Parvathala, B. R. (2021). Nature inspired link prediction and community detection algorithms for social networks: a survey. \u003cem\u003eInternational Journal of System Assurance Engineering and Management\u003c/em\u003e. https://doi.org/10.1007/s13198-021-01125-8 \u003c/li\u003e\n\u003cli\u003eRothwell, L. (2023, Jul 13, 2023). \u003cem\u003eUnderstanding the Recent Changes to Twitter API: A complete guide\u003c/em\u003e. Blaze. Retrieved January 2, 2024 from https://www.withblaze.app/blog/understanding-the-recent-changes-to-twitter-api-a-complete-guide\u003c/li\u003e\n\u003cli\u003eRoux, J., Bez, N., Rochet, P., Joo, R., \u0026amp; Mahevas, S. (2023). Graphlet correlation distance to compare small graphs. \u003cem\u003ePLoS One\u003c/em\u003e,\u003cem\u003e 18\u003c/em\u003e(2), e0281646. https://doi.org/10.1371/journal.pone.0281646 \u003c/li\u003e\n\u003cli\u003eSaarela, M., \u0026amp; Jauhiainen, S. (2021). Comparison of feature importance measures as explanations for classification models. \u003cem\u003eSN Applied Sciences\u003c/em\u003e,\u003cem\u003e 3\u003c/em\u003e(2), 272. https://doi.org/10.1007/s42452-021-04148-9 \u003c/li\u003e\n\u003cli\u003eSchwyck, M. E., Du, M., Li, Y., Chang, L. J., \u0026amp; Parkinson, C. (2023). 
Similarity Among Friends Serves as a Social Prior: The Assumption That \u0026ldquo;Birds of a Feather Flock Together\u0026rdquo; Shapes Social Decisions and Relationship Beliefs. \u003cem\u003ePersonality and Social Psychology Bulletin\u003c/em\u003e,\u003cem\u003e 0\u003c/em\u003e(0), 01461672221140269. https://doi.org/10.1177/01461672221140269 \u003c/li\u003e\n\u003cli\u003eShahraeini, M. (2023). Modified Erdős\u0026ndash;R\u0026eacute;nyi Random Graph Model for Generating Synthetic Power Grids. \u003cem\u003eIEEE Systems Journal\u003c/em\u003e, 1-12. https://doi.org/10.1109/JSYST.2023.3339664 \u003c/li\u003e\n\u003cli\u003eShantal, M., Othman, Z., \u0026amp; Bakar, A. A. (2023). A Novel Approach for Data Feature Weighting Using Correlation Coefficients and Min\u0026ndash;Max Normalization. \u003cem\u003eSymmetry\u003c/em\u003e,\u003cem\u003e 15\u003c/em\u003e(12), 2185. https://www.mdpi.com/2073-8994/15/12/2185 \u003c/li\u003e\n\u003cli\u003eShoeibi, N., Shoeibi, N., Chamoso, P., Alizadehsani, Z., \u0026amp; Corchado, J. M. (2022). A Hybrid Model for the Measurement of the Similarity between Twitter Profiles. \u003cem\u003eSustainability\u003c/em\u003e,\u003cem\u003e 14\u003c/em\u003e(9), 4909. https://www.mdpi.com/2071-1050/14/9/4909 \u003c/li\u003e\n\u003cli\u003eStark, T. H. (2018). Collecting Social Network Data. In D. L. Vannette \u0026amp; J. A. Krosnick (Eds.), \u003cem\u003eThe Palgrave Handbook of Survey Research\u003c/em\u003e (pp. 241-254). Springer International Publishing. https://doi.org/10.1007/978-3-319-54395-6_31 \u003c/li\u003e\n\u003cli\u003eTalaga, S., \u0026amp; Nowak, A. (2022). Structural measures of similarity and complementarity in complex networks. \u003cem\u003eSci Rep\u003c/em\u003e,\u003cem\u003e 12\u003c/em\u003e(1), 16580. https://doi.org/10.1038/s41598-022-20710-w \u003c/li\u003e\n\u003cli\u003eTantardini, M., Ieva, F., Tajoli, L., \u0026amp; Piccardi, C. (2019). Comparing methods for comparing networks. 
\u003cem\u003eSci Rep\u003c/em\u003e,\u003cem\u003e 9\u003c/em\u003e(1), 17557. https://doi.org/10.1038/s41598-019-53708-y \u003c/li\u003e\n\u003cli\u003eToraman, C., Şahinu\u0026ccedil;, F., Yilmaz, E. H., \u0026amp; Akkaya, I. B. (2022). Understanding social engagements: A comparative analysis of user and text features in Twitter. \u003cem\u003eSocial Network Analysis and Mining\u003c/em\u003e,\u003cem\u003e 12\u003c/em\u003e(1), 47. https://doi.org/10.1007/s13278-022-00872-1 \u003c/li\u003e\n\u003cli\u003eVasques Filho, D., \u0026amp; O\u0026apos;Neale, D. R. J. (2020). Transitivity and degree assortativity explained: The bipartite structure of social networks. \u003cem\u003ePhysical Review E\u003c/em\u003e,\u003cem\u003e 101\u003c/em\u003e(5), 052305. https://doi.org/10.1103/PhysRevE.101.052305 \u003c/li\u003e\n\u003cli\u003eVenturini, T., \u0026amp; Rogers, R. (2019). \u0026ldquo;API-Based Research\u0026rdquo; or How can Digital Sociology and Journalism Studies Learn from the Facebook and Cambridge Analytica Data Breach. \u003cem\u003eDigital Journalism\u003c/em\u003e,\u003cem\u003e 7\u003c/em\u003e(4), 532-540. https://doi.org/10.1080/21670811.2019.1591927 \u003c/li\u003e\n\u003cli\u003eVerstraaten, M., Varbanescu, A. L., \u0026amp; de Laat, C. (2017). Synthetic Graph Generation for Systematic Exploration of Graph Structural Properties. Euro-Par 2016: Parallel Processing Workshops, Cham.\u003c/li\u003e\n\u003cli\u003eWang, M., \u0026amp; Ma, J. (2016). A novel recommendation approach based on users\u0026rsquo; weighted trust relations and the rating similarities. \u003cem\u003eSoft Computing\u003c/em\u003e,\u003cem\u003e 20\u003c/em\u003e(10), 3981-3990. https://doi.org/10.1007/s00500-015-1734-1 \u003c/li\u003e\n\u003cli\u003eWang, T., Brede, M., Ianni, A., \u0026amp; Mentzakis, E. (2018). Social interactions in online eating disorder communities: A network perspective. 
\u003cem\u003ePLoS One\u003c/em\u003e,\u003cem\u003e 13\u003c/em\u003e(7), e0200800. https://doi.org/10.1371/journal.pone.0200800 \u003c/li\u003e\n\u003cli\u003eWeber, D., Nasim, M., Mitchell, L., \u0026amp; Falzon, L. (2021). Exploring the effect of streamed social media data variations on social network analysis. \u003cem\u003eSocial Network Analysis and Mining\u003c/em\u003e,\u003cem\u003e 11\u003c/em\u003e(1), 62. https://doi.org/10.1007/s13278-021-00770-y \u003c/li\u003e\n\u003cli\u003eWei, X., Zhao, J., Liu, S., \u0026amp; Wang, Y. (2022). Identifying influential spreaders in complex networks for disease spread and control. \u003cem\u003eScientific Reports\u003c/em\u003e,\u003cem\u003e 12\u003c/em\u003e(1), 5550. https://doi.org/10.1038/s41598-022-09341-3 \u003c/li\u003e\n\u003cli\u003eWills, P., \u0026amp; Meyer, F. G. (2020). Metrics for graph comparison: A practitioner\u0026rsquo;s guide. \u003cem\u003ePLoS One\u003c/em\u003e,\u003cem\u003e 15\u003c/em\u003e(2), e0228728. https://doi.org/10.1371/journal.pone.0228728 \u003c/li\u003e\n\u003cli\u003eXu, Y., Ren, T., \u0026amp; Sun, S. (2022). Community Detection Based on Node Influence and Similarity of Nodes. \u003cem\u003eMathematics\u003c/em\u003e,\u003cem\u003e 10\u003c/em\u003e(6). https://doi.org/10.3390/math10060970 \u003c/li\u003e\n\u003cli\u003eYilmaz, E. A., Balcisoy, S., \u0026amp; Bozkaya, B. (2023). A link prediction-based recommendation system using transactional data. \u003cem\u003eSci Rep\u003c/em\u003e,\u003cem\u003e 13\u003c/em\u003e(1), 6905. https://doi.org/10.1038/s41598-023-34055-5 \u003c/li\u003e\n\u003cli\u003eYuliansyah, H., Othman, Z. A., \u0026amp; Bakar, A. A. (2023). A new link prediction method to alleviate the cold-start problem based on extending common neighbor and degree centrality. \u003cem\u003ePhysica A: Statistical Mechanics and its Applications\u003c/em\u003e,\u003cem\u003e 616\u003c/em\u003e, 128546. 
https://doi.org/10.1016/j.physa.2023.128546 \u003c/li\u003e\n\u003cli\u003eZareie, A., \u0026amp; Sakellariou, R. (2020). Similarity-based link prediction in social networks using latent relationships between the users. \u003cem\u003eSci Rep\u003c/em\u003e,\u003cem\u003e 10\u003c/em\u003e(1), 20137. https://doi.org/10.1038/s41598-020-76799-4 \u003c/li\u003e\n\u003cli\u003eZhang, S., Zhang, Y., Zhou, M., \u0026amp; Peng, L. (2020). Community detection based on similarities of communication behavior in IP networks. \u003cem\u003eJournal of Ambient Intelligence and Humanized Computing\u003c/em\u003e,\u003cem\u003e 13\u003c/em\u003e(3), 1451-1461. https://doi.org/10.1007/s12652-020-02681-w \u003c/li\u003e\n\u003cli\u003eZhao, S., Sun, J., Shimizu, K., \u0026amp; Kadota, K. (2018). Silhouette Scores for Arbitrary Defined Groups in Gene Expression Data and Insights into Differential Expression Results. \u003cem\u003eBiological Procedures Online\u003c/em\u003e,\u003cem\u003e 20\u003c/em\u003e(1), 5. https://doi.org/10.1186/s12575-018-0067-8 \u003c/li\u003e\n\u003cli\u003eZhou, H. J., Li, L., Li, Y., Li, W., \u0026amp; Li, J. J. (2022). PCA outperforms popular hidden variable inference methods for molecular QTL mapping. \u003cem\u003eGenome Biology\u003c/em\u003e,\u003cem\u003e 23\u003c/em\u003e(1), 210. 
https://doi.org/10.1186/s13059-022-02761-4

