Exploration of the Big Five: Dimensionality Reduction and Clustering Techniques

doi:10.21203/rs.3.rs-4232726/v1

2024 · doi:10.21203/rs.3.rs-4232726/v1

Research Article (Research Square preprint)

Flavio Gioia

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-4232726/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted. Version 1.

Abstract

This study presents a comprehensive analysis of the BFI dataset using the 'psych' package in R. Using principal component analysis (PCA), descriptive techniques such as histograms, and classification (cluster analysis), we explored the dimensions of personality based on 25 assessment items from the International Personality Item Pool. The item columns, prefixed A, C, E, N, and O, capture the traits of agreeableness, conscientiousness, extraversion, neuroticism, and openness, measured on a 6-point response scale.
The results provide insights into the correlations among different traits and offer a richer understanding of human personality dynamics.

Keywords: Psychology, Applied Statistics, Statistical Theory, Big Five theory, personality, neuroticism, openness, conscientiousness, extraversion, agreeableness, PCA, Cluster Analysis

1. Introduction

In this work, I analysed the BFI dataset (Revelle & Condon, 2019) included in the psych package for R (Revelle, 2023). The dataset contains 25 personality assessment items from the International Personality Item Pool (Goldberg, et al., 2006). The A columns represent agreeableness, with items such as A1 ("indifference to the feelings of others") and A2 ("inquire about the well-being of others"). The C columns represent conscientiousness, understood as a sense of duty: C1 is "Am exacting in my work" and C2 is "Continue until everything is perfect". The E columns represent extraversion: E1 is "Don't talk a lot" and E2 is "Find it difficult to approach others". The N columns represent neuroticism; for example, N1 indicates a "tendency to get angry easily". Finally, the O columns represent openness; for example, O4 is "Spend time reflecting about things" (Kabigting, 2021).

Item data was collected using a 6-point response scale:

1 Very inaccurate
2 Moderately inaccurate
3 Slightly inaccurate
4 Slightly accurate
5 Moderately accurate
6 Very accurate

2. Materials and methods

2.1 Dataset

We used the BFI dataset, which includes 25 rating items taken from the International Personality Item Pool. These items measure the personality traits: agreeableness, conscientiousness, extraversion, neuroticism, and openness.
The data was collected using a 6-point response scale.

2.2 Statistical Analysis

2.2.1 Principal Component Analysis (PCA)

PCA is a dimensionality reduction technique that allows us to identify the main components (or axes) of variation in the data (Jolliffe & Cadima, 2016). The principal components are calculated as follows:

$$PC_{i} = X v_{i}$$

Where:
$PC_{i}$: the $i$-th principal component;
$X$: the data matrix (observations in rows, variables in columns);
$v_{i}$: the weight vector associated with the $i$-th principal component.

2.2.2 Histograms

Histograms visualize the distribution of the personality data. We create histograms for each trait (agreeableness, conscientiousness, etc.), then assess the variability and identify any patterns (Scott, 1979).

2.2.3 Hierarchical Clustering on Principal Components (HCPC)

The number of clusters is determined from the relative loss of inertia (Argüelles, Benavides, & Fernández, 2014). It works as follows:

- Calculation of inertia: inertia is a measure of the dispersion of points within a cluster. The higher the inertia, the greater the dispersion. Total inertia is the sum of the inertias within all clusters.
- Calculation of inertia for each partition: for each partition (i.e., a specific number of clusters), we calculate the total inertia within those clusters.
- Relative loss of inertia: the relative loss of inertia between two consecutive partitions is given by $\frac{\mathrm{Inertia}_{n} - \mathrm{Inertia}_{n+1}}{\mathrm{Inertia}_{n}}$, where $n$ represents the number of clusters.
- Choice of the number of clusters: the suggested partition is the one with the highest relative loss of inertia. In other words, we select the number of clusters that maximizes the reduction of inertia compared to the previous partition.

2.2.4 Kendall Rank Correlation (Kendall's Tau)

The Kendall correlation is a statistical index used to quantify the relationship between two ordinal variables.
Specifically, it measures the association between two sets of data when the observations are ordered based on a common feature (Kendall, 1938). The Kendall correlation (often abbreviated as τ) is calculated as follows:

$$\tau = \frac{C - D}{C + D}$$

Where:
$C$: the number of concordant pairs;
$D$: the number of discordant pairs.

A pair of observations is concordant if the ranking order is the same for both variables and discordant if the ranking order differs. The Kendall correlation coefficient takes values in the range $-1 \le \tau \le 1$:

$\tau = 1$: perfect positive association.
$\tau = -1$: perfect negative association.
$\tau = 0$: no association.

The Kendall correlation is particularly useful when working with ordinal data or when the relationships between variables are non-linear. In this study, the Kendall correlation was used to create a correlation matrix between personality items, which allowed the associations to be assessed in a robust and non-parametric manner.

3. Results and discussions

3.1 Descriptive statistics

We first create a graphical correlation matrix of the variables using the Kendall method, with circles for visualization (Figure 1). In the resulting graph, red circles mark a strong positive correlation between the variables, and blue circles mark a negative correlation (that is, as one value increases, the other decreases). There is a relatively strong correlation of 0.68 between N1 (Get angry easily) and N2 (Get irritated easily). There is a negative, though not strong, correlation of -0.44 between E2 (Find it difficult to approach others) and E3 (Know how to captivate people).
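To make the τ formula from Section 2.2.4 concrete, the following minimal Python sketch counts concordant and discordant pairs directly. The function name and the two response vectors are hypothetical and are not taken from the BFI data. The sketch implements the formula exactly as stated, so tied pairs are counted in neither C nor D; statistical packages commonly apply a tie correction instead, so coefficients computed on heavily tied 6-point data may differ from this simple version.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau as (C - D) / (C + D); tied pairs are skipped."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = 0
    discordant = 0
    # Compare every unordered pair of observations once.
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:      # same ordering on both variables
            concordant += 1
        elif product < 0:    # opposite ordering
            discordant += 1
        # product == 0 means a tie on x or y: counted in neither total
    if concordant + discordant == 0:
        raise ValueError("all pairs are tied; tau is undefined")
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical responses on a 6-point scale (illustrative only).
item_a = [1, 2, 3, 4, 5, 6]
item_b = [2, 1, 3, 5, 4, 6]
print(kendall_tau(item_a, item_b))
```

Enumerating all pairs is quadratic in the number of observations, which is acceptable for a sketch but slow for a full item-by-item matrix on thousands of respondents.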
The correlation of 0.68 between N1 (Gets angry easily) and N2 (Gets irritated easily) suggests consistency in how individuals prone to anger react to frustrating situations. The negative correlation of -0.44 between E2 and E3 indicates that individuals who find it difficult to approach others tend not to be the ones who know how to capture people's attention, suggesting different styles of social interaction.

We next examine histograms of selected items.

Indifference to the feelings of others (Figure 2). This chart shows that the majority of people respond 1, which means "very inaccurate". So the majority of people seem to care about other people's feelings, as the second bar on the right also shows.

Level of satisfaction with one's job (Figure 3). Most responses fall on 5, so the majority of people are satisfied with their work, but not perfectly.

How easily people make friends (Figure 4).

How many people investigate a topic deeply (Figure 5). This variable had the highest mean of all. There seem to be more people who report investigating a topic in depth with strong conviction.

The variable with the highest standard deviation, how many people waste time (Figure 6). The majority of people are concentrated on 4, i.e., "slightly accurate", but the variability is high.

3.2 Principal Components

3.2.1 Inertia Distribution

The inertia of the first dimensions shows whether there are strong relationships between the variables and suggests the number of dimensions that should be studied (Figure 7). The first two dimensions of the analysis express 28.28% of the total inertia of the dataset; this means that 28.28% of the total variability of the cloud of individuals (or variables) is explained by the plane. This is a small percentage, and the first plane represents only a fraction of the variability in the data.
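As a minimal sketch of how such inertia percentages arise: in a PCA, each dimension's share of the total inertia is its eigenvalue divided by the sum of all eigenvalues, and the share of a plane is the sum of the shares of its two dimensions. The Python below assumes a hypothetical list of eigenvalues; the numbers and the function name are illustrative and are not the values or code from this analysis.

```python
from itertools import accumulate


def inertia_percentages(eigenvalues):
    """Return each dimension's share of the total inertia, in percent."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive number")
    return [100.0 * value / total for value in eigenvalues]


# Hypothetical eigenvalues, sorted from the largest (illustrative only).
eigenvalues = [4.0, 3.0, 2.0, 1.0]

percentages = inertia_percentages(eigenvalues)
# Running total: inertia expressed by the first k dimensions together.
cumulative = list(accumulate(percentages))
# Inertia expressed by the first plane (dimensions 1 and 2).
first_plane = percentages[0] + percentages[1]

print(percentages)
print(cumulative)
print(first_plane)
```

Reading the cumulative list against a reference threshold is one way to decide how many axes to keep, which is the kind of comparison made in this section.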
The 28.28% expressed by the first plane is greater than the reference value of 8.68%, so the variability explained by this plane is significant (the reference value is the 0.95 quantile of the distribution of inertia percentages obtained by simulating 1565 data tables of equivalent size on the basis of a normal distribution). Given these observations, it is worth also considering the subsequent dimensions that express a high percentage of the total inertia. An estimate of the right number of axes to interpret suggests restricting the analysis to the first 7 axes. These axes carry more inertia than the 0.95 quantile of the random distributions (57.32% vs. 28.91%). This suggests that only these axes carry real information, so the description is limited to them.

3.2.2 Description of Plane 1:2

Dimension 1 contrasts individuals with a strongly positive coordinate on the axis (to the right of the graph) against individuals with a strongly negative coordinate on the axis (to the left of the graph).

Group 1 (characterized by a positive coordinate on the axis) shares:
- high values for variables such as E4, A5, A4, A3, E5, C3, C1, A2, E3, and age (variables are sorted from the strongest).
- low values for variables such as N2, N3, N4, N1, C5, N5, E2, C4, E1, and O2 (variables are sorted from the weakest).

Group 2 (characterized by a negative coordinate on the axis) shares:
- high values for variables such as E2, E1, C4, C5, N4, A1, O5, N3, N2, and O2 (variables are sorted from the strongest).
- low values for variables such as E4, A3, E3, A5, E5, A2, A4, O3, C2, and C1 (variables are sorted from the weakest).

Dimension 2 contrasts individuals with a strongly positive coordinate on the axis (toward the top of the graph) against individuals with a strongly negative coordinate on the axis (toward the bottom of the graph).
Group 1 (characterized by a positive coordinate on the axis) shares:
- high values for variables such as N2, N3, N1, N5, N4, E3, A2, A3, gender, and E5 (variables are sorted from the strongest).
- low values for the variables E1 and age (variables are sorted from the weakest).

Group 2 (characterized by a negative coordinate on the axis) shares:
- high values for variables such as E2, E1, C4, C5, N4, A1, O5, N3, N2, and O2 (variables are sorted from the strongest).
- low values for variables such as E4, A3, E3, A5, E5, A2, A4, O3, C2, and C1 (variables are sorted from the weakest).

Group 3 (characterized by a negative coordinate on the axis) shares:
- high values for variables such as E4, A5, A4, A3, E5, C3, C1, A2, E3, and age (variables are sorted from the strongest).
- low values for variables such as N2, N3, N4, N1, C5, N5, E2, C4, E1, and O2 (variables are sorted from the weakest).

3.2.3 Description of Plane 3:4

Dimension 3 contrasts individuals with a strongly positive coordinate on the axis (to the right of the graph) against individuals with a strongly negative coordinate on the axis (to the left of the graph).

Group 1 (characterized by a positive coordinate on the axis) shares:
- high values for variables such as O1, O3, O4, C1, education, N4, E2, A1, C2, and E1 (variables are sorted from the strongest).
- low values for the variables O5, O2, gender, A4, E4, A3, A2, A5, N5, and C4 (variables are sorted from the weakest).

Group 2 (characterized by a negative coordinate on the axis) shares:
- high values for the variables C4, C5, E4, A5, E3, A3, O3, A2, and O1 (variables are sorted from the strongest).
- low values for variables such as C2, C3, C1, E1, E2, A1, N3, N2, N1, and N5 (variables are sorted from the weakest).

Group 3 (characterized by a negative coordinate on the axis) shares:
- high values for variables such as O5, O2, gender, A4, C3, C2, N5, A2, E1, and A3 (variables are sorted from the strongest).
- low values for the variables O3, O1, C5, O4, C4, E3, education, N4, and age (variables are sorted from the weakest).

Dimension 4 contrasts individuals with a strongly positive coordinate on the axis (toward the top of the graph) against individuals with a strongly negative coordinate on the axis (toward the bottom of the graph).

Group 1 (characterized by a positive coordinate on the axis) shares:
- high values for variables such as O5, O2, gender, A4, C3, C2, N5, A2, E1, and A3 (variables are sorted from the strongest).
- low values for the variables O3, O1, C5, O4, C4, E3, education, N4, and age (variables are sorted from the weakest).

Group 2 (characterized by a negative coordinate on the axis) shares:
- high values for the variables C4, C5, E4, A5, E3, A3, O3, A2, and O1 (variables are sorted from the strongest).
- low values for variables such as C2, C3, C1, E1, E2, A1, N3, N2, N1, and N5 (variables are sorted from the weakest).

Group 3 (characterized by a negative coordinate on the axis) shares:
- high values for variables such as O1, O3, O4, C1, education, N4, E2, A1, C2, and E1 (variables are sorted from the strongest).
- low values for the variables O5, O2, gender, A4, E4, A3, A2, A5, N5, and C4 (variables are sorted from the weakest).

3.2.4 Description of Plane 5:6

Dimension 5 contrasts individuals with a strongly positive coordinate on the axis (to the right of the graph) against individuals with a strongly negative coordinate on the axis (to the left of the graph).

Group 1 (characterized by a positive coordinate on the axis) shares:
- high values for variables such as age, education, gender, A2, O4, N4, E2, N5, A3, and C5 (variables are sorted from the strongest).
- low values for variables such as A1, E3, E4, O5, O2, O1, E5, C4, C2, and N1 (variables are sorted from the weakest).

Group 2 (characterized by a negative coordinate on the axis) shares:
- high values for the variables A1, E5, N1, N2, E4, E3, and O5 (variables are sorted from the strongest).
- low values for variables such as O4, A2, E1, E2, A3, N4, age, education, A5, and A4 (variables are sorted from the weakest).

Group 3 (characterized by a negative coordinate on the axis) shares:
- high values for variables such as E1, E3, O4, A5, A1, C4, O1, O2, A3, and E2 (variables are sorted from the strongest).
- low values for the variables age, education, gender, N2, N1, and E5 (variables are sorted from the weakest).

Dimension 6 contrasts individuals with a strongly positive coordinate on the axis (toward the top of the graph) against individuals with a strongly negative coordinate on the axis (toward the bottom of the graph).

Group 1 (characterized by a positive coordinate on the axis) shares:
- high values for variables such as E1, E3, O4, A5, A1, C4, O1, O2, A3, and E2 (variables are sorted from the strongest).
- low values for the variables age, education, gender, N2, N1, and E5 (variables are sorted from the weakest).

Group 2 (characterized by a negative coordinate on the axis) shares:
- high values for the variables A1, E5, N1, N2, E4, E3, and O5 (variables are sorted from the strongest).
- low values for variables such as O4, A2, E1, E2, A3, N4, age, education, A5, and A4 (variables are sorted from the weakest).

Group 3 (characterized by a negative coordinate on the axis) shares:
- high values for variables such as age, education, gender, A2, O4, N4, E2, N5, A3, and C5 (variables are sorted from the strongest).
- low values for variables such as A1, E3, E4, O5, O2, O1, E5, C4, C2, and N1 (variables are sorted from the weakest).

3.2.5 Description of Dimension 7

Dimension 7 contrasts individuals with a strongly positive coordinate on the axis (to the right of the graph) against individuals with a strongly negative coordinate on the axis (to the left of the graph).

Group 1 (characterized by a positive coordinate on the axis) shares:
- high values for variables such as education, O5, age, O2, C4, C5, A1, E3, E4, and A5 (variables are sorted from the strongest).
- low values for the variables gender, A2, N2, and N3 (variables are sorted from the weakest).

Group 2 (characterized by a negative coordinate on the axis) shares:
- high values for the variables gender, N5, and O3 (variables are sorted from the strongest).
- low values for variables such as age, O2, education, O5, E1, A1, C4, N1, O1, and N4 (variables are sorted from the weakest).

4. Classification

The classification of individuals reveals 4 clusters.

Cluster 1 is made of individuals sharing:
- high values for the variables A4 and N2 (variables are sorted from the strongest).
- low values for the variables X, C5, education, and A1 (variables are sorted from the weakest).

Cluster 2 is made of individuals sharing:
- high values for the variable C5.
- low values for the variables X, A4, E3, and C1 (variables are sorted from the weakest).

Cluster 3 is made of individuals sharing:
- high values for the variables X, E4, E3, E5, A1, O3, A5, and education (variables are sorted from the strongest).

Cluster 4 is made of individuals sharing:
- high values for the variables X, E2, and C2 (variables are sorted from the strongest).
- low values for the variables E5 and N2 (variables are sorted from the weakest).

The hierarchical tree can be drawn on the factorial map with the individuals colored according to their clusters.

5. Discussions and Conclusions

This study explored the dimensions of personality through a comprehensive analysis of the BFI dataset, using descriptive statistical techniques and principal component analysis (PCA). The results revealed a positive correlation between anger proneness (N1) and irritability (N2), as well as a negative correlation between difficulty in approaching others (E2) and the ability to capture people's attention (E3). Histograms provided further insights, showing that most individuals care about others' feelings and that there is a general, though not complete, satisfaction with work.
Additionally, it emerged that many people deeply investigate a topic, suggesting a high level of curiosity or openness.

The PCA highlighted that the first two dimensions account for only 28.28% of the total inertia, suggesting that human personality is a multidimensional and complex construct. The description of the planes allowed for distinguishing groups with distinct behavioral traits, providing a visual representation of personality dynamics.

Dimensions 3 and 4 provided further insights, distinguishing groups of individuals with distinct behavioral traits based on high and low values in specific variables. The third dimension contrasted individuals with strong positive coordinates, associated with traits such as openness (O1, O3, O4) and conscientiousness (C1), with those with strong negative coordinates, exhibiting opposite tendencies. This suggests that openness and conscientiousness may be key factors in differentiating personality profiles. The fourth dimension further differentiated individuals, with Group 1 showing high values in traits related to openness and educational experience, while Group 2 highlighted a combination of extraversion (E4) and agreeableness (A5). These results underscore the complexity of interactions between personality traits and how they manifest in unique combinations in each individual.

The fifth dimension highlighted differences based on age, education, and gender, suggesting that these demographic factors can influence or be associated with specific personality traits. Specifically, the association between age and variables like O4 (openness to new experiences) and N4 (emotional stability) may indicate how personality evolves or is perceived throughout life. The sixth dimension revealed contrasts between individuals with strong positive coordinates, displaying traits of extraversion and openness, and those with negative coordinates, tending to exhibit traits of neuroticism and introversion. This could reflect how individuals adapt to and interact with their social environment.

Lastly, the seventh dimension shed light on the importance of education and age in personality traits, with one group showing high values in education and openness, while the other group exhibited higher neuroticism and lower extraversion. This suggests that education and life experience can have a significant impact on how people perceive themselves and behave.

In conclusion, this study has expanded our understanding of personality dynamics, highlighting the importance of considering a variety of demographic factors and behavioral traits. The findings have implications for the development of more sophisticated psychometric tools and the customization of interventions in the fields of psychology and well-being. Future research could explore the interactions between these traits in different social and cultural contexts, as well as their impact on individual behavior and decision-making.

We can draw some key conclusions regarding the classification of personality traits:

- Cluster 1: individuals with high values for variables A4 and N2 and low values for variables X, C5, education, and A1. This cluster may represent a group of individuals with specific personality characteristics.
- Cluster 2: individuals with high values for variable C5 and low values for variables X, A4, E3, and C1. This cluster may reflect another personality profile.
- Cluster 3: individuals with high values for variables X, E4, E3, E5, A1, O3, A5, and education. This cluster appears to include individuals with a wide range of personality traits.
- Cluster 4: individuals with high values for variables X, E2, and C2 and low values for variables E5 and N2. This cluster may represent a group with specific traits.

However, it is important to note some general limitations of our study.
Firstly, the results are based on a specific sample of 2800 people in 2010 (Revelle, Wilt, & Rosenthal, 2010). Secondly, the analysis was conducted using specific statistical methods and tools (PCA, clustering, and histograms), which may affect the generalizability of the results to different contexts. Lastly, the BFI dataset from the psych package may have certain peculiarities that should be considered when applying the results to other situations.

Declarations

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Link to OSF is provided in the article.

References

Argüelles, M., Benavides, C., & Fernández, I. (2014). A new approach to the identification of regional clusters: hierarchical clustering on principal components. Applied Economics, 46(21), 2511-2519. https://doi.org/10.1080/00036846.2014.904491

Goldberg, L. R. (2006). International Personality Item Pool: A Scientific Collaboratory for the Development of Advanced Measures of Personality Traits and Other Individual Differences. Retrieved from IPIP: https://ipip.ori.org/index.htm

Goldberg, L. R., Johnson, J., Eber, H., Hogan, R., Ashton, M., Cloninger, C., & Gough, H. (2006). The international personality item pool and the future of public-domain personality measures. Journal of Research in Personality, 40(1), 84-96. https://doi.org/10.1016/j.jrp.2005.08.007

Husson, F., Monge, A., & Vaissie, P. (2023). Factoshiny: Perform Factorial Analysis from 'FactoMineR' with a Shiny. Retrieved from https://CRAN.R-project.org/package=Factoshiny

Jolliffe, I., & Cadima, J. (2016, April). Principal component analysis: a review and recent developments. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 374, 20150202. https://doi.org/10.1098/rsta.2015.0202

Kabigting, F. (2021, August). The Discovery and Evolution of the Big Five of Personality Traits: A Historical Review. GNOSI: An Interdisciplinary Journal of Human Theory and Praxis, 4(3), 83-100. https://doi.org/10.13140/RG.2.2.13907.40480

Kendall, M. (1938, June). A New Measure of Rank Correlation. Biometrika, 30(1/2), 81-93. https://doi.org/10.2307/2332226

[dataset] Revelle, W., & Condon, D. (2019, February). 25 Personality items representing 5 factors. https://doi.org/10.17605/OSF.IO/K39BG

Revelle, W. (2023). psych: Procedures for Psychological, Psychometric, and Personality Research. Evanston, Illinois, U.S.A. Retrieved from https://CRAN.R-project.org/package=psych

Revelle, W., Wilt, J., & Rosenthal, A. (2010). Individual Differences in Cognition: New Methods for Examining the Personality-Cognition Link. In A. Gruszka, G. Matthews, & B. Szymura (Eds.), Handbook of Individual Differences in Cognition: Attention, Memory, and Executive Control (pp. 27-49). New York, NY: Springer. Retrieved from https://link.springer.com/chapter/10.1007/978-1-4419-1210-7_2

Scott, D. (1979, December). On optimal and data-based histograms. Biometrika, 66(3), 605-610. https://doi.org/10.1093/biomet/66.3.605

Wei, T., & Simko, V. (2021). R package 'corrplot': Visualization of a Correlation Matrix. Retrieved from https://github.com/taiyun/corrplot

Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. Retrieved from https://ggplot2.tidyverse.org

Additional Declarations

The authors declare no competing interests.

Supplementary Files

Appendix.docx
Author information

Flavio Gioia (corresponding author), Libera Università Maria SS. Assunta. ORCID: https://orcid.org/0009-0000-0326-3840

Figure legends

Figure 1. Plot correlation matrix on dataset variables and ordinal variables education and age.
Figure 2. Indifference level histogram.
Figure 3. Histogram of job satisfaction level.
Figure 4. Histogram of how easily people make friends.
Figure 5. Histogram of how many people investigate a topic deeply.
Figure 6. Histogram of how many people waste time.
Figure 7. Decomposition of the total inertia.
Figure 8. Individual Factor Map (PCA). The labeled individuals are the ones with the greatest contribution to the construction of the plane.
Figure 9. Variable Factor Map (PCA). The labeled variables are the ones best shown on the plane.
Figure 10. Individual Factor Map (PCA). The labeled individuals are the ones with the greatest contribution to the construction of the plane.
Figure 11. Variable Factor Map (PCA). The labeled variables are the ones best shown on the plane.
Figure 12. Factorial Map of Individuals (PCA). The labeled individuals are the ones with the greatest contribution to the construction of the plane.
Figure 13. Variable Factor Map (PCA). The labeled variables are the ones that are best shown on the plane.
Figure 14. Factorial Map of Individuals (PCA). The labeled individuals are the ones with the greatest contribution to the construction of the
plan.\u003c/p\u003e","description":"","filename":"14.png","url":"https://assets-eu.researchsquare.com/files/rs-4232726/v1/dd37b854651ad5d1126fa156.png"},{"id":54361307,"identity":"7d67c607-e252-4ce9-80de-60427df88ff0","added_by":"auto","created_at":"2024-04-09 11:15:05","extension":"png","order_by":15,"title":"Figure 15","display":"","copyAsset":false,"role":"figure","size":57918,"visible":true,"origin":"","legend":"\u003cp\u003eVariable Factor Map (PCA) The labeled variables are the ones that are best shown on the plane.\u003c/p\u003e","description":"","filename":"15.png","url":"https://assets-eu.researchsquare.com/files/rs-4232726/v1/c15defe8628697748490a24a.png"},{"id":54361310,"identity":"42345155-5629-4712-8d27-3500c1c1f57d","added_by":"auto","created_at":"2024-04-09 11:15:05","extension":"png","order_by":16,"title":"Figure 16","display":"","copyAsset":false,"role":"figure","size":58498,"visible":true,"origin":"","legend":"\u003cp\u003eHierarchical tree\u003c/p\u003e","description":"","filename":"16.png","url":"https://assets-eu.researchsquare.com/files/rs-4232726/v1/7ce034f06af324fc44923257.png"},{"id":54361306,"identity":"cfdabb7a-4ee2-40bb-bffa-01c32b8f6c96","added_by":"auto","created_at":"2024-04-09 11:15:05","extension":"png","order_by":17,"title":"Figure 17","display":"","copyAsset":false,"role":"figure","size":95828,"visible":true,"origin":"","legend":"\u003cp\u003eAscending Hierarchical Classification of the individuals.\u003c/p\u003e","description":"","filename":"17.png","url":"https://assets-eu.researchsquare.com/files/rs-4232726/v1/bacf26e3baaf411ac0ff946e.png"},{"id":54361309,"identity":"4f2955e9-0b3b-4cff-8f04-0055433dddab","added_by":"auto","created_at":"2024-04-09 11:15:05","extension":"png","order_by":18,"title":"Figure 18","display":"","copyAsset":false,"role":"figure","size":24230,"visible":true,"origin":"","legend":"\u003cp\u003eHierarchical tree on the factorial 
map.\u003c/p\u003e","description":"","filename":"18.png","url":"https://assets-eu.researchsquare.com/files/rs-4232726/v1/7cb4db926343d925d0f3a91b.png"},{"id":54362398,"identity":"792f039b-5885-455a-98cf-f980dd2cc5ca","added_by":"auto","created_at":"2024-04-09 11:31:06","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1502708,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4232726/v1/6f0acca1-aa54-4a1d-90fa-459859b63fc0.pdf"},{"id":54361298,"identity":"13d5227a-700e-4abf-9129-295d66b7a8ae","added_by":"auto","created_at":"2024-04-09 11:15:04","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":16929,"visible":true,"origin":"","legend":"","description":"","filename":"Appendix.docx","url":"https://assets-eu.researchsquare.com/files/rs-4232726/v1/9f0b03a94ac4839d6b31f144.docx"}],"financialInterests":"The authors declare no competing interests.","formattedTitle":"\u003cp\u003eExploration of the Big Five: Dimensionality Reduction and Clustering Techniques\u003c/p\u003e","fulltext":[{"header":"1. Introduction","content":"\u003cp\u003eIn this work, I analysed the BFI dataset (Revelle \u0026amp; Condon, 2019) included in the psych package of R (Revelle, psych: Procedures for Psychological, Psychometric, and Personality Research, 2023). The dataset contains 25 personality assessment items from the International Personality Item Pool (Goldberg, et al., \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2006\u003c/span\u003e). 
The A columns measure agreeableness, with items such as A1 (\u0026ldquo;indifference to the feelings of others\u0026rdquo;) and A2 (\u0026ldquo;inquire about the well-being of others\u0026rdquo;); the C columns measure conscientiousness, understood as a sense of duty, with C1 (\u0026ldquo;Am exacting in my work\u0026rdquo;) and C2 (\u0026ldquo;continues until everything is perfect\u0026rdquo;); the E columns measure extraversion, with E1 (\u0026ldquo;Don't talk a lot\u0026rdquo;) and E2 (\u0026ldquo;Find it difficult to approach others\u0026rdquo;); the N columns measure neuroticism, with N1 indicating a \u0026ldquo;tendency to get angry easily\u0026rdquo;; finally, the O columns measure openness, where for example O4 is \u0026ldquo;spend time reflecting about things\u0026rdquo; (Kabigting, \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). Item data were collected using a 6-point response scale: 1 = Very inaccurate, 2 = Moderately inaccurate, 3 = Slightly inaccurate, 4 = Slightly accurate, 5 = Moderately accurate, 6 = Very accurate.\u003c/p\u003e"},{"header":"2. Materials and methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\n \u003ch2\u003e2.1 Dataset\u003c/h2\u003e\n \u003cp\u003eWe used the BFI dataset, which includes 25 rating items taken from the International Personality Item Pool. These items measure five personality traits: agreeableness, conscientiousness, extraversion, neuroticism, and openness. 
The data was collected using a 6-point response scale.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec4\" class=\"Section2\"\u003e\n \u003ch2\u003e2.2 Statistical Analysis\u003c/h2\u003e\n \u003cdiv id=\"Sec5\" class=\"Section3\"\u003e\n \u003ch2\u003e2.2.1 Principal Component Analysis (PCA)\u003c/h2\u003e\n \u003cp\u003ePCA is a dimensionality reduction technique that allows us to identify the main components (or axes) of variation in the data (Jolliffe \u0026amp; Cadima, \u003cspan class=\"CitationRef\"\u003e2016\u003c/span\u003e). The formula to calculate the principal components is as follows:\u003c/p\u003e\n \u003cdiv id=\"Equa\" class=\"Equation\"\u003e\n \u003cdiv class=\"mathdisplay\" id=\"FileID_Equa\" name=\"EquationSource\"\u003e$$P{C}_{i}=X*{v}_{i}$$\u003c/div\u003e\n \u003c/div\u003e\n \u003cp\u003eWhere:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u0026nbsp;\u003cspan class=\"mathinline\"\u003e\$P{C}_{i}:\$\u003c/span\u003e\u0026nbsp;\u003c/span\u003e (i)-th principal component;\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u0026nbsp;\u003cspan class=\"mathinline\"\u003e\$X\$\u003c/span\u003e\u0026nbsp;\u003c/span\u003e: data matrix (observations in rows, variables in columns);\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u0026nbsp;\u003cspan class=\"mathinline\"\u003e\${v}_{i}\$\u003c/span\u003e\u0026nbsp;\u003c/span\u003e: weight vector associated with the (i)-th principal component.\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Sec6\" class=\"Section3\"\u003e\n \u003ch2\u003e2.2.2 Histograms\u003c/h2\u003e\n \u003cp\u003eVisualization of the distribution of personality data.\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003eCreate histograms for each trait (agreeableness, conscientiousness, etc.);\u003c/p\u003e\n 
\u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eAssess the variability and identify any patterns (Scott, \u003cspan class=\"CitationRef\"\u003e1979\u003c/span\u003e).\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Sec7\" class=\"Section3\"\u003e\n \u003ch2\u003e2.2.3 Hierarchical Clustering on Principal Components (HCPC)\u003c/h2\u003e\n \u003cp\u003eThe number of clusters is chosen using the relative loss of inertia (Arg\u0026uuml;elles, Benavides, \u0026amp; Fern\u0026aacute;ndez, \u003cspan class=\"CitationRef\"\u003e2014\u003c/span\u003e). Here\u0026apos;s how it works:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003eCalculation of inertia: Inertia is a measure of the dispersion of points within a cluster. The higher the inertia, the greater the dispersion. The total within-cluster inertia is the sum of the inertias of all clusters;\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eCalculation of inertia for each partition: For each partition (i.e., a specific number of clusters), we calculate the total inertia within those clusters;\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eRelative loss of inertia: The relative loss of inertia between two consecutive partitions is given by: \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\frac{Inerti{a}_{n}-Inerti{a}_{n+1}}{Inerti{a}_{n}}\$\u003c/span\u003e\u003c/span\u003e, where (n) represents the number of clusters.\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cp\u003eChoice of the number of clusters: The suggested partition is the one with the highest relative loss of inertia. 
In other words, we select the number of clusters that maximizes the reduction of inertia compared to the previous partition.\u003c/p\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Sec8\" class=\"Section3\"\u003e\n \u003ch2\u003e2.2.4 Kendall Rank Correlation (Kendall\u0026rsquo;s Tau)\u003c/h2\u003e\n \u003cp\u003eThe Kendall correlation is a statistical index used to quantify the relationship between two ordinal variables. Specifically, it measures how similarly the two variables rank the same set of observations (Kendall, \u003cspan class=\"CitationRef\"\u003e1938\u003c/span\u003e).\u003c/p\u003e\n \u003cp\u003eThe formula to calculate the Kendall correlation (often abbreviated as \u0026tau;) is as follows:\u003c/p\u003e\n \u003cdiv id=\"Equb\" class=\"Equation\"\u003e\n \u003cdiv class=\"mathdisplay\" id=\"FileID_Equb\" name=\"EquationSource\"\u003e$$\\tau =\\frac{C-D}{C+D}$$\u003c/div\u003e\n \u003c/div\u003e\n \u003cp\u003eWhere:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003e(C) represents the number of concordant pairs;\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003e(D) represents the number of discordant pairs.\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cp\u003eA pair of observations is concordant if the two observations are ranked in the same order on both variables. Conversely, the pair is discordant if the ranking order differs between the two variables.\u003c/p\u003e\n \u003cp\u003eThe Kendall correlation coefficient takes values in the range \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$(-1\\le \\tau \\le 1)\$\u003c/span\u003e\u003c/span\u003e. 
Here is what the values mean:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u0026nbsp;\u003cspan class=\"mathinline\"\u003e\$(\\tau =1)\$\u003c/span\u003e\u0026nbsp;\u003c/span\u003e: Perfect positive association.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u0026nbsp;\u003cspan class=\"mathinline\"\u003e\$\\left(\\tau =-1\\right)\$\u003c/span\u003e\u0026nbsp;\u003c/span\u003e: Perfect negative association.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u0026nbsp;\u003cspan class=\"mathinline\"\u003e\$(\\tau =0)\$\u003c/span\u003e\u0026nbsp;\u003c/span\u003e: No association.\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cp\u003eThe Kendall correlation is particularly useful when working with ordinal data or when the relationships between variables are non-linear. In this study, the Kendall correlation was used to build a correlation matrix of the personality items. This index allowed us to assess the association between the items in a robust and non-parametric manner.\u003c/p\u003e\n \u003c/div\u003e\n\u003c/div\u003e"},{"header":"3. Results and discussions","content":"\u003cdiv id=\"Sec10\" class=\"Section2\"\u003e\n \u003ch2\u003e3.1 Descriptive statistics\u003c/h2\u003e\n \u003cp\u003eWe first plot a correlation matrix of the variables using the Kendall method, with circles used for visualization.\u003c/p\u003e\n \u003cp\u003eThe following graph is produced:\u003c/p\u003e\n \u003cp\u003eRed circles indicate a strong positive correlation between two variables, while blue circles indicate a negative correlation (that is, as one value increases, the other decreases). We can see a relatively strong correlation between N1 (Get angry easily) and N2 (Get irritated easily): the value is 0.68, which is notable but not very strong. 
There is a negative correlation, although not a strong one, between E2 (Find it difficult to approach others) and E3 (Know how to captivate people): the value is -0.44. The correlation of 0.68 between N1 (Get angry easily) and N2 (Get irritated easily) suggests consistency in how individuals prone to anger react to frustrating situations. The negative correlation of -0.44 indicates that individuals who find it difficult to approach others tend not to be the ones who know how to capture people\u0026apos;s attention, suggesting different styles of social interaction.\u003c/p\u003e\n \u003cp\u003eWe now turn to histograms of individual items, starting with indifference to the feelings of others.\u003c/p\u003e\n \u003cp\u003eThis chart shows that the majority of people chose 1, which means \u0026quot;very inaccurate\u0026quot;. So, according to this graph, the majority of people seem to care about other people\u0026apos;s feelings, as the second bar also shows.\u003c/p\u003e\n \u003cp\u003eNext is the histogram of the level of satisfaction with one\u0026apos;s job.\u003c/p\u003e\n \u003cp\u003eAccording to this result, the majority of people chose \u0026quot;5\u0026quot;, so the majority of people are satisfied with their work, but not perfectly.\u003c/p\u003e\n \u003cp\u003eNext is the histogram of how easily people make friends.\u003c/p\u003e\n \u003cp\u003eNext is the histogram of how many people investigate a topic deeply; recall that this variable had the highest average of all.\u003c/p\u003e\n \u003cp\u003eMore people appear to report, with strong conviction, that they investigate a topic in depth.\u003c/p\u003e\n \u003cp\u003eFinally, the histogram of the variable with the highest standard deviation.\u003c/p\u003e\n \u003cp\u003eThe majority of people are concentrated on 4, i.e., \u0026quot;slightly accurate\u0026quot;, although we also note the high variability.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e\n 
\u003ch2\u003e3.2 Principal Components\u003c/h2\u003e\n \u003cdiv id=\"Sec12\" class=\"Section3\"\u003e\n \u003ch2\u003e3.2.1. Inertia Distribution\u003c/h2\u003e\n \u003cp\u003eThe inertia of the first dimensions shows whether there are strong relationships between the variables and suggests the number of dimensions that should be studied.\u003c/p\u003e\n \u003cp\u003eThe first two dimensions of the analysis express \u003cstrong\u003e28.28%\u003c/strong\u003e of the total inertia of the dataset; this means that 28.28% of the total variability of the cloud of individuals (or variables) is explained by the plane. This is a small percentage, and the first plane represents only a fraction of the variability in the data. This value is nevertheless greater than the reference value of \u003cstrong\u003e8.68%\u003c/strong\u003e, so the variability explained by this plane is significant (the reference value is the 0.95 quantile of the distribution of inertia percentages obtained by simulating 1565 tables of data of equivalent size on the basis of a normal distribution).\u003c/p\u003e\n \u003cp\u003eFrom these observations, it is interesting to consider the successive dimensions that also express a high percentage of the total inertia.\u003c/p\u003e\n \u003cp\u003eAn estimate of the right number of axes to be interpreted suggests restricting the analysis to the description of the first 7 axes. These axes have a greater amount of inertia than the 0.95 quantile of the random distributions (57.32% vs. 28.91%). This observation suggests that only these axes carry real information. As a result, the description is restricted to these axes.\u003c/p\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Sec13\" class=\"Section3\"\u003e\n \u003ch2\u003e3.2.2. 
1:2 Plan Description\u003c/h2\u003e\n \u003cp\u003eDimension \u003cstrong\u003e1\u003c/strong\u003e contrasts individuals with a strongly positive coordinate on the axis (to the right of the graph) against individuals with a strongly negative coordinate on the axis (to the left of the graph).\u003c/p\u003e\n \u003cp\u003eGroup 1 (characterized by a positive coordinate on the axis) shares:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003ehigh values for variables such as \u003cem\u003eE4, A5, A4, A3, E5, C3, C1, A2, E3\u003c/em\u003e, and \u003cem\u003eage\u003c/em\u003e (variables are sorted by strongest).\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003elow values for variables such as \u003cem\u003eN2, N3, N4, N1\u003c/em\u003e, C5, N5, E2, C4, E1, \u003cem\u003eand\u003c/em\u003e O2 \u003cem\u003e(variables are ordered by the weakest).\u003c/em\u003e\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cp\u003eGroup 2 (characterized by a negative coordinate on the axis) shares:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003ehigh values for variables such as \u003cem\u003eE2, E1\u003c/em\u003e, C4, C5, N4, \u003cem\u003eA1\u003c/em\u003e, O5, \u003cem\u003eN3, N2, and O2\u003c/em\u003e (\u003cem\u003evariables are ordered by the strongest).\u003c/em\u003e\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003elow values for variables such as \u003cem\u003eE4, A3, E3, A5, E5\u003c/em\u003e, A2, A4, \u003cem\u003eO3\u003c/em\u003e, C2, and \u003cem\u003eC1\u003c/em\u003e (variables are sorted by the weakest).\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cp\u003eDimension \u003cstrong\u003e2\u003c/strong\u003e contrasts individuals with a strongly positive coordinate on the axis (toward the top of the graph) against individuals with a strongly negative coordinate on the axis (toward the bottom of the graph).\u003c/p\u003e\n \u003cp\u003eGroup 1 (characterized by a positive coordinate on the axis) 
shares:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003ehigh values for variables such as N2, N3, N1, N5, N4, E3, \u003cem\u003eA2\u003c/em\u003e, A3, \u003cem\u003egender\u003c/em\u003e, \u003cem\u003eand\u003c/em\u003e E5 (variables are ordered by strongest).\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003elow values for the \u003cem\u003eE1\u003c/em\u003e and \u003cem\u003eage\u003c/em\u003e variables (the variables are sorted starting from the weakest).\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cp\u003eGroup 2 (characterized by a negative coordinate on the axis) shares:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003ehigh values for variables such as \u003cem\u003eE2, E1\u003c/em\u003e, C4, C5, N4, \u003cem\u003eA1\u003c/em\u003e, O5, \u003cem\u003eN3, N2, and O2 (variables are ordered by the strongest).\u003c/em\u003e\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003elow values for variables such as \u003cem\u003eE4, A3, E3, A5, E5\u003c/em\u003e, A2, A4, \u003cem\u003eO3\u003c/em\u003e, C2, and \u003cem\u003eC1\u003c/em\u003e (variables are sorted by the weakest).\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cp\u003eGroup 3 (characterized by a negative coordinate on the axis) shares:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003ehigh values for variables such as \u003cem\u003eE4, A5, A4, A3, E5, C3, C1, A2, E3\u003c/em\u003e, and \u003cem\u003eage\u003c/em\u003e (variables are sorted by strongest).\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003elow values for variables such as \u003cem\u003eN2, N3, N4, N1\u003c/em\u003e, C5, N5, E2, C4, E1, \u003cem\u003eand\u003c/em\u003e O2 \u003cem\u003e(variables are ordered by the weakest).\u003c/em\u003e\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Sec14\" class=\"Section3\"\u003e\n \u003ch2\u003e3.2.3. 
3:4 Plan Description\u003c/h2\u003e\n \u003cp\u003eDimension \u003cstrong\u003e3\u003c/strong\u003e contrasts individuals with a strongly positive coordinate on the axis (to the right of the graph) against individuals with a strongly negative coordinate on the axis (to the left of the graph).\u003c/p\u003e\n \u003cp\u003eGroup 1 (characterized by a positive coordinate on the axis) shares:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003ehigh values for variables such as O1, \u003cem\u003eO3, O4, C1\u003c/em\u003e, education, N4, E2, A1, C2, \u003cem\u003eand E1 (variables are ordered by strongest).\u003c/em\u003e\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003elow values for variables O5, O2, \u003cem\u003egender, A4, E4\u003c/em\u003e, A3, A2, A5, \u003cem\u003eN5\u003c/em\u003e and \u003cem\u003eC4\u003c/em\u003e (variables are ordered by the weakest).\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cp\u003eGroup 2 (characterized by a negative coordinate on the axis) shares:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003ehigh values for variables C4, \u003cem\u003eC5, E4, A5, E3\u003c/em\u003e, A3, O3, A\u003cem\u003e2\u003c/em\u003e and \u003cem\u003eO1\u003c/em\u003e (variables are ordered by the strongest).\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003elow values for variables such as C2, C3, C1, \u003cem\u003eE1, E2, A1\u003c/em\u003e, N3, N2, N1, and \u003cem\u003eN5 (variables are sorted by the weakest).\u003c/em\u003e\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cp\u003eGroup 3 (characterized by a negative coordinate on the axis) shares:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003ehigh values for variables such as \u003cem\u003eO5, O2, sex\u003c/em\u003e, A4, C3, C2, N5, \u003cem\u003eA2\u003c/em\u003e, E1 \u003cem\u003eand A3\u003c/em\u003e (variables are ordered by strongest).\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003elow values 
for variables O3, \u003cem\u003eO1, C5, O4\u003c/em\u003e, C4, E3, \u003cem\u003eeducation\u003c/em\u003e, N4, \u003cem\u003eand age\u003c/em\u003e (variables are sorted by the weakest).\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cp\u003eDimension \u003cstrong\u003e4\u003c/strong\u003e contrasts individuals with a strongly positive coordinate on the axis (toward the top of the graph) against individuals with a strongly negative coordinate on the axis (toward the bottom of the graph).\u003c/p\u003e\n \u003cp\u003eGroup 1 (characterized by a positive coordinate on the axis) shares:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003ehigh values for variables such as \u003cem\u003eO5, O2, sex\u003c/em\u003e, A4, C3, C2, N5, \u003cem\u003eA2\u003c/em\u003e, E1 \u003cem\u003eand A3\u003c/em\u003e (variables are ordered by strongest).\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003elow values for variables O3, \u003cem\u003eO1, C5, O4\u003c/em\u003e, C4, E3, \u003cem\u003eeducation\u003c/em\u003e, N4, \u003cem\u003eand age\u003c/em\u003e (variables are sorted by the weakest).\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cp\u003eGroup 2 (characterized by a negative coordinate on the axis) shares:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003ehigh values for variables C4, \u003cem\u003eC5, E4, A5, E3\u003c/em\u003e, A3, O3, A\u003cem\u003e2\u003c/em\u003e and \u003cem\u003eO1\u003c/em\u003e (variables are ordered by the strongest).\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003elow values for variables such as C2, C3, C1, \u003cem\u003eE1, E2, A1\u003c/em\u003e, N3, N2, N1, and \u003cem\u003eN5 (variables are sorted by the weakest).\u003c/em\u003e\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cp\u003eGroup 3 (characterized by a negative coordinate on the axis) shares:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003ehigh values for variables such as O1, 
\u003cem\u003eO3, O4, C1\u003c/em\u003e, education, N4, E2, A1, C2, \u003cem\u003eand E1 (variables are ordered by strongest).\u003c/em\u003e\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003elow values for variables O5, O2, \u003cem\u003egender, A4, E4\u003c/em\u003e, A3, A2, A5, \u003cem\u003eN5\u003c/em\u003e and \u003cem\u003eC4\u003c/em\u003e (variables are ordered by the weakest).\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Sec15\" class=\"Section3\"\u003e\n \u003ch2\u003e3.2.4. 5:6 Plan Description\u003c/h2\u003e\n \u003cp\u003eDimension \u003cstrong\u003e5\u003c/strong\u003e contrasts individuals with a strongly positive coordinate on the axis (to the right of the graph) against individuals with a strongly negative coordinate on the axis (to the left of the graph).\u003c/p\u003e\n \u003cp\u003eGroup 1 (characterized by a positive coordinate on the axis) shares:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003ehigh values for variables such as \u003cem\u003eage\u003c/em\u003e, education, gender, A2, \u003cem\u003eO4, N4\u003c/em\u003e, E2, \u003cem\u003eN5\u003c/em\u003e, A3, and \u003cem\u003eC5\u003c/em\u003e (variables are ordered by strongest).\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003elow values for variables such as \u003cem\u003eA1, E3, E4, O5, O2\u003c/em\u003e, O1, \u003cem\u003eE5\u003c/em\u003e, C4, \u003cem\u003eC\u003c/em\u003e2, \u003cem\u003eand N1\u003c/em\u003e (variables are ordered by the weakest).\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cp\u003eGroup 2 (characterized by a negative coordinate on the axis) shares:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003ehigh values for variables A1, E5, \u003cem\u003eN1, N2\u003c/em\u003e, \u003cem\u003eE4\u003c/em\u003e, E3 \u003cem\u003eand\u003c/em\u003e O5 \u003cem\u003e(variables are ordered by the strongest).\u003c/em\u003e\u003c/p\u003e\n \u003c/li\u003e\n 
\u003cli\u003e\n \u003cp\u003elow values for variables such as \u003cem\u003eO4, A2\u003c/em\u003e, E1, E2, \u003cem\u003eA3, N4\u003c/em\u003e, age, \u003cem\u003eeducation\u003c/em\u003e, A5 \u003cem\u003eand A4\u003c/em\u003e (variables are ordered by the weakest).\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cp\u003eGroup 3 (characterized by a negative coordinate on the axis) shares:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003ehigh values for variables such as E1, E3, O4, A5, A1, \u003cem\u003eC4, O1\u003c/em\u003e, O2, A3, \u003cem\u003eand E2\u003c/em\u003e (variables are ordered by strongest).\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003elow values for the variables \u003cem\u003eage\u003c/em\u003e, education, sex, \u003cem\u003eN2, N1 and\u003c/em\u003e E5 \u003cem\u003e(the variables are ordered by the weakest).\u003c/em\u003e\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cp\u003eDimension \u003cstrong\u003e6\u003c/strong\u003e contrasts individuals with a strongly positive coordinate on the axis (toward the top of the graph) against individuals with a strongly negative coordinate on the axis (toward the bottom of the graph).\u003c/p\u003e\n \u003cp\u003eGroup 1 (characterized by a positive coordinate on the axis) shares:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003ehigh values for variables such as E1, E3, O4, A5, A1, \u003cem\u003eC4, O1\u003c/em\u003e, O2, A3, \u003cem\u003eand E2\u003c/em\u003e (variables are ordered by strongest).\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003elow values for the variables \u003cem\u003eage\u003c/em\u003e, education, sex, \u003cem\u003eN2, N1 and\u003c/em\u003e E5 \u003cem\u003e(the variables are ordered by the weakest).\u003c/em\u003e\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cp\u003eGroup 2 (characterized by a negative coordinate on the axis) shares:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n 
\u003cp\u003ehigh values for variables A1, E5, \u003cem\u003eN1, N2\u003c/em\u003e, \u003cem\u003eE4\u003c/em\u003e, E3 \u003cem\u003eand\u003c/em\u003e O5 \u003cem\u003e(variables are ordered by the strongest).\u003c/em\u003e\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003elow values for variables such as \u003cem\u003eO4, A2\u003c/em\u003e, E1, E2, \u003cem\u003eA3, N4\u003c/em\u003e, age, \u003cem\u003eeducation\u003c/em\u003e, A5 \u003cem\u003eand A4\u003c/em\u003e (variables are ordered by the weakest).\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cp\u003eGroup 3 (characterized by a negative coordinate on the axis) shares:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003ehigh values for variables such as \u003cem\u003eage\u003c/em\u003e, education, gender, A2, \u003cem\u003eO4, N4\u003c/em\u003e, E2, \u003cem\u003eN5\u003c/em\u003e, A3, and \u003cem\u003eC5\u003c/em\u003e (variables are ordered by strongest).\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003elow values for variables such as \u003cem\u003eA1, E3, E4, O5, O2\u003c/em\u003e, O1, \u003cem\u003eE5\u003c/em\u003e, C4, \u003cem\u003eC\u003c/em\u003e2, \u003cem\u003eand N1\u003c/em\u003e (variables are ordered by the weakest).\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Sec16\" class=\"Section3\"\u003e\n \u003ch2\u003e3.2.5. 
Description of dimension 7\u003c/h2\u003e\n \u003cp\u003eDimension \u003cstrong\u003e7\u003c/strong\u003e contrasts individuals with a strongly positive coordinate on the axis (to the right of the graph) against individuals with a strongly negative coordinate on the axis (to the left of the graph).\u003c/p\u003e\n \u003cp\u003eGroup 1 (characterized by a positive coordinate on the axis) shares:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003ehigh values for variables such as \u003cem\u003eeducation\u003c/em\u003e, O5, age, O2, \u003cem\u003eC4, C5, A1, E3\u003c/em\u003e, \u003cem\u003eE4\u003c/em\u003e, and \u003cem\u003eA5\u003c/em\u003e (variables are ordered by strongest).\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003elow values for the variables \u003cem\u003esex\u003c/em\u003e, A2, N2 \u003cem\u003eand\u003c/em\u003e N3 (variables are ordered by the weakest).\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cp\u003eGroup 2 (characterized by a negative coordinate on the axis) shares:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003ehigh values for the variables \u003cem\u003egender\u003c/em\u003e, \u003cem\u003eN5\u003c/em\u003e and \u003cem\u003eO3\u003c/em\u003e (variables are ordered by strongest).\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003elow values for variables such as \u003cem\u003eage, O2, education\u003c/em\u003e, O5, E1, \u003cem\u003eA1, C4\u003c/em\u003e, N1, O1, \u003cem\u003eand\u003c/em\u003e N4 (variables are ordered by the weakest).\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003c/div\u003e\n\u003c/div\u003e"},{"header":"4. 
Classification","content":"\u003cp\u003e \u003c/p\u003e \u003cp\u003eThe classification of individuals reveals 4 clusters.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e\u003cb\u003eCluster 1\u003c/b\u003e consists of individuals sharing:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003ehigh values for the variables \u003cem\u003eA4\u003c/em\u003e and \u003cem\u003eN2\u003c/em\u003e (variables are sorted from the strongest).\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003elow values for the variables \u003cem\u003eX\u003c/em\u003e, \u003cem\u003eC5\u003c/em\u003e, \u003cem\u003eeducation\u003c/em\u003e and \u003cem\u003eA1\u003c/em\u003e (variables are sorted from the weakest).\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003e\u003cb\u003eCluster 2\u003c/b\u003e consists of individuals sharing:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003ehigh values for the variable \u003cem\u003eC5\u003c/em\u003e.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003elow values for the variables \u003cem\u003eX\u003c/em\u003e, \u003cem\u003eA4\u003c/em\u003e, \u003cem\u003eE3\u003c/em\u003e and \u003cem\u003eC1\u003c/em\u003e (variables are sorted from the weakest).\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003e\u003cb\u003eCluster 3\u003c/b\u003e consists of individuals sharing:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003ehigh values for the variables \u003cem\u003eX\u003c/em\u003e, \u003cem\u003eE4\u003c/em\u003e, \u003cem\u003eE3\u003c/em\u003e, \u003cem\u003eE5\u003c/em\u003e, \u003cem\u003eA1\u003c/em\u003e, \u003cem\u003eO3\u003c/em\u003e, \u003cem\u003eA5\u003c/em\u003e and \u003cem\u003eeducation\u003c/em\u003e (variables are sorted from the strongest).\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003e\u003cb\u003eCluster 4\u003c/b\u003e consists of individuals 
sharing:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003ehigh values for the variables \u003cem\u003eX\u003c/em\u003e, \u003cem\u003eE2\u003c/em\u003e and \u003cem\u003eC2\u003c/em\u003e (variables are sorted from the strongest).\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003elow values for the variables \u003cem\u003eE5\u003c/em\u003e and \u003cem\u003eN2\u003c/em\u003e (variables are sorted from the weakest).\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe hierarchical tree can be drawn on the factorial map with the individuals colored according to their clusters.\u003c/p\u003e"},{"header":"5. Discussions and Conclusions","content":"\u003cp\u003eThis study explored the dimensions of personality through a comprehensive analysis of the BFI dataset, using descriptive statistical techniques and principal component analysis (PCA). The results revealed significant correlations between various personality traits, such as anger proneness (N1) and irritability (N2), as well as negative correlations between difficulty in approaching others (E2) and the ability to capture people's attention (E3).\u003c/p\u003e\n\u003cp\u003eHistograms provided further insights, showing that most individuals care about others' feelings and that there is general, though not complete, satisfaction with work. Additionally, it emerged that many people deeply investigate a topic, suggesting a high level of curiosity or openness.\u003c/p\u003e\n\u003cp\u003eThe PCA showed that the first two dimensions account for only 28.28% of the total inertia, suggesting that human personality is a multidimensional and complex construct. 
The description of the planes allowed for distinguishing groups with distinct behavioral traits, providing a visual representation of personality dynamics.\u003c/p\u003e\n\u003cp\u003eDimensions 3 and 4 provided further insights, distinguishing groups of individuals with distinct behavioral traits based on high and low values in specific variables. The third dimension contrasted individuals with strong positive coordinates, associated with traits such as openness (O1, O3, O4) and conscientiousness (C1), with those with strong negative coordinates, exhibiting opposite tendencies. This suggests that openness and conscientiousness may be key factors in differentiating personality profiles.\u003c/p\u003e\n\u003cp\u003eThe fourth dimension further differentiated individuals, with Group 1 showing high values in traits related to openness and educational experience, while Group 2 highlighted a combination of extraversion (E4) and agreeableness (A5). These results underscore the complexity of interactions between personality traits and how they manifest in unique combinations in each individual.\u003c/p\u003e\n\u003cp\u003eThe fifth dimension highlighted differences based on age, education, and gender, suggesting that these demographic factors can influence or be associated with specific personality traits. Specifically, the association between age and variables like O4 (openness to new experiences) and N4 (emotional stability) may indicate how personality evolves or is perceived throughout life.\u003c/p\u003e\n\u003cp\u003eThe sixth dimension revealed contrasts between individuals with strong positive coordinates, displaying traits of extraversion and openness, and those with negative coordinates, tending to exhibit traits of neuroticism and introversion. 
This could reflect how individuals adapt to and interact with their social environment.\u003c/p\u003e\n\u003cp\u003eLastly, the seventh dimension shed light on the importance of education and age in personality traits, with one group showing high values in education and openness, while the other group exhibited higher neuroticism and lower extraversion. This suggests that education and life experience can have a significant impact on how people perceive themselves and behave.\u003c/p\u003e\n\u003cp\u003eIn conclusion, this study has expanded our understanding of personality dynamics, highlighting the importance of considering a variety of demographic factors and behavioral traits. The findings have implications for the development of more sophisticated psychometric tools and the customization of interventions in the fields of psychology and well-being. Future research could explore the interactions between these traits in different social and cultural contexts, as well as their impact on individual behavior and decision-making.\u003c/p\u003e\n\u003cp\u003eWe can draw some key conclusions regarding the classification of personality traits:\u003c/p\u003e\n\u003col\u003e\n \u003cli\u003eCluster 1:\u003cul\u003e\n \u003cli\u003eComprised of individuals with high values for variables A4 and N2.\u003c/li\u003e\n \u003cli\u003eLow values for variables X, C5, education, and A1.\u003c/li\u003e\n \u003cli\u003eThis cluster may represent a group of individuals with specific personality characteristics.\u003c/li\u003e\n \u003c/ul\u003e\n \u003c/li\u003e\n \u003cli\u003eCluster 2:\u003cul\u003e\n \u003cli\u003eComprised of individuals with high values for variable C5.\u003c/li\u003e\n \u003cli\u003eLow values for variables X, A4, E3, and C1.\u003c/li\u003e\n \u003cli\u003eThis cluster may reflect another personality profile.\u003c/li\u003e\n \u003c/ul\u003e\n \u003c/li\u003e\n \u003cli\u003eCluster 3:\u003cul\u003e\n \u003cli\u003eComprised of individuals with high values for 
variables X, E4, E3, E5, A1, O3, A5, and education.\u003c/li\u003e\n \u003cli\u003eThis cluster appears to include individuals with a wide range of personality traits.\u003c/li\u003e\n \u003c/ul\u003e\n \u003c/li\u003e\n \u003cli\u003eCluster 4:\u003cul\u003e\n \u003cli\u003eComprised of individuals with high values for variables X, E2, and C2.\u003c/li\u003e\n \u003cli\u003eLow values for variables E5 and N2.\u003c/li\u003e\n \u003cli\u003eThis cluster may represent a group with specific traits.\u003c/li\u003e\n \u003c/ul\u003e\n \u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eHowever, it is important to note some general limitations of our study. First, the results are based on a specific sample of 2800 people in 2010 (Revelle, Wilt, \u0026amp; Rosenthal, 2010). Second, the analysis was conducted using specific statistical methods and tools (PCA, clustering, and histograms), which may affect the generalizability of the results to different contexts. Lastly, the BFI dataset from the psych package may have certain peculiarities that should be considered when applying the results to other situations.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003eThe authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.\u003c/p\u003e\n\u003ch1\u003eData availability\u003c/h1\u003e\n\u003cp\u003eLink to OSF is provided in the article.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n \u003cli\u003eArg\u0026uuml;elles, M., Benavides, C., \u0026amp; Fern\u0026aacute;ndez, I. (2014). A new approach to the identification of regional clusters: hierarchical clustering on principal components. \u003cem\u003eApplied Economics, 46\u003c/em\u003e(21), 2511-2519. 
doi:https://doi.org/10.1080/00036846.2014.904491\u003c/li\u003e\n \u003cli\u003eGoldberg, L. R. (2006). \u003cem\u003eInternational Personality Item Pool: A Scientific Collaboratory for the Development of Advanced Measures of Personality Traits and Other Individual Differences\u003c/em\u003e. Retrieved from IPIP: https://ipip.ori.org/index.htm\u003c/li\u003e\n \u003cli\u003eGoldberg, L. R., Johnson, J., Eber, H., Hogan, R., Ashton, M., Cloninger, C., \u0026amp; Gough, H. (2006). The international personality item pool and the future of public-domain personality measures. \u003cem\u003eJournal of Research in Personality, 40\u003c/em\u003e(1), 84-96. doi:https://doi.org/10.1016/j.jrp.2005.08.007\u003c/li\u003e\n \u003cli\u003eHusson, F., Monge, A., \u0026amp; Vaissie, P. (2023). Factoshiny: Perform Factorial Analysis from \u0026apos;FactoMineR\u0026apos; with a Shiny. Retrieved from https://CRAN.R-project.org/package=Factoshiny\u003c/li\u003e\n \u003cli\u003eJolliffe, I., \u0026amp; Cadima, J. (2016, 04). Principal component analysis: a review and recent developments. \u003cem\u003ePhilosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 374\u003c/em\u003e, 20150202. doi:https://doi.org/10.1098/rsta.2015.0202\u003c/li\u003e\n \u003cli\u003eKabigting, F. (2021, August). The Discovery and Evolution of the Big Five of Personality Traits: A Historical Review. \u003cem\u003eGNOSI: An Interdisciplinary Journal of Human Theory and Praxis, 4\u003c/em\u003e(3), 83-100. doi:https://doi.org/10.13140/RG.2.2.13907.40480\u003c/li\u003e\n \u003cli\u003eKendall, M. (1938, June). A New Measure of Rank Correlation. \u003cem\u003eBiometrika, 30\u003c/em\u003e(1/2), 81-93. Retrieved from https://doi.org/10.2307/2332226\u003c/li\u003e\n \u003cli\u003e[dataset] Revelle, W., \u0026amp; Condon, D. (2019, February). 25 Personality items representing 5 factors. 
Retrieved from https://doi.org/10.17605/OSF.IO/K39BG\u003c/li\u003e\n \u003cli\u003eRevelle, W. (2023). psych: Procedures for Psychological, Psychometric, and Personality Research. Evanston, Illinois, U.S.A. Retrieved from https://CRAN.R-project.org/package=psych\u003c/li\u003e\n \u003cli\u003eRevelle, W., Wilt, J., \u0026amp; Rosenthal, A. (2010). Individual Differences in Cognition: New Methods for Examining the Personality-Cognition Link. In A. Gruszka, G. Matthews, \u0026amp; B. Szymura (Eds.), \u003cem\u003eHandbook of Individual Differences in Cognition: Attention, Memory, and Executive Control\u003c/em\u003e (pp. 27-49). New York, NY: Springer. Retrieved from https://link.springer.com/chapter/10.1007/978-1-4419-1210-7_2\u003c/li\u003e\n \u003cli\u003eScott, D. (1979, 12). On optimal and data-based histograms.\u0026nbsp;\u003cem\u003eBiometrika\u003c/em\u003e, 605-610. doi:https://doi.org/10.1093/biomet/66.3.605\u003c/li\u003e\n \u003cli\u003eWei, T., \u0026amp; Simko, V. (2021).\u0026nbsp;R package \u0026apos;corrplot\u0026apos;: Visualization of a Correlation Matrix. Retrieved from https://github.com/taiyun/corrplot\u003c/li\u003e\n \u003cli\u003eWickham, H. (2016). \u003cem\u003eggplot2: Elegant Graphics for Data Analysis.\u003c/em\u003e Springer-Verlag New York. Retrieved from https://ggplot2.tidyverse.org\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"Libera Università Maria SS. 
Assunta","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Big Five theory, personality, neuroticism, openness, conscientiousness, extraversion, agreeableness, PCA, Cluster Analysis.","lastPublishedDoi":"10.21203/rs.3.rs-4232726/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-4232726/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eThis study presents a comprehensive analysis of the BFI dataset using the 'psych' package in R. Through principal component analysis (PCA), descriptive statistical techniques such as histograms, and classification, we explored the dimensions of personality based on 25 evaluative items from the International Personality Item Pool. Columns A-E highlight the behavioral traits of agreeableness, conscientiousness, extraversion, neuroticism, and openness, measured on a 6-point response scale. 
The results provide significant insights into the correlations among different traits and offer a richer understanding of human personality dynamics.\u003c/p\u003e","manuscriptTitle":"Exploration of the Big Five: Dimensionality Reduction and Clustering Techniques","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-04-09 11:14:59","doi":"10.21203/rs.3.rs-4232726/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"8223f892-db54-44de-b43d-fa869525068b","owner":[],"postedDate":"April 9th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":30375059,"name":"Psychology"},{"id":30375060,"name":"Applied Statistics"},{"id":30375061,"name":"Statistical Theory"}],"tags":[],"updatedAt":"2024-04-09T11:14:59+00:00","versionOfRecord":[],"versionCreatedAt":"2024-04-09 11:14:59","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-4232726","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-4232726","identity":"rs-4232726","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
