Empirical Examination of Large Language Models in Regional Psychological Structures Simulation: Personality and Well-being

preprint OA: closed
Xing Zhou, Jiahong Zheng

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7665724/v1
This work is licensed under a CC BY 4.0 License
Status: Under Review
Version 1 posted 14

Abstract

The rapid advancement of large language models (LLMs) has opened new avenues for simulating psychological structures at the population level. This study compared the performance of Kimi-Chat v1.5 (a China-trained model) and GPT-4 (a globally trained model) in reproducing regional psychological profiles in China, focusing on the Big-Five personality traits and subjective well-being.
Using the 2018 China Family Panel Studies (CFPS 2018) as the benchmark for real human data, we assessed the fidelity of both LLMs in capturing regional variations across seven major Chinese regions. Results indicated that Kimi-Chat v1.5 more accurately replicated human responses, particularly in regions with distinct cultural characteristics, while GPT-4 showed significant discrepancies, particularly in well-being and openness. Our findings emphasized the importance of training-corpus lineage and suggested that culturally adapted LLMs could be a useful tool in regional psychological research. We discussed the implications of these findings for future applications and highlighted the limitations of current LLM capabilities in simulating human psychological complexity.

Subject areas: Humanities/Cultural and media studies; Social science/Cultural and media studies; Biological sciences/Psychology; Social science/Psychology

Keywords: large language model; big five personality; subjective well-being; regional psychological structure; virtual participants

Introduction

Large language models (LLMs) are exhibiting ever-stronger language comprehension and generation capabilities (Bubeck et al., 2023; Rathje et al., 2024), thereby furnishing computational social psychology with new opportunities for intelligent investigations of collective psychological structures (Ke et al., 2025; Kovač et al., 2023). Empirical work has demonstrated that GPT-4 can generate questionnaire responses that align closely with human self-reports of personality traits (De Winter et al., 2024; A. Wang et al., 2025), implying that LLMs possess a “role-playing” capacity that allows them to impersonate individuals with specified psychological profiles (Mei et al., 2024; Strachan et al., 2024).
Meanwhile, the computational social-science community has begun to explore the use of LLMs to create “virtual participants” in order to reduce research costs and streamline experimental workflows (Bisbee et al., 2024; Dillion et al., 2023; Grossmann et al., 2023; Sarstedt et al., 2024). Yet extant studies overwhelmingly rely on a single foreign model, leaving open the question of whether domestically trained Chinese LLMs and foreign LLMs yield equivalent regional psychological profiles.

Cross-cultural psychology has long maintained that psychological characteristics are not randomly distributed; instead, they are embedded in local socio-economic and cultural ecologies (Talhelm et al., 2014). For instance, Talhelm et al. (2014) found that students from China’s rice-cultivating south displayed more holistic, interdependent cognition than those from the wheat-cultivating north, who exhibited more analytic and independent thinking. Such regional disparities are expected to surface on Big-Five personality scales and subjective well-being measures (Anglim et al., 2020).

The Big-Five model, encompassing openness, conscientiousness, extraversion, agreeableness, and neuroticism, offers a structured framework for capturing stable individual differences (McCrae & John, 1992; Paunonen & Ashton, 2001). Meta-analyses indicate that conscientiousness is associated with lower health-risk behaviors and greater health-promoting behaviors (Bogg & Roberts, 2004). Concurrently, subjective well-being serves as a widely used indicator of life quality and psychological adaptation (Schimmack et al., 2002, 2004). Research further shows that personality–subjective well-being associations are moderated by congruence between personality and life choices, with effects surpassing those of income (Matz et al., 2016).
Consequently, personality traits and subjective well-being jointly form an ideal evaluation perspective, which can be used to determine whether LLMs conform to the characteristics of psychological structures with subtle regional differences.

Traditional surveys, however, are susceptible to social-desirability biases and response-style distortions (Tourangeau & Yan, 2007) and incur substantial logistical costs when spanning multiple regions (Baumeister et al., 2007). LLM-based virtual sampling promises to circumvent these hurdles, yet risks amplifying internet-derived cultural stereotypes (Argyle et al., 2023; Lucy & Bamman, 2021). Whether Kimi-Chat v1.5, a large language model trained on Chinese corpora, outperforms GPT-4 in capturing authentic regional variations remains untested.

In this study, we selected two language models, Kimi-Chat v1.5 and GPT-4, for comparative analysis, primarily based on their significant differences in training corpora, cultural adaptability, and application scenarios. As a model specifically designed for the Chinese language environment and local Chinese culture, Kimi-Chat’s training data mainly came from Chinese internet content and cultural contexts. In contrast, GPT-4 was built on a large-scale multilingual corpus worldwide, covering a broader range of cultures and language types. This difference provided us with an ideal framework to examine the performance of the two models in simulating regional psychological characteristics.

The considerations behind selecting these two models were reflected in the following aspects. First, Kimi-Chat had an inherent advantage in adaptability to local culture, enabling it to better capture the human psychological characteristics and subjective well-being across major macro-regions in China. GPT-4, on the other hand, with its global training corpora and cross-cultural adaptability, provided us with a comparative perspective different from that of Kimi-Chat.
By comparing the performance of these two models in simulating regional differences, we could understand the strengths and limitations of the models in capturing psychological characteristics under different cultural backgrounds. Second, selecting these two models helped us more clearly reveal the potential and limitations of LLMs in capturing complex psychological structures. Human psychological characteristics, especially personality traits and subjective well-being, usually exhibit significant regional differences, which may be closely related to culture, social environment, and historical context. By comparing the models’ outputs with real human data, we could not only test the accuracy of the models in reproducing regional psychological structures but also identify the challenges they might face in understanding and simulating these psychological differences.

Through this comparative study, we aimed to identify the strengths and weaknesses of the models. For example, could Kimi-Chat better simulate the unique regional differences in China? Could GPT-4’s global performance accurately reflect the personality traits and well-being within Chinese regions? The answers to these questions could not only provide valuable insights for cross-cultural psychological research but also promote the further application and development of LLMs in the field of psychology.

The Present Study

Grounded in a four-tier framework (personality traits, subjective well-being, regional culture, and training-corpus lineage), this investigation systematically evaluates the capacity and bias of large language models when simulating population-level psychological structures within China.
Leveraging the 2018 China Family Panel Studies (CFPS2018, N = 37,354; http://www.isss.pku.edu.cn/cfps/) as an invariant demographic scaffold, we construct three parallel samples: human respondents, virtual participants generated by Kimi-Chat v1.5, and virtual participants generated by GPT-4. Kimi-Chat v1.5 is a China-trained model with strong Chinese cultural adaptation and efficient performance in specific domains like education and healthcare. GPT-4 is a globally trained model excelling in multi-language and multi-scenario applications with advanced reasoning and generative capabilities.

The training data of Kimi (the China-trained LLM) include long texts such as everyday language, legal documents, and academic papers, and a hierarchical training strategy strengthens logical coherence in the Chinese context. Its multimodal data cover scenarios such as OCR (Optical Character Recognition) and text-image interleaving, and the model performs better in handling Chinese-specific table structures and cultural allusions. In contrast, the training data of the non-domestic model GPT contain less than 0.1% Chinese language material and mainly rely on general language materials such as English Wikipedia and books, which may lead to structural biases when generating social norms and value expressions that conform to the Chinese cultural background. We therefore chose these two LLMs, Kimi and GPT-4, for this study.

First, we test whether the virtual participants produced by each model faithfully reproduce regional trends in subjective well-being relative to human data. Second, we examine whether regional personality profiles generated by each model align with empirically observed patterns across China’s seven macro-regions. Third, capitalizing on the well-established structural link between Big-Five dimensions and subjective well-being (Anglim et al., 2020; Steel et al., 2008), we assess whether each model replicates this association.
Finally, the design involved two independent 2×7 mixed ANOVA models, namely model (Kimi-Chat v1.5 vs. Human) × region and model (GPT-4 vs. Human) × region, to test the consistency between the outputs of each model and the human data from seven major regions in China. Collectively, integrating large-scale survey data with state-of-the-art generative AI, the current study sets out to evaluate, side by side, whether China-trained Kimi-Chat v1.5 and globally trained GPT-4 can each reproduce authentic regional distributions of Big-Five personality traits and subjective well-being across China’s seven macro-regions.

In this study, we utilized large language models (Kimi-Chat v1.5 and GPT-4) to generate virtual participants for questionnaire testing. Although this approach has been preliminarily explored in recent research (De Winter et al., 2024), the present study introduced three innovations. First, it explicitly compared LLMs with different cultural training backgrounds, a core distinction not addressed in prior preliminary studies. Second, it significantly expanded the sample size and measurement scope. Third, it conducted a comparative analysis between the simulated data generated by LLMs and real human participant data. In essence, our study aimed to provide new tools and theoretical perspectives for regional psychological structure research and to offer a transferable methodological framework for culturally adapted LLM-based psychological assessment research worldwide.

Methods

Real Samples

The real samples are from the China Family Panel Studies (CFPS). The survey was launched in 2010 and is conducted every two years, with five rounds of national surveys having been completed so far. The questionnaire data can be applied for free through the project’s official website (http://www.isss.pku.edu.cn/cfps/). This study uses the personality and subjective well-being data from the adult questionnaire of CFPS2018, with a sample of 4,900 participants.
Participants completed the Big Five personality inventory and the life satisfaction scale as part of the CFPS2018. These data were collected through structured interviews conducted face-to-face or online, depending on the participant’s accessibility. The sample was stratified by region to ensure that each of the seven major Chinese regions was adequately represented:

North China (Beijing, Tianjin, Hebei, Shanxi, and Inner Mongolia)
Northeast China (Liaoning, Jilin, and Heilongjiang)
East China (Shanghai, Jiangsu, Zhejiang, Anhui, Fujian, Jiangxi, and Shandong)
Central China (Henan, Hubei, and Hunan)
South China (Guangdong, Guangxi, and Hainan)
Southwest China (Chongqing, Sichuan, Guizhou, Yunnan, and Tibet)
Northwest China (Shaanxi, Gansu, Qinghai, Ningxia, and Xinjiang)

These seven regional divisions are consistent with common macro-geographical classifications in Chinese social science research (e.g., used in the China Family Panel Studies, CFPS). Random sampling was used to select 700 samples from each region, with an equal number of males and females, and an age range of 18–60 years. Personality was measured using the 15-item Chinese short version of the Big Five Personality Inventory (Hahn et al., 2012), and subjective well-being was assessed using a single-item self-rating: “How happy do you feel yourself to be?” (with 0 indicating the lowest level of subjective well-being and 10 indicating the highest level of subjective well-being).

Virtual Participants

In addition to human participants, virtual participants were generated using Kimi-Chat v1.5 and GPT-4. The virtual participants were created by inputting demographic prompts into these LLMs to mirror the regional and gender distributions found in the real human sample. Each model generated a dataset of 4,900 virtual participants, with 700 individuals from each of the seven regions.
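The stratified draw described for the real sample (700 respondents per region, equal numbers of males and females, ages 18–60) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the record fields `province`, `age`, and `gender`, the function name `stratified_sample`, and the choice of 350 respondents per region-by-gender cell are assumptions made for the example; only the province-to-region mapping comes from the text.

```python
import random
from collections import defaultdict

# Province-to-region mapping as listed in the text.
REGIONS = {
    "North China": ["Beijing", "Tianjin", "Hebei", "Shanxi", "Inner Mongolia"],
    "Northeast China": ["Liaoning", "Jilin", "Heilongjiang"],
    "East China": ["Shanghai", "Jiangsu", "Zhejiang", "Anhui", "Fujian",
                   "Jiangxi", "Shandong"],
    "Central China": ["Henan", "Hubei", "Hunan"],
    "South China": ["Guangdong", "Guangxi", "Hainan"],
    "Southwest China": ["Chongqing", "Sichuan", "Guizhou", "Yunnan", "Tibet"],
    "Northwest China": ["Shaanxi", "Gansu", "Qinghai", "Ningxia", "Xinjiang"],
}
PROVINCE_TO_REGION = {p: r for r, provinces in REGIONS.items() for p in provinces}


def stratified_sample(records, per_cell=350, seed=0):
    """Draw `per_cell` records for every region x gender cell (ages 18-60).

    `records` is a list of dicts with hypothetical keys
    'province', 'age', and 'gender' ('male' or 'female').
    """
    rng = random.Random(seed)
    cells = defaultdict(list)
    for rec in records:
        region = PROVINCE_TO_REGION.get(rec["province"])
        if region is not None and 18 <= rec["age"] <= 60:
            cells[(region, rec["gender"])].append(rec)

    sample = []
    for region in REGIONS:
        for gender in ("male", "female"):
            pool = cells[(region, gender)]
            if len(pool) < per_cell:
                raise ValueError(f"not enough eligible records for {region}/{gender}")
            # Sampling without replacement within the cell.
            sample.extend(rng.sample(pool, per_cell))
    return sample
```

With `per_cell=350`, seven regions and two genders yield 7 × 2 × 350 = 4,900 records, matching the sample size stated in the text.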
The virtual participants responded to the same personality and subjective well-being surveys as the human participants, ensuring comparability across datasets.

First, enter the following prompt in the dialog box of Kimi (https://kimi.moonshot.cn) or GPT-4 (https://openai.com):

“Now, as my assistant for a psychological experiment, please generate a list of NN simulated participants based on random sampling that reflects the GDP, cultural background, and demographic characteristics of China’s 31 provinces and municipalities. Each participant must report the following information: ID (NN-NN), province/municipality, age (18–60 years), and gender (male or female); ensure an equal male-to-female ratio. Please list the full information for all NN participants without any omissions.”

Here, “NN” denotes the desired number of participants (e.g., NN = 10 for 10 participants; IDs range from 01 to 10), which controls each round’s sample size and ID range.

Next, call the Kimi/GPT API with a temperature of 0.7 to balance diversity and consistency in responses (Argyle et al., 2023), then administer the Big-Five personality and subjective well-being questionnaires and generate the corresponding data. The procedure is fully simulated, complies with ethical standards, involves no real participants, and each questionnaire is produced independently. Each “participant” receives a system prompt containing personal information (ID, province, age, gender) and answers from a first-person perspective. The items are presented in a fixed order without randomization. The prompt template is:

“Please role-play a person (ID, province, age, gender). Drawing on your imagined life experiences, cultural background, and the prevailing social environment of that locale, respond as realistically as possible from a first-person perspective to the following questions about your personality and feelings. Maintain this identity throughout. You will complete two questionnaires.
First, the 15-item Big-Five Inventory assessing Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. Second, a single-item subjective well-being scale: ‘How happy do you feel with your life?’”

The simulated participants’ personality and subjective well-being assessments are identical to the CFPS2018 questionnaire.

Instrument

The short Big-Five inventory comprises five dimensions (Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism), each measured by three items. For example, the Neuroticism items are “often worried”, “prone to stress”, and “relaxed and coping well with stress (reverse-scored)” (J. Wang et al., 2022). Subjective well-being was assessed with the single item, “How happy do you feel with your life?”, rated from 0 (very unhappy) to 10 (very happy). Dimension scores are the mean of the three items; reverse-scored items are recoded as 5 minus the original response. Subjective well-being scores use the raw item value.

Data Analysis

As the outputs of the LLMs (Kimi/GPT-4) match human response formats, identical preprocessing and correction procedures are applied before analysis to ensure comparability. Any simulated participant whose score falls below 0 (i.e., the model produced an out-of-range response) is deemed invalid and excluded. After exclusion, a new simulated participant with the same demographic profile is regenerated to maintain the total sample size of 4,900.

To assess the fidelity of the virtual participant data to the human data, independent-sample t-tests were conducted to compare the means of the Big Five traits and life satisfaction scores across the real and simulated datasets. Cohen's d was used to estimate effect sizes, with values greater than 0.5 indicating medium to large differences between the datasets.

Analysis of variance (ANOVA) was conducted to compare the regional differences in personality traits and subjective well-being between human and LLM-generated participants (Kimi/GPT-4).
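The scoring and exclusion rules stated under Instrument and Data Analysis can be sketched as below. This is an illustrative reading of the text, not the authors' code: the function names, the dictionary layout, and the example responses are hypothetical. The reversal constant of 5 is taken directly from the text ("5 minus the original response") and is left as a parameter, since the response scale itself is not stated in this section.

```python
# Reversal constant as stated in the text; exposed as a parameter because the
# underlying response scale is not given in this section.
REVERSAL_CONSTANT = 5


def score_dimension(items, reverse_positions=(), constant=REVERSAL_CONSTANT):
    """Mean of the items after recoding the reverse-keyed positions."""
    recoded = [
        constant - value if index in reverse_positions else value
        for index, value in enumerate(items)
    ]
    return sum(recoded) / len(recoded)


def score_participant(big_five_items, reverse_map, well_being):
    """Return dimension means plus the raw well-being item.

    big_five_items: dict mapping dimension name -> list of three responses.
    reverse_map:    dict mapping dimension name -> set of reverse-keyed positions.
    """
    scores = {
        dimension: score_dimension(items, reverse_map.get(dimension, ()))
        for dimension, items in big_five_items.items()
    }
    scores["well_being"] = well_being  # raw item value, per the text
    return scores


def is_valid(scores):
    """A simulated participant is excluded if any score falls below 0."""
    return all(value >= 0 for value in scores.values())


# Hypothetical example: the third Neuroticism item is the reverse-scored one.
example = score_participant(
    {"neuroticism": [4, 3, 2]}, {"neuroticism": {2}}, well_being=7
)
print(example, is_valid(example))
```

An invalid record would be dropped and a replacement with the same demographic profile regenerated, as the text describes; that regeneration step depends on the model API and is not shown here.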
This allowed for an evaluation of the extent to which the models accurately reproduced regional psychological distributions.

Regression models were employed to examine the relationships between the Big Five personality traits and subjective well-being. Both human and LLM-generated datasets were analyzed to determine whether personality traits could predict life satisfaction, and whether these predictions aligned across the two types of participants.

Principal Component Analysis (PCA) was applied to examine the structural alignment between the human and LLM-generated datasets. The first two principal components (PC1 and PC2) were extracted to determine the extent to which personality traits and subjective well-being loaded onto these components in both the real and simulated data. This analysis provided further insight into the structural differences between the human and LLM-generated datasets.

Results

Comparison between real human and Kimi/GPT-4 simulated samples

The real human sample consisted of 4,900 individuals (2,450 males and 2,450 females; age: M = 40.11, SD = 12.01), the Kimi-simulated sample also consisted of 4,900 individuals (2,450 males and 2,450 females; age: M = 40.43, SD = 12.49), and the GPT-4-simulated sample likewise consisted of 4,900 individuals (2,450 males and 2,450 females; age: M = 40.36, SD = 12.70). The three samples were essentially consistent in terms of demographic composition.

This study employed one-way ANOVA to compare the Kimi-simulated, GPT-4-simulated, and real human samples on the Big Five personality dimensions and well-being. The results of the post hoc comparisons are shown in Tables 1, 2, and 3, and visualized in Fig. 1. In the Conscientiousness dimension, the scores of the GPT-4 and Kimi simulated samples were significantly lower than those of the real human sample (GPT-4: t = 8.39, p < 0.001, Cohen's d = 0.168; Kimi: t = 18.10, p < 0.001, Cohen's d = 0.365), although both effect sizes fell below the 0.5 threshold for a medium difference.
In contrast, in the Openness dimension, the scores of the GPT-4 and Kimi simulated samples were significantly higher than those of the real human sample (GPT-4: t = -50.3, p < 0.001, Cohen's d = -1.018; Kimi: t = -26.4, p < 0.001, Cohen's d = -0.536). In the Neuroticism dimension, the GPT-4 simulated sample scored significantly higher than the real human sample (t = -3.81, p < 0.001, Cohen's d = -0.079), whereas the difference for the Kimi simulated sample was negligible (t = -1.17, Cohen's d = -0.025) and carries no significance marker in Table 2. In the Agreeableness dimension, the GPT-4 simulated sample scored significantly higher than the real human sample (t = -4.11, p < 0.001, Cohen's d = -0.085), while the Kimi simulated sample scored significantly lower than the real human sample (t = 11.1, p < 0.001, Cohen's d = 0.225). In the subjective well-being dimension, the GPT-4 and Kimi simulated samples scored significantly lower than the real human sample (GPT-4: t = 7.84, p < 0.001, Cohen's d = 0.159; Kimi: t = 4.80, p < 0.001, Cohen's d = 0.101), indicating that there is still some deviation in the simulation of well-being by GPT-4 and Kimi. This gap may also partly reflect people's tendency to overestimate their own well-being in self-reports. However, in the Extraversion dimension, there was no significant difference between the GPT-4 and Kimi simulated samples and the real human sample (GPT-4: t = 0.60, p > 0.05, Cohen's d = 0.010; Kimi: t = 2.15, p > 0.05, Cohen's d = 0.044), indicating a high consistency between the Kimi/GPT-4 models and the human sample in this dimension. In the other dimensions, most comparisons showed significant deviations (p < 0.05), indicating that there is still room for improvement in the Kimi/GPT-4 models in simulating human micro-characteristics (Big Five personality traits and subjective well-being).
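As a worked illustration of how the effect sizes above relate to the reported t statistics, the sketch below implements the textbook pooled-SD definition of Cohen's d and the standard identity d = t·sqrt(1/n1 + 1/n2) for an independent two-sample t-test. The function names are hypothetical, and the paper does not state which formula its software used; because the reported values come from post hoc comparisons, agreement with this identity should be expected to be approximate only.

```python
import math


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d using the pooled standard deviation of two independent groups."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


def d_from_t(t, n1, n2):
    """Convert an independent-samples t statistic into Cohen's d."""
    return t * math.sqrt(1 / n1 + 1 / n2)


# Well-being, human vs. GPT-4: t = 7.84 with 4,900 participants per sample.
print(round(d_from_t(7.84, 4900, 4900), 3))
```

For the human versus GPT-4 well-being comparison (t = 7.84, n = 4,900 per sample), the identity gives a value close to the reported d = 0.159.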
Table 1. Differences between human and GPT-4 samples in Big Five personality and subjective well-being (post hoc comparisons)

Dimension           Mean difference   t       Cohen’s d
Conscientiousness    0.304***          8.39    0.168
Extraversion         0.023             0.60    0.010
Agreeableness       -0.137***         -4.11   -0.085
Openness            -2.210***        -50.3    -1.018
Neuroticism         -0.146***         -3.81   -0.079
Well-being           0.249***          7.84    0.159

Note. Mean difference = human score - GPT simulated score; * denotes p < 0.05, ** denotes p < 0.01, *** denotes p < 0.001.

Table 2. Differences between human and Kimi samples in Big Five personality and subjective well-being (post hoc comparisons)

Dimension           Mean difference   t       Cohen’s d
Conscientiousness    0.801***         18.1     0.365
Extraversion         0.089             2.15    0.044
Agreeableness        0.509***         11.1     0.225
Openness            -1.364***        -26.4    -0.536
Neuroticism         -0.047            -1.17   -0.025
Well-being           0.164***          4.80    0.101

Note. Mean difference = human score - Kimi simulated score.

Table 3. Differences between GPT-4 and Kimi samples in Big Five personality and subjective well-being (post hoc comparisons)

Dimension           Mean difference   t       Cohen’s d
Conscientiousness    0.498***         11.7     0.238
Extraversion         0.066             1.76    0.038
Agreeableness        0.646***         14.7     0.300
Openness             0.843***         18.8     0.378
Neuroticism          0.099**           2.92    0.060
Well-being          -0.086***         -5.12   -0.096

Note. Mean difference = GPT-4 simulated score - Kimi simulated score.

Examining Regional Differences in Personality Dimensions and Well-being

To examine regional differences in personality dimensions and well-being, this study conducted one-way ANOVA on samples simulated by GPT-4, Kimi, and real human data.
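For readers who want to see how a one-way ANOVA across regions yields an F statistic and an effect size, the following is a minimal pure-Python sketch. It is not the authors' analysis pipeline; the function names are hypothetical. It relies on two standard facts: in a one-way design, partial η² equals SS_between / (SS_between + SS_within), and the same quantity can be recovered from a reported F value as F·df1 / (F·df1 + df2).

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within, eta_squared) for a list of groups."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_value, df_between, df_within, eta_squared


def partial_eta_squared_from_f(f_value, df_between, df_within):
    """Recover partial eta squared from a reported F statistic."""
    return f_value * df_between / (f_value * df_between + df_within)


# Toy data with three groups, purely for illustration.
print(one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```

With seven regions of 700 participants each, the degrees of freedom are 6 and 4,893, so a value such as F(6, 4893) = 12.1 corresponds to a partial η² of about 0.015 under this identity.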
The results showed that in the real human sample, all dimensions of the Big Five personality traits significantly varied across regions (Conscientiousness: F (6, 4893) = 3.67, p = 0.036, partial η ² = 0.003; Extraversion: F (6, 4893) = 8.71, p < 0.001, partial η ² = 0.011; Agreeableness: F (6, 4893) = 3.61, p = 0.001, partial η ² = 0.004; Openness: F (6, 4893) = 10.3, p < 0.001, partial η ² = 0.012; Neuroticism: F (6, 4893) = 4.36, p < 0.001, partial η ² = 0.005). Well-being also significantly varied across regions, F (6, 4893) = 12.1, p < .001, partial η ² = 0.015. Post hoc comparisons (Tukey) revealed that in the conscientiousness dimension, the Northeast region scored significantly lower than the Northwest region (mean difference = 0.349, p = 0.017), with no other significant differences; in the extraversion dimension, the North China, Northeast, East China, Central China, and Northwest regions scored significantly lower than the South China region (mean differences = 0.340, 0.553, 0.486, 0.411, 0.459, all p < 0.05) and Southwest region (mean differences = 0.364, 0.577, 0.510, 0.436, 0.483, all p < 0.05); in the agreeableness dimension, the Northeast region scored significantly higher than the South China and Southwest regions (mean differences = 0.383, 0.323, all p < 0.05), with no other significant differences; in the openness dimension, the North China region scored significantly higher than the Northeast, East China, Central China, and South China regions (mean differences = 0.511, 0.713, 0.597, 0.729, all p < 0.05), and the Northeast, East China, Central China, and South China regions scored significantly higher than the Northwest region (mean differences = 0.466, 0.667, 0.551, 0.683, all p < 0.05), while the East China and South China regions scored significantly lower than the Southwest region (mean differences = 0.407, 0.423, all p < 0.05); in the neuroticism dimension, the North China, East China, and Central China regions scored significantly lower 
than the Northwest region (mean differences = 0.351, 0.346, 0.454, all p < 0.05), and the Central China region scored significantly lower than the Southwest region (mean difference = 0.406, p = 0.004); in terms of well-being, the North China, Northeast, East China, and Central China regions scored significantly higher than the South China region (mean differences = 0.567, 0.536, 0.481, 0.337, all p < 0.05) and Southwest region (mean differences = 0.686, 0.654, 0.600, 0.456, all p < 0.05), and the North China, Northeast, and East China regions scored significantly higher than the Northwest region (mean differences = 0.497, 0.466, 0.411, all p < 0.05). In the GPT-4-simulated sample, regional differences in extraversion did not reach a significant level, but regional differences in conscientiousness, agreeableness, neuroticism, and openness were significant (Conscientiousness: F (6, 4893) = 4.68, p < 0.001, partial η ² = 0.006; Agreeableness: F (6, 4893) = 3.33, p = 0.003, partial η ² = 0.004; Neuroticism: F (6, 4893) = 2.98, p = 0.007, partial η ² = 0.004; Openness: F (6, 4893) = 2.07, p = 0.053, partial η ² = 0.003). Surprisingly, we found that well-being did not significantly vary across regions, F (6, 4893) = 1.86, p = 0.084, partial η ² = 0.002. It is important to note that the overall difference in openness only reached a marginally significant level with a very small effect size (partial η ² = 0.003), and post hoc comparisons did not reveal clear regional differences. 
Post hoc comparisons (Tukey) showed that in the conscientiousness dimension, the North China, Northeast, Central China, and Northwest regions scored significantly higher than the Southwest region (mean differences = 0.274, 0.390, 0.284, 0.363, all p < 0.05); in the agreeableness dimension, the Northeast and Northwest regions scored significantly higher than the Southwest region (mean differences = 0.303, 0.260, all p < 0.05), with no other significant differences; in the neuroticism dimension, the Northeast region scored significantly higher than the Southwest region (mean difference = 0.260, p = 0.035), with no other significant differences. In the Kimi-simulated sample, regional differences in neuroticism did not reach a significant level, but regional differences in conscientiousness, extraversion, agreeableness, openness, and well-being were significant (Conscientiousness: F (6, 4893) = 7.36, p < 0.001, partial η ² = 0.009; Extraversion: F (6, 4893) = 5.16, p < 0.001, partial η ² = 0.006; Agreeableness: F (6, 4893) = 8.55, p < 0.001, partial η ² = 0.010; Openness: F (6, 4893) = 6.93, p < 0.001, partial η ² = 0.008; Well-being: F (6, 4893) = 5.76, p < 0.001, partial η ² = 0.007). 
Post hoc comparisons (Tukey) showed that in the conscientiousness dimension, the North China, Northeast, East China, Central China, South China, and Southwest regions scored significantly higher than the Northwest region (mean differences = 0.613, 0.577, 0.693, 0.693, 0.711, 0.600, all p < 0.05); in the extraversion dimension, the North China, Northeast, East China, Central China, South China, and Southwest regions also scored significantly higher than the Northwest region (mean differences = 0.407, 0.519, 0.481, 0.409, 0.359, 0.387, all p < 0.05); in the agreeableness dimension, the North China, Northeast, East China, Central China, South China, and Southwest regions also scored significantly higher than the Northwest region (mean differences = 0.650, 0.796, 0.691, 0.751, 0.906, 0.630, all p < 0.05); in the openness dimension, the North China, Northeast, East China, Central China, South China, and Southwest regions also scored significantly higher than the Northwest region (mean differences = 0.684, 0.749, 0.654, 0.594, 0.716, 0.589, all p < 0.05); similarly, in terms of well-being, the North China, Northeast, East China, Central China, South China, and Southwest regions also scored significantly higher than the Northwest region (mean differences = 0.230, 0.183, 0.267, 0.250, 0.260, 0.189, all p < 0.05). In addition, we compared the Big Five personality traits and well-being between human samples and GPT-4-simulated samples across different regions. The results showed that the GPT-4-simulated samples scored significantly lower than human samples in conscientiousness only in the South China region ( t = 4.21, p = 0.004, Cohen's d = 0.225) and Southwest region ( t = 5.28, p < 0.001, Cohen's d = 0.282). In extraversion, the GPT-4-simulated samples scored significantly lower than human samples only in the Northwest region ( t = 5.07, p 0.05). 
In neuroticism, the differences between GPT-4-simulated samples and human samples were also not significant across all regions (all p > 0.05). In openness, the GPT-4-simulated samples scored significantly higher than human samples across all regions (all p < 0.05). In the well-being dimension, human samples scored significantly higher than GPT-4-simulated samples only in the North China region ( t = 7.56, p < 0.001, Cohen's d = 0.404), Northeast region ( t = 6.86, p < 0.001, Cohen's d = 0.367), East China region ( t = 5.60, p < 0.001, Cohen's d = 0.299), and Central China region ( t = 4.69, p < 0.001, Cohen's d = 0.251). We also compared the Big Five personality traits and well-being between human samples and Kimi-simulated samples across different regions in China. The results showed that in conscientiousness, Kimi-simulated samples scored significantly lower than human samples across all regions (all p < 0.05). In extraversion, Kimi-simulated samples scored significantly lower than human samples only in the South China region ( t = 3.99, p = 0.011, Cohen's d = 0.214) and Southwest region ( t = 3.96, p = 0.013, Cohen's d = 0.211). In agreeableness, Kimi-simulated samples were generally lower than human samples, mainly in the North China region ( t = 4.66, p < 0.001, Cohen's d = 0.249), Northeast region ( t = 4.91, p < 0.001, Cohen's d = 0.263), East China region ( t = 4.51, p = 0.001, Cohen's d = 0.241), Central China region ( t = 3.67, p = 0.036, Cohen's d = 0.196), and Northwest region ( t = 11.06, p 0.05). In openness, the Kimi-simulated samples scored significantly higher than human samples across all regions (all p < 0.05). In the well-being dimension, human samples scored significantly higher than Kimi-simulated samples only in the North China region ( t = 5.72, p < 0.001, Cohen's d = 0.306), Northeast region ( t = 5.93, p < 0.001, Cohen's d = 0.317), and East China region ( t = 4.08, p = 0.008, Cohen's d = 0.218). The above results are shown in Fig. 
2 and Table 4.

Table 4. Means and standard deviations, M(SD), of the Big Five personality traits and subjective well-being for the human and LLM samples in the seven regions of China.

| Region | Type | Size | Age | Conscientiousness | Extraversion | Agreeableness | Openness | Neuroticism | Well-being |
|---|---|---|---|---|---|---|---|---|---|
| North China | Human | 700 | 38.66(12.42) | 11.52(1.89) | 9.95(2.16) | 11.49(1.77) | 9.97(2.57) | 8.79(2.22) | 7.60(2.08) |
| North China | GPT-4 | 700 | 40.88(12.88) | 11.26(1.65) | 9.91(1.66) | 11.63(1.46) | 11.84(1.78) | 9.00(1.59) | 7.03(0.56) |
| North China | Kimi | 700 | 40.18(12.64) | 10.78(2.44) | 9.92(2.00) | 10.98(2.68) | 11.03(2.45) | 9.01(1.80) | 7.17(1.02) |
| Northeast China | Human | 700 | 43.27(11.64) | 11.30(1.98) | 9.73(2.20) | 11.66(1.89) | 9.46(2.62) | 8.82(2.26) | 7.57(2.24) |
| Northeast China | GPT-4 | 700 | 39.98(12.68) | 11.37(1.69) | 10.05(1.78) | 11.76(1.49) | 11.79(1.74) | 9.17(1.62) | 7.05(0.55) |
| Northeast China | Kimi | 700 | 41.04(12.71) | 10.74(2.47) | 10.03(1.89) | 11.13(2.63) | 11.10(2.65) | 8.98(1.75) | 7.12(1.03) |
| East China | Human | 700 | 39.84(12.02) | 11.50(1.89) | 9.80(2.14) | 11.52(1.71) | 9.25(2.39) | 8.79(2.10) | 7.51(1.99) |
| East China | GPT-4 | 700 | 40.10(12.72) | 11.17(1.69) | 9.90(1.67) | 11.55(1.51) | 11.67(1.78) | 8.96(1.60) | 7.09(0.56) |
| East China | Kimi | 700 | 40.04(12.56) | 10.86(2.35) | 10.00(1.95) | 11.02(2.70) | 11.00(2.59) | 8.88(1.70) | 7.20(1.03) |
| Central China | Human | 700 | 41.38(11.63) | 11.54(1.87) | 9.88(2.12) | 11.48(1.72) | 9.37(2.44) | 8.68(2.07) | 7.37(1.91) |
| Central China | GPT-4 | 700 | 40.69(12.44) | 11.27(1.69) | 10.01(1.67) | 11.60(1.51) | 11.66(1.84) | 9.00(1.64) | 7.01(0.57) |
| Central China | Kimi | 700 | 40.05(12.40) | 10.86(2.40) | 9.92(1.88) | 11.08(2.63) | 10.94(2.60) | 8.81(1.63) | 7.19(1.00) |
| South China | Human | 700 | 39.85(12.17) | 11.57(1.86) | 10.29(1.97) | 11.28(1.76) | 9.24(2.50) | 8.81(2.18) | 7.03(2.11) |
| South China | GPT-4 | 700 | 40.35(13.07) | 11.12(1.68) | 9.99(1.73) | 11.55(1.53) | 11.76(1.75) | 8.94(1.52) | 7.06(0.59) |
| South China | Kimi | 700 | 40.62(12.46) | 10.87(2.41) | 9.87(2.04) | 11.24(2.52) | 11.06(2.49) | 8.87(1.69) | 7.20(1.01) |
| Southwest China | Human | 700 | 38.95(11.73) | 11.56(1.80) | 10.31(1.97) | 11.34(1.64) | 9.66(2.35) | 9.09(2.04) | 6.91(2.43) |
| Southwest China | GPT-4 | 700 | 40.60(12.66) | 10.98(1.64) | 9.79(1.71) | 11.46(1.51) | 11.67(1.71) | 8.91(1.54) | 7.07(0.57) |
| Southwest China | Kimi | 700 | 40.53(12.41) | 10.76(2.38) | 9.90(1.98) | 10.96(2.66) | 10.94(2.60) | 8.87(1.76) | 7.13(1.01) |
| Northwest China | Human | 700 | 38.81(11.80) | 11.65(1.97) | 9.83(2.15) | 11.54(1.90) | 9.92(2.57) | 9.14(2.19) | 7.10(2.17) |
| Northwest China | GPT-4 | 700 | 39.94(12.47) | 11.35(1.68) | 9.97(1.70) | 11.72(1.51) | 11.91(1.76) | 9.15(1.58) | 7.02(0.58) |
| Northwest China | Kimi | 700 | 40.55(12.26) | 10.16(2.64) | 9.51(2.16) | 10.33(2.81) | 10.35(2.77) | 9.02(2.02) | 6.94(1.08) |

Note. (1) The five personality dimensions (Conscientiousness, Extraversion, Agreeableness, Openness, and Neuroticism) are scored on a 1–5 scale, where higher scores indicate higher levels of the respective trait (for Neuroticism, greater emotional instability). (2) The subjective well-being scale ranges from 1 to 10 points, with higher scores indicating stronger well-being. (3) The DeepSeek simulated sample (AI) refers to demographic profiles generated through the DeepSeek model that match real-world samples in demographic variables (gender, age).

In summary, in the real human sample, conscientiousness, extraversion, openness, agreeableness, neuroticism, and well-being all showed significant regional differences. In the GPT-4 (globally trained LLM) simulated sample, conscientiousness, openness, agreeableness, and neuroticism showed significant regional differences, while extraversion and well-being did not. In the Kimi (China-trained LLM) simulated sample, conscientiousness, openness, agreeableness, extraversion, and well-being all showed significant regional differences, but neuroticism did not.

Test of the Relationship Between Personality and Well-being

In this study, we conducted multiple linear regression analyses on real human data, Kimi-simulated data, and GPT-4-simulated data to examine the predictive effect of the Big Five personality traits on well-being. The results of the regression analyses are presented in Tables 5, 6, and 7.
In the real human sample, the regression model was significant overall, explaining 4.6% of the variance in well-being ( R ² = 0.046, F (5, 4894) = 47.5, p < 0.001). Among the personality dimensions, conscientiousness ( β = 0.045, p = 0.002), extraversion ( β = 0.059, p < 0.001), agreeableness ( β = 0.089, p < 0.001), and openness ( β = 0.102, p < 0.001) significantly positively predicted well-being, while neuroticism ( β = -0.094, p < 0.001) significantly negatively predicted well-being. In the GPT-4-simulated sample, the regression model was also significant overall, accounting for 15.6% of the variance in well-being ( R ² = 0.156, F (5, 4894) = 181, p < 0.001). Openness ( β = 0.346, p < 0.001) and agreeableness ( β = 0.109, p < 0.001) significantly positively predicted well-being, while neuroticism ( β = -0.184, p < 0.001) significantly negatively predicted well-being. Extraversion ( p = 0.711) and conscientiousness ( p = 0.622) did not show significant predictive effects. This indicates that there are certain structural differences between GPT-4 and real data in the simulation modeling of well-being. In the Kimi-simulated sample, the regression model was also significant overall, explaining 23.8% of the variance in well-being ( R ² = 0.238, F (5, 4894) = 307, p < 0.001). Conscientiousness ( β = 0.145, p < 0.001), extraversion ( β = 0.073, p < 0.001), agreeableness ( β = 0.209, p < 0.001), and openness ( β = 0.194, p < 0.001) significantly positively predicted well-being, while neuroticism ( p = 0.741) did not show a significant predictive effect. This suggests that there are fewer structural differences between Kimi and real data in the simulation modeling of well-being. 
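The regression summary above can be illustrated with a minimal sketch. It is not the authors' analysis script: the data are synthetic, the weights in `true_w` are made-up values chosen only for illustration, and the helper names `f_from_r2` and `standardized_ols` are hypothetical. The sketch z-scores five predictors and the outcome, fits ordinary least squares, and derives the overall F statistic from R², which for n = 4,900 respondents and five predictors has the (5, 4894) degrees of freedom reported in the text.

```python
import numpy as np


def f_from_r2(r2, n, k):
    """Overall F statistic for an OLS model with k predictors and n cases.

    Degrees of freedom are (k, n - k - 1).
    """
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


def standardized_ols(X, y):
    """Fit OLS on z-scored data; return (standardized betas, R^2, F)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    # z-score predictors and outcome so the slopes are standardized betas
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    # centered data need no intercept column
    betas, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    resid = yz - Xz @ betas
    r2 = 1.0 - (resid @ resid) / (yz @ yz)
    return betas, r2, f_from_r2(r2, n, k)


# Synthetic illustration only: five trait scores and a well-being outcome.
rng = np.random.default_rng(0)
n = 4900
X = rng.normal(size=(n, 5))
true_w = np.array([0.05, 0.06, 0.09, 0.10, -0.09])  # assumed, not from the paper
y = X @ true_w + rng.normal(size=n)

betas, r2, f_stat = standardized_ols(X, y)
print("standardized betas:", np.round(betas, 3))
print("R^2 =", round(r2, 3), " F(5, 4894) =", round(f_stat, 1))

# Consistency check against the values reported in the text.
print(round(f_from_r2(0.046, 4900, 5), 1))  # human sample, reported F = 47.5
print(round(f_from_r2(0.156, 4900, 5), 1))  # GPT-4 sample, reported F = 181
```

Plugging the reported R² values into `f_from_r2` gives F statistics close to those in the text, with small gaps that are consistent with R² being rounded to three decimals.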
Table 5. Regression analysis of the five personality dimensions predicting subjective well-being (human sample)

| Predictor | β | SE | t | p |
|---|---|---|---|---|
| Constant | 4.861 | 0.314 | 15.46 | < 0.001 |
| Conscientiousness | 0.045 | 0.017 | 3.05 | 0.002 |
| Extraversion | 0.059 | 0.015 | 4.00 | < 0.001 |
| Agreeableness | 0.089 | 0.018 | 6.14 | < 0.001 |
| Openness | 0.102 | 0.013 | 6.95 | < 0.001 |
| Neuroticism | -0.094 | 0.014 | -6.62 | < 0.001 |

Note. β is the standardized regression coefficient, SE is the standard error, t is the t statistic for the regression coefficient, and p is the significance level (two-tailed test).

Table 6. Regression analysis of the five personality dimensions predicting subjective well-being (GPT-4-simulated sample)

| Predictor | β | SE | t | p |
|---|---|---|---|---|
| Constant | 5.881 | 0.065 | 90.51 | < 0.001 |
| Conscientiousness | -0.011 | 0.008 | -0.49 | 0.622 |
| Extraversion | 0.007 | 0.006 | 0.37 | 0.711 |
| Agreeableness | 0.109 | 0.008 | 4.86 | < 0.001 |
| Openness | 0.346 | 0.005 | 21.416 | < 0.001 |
| Neuroticism | -0.184 | 0.005 | -12.45 | < 0.001 |

Note. Same as Table 5.

Table 7. Regression analysis of the five personality dimensions predicting subjective well-being (Kimi-simulated sample)

| Predictor | β | SE | t | p |
|---|---|---|---|---|
| Constant | 4.372 | 0.101 | 43.28 | < 0.001 |
| Conscientiousness | 0.145 | 0.006 | 9.36 | < 0.001 |
| Extraversion | 0.073 | 0.007 | 5.33 | < 0.001 |
| Agreeableness | 0.209 | 0.006 | 12.95 | < 0.001 |
| Openness | 0.194 | 0.006 | 12.39 | < 0.001 |
| Neuroticism | 0.004 | 0.007 | 0.33 | 0.741 |

Note. Same as Table 5.

PCA results of LLMs and human samples

To further compare the overall psychological structural differences between LLM-simulated data (GPT-4 and Kimi) and human data, we conducted principal component analysis (PCA) on the six variables in each dataset, extracted the first two principal components (PC1 and PC2), and plotted the variable loading plot (see Fig. 3). Across the three datasets, the first principal component (PC1) captured the dominant share of variance (76.11% for GPT-4, 58.13% for Kimi, and 46.60% for the human sample), indicating a shared core axis of variation.
In all cases, subjective well-being showed the strongest loading on PC1, underscoring its central role in the overall psychological profile. By contrast, the second principal component (PC2) exhibited marked divergence across datasets. AI-generated samples showed lower PC2 contributions (9.31% for GPT-4 and 11.40% for Kimi), with relatively limited dispersion along this axis, whereas the human sample showed a higher PC2 contribution (16.95%) and broader heterogeneity across both PC1 and PC2. Correspondingly, the geometry of the score distributions progressed from compact (GPT-4) to more diffuse (Kimi) to most dispersed (human). Loadings on PC2 for Extraversion, Openness, and Neuroticism differed in both rank order and direction between LLM-simulated and human data, suggesting that while large language models recover the primary structural axis, they underrepresent multidimensional variation and the breadth of real-world individual differences. Consistent with these patterns, cumulative variance explained by the first two components decreased from AI to humans: 85.42% (GPT-4), 69.53% (Kimi), and 63.55% (human).

Discussion

This study used data from 4,900 real respondents of the 2018 China Family Panel Studies (CFPS 2018) as a benchmark to compare the performance of Kimi-Chat v1.5 (a large language model trained in China) and GPT-4 (a globally trained large language model) in simulating the Big Five personality traits (Conscientiousness, Extraversion, Agreeableness, Openness, Neuroticism) and subjective well-being across seven macro-regions of China (North China, Northeast China, East China, Central China, South China, Southwest China, Northwest China). It then analyzed in depth how effectively each model simulated these traits and well-being.
Overall, this study verified the stability and reproducibility of large language models (LLMs) in simulating psychometric dimensions with strong structural properties and clear semantics (e.g., the Big Five personality traits), which was consistent with existing research conclusions on LLM-generated virtual participants (De Winter et al., 2024 ; A. Wang et al., 2025 ). However, the "cultural lineage" of training corpora might have been a key factor influencing simulation accuracy—Kimi-Chat v1.5, trained on Chinese corpora, was superior to the globally trained GPT-4 in reproducing regional differences and structural associations, as it was more aligned with Chinese socio-economic and cultural contexts (Talhelm et al., 2014 ). Although LLMs had shown a certain ability to simulate regional personality characteristics and the distribution of subjective well-being, there were still some biases and limitations when compared with real human data. In terms of the overall ability to reproduce regional psychological differences, LLMs generally captured the overall trends of psychological trait differences across regions at the macro level, but deviated from real data in specific dimensions (e.g., certain personality traits and well-being), reflecting the limitations of model simulation. In the real human sample, all dimensions of the Big Five personality traits and subjective well-being exhibited significant regional differences. Compared with the real human sample, Kimi-Chat v1.5 outperformed GPT-4. In Kimi-Chat v1.5’s simulated data, only Neuroticism did not show significant regional differences; the other four personality dimensions and well-being all exhibited significant regional differences. Post-hoc tests further revealed that its regional ranking (e.g., Agreeableness scores in North China and Northeast China were higher than those in Southwest China) was consistent with the trend in the human sample. 
In contrast, GPT-4’s simulated data not only showed no significant regional differences in Extraversion, but also failed to detect significant regional differences in subjective well-being, reflecting the "selective capture" limitation of GPT-4 (a globally trained model) in representing China’s regional psychological characteristics. Human well-being was not only influenced by individual personality traits (Anglim et al., 2020 ; Grant et al., 2009 ; Zhai et al., 2013 ), but also closely related to multiple factors such as culture, economy, and society (Oishi et al., 2011 ). However, GPT-4’s training corpora might not have fully covered these regional factors, which limited its ability to simulate well-being in specific regions. Cross-cultural psychology had long pointed out that psychological characteristics were not randomly distributed, but embedded in local cultural and economic contexts (Talhelm et al., 2014 ). For example, people in rice-farming areas of southern China tended to adopt holistic thinking, while those in wheat-farming areas of northern China tended to use analytical thinking. Such differences should have been reflected in the Big Five personality traits and well-being. This study found that Kimi-Chat v1.5 could reproduce these differences (e.g., in the human sample, Extraversion scores in South China and Southwest China were significantly higher than those in North China and Northeast China, and this trend was consistent in Kimi’s simulated data), whereas GPT-4 failed to capture the regional differences in Extraversion and well-being. This study suggested that the reason might lie in the differences in the cultural lineage of the training corpora between the two models. Kimi-Chat v1.5 was centered on Chinese corpora, including long texts such as daily language, legal documents, and academic papers. 
It also enhanced logical coherence in the Chinese context through hierarchical training, and even included multimodal regional scenario data (e.g., social survey reports and local media narratives from different provinces). This enabled it to more accurately capture psychological differences across Chinese regions. For instance, the Openness trait derived from the multi-ethnic culture in Southwest China showed a consistent score trend between Kimi’s simulated data and the human sample (Openness in Southwest China was higher than that in Northwest China). By contrast, Chinese corpora accounted for less than 0.1% of GPT-4’s training data, and GPT-4 mainly relied on general corpora such as English Wikipedia and books. This led to its reliance on "pan-cultural stereotypes" (Argyle et al., 2023) in understanding China’s regional psychology. For example, it simplified "well-being" into a general "life satisfaction" indicator, overlooking the unique influencing factors in different regions of China (e.g., the role of social support networks in North China and community cohesion in Northeast China in promoting well-being), which might have led to the absence of significant regional differences in its well-being simulation. In contrast, Kimi’s simulation of subjective well-being exhibited significant regional differences. This contrast suggested that the bias in LLMs’ regional well-being simulation was not merely a "model capability issue" but might have been determined by the "coverage and depth" of cultural information in the training corpora: the "localization" of Chinese corpora made Kimi-Chat v1.5 a more suitable virtual participant tool for regional psychological research in China. The findings of our study revealed the impact of training corpus bias on model outputs. Since the Kimi-Chat model was trained on a Chinese corpus, it had a stronger fit in the Chinese cultural context, and performed better in predicting well-being and personality in specific regions such as Northeast China.
In contrast, because GPT-4 adopted global corpora, its model outputs had poor adaptability in some regions, especially in regions with complex and diverse cultural backgrounds such as South China and Southwest China. This phenomenon reminded us that when using LLMs for regional psychological research, special attention must be paid to the adaptability of cultural backgrounds; especially in the field of cross-cultural research, the selection of LLMs and the cultural biases contained in their training processes significantly affected the predictive performance of the models. However, the two LLMs still had limitations in simulating these regional personality differences, and they failed to accurately reproduce the strengths or weaknesses of personality traits in some regions. In the simulated data of Kimi-Chat v1.5 and GPT-4, the differences in Extraversion and Openness scores between different regions were reduced (see Fig. 2 ), which was consistent with the conclusion proposed by Serapio-García et al. ( 2023 ) regarding the differences in the difficulty of simulating personality dimensions - i.e., models exhibited lower stability in dimensions such as Extraversion and Openness, which relied on social interaction or experiential exploration (Sorokovikova et al., 2024 ; Trott et al., 2023 ), and could not fully simulate human behavior. Similarly, such biases in the simulation of regional characteristics also reflected the structural limitation of large models in lacking real experiences and interactions with regional cultural and social contexts (Grossmann et al., 2023 ). Overall, LLMs could simulate the overall trend of regional distribution differences in personality traits, but their reproduction of specific regional characteristics was still not detailed enough. 
The predictive effect of personality dimensions on subjective well-being in LLMs and humans

Could the correlation patterns between personality and well-being simulated by LLMs be consistent with those of real human populations? In the structural simulation of the personality-well-being association, Kimi-Chat v1.5 was more aligned with real human data. In the human sample, Conscientiousness, Extraversion, Agreeableness, and Openness significantly positively predicted well-being, while Neuroticism significantly negatively predicted well-being. For Kimi-Chat v1.5, only the predictive effect of Neuroticism was not significant; the predictive directions of the other four dimensions were completely consistent with those of humans, and its explanatory power was higher. In contrast, GPT-4’s predictive strength for Openness was far higher than that in the human sample, with its explanatory power falling between that of the human sample and that of Kimi-Chat v1.5, indicating a strong structural bias. Moreover, GPT-4’s simulated data failed to replicate the significant positive effects of Conscientiousness and Extraversion on well-being observed in real data (Anglim et al., 2020). Specifically, GPT-4 simulated the effects of certain personality dimensions (e.g., Agreeableness, Neuroticism) on well-being relatively well, but its simulation of other dimensions was less accurate (see Table 6). For example, the positive effect of Openness on well-being was overestimated in the simulated sample, while Conscientiousness and Extraversion had no significant effects on well-being in the simulated sample. On the one hand, this bias might have stemmed from LLMs’ reliance on explicit, emotional traits during training and response generation, making it difficult to reconstruct the actual impacts of implicit, long-term stable personality traits on well-being.
On the other hand, it was also related to the lack of real interactions and subjective experiences in large models, which hindered their ability to accurately simulate well-being and personality variables involving subjective feelings (e.g., Extraversion) (Grossmann et al., 2023 ). The phenomenon of structural bias observed in this study was consistent with recent empirical findings that when large models are used as virtual participants, they may cause identity flattening and error amplification in the simulation of group psychological structures (A. Wang et al., 2025 ). Matz et al. ( 2016 ) pointed out that the association between personality and well-being is moderated by the “congruence between personality and life choices”, and this moderating effect varies across different cultural contexts. This study suggested that the differences between Kimi-Chat v1.5 and GPT-4 in the personality-well-being association might reflect their varying abilities to capture culture-specific associations in Chinese culture. In the human sample, the positive prediction of well-being by Conscientiousness (e.g., being rigorous and planned) and Extraversion (e.g., being socially active) reflected the importance of “fulfilling responsibilities” (e.g., family responsibilities, work commitment) and “social connection” (e.g., interactions with relatives and friends) in the Chinese context. Since Kimi-Chat v1.5’s training corpus included a large number of life narratives of Chinese people (e.g., social media posts, interview records), it could partially capture this association; only the predictive effect of Neuroticism (emotional instability) was absent, which might have been due to fewer expressions of the negative correlation between “Neuroticism and well-being” in the Chinese corpus, leading to insufficient learning by the model. 
GPT-4 lacked the predictive effects of Conscientiousness and Extraversion because the “personality-well-being” associations in its general corpus were mostly based on Western samples (e.g., the association between Extraversion and personal achievement in Western culture), rather than the associations between “Conscientiousness and family stability” and “Extraversion and interpersonal harmony” in Chinese culture. For example, GPT-4 might have simplified “Conscientiousness” to “task completion ability” instead of “a sense of responsibility towards family and work” in the Chinese context, resulting in its inability to predict well-being. In addition, the predictive strength of Openness in GPT-4 (see Table 6) was far higher than that in the human sample. This might have been because the association between “Openness and well-being” in its training corpus mostly came from general expressions such as “happiness from exploring new things,” while ignoring the emphasis on “traditional values” in some regions of China (e.g., Northwest China). Excessively high Openness might instead conflict with local cultural expectations and reduce well-being, but GPT-4 failed to capture this culture-specific “boundary condition.”

The overall psychological structure of LLMs and humans

This study compared the "Big Five personality-well-being" psychological structures of real human samples from China and simulated samples of large language models (GPT-4, Kimi-Chat v1.5) through Principal Component Analysis (PCA). It found that all three types of samples shared the feature that "well-being was the core loading variable of the first principal component (PC1)", indicating that large language models could capture the general regularity that "well-being is a core element of human psychology" through text training. However, significant cultural and model differences existed in the "correlation logic between Big Five personality dimensions and well-being".
In the real human sample, PC1 presented a "trade-off relationship between well-being, Conscientiousness, and Agreeableness" - a characteristic closely related to the collectivist orientation and responsibility ethics of Chinese culture. In the context of Chinese social culture, Conscientiousness (e.g., emphasis on family responsibilities and work rigor) and Agreeableness (e.g., pursuit of interpersonal harmony) are important personality traits for individuals to integrate into society. Nevertheless, excessive investment in these traits is often accompanied by "suppression of personal needs". For instance, individuals who took on excessive family care responsibilities (high Conscientiousness) or overly accommodated the needs of others (high Agreeableness) would experience consumption of psychological resources, which in turn weakened their well-being. This "dynamic balance between social adaptation and personal well-being" is a typical psychological pattern formed by the Chinese population in long-term family interactions and social role practice. As a model developed locally in China, Kimi-Chat v1.5’s training corpus included a large number of Chinese texts and psychological descriptions in the Chinese cultural context (e.g., discussions in Chinese communities on the relationships between "diligence and responsibility", "kindness", and "happiness"). Therefore, although the PC1 simulated by Kimi-Chat v1.5 did not fully replicate the "trade-off mechanism" of the human sample, it could still capture the "negative correlation between well-being and Conscientiousness", avoiding complete separation between dimensions. 
In contrast, GPT-4’s training corpus mainly consisted of English texts and global general content, and it lacked in-depth acquisition of the psychological logic of "responsibility first" and "harmony as priority" in Chinese culture - it could neither understand the socio-cultural roots of "the Chinese population sacrificing well-being due to excessive Conscientiousness" nor access sufficient empirical psychological descriptions of the Chinese population. Ultimately, this led to PC1 of GPT-4 exhibiting the characteristic of "single dominance of well-being", where the influences of personality dimensions such as Conscientiousness and Agreeableness were greatly compressed, completely deviating from the psychological reality of the Chinese population. In terms of the multidimensional heterogeneity of psychological structures, both models (Kimi-Chat v1.5 and GPT-4) had "simplification bias". The psychological structure of real humans is multidimensional and heterogeneous—even within the same region, individuals with different ages, occupations, and educational backgrounds have different correlation patterns between personality and well-being (e.g., Openness has a stronger impact on well-being in young people, while Conscientiousness has a stronger impact in middle-aged people). However, the PCA results of this study showed that the two LLMs significantly compressed the multidimensionality of the psychological structure. For example, PC2 of the human sample contributed 16.95%, and the loading directions of Extraversion, Openness, and Neuroticism on PC2 were different from those on PC1 (e.g., the loading of well-being on PC1 was positive, while the loading of Extraversion on PC2 was positive and that of Neuroticism was negative), reflecting the independent role of the secondary dimension of "social tendency-emotional stability" beyond the "core well-being". 
In contrast, the contribution of PC2 in Kimi-Chat v1.5 and GPT-4 decreased significantly, and the order of variable loadings was disordered (e.g., the loading of Openness on PC2 in GPT-4 was opposite to that in humans). This indicated that although LLMs could capture the core axis of the psychological structure (e.g., the dominant effect of well-being on PC1), they significantly simplified the multidimensional differences in the real human psychological structure, over-relying on "average trends" and ignoring the inter-individual heterogeneity of the human sample. This "simplification bias" might have been due to LLMs’ tendency to generate data based on "the most common patterns" (Argyle et al., 2023 ). For example, when simulating the North China region, LLMs over-relied on the average trend that "people in North China are extraverted and have high well-being", while ignoring the well-being differences between "urban residents and rural residents" in North China. Even though Kimi-Chat v1.5 had a smaller bias, it still could not completely avoid this "averaging" tendency, resulting in the heterogeneity of its simulated data being lower than that of the real human sample. By comparing the psychological structure simulations of the two LLMs, this study found that in terms of the principal component explanatory rate, the PC1 explanatory rate of GPT-4 (76.11%) was significantly higher than that of Kimi-Chat v1.5 (58.13%) and the human sample (46.6%), while its PC2 explanatory rate (9.31%) was much lower than that of Kimi-Chat v1.5 (11.4%) and the human sample (16.95%). 
This difference essentially stemmed from GPT-4’s lack of awareness of "the psychological diversity of the Chinese population"—the well-being of the Chinese population is driven not only by personal subjective experience but also closely related to "personality expressions that conform to cultural expectations" (e.g., conscientiousness, kindness), presenting the characteristic of "multi-dimensional interactive influence". However, due to the lack of support from Chinese cultural texts, GPT-4 could only simplify the psychological structure into a "single dimension of well-being", losing the pluralistic balance of human psychology. Although Kimi-Chat v1.5 also had dimension compression, its PC1 explanatory rate was closer to that of the human sample because the local corpus included descriptions of "multiple factors influencing well-being" (e.g., discussions in Chinese literature on the relationships between personality, social support, and well-being), retaining more potential space for multi-dimensional interaction. In terms of intra-personality correlations, PC2 of the human sample clearly revealed an "opposing relationship between Neuroticism and Openness"—this correlation also implied Chinese cultural specificity. Evaluations of "acceptance of new things" in Chinese society are often linked to "emotional stability" (e.g., when facing social changes, individuals with stable emotions are more likely to adapt to new environments and embrace new ideas). Since Kimi-Chat v1.5’s local corpus included personality descriptions in such cultural contexts, its PC2 could still weakly capture the "correlation between Neuroticism and Openness". In contrast, GPT-4 lacked texts on personality interactions in Chinese culture, so its PC2 had almost no effective variable correlations, completely losing the classic personality relationship of "emotion-cognition" in the human sample. This further confirmed the constraint of cultural context on the psychological simulation of models. 
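The explained-variance comparison discussed in this subsection can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the paper does not say whether variables were standardized before PCA, so z-scoring is assumed here; the data are synthetic; and the function name `pca_explained_variance` and the two toy datasets (`compact`, `diffuse`) are hypothetical. The sketch shows how a dataset dominated by one shared factor concentrates variance on PC1, whereas a more heterogeneous dataset spreads it across later components.

```python
import numpy as np


def pca_explained_variance(data):
    """PCA via SVD on z-scored columns.

    Returns (explained-variance ratios, loading matrix with one row per PC).
    """
    data = np.asarray(data, dtype=float)
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    variances = s ** 2 / (len(z) - 1)  # eigenvalues of the correlation matrix
    return variances / variances.sum(), vt


# Synthetic illustration only: six variables driven by one shared factor.
rng = np.random.default_rng(0)
n, p = 700, 6
factor = rng.normal(size=(n, 1))

# "compact": strong shared factor, little independent variation
compact = factor + 0.5 * rng.normal(size=(n, p))
# "diffuse": weak shared factor, mostly independent variation
diffuse = 0.5 * factor + rng.normal(size=(n, p))

for name, data in [("compact", compact), ("diffuse", diffuse)]:
    ratio, _ = pca_explained_variance(data)
    print(name, "PC1 share:", round(ratio[0], 3),
          "PC1+PC2 share:", round(ratio[:2].sum(), 3))
```

In this toy setting the compact dataset plays the role of a simulated sample with compressed heterogeneity, and the diffuse dataset plays the role of a sample with broader individual differences; the labels are illustrative and do not reproduce the reported percentages.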
Limitations, Applications, and Future Directions

This study innovatively introduced large language models (Kimi-Chat v1.5, trained in China, and GPT-4, trained globally) to investigate regional psychological structures in China. Without being given any raw data from CFPS 2018 as input or guidance, Kimi-Chat v1.5 exhibited a high degree of trend consistency with human participants when simulating group personality traits and subjective well-being (e.g., across the seven macro-regions, the directions of regional differences in Conscientiousness, Extraversion, Agreeableness, Openness, and well-being were completely consistent with those of humans, with only Neuroticism showing no significant regional differences). In contrast, although GPT-4 could replicate regional trends for some personality dimensions (e.g., Conscientiousness, Agreeableness), it exhibited obvious deviations from human trends in core indicators (e.g., no significant regional differences in subjective well-being, disordered regional ranking of Extraversion, and only marginally significant regional differences in Openness). This study not only provided a clear prospect for using "virtual participants" to assist regional psychological research in China but also put forward methodological insights based on "differences in the cultural lineage of model training corpora" for the application of this method in cross-cultural psychological research. First, a China-trained LLM (e.g., Kimi-Chat v1.5) more accurately captured China’s regional cultural and psychological characteristics, such as the strong collective consciousness in North China and the high social activity in South China. This made Kimi-Chat a better tool for simulating regional trends in questionnaire responses before formal surveys, allowing researchers to identify potential issues in questionnaire design.
For example, if the Openness scores in Northwest China were significantly lower than expected, it could indicate cultural misalignment in survey items, such as overemphasizing "new things" in urban contexts. Detecting such misalignment early allows the instrument to be improved. In contrast, GPT-4, trained on a global corpus, has limited ability to address these biases due to its small representation of Chinese data, making it more suited for cross-cultural comparison than as a primary tool for pre-survey testing. Second, LLMs can be used as virtual participants to simulate responses under various experimental conditions (e.g., different question orders and regional prompts) before launching large-scale cross-regional psychological research. This pre-experimentation saves time and costs while helping refine hypotheses. For instance, adjusting Conscientiousness items and observing changes in the relationship with well-being could predict response patterns for Chinese populations that value family responsibilities. While GPT-4 can also be used for similar simulations, its biases in variable associations (e.g., lack of predictive effect for Conscientiousness on well-being) require cross-validation with Kimi-Chat to ensure accuracy. Finally, LLMs can address sample acquisition challenges, such as limited samples in remote regions or ethical restrictions on sensitive variables like Neuroticism. They can generate simulated data reflecting regional demographics (e.g., age, gender, GDP level) to supplement real data, offering complete coverage across China’s regions. GPT-4’s simulated data showed significant regional deviations and could only serve as a preliminary supplement. Regardless of the model, it is essential to compare simulated data with real data (e.g., CFPS 2018) to ensure consistency and reliability, preventing biases from distorting conclusions (e.g., Kimi-Chat’s underestimation of Conscientiousness).
An important contribution of this study was that it verified the practicality of Chinese LLMs in regional psychological research: Kimi-Chat v1.5 replicated the distribution of the Big Five personality traits and well-being across China’s seven macro-regions with reasonable accuracy. Furthermore, the study argued that using LLMs to generate simulated data could avoid the social desirability bias inherent in traditional human-based surveys (Tourangeau & Yan, 2007). For example, in real surveys, participants might under-report their Neuroticism because they are unwilling to admit emotional instability, whereas LLM simulations have no such concerns. This provides a new approach for researching sensitive psychological dimensions.

This study also provided a transferable framework for LLM-based psychological simulation. Its "real data benchmark, multi-model comparison, verification" process could be extended to LLM-based psychological research in other cultural contexts (e.g., comparing the regional psychological simulation capabilities of local LLMs in Japan and the United States), offering a methodological reference for culturally adapted LLM evaluation globally.

However, the study argued that LLMs had certain limitations in cross-cultural adaptability. Although LLMs could simulate personality traits and well-being distributions across different regions, their performance was constrained by their training corpora and cultural backgrounds. Future research should therefore focus on improving the cross-cultural adaptability of LLMs, especially for regions with significant cultural differences. For instance, more localized corpora could be introduced and training methods optimized to enhance the models’ predictive capabilities in different cultural environments.
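The "real data benchmark, multi-model comparison, verification" process described above can be sketched as a check of how well each model reproduces the regional ordering found in the human benchmark. The sketch below uses Spearman rank correlation over regional means as one possible agreement measure; this is an assumption chosen for illustration, not the statistical procedure reported in the study, and the function names and input layout are hypothetical.

```python
def ranks(values):
    """Average ranks (1 = smallest); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = mean_rank
        pos = end + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns NaN if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(ranks(x), ranks(y))


def compare_models(benchmark, simulations):
    """Rank agreement of each model with the human benchmark.

    benchmark:   {region: human mean score}
    simulations: {model name: {region: simulated mean score}}
    Returns {model name: Spearman rho across the benchmark's regions}.
    """
    regions = sorted(benchmark)
    human = [benchmark[r] for r in regions]
    return {
        model: spearman(human, [means[r] for r in regions])
        for model, means in simulations.items()
    }
```

A rho near 1 for a given trait would indicate that a model orders the regions as the human data do, while a value near 0 or below would flag the kind of disordered regional ranking described for GPT-4 on Extraversion. With only seven regions such a coefficient is coarse, so it would complement rather than replace significance tests on regional differences.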
Additionally, as LLMs continue to advance, future research could further improve simulation performance by integrating multiple models, cross-disciplinary knowledge, and multi-dimensional data, particularly for fine-grained psychological dimensions and complex regional cultural contexts. Despite some limitations in simulating regional psychological structures, LLMs still demonstrated considerable potential in practical applications. For example, LLMs could provide efficient simulation tools for psychological research, especially for large-scale data collection and processing, significantly reducing costs and mitigating the social desirability bias of traditional survey methods. Furthermore, the spread of LLMs could promote interdisciplinary research by facilitating the integration of psychology, sociology, and artificial intelligence. In practice, however, the cultural adaptability and potential biases of models must be considered carefully to ensure their effectiveness and fairness in different contexts.

Overall, Kimi-Chat v1.5 and GPT-4 showed some potential in simulating the regional distribution of the Big Five personality traits and well-being, but they also had obvious limitations. In terms of strengths, the demographic characteristics of the samples simulated by Kimi-Chat v1.5 and GPT-4 were similar to those of the real samples, and the models achieved initial success in revealing the overall patterns of psychological differences across regions, supporting the feasibility and efficiency of using virtual participants in large-scale psychological research. However, this study still had several limitations in research design, data sources, simulation strategies, and result interpretation. First, our study adopted a cross-sectional design in which virtual participants were generated by large language models and their simulated results were compared with real data.
This design was inherently correlational, so causal inferences could not be drawn. Moreover, by focusing only on the overall regional level, it might have overlooked intra-regional individual differences and dynamic changes in psychological states. Second, the pre-training corpora of the models lacked sufficient information on the cultural ecology and socio-economic background of different regions, which limited their sensitivity to regional psychological differences. At the same time, the real survey data might have suffered from sampling bias and insufficient representativeness, and these factors could have affected the reliability of the comparison. Third, large language models such as Kimi-Chat v1.5 and GPT-4 had limited ability to handle complex emotional factors and specific cultural contexts, leading to inaccurate simulation of subjective psychological experiences such as well-being. The distribution of psychological traits generated by the models could not fully replicate the richness and diversity of real human populations. Additionally, in terms of result interpretation, although the simulated data were consistent with real data in several respects, this did not directly indicate that the models had truly replicated human psychological mechanisms. Such similarity might have partly originated from existing patterns or biases in the models’ training corpora. Therefore, caution was required when interpreting the results of this study, and the models could not yet fully replace real surveys (Dillion et al., 2023; Harding et al., 2024). Finally, another important limitation was that using large language models to generate virtual participants carried the risk of multidimensional stereotypes and psychological structure bias.
The training corpora of Kimi-Chat v1.5 and GPT-4 came mainly from public online texts, which often contain stereotypes about the economic development, cultural atmosphere, or social characteristics of specific regions (Lucy & Bamman, 2021). As a result, the models may have amplified these stereotypes when simulating regional psychological characteristics, exaggerating certain regional traits (Argyle et al., 2023). For example, this study found that the models significantly underestimated the well-being scores of Northeast China (see Fig. 2), which might have been related to the reinforcement of negative narratives about Northeast China (e.g., economic decline, population outflow) in the corpora. In contrast, several personality and well-being scores in East China were higher than those in other regions (see Fig. 2), which might have originated from excessively positive stereotypes in the models’ training data. Furthermore, the models also exhibited a "visibility bias" in simulating psychological structures: they were more likely to capture and amplify explicit, easily expressible psychological traits (e.g., Neuroticism, Openness) but performed poorly in reproducing implicit, long-term stable personality traits such as Conscientiousness. This triple bias (economic, cultural, and psychological) might have led to systematic errors in the LLMs’ modeling of regional psychological structures (A. Wang et al., 2025).

Future research could improve Kimi-Chat v1.5, GPT-4, and similar models in the following respects. First, enrich and diversify the LLMs’ training corpora by adding text data on the cultural ecology and socio-economic background of different regions (Demszky et al., 2023) to enhance the LLMs’ sensitivity to regional differences.
Second, improve the LLMs’ ability to simulate complex emotions and social interactions, for example by introducing more refined affective computing models or longitudinal data on human emotions to capture well-being and emotional changes more accurately. Third, design "region-specific prompts" (e.g., adding "You are a 35-year-old woman living in rural Sichuan, familiar with local farming culture") to test whether they improve the LLMs’ simulation of intra-regional heterogeneity. At the same time, incorporate personality sub-dimensions (e.g., "orderliness" and "sense of responsibility" under Conscientiousness) and specific dimensions of well-being (e.g., life satisfaction, emotional experience) to evaluate more precisely how well LLMs simulate regional psychological structures. Fourth, leverage the multimodal capabilities of recent LLMs and other AI systems (e.g., interleaved text-image data) by incorporating region-specific images (e.g., ice and snow landscapes in Northeast China, Lingnan architecture in South China) to observe whether multimodal input strengthens the LLMs’ understanding of regional psychology. Meanwhile, use multi-year data (e.g., 2014, 2018, 2022) to test whether LLMs can simulate temporal changes in regional psychological traits (e.g., the long-term impact of economic development on well-being). Finally, conduct cross-cultural comparative studies that combine sample data from different countries and regions to test the applicability of large language models in different cultural contexts, thereby further verifying and extending their psychological simulation capabilities. In addition, methodologically, further explore how far large language models (Kimi-Chat v1.5 or GPT-4) can be extended in simulating group psychological data. Could their simulation performance in personality and well-being be extended to other variables in psychological processes?
And which types of psychological variables cannot be simulated? These questions deserve further investigation.

In summary, this study used Kimi-Chat v1.5 and GPT-4 to simulate psychological structures across different regions in China, preliminarily verifying the prospects of large language models as "virtual participants" and revealing their current limitations. These findings offer useful reflections for the methods and theories of psychological research. On the one hand, large language models are expected to become innovative tools for reducing the cost of large-sample surveys and avoiding the social desirability bias of traditional surveys; on the other hand, their shortcomings in simulating complex human psychology and cultural backgrounds must be acknowledged (Grossmann et al., 2023). As model training data are enriched and algorithms improve, the ability of large language models to simulate regional psychological differences and complex personality traits is expected to improve further. Future research is expected to make more progress in inter-model comparison, cultural localization, and cross-cultural personality simulation, bringing new changes to psychological research methods and providing stronger support for cross-cultural research in personality and social psychology.

Conclusion

This study presented a comparative evaluation of two large language models (LLMs), Kimi-Chat v1.5 (a China-trained model) and GPT-4 (a globally trained model), in simulating regional psychological structures across China. Our findings revealed that Kimi-Chat v1.5, trained on Chinese corpora, captured the regional variations in Big-Five personality traits and subjective well-being more effectively than GPT-4. This might be due to Kimi’s alignment with the cultural and socio-economic contexts specific to China.
Despite these successes, both models exhibited limitations, including an inability to fully replicate the complexity of human psychological experiences. These results highlight how strongly training data influence simulation accuracy and underline the need for culturally sensitive adaptation of LLMs for regional psychological research. Our study demonstrated the potential of LLMs in regional psychological research, offering a low-cost, scalable, and replicable alternative to traditional surveys, which are prone to social-desirability bias. It also provided the first cultural-calibration benchmark for virtual-participant tools in regional psychological research within a single culture (here, Chinese culture). This benchmark offers methodological insights for future regional psychological research, because it highlights the necessity of using culture-adapted LLMs when simulating psychological traits in different cultural contexts.

Declarations

Consent to Participate

All human participants involved in the China Family Panel Studies (CFPS2018) provided written informed consent prior to the original data collection. The CFPS project team explicitly informed participants of the study’s purpose, the scope of data usage, confidentiality protection measures, and their right to withdraw from the survey at any time without adverse consequences. For any minor participants (if applicable) in the original CFPS, written informed consent was additionally obtained from their legal guardians. This study uses de-identified secondary data from CFPS2018; the data have been processed to remove all personal identifying information (e.g., names, ID numbers, specific addresses) to protect participant privacy. According to the data usage regulations of the CFPS project, secondary analysis of fully anonymized data does not require additional informed consent from individual participants.

Ethics Approval

This study was conducted in accordance with the Declaration of Helsinki (World Medical Association, 2013). The original China Family Panel Studies (CFPS2018) project, whose data were used in this study, was approved by the Internal Review Board (IRB) of the Institute of Social Science Survey (ISSS), Peking University (Approval No.: ISSS-IRB-2018-002). The secondary analysis of CFPS2018 de-identified data in this study was reviewed and deemed compliant with ethical guidelines by the Data Access Committee of ISSS, Peking University, and no additional IRB approval was required because the data are non-identifiable and there was no direct interaction with human participants. This article does not contain any studies with human participants performed by any of the authors.

Conflicts of interest

The authors have no relevant financial or non-financial interests to disclose.

Availability of data

The data will be publicly shared upon publication. The complete dataset will be available via the Open Science Framework.

Code availability

The scripts for generating stimuli and performing the data analyses in this article are available from the corresponding author via email.

Authors' contributions

XZ conceived and designed the experiment and wrote the manuscript. XZ collected the data, prepared the figures, and analyzed the data. XZ and JH contributed to the development of the research questions and to the planning, execution, and design of the study, and provided guidance for the study setup. All authors revised the manuscript. XZ provided technical guidance and support.

Acknowledgments

Data collection, analysis, and draft writing were completed by XZ. XZ and JH contributed to the revision of the draft. This work received no funding.

References

Anglim J, Horwood S, Smillie LD, Marrero RJ, Wood JK (2020) Predicting psychological and subjective well-being from personality: A meta-analysis. Psychol Bull 146(4):279–323.
https://doi.org/10.1037/bul0000226
Argyle LP, Busby EC, Fulda N, Gubler JR, Rytting C, Wingate D (2023) Out of One, Many: Using Language Models to Simulate Human Samples. Political Anal 31(3):337–351. https://doi.org/10.1017/pan.2023.2
Baumeister RF, Vohs KD, Funder DC (2007) Psychology as the Science of Self-Reports and Finger Movements: Whatever Happened to Actual Behavior? Perspect Psychol Sci 2(4):396–403. https://doi.org/10.1111/j.1745-6916.2007.00051.x
Bisbee J, Clinton JD, Dorff C, Kenkel B, Larson JM (2024) Synthetic Replacements for Human Survey Data? The Perils of Large Language Models. Political Anal 32(4):401–416. https://doi.org/10.1017/pan.2024.5
Bogg T, Roberts BW (2004) Conscientiousness and Health-Related Behaviors: A Meta-Analysis of the Leading Behavioral Contributors to Mortality. Psychol Bull 130(6):887–919. https://doi.org/10.1037/0033-2909.130.6.887
Bubeck S, Chandrasekaran V, Eldan R, Gehrke J, Horvitz E, Kamar E, Lee P, Lee YT, Li Y, Lundberg S, Nori H, Palangi H, Ribeiro MT, Zhang Y (2023) Sparks of Artificial General Intelligence: Early experiments with GPT-4 (No. arXiv:2303.12712). arXiv. https://doi.org/10.48550/arXiv.2303.12712
De Winter JCF, Driessen T, Dodou D (2024) The use of ChatGPT for personality research: Administering questionnaires using generated personas. Pers Indiv Differ 228:112729. https://doi.org/10.1016/j.paid.2024.112729
Demszky D, Yang D, Yeager DS, Bryan CJ, Clapper M, Chandhok S, Eichstaedt JC, Hecht C, Jamieson J, Johnson M, Jones M, Krettek-Cobb D, Lai L, JonesMitchell N, Ong DC, Dweck CS, Gross JJ, Pennebaker JW (2023) Using large language models in psychology. Nat Reviews Psychol. https://doi.org/10.1038/s44159-023-00241-5
Dillion D, Tandon N, Gu Y, Gray K (2023) Can AI language models replace human participants? Trends Cogn Sci 27(7):597–600. https://doi.org/10.1016/j.tics.2023.04.008
Grant S, Langan-Fox J, Anglim J (2009) The Big Five Traits as Predictors of Subjective and Psychological Well-Being.
Psychol Rep 105(1):205–231. https://doi.org/10.2466/PR0.105.1.205-231
Grossmann I, Feinberg M, Parker DC, Christakis NA, Tetlock PE, Cunningham WA (2023) AI and the transformation of social science research. Science 380(6650):1108–1109. https://doi.org/10.1126/science.adi1778
Hahn E, Gottschling J, Spinath FM (2012) Short measurements of personality – Validity and reliability of the GSOEP Big Five Inventory (BFI-S). J Res Pers 46(3):355–359. https://doi.org/10.1016/j.jrp.2012.03.008
Harding J, D’Alessandro W, Laskowski NG, Long R (2024) AI language models cannot replace human research participants. AI Soc 39(5):2603–2605. https://doi.org/10.1007/s00146-023-01725-x
Ke L, Tong S, Cheng P, Peng K (2025) Exploring the frontiers of LLMs in psychological applications: A comprehensive review. Artif Intell Rev 58(10):305. https://doi.org/10.1007/s10462-025-11297-5
Kovač G, Sawayama M, Portelas R, Colas C, Dominey PF, Oudeyer P-Y (2023) Large Language Models as Superpositions of Cultural Perspectives (No. arXiv:2307.07870). arXiv. https://doi.org/10.48550/arXiv.2307.07870
Lucy L, Bamman D (2021) Gender and Representation Bias in GPT-3 Generated Stories. Proceedings of the Third Workshop on Narrative Understanding, 48–55. https://doi.org/10.18653/v1/2021.nuse-1.5
Matz SC, Gladstone JJ, Stillwell D (2016) Money Buys Happiness When Spending Fits Our Personality. Psychol Sci 27(5):715–725. https://doi.org/10.1177/0956797616635200
McCrae RR, John OP (1992) An Introduction to the Five-Factor Model and Its Applications. J Pers 60(2):175–215. https://doi.org/10.1111/j.1467-6494.1992.tb00970.x
Mei Q, Xie Y, Yuan W, Jackson MO (2024) A Turing test of whether AI chatbots are behaviorally similar to humans. Proceedings of the National Academy of Sciences 121(9):e2313925121. https://doi.org/10.1073/pnas.2313925121
Oishi S, Kesebir S, Diener E (2011) Income Inequality and Happiness. Psychol Sci 22(9):1095–1100.
https://doi.org/10.1177/0956797611417262
Paunonen SV, Ashton MC (2001) Big Five factors and facets and the prediction of behavior. J Personal Soc Psychol 81(3):524–539. https://doi.org/10.1037/0022-3514.81.3.524
Rathje S, Mirea D-M, Sucholutsky I, Marjieh R, Robertson CE, Van Bavel JJ (2024) GPT is an effective tool for multilingual psychological text analysis. Proceedings of the National Academy of Sciences 121(34):e2308950121. https://doi.org/10.1073/pnas.2308950121
Sarstedt M, Adler SJ, Rau L, Schmitt B (2024) Using large language models to generate silicon samples in consumer and marketing research: Challenges, opportunities, and guidelines. Psychol Mark 41(6):1254–1270. https://doi.org/10.1002/mar.21982
Schimmack U, Diener E, Oishi S (2002) Life-Satisfaction Is a Momentary Judgment and a Stable Personality Characteristic: The Use of Chronically Accessible and Stable Sources. J Pers 70(3):345–384. https://doi.org/10.1111/1467-6494.05008
Schimmack U, Oishi S, Furr RM, Funder DC (2004) Personality and Life Satisfaction: A Facet-Level Analysis. Pers Soc Psychol Bull 30(8):1062–1075. https://doi.org/10.1177/0146167204264292
Serapio-García G, Safdari M, Crepy C, Sun L, Fitz S, Abdulhai M, Faust A, Matarić M (2023) Personality Traits in Large Language Models. In Review. https://doi.org/10.21203/rs.3.rs-3296728/v1
Sorokovikova A, Fedorova N, Rezagholi S, Yamshchikov IP (2024) LLMs Simulate Big Five Personality Traits: Further Evidence (No. arXiv:2402.01765). arXiv. https://doi.org/10.48550/arXiv.2402.01765
Steel P, Schmidt J, Shultz J (2008) Refining the relationship between personality and subjective well-being. Psychol Bull 134(1):138–161. https://doi.org/10.1037/0033-2909.134.1.138
Strachan JWA, Albergo D, Borghini G, Pansardi O, Scaliti E, Gupta S, Saxena K, Rufo A, Panzeri S, Manzi G, Graziano MSA, Becchio C (2024) Testing theory of mind in large language models and humans. Nat Hum Behav 8(7):1285–1295.
https://doi.org/10.1038/s41562-024-01882-z
Talhelm T, Zhang X, Oishi S, Shimin C, Duan D, Lan X, Kitayama S (2014) Large-Scale Psychological Differences Within China Explained by Rice Versus Wheat Agriculture. Science 344(6184):603–608. https://doi.org/10.1126/science.1246850
Tourangeau R, Yan T (2007) Sensitive questions in surveys. Psychol Bull 133(5):859–883. https://doi.org/10.1037/0033-2909.133.5.859
Trott S, Jones C, Chang T, Michaelov J, Bergen B (2023) Do Large Language Models Know What Humans Know? Cogn Sci 47(7):e13309. https://doi.org/10.1111/cogs.13309
Wang A, Morgenstern J, Dickerson JP (2025) Large language models that replace human participants can harmfully misportray and flatten identity groups. Nat Mach Intell 7(3):400–411. https://doi.org/10.1038/s42256-025-00986-z
Wang J, Liu C, Cai Z (2022) Digital literacy and subjective happiness of low-income groups: Evidence from rural China. Front Psychol 13:1045187. https://doi.org/10.3389/fpsyg.2022.1045187
Zhai Q, Willis M, O’Shea B, Zhai Y, Yang Y (2013) Big Five personality traits, job satisfaction and subjective wellbeing in China. Int J Psychol 48(6):1099–1108. https://doi.org/10.1080/00207594.2012.732700

Additional Declarations

No competing interests reported.

Status: Under Review
Version 1 posted
Editorial decision: Revision requested 25 Feb, 2026
Reviews received at journal 06 Jan, 2026
Reviews received at journal 05 Jan, 2026
Reviewers agreed at journal 11 Dec, 2025
Reviewers agreed at journal 11 Dec, 2025
Reviewers agreed at journal 09 Dec, 2025
Reviews received at journal 25 Nov, 2025
Reviewers agreed at journal 02 Nov, 2025
Reviewers agreed at journal 29 Oct, 2025
Reviewers invited by journal 29 Oct, 2025
Editor assigned by journal 29 Oct, 2025
Editor invited by journal 24 Oct, 2025
Submission checks completed at journal 08 Oct, 2025
First submitted to journal 08 Oct, 2025
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-7665724","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":541830785,"identity":"1498b9d6-48b0-4773-a957-7b81f3e53ce5","order_by":0,"name":"Xing Zhou","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABCElEQVRIie3QMUvEMBTA8VcCcQnWMeHAz/CgcCAe9askBDoewi1u9hDqpmsFP0Qn50igLnEPOHgi3HRDQXTq4PUGwSH1Rof8hwyP/HjwAGKxf1hKiF1J5McpWPMz5WNEXFcFduezTCxbuR9B51DUXaEa63A/Al5ixtAmTes+G+jzORry+MIgn4dEUkv5viVEuOcHn1R6gYbqUwZ6ESKESzNsoYd+IKVRjWHTCQOjygChXJWTLWHwull76AeSfo0SxiyIGgt+ZBz1QHdb6CjhBxXFDmcoynbqVaXVnaXZyT3qIDmz6cdK9vzyFuzad32ubp6u3vzmIg+S38ndRYYHR//FYrFY7I++AVaEYahvwvcBAAAAAElFTkSuQmCC","orcid":"","institution":"South China Normal University","correspondingAuthor":true,"prefix":"","firstName":"Xing","middleName":"","lastName":"Zhou","suffix":""},{"id":541830786,"identity":"72a8554c-8a8e-4fa7-9c18-c1fca140ef3d","order_by":1,"name":"Jiahong Zheng","email":"","orcid":"","institution":"Guangzhou University","correspondingAuthor":false,"prefix":"","firstName":"Jiahong","middleName":"","lastName":"Zheng","suffix":""}],"badges":[],"createdAt":"2025-09-20 18:23:17","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-7665724/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-7665724/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":95538230,"identity":"379df210-9384-499f-a9cc-0b8ea0631641","added_by":"auto","created_at":"2025-11-10 
11:03:36","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":359431,"visible":true,"origin":"","legend":"","description":"","filename":"OriginalDraft.docx","url":"https://assets-eu.researchsquare.com/files/rs-7665724/v1/127241f1b532127b3af3d5a9.docx"},{"id":95538233,"identity":"a7a55b86-9cb5-42b4-95d1-92cbf378cf63","added_by":"auto","created_at":"2025-11-10 11:03:36","extension":"json","order_by":1,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":4112,"visible":true,"origin":"","legend":"","description":"","filename":"ca885ac509e7410897a69b4b58b16961.json","url":"https://assets-eu.researchsquare.com/files/rs-7665724/v1/7af3200edf351b248470f949.json"},{"id":95538229,"identity":"59467d7e-5be4-4a9c-b786-cf64031b1c7f","added_by":"auto","created_at":"2025-11-10 11:03:36","extension":"xml","order_by":2,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":196289,"visible":true,"origin":"","legend":"","description":"","filename":"ca885ac509e7410897a69b4b58b169611enriched.xml","url":"https://assets-eu.researchsquare.com/files/rs-7665724/v1/bef9f022f655e16b29322316.xml"},{"id":95538226,"identity":"74d983b4-8df0-4b01-97bf-cf6fb51955d4","added_by":"auto","created_at":"2025-11-10 11:03:36","extension":"png","order_by":6,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":18407,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-7665724/v1/fa66811146a8a7903e2e60e5.png"},{"id":95653989,"identity":"b94347cf-d328-4f27-bd36-7741aeb16d06","added_by":"auto","created_at":"2025-11-11 
16:07:40","extension":"png","order_by":7,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":44293,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-7665724/v1/067a5e58b09afeb3f35b14fb.png"},{"id":95654395,"identity":"5ca4e523-8e44-4ac6-a0dc-4a2680de6594","added_by":"auto","created_at":"2025-11-11 16:11:35","extension":"png","order_by":8,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":16078,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-7665724/v1/907520018b0a71094baaba4e.png"},{"id":95654076,"identity":"68ab6594-1817-48a5-982a-6973538eab2a","added_by":"auto","created_at":"2025-11-11 16:09:35","extension":"xml","order_by":9,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":193978,"visible":true,"origin":"","legend":"","description":"","filename":"ca885ac509e7410897a69b4b58b169611structuring.xml","url":"https://assets-eu.researchsquare.com/files/rs-7665724/v1/789c2127de89409bc54aa086.xml"},{"id":95538232,"identity":"107d5e8f-2b47-4087-a785-def844d5891b","added_by":"auto","created_at":"2025-11-10 11:03:36","extension":"html","order_by":10,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":205929,"visible":true,"origin":"","legend":"","description":"","filename":"earlyproof.html","url":"https://assets-eu.researchsquare.com/files/rs-7665724/v1/f64d225634221b843ba3371a.html"},{"id":95538223,"identity":"fd00af2e-d526-4a12-9089-bb3095fbebea","added_by":"auto","created_at":"2025-11-10 11:03:36","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":90035,"visible":true,"origin":"","legend":"\u003cp\u003eLLM - simulated samples (GPT-4 and Kimi) and human samples in the Big Five personality and subjective well-being mean scores. 
The error bar indicates the standard error.\u003c/p\u003e","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-7665724/v1/74ff38dbac493d456e2d0c50.png"},{"id":95654096,"identity":"c732d338-633f-435b-9ffa-bc742cc2d608","added_by":"auto","created_at":"2025-11-11 16:09:42","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":67094,"visible":true,"origin":"","legend":"\u003cp\u003eRegional difference comparison between human samples and LLM - simulated samples (GPT-4 and Kimi).The error bar indicates the standard error.\u003c/p\u003e","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-7665724/v1/ab33f6b27a64f226affa0e3e.png"},{"id":95654000,"identity":"8e96af88-cd63-43eb-bfdf-7822ad9e5311","added_by":"auto","created_at":"2025-11-11 16:07:55","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":64143,"visible":true,"origin":"","legend":"\u003cp\u003eLoadings structure diagram of GPT-4 and Kimi simulated samples versus real human samples in principal component analysis (PCA). In this figure, each colored point represents the relationship between the Big Five personality dimensions and well-being. The horizontal and vertical coordinates correspond to the standardized loadings of these variables on the first principal component (PC1) and the second principal component (PC2), respectively. PC1 is the horizontal axis; the further to the right (positive direction) a variable is positioned on this axis, the stronger its positive correlation with PC1; the further to the left (negative direction), the stronger its negative correlation with PC1. 
PC2 is the vertical axis; the higher up (positive direction) a variable is positioned on this axis, the stronger its positive correlation with PC2; the further down (negative direction), the stronger its negative correlation with PC2.\u003c/p\u003e","description":"","filename":"floatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-7665724/v1/6e54b5ddec7e869490051add.png"},{"id":95797104,"identity":"03a1cbec-9bcd-4b27-9998-eee4a755ddf0","added_by":"auto","created_at":"2025-11-13 08:00:50","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1261620,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7665724/v1/a50b940a-5a3c-4888-804f-28e0782638c8.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Empirical Examination of Large Language Models in Regional Psychological Structures Simulation: Personality and Well-being","fulltext":[{"header":"Introduction","content":"\u003cp\u003eLarge language models (LLMs) are exhibiting ever-stronger language comprehension and generation capabilities (Bubeck et al., \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Rathje et al., \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e2024\u003c/span\u003e), thereby furnishing computational social psychology with new opportunities for intelligent investigations of collective psychological structures (Ke et al., \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2025\u003c/span\u003e; Kovač et al., \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). Empirical work has demonstrated that GPT-4 can generate questionnaire responses that align closely with human self-reports of personality traits (De Winter et al., \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; A. 
Wang et al., \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2025\u003c/span\u003e), implying that LLMs possess a \u0026ldquo;role-playing\u0026rdquo; capacity that allows them to impersonate individuals with specified psychological profiles (Mei et al., \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Strachan et al., \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). Meanwhile, the computational social-science community has begun to explore the use of LLMs to create \u0026ldquo;virtual participants\u0026rdquo; in order to reduce research costs and streamline experimental workflows (Bisbee et al., \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Dillion et al., \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Grossmann et al., \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Sarstedt et al., \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2024\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eYet extant studies overwhelmingly rely on a single foreign model, leaving open the question of whether domestically trained Chinese LLMs and foreign LLMs yield equivalent regional psychological profiles. Cross-cultural psychology has long maintained that psychological characteristics are not randomly distributed; instead, they are embedded in local socio-economic and cultural ecologies (Talhelm et al., \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2014\u003c/span\u003e). For instance, Talhelm et al. (\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2014\u003c/span\u003e) found that students from China\u0026rsquo;s rice-cultivating south displayed more holistic, interdependent cognition than those from the wheat-cultivating north, who exhibited more analytic and independent thinking. 
Such regional disparities are expected to surface on Big-Five personality scales and subjective well-being measures (Anglim et al., \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2020\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eThe Big-Five model, encompassing openness, conscientiousness, extraversion, agreeableness, and neuroticism, offers a structured framework for capturing stable individual differences (McCrae \u0026amp; John, \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e1992\u003c/span\u003e; Paunonen \u0026amp; Ashton, \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2001\u003c/span\u003e). Meta-analyses indicate that conscientiousness is associated with fewer health-risk behaviors and more health-promoting behaviors (Bogg \u0026amp; Roberts, \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2004\u003c/span\u003e). Concurrently, subjective well-being serves as a widely used indicator of life quality and psychological adaptation (Schimmack et al., \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2002\u003c/span\u003e, \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2004\u003c/span\u003e). Research further shows that personality\u0026ndash;subjective well-being associations are moderated by congruence between personality and life choices, with effects surpassing those of income (Matz et al., \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2016\u003c/span\u003e). Consequently, personality traits and subjective well-being jointly offer a suitable lens for evaluating whether LLMs reproduce psychological structures that carry subtle regional differences. 
Traditional surveys, however, are susceptible to social-desirability biases and response-style distortions (Tourangeau \u0026amp; Yan, \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2007\u003c/span\u003e) and incur substantial logistical costs when spanning multiple regions (Baumeister et al., \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2007\u003c/span\u003e). LLM-based virtual sampling promises to circumvent these hurdles, yet risks amplifying internet-derived cultural stereotypes (Argyle et al., \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Lucy \u0026amp; Bamman, \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). Whether Kimi-Chat v1.5, a large language model trained on Chinese corpora, outperforms GPT-4 in capturing authentic regional variations remains untested.\u003c/p\u003e\u003cp\u003eIn this study, we selected two language models, Kimi-Chat v1.5 and GPT-4, for comparative analysis, primarily based on their significant differences in training corpora, cultural adaptability, and application scenarios. Kimi-Chat was specifically designed for the Chinese language environment and local Chinese culture, and its training data mainly came from Chinese internet content and cultural contexts. In contrast, GPT-4 was built on a large-scale, worldwide multilingual corpus, covering a broader range of cultures and language types. This difference provided us with an ideal framework to examine the performance of the two models in simulating regional psychological characteristics.\u003c/p\u003e\u003cp\u003eTwo considerations motivated the selection of these models. First, Kimi-Chat was expected to have an advantage in adapting to local culture, which could enable it to better capture the psychological characteristics and subjective well-being of people across the major macro-regions of China. 
GPT-4, on the other hand, with its global training corpora and cross-cultural adaptability, provided us with a comparative perspective different from that of Kimi-Chat. By comparing the performance of these two models in simulating regional differences, we could understand the strengths and limitations of the models in capturing psychological characteristics under different cultural backgrounds. Second, selecting these two models helped us more clearly reveal the potential and limitations of LLMs in capturing complex psychological structures. Human psychological characteristics, especially personality traits and subjective well-being, usually exhibit significant regional differences, which may be closely related to culture, social environment, and historical context. By comparing the models\u0026rsquo; outputs with real human data, we could not only test the accuracy of the models in reproducing regional psychological structures but also identify the challenges they might face in understanding and simulating these psychological differences. Through this comparative study, we aimed to identify the strengths and weaknesses of the models. For example, could Kimi-Chat better simulate the unique regional differences in China? Could GPT-4, with its global training, accurately reflect the personality traits and well-being within Chinese regions? The answers to these questions could not only provide valuable insights for cross-cultural psychological research but also promote the further application and development of LLMs in the field of psychology.\u003c/p\u003e\n\u003ch3\u003eThe Present Study\u003c/h3\u003e\n\u003cp\u003eGrounded in a four-tier framework (personality traits, subjective well-being, regional culture, and training-corpus lineage), this investigation systematically evaluates the capacity and bias of large language models when simulating population-level psychological structures within China. 
Leveraging the 2018 China Family Panel Studies (CFPS2018, N\u0026thinsp;=\u0026thinsp;37,354; \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://www.isss.pku.edu.cn/cfps/\u003c/span\u003e\u003cspan address=\"http://www.isss.pku.edu.cn/cfps/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) as an invariant demographic scaffold, we construct three parallel samples: human respondents, virtual participants generated by Kimi-Chat v1.5, and virtual participants generated by GPT-4. Kimi-Chat v1.5 is a China-trained model with strong Chinese cultural adaptation and efficient performance in specific domains like education and healthcare. GPT-4 is a globally trained model excelling in multi-language and multi-scenario applications with advanced reasoning and generative capabilities. The training data of Kimi (the China-trained LLM) include long texts such as everyday language, legal documents, and academic papers, and a hierarchical training strategy strengthens its logical coherence in the Chinese context. Its multimodal data cover scenarios such as OCR (optical character recognition) and interleaved text and images, and the model performs better in handling Chinese-specific table structures and cultural allusions. In contrast, the training data of the non-domestic model GPT contain less than 0.1% Chinese language material and mainly rely on general language materials such as English Wikipedia and books, which may lead to structural biases in generating social norms and value expressions that conform to the Chinese cultural background. We therefore chose these two LLMs, Kimi and GPT-4, for this study.\u003c/p\u003e\u003cp\u003eFirst, we test whether the virtual participants produced by each model faithfully reproduce regional trends in subjective well-being relative to human data. 
Second, we examine whether regional personality profiles generated by each model align with empirically observed patterns across China\u0026rsquo;s seven macro-regions. Third, capitalizing on the well-established structural link between Big-Five dimensions and subjective well-being (Anglim et al., \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Steel et al., \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2008\u003c/span\u003e), we assess whether each model replicates this association. Finally, the design involved two independent 2\u0026times;7 mixed ANOVA models, model (Kimi-Chat v1.5 vs Human) \u0026times; region and model (GPT-4 vs Human) \u0026times; region, to test the consistency between the outputs of each model and the human data from seven major regions in China.\u003c/p\u003e\u003cp\u003eBy integrating large-scale survey data with state-of-the-art generative AI, the current study sets out to evaluate, side by side, whether China-trained Kimi-Chat v1.5 and globally trained GPT-4 can each reproduce authentic regional distributions of Big-Five personality traits and subjective well-being across China\u0026rsquo;s seven macro-regions. In this study, we utilized large language models (Kimi-Chat v1.5 and GPT-4) to generate virtual participants for questionnaire testing. Although this approach has been preliminarily explored in recent research (De Winter et al., \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2024\u003c/span\u003e), the present study introduced three innovations. First, it explicitly compared LLMs with different cultural training backgrounds, a core distinction not addressed in prior preliminary studies. Second, it significantly expanded the sample size and measurement scope. Third, it conducted a comparative analysis between the simulated data generated by LLMs and real human participant data. 
In essence, our study aimed to provide new tools and theoretical perspectives for regional psychological structure research and to offer a transferable methodological framework for culturally adapted LLM-based psychological assessment research worldwide.\u003c/p\u003e"},{"header":"Methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\u003cdiv id=\"Sec4\" class=\"Section3\"\u003e\u003ch2\u003eReal Samples\u003c/h2\u003e\u003cp\u003eThe real samples are from the China Family Panel Studies (CFPS). The survey was launched in 2010 and is conducted every two years, with five rounds of national surveys having been completed so far. The questionnaire data can be applied for free through the project\u0026rsquo;s official website (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://www.isss.pku.edu.cn/cfps/\u003c/span\u003e\u003cspan address=\"http://www.isss.pku.edu.cn/cfps/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). This study uses the personality and subjective well-being data from the adult questionnaire of CFPS2018, with a sample of 4,900 participants. Participants completed the Big Five personality inventory and the life satisfaction scale as part of the CFPS2018. These data were collected through structured interviews conducted face-to-face or online, depending on the participant\u0026rsquo;s accessibility.\u003c/p\u003e\u003cp\u003eThe sample was carefully stratified by region to ensure that each of the seven major Chinese regions was adequately represented: North China (Beijing, Tianjin, Hebei, Shanxi, and Inner Mongolia), Northeast China (Liaoning, Jilin, and Heilongjiang), East China (Shanghai, Jiangsu, Zhejiang, Anhui, Fujian, Jiangxi, and Shandong), Central China (Henan, Hubei, and Hunan), South China (Guangdong, Guangxi, and Hainan), Southwest China (Chongqing, Sichuan, Guizhou, Yunnan, and Tibet), and Northwest China (Shaanxi, Gansu, Qinghai, Ningxia, and Xinjiang). 
These seven regional divisions are consistent with common macro-geographical classifications in Chinese social science research (e.g., as used in the China Family Panel Studies, CFPS). Random sampling was used to select 700 participants from each region, with an equal number of males and females, and an age range of 18\u0026ndash;60 years. Personality was measured using the 15-item Chinese short version of the Big Five Personality Inventory (Hahn et al., \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e2012\u003c/span\u003e), and subjective well-being was assessed using a single-item self-rating: \u0026ldquo;How happy do you feel yourself to be?\u0026rdquo; (with 0 indicating the lowest level of subjective well-being and 10 indicating the highest level of subjective well-being).\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\n\u003ch3\u003eVirtual Participants\u003c/h3\u003e\n\u003cp\u003eIn addition to human participants, virtual participants were generated using Kimi-Chat v1.5 and GPT-4. The virtual participants were created by inputting demographic prompts into these LLMs to mirror the regional and gender distributions found in the real human sample. Each model generated a dataset of 4,900 virtual participants, with 700 individuals from each of the seven regions. 
The virtual participants responded to the same personality and subjective well-being surveys as the human participants, ensuring comparability across datasets.\u003c/p\u003e\u003cp\u003eFirst, enter the prompt in the dialog box of Kimi (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://kimi.moonshot.cn\u003c/span\u003e\u003cspan address=\"https://kimi.moonshot.cn\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) or GPT-4 (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://openai.com\u003c/span\u003e\u003cspan address=\"https://openai.com\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e): \u0026ldquo;Now, as my assistant for a psychological experiment, please generate a list of NN simulated participants based on random sampling that reflects the GDP, cultural background, and demographic characteristics of China\u0026rsquo;s 31 provinces and municipalities. Each participant must report the following information: ID (NN-NN), province/municipality, age (18\u0026ndash;60 years), and gender (male or female); ensure an equal male-to-female ratio. Please list the full information for all NN participants without any omissions.\u0026rdquo; Here, \u0026ldquo;NN\u0026rdquo; denotes the desired number of participants (e.g., NN\u0026thinsp;=\u0026thinsp;10 for 10 participants; IDs range from 01 to 10), which controls each round\u0026rsquo;s sample size and ID range. Next, call the Kimi/GPT API with a temperature of 0.7 to balance diversity and consistency in responses (Argyle et al., \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2023\u003c/span\u003e), then administer the Big-Five personality and subjective well-being questionnaires and generate the corresponding data. The procedure is fully simulated, complies with ethical standards, involves no real participants, and each questionnaire is produced independently. 
Each \u0026ldquo;participant\u0026rdquo; receives a system prompt containing personal information (ID, province, age, gender) and answers from a first-person perspective. The items are presented in a fixed order without randomization. The prompt template is: \u0026ldquo;Please role-play a person (ID, province, age, gender). Drawing on your imagined life experiences, cultural background, and the prevailing social environment of that locale, respond as realistically as possible from a first-person perspective to the following questions about your personality and feelings. Maintain this identity throughout. You will complete two questionnaires. First, the 15-item Big-Five Inventory assessing Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. Second, a single-item subjective well-being scale: \u0026lsquo;How happy do you feel with your life?\u0026rsquo;\u0026rdquo;. The personality and subjective well-being items administered to the simulated participants are identical to those in the CFPS2018 questionnaire.\u003c/p\u003e\n\u003ch3\u003eInstrument\u003c/h3\u003e\n\u003cp\u003eThe short Big-Five inventory comprises five dimensions (Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism), each measured by three items. For example, the Neuroticism items are \u0026ldquo;often worried\u0026rdquo;, \u0026ldquo;prone to stress\u0026rdquo;, and \u0026ldquo;relaxed and coping well with stress (reverse-scored)\u0026rdquo; (J. Wang et al., \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). Subjective well-being was assessed with the single item, \u0026ldquo;How happy do you feel with your life?\u0026rdquo; rated from 0 (very unhappy) to 10 (very happy). Dimension scores are the mean of the three items; reverse-scored items are recoded as 5 minus the original response. 
Subjective well-being scores use the raw item value.\u003c/p\u003e\u003cdiv id=\"Sec7\" class=\"Section2\"\u003e\u003ch2\u003eData Analysis\u003c/h2\u003e\u003cp\u003eAs the outputs of LLMs (Kimi/GPT-4) match human response formats, identical preprocessing and correction procedures are applied before analysis to ensure comparability. Any simulated participant whose score falls below 0 (i.e., the model produced an out-of-range response) is deemed invalid and excluded. After exclusion, a new simulated participant with the same demographic profile is regenerated to maintain the total sample size of 4,900.\u003c/p\u003e\u003cp\u003eTo assess the fidelity of the virtual participant data to the human data, independent-sample t-tests were conducted to compare the means of the Big Five traits and life satisfaction scores across the real and simulated datasets. Cohen's d was used to estimate effect sizes, with absolute values greater than 0.5 indicating medium to large differences between the datasets. Analysis of variance (ANOVA) was conducted to compare the regional differences in personality traits and subjective well-being between human and LLM-generated participants (Kimi/GPT-4). This allowed for an evaluation of the extent to which the models accurately reproduced regional psychological distributions. Regression models were employed to examine the relationships between the Big Five personality traits and subjective well-being. Both human and LLM-generated datasets were analyzed to determine whether personality traits could predict life satisfaction, and whether these predictions aligned across the two types of participants. Principal Component Analysis (PCA) was applied to examine the structural alignment between the human and LLM-generated datasets. The first two principal components (PC1 and PC2) were extracted to determine the extent to which personality traits and subjective well-being loaded onto these components in both the real and simulated data. 
This analysis provided further insight into the structural differences between the two datasets (human and LLM-generated datasets).\u003c/p\u003e\u003c/div\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec9\" class=\"Section2\"\u003e\u003ch2\u003eComparison between real human and Kimi/GPT-4 simulated samples\u003c/h2\u003e\u003cp\u003eThe real human sample consisted of 4900 individuals (2450 males and 2450 females; age: \u003cem\u003eM\u003c/em\u003e\u0026thinsp;=\u0026thinsp;40.11, \u003cem\u003eSD\u003c/em\u003e\u0026thinsp;=\u0026thinsp;12.01), the Kimi-simulated sample was also 4900 individuals (2450 males and 2450 females; age: \u003cem\u003eM\u003c/em\u003e\u0026thinsp;=\u0026thinsp;40.43, \u003cem\u003eSD\u003c/em\u003e\u0026thinsp;=\u0026thinsp;12.49), and the GPT-4-simulated sample was likewise 4900 individuals (2450 males and 2450 females; age: \u003cem\u003eM\u003c/em\u003e\u0026thinsp;=\u0026thinsp;40.36, \u003cem\u003eSD\u003c/em\u003e\u0026thinsp;=\u0026thinsp;12.70). The three samples were essentially consistent in terms of demographic composition. This study employed one-way ANOVA to compare the differences between Kimi, GPT-4 simulated samples, and real human samples in the Big Five personality dimensions and well-being. The results of the post hoc comparisons are shown in Tables\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, \u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, \u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e, and visualized in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. 
In the Conscientiousness dimension (\u003cem\u003et\u003c/em\u003e\u0026thinsp;=\u0026thinsp;8.39, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, \u003cem\u003eCohen's d\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.168; \u003cem\u003et\u003c/em\u003e\u0026thinsp;=\u0026thinsp;18.10, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, \u003cem\u003eCohen's d\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.365), the scores of the GPT-4 and Kimi simulated samples, respectively, were significantly lower than those of the real human sample, although the effect sizes were small to moderate. In contrast, in the Openness dimension (\u003cem\u003et\u003c/em\u003e = -50.3, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, \u003cem\u003eCohen's d\u003c/em\u003e = -1.018; \u003cem\u003et\u003c/em\u003e = -26.4, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, \u003cem\u003eCohen's d\u003c/em\u003e = -0.536), the scores of the GPT-4 and Kimi simulated samples were significantly higher than those of the real human sample. In the Neuroticism dimension, the GPT-4 simulated sample scored significantly higher than the real human sample (\u003cem\u003et\u003c/em\u003e = -3.81, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, \u003cem\u003eCohen's d\u003c/em\u003e = -0.079), whereas the Kimi simulated sample did not differ significantly from it (\u003cem\u003et\u003c/em\u003e = -1.17, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026gt;\u0026thinsp;0.05, \u003cem\u003eCohen's d\u003c/em\u003e = -0.025), consistent with the unstarred mean difference in Table\u0026nbsp;2. In the Agreeableness dimension, the GPT-4 simulated sample scored significantly higher than the real human sample (\u003cem\u003et\u003c/em\u003e = -4.11, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, \u003cem\u003eCohen's d\u003c/em\u003e = -0.085), while the Kimi simulated sample scored significantly lower than the real human sample (\u003cem\u003et\u003c/em\u003e\u0026thinsp;=\u0026thinsp;11.1, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, \u003cem\u003eCohen's d\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.225). 
In the subjective well-being dimension, the GPT-4 and Kimi simulated samples scored significantly lower than the real human sample (\u003cem\u003et\u003c/em\u003e\u0026thinsp;=\u0026thinsp;7.84, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, \u003cem\u003eCohen's d\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.159; \u003cem\u003et\u003c/em\u003e\u0026thinsp;=\u0026thinsp;4.80, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, \u003cem\u003eCohen's d\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.101), indicating that there is still some deviation in the simulation of well-being by GPT-4 and Kimi. Of course, this may also be due to the fact that people tend to overestimate their own well-being when self-assessing. However, in the Extraversion dimension, there was no significant difference between the GPT-4 and Kimi simulated samples and the real human sample (\u003cem\u003et\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.60, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026gt;\u0026thinsp;0.05, \u003cem\u003eCohen's d\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.010; \u003cem\u003et\u003c/em\u003e\u0026thinsp;=\u0026thinsp;2.15, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026gt;\u0026thinsp;0.05, \u003cem\u003eCohen's d\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.044), indicating a high consistency between the Kimi/GPT-4 model and the human sample in this dimension. 
In other dimensions, there were mostly some deviations (all \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05), indicating that there is still room for improvement in the Kimi/GPT-4 model in simulating human micro-characteristics (Big Five personality traits and subjective well-being).\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003e\u003cem\u003eHuman and GPT-4 samples in the differences of big Five personality and subjective well-being (Post hoc comparisons)\u003c/em\u003e\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"4\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eDimension\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eMean difference\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003e\u003cem\u003et\u003c/em\u003e\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eCohen\u0026rsquo;s d\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eConscientiousness\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.304***\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c3\"\u003e\u003cp\u003e8.39\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.168\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eExtraversion\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.023\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.60\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.010\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAgreeableness\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e-0.137***\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e-4.11\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e-0.085\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eOpenness\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e-2.210***\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e-50.3\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e-1.018\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eNeuroticism\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e-0.146***\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e-3.81\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e-0.079\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" 
colname=\"c1\"\u003e\u003cp\u003eWell-being\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.249***\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e7.84\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.159\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003ctfoot\u003e\u003ctr\u003e\u003ctd colspan=\"4\"\u003e\u003cem\u003eNote.\u003c/em\u003e Mean difference\u0026thinsp;=\u0026thinsp;human score - GPT simulated score; * denotes p\u0026thinsp;\u0026lt;\u0026thinsp;0.05, ** denotes p\u0026thinsp;\u0026lt;\u0026thinsp;0.01, *** denotes p\u0026thinsp;\u0026lt;\u0026thinsp;0.001.\u003c/td\u003e\u003c/tr\u003e\u003c/tfoot\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003e\u003cem\u003eHuman and Kimi samples in the differences of big Five personality and subjective well-being (Post hoc comparisons)\u003c/em\u003e\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"4\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eDimension\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eMean difference\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" 
colname=\"c3\"\u003e\u003cp\u003e\u003cem\u003et\u003c/em\u003e\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eCohen\u0026rsquo;s d\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eConscientiousness\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.801***\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e18.1\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.365\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eExtraversion\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.089\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e2.15\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.044\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAgreeableness\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.509***\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e11.1\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.225\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eOpenness\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e-1.364***\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e-26.4\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c4\"\u003e\u003cp\u003e-0.536\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eNeuroticism\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e-0.047\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e-1.17\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e-0.025\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eWell-being\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.164***\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e4.80\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.101\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003ctfoot\u003e\u003ctr\u003e\u003ctd colspan=\"4\"\u003e\u003cem\u003eNote.\u003c/em\u003e Mean difference\u0026thinsp;=\u0026thinsp;human score - Kimi simulated score.\u003c/td\u003e\u003c/tr\u003e\u003c/tfoot\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003e\u003cem\u003eGPT-4 and Kimi samples in the differences of big Five personality and subjective well-being (Post hoc comparisons)\u003c/em\u003e\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"4\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" 
colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eDimension\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eMean difference\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003e\u003cem\u003et\u003c/em\u003e\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eCohen\u0026rsquo;s d\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eConscientiousness\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.498***\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e11.7\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.238\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eExtraversion\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.066\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e1.76\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.038\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAgreeableness\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.646***\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e14.7\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.300\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" 
colname=\"c1\"\u003e\u003cp\u003eOpenness\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.843***\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e18.8\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.378\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eNeuroticism\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.099**\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e2.92\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.060\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eWell-being\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e-0.086***\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e-5.12\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e-0.096\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003ctfoot\u003e\u003ctr\u003e\u003ctd colspan=\"4\"\u003e\u003cem\u003eNote.\u003c/em\u003e Mean difference\u0026thinsp;=\u0026thinsp;GPT-4 simulated score - Kimi simulated score.\u003c/td\u003e\u003c/tr\u003e\u003c/tfoot\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\n\u003ch3\u003eExamining Regional Differences in Personality Dimensions and Well-being\u003c/h3\u003e\n\u003cp\u003eTo examine regional differences in personality dimensions and well-being, this study conducted one-way ANOVA on samples simulated by GPT-4, Kimi, and real human data. 
The results showed that in the real human sample, all dimensions of the Big Five personality traits varied significantly across regions (Conscientiousness: \u003cem\u003eF\u003c/em\u003e(6, 4893)\u0026thinsp;=\u0026thinsp;3.67, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.036, partial \u003cem\u003eη\u003c/em\u003e\u0026sup2; = 0.003; Extraversion: \u003cem\u003eF\u003c/em\u003e(6, 4893)\u0026thinsp;=\u0026thinsp;8.71, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, partial \u003cem\u003eη\u003c/em\u003e\u0026sup2; = 0.011; Agreeableness: \u003cem\u003eF\u003c/em\u003e(6, 4893)\u0026thinsp;=\u0026thinsp;3.61, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.001, partial \u003cem\u003eη\u003c/em\u003e\u0026sup2; = 0.004; Openness: \u003cem\u003eF\u003c/em\u003e(6, 4893)\u0026thinsp;=\u0026thinsp;10.3, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, partial \u003cem\u003eη\u003c/em\u003e\u0026sup2; = 0.012; Neuroticism: \u003cem\u003eF\u003c/em\u003e(6, 4893)\u0026thinsp;=\u0026thinsp;4.36, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, partial \u003cem\u003eη\u003c/em\u003e\u0026sup2; = 0.005). Well-being also varied significantly across regions, \u003cem\u003eF\u003c/em\u003e(6, 4893)\u0026thinsp;=\u0026thinsp;12.1, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, partial \u003cem\u003eη\u003c/em\u003e\u0026sup2; = 0.015. 
Post hoc comparisons (Tukey) revealed that in the conscientiousness dimension, the Northeast region scored significantly lower than the Northwest region (mean difference\u0026thinsp;=\u0026thinsp;0.349, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.017), with no other significant differences; in the extraversion dimension, the North China, Northeast, East China, Central China, and Northwest regions scored significantly lower than the South China region (mean differences\u0026thinsp;=\u0026thinsp;0.340, 0.553, 0.486, 0.411, 0.459, all \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05) and Southwest region (mean differences\u0026thinsp;=\u0026thinsp;0.364, 0.577, 0.510, 0.436, 0.483, all \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05); in the agreeableness dimension, the Northeast region scored significantly higher than the South China and Southwest regions (mean differences\u0026thinsp;=\u0026thinsp;0.383, 0.323, all \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05), with no other significant differences; in the openness dimension, the North China region scored significantly higher than the Northeast, East China, Central China, and South China regions (mean differences\u0026thinsp;=\u0026thinsp;0.511, 0.713, 0.597, 0.729, all \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05), and the Northeast, East China, Central China, and South China regions scored significantly higher than the Northwest region (mean differences\u0026thinsp;=\u0026thinsp;0.466, 0.667, 0.551, 0.683, all \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05), while the East China and South China regions scored significantly lower than the Southwest region (mean differences\u0026thinsp;=\u0026thinsp;0.407, 0.423, all \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05); in the neuroticism dimension, the North China, East China, and Central China regions scored 
significantly lower than the Northwest region (mean differences\u0026thinsp;=\u0026thinsp;0.351, 0.346, 0.454, all \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05), and the Central China region scored significantly lower than the Southwest region (mean difference\u0026thinsp;=\u0026thinsp;0.406, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.004); in terms of well-being, the North China, Northeast, East China, and Central China regions scored significantly higher than the South China region (mean differences\u0026thinsp;=\u0026thinsp;0.567, 0.536, 0.481, 0.337, all \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05) and Southwest region (mean differences\u0026thinsp;=\u0026thinsp;0.686, 0.654, 0.600, 0.456, all \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05), and the North China, Northeast, and East China regions scored significantly higher than the Northwest region (mean differences\u0026thinsp;=\u0026thinsp;0.497, 0.466, 0.411, all \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05).\u003c/p\u003e\u003cp\u003eIn the GPT-4-simulated sample, regional differences in extraversion were not significant, whereas regional differences in conscientiousness, agreeableness, and neuroticism were significant and those in openness were only marginal (Conscientiousness: \u003cem\u003eF\u003c/em\u003e(6, 4893)\u0026thinsp;=\u0026thinsp;4.68, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, partial \u003cem\u003eη\u003c/em\u003e\u0026sup2; = 0.006; Agreeableness: \u003cem\u003eF\u003c/em\u003e(6, 4893)\u0026thinsp;=\u0026thinsp;3.33, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.003, partial \u003cem\u003eη\u003c/em\u003e\u0026sup2; = 0.004; Neuroticism: \u003cem\u003eF\u003c/em\u003e(6, 4893)\u0026thinsp;=\u0026thinsp;2.98, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.007, partial \u003cem\u003eη\u003c/em\u003e\u0026sup2; = 0.004; Openness: 
\u003cem\u003eF\u003c/em\u003e(6, 4893)\u0026thinsp;=\u0026thinsp;2.07, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.053, partial \u003cem\u003eη\u003c/em\u003e\u0026sup2; = 0.003). Surprisingly, we found that well-being did not significantly vary across regions, \u003cem\u003eF\u003c/em\u003e(6, 4893)\u0026thinsp;=\u0026thinsp;1.86, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.084, partial \u003cem\u003eη\u003c/em\u003e\u0026sup2; = 0.002. It is important to note that the overall difference in openness only reached a marginally significant level with a very small effect size (partial \u003cem\u003eη\u003c/em\u003e\u0026sup2; = 0.003), and post hoc comparisons did not reveal clear regional differences. Post hoc comparisons (Tukey) showed that in the conscientiousness dimension, the North China, Northeast, Central China, and Northwest regions scored significantly higher than the Southwest region (mean differences\u0026thinsp;=\u0026thinsp;0.274, 0.390, 0.284, 0.363, all \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05); in the agreeableness dimension, the Northeast and Northwest regions scored significantly higher than the Southwest region (mean differences\u0026thinsp;=\u0026thinsp;0.303, 0.260, all \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05), with no other significant differences; in the neuroticism dimension, the Northeast region scored significantly higher than the Southwest region (mean difference\u0026thinsp;=\u0026thinsp;0.260, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.035), with no other significant differences.\u003c/p\u003e\u003cp\u003eIn the Kimi-simulated sample, regional differences in neuroticism did not reach a significant level, but regional differences in conscientiousness, extraversion, agreeableness, openness, and well-being were significant (Conscientiousness: \u003cem\u003eF\u003c/em\u003e(6, 4893)\u0026thinsp;=\u0026thinsp;7.36, 
\u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, partial \u003cem\u003eη\u003c/em\u003e\u0026sup2; = 0.009; Extraversion: \u003cem\u003eF\u003c/em\u003e(6, 4893)\u0026thinsp;=\u0026thinsp;5.16, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, partial \u003cem\u003eη\u003c/em\u003e\u0026sup2; = 0.006; Agreeableness: \u003cem\u003eF\u003c/em\u003e(6, 4893)\u0026thinsp;=\u0026thinsp;8.55, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, partial \u003cem\u003eη\u003c/em\u003e\u0026sup2; = 0.010; Openness: \u003cem\u003eF\u003c/em\u003e(6, 4893)\u0026thinsp;=\u0026thinsp;6.93, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, partial \u003cem\u003eη\u003c/em\u003e\u0026sup2; = 0.008; Well-being: \u003cem\u003eF\u003c/em\u003e(6, 4893)\u0026thinsp;=\u0026thinsp;5.76, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, partial \u003cem\u003eη\u003c/em\u003e\u0026sup2; = 0.007). Post hoc comparisons (Tukey) showed that in the conscientiousness dimension, the North China, Northeast, East China, Central China, South China, and Southwest regions scored significantly higher than the Northwest region (mean differences\u0026thinsp;=\u0026thinsp;0.613, 0.577, 0.693, 0.693, 0.711, 0.600, all \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05); in the extraversion dimension, the North China, Northeast, East China, Central China, South China, and Southwest regions also scored significantly higher than the Northwest region (mean differences\u0026thinsp;=\u0026thinsp;0.407, 0.519, 0.481, 0.409, 0.359, 0.387, all \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05); in the agreeableness dimension, the North China, Northeast, East China, Central China, South China, and Southwest regions also scored significantly higher than the Northwest region (mean differences\u0026thinsp;=\u0026thinsp;0.650, 0.796, 0.691, 0.751, 0.906, 0.630, all 
\u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05); in the openness dimension, the North China, Northeast, East China, Central China, South China, and Southwest regions also scored significantly higher than the Northwest region (mean differences\u0026thinsp;=\u0026thinsp;0.684, 0.749, 0.654, 0.594, 0.716, 0.589, all \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05); similarly, in terms of well-being, the North China, Northeast, East China, Central China, South China, and Southwest regions also scored significantly higher than the Northwest region (mean differences\u0026thinsp;=\u0026thinsp;0.230, 0.183, 0.267, 0.250, 0.260, 0.189, all \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05).\u003c/p\u003e\u003cp\u003eIn addition, we compared the Big Five personality traits and well-being between human samples and GPT-4-simulated samples across different regions. The results showed that the GPT-4-simulated samples scored significantly lower than human samples in conscientiousness only in the South China region (\u003cem\u003et\u003c/em\u003e\u0026thinsp;=\u0026thinsp;4.21, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.004, \u003cem\u003eCohen's d\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.225) and Southwest region (\u003cem\u003et\u003c/em\u003e\u0026thinsp;=\u0026thinsp;5.28, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, \u003cem\u003eCohen's d\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.282). In extraversion, the GPT-4-simulated samples scored significantly lower than human samples only in the Northwest region (\u003cem\u003et\u003c/em\u003e\u0026thinsp;=\u0026thinsp;5.07, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, \u003cem\u003eCohen's d\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.271). 
In agreeableness, although the GPT-4-simulated samples were generally slightly higher than human samples across regions, none of the differences was significant (all \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026gt;\u0026thinsp;0.05). In neuroticism, the differences between GPT-4-simulated samples and human samples were likewise not significant in any region (all \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026gt;\u0026thinsp;0.05). In openness, the GPT-4-simulated samples scored significantly higher than human samples in every region (all \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05). In the well-being dimension, human samples scored significantly higher than GPT-4-simulated samples only in the North China region (\u003cem\u003et\u003c/em\u003e\u0026thinsp;=\u0026thinsp;7.56, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, \u003cem\u003eCohen's d\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.404), Northeast region (\u003cem\u003et\u003c/em\u003e\u0026thinsp;=\u0026thinsp;6.86, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, \u003cem\u003eCohen's d\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.367), East China region (\u003cem\u003et\u003c/em\u003e\u0026thinsp;=\u0026thinsp;5.60, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, \u003cem\u003eCohen's d\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.299), and Central China region (\u003cem\u003et\u003c/em\u003e\u0026thinsp;=\u0026thinsp;4.69, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, \u003cem\u003eCohen's d\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.251).\u003c/p\u003e\u003cp\u003eWe also compared the Big Five personality traits and well-being between human samples and Kimi-simulated samples across different regions in China. 
The results showed that in conscientiousness, Kimi-simulated samples scored significantly lower than human samples across all regions (all \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05). In extraversion, Kimi-simulated samples scored significantly lower than human samples only in the South China region (\u003cem\u003et\u003c/em\u003e\u0026thinsp;=\u0026thinsp;3.99, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.011, \u003cem\u003eCohen's d\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.214) and Southwest region (\u003cem\u003et\u003c/em\u003e\u0026thinsp;=\u0026thinsp;3.96, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.013, \u003cem\u003eCohen's d\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.211). In agreeableness, Kimi-simulated samples were generally lower than human samples, mainly in the North China region (\u003cem\u003et\u003c/em\u003e\u0026thinsp;=\u0026thinsp;4.66, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, \u003cem\u003eCohen's d\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.249), Northeast region (\u003cem\u003et\u003c/em\u003e\u0026thinsp;=\u0026thinsp;4.91, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, \u003cem\u003eCohen's d\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.263), East China region (\u003cem\u003et\u003c/em\u003e\u0026thinsp;=\u0026thinsp;4.51, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.001, \u003cem\u003eCohen's d\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.241), Central China region (\u003cem\u003et\u003c/em\u003e\u0026thinsp;=\u0026thinsp;3.67, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.036, \u003cem\u003eCohen's d\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.196), and Northwest region (\u003cem\u003et\u003c/em\u003e\u0026thinsp;=\u0026thinsp;11.06, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, \u003cem\u003eCohen's d\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.591). 
In neuroticism, the differences between Kimi-simulated samples and human samples were not significant in any region (all \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026gt;\u0026thinsp;0.05). In openness, the Kimi-simulated samples scored significantly higher than human samples in every region (all \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05). In the well-being dimension, human samples scored significantly higher than Kimi-simulated samples only in the North China region (\u003cem\u003et\u003c/em\u003e\u0026thinsp;=\u0026thinsp;5.72, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, \u003cem\u003eCohen's d\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.306), Northeast region (\u003cem\u003et\u003c/em\u003e\u0026thinsp;=\u0026thinsp;5.93, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, \u003cem\u003eCohen's d\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.317), and East China region (\u003cem\u003et\u003c/em\u003e\u0026thinsp;=\u0026thinsp;4.08, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.008, \u003cem\u003eCohen's d\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.218). 
The above results are shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e and Table\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003e\u003cem\u003eMeans and standard deviations, M(SD), of the Big Five personality traits and subjective well-being for human and LLM samples in the seven regions of China.\u003c/em\u003e\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"10\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c10\" colnum=\"10\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eRegion\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" 
colname=\"c2\"\u003e\u003cp\u003eType\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eSize\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eAge\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e\u003cp\u003eConscientiousness\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c6\"\u003e\u003cp\u003eExtraversion\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c7\"\u003e\u003cp\u003eAgreeableness\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c8\"\u003e\u003cp\u003eOpenness\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c9\"\u003e\u003cp\u003eNeuroticism\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c10\"\u003e\u003cp\u003eWell-being\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e\u003cp\u003eNorth China\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eHuman\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e700\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e38.66(12.42)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e11.52(1.89)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e9.95(2.16)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e11.49(1.77)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e\u003cp\u003e9.97(2.57)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e\u003cp\u003e8.79(2.22)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c10\"\u003e\u003cp\u003e7.60(2.08)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eGPT-4\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e700\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e40.88(12.88)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e11.26(1.65)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e9.91(1.66)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e11.63(1.46)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e\u003cp\u003e11.84(1.78)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e\u003cp\u003e9.00(1.59)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e\u003cp\u003e7.03(0.56)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eKimi\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e700\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e40.18(12.64)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e10.78(2.44)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e9.92(2.00)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e10.98(2.68)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e\u003cp\u003e11.03(2.45)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e\u003cp\u003e9.01(1.80)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c10\"\u003e\u003cp\u003e7.17(1.02)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e\u003cp\u003eNortheast China\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eHuman\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e700\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e43.27(11.64)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e11.30(1.98)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e9.73(2.20)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e11.66(1.89)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e\u003cp\u003e9.46(2.62)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e\u003cp\u003e8.82(2.26)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e\u003cp\u003e7.57(2.24)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eGPT-4\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e700\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e39.98(12.68)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e11.37(1.69)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e10.05(1.78)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e11.76(1.49)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e\u003cp\u003e11.79(1.74)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c9\"\u003e\u003cp\u003e9.17(1.62)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e\u003cp\u003e7.05(0.55)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eKimi\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e700\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e41.04(12.71)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e10.74(2.47)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e10.03(1.89)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e11.13(2.63)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e\u003cp\u003e11.10(2.65)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e\u003cp\u003e8.98(1.75)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e\u003cp\u003e7.12(1.03)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e\u003cp\u003eEast China\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eHuman\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e700\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e39.84(12.02)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e11.50(1.89)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e9.80(2.14)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e11.52(1.71)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c8\"\u003e\u003cp\u003e9.25(2.39)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e\u003cp\u003e8.79(2.10)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e\u003cp\u003e7.51(1.99)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eGPT-4\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e700\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e40.10(12.72)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e11.17(1.69)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e9.90(1.67)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e11.55(1.51)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e\u003cp\u003e11.67(1.78)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e\u003cp\u003e8.96(1.60)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e\u003cp\u003e7.09(0.56)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eKimi\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e700\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e40.04(12.56)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e10.86(2.35)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e10.00(1.95)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e11.02(2.70)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c8\"\u003e\u003cp\u003e11.00(2.59)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e\u003cp\u003e8.88(1.70)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e\u003cp\u003e7.20(1.03)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e\u003cp\u003eCentral China\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eHuman\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e700\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e41.38(11.63)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e11.54(1.87)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e9.88(2.12)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e11.48(1.72)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e\u003cp\u003e9.37(2.44)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e\u003cp\u003e8.68(2.07)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e\u003cp\u003e7.37(1.91)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eGPT-4\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e700\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e40.69(12.44)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e11.27(1.69)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e10.01(1.67)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c7\"\u003e\u003cp\u003e11.60(1.51)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e\u003cp\u003e11.66(1.84)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e\u003cp\u003e9.00(1.64)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e\u003cp\u003e7.01(0.57)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eKimi\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e700\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e40.05(12.40)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e10.86(2.40)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e9.92(1.88)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e11.08(2.63)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e\u003cp\u003e10.94(2.60)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e\u003cp\u003e8.81(1.63)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e\u003cp\u003e7.19(1.00)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e\u003cp\u003eSouth China\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eHuman\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e700\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e39.85(12.17)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e11.57(1.86)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c6\"\u003e\u003cp\u003e10.29(1.97)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e11.28(1.76)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e\u003cp\u003e9.24(2.50)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e\u003cp\u003e8.81(2.18)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e\u003cp\u003e7.03(2.11)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eGPT-4\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e700\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e40.35(13.07)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e11.12(1.68)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e9.99(1.73)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e11.55(1.53)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e\u003cp\u003e11.76(1.75)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e\u003cp\u003e8.94(1.52)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e\u003cp\u003e7.06(0.59)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eKimi\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e700\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e40.62(12.46)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e10.87(2.41)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c6\"\u003e\u003cp\u003e9.87(2.04)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e11.24(2.52)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e\u003cp\u003e11.06(2.49)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e\u003cp\u003e8.87(1.69)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e\u003cp\u003e7.20(1.01)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e\u003cp\u003eSouthwest China\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eHuman\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e700\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e38.95(11.73)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e11.56(1.80)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e10.31(1.97)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e11.34(1.64)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e\u003cp\u003e9.66(2.35)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e\u003cp\u003e9.09(2.04)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e\u003cp\u003e6.91(2.43)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eGPT-4\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e700\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e40.60(12.66)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c5\"\u003e\u003cp\u003e10.98(1.64)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e9.79(1.71)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e11.46(1.51)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e\u003cp\u003e11.67(1.71)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e\u003cp\u003e8.91(1.54)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e\u003cp\u003e7.07(0.57)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eKimi\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e700\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e40.53(12.41)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e10.76(2.38)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e9.90(1.98)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e10.96(2.66)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e\u003cp\u003e10.94(2.60)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e\u003cp\u003e8.87(1.76)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e\u003cp\u003e7.13(1.01)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e\u003cp\u003eNorthwest China\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eHuman\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e700\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c4\"\u003e\u003cp\u003e38.81(11.80)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e11.65(1.97)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e9.83(2.15)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e11.54(1.90)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e\u003cp\u003e9.92(2.57)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e\u003cp\u003e9.14(2.19)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e\u003cp\u003e7.10(2.17)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eGPT-4\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e700\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e39.94(12.47)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e11.35(1.68)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e9.97(1.70)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e11.72(1.51)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e\u003cp\u003e11.91(1.76)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e\u003cp\u003e9.15(1.58)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e\u003cp\u003e7.02(0.58)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eKimi\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e700\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c4\"\u003e\u003cp\u003e40.55(12.26)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e10.16(2.64)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e9.51(2.16)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e10.33(2.81)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e\u003cp\u003e10.35(2.77)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e\u003cp\u003e9.02(2.02)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e\u003cp\u003e6.94(1.08)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003ctfoot\u003e\u003ctr\u003e\u003ctd colspan=\"10\"\u003e\u003cem\u003eNote.\u003c/em\u003e (1) The five personality dimensions (Conscientiousness, Extraversion, Agreeableness, Openness, and Neuroticism) are scored on a 1\u0026ndash;5 scale; higher scores indicate a higher level of the corresponding trait (for neuroticism, greater emotional instability). (2) Subjective well-being is scored from 1 to 10 points, with higher scores indicating greater well-being. 
(3) The DeepSeek simulated sample (AI) refers to demographic profiles generated through the DeepSeek model that match real-world samples in demographic variables (gender, age).\u003c/td\u003e\u003c/tr\u003e\u003c/tfoot\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eIn summary, all six variables (conscientiousness, extraversion, openness, agreeableness, neuroticism, and well-being) showed significant regional differences in the real human sample. In the GPT-4 (globally trained LLM) simulated sample, conscientiousness, openness, agreeableness, and neuroticism showed significant regional differences, whereas extraversion and well-being did not. In the Kimi (China-trained LLM) simulated sample, conscientiousness, openness, agreeableness, extraversion, and well-being showed significant regional differences, whereas neuroticism did not.\u003c/p\u003e\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e\u003ch2\u003eTest of the Relationship Between Personality and Well-being\u003c/h2\u003e\u003cp\u003eWe conducted multiple linear regression analyses on the real human data, the Kimi-simulated data, and the GPT-4-simulated data to examine the predictive effect of the Big Five personality traits on well-being. The results of the regression analyses are presented in Tables\u0026nbsp;\u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e, \u003cspan refid=\"Tab6\" class=\"InternalRef\"\u003e6\u003c/span\u003e, and \u003cspan refid=\"Tab7\" class=\"InternalRef\"\u003e7\u003c/span\u003e. In the real human sample, the regression model was significant overall, explaining 4.6% of the variance in well-being (\u003cem\u003eR\u003c/em\u003e\u0026sup2; = 0.046, \u003cem\u003eF\u003c/em\u003e(5, 4894)\u0026thinsp;=\u0026thinsp;47.5, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001). 
Among the personality dimensions, conscientiousness (\u003cem\u003eβ\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.045, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.002), extraversion (\u003cem\u003eβ\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.059, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001), agreeableness (\u003cem\u003eβ\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.089, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001), and openness (\u003cem\u003eβ\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.102, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001) significantly positively predicted well-being, while neuroticism (\u003cem\u003eβ\u003c/em\u003e = -0.094, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001) significantly negatively predicted well-being. In the GPT-4-simulated sample, the regression model was also significant overall, accounting for 15.6% of the variance in well-being (\u003cem\u003eR\u003c/em\u003e\u0026sup2; = 0.156, \u003cem\u003eF\u003c/em\u003e(5, 4894)\u0026thinsp;=\u0026thinsp;181, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001). Openness (\u003cem\u003eβ\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.346, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001) and agreeableness (\u003cem\u003eβ\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.109, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001) significantly positively predicted well-being, while neuroticism (\u003cem\u003eβ\u003c/em\u003e = -0.184, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001) significantly negatively predicted well-being. Extraversion (\u003cem\u003ep\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.711) and conscientiousness (\u003cem\u003ep\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.622) did not show significant predictive effects. 
This indicates that the GPT-4 simulation differs structurally from the real data in how personality predicts well-being, since the positive effects of extraversion and conscientiousness observed in the human sample were not reproduced. In the Kimi-simulated sample, the regression model was also significant overall, explaining 23.8% of the variance in well-being (\u003cem\u003eR\u003c/em\u003e\u0026sup2; = 0.238, \u003cem\u003eF\u003c/em\u003e(5, 4894)\u0026thinsp;=\u0026thinsp;307, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001). Conscientiousness (\u003cem\u003eβ\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.145, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001), extraversion (\u003cem\u003eβ\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.073, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001), agreeableness (\u003cem\u003eβ\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.209, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001), and openness (\u003cem\u003eβ\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.194, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001) significantly positively predicted well-being, while neuroticism (\u003cem\u003ep\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.741) did not show a significant predictive effect. 
This suggests that the Kimi simulation is structurally closer to the real data than the GPT-4 simulation in how personality predicts well-being, although it did not reproduce the negative effect of neuroticism observed in the human sample.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab5\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 5\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003e\u003cem\u003eA comparative regression analysis of the five personality dimensions predicting subjective well-being (human samples)\u003c/em\u003e\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"5\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003ePredictor\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003e\u003cem\u003eβ\u003c/em\u003e\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003e\u003cem\u003eSE\u003c/em\u003e\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003e\u003cem\u003et\u003c/em\u003e\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e\u003cp\u003e\u003cem\u003ep\u003c/em\u003e\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eConstant\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e4.861\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" 
char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.314\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e15.46\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eConscientiousness\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.045\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.017\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e3.05\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e0.002\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eExtraversion\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.059\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.015\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e4.00\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAgreeableness\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.089\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.018\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e6.14\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd 
align=\"left\" colname=\"c1\"\u003e\u003cp\u003eOpenness\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.102\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.013\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e6.95\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eNeuroticism\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e-0.094\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.014\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e-6.62\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003ctfoot\u003e\u003ctr\u003e\u003ctd colspan=\"5\"\u003e\u003cem\u003eNote.\u003c/em\u003e \u003cem\u003eβ\u003c/em\u003e is the standardized regression coefficient, \u003cem\u003eSE\u003c/em\u003e is the standard error, \u003cem\u003et\u003c/em\u003e is the test statistic for the regression coefficient, and \u003cem\u003ep\u003c/em\u003e is its significance level (two-tailed test).\u003c/td\u003e\u003c/tr\u003e\u003c/tfoot\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab6\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 6\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003e\u003cem\u003eA comparative regression analysis of the five personality dimensions predicting subjective 
well-being (GPT-4 simulated samples)\u003c/em\u003e\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"5\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003ePredictor\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003e\u003cem\u003eβ\u003c/em\u003e\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003e\u003cem\u003eSE\u003c/em\u003e\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003e\u003cem\u003et\u003c/em\u003e\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e\u003cp\u003e\u003cem\u003ep\u003c/em\u003e\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eConstant\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e5.881\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.065\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e90.51\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eConscientiousness\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c2\"\u003e\u003cp\u003e-0.011\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.008\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e-0.49\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e0.622\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eExtraversion\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.007\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.006\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.37\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e0.711\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAgreeableness\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.109\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.008\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e4.86\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eOpenness\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.346\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.005\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e21.416\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c5\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eNeuroticism\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e-0.184\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.005\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e-12.45\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003ctfoot\u003e\u003ctr\u003e\u003ctd colspan=\"5\"\u003e\u003cem\u003eNote.\u003c/em\u003e Same as Table\u0026nbsp;\u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e.\u003c/td\u003e\u003c/tr\u003e\u003c/tfoot\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab7\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 7\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003e\u003cem\u003eA comparative regression analysis of the five personality dimensions predicting subjective well-being (Kimi simulated samples)\u003c/em\u003e\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"5\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" 
colnum=\"5\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003ePredictor\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003e\u003cem\u003eβ\u003c/em\u003e\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003e\u003cem\u003eSE\u003c/em\u003e\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003e\u003cem\u003et\u003c/em\u003e\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e\u003cp\u003e\u003cem\u003ep\u003c/em\u003e\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eConstant\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e4.372\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.101\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e43.28\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eConscientiousness\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.145\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.006\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e9.36\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eExtraversion\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c2\"\u003e\u003cp\u003e0.073\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.007\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e5.33\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAgreeableness\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.209\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.006\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e12.95\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eOpenness\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.194\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.006\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e12.39\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eNeuroticism\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.004\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.007\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.33\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c5\"\u003e\u003cp\u003e0.741\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003ctfoot\u003e\u003ctr\u003e\u003ctd colspan=\"5\"\u003e\u003cem\u003eNote.\u003c/em\u003e Same as Table\u0026nbsp;\u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e.\u003c/td\u003e\u003c/tr\u003e\u003c/tfoot\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e\u003ch2\u003ePCA results of LLMs and human samples\u003c/h2\u003e\u003cp\u003eTo further compare the overall psychological structural differences between LLM-simulated data (GPT-4 and Kimi) and human data, we conducted principal component analysis (PCA) on the six variables in each dataset, extracted the first two principal components (PC1 and PC2), and plotted the variable loadings (see Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). Across the three datasets, the first principal component (PC1) captured the dominant share of variance\u0026mdash;76.11% (GPT-4), 58.13% (Kimi), and 46.60% (human)\u0026mdash;indicating a shared core axis of variation. In all cases, subjective well-being showed the strongest loading on PC1, underscoring its central role in the overall psychological profile. By contrast, the second principal component (PC2) exhibited marked divergence across datasets. AI-generated samples showed lower PC2 contributions\u0026mdash;9.31% (GPT-4) and 11.40% (Kimi)\u0026mdash;with relatively limited dispersion along this axis, whereas the human sample showed a higher PC2 contribution (16.95%) and broader heterogeneity across both PC1 and PC2. Correspondingly, the geometry of the score distributions progressed from compact (GPT-4) to more diffuse (Kimi) to most dispersed (human). 
Loadings on PC2 for Extraversion, Openness, and Neuroticism differed in both rank order and direction between LLM-simulated and human data, suggesting that while large language models recover the primary structural axis, they underrepresent multidimensional variation and the breadth of real-world individual differences. Consistent with these patterns, cumulative variance explained by the first two components decreased from AI to humans: 85.42% (GPT-4), 69.53% (Kimi), and 63.55% (human).\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eThis study used data from 4,900 real respondents of the 2018 China Family Panel Studies (CFPS 2018) as a benchmark to compare the performance of Kimi-Chat v1.5 (a large language model trained in China) and GPT-4 (a globally trained large language model) in simulating the Big Five personality traits (Conscientiousness, Extraversion, Agreeableness, Openness, Neuroticism) and subjective well-being across seven macro-regions of China (North China, Northeast China, East China, Central China, South China, Southwest China, Northwest China). Overall, this study verified the stability and reproducibility of large language models (LLMs) in simulating psychometric dimensions with strong structural properties and clear semantics (e.g., the Big Five personality traits), which was consistent with existing research conclusions on LLM-generated virtual participants (De Winter et al., \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; A. Wang et al., \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). 
However, the \"cultural lineage\" of training corpora might have been a key factor influencing simulation accuracy\u0026mdash;Kimi-Chat v1.5, trained on Chinese corpora, was superior to the globally trained GPT-4 in reproducing regional differences and structural associations, as it was more aligned with Chinese socio-economic and cultural contexts (Talhelm et al., \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2014\u003c/span\u003e). Although LLMs had shown a certain ability to simulate regional personality characteristics and the distribution of subjective well-being, there were still some biases and limitations when compared with real human data.\u003c/p\u003e\u003cp\u003eIn terms of the overall ability to reproduce regional psychological differences, LLMs generally captured the overall trends of psychological trait differences across regions at the macro level, but deviated from real data in specific dimensions (e.g., certain personality traits and well-being), reflecting the limitations of model simulation. In the real human sample, all dimensions of the Big Five personality traits and subjective well-being exhibited significant regional differences. Compared with the real human sample, Kimi-Chat v1.5 outperformed GPT-4. In Kimi-Chat v1.5\u0026rsquo;s simulated data, only Neuroticism did not show significant regional differences; the other four personality dimensions and well-being all exhibited significant regional differences. Post-hoc tests further revealed that its regional ranking (e.g., Agreeableness scores in North China and Northeast China were higher than those in Southwest China) was consistent with the trend in the human sample. 
In contrast, GPT-4\u0026rsquo;s simulated data not only showed no significant regional differences in Extraversion, but also failed to detect significant regional differences in subjective well-being, reflecting the \"selective capture\" limitation of GPT-4 (a globally trained model) in representing China\u0026rsquo;s regional psychological characteristics. Human well-being was not only influenced by individual personality traits (Anglim et al., \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Grant et al., \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2009\u003c/span\u003e; Zhai et al., \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e2013\u003c/span\u003e), but also closely related to multiple factors such as culture, economy, and society (Oishi et al., \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2011\u003c/span\u003e). However, GPT-4\u0026rsquo;s training corpora might not have fully covered these regional factors, which limited its ability to simulate well-being in specific regions.\u003c/p\u003e\u003cp\u003eCross-cultural psychology had long pointed out that psychological characteristics were not randomly distributed, but embedded in local cultural and economic contexts (Talhelm et al., \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2014\u003c/span\u003e). For example, people in rice-farming areas of southern China tended to adopt holistic thinking, while those in wheat-farming areas of northern China tended to use analytical thinking. Such differences should have been reflected in the Big Five personality traits and well-being. 
This study found that Kimi-Chat v1.5 could reproduce these differences (e.g., in the human sample, Extraversion scores in South China and Southwest China were significantly higher than those in North China and Northeast China, and this trend was consistent in Kimi\u0026rsquo;s simulated data), whereas GPT-4 failed to capture the regional differences in Extraversion and well-being. This study suggested that the reason might lie in the differences in the cultural lineage of the training corpora between the two models. Kimi-Chat v1.5 was centered on Chinese corpora, including long texts such as daily language, legal documents, and academic papers. It also enhanced logical coherence in the Chinese context through hierarchical training, and even included multimodal regional scenario data (e.g., social survey reports and local media narratives from different provinces). This enabled it to more accurately capture psychological differences across Chinese regions. For instance, the Openness trait derived from the multi-ethnic culture in Southwest China showed a consistent score trend between Kimi\u0026rsquo;s simulated data and the human sample (Openness in Southwest China was higher than that in Northwest China); Chinese corpora accounted for less than 0.1% of GPT-4\u0026rsquo;s training data, and GPT-4 mainly relied on general corpora such as English Wikipedia and books. This led to its reliance on \"pan-cultural stereotypes\" (Argyle et al., \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2023\u003c/span\u003e) in understanding China\u0026rsquo;s regional psychology. For example, it simplified \"well-being\" into a general \"life satisfaction\" indicator, overlooking the unique influencing factors in different regions of China (e.g., the role of social support networks in North China and community cohesion in Northeast China in promoting well-being), which might have led to the absence of significant regional differences in its well-being simulation. 
In contrast, Kimi\u0026rsquo;s simulation of subjective well-being exhibited significant regional differences. This might have been because the bias in LLMs\u0026rsquo; regional well-being simulation was not merely a \"model capability issue,\" but determined by the \"coverage and depth\" of cultural information in the training corpora - the \"localization\" of Chinese corpora made Kimi-Chat v1.5 a more suitable virtual participant tool for regional psychological research in China.\u003c/p\u003e\u003cp\u003eThe findings of our study revealed the impact of training corpus bias on model outputs. Since the Kimi-Chat model was trained on a Chinese corpus, it had a stronger fit in the Chinese cultural context, and performed better in predicting well-being and personality in specific regions such as Northeast China. In contrast, because GPT-4 adopted global corpora, its model outputs had poor adaptability in some regions, especially in regions with complex and diverse cultural backgrounds such as South China and Southwest China. This phenomenon reminded us that when using LLMs for regional psychological research, special attention must be paid to the adaptability of cultural backgrounds; especially in the field of cross-cultural research, the selection of LLMs and the cultural biases contained in their training processes significantly affected the predictive performance of the models. However, the two LLMs still had limitations in simulating these regional personality differences, and they failed to accurately reproduce the strengths or weaknesses of personality traits in some regions. In the simulated data of Kimi-Chat v1.5 and GPT-4, the differences in Extraversion and Openness scores between different regions were reduced (see Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e), which was consistent with the conclusion proposed by Serapio-Garc\u0026iacute;a et al. 
(\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2023\u003c/span\u003e) regarding the differences in the difficulty of simulating personality dimensions - i.e., models exhibited lower stability in dimensions such as Extraversion and Openness, which relied on social interaction or experiential exploration (Sorokovikova et al., \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Trott et al., \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e2023\u003c/span\u003e), and could not fully simulate human behavior. Similarly, such biases in the simulation of regional characteristics also reflected the structural limitation of large models in lacking real experiences and interactions with regional cultural and social contexts (Grossmann et al., \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). Overall, LLMs could simulate the overall trend of regional distribution differences in personality traits, but their reproduction of specific regional characteristics was still not detailed enough.\u003c/p\u003e\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e\u003ch2\u003eThe predictive effect of personality dimension on subjective well-being by LLMs and human\u003c/h2\u003e\u003cp\u003eCould the correlation patterns between personality and well-being simulated by LLMs be consistent with those of real human populations? In the structural simulation of the personality-well-being association, Kimi-Chat v1.5 was more aligned with real human data. In the human sample, Conscientiousness, Extraversion, Agreeableness, and Openness significantly positively predicted well-being, while Neuroticism significantly negatively predicted well-being. For Kimi-Chat v1.5, only the predictive effect of Neuroticism was not significant; the predictive directions of the other four dimensions were completely consistent with those of humans, and its explanatory power was higher. 
In contrast, GPT-4\u0026rsquo;s predictive strength for Openness was far higher than that in the human sample, with its explanatory power falling between that of Kimi-Chat v1.5 and that of the human sample, indicating a strong structural bias. Moreover, GPT-4\u0026rsquo;s simulated data failed to replicate the significant positive effects of Conscientiousness and Extraversion on well-being observed in real data (Anglim et al., \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). Specifically, GPT-4 simulated the effects of certain personality dimensions (e.g., Agreeableness, Neuroticism) on well-being relatively well, but its simulation of other dimensions was less accurate (see Table\u0026nbsp;\u003cspan refid=\"Tab6\" class=\"InternalRef\"\u003e6\u003c/span\u003e). For example, the positive effect of Openness on well-being was overestimated in the simulated sample, while Conscientiousness and Extraversion had no significant effects on well-being in the simulated sample. On the one hand, this bias might have stemmed from LLMs\u0026rsquo; reliance on explicit, emotional traits during training and response generation, making it difficult to reconstruct the actual impacts of implicit, long-term stable personality traits on well-being. On the other hand, it was also related to the lack of real interactions and subjective experiences in large models, which hindered their ability to accurately simulate well-being and personality variables involving subjective feelings (e.g., Extraversion) (Grossmann et al., \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). The phenomenon of structural bias observed in this study was consistent with recent empirical findings that when large models are used as virtual participants, they may cause identity flattening and error amplification in the simulation of group psychological structures (A. 
Wang et al., \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2025\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eMatz et al. (\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2016\u003c/span\u003e) pointed out that the association between personality and well-being is moderated by the \u0026ldquo;congruence between personality and life choices\u0026rdquo;, and this moderating effect varies across different cultural contexts. This study suggested that the differences between Kimi-Chat v1.5 and GPT-4 in the personality-well-being association might reflect their varying abilities to capture culture-specific associations in Chinese culture. In the human sample, the positive prediction of well-being by Conscientiousness (e.g., being rigorous and planned) and Extraversion (e.g., being socially active) reflected the importance of \u0026ldquo;fulfilling responsibilities\u0026rdquo; (e.g., family responsibilities, work commitment) and \u0026ldquo;social connection\u0026rdquo; (e.g., interactions with relatives and friends) in the Chinese context. Since Kimi-Chat v1.5\u0026rsquo;s training corpus included a large number of life narratives of Chinese people (e.g., social media posts, interview records), it could partially capture this association; only the predictive effect of Neuroticism (emotional instability) was absent, which might have been due to fewer expressions of the negative correlation between \u0026ldquo;Neuroticism and well-being\u0026rdquo; in the Chinese corpus, leading to insufficient learning by the model. 
GPT-4 lacked the predictive effects of Conscientiousness and Extraversion because the \u0026ldquo;personality-well-being\u0026rdquo; associations in its general corpus were mostly based on Western samples (e.g., the association between Extraversion and personal achievement in Western culture), rather than the associations between \u0026ldquo;Conscientiousness and family stability\u0026rdquo; and \u0026ldquo;Extraversion and interpersonal harmony\u0026rdquo; in Chinese culture. For example, GPT-4 might have simplified \u0026ldquo;Conscientiousness\u0026rdquo; to \u0026ldquo;task completion ability\u0026rdquo; instead of \u0026ldquo;a sense of responsibility towards family and work\u0026rdquo; in the Chinese context, resulting in its inability to predict well-being. In addition, the predictive strength of Openness in GPT-4 (see Table\u0026nbsp;\u003cspan refid=\"Tab6\" class=\"InternalRef\"\u003e6\u003c/span\u003e) was far higher than that in the human sample. This might have been because the association between \u0026ldquo;Openness and well-being\u0026rdquo; in its training corpus mostly came from general expressions such as \u0026ldquo;happiness from exploring new things,\u0026rdquo; while ignoring the emphasis on \u0026ldquo;traditional values\u0026rdquo; in some regions of China (e.g., Northwest China). Excessively high Openness might instead conflict with local cultural expectations and reduce well-being, but GPT-4 failed to capture this culture-specific \u0026ldquo;boundary condition.\u0026rdquo;\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec15\" class=\"Section2\"\u003e\u003ch2\u003eThe overall psychological structure of LLMs and human\u003c/h2\u003e\u003cp\u003eThis study compared the \"Big Five personality-well-being\" psychological structures of real human samples from China and simulated samples of large language models (GPT-4, Kimi-Chat v1.5) through Principal Component Analysis (PCA). 
It found that all three types of samples exhibited the commonality that \"well-being was the core loading variable of the first principal component (PC1)\", indicating that large language models could capture the general regularity that \"well-being is a core element of human psychology\" through text training. However, significant cultural and model differences existed in the \"correlation logic between Big Five personality dimensions and well-being\". In the real human sample, PC1 presented a \"trade-off relationship between well-being, Conscientiousness, and Agreeableness\" - a characteristic closely related to the collectivist orientation and responsibility ethics of Chinese culture. In the context of Chinese social culture, Conscientiousness (e.g., emphasis on family responsibilities and work rigor) and Agreeableness (e.g., pursuit of interpersonal harmony) are important personality traits for individuals to integrate into society. Nevertheless, excessive investment in these traits is often accompanied by \"suppression of personal needs\". For instance, individuals who took on excessive family care responsibilities (high Conscientiousness) or overly accommodated the needs of others (high Agreeableness) would experience consumption of psychological resources, which in turn weakened their well-being. This \"dynamic balance between social adaptation and personal well-being\" is a typical psychological pattern formed by the Chinese population in long-term family interactions and social role practice. As a model developed locally in China, Kimi-Chat v1.5\u0026rsquo;s training corpus included a large number of Chinese texts and psychological descriptions in the Chinese cultural context (e.g., discussions in Chinese communities on the relationships between \"diligence and responsibility\", \"kindness\", and \"happiness\"). 
Therefore, although the PC1 simulated by Kimi-Chat v1.5 did not fully replicate the \"trade-off mechanism\" of the human sample, it could still capture the \"negative correlation between well-being and Conscientiousness\", avoiding complete separation between dimensions. In contrast, GPT-4\u0026rsquo;s training corpus mainly consisted of English texts and global general content, and it lacked in-depth acquisition of the psychological logic of \"responsibility first\" and \"harmony as priority\" in Chinese culture - it could neither understand the socio-cultural roots of \"the Chinese population sacrificing well-being due to excessive Conscientiousness\" nor access sufficient empirical psychological descriptions of the Chinese population. Ultimately, this led to PC1 of GPT-4 exhibiting the characteristic of \"single dominance of well-being\", where the influences of personality dimensions such as Conscientiousness and Agreeableness were greatly compressed, completely deviating from the psychological reality of the Chinese population.\u003c/p\u003e\u003cp\u003eIn terms of the multidimensional heterogeneity of psychological structures, both models (Kimi-Chat v1.5 and GPT-4) had \"simplification bias\". The psychological structure of real humans is multidimensional and heterogeneous\u0026mdash;even within the same region, individuals with different ages, occupations, and educational backgrounds have different correlation patterns between personality and well-being (e.g., Openness has a stronger impact on well-being in young people, while Conscientiousness has a stronger impact in middle-aged people). However, the PCA results of this study showed that the two LLMs significantly compressed the multidimensionality of the psychological structure. 
For example, PC2 of the human sample contributed 16.95%, and the loading directions of Extraversion, Openness, and Neuroticism on PC2 were different from those on PC1 (e.g., the loading of well-being on PC1 was positive, while the loading of Extraversion on PC2 was positive and that of Neuroticism was negative), reflecting the independent role of the secondary dimension of \"social tendency-emotional stability\" beyond the \"core well-being\". In contrast, the contribution of PC2 in Kimi-Chat v1.5 and GPT-4 decreased significantly, and the order of variable loadings was disordered (e.g., the loading of Openness on PC2 in GPT-4 was opposite to that in humans). This indicated that although LLMs could capture the core axis of the psychological structure (e.g., the dominant effect of well-being on PC1), they significantly simplified the multidimensional differences in the real human psychological structure, over-relying on \"average trends\" and ignoring the inter-individual heterogeneity of the human sample. This \"simplification bias\" might have been due to LLMs\u0026rsquo; tendency to generate data based on \"the most common patterns\" (Argyle et al., \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). For example, when simulating the North China region, LLMs over-relied on the average trend that \"people in North China are extraverted and have high well-being\", while ignoring the well-being differences between \"urban residents and rural residents\" in North China. 
Even though Kimi-Chat v1.5 had a smaller bias, it still could not completely avoid this \"averaging\" tendency, resulting in the heterogeneity of its simulated data being lower than that of the real human sample.\u003c/p\u003e\u003cp\u003eBy comparing the psychological structure simulations of the two LLMs, this study found that in terms of the principal component explanatory rate, the PC1 explanatory rate of GPT-4 (76.11%) was significantly higher than that of Kimi-Chat v1.5 (58.13%) and the human sample (46.6%), while its PC2 explanatory rate (9.31%) was much lower than that of Kimi-Chat v1.5 (11.4%) and the human sample (16.95%). This difference essentially stemmed from GPT-4\u0026rsquo;s lack of awareness of \"the psychological diversity of the Chinese population\"\u0026mdash;the well-being of the Chinese population is driven not only by personal subjective experience but also closely related to \"personality expressions that conform to cultural expectations\" (e.g., conscientiousness, kindness), presenting the characteristic of \"multi-dimensional interactive influence\". However, due to the lack of support from Chinese cultural texts, GPT-4 could only simplify the psychological structure into a \"single dimension of well-being\", losing the pluralistic balance of human psychology. Although Kimi-Chat v1.5 also had dimension compression, its PC1 explanatory rate was closer to that of the human sample because the local corpus included descriptions of \"multiple factors influencing well-being\" (e.g., discussions in Chinese literature on the relationships between personality, social support, and well-being), retaining more potential space for multi-dimensional interaction. In terms of intra-personality correlations, PC2 of the human sample clearly revealed an \"opposing relationship between Neuroticism and Openness\"\u0026mdash;this correlation also implied Chinese cultural specificity. 
Evaluations of \"acceptance of new things\" in Chinese society are often linked to \"emotional stability\" (e.g., when facing social changes, individuals with stable emotions are more likely to adapt to new environments and embrace new ideas). Since Kimi-Chat v1.5\u0026rsquo;s local corpus included personality descriptions in such cultural contexts, its PC2 could still weakly capture the \"correlation between Neuroticism and Openness\". In contrast, GPT-4 lacked texts on personality interactions in Chinese culture, so its PC2 had almost no effective variable correlations, completely losing the classic personality relationship of \"emotion-cognition\" in the human sample. This further confirmed the constraint of cultural context on the psychological simulation of models.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec16\" class=\"Section2\"\u003e\u003ch2\u003eLimitations, Application and Future Directions\u003c/h2\u003e\u003cp\u003eThis study innovatively introduced large language models (Kimi-Chat v1.5, trained in China, and GPT-4, trained globally) to investigate regional psychological structures in China. Without inputting or guiding any raw data from CFPS 2018, the study found that Kimi-Chat v1.5 exhibited a high degree of trend consistency with human participants when simulating group personality traits and subjective well-being (e.g., across the seven macro-regions, the directions of regional differences in Conscientiousness, Extraversion, Agreeableness, Openness, and well-being were completely consistent with those of humans, with only Neuroticism showing no significant regional differences). 
In contrast, although GPT-4 could replicate regional trends for some personality dimensions (e.g., Conscientiousness, Agreeableness), it exhibited obvious deviations from human trends in core indicators (e.g., no significant regional differences in subjective well-being, disordered regional ranking of Extraversion, and only marginally significant regional differences in Openness). This study not only provided a clear prospect for using \"virtual participants\" to assist regional psychological research in China but also put forward methodological insights based on \"differences in the cultural lineage of model training corpora\" for the application of this method in cross-cultural psychological research.\u003c/p\u003e\u003cp\u003eFirst, a locally trained LLM (e.g., Kimi-Chat v1.5) more accurately captured China\u0026rsquo;s regional cultural and psychological characteristics, such as the strong collective consciousness in North China and the high social activity in South China. This made Kimi-Chat a better tool for simulating regional trends in questionnaire responses before formal surveys, allowing researchers to identify potential issues in questionnaire design. For example, if the simulated Openness scores in Northwest China were significantly lower than expected, this could indicate cultural misalignment in survey items, such as overemphasizing \"new things\" in urban contexts. Identifying such issues early allows the instrument to be refined before fielding. In contrast, GPT-4, trained on a global corpus, has limited ability to address these biases due to its small representation of Chinese data, making it more suited for cross-cultural comparison than as a primary tool for pre-survey testing. Second, LLMs can be used as virtual participants to simulate responses under various experimental conditions (e.g., different question orders and regional prompts) before launching large-scale cross-regional psychological research. This pre-experimentation saves time and costs while helping refine hypotheses. 
For instance, adjusting Conscientiousness items and observing changes in the relationship with well-being could predict response patterns for Chinese populations that value family responsibilities. While GPT-4 can also be used for similar simulations, its biases in variable associations (e.g., lack of predictive effect for Conscientiousness on well-being) require cross-validation with Kimi-Chat to ensure accuracy. Finally, LLMs can address sample acquisition challenges, such as limited samples in remote regions or ethical restrictions on sensitive variables like Neuroticism. They can generate simulated data reflecting regional demographics (e.g., age, gender, GDP level) to supplement real data, offering complete coverage across China\u0026rsquo;s regions. GPT-4\u0026rsquo;s simulated data showed significant regional deviations and could only serve as a preliminary supplement. Regardless of the model, it is essential to compare simulated data with real data (e.g., CFPS 2018) to ensure consistency and reliability, preventing biases from distorting conclusions (e.g., Kimi-Chat\u0026rsquo;s underestimation of Conscientiousness).\u003c/p\u003e\u003cp\u003eAn important contribution of this study was that it verified the practicality of Chinese LLMs in regional psychological research - specifically, Kimi-Chat v1.5 could relatively accurately replicate the distribution of the Big Five personality traits and well-being across China\u0026rsquo;s seven macro-regions. Furthermore, the study argued that the method of using LLMs to generate simulated data could avoid social desirability bias inherent in traditional human-based surveys (Tourangeau \u0026amp; Yan, \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2007\u003c/span\u003e). 
For example, in real surveys, participants might underestimate their Neuroticism scores because they \"were unwilling to admit emotional instability,\" but LLM simulations had no such concerns, providing a new approach for researching \"sensitive psychological dimensions.\"\u003c/p\u003e\u003cp\u003eThis study provided a transferable framework for LLM-based psychological simulation. The \"real data benchmark -- multi-model comparison -- verification\" process of this study could be extended to LLM-based psychological research in other cultural contexts (e.g., comparing the regional psychological simulation capabilities of local LLMs in Japan and the United States), offering methodological references for \"culturally adapted LLM evaluation\" globally. However, the study argued that LLMs had certain limitations in cross-cultural adaptability. Although LLMs could simulate personality traits and well-being distributions across different regions, their performance was constrained by training corpora and cultural backgrounds. Therefore, future research should focus on improving the cross-cultural adaptability of LLMs, especially for regions with significant cultural differences. For instance, more localized corpora could be introduced and training methods optimized to enhance the models\u0026rsquo; predictive capabilities in different cultural environments. Additionally, with the continuous advancement of LLMs, future research could further improve their simulation performance by integrating multiple models, cross-disciplinary knowledge, and multi-dimensional data - particularly in fine-grained psychological dimensions and complex regional cultural contexts. Despite some limitations in simulating regional psychological structures, LLMs still demonstrated great potential in practical applications. 
For example, LLMs could provide efficient simulation tools for psychological research, especially in large-scale data collection and processing, significantly reducing costs and mitigating social desirability bias in traditional survey methods. Furthermore, the popularization of LLMs could promote interdisciplinary research, facilitating the integration of psychology, sociology, and artificial intelligence. However, in practice, we must carefully consider the cultural adaptability and potential biases of models to ensure their effectiveness and fairness in different contexts.\u003c/p\u003e\u003cp\u003eOverall, as large language models, Kimi-Chat v1.5 and GPT-4 showed certain potential in simulating the regional distribution of the Big Five personality traits and well-being, but they also had obvious limitations. In terms of strengths, the demographic characteristics of the samples simulated by Kimi-Chat v1.5 and GPT-4 were similar to those of real samples, and they achieved initial success in revealing the overall patterns of psychological differences across regions - supporting the feasibility and efficiency of using virtual participants in large-scale psychological research. However, this study still had several limitations in research design, data sources, simulation strategies, and result interpretation. First, our study adopted a cross-sectional design in which virtual participants were generated by large language models and their simulated results were compared with real data. This design was inherently correlational, making it impossible to establish causal inference. Moreover, by focusing only on the overall regional level, it might have overlooked intra-regional individual differences and dynamic changes in psychological states. Second, the pre-training corpora of the models lacked sufficient information on the cultural ecology and socio-economic background of different regions, which limited their sensitivity to regional psychological differences. 
At the same time, the real survey data used might have had sampling bias and insufficient representativeness, and these factors could have affected the reliability of the comparison results. Third, large language models such as Kimi-Chat v1.5 and GPT-4 had limited ability to handle complex emotional factors and specific cultural contexts, leading to inaccurate simulation of subjective psychological experiences such as well-being. The distribution of psychological traits generated by the models could not fully replicate the richness and diversity of real human populations. Additionally, in terms of result interpretation, although the simulated data of the models showed consistency with real data in several aspects, this did not directly indicate that the models had truly replicated human psychological mechanisms. Such similarity might have partially originated from existing patterns or biases in the models\u0026rsquo; training corpora. Therefore, caution was required when interpreting the results of this study, and the models could not yet fully replace real surveys (Dillion et al., \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Harding et al., \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). Finally, another important limitation was that using large language models to generate virtual participants carried the risk of multidimensional stereotypes and psychological structure bias. The training corpora of Kimi-Chat v1.5 and GPT-4 mainly came from public online texts, which often contained inherent stereotypes about the economic development, cultural atmosphere, or social characteristics of specific regions (Lucy \u0026amp; Bamman, \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). 
This caused the models to unconsciously amplify these stereotypes during the simulation of regional psychological characteristics, leading to expanded biases in certain regional traits (Argyle et al., \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). For example, this study found that the well-being scores of Northeast China were significantly underestimated by the models (see Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e), which might have been related to the reinforcement of negative narratives about Northeast China (e.g., economic decline, population outflow) in the corpora. In contrast, several personality and well-being scores in East China were higher than those in other regions (see Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e), which might have originated from excessive positive stereotypes in the models\u0026rsquo; training data. Furthermore, the models also exhibited \"visibility bias\" in simulating psychological structures - they were more likely to capture and amplify explicit, easily expressible psychological traits (e.g., Neuroticism, Openness) but showed poor performance in restoring implicit, long-term stable personality traits such as Conscientiousness. This triple bias (economic-cultural-psychological) might have led to systematic errors in the LLMs\u0026rsquo; modeling of regional psychological structures (A. 
Wang et al., \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2025\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eFuture research could improve Kimi-Chat v1.5, GPT-4, and similar models in the following aspects: First, enrich and diversify the LLMs\u0026rsquo; training corpora by adding text data on the cultural ecology and socio-economic background of different regions (Demszky et al., \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2023\u003c/span\u003e) to enhance the LLMs\u0026rsquo; sensitivity to regional differences. Second, improve the LLMs\u0026rsquo; ability to simulate complex emotions and social interactions - for example, by introducing more refined affective computing models or longitudinal data on human emotions to enhance the accuracy of capturing well-being and emotional changes. Third, design \"region-specific prompts\" (e.g., adding \"You are a 35-year-old woman living in rural Sichuan, familiar with local farming culture\") to test whether they could improve the LLMs\u0026rsquo; simulation of intra-regional heterogeneity. At the same time, incorporate \"personality sub-dimensions\" (e.g., \"orderliness\" and \"sense of responsibility\" under Conscientiousness) and \"specific dimensions of well-being\" (e.g., life satisfaction, emotional experience) to more accurately evaluate the ability of LLMs to simulate regional psychological structures. Fourth, leverage the multimodal advantages of recent LLMs or AI (e.g., text-image interleaved data) by incorporating region-specific image information (e.g., ice and snow landscapes in Northeast China, Lingnan architecture in South China) to observe whether multimodal input could strengthen the LLMs\u0026rsquo; understanding of regional psychology. Meanwhile, use multi-year data (e.g., 2014, 2018, 2022) to test whether LLMs could simulate temporal changes in regional psychological traits (e.g., the long-term impact of economic development on well-being). 
Finally, conduct cross-cultural comparative studies, combining sample data from different countries and regions to test the applicability of large language models in different cultural contexts, thereby further verifying and expanding their psychological simulation capabilities. In addition, on the methodological side, further explore the extensibility of large language models (Kimi-Chat v1.5 or GPT-4) in simulating group psychological data. Could their simulation performance in personality and well-being be extended to other variables in psychological processes? Which types of psychological variables could not be simulated? These questions deserve further investigation.\u003c/p\u003e\u003cp\u003eIn summary, this study introduced Kimi-Chat v1.5 and GPT-4 to simulate psychological structures across different regions in China, preliminarily verifying the application prospects of large language models as \"virtual participants\" and revealing their current limitations. These findings offered valuable insights for the methods and theories of psychological research. On the one hand, large language models were expected to become innovative tools for reducing the cost of large-sample surveys and avoiding social desirability bias in traditional surveys; on the other hand, their shortcomings in simulating complex human psychology and cultural backgrounds must be acknowledged (Grossmann et al., \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). With the continuous enrichment of model training data and improvements in algorithms, the ability of large language models to simulate regional psychological differences and complex personality traits was expected to further improve. 
It was anticipated that future research would make more progress in inter-model comparison, cultural localization, and cross-cultural personality simulation, bringing new changes to psychological research methods and thus providing stronger support for cross-cultural research in personality and social psychology.\u003c/p\u003e\u003c/div\u003e"},{"header":"Conclusion","content":"\u003cp\u003eThis study presented a comparative evaluation of two large language models (LLMs) - Kimi-Chat v1.5, a China-trained model, and GPT-4, a globally trained model - in simulating regional psychological structures across China. Our findings revealed that Kimi-Chat v1.5, trained on Chinese corpora, more effectively captured the regional variations in Big-Five personality traits and subjective well-being compared to GPT-4. This might be due to Kimi's alignment with the cultural and socio-economic contexts specific to China. Despite the successes, both models exhibited limitations, including an inability to fully replicate the complexity of human psychological experiences. This highlighted the significance of model training data in influencing simulation accuracy and underlined the need for culturally sensitive adaptations of LLMs for regional psychological research. Our study demonstrated the potential of LLMs in regional psychological research, offering a low-cost and scalable alternative to traditional surveys. This study provided the first cultural-calibration benchmark for virtual-participant tools in regional psychological research within a single culture (e.g., Chinese culture). Moreover, this benchmark offers methodological insights for future regional psychological research, as it highlights the necessity of using culture-adapted LLMs for simulating psychological traits in different cultural contexts. 
Additionally, it provides a low-cost, replicable alternative to traditional surveys prone to social-desirability bias.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003eConsent to Participate\u003c/p\u003e\n\u003cp\u003eAll human participants involved in the China Family Panel Studies (CFPS2018) provided written informed consent prior to the original data collection. The CFPS project team explicitly informed participants of the study\u0026rsquo;s purpose, the scope of data usage, confidentiality protection measures, and their right to withdraw from the survey at any time without adverse consequences. For any minor participants (if applicable) in the original CFPS, written informed consent was additionally obtained from their legal guardians. This study uses de-identified secondary data from CFPS2018; the data have been processed to remove all personal identifying information (e.g., names, ID numbers, specific addresses) to protect participant privacy. According to the data usage regulations of the CFPS project, secondary analysis of fully anonymized data does not require additional informed consent from individual participants.\u003c/p\u003e\n\u003cp\u003eEthics Approval\u003c/p\u003e\n\u003cp\u003eThis study was conducted in accordance with the Declaration of Helsinki (World Medical Association, 2013). The original China Family Panel Studies (CFPS2018) project, whose data were used in this study, was approved by the Internal Review Board (IRB) of the Institute of Social Science Survey (ISSS), Peking University (Approval No.: ISSS-IRB-2018-002). 
The secondary analysis of CFPS2018 de-identified data in this study was reviewed and deemed compliant with ethical guidelines by the Data Access Committee of ISSS, Peking University, and no additional IRB approval was required due to the non-identifiable nature of the data and the absence of direct interaction with human participants.\u0026nbsp;This article does not contain any studies with human participants performed by any of the authors.\u003c/p\u003e\n\u003cp\u003eConflicts of interest\u003c/p\u003e\n\u003cp\u003eThe authors have no relevant financial or non-financial interests to disclose.\u003c/p\u003e\n\u003cp\u003eAvailability of data\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe data will be publicly shared upon publication. The complete dataset will be available via the Open Science Framework.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eCode availability\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe scripts for generating stimuli and performing the data analyses in this article are available from the corresponding author via email.\u003c/p\u003e\n\u003cp\u003eAuthors\u0026apos; contributions\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u0026nbsp;XZ conceived and designed the experiment and wrote the manuscript. XZ collected the data, drew the graph, and analyzed the data. XZ and JH contributed to the development of the research problems, planning, execution, and study design, and provided guidance for the study setup. All authors revised the manuscript. XZ provided technical guidance and support.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eAcknowledgments\u003c/p\u003e\n\u003cp\u003eData collection, analysis and draft writing were\u0026nbsp;completed\u0026nbsp;by\u0026nbsp;XZ. 
XZ and\u0026nbsp;JH\u0026nbsp;contributed to the revision of the draft.\u0026nbsp;This\u0026nbsp;work received no funding.\u003cstrong\u003e\u003cbr\u003e\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eAnglim J, Horwood S, Smillie LD, Marrero RJ, Wood JK (2020) Predicting psychological and subjective well-being from personality: A meta-analysis. Psychol Bull 146(4):279\u0026ndash;323. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1037/bul0000226\u003c/span\u003e\u003cspan address=\"10.1037/bul0000226\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eArgyle LP, Busby EC, Fulda N, Gubler JR, Rytting C, Wingate D (2023) Out of One, Many: Using Language Models to Simulate Human Samples. Political Anal 31(3):337\u0026ndash;351. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1017/pan.2023.2\u003c/span\u003e\u003cspan address=\"10.1017/pan.2023.2\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBaumeister RF, Vohs KD, Funder DC (2007) Psychology as the Science of Self-Reports and Finger Movements: Whatever Happened to Actual Behavior? Perspect Psychol Sci 2(4):396\u0026ndash;403. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1111/j.1745-6916.2007.00051.x\u003c/span\u003e\u003cspan address=\"10.1111/j.1745-6916.2007.00051.x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBisbee J, Clinton JD, Dorff C, Kenkel B, Larson JM (2024) Synthetic Replacements for Human Survey Data? The Perils of Large Language Models. Political Anal 32(4):401\u0026ndash;416. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1017/pan.2024.5\u003c/span\u003e\u003cspan address=\"10.1017/pan.2024.5\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBogg T, Roberts BW (2004) Conscientiousness and Health-Related Behaviors: A Meta-Analysis of the Leading Behavioral Contributors to Mortality. Psychol Bull 130(6):887\u0026ndash;919. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1037/0033-2909.130.6.887\u003c/span\u003e\u003cspan address=\"10.1037/0033-2909.130.6.887\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBubeck S, Chandrasekaran V, Eldan R, Gehrke J, Horvitz E, Kamar E, Lee P, Lee YT, Li Y, Lundberg S, Nori H, Palangi H, Ribeiro MT, Zhang Y (2023) \u003cem\u003eSparks of Artificial General Intelligence: Early experiments with GPT-4\u003c/em\u003e (No. arXiv:2303.12712). arXiv. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.48550/arXiv.2303.12712\u003c/span\u003e\u003cspan address=\"10.48550/arXiv.2303.12712\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDe Winter JCF, Driessen T, Dodou D (2024) The use of ChatGPT for personality research: Administering questionnaires using generated personas. Pers Indiv Differ 228:112729. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.paid.2024.112729\u003c/span\u003e\u003cspan address=\"10.1016/j.paid.2024.112729\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDemszky D, Yang D, Yeager DS, Bryan CJ, Clapper M, Chandhok S, Eichstaedt JC, Hecht C, Jamieson J, Johnson M, Jones M, Krettek-Cobb D, Lai L, JonesMitchell N, Ong DC, Dweck CS, Gross JJ, Pennebaker JW (2023) Using large language models in psychology. Nat Reviews Psychol. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s44159-023-00241-5\u003c/span\u003e\u003cspan address=\"10.1038/s44159-023-00241-5\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDillion D, Tandon N, Gu Y, Gray K (2023) Can AI language models replace human participants? Trends Cogn Sci 27(7):597\u0026ndash;600. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.tics.2023.04.008\u003c/span\u003e\u003cspan address=\"10.1016/j.tics.2023.04.008\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGrant S, Langan-Fox J, Anglim J (2009) The Big Five Traits as Predictors of Subjective and Psychological Well-Being. Psychol Rep 105(1):205\u0026ndash;231. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2466/PR0.105.1.205-231\u003c/span\u003e\u003cspan address=\"10.2466/PR0.105.1.205-231\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGrossmann I, Feinberg M, Parker DC, Christakis NA, Tetlock PE, Cunningham WA (2023) AI and the transformation of social science research. 
Science 380(6650):1108\u0026ndash;1109. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1126/science.adi1778\u003c/span\u003e\u003cspan address=\"10.1126/science.adi1778\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHahn E, Gottschling J, Spinath FM (2012) Short measurements of personality \u0026ndash; Validity and reliability of the GSOEP Big Five Inventory (BFI-S). J Res Pers 46(3):355\u0026ndash;359. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.jrp.2012.03.008\u003c/span\u003e\u003cspan address=\"10.1016/j.jrp.2012.03.008\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHarding J, D\u0026rsquo;Alessandro W, Laskowski NG, Long R (2024) AI language models cannot replace human research participants. AI Soc 39(5):2603\u0026ndash;2605. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s00146-023-01725-x\u003c/span\u003e\u003cspan address=\"10.1007/s00146-023-01725-x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKe L, Tong S, Cheng P, Peng K (2025) Exploring the frontiers of LLMs in psychological applications: A comprehensive review. Artif Intell Rev 58(10):305. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s10462-025-11297-5\u003c/span\u003e\u003cspan address=\"10.1007/s10462-025-11297-5\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKovač G, Sawayama M, Portelas R, Colas C, Dominey PF, Oudeyer P-Y (2023) \u003cem\u003eLarge Language Models as Superpositions of Cultural Perspectives\u003c/em\u003e (No. arXiv:2307.07870). arXiv. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.48550/arXiv.2307.07870\u003c/span\u003e\u003cspan address=\"10.48550/arXiv.2307.07870\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLucy L, Bamman D (2021) Gender and Representation Bias in GPT-3 Generated Stories. \u003cem\u003eProceedings of the Third Workshop on Narrative Understanding\u003c/em\u003e, 48\u0026ndash;55. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.18653/v1/2021.nuse-1.5\u003c/span\u003e\u003cspan address=\"10.18653/v1/2021.nuse-1.5\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMatz SC, Gladstone JJ, Stillwell D (2016) Money Buys Happiness When Spending Fits Our Personality. Psychol Sci 27(5):715\u0026ndash;725. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1177/0956797616635200\u003c/span\u003e\u003cspan address=\"10.1177/0956797616635200\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMcCrae RR, John OP (1992) An Introduction to the Five-Factor Model and Its Applications. J Pers 60(2):175\u0026ndash;215. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1111/j.1467-6494.1992.tb00970.x\u003c/span\u003e\u003cspan address=\"10.1111/j.1467-6494.1992.tb00970.x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMei Q, Xie Y, Yuan W, Jackson MO (2024) A Turing test of whether AI chatbots are behaviorally similar to humans. \u003cem\u003eProceedings of the National Academy of Sciences\u003c/em\u003e, \u003cem\u003e121\u003c/em\u003e(9), e2313925121. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1073/pnas.2313925121\u003c/span\u003e\u003cspan address=\"10.1073/pnas.2313925121\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eOishi S, Kesebir S, Diener E (2011) Income Inequality and Happiness. Psychol Sci 22(9):1095\u0026ndash;1100. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1177/0956797611417262\u003c/span\u003e\u003cspan address=\"10.1177/0956797611417262\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePaunonen SV, Ashton MC (2001) Big Five factors and facets and the prediction of behavior. J Personal Soc Psychol 81(3):524\u0026ndash;539. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1037/0022-3514.81.3.524\u003c/span\u003e\u003cspan address=\"10.1037/0022-3514.81.3.524\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRathje S, Mirea D-M, Sucholutsky I, Marjieh R, Robertson CE, Van Bavel JJ (2024) GPT is an effective tool for multilingual psychological text analysis. 
\u003cem\u003eProceedings of the National Academy of Sciences\u003c/em\u003e, \u003cem\u003e121\u003c/em\u003e(34), e2308950121. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1073/pnas.2308950121\u003c/span\u003e\u003cspan address=\"10.1073/pnas.2308950121\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSarstedt M, Adler SJ, Rau L, Schmitt B (2024) Using large language models to generate silicon samples in consumer and marketing research: Challenges, opportunities, and guidelines. Psychol Mark 41(6):1254\u0026ndash;1270. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1002/mar.21982\u003c/span\u003e\u003cspan address=\"10.1002/mar.21982\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSchimmack U, Diener E, Oishi S (2002) Life-Satisfaction Is a Momentary Judgment and a Stable Personality Characteristic: The Use of Chronically Accessible and Stable Sources. J Pers 70(3):345\u0026ndash;384. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1111/1467-6494.05008\u003c/span\u003e\u003cspan address=\"10.1111/1467-6494.05008\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSchimmack U, Oishi S, Furr RM, Funder DC (2004) Personality and Life Satisfaction: A Facet-Level Analysis. Pers Soc Psychol Bull 30(8):1062\u0026ndash;1075. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1177/0146167204264292\u003c/span\u003e\u003cspan address=\"10.1177/0146167204264292\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSerapio-Garc\u0026iacute;a G, Safdari M, Crepy C, Sun L, Fitz S, Abdulhai M, Faust A, Matarić M (2023) \u003cem\u003ePersonality Traits in Large Language Models\u003c/em\u003e. In Review. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.21203/rs.3.rs-3296728/v1\u003c/span\u003e\u003cspan address=\"10.21203/rs.3.rs-3296728/v1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSorokovikova A, Fedorova N, Rezagholi S, Yamshchikov IP (2024) \u003cem\u003eLLMs Simulate Big Five Personality Traits: Further Evidence\u003c/em\u003e (No. arXiv:2402.01765). arXiv. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.48550/arXiv.2402.01765\u003c/span\u003e\u003cspan address=\"10.48550/arXiv.2402.01765\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSteel P, Schmidt J, Shultz J (2008) Refining the relationship between personality and subjective well-being. Psychol Bull 134(1):138\u0026ndash;161. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1037/0033-2909.134.1.138\u003c/span\u003e\u003cspan address=\"10.1037/0033-2909.134.1.138\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eStrachan JWA, Albergo D, Borghini G, Pansardi O, Scaliti E, Gupta S, Saxena K, Rufo A, Panzeri S, Manzi G, Graziano MSA, Becchio C (2024) Testing theory of mind in large language models and humans. Nat Hum Behav 8(7):1285\u0026ndash;1295. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41562-024-01882-z\u003c/span\u003e\u003cspan address=\"10.1038/s41562-024-01882-z\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eTalhelm T, Zhang X, Oishi S, Shimin C, Duan D, Lan X, Kitayama S (2014) Large-Scale Psychological Differences Within China Explained by Rice Versus Wheat Agriculture. Science 344(6184):603\u0026ndash;608. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1126/science.1246850\u003c/span\u003e\u003cspan address=\"10.1126/science.1246850\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eTourangeau R, Yan T (2007) Sensitive questions in surveys. Psychol Bull 133(5):859\u0026ndash;883. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1037/0033-2909.133.5.859\u003c/span\u003e\u003cspan address=\"10.1037/0033-2909.133.5.859\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eTrott S, Jones C, Chang T, Michaelov J, Bergen B (2023) Do Large Language Models Know What Humans Know? Cogn Sci 47(7):e13309. 

