Genetic ancestry and male founder effects explain differences in height and lactose tolerance in 60 Caucasian populations

Research Article (Research Square)

Pavel Grasgruber

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-4354427/v1
This work is licensed under a CC BY 4.0 License
Status: Posted (Version 1)

Abstract

This study aimed to examine geographical associations of genetic factors (24 Y haplogroups, 10 autosomal ancestry components) with mean male height and the occurrence of lactose tolerance-associated alleles in a sample of 60 genetically interconnected Caucasian populations of Europe, the Near East, and North Africa. 
The results show that Y haplogroups or their combinations often match almost perfectly the geographical occurrence of a particular autosomal ancestry (correlation coefficients reaching up to r = 0.99), demonstrating that male founder effects played a crucial role in shaping population history. Male height adjusted for major environmental factors is positively related mainly to ancestry components BHG (Baltic hunter-gatherers), Villabruna, and Yamnaya, and the combined frequency of five Y haplogroups (I1, I2a-P37.2, N, Q, R1b-U106). The frequency of the European lactose tolerance-associated allele 13910*T correlates primarily with Yamnaya ancestry and with the combination of six Y haplogroups (I1, I2a-M223, Q, R1a, R1b-S116, R1b-U106), whereas the Near Eastern allele 13915*G is predicted by Natufian ancestry and three Y haplogroups typical of Arab populations (E1b-M123, J1, T). Of further note is the fact that country-level relationships between body height and ancestry components show both concordance and stark differences with genetic studies using individual-level relationships, which can potentially have important implications. In summary, many of the findings achieved are extremely impressive and their causality can often be inferred from already documented findings. Others offer hypotheses that could be tested with more sophisticated research.

Keywords: Height, BMI, lactose tolerance, Y haplogroups, autosomal DNA

1. Introduction

The boom in genetic studies during the last three decades has also brought interest in the research of important phenotypic traits such as body height, obesity, lactose tolerance, pigmentation, and various other adaptations related to diet and health. 
The heritability of these traits is investigated by genome-wide association studies (GWAS), which are aimed at identifying causally associated genetic loci. However, finding such associations is difficult because they may be confounded by environmental factors, are ethnic-specific, influenced by linkage disequilibrium (a strong relationship between the occurrence of specific alleles, which are not necessarily all linked to the same phenotypic traits), and often result from extreme polygenicity - the cumulative effect of a huge number (tens of thousands) of genetic variants, which by themselves explain only a negligible part of the total variability [ 1 – 2 ]. One such trait that is very difficult to predict reliably is body height (Fig. 1A). Although this physical characteristic is highly heritable, it is also strongly influenced by the environment, and hence, it is even used as a tool to study historical changes in health and the quality of life [ 3 – 6 ]. Previous papers on this topic [ 7 – 9 ] have shown that the environmental factors most directly influencing physical growth are nutrition (protein consumption) and child mortality (which represents the occurrence of infectious diseases that exhaust growth potential). The effect of these variables is mediated through various socio-economic factors. Of these, total fertility appears to have the most independent position in regression analyses as it influences the distribution of resources within families and reflects more subtle aspects of childcare, such as the length of breastfeeding. The most informative socioeconomic indicator with the greatest predictive power ( r = 0.83, p < 0.001 in 96 countries) is the inequality-adjusted Human Development Index (HDI), which combines GDP per capita, life expectancy, and the level of education. 
For these understandable reasons, GWAS examining the genetic determinants of height are fraught with fundamental problems and many genetic loci identified as causal predictors of stature appear only as spurious correlates [ 10 – 11 ]. Still, a recent study [ 12 ] reported the identification of 12,111 independent single nucleotide polymorphisms (SNPs) that were able to explain up to 45% of the inter-individual differences in height in European participants – supposedly the maximum that can be achieved from the combination of SNP markers. Despite this significant progress, the accuracy of these polygenic predictions (polygenic scores) in non-European populations remains much lower (≤ 24%) because they are based on samples of predominantly European ancestry. Furthermore, even ‘European’ polygenic height scores may not be applicable to populations inhabiting Europe in previous historical periods [ 13 – 14 ]. In fact, the genetic potential of the Epigravettian hunter-gatherers from the Adriatic glacial refugium (Villabruna cluster, also known as ‘Western European hunter-gatherers’/WHG) and the Eneolithic steppe populations (Yamnaya cluster) is notoriously inconsistent in light of modern polygenic estimates [ 11 , 13 – 17 ]. Therefore, some authors try to synchronize polygenic height scores in ancient skeletons with their reconstructed height [ 17 – 18 ]. Others have already given up on calculating polygenic scores and only examine the relationship between the proportion of ancestry components and physical characteristics at the individual level [ 19 – 21 ]. Essentially the same methodology – albeit at the level of countries – is the subject of the present work. In the studies mentioned above [ 7 – 8 ], such an approach was already tested with Y haplogroups (Y hgs) – haplotypes on the male Y chromosome that are inherited from father to son. 
Although the Y chromosome contains only a limited number of protein-coding genes, which are mostly associated with the male phenotype and reproductive function [ 22 – 23 ], Y haplogroups are still an excellent tool for studying interpopulation genetic variability because they are signatures of male ‘founder effects’ – a rapid spread of the phenotype of the dominant male within patrilocal (and/or polygamous) societies. A unique historical laboratory for mapping these processes is the Late Eneolithic and Early Bronze Age period in Europe (~2900–1600 cal. BC), when migrating steppe pastoralists from the Yamnaya culture and their later genetic derivatives (Corded Ware culture, Bell Beaker culture, Únětice culture) largely replaced the autochthonous inhabitants of Europe. Y chromosomal lineages that survived this turbulent period can currently explain a large percentage of variation (> 50%) in height across European countries. Y hg I holds the most important position; it was initially the main paternal lineage of Mesolithic hunter-gatherers from the Villabruna cluster [ 24 – 25 ]. The effect of Y hg I is potentiated in combination with R1b-U106, which accompanied the first wave of the Corded Ware culture in Central Europe [ 26 ]. On the other hand, the short height of European nations correlates with another Yamnaya-derived lineage (R1b-S116) associated with the Bell Beaker culture [ 27 ], and especially with Y haplogroups of Near Eastern origin (E, G, J), which accords with the results of most paleogenetic studies quoted above that attribute low genetic potential for height to Near Eastern farmers. The major aim of the present study is to update and significantly expand the spectrum of Y haplogroups in the Caucasian populations of Europe, the Near East, and North Africa, which are mutually genetically interconnected [ 28 ] and share multiple physical traits. 
This should reveal more detailed relationships between male founder effects and height that were not evident before. Furthermore, given the growing amount of high-quality genomic data on many world nations, it would be interesting if such a study also incorporated autosomal ancestry components. The second objective again relates to the papers mentioned above [ 7 – 8 ], which found strong associations between Y haplogroups and phenotypic lactose tolerance. In Europe, there exists an apparent connection between lactose tolerance and Y haplogroups I, R1b-S116, and R1b-U106, whereas an independent nucleus of lactose tolerance in the Near East is related to the geographical distribution of Y haplogroup J1. At the same time, the occurrence of lactose tolerance in these two areas is characterized by the presence of different alleles [ 29 – 30 ] (Fig. 1B). These findings should be placed into the context of new studies investigating the evolution of lactose tolerance in Europe and the Eurasian steppes since the Late Eneolithic.

2. Materials and methods

2.1 Physical characteristics

Information on the current mean height of young males in Europe, the Near East, and North Africa was taken from Grasgruber & Hrazdíra [ 9 ] but included six updated countries (Supplementary dataset, Sheet 1). The means come from recent studies finished between 2004 and 2018 and are not necessarily the most recent, as the primary goal was to synchronize them with four variables used for environmental adjustment (nutrition, child mortality, total fertility, Human Development Index), which were calculated for the period 1995–2013. Despite the availability of information on height for Kosovo, Israel, and Luxembourg, the latter two countries had to be excluded because no representative genetic data were available (see below). Data on Kosovo also had information gaps. Therefore, the examined sample consisted of only 60 countries. 
2.2 Y haplogroups The spectrum of Y haplogroups examined in previous studies [ 7 – 8 ] included E1b-M78, E1b-M81, G (G-M201), I (I-M170), J (J-M304), J1 (J1-M267), J2 (J2-M172), R1a (R1a–M420), R1b (R1b–M343), R1b-S116 and R1b-U106. In the present study, this number was expanded to incorporate Y haplogroups E1b-M123, L (L-M20), N (N-M231), Q (Q-M242), T (T-M184), three major subbranches of Y hg I (I1, I2a-P37.2, I2a-M223), and five subbranches of Y hg G. Eventually, 12 major Y haplogroups were selected for this study and supplemented by 13 subbranches of E, G, I, and R1b. Since the frequency of G2 is virtually identical to G2a, only G2 was preferred, and the analyses thus included a total of 24 Y chromosome lineages. Five other subbranches (I2-M438, I2a-M26, I2a-M436, R1b-M269, R1b-L23xM412/R1b-Z2103) were not available for all countries, but due to their significance, some of them were used for supplementary analyses. (Note that these Y haplogroup statistics, as well as the statistics on autosomal ancestry components, are not included in the Supplementary material for the time being, as they will be utilized for several future studies.) In general, only samples with a minimum of 50 individuals were considered, and the essential goal was to find samples with the highest possible representativeness, including hundreds to thousands of individuals. From the beginning, it was clear that no usable data for Israel and Luxembourg would be available, which was the main limiting factor for the number of countries included in the present study. In addition, given the paucity of sources for the subbranches of E, G, I, and R1b, their frequencies were sometimes collected from different and usually less representative studies than the 12 major Y haplogroups. At the same time, an emphasis was placed on ensuring that the total frequencies of Y hgs E, G, I, and R1b in these studies were similar and did not differ by more than 3%. 
2.3 Genotypic lactose tolerance The genotypic frequency of lactose tolerance is listed in the Supplementary dataset, Sheet 2. Although new data on phenotypic lactose tolerance have become available since the publication of the previous work [ 7 – 8 ], they are influenced by different methodologies, often vary widely within the same regions, and some reported numbers (especially from developing countries) are clearly erroneous and unusable. In contrast, genotypic data can be used to trace the origin of specific lactose tolerance-associated alleles directly. In total, the dataset contains the frequencies of five alleles typical of Europe, the Near East, or North Africa (13910*T, 13915*G, 14009*G, 13907*G, 14010*C), as well as the frequencies of observed genotypic lactose tolerance (calculated as the sum of homozygotes and heterozygotes, i.e. T/T + C/T in the case of 13910*T) and the frequencies of predicted genotypic lactose tolerance (calculated from the number of homozygotes and heterozygotes using Hardy–Weinberg equilibrium). Given the scarcity of information on 14009*G, 13907*G, and 14010*C, and their limited geographical distribution in the countries under investigation, only 13910*T (European allele) and 13915*G (Near Eastern allele) were analyzed in detail. 2.4 Ancestry components Information on five major autosomal ancestry components typical of Europe was kindly provided by W. Barrie via personal communication. These data were used in the pre-print by Allentoft et al. [ 14 ] and include the genetic clusters Villabruna (or West European hunter-gatherers, WHG), Eastern European hunter-gatherers (EHG), Caucasus hunter-gatherers (CHG), Anatolian Neolithic (or ‘Neolithic farmers’), and Yamnaya. 
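
The distinction drawn in Section 2.3 between observed and predicted genotypic lactose tolerance can be illustrated with a minimal sketch. It assumes, as the text does for 13910*T, that carriers of at least one tolerance-associated allele (T/T + C/T) count as tolerant. The function names and the genotype counts are hypothetical and are not taken from the Supplementary dataset.

```python
def allele_frequency(n_hom: int, n_het: int, n_wild: int) -> float:
    """Frequency p of the tolerance-associated allele from genotype counts.

    n_hom  = homozygous carriers (e.g. T/T)
    n_het  = heterozygous carriers (e.g. C/T)
    n_wild = non-carriers (e.g. C/C)
    """
    total = n_hom + n_het + n_wild
    if total == 0:
        raise ValueError("no genotyped individuals")
    return (2 * n_hom + n_het) / (2 * total)


def observed_tolerance(n_hom: int, n_het: int, n_wild: int) -> float:
    """Observed share of carriers: homozygotes plus heterozygotes."""
    total = n_hom + n_het + n_wild
    if total == 0:
        raise ValueError("no genotyped individuals")
    return (n_hom + n_het) / total


def predicted_tolerance(p: float) -> float:
    """Expected share of carriers under Hardy-Weinberg equilibrium.

    p^2 + 2p(1 - p) = 1 - (1 - p)^2
    """
    return 1.0 - (1.0 - p) ** 2


# Hypothetical sample of 100 genotyped individuals.
p = allele_frequency(n_hom=20, n_het=50, n_wild=30)
print(round(p, 4), round(observed_tolerance(20, 50, 30), 4),
      round(predicted_tolerance(p), 4))
```

A gap between the observed and predicted values in a real sample would indicate a departure from Hardy–Weinberg proportions, for example through population substructure or small sample size.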
Their frequencies were available for at least two individuals in 37 out of 39 European countries and in 16 out of 21 countries of the Near East and North Africa, and were extracted from a model of seven autosomal components (k = 7), which also included East Asian and sub-Saharan African ancestry (represented mostly in very small proportions and hence unimportant for the purpose of the present study). The missing frequencies for Moldova and Montenegro were estimated from the average of neighboring countries/territories (Albania, Bosnia and Herzegovina, Croatia, Kosovo, and Serbia in the case of Montenegro; Romania and Ukraine in the case of Moldova). The ancestry proportions in the Near East and North Africa were unusable because data for five neighboring countries (Bahrain, Oman, Qatar, Saudi Arabia, and the United Arab Emirates) were missing and the k = 7 model did not distinguish two autosomal clusters (Natufian and Taforalt), which are crucial for understanding the genetic history of Arabia and Northwestern Africa. For all these reasons, two new ancestry models were tested, provided by the commercial website AncestralWhispers.org and based on the Global25 database, which collects information on 300,000 SNPs from published scientific sources. The first model (k = 11) included the most elementary, pre-Holocene ancestry components: Villabruna, EHG (Eastern hunter-gatherers), Anatolian Neolithic, CHG (Caucasus hunter-gatherers), Iranian Neolithic, Natufian, Taforalt, Indus Valley, East Asian, Nilotic, West African. The last four components were little represented and had only a supplementary role. The second model (k = 12) combined pre-Holocene and Holocene components, adding Yamnaya and replacing EHG with BHG (Baltic hunter-gatherers). Both these models were available for 56 countries (except for Bahrain, Kuwait, Oman, and Qatar). 
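
The neighbor-based estimate used for Moldova and Montenegro can be sketched as follows. This is only an illustration under the assumption that an unweighted arithmetic mean was taken per ancestry component; the text does not say whether any weighting was applied. The function name and all numeric values are hypothetical.

```python
def impute_from_neighbours(proportions, neighbours):
    """Estimate missing ancestry proportions as the unweighted mean
    of the neighbouring countries that have data.

    proportions: dict mapping country -> {component: proportion}
    neighbours:  list of country names used as donors
    """
    available = [proportions[c] for c in neighbours if c in proportions]
    if not available:
        raise ValueError("no neighbouring country has data")
    components = available[0].keys()
    return {
        comp: sum(row[comp] for row in available) / len(available)
        for comp in components
    }


# Hypothetical proportions for two components only (illustrative values).
data = {
    "Romania": {"Yamnaya": 0.30, "Anatolian Neolithic": 0.50},
    "Ukraine": {"Yamnaya": 0.40, "Anatolian Neolithic": 0.40},
}
data["Moldova"] = impute_from_neighbours(data, ["Romania", "Ukraine"])
print(data["Moldova"])
```

Because each donor's proportions sum to the same total, the averaged proportions preserve that total, so no renormalization is needed in this simple case.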
2.5 Statistical analyses Besides the common Pearson linear correlations, the relationships among the variables examined were assessed by principal component (PCA) analyses and factor analyses, using the statistical software Statistica 14.0 and PAST. Since physical growth is strongly influenced by the environment, the results concerning height were adjusted for nutrition and socio-economic indicators in multiple regression models. These models were also used for the estimation of genetic differences in height based on the genetic factors examined. 3. Results 3.1 Y haplogroups The current inter-population differences in Y haplogroup frequencies in Europe, the Near East, and North Africa are expressed by PCA analyses in Figs. 2 A- 2 B. A simplified picture showing the most frequent Y haplogroup in each population is shown in Fig. 3 A. The geographical distribution of individual Y haplogroups and their combinations is displayed in Supplementary Figs. 1–13. In Europe, the minimum spanning tree branches out from Hungary into three major regions: The first one (Western and Northern Europe) is characterized by the dominance of R1b-S116, I1, and R1b-U106; the second one (East-Central and Eastern Europe) is rich in N and/or R1a; the third one (Southeastern Europe) has the highest frequency of I2a-P37.2, E1b-M78, and J2. Other Y haplogroups have too low frequencies, but a noteworthy phenomenon is the disproportionate presence of G2a* & G2a2 in the Central Mediterranean. In the Near East and North Africa, the root of the minimum spanning tree is centered in Lebanon and we can clearly see a separation of North African countries - an effect of the autochthonous Berber lineage E1b-M81. The Near Eastern nations significantly diverge along the Factor 2 axis: Whereas the Caucasus (non-Arab) region is characterized by the high frequency of G, J2, and R1b, in the Arabian Peninsula, we mostly find the predominance of J1. 
Y haplogroups E1b-M78, E1b-M123, I, L, R1a, and T have a central position. 3.2 Ancestry components The geographical distribution of autosomal ancestry components is displayed in Supplementary Figures 14-18. The most frequent pre-Holocene ancestry components (in the k=11 model) are shown in Figure 3B, which clearly illustrates the dominance of Anatolian Neolithic ancestry. Other autosomal clusters have a more peripheral distribution: EHG in Northeastern Europe, CHG in Georgia, Iranian Neolithic in Iran, and Natufian in Arabia and Egypt. The Villabruna component is a minor element with a regional peak in the Baltic region and the Taforalt component is concentrated in Northwestern Africa. The Yamnaya component (a mixture of CHG and EHG, not included in this model) is the most frequent in the northern zone of Europe, from Iceland and Ireland through Scandinavia to the Baltic region and northwestern Russia. Nevertheless, it should be emphasized that in contrast with Y haplogroups, the frequencies of these components are not fixed numbers and depend both on their choice and their number. As a result, absolute frequencies will always change when the ancestry model is changed, and only relative proportions may remain more or less constant. However, even relative proportions can be significantly skewed when the model includes ancestry components that are closely related. The meaningfulness of the ancestry models can, therefore, be verified by their mutual comparison (Supplementary Table 1) or by correlations with ancestry-specific markers – in this case Y haplogroups (Supplementary Tables 2-8). The k=7 model (Figure 4A) distinguishes only five components relevant for 39 European countries (Villabruna, EHG, Yamnaya, Anatolian Neolithic, CHG). 
As a result, the situation is simplified and three missing ancestries (Iranian Neolithic, Natufian, Taforalt) are naturally merged with their most closely related counterparts: Iranian Neolithic with CHG, and Natufian and Taforalt with Anatolian Neolithic. In addition, the k=7 model does not differentiate a specific Villabruna component that once introgressed into the gene pool of West European farmers (designated as ‘WEF Villabruna’ in the present study) and that is apparently combined with Anatolian Neolithic. Since Yamnaya is a mixture of CHG and EHG [25], the residual frequencies of CHG and EHG reflect their population history unrelated to the Yamnaya formation. The more detailed k=11 model (Figure 4B) is available for 56 countries and includes seven components with meaningfully high frequencies (Villabruna, EHG, Anatolian Neolithic, CHG, Iranian Neolithic, Natufian, Taforalt). It does not distinguish Yamnaya and offers the ‘purest’ proportions of EHG and CHG, which are, inevitably, much higher than in the k=7 model. However, some CHG in the Mediterranean is merged with other Near Eastern clusters. The Villabruna component now incorporates WEF Villabruna, which explains why Villabruna k=11 frequencies in Western Europe are disproportionately higher when compared with Villabruna k=7 (Figure 5A). On the other hand, Villabruna k=11 frequencies in Eastern Europe are somewhat deflated, which must be ascribed to the broadly distorting effect of EHG ancestry: EHG is originally an Upper Paleolithic mixture of Villabruna and Ancient North Eurasian (ANE) ancestry [31], and EHG populations later mixed with Villabruna populations in the Mesolithic Balkans and particularly in Scandinavia and the Baltic region [14, 32], creating a specific ‘Baltic hunter-gatherer’ ancestry (BHG). The most detailed k=12 model (Figures 4C-4F) is likewise available for 56 countries. 
Similar to the k=7 model, it separates Yamnaya as an independent genetic cluster and in addition, it replaces EHG with the above-mentioned BHG. The geographical distribution of BHG is most similar to EHG k=11 ( r = 0.95, p < 0.001), confirming that EHG k=11 does not represent ‘pure’ EHG and includes a significant proportion of Villabruna ancestry. Given that the frequency of BHG is higher than the sum of EHG k=7 and Villabruna k=7 , BHG may also include some proportion of WEF Villabruna, a component that is now singled out in Western Europe and whose frequencies appear to be underestimated, being non-zero in only seven countries. In both models k=11 and k=12, we find practically the same absolute frequencies of the Natufian and Taforalt components ( r = 1.00, p < 0.001), and the Iranian Neolithic component ( r = 0.99, p < 0.001). Furthermore, we can also find a very high mutual concordance regarding Anatolian Neolithic ancestry in all three models in Europe, although the absolute frequency of Anatolian Neolithic k =12 is ~8% lower. 3.3 Relationships between ancestry components and Y haplogroups In general, the proportion of Anatolian Neolithic, Iranian Neolithic, Natufian, and Taforalt components is the most consistent across all models and the meaningfulness of these data can be further demonstrated by the impressive relationships of Natufian and Taforalt with their ancestry-specific Y haplogroups (Figures 5B-5C). Besides J1, Natufian is also significantly associated with E1b-M123 and T, and all these three Y haplogroups can be designated as typically ‘Arabic’. On the other hand, the connection between Anatolian Neolithic and its original paternal signature Y hg G (especially G2a2) [33] is completely diluted in the Near East. A relatively strong relationship is retained only in Europe, but even here, Anatolian Neolithic is more strongly correlated with Y hg J2 (Supplementary Figures 19A-19D). 
CHG ancestry and Iranian Neolithic ancestry share a relatively recent common origin [34] and both were originally accompanied mainly by Y hgs J2 and L [35-36]. In today’s Near East and North Africa, the situation is very different and CHG is mainly correlated with Y hg G2 in the Caucasus area, whereas Iranian Neolithic shows the only noteworthy connection with Yamnaya-associated R1a (Supplementary Figures 20A-20B). Since Y hg G has low diversity in the Caucasus and is not supposed to be autochthonous [37], we must assume a strong founder effect and extensive replacement of local CHG-associated paternal lineages caused by the intrusion of populations with Anatolian Neolithic ancestry. This process may not have included only Y hg G but also other Y haplogroups, especially I and J2 (Supplementary Figures 20C-20D), as indicated by their tight clustering in Figure 4E. The results regarding EHG and Villabruna (and their Holocene derivatives BHG and Yamnaya) are the least coherent, which stems from their long historical interconnection. The Villabruna component was typical of Mesolithic Europe west of the Carpathians and its original Y haplogroups were I and R1b-V88 [24-25, 38]. These lineages also prevailed in the mixed EHG & Villabruna (BHG) populations between the Balkans and Northern Europe during the Mesolithic. At present, the highest frequency of BHG-derived Villabruna (Villabruna k=7 ) can be found in the Baltic region (Latvia and Lithuania), but its original Y haplogroups were already replaced by R1a during the Late Eneolithic (Corded Ware) period and by the Uralic lineage N during the Iron Age [33, 39]. Consequently, Villabruna k=7 is currently associated mainly with N and R1a, and the combination of these two Y haplogroups is significantly complementary in this regard ( r = 0.90, p < 0.001) (Supplementary Figure 21A). This means that Y hg N is practically interchangeable with R1a and represents the same autosomal ancestry. 
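
The country-level comparison reported here, in which the frequencies of N and R1a are summed and correlated with an ancestry proportion, can be sketched in a few lines. This is a minimal illustration of a Pearson correlation on a combined frequency. The helper names and the four-country data are hypothetical and do not reproduce the study's values.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson linear correlation coefficient of two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def combined_frequency(rows, haplogroups):
    """Sum the frequencies of the selected haplogroups for each country."""
    return [sum(row[h] for h in haplogroups) for row in rows]


# Hypothetical per-country haplogroup frequencies (%) and ancestry shares.
y_hg = [
    {"N": 40.0, "R1a": 38.0},
    {"N": 10.0, "R1a": 45.0},
    {"N": 1.0, "R1a": 15.0},
    {"N": 0.5, "R1a": 3.0},
]
ancestry = [0.20, 0.14, 0.05, 0.02]

n_plus_r1a = combined_frequency(y_hg, ["N", "R1a"])
print(round(pearson_r(n_plus_r1a, ancestry), 3))
```

Summing two lineages before correlating them treats them as proxies for one ancestry, which is the sense in which the text calls N and R1a complementary.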
The mismatch between N & R1a and Villabruna ancestry later spread with Slavic-speaking populations to Central Europe (Figure 5D). However, there are two other populations with a notable proportion of Villabruna ancestry that retained Y hg I and experienced extensive geographic expansion. One is typical of Scandinavia (especially Sweden), with the predominance of Y hg I1, and the other can be found in the Western Balkans, with the overwhelming dominance of I2a-M423 (a subbranch of I2a-P37.2) [37]. The population history of WEF Villabruna in Western Europe was very different: This ancestry component correlates positively with Anatolian Neolithic k=7 ( r = 0.46, p = 0.003), a West European Y haplogroup R1b-S116 ( r = 0.70, p < 0.001), and especially with I2a-M26 ( r = 0.79, p < 0.001 in 29 countries) (Supplementary Figures 21B-21D). I2a-M26 is a subbranch of I2a-P37.2, which introgressed into the gene pool of West European farmers and is most widespread in Sardinia (38.9%) [33, 40]. The EHG component dominated east of the Carpathians during the Mesolithic and its main Y haplogroups were Q, R1a, and R1b [15, 25, 41]. Almost all non-Yamnaya EHG ancestry in today’s Europe is descended from the mixed BHG population, which can be illustrated by a very high geographical correlation between EHG k=7 and Villabruna k=7 ( r = 0.87, p < 0.001) (Supplementary Figure 21E). Consequently, EHG k=7 shares many relationships with Villabruna k=7 , including a very strong association with N & R1a ( r = 0.94, p < 0.001) (Supplementary Figure 21F). The combination of EHG k=7 and Villabruna k=7 components is, therefore, practically interchangeable with BHG k=12 , although their sum is lower than the frequency of BHG k=12 (Supplementary Figures 22A-22F). During the Eneolithic period (5 th -4 th millennium BC), the mixture of EHG and CHG in the steppe gave rise to the Yamnaya component, whose paternal signatures were likewise Q, R1a, and R1b. 
However, in the k=7 model, Yamnaya correlates most strongly with Y hg I1 ( r = 0.60, p < 0.001) (Supplementary Figure 23A). Although until recently, finds of Y hg I1 prior to the Nordic Bronze Age (1700-500 BC) were very rare and its history in Scandinavia was enigmatic, Posth et al. [32] documented this branch in a male from northern Germany (~3233 cal. BC), who was assigned to the local Funnelbeaker culture and had a Villabruna-like genetic profile. This shows that Yamnaya-associated Y hgs were largely replaced during admixture with the indigenous population of Scandinavia. In addition, Yamnaya was also part of the I2a-P37.2 expansion in the Balkans (Supplementary Figure 23B). As a whole, Yamnaya is almost perfectly linearly correlated with five ‘European’ Y haplogroups (I, N, Q, R1a, R1b) ( r = 0.96, p < 0.001) and with indigenous European components EHG and Villabruna (Supplementary Figures 23C-23E). On the other hand, it is strongly mutually exclusive with non-European Y haplogroups and non-European ancestry components (Supplementary Figure 23F). 3.3 Male height vs. Y haplogroups The relationships between male height and Y haplogroups are displayed by factor analyses in Figures 4A-4F, in Table 1, and in great detail in Supplementary Figures 24-32. In Europe, these comparisons confirm previous findings [7] and identify Y hg I as the main predictor of tallness ( r = 0.57, p < 0.001). At the same time, this is true not only in Europe but even in the entire sample of 60 countries ( r = 0.76, p < 0.001) (Figure 6A). At present, Y hg I has two main frequency peaks in Sweden (46.3%, mostly I1) and in Bosnia and Herzegovina (55.3%, mostly I2a-P37.2 and its subbranch I2a-M423). The strength of the positive relationship slightly increases when Y hg I is combined with R1b-U106 ( r = 0.80, p < 0.001) (Figure 6B), whose geographical distribution is more limited, with a frequency peak in the Netherlands (34.2%). 
A combination of five European Y haplogroups (I, N, Q, R1a, R1b) improves the correlation coefficient even further ( r = 0.83, p < 0.001) (Supplementary Figure 30A). Among ancestry-specific Y hgs, those associated with Germanic nations (I1, I2a-M223, R1b-U106) are the most noteworthy ( r = 0.64, p < 0.001) (Supplementary Figure 30E). The factor analyses in Figures 4A-4C also help to identify a more subtle combination of six ‘height-related’ Y haplogroups, which have the most specific relationship to male height in Europe: I1, I2a-P37.2, I2a-M223, N, Q, and R1b-U106 ( r = 0.72, p < 0.001) (Figure 6C). Nevertheless, the additive effect of I2a-M223 is negligible and ambiguous in other combinations. Given that the relationships between Y haplogroups and height can be influenced by environmental factors, the correlation coefficients had to be adjusted (Table 2). In Europe, these potentially confounding variables consist of nutrition (protein supply) and three socio-economic factors with the most direct causal effect (child mortality, total fertility, inequality-adjusted Human Development Index). Interestingly, the strength of the partial correlations is often even greater: Y hg I increases its predictive power to r = 0.67 ( p < 0.001), I2a-P37.2 to r = 0.64 ( p < 0.001), and R1a to r = 0.42 ( p = 0.011), reflecting the economic underdevelopment of Eastern Europe, which hinders the full expression of the genetic potential. In contrast, I1 largely loses significance (to r = 0.39, p = 0.019) and R1b-U106 becomes an insignificant factor (to r = 0.08, p = 0.64), although its partial correlation is partly retained in the total sample ( r = 0.27, p = 0.049). This must be ascribed to the fact that both I1 ( r = 0.65, p < 0.001) and R1b-U106 ( r = 0.68, p < 0.001) are most strongly associated with high protein quality in Europe (the ratio between the proteins from dairy & pork / wheat). 
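
The adjustment described above replaces a raw correlation with a partial correlation that holds the environmental variables constant. The study performed this in multiple regression models in Statistica and does not specify the exact procedure, so the sketch below is an assumption: it uses the textbook recursive formula for partial correlation, which removes one control variable at a time. All names and numbers are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson linear correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_r(x, y, controls):
    """Correlation of x and y with the linear effect of `controls` removed.

    Recursive formula, removing the first control z at each step:
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    Fails with a division by zero if a control is perfectly
    correlated with x or y.
    """
    if not controls:
        return pearson_r(x, y)
    z, rest = controls[0], controls[1:]
    r_xy = partial_r(x, y, rest)
    r_xz = partial_r(x, z, rest)
    r_yz = partial_r(y, z, rest)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical data: a confounder (e.g. a development index) drives both
# a haplogroup frequency and mean height; nothing else links them.
confounder = [1.0, 2.0, 3.0, 4.0]
haplogroup = [2.0, 1.0, 2.0, 5.0]
height = [0.5, 3.5, 1.5, 4.5]

print(round(pearson_r(haplogroup, height), 3))                # raw r
print(round(partial_r(haplogroup, height, [confounder]), 3))  # adjusted r
```

In this constructed example the raw correlation is clearly positive, while the adjusted value is essentially zero because the confounder accounts for the whole association. The opposite pattern, where adjustment strengthens a correlation as reported for Y hg I and I2a-P37.2, arises when the confounder masks the relationship instead of creating it.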
Also noteworthy are the amplified negative tendencies of R1b-S116 ( r = -0.39, p = 0.021) and partly even I2a-M223 ( r = -0.28, p = 0.11), suggesting that their role is influenced by higher living standards in Western European countries. The most parsimonious combination of Y haplogroups after adjusting is I1 & I2a-P37.2 ( r = 0.70, p < 0.001) and this result improves only slightly (to r = 0.72, p < 0.001) with five Y hgs (I1, I2a-P37.2, Q, R1a, R1b-U106) (Figure 6D), not to mention that it is practically the same as the combination of only four lineages (I1, I2a-P37.2, R1a, R1b-U106). The six ‘height-related’ Y hgs partly lose their importance ( r = 0.66, p < 0.001). Although Y hg N does not contribute to these relationships, its outlier frequency in Finland (Figure 6C) indicates that its phenotypic effect in the Finnish population may be highly specific. This would not be surprising given Finland’s isolated genetic history [42]. After excluding Finland, the combination of I1, I2a-P37.2, N, Q, and R1b-U106 gives the highest partial correlation in 38 European countries ( r = 0.81, p < 0.001), but I1, I2a-P37.2, N, and R1b-U106 reach nearly the same value ( r = 0.80, p < 0.001) (Supplementary Figures 31A-31B). Although N and R1a are otherwise mutually complementary, adding R1a to these combinations decreases the partial r-value because it leads to overestimating male height in the Baltic region and Eastern Europe as a whole (Supplementary Figures 31C-31D). In contrast with European Y haplogroups, non-European Y haplogroups (E, G, J, L, T) correlate strongly negatively with height in the total sample ( r = -0.82, p < 0.001), which can be ascribed mainly to Y hg J1 ( r = -0.80, p < 0.001). This negative relationship reaches a maximum when J1 is combined with E1b-M123 and T ( r = -0.84, p < 0.001) (Figure 6E). As already mentioned, these three Y haplogroups are typical of the Arabian Peninsula and reach the highest frequency in Yemen (78.4%). 
Interestingly, we can also observe that Y haplogroups J2 and L, which are widespread in the Near East and correlate negatively with male height in Europe ( r = -0.57, p < 0.001), are strongly associated with tall statures in the Near East ( r = 0.77, p < 0.001). This result further slightly increases to r = 0.78, when J2 & L are combined with E1b-M78 or G2a*& G2a2. After adjusting for variables specific to the Near East, the combination of E1b-M78, J2, and L reaches a very high partial correlation of r = 0.92 ( p < 0.001), although this comparison does not include Bahrain and Qatar. When similar adjustments are performed in the total sample (in 58 countries, again without Bahrain and Qatar), the importance of European Y haplogroups markedly decreases (to r = 0.50, p < 0.005), confirming the expected influence of better living conditions in Europe. As already mentioned, this confounding effect is most evident in the Western European lineage R1b-S116, whose relationship to height is largely reversed. The role of Y hg I and various Y haplogroup combinations also decreases but still remains highly significant, and the highest partial correlation can be found in four Y haplogroups: I1, I2a-P37.2, Q, R1b-U106 ( r = 0.66, p < 0.001) (Figure 6F). After the exclusion of Finland, the best result is again achieved with the same five Y haplogroups as in Europe (I1, I2a-P37.2, N, Q, R1b-U106) ( r = 0.72, p < 0.001). The negative relationship of the three ‘Arabic’ Y hgs E1b-M123, J1, and T is also retained ( r = -0.62, p < 0.001), but the drop in the r -value is far greater, suggesting that the association between height and these markers may be more strongly distorted by the lower quality of life. 3.4 Male height vs. ancestry components Table 3 shows the relationship between male height and ancestry components according to all three models tested. This comparison identifies two ancestries that are consistently associated with tallness: Villabruna and Yamnaya. 
A significant positive relationship can be found even with EHG k=11 (a component in Yamnaya) and BHG k=12 (a mixed EHG-Villabruna ancestry) (Figures 7A-7D). In all comparisons, Yamnaya or EHG k=11 appear to be stronger predictors of height than Villabruna or BHG. However, the role of non-Yamnaya EHG k=7 , which does not include the Villabruna component, is non-significant. Furthermore, in the k=7 model after adjusting, the difference between Villabruna and Yamnaya in Europe nearly disappears ( r = 0.58 vs. r = 0.61) (Table 4), and it even reverses ( r = 0.57 vs. r = 0.55), when only three elementary factors (nutrition, child mortality, total fertility) are used as potential confounders. Disregarding the k=11 model (in which EHG includes a large proportion of Villabruna), BHG and Villabruna also correlate more strongly in the more developed Western Europe, where we generally observe highly linear relationships approaching r = 0.90 (Table 5). In contrast, correlations between height and ancestry in Eastern Europe are much weaker, suggesting a greater role of the environment. This is reminiscent of the observation made in the case of Y hgs I2a-P37.2 and R1a. These data suggest that the genetic potential of the Villabruna cluster, which is more represented in Eastern Europe, may actually be higher than that of Yamnaya. This is further supported by the fact that Villabruna k=7 correlates more strongly with key height-related, adjusted combinations of Y haplogroups (Supplementary Table 2). In fact, the deficit of Villabruna ancestry in Western European populations descended from the Bell Beaker culture (Y hg R1b-S116) can explain why they are ~3 cm shorter than other Europeans with a similar proportion of Yamnaya ancestry. This can be illustrated by the example of Ireland on the one hand, and Sweden on the other hand (Figures 7A-7B). 
The EHG k=7 component likewise gains in importance after adjusting ( r = 0.49, p = 0.003), but does not combine well with Villabruna or Yamnaya, indicating its secondary role (Supplementary Figures 33A-33B). Figures 7E-7F and Supplementary Figures 33-34 illustrate that height in Europe is in an inverse relationship with all five Near Eastern ancestry components and this remains true even after controlling for the environment (Table 4). In the Near East and North Africa, Anatolian Neolithic predicts tall statures and similar tendencies can be observed in the CHG component. On the other hand, Natufian has a negative role. These relationships (albeit weaker) are likewise retained after adjustments. The situation in the total sample changes more: After controlling for environmental factors, negative partial correlations in the k=12 model can be found in Natufian ( r = -0.50, p < 0.001), WEF Villabruna ( r = -0.36, p = 0.009), and partly in Iranian Neolithic ( r = -0.27, p = 0.057) and Anatolian Neolithic ( r = -0.27, p = 0.055). The r -values in Anatolian Neolithic radically change from positive to negative, which must be ascribed to the high frequencies of this cluster in affluent Western European countries. The fact that WEF Villabruna (Supplementary Figure 34E) decreases a positive correlation coefficient when combined with BHG in Europe suggests that it carries predispositions for short height that were typical of Anatolian Neolithic. A noteworthy anomaly that can be observed in virtually all graphic comparisons in Europe is the eccentric trend in the Western Balkan area of the Dinaric Alps (former Yugoslavia and Albania). Here, we find the tallest statures in the world and the highest frequencies of Villabruna-associated Y hg I, yet Villabruna and Yamnaya ancestry are insufficient to explain this phenomenon and local heights appear to increase with the proportion of Anatolian Neolithic and CHG ancestry (cf. Figure 7E). 
This is in striking contrast to the situation in other regions of Europe. Inevitably, correlation coefficients in Europe profoundly increase when Western Balkan countries are excluded (Table 5). The same anomalous tendency can be seen in Supplementary Figure 30A, where Montenegro is the tallest European country, despite a relative deficit of European Y haplogroups (I, N, Q, R1a, R1b). A lineage that improves the position of Montenegro in this graph and significantly increases the correlation coefficient in Europe (from r = 0.61 to r = 0.68, p < 0.001) is E1b-M78. Another slight increase occurs after the inclusion of J2 ( r = 0.70, p < 0.001). The specific role of E1b-M78 and J2 in the Western Balkans can also be seen in Supplementary Figures 32A-32B, which compare non-European Y haplogroups. These findings raise interesting questions regarding the well-known Holocene founder effect of E1b-M78 in the Balkans, which included the subbranch E1b-V13 [43-44]. The frequency of E1b-V13 currently reaches its maximum in the Ghegs from North Albania (37.8%) [44] and Kosovo (up to ~44%) [45]. Interestingly, Cruciani et al. [43] demonstrated a high geographical correlation between E1b-V13 and the subbranch J2b in Europe and their similar times to the most recent common ancestor (TMRCA) ~2700-2000 BC, supporting a long shared history. This observation can be confirmed even in the present study because E1b-M78 and J2 mutually overlap both in the Balkans ( r = 0.75, p = 0.008) and in the seven countries of the Dinaric Alps ( r = 0.84, p =0.017). 3.5 Lactose tolerance vs. Y haplogroups Lactose tolerance-associated alleles included in this study consist of 13910*T, 13915*G, 14009*G, 13907*G, and 14010*C, but only 13910*T and 13915*G were represented in high frequencies. The typically European allele 13910*T was available for 35 out of 39 European countries and for 52 countries from the total sample (Figure 8A). 
It has two main frequency peaks in Ireland (86.6%) and Iceland (85.3%) but is also widespread in Scandinavia (~75%) and in the United Kingdom (74.5%). Table 6, Supplementary Table 9, and Figure 9A illustrate that the distribution of 13910*T in Europe is most closely tied to three ‘Germanic’ Y haplogroups I1, I2a-M223, and R1b-U106, whose combination is complementary ( r = 0.79, p < 0.001) and is clearly responsible for the elevated 13910*T frequency in Eastern Europe. Although the presence of 13910*T in Eastern Europe can also be linked to N and R1a, their role is considerably weaker. Besides the Germanic Y hgs, the most important lineage is obviously R1b-S116 in Western Europe. Six Y haplogroups (I1, I2a-M223, Q, R1a, R1b-S116, R1b-U106) are the strongest correlates of 13910*T both in Europe ( r = 0.89, p < 0.001) and in the entire sample of 52 countries ( r = 0.94, p < 0.001) (Figure 10A). Although adding Y hg N improves the outlier position of Finland, it overestimates 13910*T frequency in other Baltic countries (Supplementary Figures 35A-35F). As expected, all non-European Y hgs have a negative relationship with 13910*T in Europe and this applies especially to E1b-M78 & J2 ( r = -0.78, p < 0.001), which are concentrated in the Balkans. In the Near East and North Africa, the situation is much simpler. 13910*T is represented in low frequencies, although a notable exception is Northwestern Africa (~10-20% 13910*T). The dominant lactose tolerance-associated allele is 13915*G, whose frequencies are available for 30 countries (13 in the Near East and 16 in the Near East & North Africa). Its occurrence reaches a peak in Saudi Arabia (58.7%) and Yemen (54.9%), but outside the Arabian Peninsula, it abruptly decreases to 3.7% in Egypt and 2.9% in Syria (Figure 8B). 
The only Y haplogroup correlating consistently with 13915*G is J1 ( r = 0.87, p < 0.001 in the Near East and r = 0.89, p < 0.001 in the complete sample of 30 countries), which is graphically demonstrated in Figure 9B. In the Near East & North Africa, E1b-M123 also gains importance, and Y hg T shows a moderately positive role in the whole sample (Table 6, Supplementary Table 10). Nevertheless, E1b-M123 and T have only a slightly additive effect when combined with J1, showing that J1 is definitely the most important factor (Figure 10B, Supplementary Figures 36A-36D). Figure 10B also shows that 13915*G starts to increase in a population only when the cumulative frequency of E1b-M123, J1, and T exceeds 30%. This suggests that not all subbranches of these Y haplogroups are associated with 13915*G and a higher resolution would be needed. In any case, 13915*G is not positively correlated with any other Y haplogroup, confirming that its present distribution is closely related to the expansion of pastoral populations from the Arabian Peninsula. 3.6 Lactose tolerance vs. ancestry components In contrast with the diverse spectrum of Y haplogroups connected with 13910*T, there is only one ancestry correlating consistently positively with this allele in Europe: Yamnaya ( r = 0.83, p < 0.001 in the k=7 model) (Table 7, Figure 11A). However, Villabruna k=11 and BHG k=12 are also significantly correlated, and Villabruna k=7 reaches significance when Europe is divided into a western and eastern half (Figure 11B, Supplementary Figures 37-38). 
These relationships reflect the involvement of Villabruna-associated lineages I1 and I2a-M223, as well as the weaker 13910*T selection in Eastern Europe (mirrored by the less significant role of Y hgs N and R1a) and the weaker relationship of 13910*T with WEF Villabruna ancestry in Western Europe (which consequently weakens the relationship between 13910*T and Villabruna k=11 in Western Europe and brings it closer to the Eastern European correlation line). The Near Eastern allele 13915*G shows the only significant connection with Natufian ancestry, but the correlations are even more linear than those observed in the three ‘Arabic’ Y haplogroups: In both k=11 and k=12 models, they reach r = 0.91 ( p < 0.001) in the total sample of 27 countries, r = 0.93 ( p < 0.001) in 13 countries from the Near East & North Africa, and r = 0.96 ( p < 0.001) in 10 countries from the Near East (Supplementary Figures 39A-39B). Still, even here, we can see that the selection for 13915*G postdates the spread of the Natufian component, as Levantine countries (Lebanon, Syria) have nearly zero 13915*G levels despite 20-25% Natufian ancestry. 3.7 Male height vs. lactose tolerance Previous studies [7-8] documented a paradoxical relationship of lactose tolerance with male height, which is positive in Europe but negative in the Near East. The present study unequivocally confirms these findings (Figures 12A-12B). In Europe, this positive relationship can be explained by the association between 13910*T and the Yamnaya component because Eastern European countries (and particularly those from the Western Balkans) deviate from the correlation line. The negative correlation between 13915*G and male height is likewise easy to explain due to the strong multicollinearity of 13915*G with the ‘Arabic’ Y haplogroups and Natufian ancestry, which consistently predict the shortest heights. 4. 
Discussion The data in this study are the result of an intensive collection of information from available literature and internet databases, which, unfortunately, are still not perfect in their representativeness. For this reason, they must be considered only provisional, especially regarding the occurrence of rare Y chromosome subbranches in some regions (e.g., R1b-S116 and R1b-U106 in the Near East and North Africa), for which only lower quality sources were often available. However, care was taken to ensure that the frequency of the 12 major Y haplogroups was similar across all studies used, which can explain why the correlations between Y haplogroups and other variables are extremely strong (not rarely approaching r = 1.00) and indirectly suggest that, despite the above-mentioned limitations, they are more than sufficiently accurate for the study’s purpose. Although the statistical power of many findings is extraordinary, it is understandable that it does not prove causal relationships by itself. Similar problems are faced by GWAS studies that are essentially based on a similar methodology, i.e., correlations between certain alleles and physical traits in individuals (which, moreover, may not apply to all regions). Extrapolating the current occurrence of genetic factors into the distant past also has its limitations and risks, as evidenced, for example, by the original misidentification of Y hg R1b with the Upper Paleolithic legacy in Western Europe [46]. Nevertheless, the advantage of the present study is the fact that most of its results can be placed in the context of already available knowledge, have a meaningful rationale, and mutually support each other. Examining genotype-phenotype relationships at the country level also makes it possible to overcome the fundamental barrier of individual-level GWAS, whose results cannot be reliably applied to different populations. 
4.1 Tallness in Europe can be traced to the heritage of Villabruna and Yamnaya ancestry Y haplogroups are signatures of male founder effects and their strong (sometimes perfectly linear) correlation with ancestry components testifies that they play a key role in shaping population history. The present work markedly refines the findings of previous studies [7-8] and corrects the effect of all Y haplogroups for environmental factors. Although this adjustment cannot be perfect, its results are in agreement with expectations that take into account nutritional and socio-economic factors. First of all, we can see that Y hgs I1 (in Scandinavia) and I2a-P37.2 (in the Western Balkans) are by far the most important predictors of tallness and the genetic potential in the Western Balkans is still not fully expressed due to suboptimal living conditions. Y haplogroup R1b-U106 has a much weaker effect than the unadjusted correlations indicate, which is due to the fact that the peak of its occurrence is in the Netherlands, where we could find the highest level of protein quality over the last decades. Despite that, R1b-U106 has some decent additive role in adjusted combinations, especially in the total sample. Y haplogroup Q has a generally small frequency across the 60 countries examined and its positive contribution is likewise small, but remains significant even after adjusting. Y haplogroup N has a very special position: It decreases correlation coefficients because it markedly overestimates male height in Finland, but after excluding Finland, it profoundly improves the strength of all Y haplogroup combinations. The possibility that Y hg N frequency in Finland (64.2%) would be erroneous is unlikely because it is based on a very large sample (n = 4375) and similar results (~60% Y hg N) are reported by other studies. 
Therefore, the phenotype reflected by Y hg N in Finland is probably different from that in other Baltic countries, which is also mirrored in correlations with the 13910*T allele. Y haplogroup R1a is practically interchangeable with Y hg N in its relationship to male height and improves partial correlations in some adjusted combinations. However, its combination with Y hg N is not productive because it leads to the overestimation of height in Eastern European countries. Consequently, the use of five Y haplogroups (I1, I2a-P37.2, N, Q, R1b-U106) appears to be the most rational for predicting tallness in the countries studied, although it cannot include Finland. These five lineages also appear in an ideal multiple regression model of male height, which is based on all Y haplogroups with a presumably causative effect (i.e., after adjusting for the environment), but does not include Finland. This model includes seven Y haplogroups (five correlating positively and two correlating negatively) and explains 85.53% of the variance in 59 countries (Figures 12A-12B, Supplementary Tables 15-16). All six height-associated Y haplogroups discussed above were originally connected with either Villabruna/BHG ancestry (I1, I2a-P37.2), or Yamnaya ancestry (Q, R1a, R1b-U106), or started to reflect BHG ancestry due to recent genetic processes (N). The comparison of ancestry components confirms these results and identifies BHG, Villabruna, and Yamnaya as the most important autosomal clusters predicting tallness. According to Mathieson et al. [15], Yamnaya is a stronger factor than Villabruna (WHG), whereas Berg et al. [16] attributed greater importance to Villabruna. These authors later retracted their findings due to the possible environmental confounding and ethnic specificity of the height-associated SNPs [10-11]. 
Based on data in the present study, Yamnaya k=7 shows a stronger association with height than Villabruna k=7 , but this finding is influenced by the socio-economic underdevelopment of Eastern European countries, where Villabruna occurs in the highest frequencies. After adjusting for environmental factors, the difference between these two components disappears and Villabruna emerges as visibly more important in Western Europe, despite small population frequencies. The three major sources of expansion of Villabruna ancestry (Figure 12C) also correspond to the three major regional peaks of male height in Europe. Nevertheless, even the best prediction model based on ancestry components (Figure 12D) cannot match prediction models based on Y haplogroups. This is not only due to the lower ‘resolution’ of ancestry components but also due to the specific roots of the exceptional tallness in the Western Balkans, which deserves a more detailed discussion. 4.2 The complex origin of the Dinaric phenomenon The frequency of Y hg I reaches the highest values in the world in the mountainous region of the Dinaric Alps, stretching from Slovenia to northern Albania. The peak of its frequency is in Herzegovina (73.3% in Croats from Herzegovina) and most of this high proportion belongs to I2a-P37.2 (71.1%) [47]. Virtually all I2a-P37.2 in the Dinaric Alps is represented by its subbranch I2a-M423 (I2a1a2) [48-49], which was widespread in Mesolithic and Neolithic Europe [33]. According to the YFull database (https://www.yfull.com/), I2a-M423 includes several notable relict lineages in the British Isles but is otherwise concentrated in Eastern Europe. The typically East European subbranches of I2a-M423 share a common root in I2a-S9952 (I2a1a2b1a1a), whose estimated TMRCA is ~1400 BC. The area of the Dinaric Alps is characterized by the strong founder effect of I2a-PH908 (I2a1a2b1a1a1c), a subbranch with a TMRCA in ~300 AD. 
Its frequencies may be approximately 33% in Bosnia and Herzegovina, 28% in Montenegro, 27% in Serbia, and 26% in Croatia. The work of our research team [50] identified a geographical nucleus with extraordinarily tall statures (≥184 cm), which includes Dalmatia, Herzegovina, and the northwestern parts of Montenegro (Figure 13), and agrees with the regional distribution of Y hg I [45, 47, 49]. However, the Western Balkan area is characterized by several peculiarities: First, the ratio between Y hg I and Villabruna k=7 or BHG k=12 ancestry in the Western Balkans is higher than in Scandinavia, indicating a stronger male founder effect over a genetically alien substrate. Second, these genetic factors are surprisingly poorly represented in Montenegrins (38.2% I; 3.4% Villabruna k=7; 10.5% BHG k=12) and especially in Kosovar Albanians (8.0% I; 1.2% Villabruna k=7), despite the fact that Montenegrins are the tallest in the world (with 182.9 cm in men) and the height of Kosovar Albanian men (179.5 cm) [51] surpasses that of most Western Europeans with much higher living standards. Male height in the once heavily isolated Albania is also rising rapidly due to improved nutrition and is already approaching 177 cm in some coastal regions. Therefore, in the future, we can expect an even greater statistical deviation of the Western Balkan region within Europe. The graphical comparisons in the present study make it possible to put forward the hypothesis that the roots of this paradox lie in a specific genetic component that can be traced to a male founder effect in Albanians. In contrast to the rest of Europe, this height-associated component is based on Anatolian Neolithic and CHG ancestry, and Near Eastern lineages E1b-M78 (E1b-V13) and J2 (J2b). 
Although it could be assumed that these predispositions for tallness spread across the Dinaric Alps due to historical inter-group contacts, it is striking that all ancestry components associated with I2a-P37.2 (Villabruna k=7, EHG k=7, Yamnaya k=7) are individually completely exclusive with Anatolian Neolithic & CHG ancestry k=7 in the Balkans ( r = -1.00, p < 0.001) (Supplementary Figure 23F). The very history of I2a-P37.2 in the Dinaric Alps still remains unclear. At the dawn of population genetics, it was assumed that its high frequency was a relic of the Late Upper Paleolithic Epigravettian culture, which occupied the Adriatic glacial refugium in Italy and the Western Balkans, and was the source of the Villabruna cluster [46]. Indeed, one possibly Mesolithic sample of Y hg I2 with a Villabruna autosomal profile was found in the Vrbička Cave in Montenegro [52]. However, based on the internal STR diversity of I2a-P37.2, it was hypothesized that its current distribution is the result of a much more recent expansion from the area east of the Carpathians ~1000 BC [53-54]. Solving this problem is fundamentally complicated by the poor preservation of skeletal remains in the limestone bedrock of the Dinaric Alps, as well as by the general scarcity of post-Mesolithic I2a-P37.2 samples in Eastern Europe, which have not been differentiated beyond the level of I2a-S9952 (I2a1a2b1a1a). The key subbranch I2a-M423 starts to appear systematically only at the beginning of the Bronze Age (2500-2000 BC) in Bulgaria, Romania, and Serbia, together with a notable reemergence of Villabruna ancestry [52, 55]. So far, the best explored region of the Dinaric Alps is the Croatian part of the Adriatic coast, where the arrival of I2a-M423 appears to be surprisingly recent and was preceded by groups related to modern Albanians, with a clear dominance of J2b. 
The two oldest cases of I2a-M423 in the Adriatic area are known from the Bezdanjača Cave (Lika-Senj county) and were indirectly dated to ~1150 cal. BC [33, 52]. The only available paleogenetic sample from Bosnia and Herzegovina (Klakar in northern Bosnia, ~1500 cal. BC) was identified as I2a-M223 and the oldest post-Mesolithic samples from Montenegro come from Velika Gruda in the coastal part of the country (~1350-1140 cal. BC). Similar to coastal Croatia, the males from Velika Gruda belonged almost exclusively to J2/J2b, except for a single case of Y hg I, which had an outlier autosomal profile with a higher proportion of Villabruna ancestry [52]. The most recent paper on this topic [56] analyzed 161 samples from the Balkans covering the period 1-1500 AD. Out of 78 male specimens that were older than 1000 AD, 15 were assigned to E1b-V13, but only three (from northern and southeastern Serbia) were assigned to I2a-P37.2 (I2a1a2b1a/I2a1a2b1a1a), and these were dated to 800-1000 AD. Based on the high proportion of Slavic ancestry in these individuals (>50%), the authors assume that this finding supports the recent, Slavic origin of I2a-P37.2 in the Balkans. However, similar to previous studies, this study did not test a single sample from Bosnia & Herzegovina and Montenegro, and only a few from the peripheral areas of the Dinaric Alps. The crucial problem is to explain how the founder effect of this presumably Slavic Y haplogroup produced a phenotype that is very different from that of contemporary Slavic peoples in Eastern Europe – from much taller height through brachycephalic cranial morphology [57] to noticeably darker pigmentation of hair and eyes in Herzegovina than in Bosnia (Grasgruber et al. – unpublished data). Although the present paper cannot illuminate the exact origin of I2a-P37.2, it can at least indicate its post-Yamnaya expansion from Bosnia and Herzegovina associated with the spread of EHG, Villabruna, and Yamnaya ancestry in the Balkans. 
4.3 Concordance with the existing literature Given the problematic nature of polygenic height scores, some authors have recently resorted to comparing the distribution of ancestry components with selected physical traits at the individual level. Because the methodology in the current paper is essentially similar (with the only fundamental difference being that it compares population averages and not individuals), the results should be theoretically identical. One such example is the study by Marnetto et al. [19], who examined the relationship between four autosomal ancestries (Yamnaya, Villabruna/WHG, Anatolian Neolithic, Siberian) and multiple physical traits in a large sample from the Estonian Biobank. The presence of excess Yamnaya ancestry was documented at certain regions of the genome, which were identified as predictors of tallness by GWAS. Other ancestry components were not significant in this regard and were rather negatively associated. However, when the total (genome-wide) proportion of genetic ancestries in an individual was compared with measured physical characteristics, the relationships were consistent with findings in the present paper: Villabruna ancestry was clearly the strongest predictor of tallness, followed by Yamnaya. The Anatolian Neolithic and Siberian components had a negative effect. The authors consider this second approach to be more susceptible to environmental confounding because the distribution of ancestry components in Estonia differs by geography and may, therefore, be connected with geographical differences in environmental conditions. Nevertheless, based on the results of the present study, environmental relationships with the same ancestry components would have to exist at the European level as well. 
A subsequent pre-print by the same authors [21], using the genomes of 50,000 European individuals from the UK Biobank, found a good agreement with the Estonian Biobank, as for the relationship between trait-associated genomic regions and three ancestry components. In the case of height, Yamnaya ancestry again predicted tallness, whereas Anatolian Neolithic ancestry was insignificant and Villabruna ancestry had a negative role. In contrast, the relationships of genome-wide ancestry proportions with selected physical traits differed between these biobanks and those regarding height showed no apparent association. Similar findings were also reported by another recent pre-print by Irving-Pease et al. [20], who used >400,000 genomes of British individuals from the UK Biobank and examined the presence of ancestry components at trait-associated genomic regions. This work is particularly relevant because the authors differentiated five ancestry components, which they had provided for the present study in the k=7 model. According to their data, above-average predictive power was attributed to Yamnaya, EHG, and CHG, whereas Anatolian Neolithic and especially Villabruna were deeply below-average in this regard. Discrepancies with individual-level relationships based on genome-wide ancestry could be theoretically explained by environmental confounding, but the position of Villabruna ancestry at the bottom of relationships with height-associated loci clearly represents an insurmountable contradiction with the country-level findings and requires clarification. Since most of the donors in the UK Biobank come from the British Isles, it is inevitable that local 'Villabruna' ancestry consists of ≥50% WEF Villabruna, which can be an important confounding element (see Figure 5A). However, these two components were separated in the k=7 ancestry model used by Irving-Pease et al. 
[20], which suggests that the crucial problem may lie in the method by which the height-associated loci were identified. More concretely, there may exist a strong bias in favor of Yamnaya ancestry and against Villabruna ancestry. This would not be surprising if the relevant GWAS were conducted on Western European populations with a high proportion of Yamnaya and a low proportion of Villabruna with the predominance of the WEF Villabruna fraction. Irrespective of the roots of these conflicting results, it is clear that studies of this sort need a sufficiently high resolution of ancestry components or a comparison of their effect across multiple regions because phenotypic associations can be completely antagonistic even in seemingly identical genetic clusters. This can be demonstrated not only by the polarity between Villabruna and WEF Villabruna in Europe, but also between Anatolian Neolithic & CHG components in the Western Balkans and the rest of Europe. 4.4 The evolution of lactose tolerance Various speculations have been put forward about the geographical origin of alleles associated with lactose tolerance. Although the practice of dairying is documented in early Anatolian farmers during the 7th millennium BC and was widespread in Neolithic Europe, it was not accompanied by the occurrence of the 13910*T allele [58]. At the same time, the short statures of Central and Western European farmers (~161-163 cm in men) [59-60] indirectly testify that the utilization of high-quality dairy proteins in the diet must have been very limited and strong evolutionary selection towards lactose tolerance cannot be expected. The first documented – albeit still questionable – cases of 13910*T heterozygotes come from the Eneolithic sites of Varna in Bulgaria (~4610 BC) and Alexandria in Ukraine (~3650 BC), and the first verified carrier was an Eneolithic woman from Gura Baciului in Romania (~3440 BC) [58]. 
The concentration of these samples in Southeastern Europe accords with recent research, which shows that the origin of 13910*T is connected with the Yamnaya culture from the East European steppe [14, 20]. Nevertheless, the evolution of this allele was long and predated the Yamnaya phase, possibly coinciding with the selection of alleles associated with metabolic adaptations to famine [20]. Somewhat surprisingly, samples of steppe populations from the Yamnaya culture (~3300-2600 BC) and the Catacomb culture (~2600-2000 BC), as well as the Corded Ware culture from Central Europe and the Baltic region (~2900-2300 BC) were characterized by a very low frequency of 13910*T (< 1%) [61]. More 13910*T-positive individuals start to emerge in the paleogenetic record during the era of the Bell Beaker and Únětice cultures in Central Europe (late 3rd millennium BC) [58]. The results of the present study do not contradict this scenario. The high correlation between 13910*T and Yamnaya ancestry, especially in the k=7 model ( r = 0.83, p < 0.001), confirms that Yamnaya was the original source of these evolutionary adaptations [62]. Irrespective of differences in 13910*T frequencies, all Eneolithic and Early Bronze Age populations stemming from the steppe were characterized by high height averages (~168-173 cm in men) [59-60, 63], which shows that in addition to superior genetic predispositions, they also enjoyed much better nutrition than Neolithic farmers, suggesting the existence of a ‘milk-drinking culture’ [64]. The subsequent selection of the 13910*T allele in Europe depended on local evolutionary processes that also included populations of BHG origin and Villabruna/BHG-associated Y haplogroups (I1, I2a-M223). The strongest selection of this kind apparently occurred in Germanic-speaking ethnicities (Y hgs I1, I2a-M223, R1b-U106) and in Celtic groups descending from the Bell Beaker culture (Y hg R1b-S116). 
The Germanic lineages I2a-M223 and R1b-U106 reach a frequency peak in the Netherlands and correlate strongly with each other in Europe (r = 0.78, p < 0.001), indicating that they were part of the same population expansion. Both are also significantly correlated with the Scandinavian lineage I1, but this relationship is substantially weaker and essentially limited to Central Europe and Britain, suggesting that Germanic-speaking peoples arose from an amalgamation of two populations with different paternal origins (Supplementary Figures 40A-40C). On the other hand, the correlations between R1b-S116 and these three Y haplogroups tend to be negative in Western Europe, indicating different population histories (Supplementary Figures 41A-41C).

In general, the areas with the highest occurrence of 13910*T in Europe roughly correspond to areas with poor soil quality or poor recovery capacity (low soil resilience) (Figure 14A). In such an environment, milk could have been a ready source of nutrients in the event of a famine [58]. Fermented dairy products in which the lactose content is reduced (yoghurt) or almost completely eliminated (curd, cheese), by contrast, require longer and more sophisticated preparation. The mutual interaction between the culture of regular milk drinking and the evolution of lactose tolerance is illustrated by Figure 14B: the presence of 13910*T in a European population predicts the current consumption of dairy products, irrespective of their form. A notable exception is the mountainous region of the southwestern Balkans, which combines very low soil quality, a long tradition of pastoralism, and very high consumption of dairy products (even in the form of liquid milk) with a relatively low frequency of 13910*T.

The selection for 13915*G in the Near East occurred independently of 13910*T, as a genetic adaptation to the consumption of camel’s milk.
Here, too, the reasons are understandable, because milk was a very valuable source of high-quality nutrients and of the necessary fluids in the unfavorable environment of the desert [65]. Interestingly, this allele appears to be only ~4100 years old [66], and its evolution coincides with the aridification of the Arabian Peninsula [67]. The fact that lactose tolerance did not contribute to selection for tall stature in Arab populations is somewhat counterintuitive, but the evolutionary pressures associated with Bergmann's ecological rule (adaptation to hot climates) and the universal scarcity of natural resources [68] may have worked in the opposite direction.

5. Conclusion

The purpose of the current work was to shed light on the geographical relationships between genetic factors (Y haplogroups, ancestry components) on the one hand and physical characteristics (body height, lactose tolerance) on the other. This comparison yields a large number of impressive results from which meaningful causal relationships can be inferred. Others offer room for hypotheses that can be tested with more sophisticated research methods. Given the limited length of this text, not all of these findings can be discussed in detail. In addition to the demonstrated associations regarding height and lactose tolerance, however, it is important to point out possible implications for genetic research working with individual-level data, because some of its results cannot be reconciled with the country-level findings. In the near future, the collected frequencies of genetic factors could be compared with other physical characteristics or trait-associated alleles, such as those related to lean body mass or pigmentation.

Declarations

Competing interests: The author declares no competing interests.

Data and materials availability: Data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials and the Supplementary Dataset.
Acknowledgments: The author would like to thank William Barrie and the team of AncestralWhispers.org for providing data on the proportion of autosomal genetic components.

References

1. Tam, V., Patel, N., Turcotte, M., Bossé, Y., Paré, G., Meyre, D. Benefits and limitations of genome-wide association studies. Nature Reviews Genetics. 2019; 20(8):467–484.
2. Uffelmann, E., Huang, Q. Q., Munung, N. S., De Vries, J., Okada, Y., Martin, A. R., et al. Genome-wide association studies. Nature Reviews Methods Primers. 2021; 1(1):1–21.
3. Silventoinen, K. Determinants of variation in adult body height. Journal of Biosocial Science. 2003; 35(2):263–285.
4. Deaton, A. Height, health, and development. Proceedings of the National Academy of Sciences. 2007; 104(33):13232–13237.
5. Akachi, Y., Canning, D. Inferring the economic standard of living and health from cohort height: Evidence from modern populations in developing countries. Economics & Human Biology. 2015; 19:114–128.
6. Perkins, J. M., Subramanian, S. V., Davey Smith, G., Özaltin, E. Adult height, nutrition, and population health. Nutrition Reviews. 2016; 74(3):149–165.
7. Grasgruber, P., Cacek, J., Kalina, T., Sebera, M. The role of nutrition and genetics as key determinants of the positive height trend. Economics & Human Biology. 2014; 15:81–100.
8. Grasgruber, P., Sebera, M., Hrazdíra, E., Cacek, J., Kalina, T. Major correlates of male height: A study of 105 countries. Economics & Human Biology. 2016; 21:172–195.
9. Grasgruber, P., Hrazdíra, E. Nutritional and socio-economic predictors of adult height in 152 world populations. Economics & Human Biology. 2020; 37:100848.
10. Berg, J. J., Harpak, A., Sinnott-Armstrong, N., Joergensen, A. M., Mostafavi, H., Field, Y., et al. Reduced signal for polygenic adaptation of height in UK Biobank. Elife. 2019; 8:e39725.
11. Sohail, M., Maier, R. M., Ganna, A., Bloemendal, A., Martin, A. R., Turchin, M. C., et al. Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. Elife. 2019; 8:e39702.
12. Yengo, L., Vedantam, S., Marouli, E., Sidorenko, J., Bartell, E., Sakaue, S., et al. A saturated map of common genetic variants associated with human height. Nature. 2022; 610(7933):704–712.
13. Cox, S. L., Ruff, C. B., Maier, R. M., Mathieson, I. Genetic contributions to variation in human stature in prehistoric Europe. Proceedings of the National Academy of Sciences. 2019; 116(43):21484–21492.
14. Allentoft ME, Sikora M, Refoyo-Martínez A, Irving-Pease EK, Fischer A, Barrie W, et al. Population Genomics of Stone Age Eurasia. bioRxiv. 2022; 2022-05.
15. Mathieson I, Lazaridis I, Rohland N, Mallick S, Patterson N, Roodenberg SA, et al. Genome-wide patterns of selection in 230 ancient Eurasians. Nature. 2015; 528(7583):499–503.
16. Berg JJ, Zhang X, Coop G. Polygenic adaptation has impacted multiple anthropometric traits. bioRxiv. 2017; 167551.
17. Marciniak, S., Bergey, C. M., Silva, A. M., Hałuszko, A., Furmanek, M., Veselka, B., et al. An integrative skeletal and paleogenomic analysis of stature variation suggests relatively reduced health for early European farmers. Proceedings of the National Academy of Sciences. 2022; 119(15):e2106743119.
18. Cox, S. L., Moots, H. M., Stock, J. T., Shbat, A., Bitarello, B. D., Nicklisch, N., et al. Predicting skeletal stature using ancient DNA. American Journal of Biological Anthropology. 2022; 177(1):162–174.
19. Marnetto, D., Pankratov, V., Mondal, M., Montinaro, F., Pärna, K., Vallini, L., et al. Ancestral genomic contributions to complex traits in contemporary Europeans. Current Biology. 2022; 32(6):1412–1419.
20. Irving-Pease, E. K., Refoyo-Martínez, A., Ingason, A., Pearson, A., Fischer, A., Barrie, W., et al. The selection landscape and genetic legacy of Ancient Eurasians. bioRxiv. 2022; 2022-09.
21. Pankratov, V., Mezzavilla, M., Aneli, S., Fusco, D., Wilson, J. F., Metspalu, M., et al. Ancestral genetic components are consistently associated with the complex trait landscape in European biobanks. bioRxiv. 2023; 2023-10.
22. Parker, K., Erzurumluoglu, A. M., Rodriguez, S. The Y chromosome: a complex locus for genetic analyses of complex human traits. Genes. 2020; 11(11):1273.
23. Rhie, A., Nurk, S., Cechova, M., Hoyt, S. J., Taylor, D. J., Altemose, N., et al. The complete sequence of a human Y chromosome. Nature. 2023; 621(7978):344–354.
24. Fu Q, Posth C, Hajdinjak M, Petr M, Mallick S, Fernandes D, et al. The genetic history of Ice Age Europe. Nature. 2016; 534(7606):200–205.
25. Mathieson I, Alpaslan-Roodenberg S, Posth C, Szécsényi-Nagy A, Rohland N, Mallick S, et al. The genomic history of southeastern Europe. Nature. 2018; 555(7695):197–203.
26. Papac L, Ernée M, Dobeš M, Langová M, Rohrlach AB, Aron F, et al. Dynamic changes in genomic and social structures in third millennium BCE central Europe. Science Advances. 2021; 7(35):eabi6941.
27. Olalde, I., Brace, S., Allentoft, M. E., Armit, I., Kristiansen, K., Booth, T., et al. The Beaker phenomenon and the genomic transformation of northwest Europe. Nature. 2018; 555(7695):190–196.
28. Tishkoff, S. A., Reed, F. A., Friedlaender, F. R., Ehret, C., Ranciaro, A., Froment, A., et al. The genetic structure and history of Africans and African Americans. Science. 2009; 324(5930):1035–1044.
29. Itan Y, Jones BL, Ingram CJ, Swallow DM, Thomas MG. A worldwide correlation of lactase persistence phenotype and genotypes. BMC Evolutionary Biology. 2010; 10(1):1–11.
30. Global Lactase persistence Association Database. https://www.ucl.ac.uk/biosciences/gee/molecular-and-cultural-evolution-lab/global-lactase-persistence-association-database-glad
31. Mattila, T., Svensson, E., Juras, A., Günther, T., Kashuba, N., Ala-Hulkko, T., et al. Genetic continuity, isolation, and gene flow in Stone Age Central and Eastern Europe. Preprint from Research Square, 12 Sep 2022. https://doi.org/10.21203/rs.3.rs-1966812/v1
32. Posth, C., Yu, H., Ghalichi, A., Rougier, H., Crevecoeur, I., Huang, Y., et al. Palaeogenomics of Upper Palaeolithic to Neolithic European hunter-gatherers. Nature. 2023; 615(7950):117–126.
33. Allen Ancient DNA Resource. 2023. https://reich.hms.harvard.edu/allen-ancient-dna-resource-aadr-downloadable-genotypes-present-day-and-ancient-dna-data
34. Lazaridis, I., Nadel, D., Rollefson, G., Merrett, D. C., Rohland, N., Mallick, S., et al. Genomic insights into the origin of farming in the ancient Near East. Nature. 2016; 536(7617):419–424.
35. Narasimhan VM, Patterson N, Moorjani P, Rohland N, Bernardos R, Mallick S, et al. The formation of human populations in South and Central Asia. Science. 2019; 365(6457):eaat7487.
36. Wang CC, Reinhold S, Kalmykov A, Wissgott A, Brandt G, Jeong C, et al. Ancient human genome-wide data from a 3000-year interval in the Caucasus corresponds with eco-geographic regions. Nature Communications. 2019; 10(1):1–13.
37. Rootsi, S., Kivisild, T., Benuzzi, G., Bermisheva, M., Kutuev, I., Barać, L., et al. Phylogeography of Y-chromosome haplogroup I reveals distinct domains of prehistoric gene flow in Europe. The American Journal of Human Genetics. 2004; 75(1):128–137.
38. Marcus, J. H., Posth, C., Ringbauer, H., Lai, L., Skeates, R., Sidore, C., et al. Genetic history from the Middle Neolithic to present on the Mediterranean island of Sardinia. Nature Communications. 2020; 11(1):939. Supplementary material, Supplementary Fig. 8.
39. Mittnik A, Wang CC, Pfrengle S, Daubaras M, Zariņa G, Hallgren F, et al. The genetic prehistory of the Baltic Sea region. Nature Communications. 2018; 9(1):1–1.
40. Grugni, V., Raveane, A., Colombo, G., Nici, C., Crobu, F., Ongaro, L., et al. Y-chromosome and surname analyses for reconstructing past population structures: The Sardinian population as a test case. International Journal of Molecular Sciences. 2019; 20(22):5763.
41. Anthony, D. W., Khokhlov, A. A., Agapov, S. A., Agapov, D. S., Schulting, R., Olalde, I., Reich, D. The Eneolithic cemetery at Khvalynsk on the Volga River. Praehistorische Zeitschrift. 2022; 97(1):22–67.
42. Palo, J. U., Ulmanen, I., Lukka, M., Ellonen, P., Sajantila, A. Genetic markers and population history: Finland revisited. European Journal of Human Genetics. 2009; 17(10):1336–1346.
43. Cruciani F, La Fratta R, Trombetta B, Santolamazza P, Sellitto D, Colomb EB, et al. Tracing past human male movements in northern/eastern Africa and western Eurasia: new clues from Y-chromosomal haplogroups E-M78 and J-M12. Molecular Biology and Evolution. 2007; 24(6):1300–1311.
44. Sarno S, Tofanelli S, De Fanti S, Quagliariello A, Bortolini E, Ferri G, et al. Shared language, diverging genetic histories: high-resolution analysis of Y-chromosome variability in Calabrian and Sicilian Arbereshe. European Journal of Human Genetics. 2016; 24(4):600–606.
45. Pericic, M., Lauc, L. B., Klaric, I. M., Rootsi, S., Janićijević, B., Rudan, I., et al. High-resolution phylogenetic analysis of southeastern Europe traces major episodes of paternal gene flow among Slavic populations. Molecular Biology and Evolution. 2005; 22(10):1964–1975.
46. Semino, O., Passarino, G., Oefner, P. J., Lin, A. A., Arbuzova, S., Beckman, L. E., et al. The genetic legacy of Paleolithic Homo sapiens sapiens in extant Europeans: a Y chromosome perspective. Science. 2000; 290(5494):1155–1159.
47. Marjanovic, D., Fornarino, S., Montagna, S., Primorac, D., Hadziselimovic, R., Vidovic, S., et al. The peopling of modern Bosnia-Herzegovina: Y‐chromosome haplogroups in the three main ethnic groups. Annals of Human Genetics. 2005; 69(6):757–763.
48. Regueiro, M., Rivera, L., Damnjanovic, T., Lukovic, L., Milasin, J., Herrera, R. J. High levels of Paleolithic Y-chromosome lineages characterize Serbia. Gene. 2012; 498(1):59–67.
49. Šarac, J., Šarić, T., Havaš Auguštin, D., Novokmet, N., Vekarić, N., Mustać, M., et al. Genetic heritage of Croatians in the Southeastern European gene pool—Y chromosome analysis of the Croatian continental and Island population. American Journal of Human Biology. 2016; 28(6):837–845.
50. Grasgruber, P., Mašanović, B., Prce, S., Popović, S., Arifi, F., Bjelica, D., et al. Mapping the Mountains of Giants: Anthropometric Data from the Western Balkans Reveal a Nucleus of Extraordinary Physical Stature in Europe. Biology. 2022; 11(5):786.
51. Masanovic, B., Bavcevic, T., Prskalo, I. Regional differences in adult body height in Kosovo. Montenegrin Journal of Sports Science and Medicine. 2019; 8(1):69.
52. Lazaridis I, Alpaslan-Roodenberg S, Acar A, Açıkkol A, Agelarakis A, Aghikyan L, et al. The genetic history of the Southern Arc: A bridge between West Asia and Europe. Science. 2022; 377(6609):eabm4247.
53. Utevska OM. [The gene pool of Ukrainians according to different systems of genetic markers: the origin and place in the European genetic landscape]. Doctoral dissertation. National Research Center for Radiation Medicine of National Academy of Sciences of Ukraine. 2017. https://drive.google.com/file/d/0B1bUIW1YACgZaHlTR3NEWlNjUU 0/view?pli=1&resourcekey=0-TyXs2Z6J3zo5CJBcY7KMyw p. 20–21.
54. Mihajlovic, M., Tanasic, V., Markovic, M. K., Kecmanovic, M., Keckarevic, D. Distribution of Y-chromosome haplogroups in Serbian population groups originating from historically and geographically significant distinct parts of the Balkan Peninsula. Forensic Science International: Genetics. 2022; 61:102767.
55. Žegarac, A., Winkelbach, L., Blöcher, J., Diekmann, Y., Krečković Gavrilović, M., Porčić, M., et al. Ancient genomes provide insights into family structure and the heredity of social status in the early Bronze Age of southeastern Europe. Scientific Reports. 2021; 11(1):10072.
56. Olalde, I., Carrión, P., Mikić, I., Rohland, N., Mallick, S., Lazaridis, I., Mah, M., Korać, M., Golubović, S., Petković, S. and Miladinović-Radmilović, N. A genetic history of the Balkans from Roman frontier to Slavic migrations. Cell. 2023; 186(25):5472–5485.
57. Coon, C. S. The Races of Europe. New York: The Macmillan Company. 1939; p. 587–595. https://archive.org/details/racesofeurope031695mbp
58. Evershed, R. P., Davey Smith, G., Roffet-Salque, M., Timpson, A., Diekmann, Y., Lyon, M. S., et al. Dairying, diseases and the evolution of lactase persistence in Europe. Nature. 2022; 608(7922):336–345.
59. Ehler, E., Vančata, V. Neolithic transition in Europe: evolutionary anthropology study. Anthropologie (1962-). 2009; 47(3):185–193.
60. Ruff, C. B. (Ed.). Skeletal variation and adaptation in Europeans: Upper Paleolithic to the twentieth century. John Wiley & Sons. 2017.
61. Burger, J., Link, V., Blöcher, J., Schulz, A., Sell, C., Pochon, Z., et al. Low prevalence of lactase persistence in Bronze Age Europe indicates ongoing strong selection over the last 3,000 years. Current Biology. 2020; 30(21):4307–4315.
62. Wilkin, S., Ventresca Miller, A., Fernandes, R., Spengler, R., Taylor, W. T. T., Brown, D. R., et al. Dairying enabled Early Bronze Age Yamnaya steppe expansions. Nature. 2021; 598(7882):629–633.
63. Kruts S. I. Paleoantropologicheskiye issledovaniya Stepnogo Podneprovya (epokha bronzy) [Paleoanthropological studies of the Steppe Dnieper region (Bronze Age)]. Kiev: Nauk. Dumka. 1984; p. 22, 43.
64. Scott, A., Reinhold, S., Hermes, T., Kalmykov, A. A., Belinskiy, A., Buzhilova, A., et al. Emergence and intensification of dairying in the Caucasus and Eurasian steppes. Nature Ecology & Evolution. 2022; 6(6):813–822.
65. Reilly, B. J. Revisiting Bedouin Desert Adaptations: Lactase Persistence as a Factor in Arabian Peninsula History. Journal of Arabian Studies. 2012; 2(2):93–107.
66. Enattah, N. S., Jensen, T. G., Nielsen, M., Lewinski, R., Kuokkanen, M., Rasinpera, H., et al. Independent introduction of two lactase-persistence alleles into human populations reflects different history of adaptation to milk culture. The American Journal of Human Genetics. 2008; 82(1):57–72.
67. Lüning, S., Vahrenholt, F. Holocene climate development of North Africa and the Arabian Peninsula. In: Bendadoud, A., et al. (Eds.) The Geology of the Arab World - An Overview. Springer. 2019; p. 524–527.
68. Wells JC, Saunders MA, Lea AS, Cortina-Borja M, Shirley MK. Beyond Bergmann's rule: Global variability in human body composition is associated with annual average precipitation and annual temperature volatility. American Journal of Physical Anthropology. 2019; 170(1):75–87.

Tables

Tables 1 to 7 are available in the Supplementary Files section.

Additional Declarations

No competing interests reported.

Supplementary Files

Tables.docx
Supplementarymaterial.pdf
Figures

Figure 1. Distribution of the examined physical characteristics in the Old World. Note: The data on male height in Figure 1A come from Grasgruber & Hrazdíra [9], with six updated values, and from unpublished research aimed at sub-Saharan Africa (Grasgruber, in preparation). Figure 1B was adapted from Itan et al. [29].

Figure 2. Principal component analyses displaying inter-population differences in Y haplogroup frequencies. The sample of 60 countries is divided into five regions distinguished by different colors (Western Europe: blue; Eastern Europe: yellow; the non-Arab part of the Near East: dark green; the Arab part of the Near East: light green; North Africa: pink).

Figure 3. Geographical distribution of the most frequent Y haplogroups and ancestry components in each population. A) Y haplogroups (60 countries). B) Ancestry components for the k=11 model (56 countries without Bahrain, Kuwait, Oman, and Qatar). Frequencies that differed by less than 1% are combined and indicated by hatching. Abbreviations: CHG = Caucasus hunter-gatherers; EHG = Eastern hunter-gatherers.

Figure 4. Factor analyses of male height and genetic factors in Europe (39 countries), the Near East (12 countries without Bahrain, Kuwait, Oman, and Qatar), the Near East and North Africa (17 countries without Bahrain, Kuwait, Oman, and Qatar), and the total sample (56 countries without Bahrain, Kuwait, Oman, and Qatar), according to different ancestry models. For better clarity and meaningfulness, only Y haplogroups and ancestry components with a frequency of at least 5% in some population were included. Abbreviations: BHG = Baltic hunter-gatherers; CHG = Caucasus hunter-gatherers; EHG = Eastern hunter-gatherers; WEF Villabruna = Villabruna ancestry in West European Neolithic farmers.

Figure 5. Relationships between ancestry components or between ancestry components and their ancestry-specific Y haplogroups.

Figure 6. Relationships between male height and Y haplogroup frequencies.

Figure 7. Relationships between male height and the proportions of ancestry components.

Figure 8. Frequencies of lactose tolerance-associated alleles 13910*T (A) and 13915*G (B).

Figure 9. Factor analyses of the lactose tolerance-associated alleles and genetic factors. A) Europe (35 countries). B) Near East & North Africa (16 countries). For better clarity and meaningfulness, only Y haplogroups and ancestry components with a frequency of at least 5% in some country were included.

Figure 10. Relationships between the frequencies of lactose tolerance-associated alleles 13910*T (A) and 13915*G (B) and the most strongly associated combinations of Y haplogroups.

Figure 11. Relationships between 13910*T frequency and the proportions of ancestry components in Europe.

Figure 12. Relationships between the frequencies of lactose tolerance-associated alleles 13910*T (A) and 13915*G (B) and the most associated combinations of Y haplogroups.

Figure 13 (labeled "Figure 12" in its legend). A) Average male height in the 60 countries examined. B) Predicted male height in 59 countries based on Y haplogroups, after excluding Finland (optimal multiple regression model). Only 11 Y haplogroups with a presumed causal role were included: positive (I1, I2a-P37.2, N, Q, R1a, R1b-U106) and negative (E1b-M123, G2a* & G2a2, J1, J2, T). C) The frequency of Villabruna (k=7) ancestry and three main sources of its expansion characterized by specific Y haplogroups. D) Predicted male height in 56 countries based on ancestry components in the k=12 model (optimal multiple regression model). Only four ancestry components with a presumed causal role were included: positive (BHG, Yamnaya) and negative (Natufian, WEF Villabruna).

Figure 14 (labeled "Figure 13" in its legend). The distribution of male height in the Dinaric Alps (former Yugoslavia and Albania). Taken from Grasgruber et al. [50]. The dotted black line demarcates the area of the Dinaric mountain range.

Figure 15 (labeled "Figure 14" in its legend). A) Inherent land quality in Europe. Source: USDA-NRCS, Soil Science Division, World Soil Resources, Washington D.C. (1998). https://www.nrcs.usda.gov/wps/portal/nrcs/detail/soils/use/worldsoils/?cid=nrcs142p2_054011. This file is licensed under the Creative Commons Attribution-Share Alike 4.0 International license. B) Relationship between 13910*T frequency in Europe and the mean per capita supply of dairy proteins (FAOSTAT, Food balance statistics, 2010-2021. https://www.fao.org/faostat/en/#data/FBS). Note: The mean supply for the same period was 32.0 g/day in Montenegro and 33.4 g/day in Albania, at the assumed ~20-25% frequency of 13910*T.
Previous papers on this topic [\u003cspan additionalcitationids=\"CR8\" citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e] have shown that the environmental factors most directly influencing physical growth are nutrition (protein consumption) and child mortality (which represents the occurrence of infectious diseases that exhaust growth potential). The effect of these variables is mediated through various socio-economic factors. Of these, total fertility appears to have the most independent position in regression analyses as it influences the distribution of resources within families and reflects more subtle aspects of childcare, such as the length of breastfeeding. The most informative socioeconomic indicator with the greatest predictive power (\u003cem\u003er\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.83, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001 in 96 countries) is the inequality-adjusted Human Development Index (HDI), which combines GDP per capita, life expectancy, and the level of education. For these understandable reasons, GWAS examining the genetic determinants of height are fraught with fundamental problems and many genetic loci identified as causal predictors of stature appear only as spurious correlates [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. 
Still, a recent study [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e] reported the identification of 12,111 independent single nucleotide polymorphisms (SNPs) that were able to explain up to 45% of the inter-individual differences in height in European participants \u0026ndash; supposedly the maximum that can be achieved from the combination of SNP markers.\u003c/p\u003e \u003cp\u003eDespite this significant progress, the accuracy of these polygenic predictions (polygenic scores) in non-European populations remains much lower (\u0026le; 24%) because they are based on samples of predominantly European ancestry. Furthermore, even \u0026lsquo;European\u0026rsquo; polygenic height scores may not be applicable to populations inhabiting Europe in previous historical periods [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]. In fact, the genetic potential of the Epigravettian hunter-gatherers from the Adriatic glacial refugium (Villabruna cluster, also known as \u0026lsquo;Western European hunter-gatherers\u0026rsquo;/WHG) and the Eneolithic steppe populations (Yamnaya cluster) is notoriously inconsistent in light of modern polygenic estimates [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e, \u003cspan additionalcitationids=\"CR14 CR15 CR16\" citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e]. Therefore, some authors try to synchronize polygenic height scores in ancient skeletons with their reconstructed height [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e]. 
Others have already given up on calculating polygenic scores and only examine the relationship between the proportion of ancestry components and physical characteristics at the individual level [\u003cspan additionalcitationids=\"CR20\" citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eEssentially the same methodology \u0026ndash; albeit at the level of countries \u0026ndash; is the subject of the present work. In the studies mentioned above [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e], such an approach was already tested with Y haplogroups (Y hgs) \u0026ndash; haplotypes on the male Y chromosome that are inherited from father to son. Although the Y chromosome contains only a limited number of protein-coding genes, which are mostly associated with the male phenotype and reproductive function [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e], Y haplogroups are still an excellent tool for studying interpopulation genetic variability because they are signatures of male \u0026lsquo;founder effects\u0026rsquo; \u0026ndash; a rapid spread of the phenotype of the dominant male within patrilocal (and/or polygamous) societies. A unique historical laboratory for mapping these processes is the Late Eneolithic and Early Bronze Age period in Europe (~\u0026thinsp;2900\u0026thinsp;\u0026minus;\u0026thinsp;1600 cal. BC), when migrating steppe pastoralists from the Yamnaya culture and their later genetic derivatives (Corded Ware culture, Bell Beaker culture, \u0026Uacute;nětice culture) largely replaced the autochthonous inhabitants of Europe. 
Y chromosomal lineages that survived this turbulent period can currently explain a large percentage of variation (\u0026gt;\u0026thinsp;50%) in height across European countries. The most important position is held by Y hg I, which was initially the main paternal lineage of Mesolithic hunter-gatherers from the Villabruna cluster [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e]. The effect of Y hg I is potentiated in combination with R1b-U106, which accompanied the first wave of the Corded Ware culture in Central Europe [\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e]. On the other hand, the short height of European nations correlates with another Yamnaya-derived lineage (R1b-S116) associated with the Bell Beaker culture [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e], and especially with Y haplogroups of Near Eastern origin (E, G, J), which accords with the results of most paleogenetic studies quoted above that attribute low genetic potential for height to Near Eastern farmers.\u003c/p\u003e \u003cp\u003eThe major aim of the present study is to update and significantly expand the spectrum of Y haplogroups in the Caucasian populations of Europe, the Near East, and North Africa, which are mutually genetically interconnected [\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e] and share multiple physical traits. This should reveal more detailed relationships between male founder effects and height that were not evident before. 
Furthermore, given the growing amount of high-quality genomic data on many world nations, it would be interesting if such a study also incorporated autosomal ancestry components.\u003c/p\u003e \u003cp\u003eThe second objective again relates to the papers mentioned above [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e], which found strong associations between Y haplogroups and phenotypic lactose tolerance. In Europe, there exists an apparent connection between lactose tolerance and Y haplogroups I, R1b-S116, and R1b-U106, whereas an independent nucleus of lactose tolerance in the Near East is related to the geographical distribution of Y haplogroup J1. At the same time, the occurrence of lactose tolerance in these two areas is characterized by the presence of different alleles [\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e] (Fig.\u0026nbsp;1B). These findings should be placed into the context of new studies investigating the evolution of lactose tolerance in Europe and the Eurasian steppes since the Late Eneolithic.\u003c/p\u003e"},{"header":"2. Materials and methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e2.1 Physical characteristics\u003c/h2\u003e \u003cp\u003eInformation on the current mean height of young males in Europe, the Near East, and North Africa was taken from Grasgruber \u0026amp; Hrazd\u0026iacute;ra [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e] but included six updated countries (Supplementary dataset, Sheet 1). 
The means come from recent studies completed in 2004\u0026ndash;2018 and are not necessarily the most recent, as the primary goal was to synchronize them with four variables used for environmental adjustment (nutrition, child mortality, total fertility, Human Development Index), which were calculated for the period 1995\u0026ndash;2013. Despite the availability of information on height for Kosovo, Israel, and Luxembourg, the latter two countries had to be excluded because no representative genetic data were available (see below). Data on Kosovo also had information gaps. Therefore, the examined sample consisted of only 60 countries.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e2.2 Y haplogroups\u003c/h2\u003e \u003cp\u003eThe spectrum of Y haplogroups examined in previous studies [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e] included E1b-M78, E1b-M81, G (G-M201), I (I-M170), J (J-M304), J1 (J1-M267), J2 (J2-M172), R1a (R1a-M420), R1b (R1b-M343), R1b-S116 and R1b-U106. In the present study, this number was expanded to incorporate Y haplogroups E1b-M123, L (L-M20), N (N-M231), Q (Q-M242), T (T-M184), three major subbranches of Y hg I (I1, I2a-P37.2, I2a-M223), and five subbranches of Y hg G. Eventually, 12 major Y haplogroups were selected for this study and supplemented by 13 subbranches of E, G, I, and R1b. Since the frequency of G2 is virtually identical to G2a, only G2 was retained, and the analyses thus included a total of 24 Y chromosome lineages. Five other subbranches (I2-M438, I2a-M26, I2a-M436, R1b-M269, R1b-L23xM412/R1b-Z2103) were not available for all countries, but due to their significance, some of them were used for supplementary analyses. 
(Note that these Y haplogroup statistics, as well as the statistics on autosomal ancestry components, are not included in the Supplementary material for the time being, as they will be utilized for several future studies.)\u003c/p\u003e \u003cp\u003eIn general, only samples with a minimum of 50 individuals were considered, and the essential goal was to find samples with the highest possible representativeness, including hundreds to thousands of individuals. From the beginning, it was clear that no usable data for Israel and Luxembourg would be available, which was the main limiting factor for the number of countries included in the present study. In addition, given the paucity of sources for the subbranches of E, G, I, and R1b, their frequencies were sometimes collected from different and usually less representative studies than the 12 major Y haplogroups. At the same time, an emphasis was placed on ensuring that the total frequencies of Y hgs E, G, I, and R1b in these studies were similar and did not differ by more than 3%.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003e2.3 Genotypic lactose tolerance\u003c/h2\u003e \u003cp\u003eThe genotypic frequency of lactose tolerance is listed in the Supplementary dataset, Sheet 2. Although new data on phenotypic lactose tolerance have become available since the publication of the previous work [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e], they are influenced by different methodologies, often vary widely within the same regions, and some reported numbers (especially from developing countries) are clearly erroneous and unusable. In contrast, genotypic data can be used to trace the origin of specific lactose tolerance-associated alleles directly. 
In total, the dataset contains the frequencies of five alleles typical of Europe, the Near East, or North Africa (13910*T, 13915*G, 14009*G, 13907*G, 14010*C), as well as the frequencies of observed genotypic lactose tolerance (calculated as the sum of homozygotes and heterozygotes, i.e. T/T\u0026thinsp;+\u0026thinsp;C/T in the case of 13910*T) and the frequencies of predicted genotypic lactose tolerance (calculated from the number of homozygotes and heterozygotes using Hardy\u0026ndash;Weinberg equilibrium). Given the scarcity of information on 14009*G, 13907*G, and 14010*C, and their limited geographical distribution in the countries under investigation, only 13910*T (European allele) and 13915*G (Near Eastern allele) were analyzed in detail.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003e2.4 Ancestry components\u003c/h2\u003e \u003cp\u003eInformation on five major autosomal ancestry components typical of Europe was kindly provided by W. Barrie via personal communication. These data were used in the pre-print by Allentoft et al. [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e] and include the genetic clusters Villabruna (or West European hunter-gatherers, WHG), Eastern European hunter-gatherers (EHG), Caucasus hunter-gatherers (CHG), Anatolian Neolithic (or \u0026lsquo;Neolithic farmers\u0026rsquo;), and Yamnaya. Their frequencies were available for at least two individuals in 37 out of 39 European countries and in 16 out of 21 countries of the Near East and North Africa, and were extracted from a model of seven autosomal components (k\u0026thinsp;=\u0026thinsp;7), which also included East Asian and sub-Saharan African ancestry (represented mostly in very small proportions and hence unimportant for the purpose of the present study). 
The missing frequencies for Moldova and Montenegro were estimated from the average of neighboring countries/territories (Albania, Bosnia and Herzegovina, Croatia, Kosovo, and Serbia in the case of Montenegro; Romania and Ukraine in the case of Moldova). The ancestry proportions in the Near East and North Africa were unusable because data for five neighboring countries (Bahrain, Oman, Qatar, Saudi Arabia, and the United Arab Emirates) were missing and the k\u0026thinsp;=\u0026thinsp;7 model did not distinguish two autosomal clusters (Natufian and Taforalt), which are crucial for understanding the genetic history of Arabia and Northwestern Africa.\u003c/p\u003e \u003cp\u003eFor all these reasons, two new ancestry models were tested, provided by the commercial website AncestralWhispers.org and based on the Global25 database, which collects information on 300,000 SNPs from published scientific sources. The first model (k\u0026thinsp;=\u0026thinsp;11) included the most elementary, pre-Holocene ancestry components: Villabruna, EHG (Eastern hunter-gatherers), Anatolian Neolithic, CHG (Caucasus hunter-gatherers), Iranian Neolithic, Natufian, Taforalt, Indus Valley, East Asian, Nilotic, West African. The last four components were little represented and had only a supplementary role. The second model (k\u0026thinsp;=\u0026thinsp;12) combined pre-Holocene and Holocene components, adding Yamnaya and replacing EHG with BHG (Baltic hunter-gatherers). Both these models were available for 56 countries (except for Bahrain, Kuwait, Oman, and Qatar).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003e2.5 Statistical analyses\u003c/h2\u003e \u003cp\u003eBesides the common Pearson linear correlations, the relationships among the variables examined were assessed by principal component (PCA) analyses and factor analyses, using the statistical software Statistica 14.0 and PAST. 
Since physical growth is strongly influenced by the environment, the results concerning height were adjusted for nutrition and socio-economic indicators in multiple regression models. These models were also used for the estimation of genetic differences in height based on the genetic factors examined.\u003c/p\u003e \u003c/div\u003e"},{"header":"3. Results","content":"\u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003e3.1 Y haplogroups\u003c/h2\u003e \u003cp\u003eThe current inter-population differences in Y haplogroup frequencies in Europe, the Near East, and North Africa are expressed by PCA analyses in Figs.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e2\u003c/span\u003eA-\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e2\u003c/span\u003eB. A simplified picture showing the most frequent Y haplogroup in each population is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e3\u003c/span\u003eA. The geographical distribution of individual Y haplogroups and their combinations is displayed in Supplementary Figs.\u0026nbsp;1\u0026ndash;13. In Europe, the minimum spanning tree branches out from Hungary into three major regions: The first one (Western and Northern Europe) is characterized by the dominance of R1b-S116, I1, and R1b-U106; the second one (East-Central and Eastern Europe) is rich in N and/or R1a; the third one (Southeastern Europe) has the highest frequency of I2a-P37.2, E1b-M78, and J2. Other Y haplogroups have too low frequencies, but a noteworthy phenomenon is the disproportionate presence of G2a* \u0026amp; G2a2 in the Central Mediterranean.\u003c/p\u003e \u003cp\u003eIn the Near East and North Africa, the root of the minimum spanning tree is centered in Lebanon and we can clearly see a separation of North African countries - an effect of the autochthonous Berber lineage E1b-M81. 
The Near Eastern nations significantly diverge along the Factor 2 axis: Whereas the Caucasus (non-Arab) region is characterized by the high frequency of G, J2, and R1b, in the Arabian Peninsula, we mostly find the predominance of J1. Y haplogroups E1b-M78, E1b-M123, I, L, R1a, and T have a central position. \u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cem\u003e\u003cstrong\u003e3.2 Ancestry components\u0026nbsp;\u003c/strong\u003e\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eThe geographical distribution of autosomal ancestry components is displayed in Supplementary Figures 14-18. The most frequent pre-Holocene ancestry components (in the k=11 model) are shown in Figure 3B, which clearly illustrates the dominance of Anatolian Neolithic ancestry. Other autosomal clusters have a more peripheral distribution: EHG in Northeastern Europe, CHG in Georgia, Iranian Neolithic in Iran, and Natufian in Arabia and Egypt. The Villabruna component is a minor element with a regional peak in the Baltic region and the Taforalt component is concentrated in Northwestern Africa. The Yamnaya component (a mixture of CHG and EHG, not included in this model) is the most frequent in the northern zone of Europe, from Iceland and Ireland through Scandinavia to the Baltic region and northwestern Russia.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eNevertheless, it should be emphasized that in contrast with Y haplogroups, the frequencies of these components are not fixed numbers and depend both on their choice and their number. As a result, absolute frequencies will always change when the ancestry model is changed, and only relative proportions may remain more or less constant. However, even relative proportions can be significantly skewed when the model includes ancestry components that are closely related. 
The meaningfulness of the ancestry models can, therefore, be verified by their mutual comparison (Supplementary Table 1) or by correlations with ancestry-specific markers \u0026ndash; in this case Y haplogroups (Supplementary Tables 2-8).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe k=7 model (Figure 4A) distinguishes only five components relevant for 39 European countries (Villabruna, EHG, Yamnaya, Anatolian Neolithic, CHG). As a result, the situation is simplified and three missing ancestries (Iranian Neolithic, Natufian, Taforalt) are naturally merged with their most closely related counterparts: Iranian Neolithic with CHG, and Natufian and Taforalt with Anatolian Neolithic. In addition, the k=7 model does not differentiate a specific Villabruna component that once introgressed into the gene pool of West European farmers (designated as \u0026lsquo;WEF Villabruna\u0026rsquo; in the present study) and that is apparently combined with Anatolian Neolithic. Since Yamnaya is a mixture of CHG and EHG [25], the residual frequencies of CHG and EHG reflect their population history unrelated to the Yamnaya formation.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe more detailed k=11 model (Figure 4B) is available for 56 countries and includes seven components with meaningfully high frequencies (Villabruna, EHG, Anatolian Neolithic, CHG, Iranian Neolithic, Natufian, Taforalt). It does not distinguish Yamnaya and offers the \u0026lsquo;purest\u0026rsquo; proportions of EHG and CHG, which are, inevitably, much higher than in the k=7 model. However, some CHG in the Mediterranean is merged with other Near Eastern clusters. The Villabruna component now incorporates WEF Villabruna, which explains why Villabruna\u003csup\u003ek=11\u003c/sup\u003e frequencies in Western Europe are disproportionately higher when compared with Villabruna\u003csup\u003ek=7\u003c/sup\u003e (Figure 5A). 
On the other hand, Villabruna\u003csup\u003ek=11\u003c/sup\u003e frequencies in Eastern Europe are somewhat deflated, which must be ascribed to the broadly distorting effect of EHG ancestry: EHG is originally an Upper Paleolithic mixture of Villabruna and Ancient North Eurasian (ANE) ancestry [31], and EHG populations later mixed with Villabruna populations in the Mesolithic Balkans and particularly in Scandinavia and the Baltic region [14, 32], creating a specific \u0026lsquo;Baltic hunter-gatherer\u0026rsquo; ancestry (BHG).\u003c/p\u003e\n\u003cp\u003eThe most detailed k=12 model (Figures 4C-4F)\u0026nbsp;is likewise available for 56 countries. Similar to the k=7 model, it\u0026nbsp;separates Yamnaya as an independent genetic cluster and in addition, it replaces EHG with the above-mentioned BHG. The geographical distribution of BHG is most similar to EHG\u003csup\u003ek=11\u003c/sup\u003e (\u003cem\u003er\u0026nbsp;\u003c/em\u003e= 0.95, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001), confirming that EHG\u003csup\u003ek=11\u003c/sup\u003e does not represent \u0026lsquo;pure\u0026rsquo; EHG and includes a significant proportion of Villabruna ancestry. Given that the frequency of BHG is higher than the sum of EHG\u003csup\u003ek=7\u003c/sup\u003e and Villabruna\u003csup\u003ek=7\u003c/sup\u003e, BHG may also include some proportion of WEF Villabruna, a component that is now singled out in Western Europe and whose frequencies appear to be underestimated, being non-zero in only seven countries. In both models k=11 and k=12, we find practically the same absolute frequencies of the Natufian and Taforalt components (\u003cem\u003er\u0026nbsp;\u003c/em\u003e= 1.00, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001), and the Iranian Neolithic component (\u003cem\u003er\u003c/em\u003e = 0.99, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001). 
Furthermore, we can also find a very high mutual concordance regarding Anatolian Neolithic ancestry in all three models in Europe, although the absolute frequency of Anatolian Neolithic\u003csup\u003ek =12\u003c/sup\u003e is ~8% lower.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cem\u003e\u003cstrong\u003e3.3 Relationships between ancestry components and Y haplogroups\u003c/strong\u003e\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eIn general, the proportion of Anatolian Neolithic, Iranian Neolithic, Natufian, and Taforalt components is the most consistent across all models and the meaningfulness of these data can be further demonstrated by the impressive relationships of Natufian and Taforalt with their ancestry-specific Y haplogroups (Figures 5B-5C). Besides J1, Natufian is also significantly associated with E1b-M123 and T, and all these three Y haplogroups can be designated as typically \u0026lsquo;Arabic\u0026rsquo;. On the other hand, the connection between Anatolian Neolithic and its original paternal signature Y hg G (especially G2a2) [33] is completely diluted in the Near East. A relatively strong relationship is retained only in Europe, but even here, Anatolian Neolithic is more strongly correlated with Y hg J2 (Supplementary Figures 19A-19D).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eCHG ancestry and Iranian Neolithic ancestry share a relatively recent common origin [34] and both were originally accompanied mainly by Y hgs J2 and L [35-36]. In today\u0026rsquo;s Near East and North Africa, the situation is very different and CHG is mainly correlated with Y hg G2 in the Caucasus area, whereas Iranian Neolithic shows the only noteworthy connection with Yamnaya-associated R1a (Supplementary Figures 20A-20B). 
Since Y hg G has low diversity in the Caucasus and is not supposed to be autochthonous [37], we must assume a strong founder effect and extensive replacement of local CHG-associated paternal lineages caused by the intrusion of populations with Anatolian Neolithic ancestry. This process may not have included only Y hg G but also other Y haplogroups, especially I and J2 (Supplementary Figures 20C-20D), as indicated by their tight clustering in Figure 4E.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe results regarding EHG and Villabruna (and their Holocene derivatives BHG and Yamnaya) are the least coherent, which stems from their long historical interconnection.\u0026nbsp;The Villabruna component was typical of Mesolithic Europe west of the Carpathians\u0026nbsp;and its original Y haplogroups were I and R1b-V88 [24-25, 38]. These lineages also prevailed in the mixed EHG \u0026amp; Villabruna (BHG) populations between the Balkans and Northern Europe during the Mesolithic. At present, the highest frequency of BHG-derived Villabruna (Villabruna\u003csup\u003ek=7\u003c/sup\u003e) can be found in the Baltic region (Latvia and Lithuania), but its original Y haplogroups were already replaced by\u0026nbsp;R1a during the Late Eneolithic (Corded Ware) period and by the Uralic lineage N during the Iron Age [33, 39]. Consequently, Villabruna\u003csup\u003ek=7\u003c/sup\u003e is currently associated mainly with N and R1a, and the combination of these two Y haplogroups is significantly complementary in this regard (\u003cem\u003er\u0026nbsp;\u003c/em\u003e= 0.90, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001) (Supplementary Figure 21A). This means that Y hg N is practically interchangeable with R1a and represents the same autosomal ancestry. The mismatch between N \u0026amp; R1a and Villabruna ancestry later spread with Slavic-speaking populations to Central Europe (Figure 5D). 
However, there are two other populations with a notable proportion of Villabruna ancestry that retained Y hg I and experienced extensive geographic expansion. One is typical of Scandinavia (especially Sweden), with the predominance of Y hg I1, and the other can be found in the Western Balkans, with the overwhelming dominance of I2a-M423 (a subbranch of I2a-P37.2) [37].\u003c/p\u003e\n\u003cp\u003eThe population history of WEF Villabruna in Western Europe was very different: This ancestry component correlates positively with Anatolian Neolithic\u003csup\u003ek=7\u003c/sup\u003e (\u003cem\u003er\u003c/em\u003e = 0.46, \u003cem\u003ep\u0026nbsp;\u003c/em\u003e= 0.003), a West European Y haplogroup R1b-S116 (\u003cem\u003er\u0026nbsp;\u003c/em\u003e= 0.70, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001), and especially with I2a-M26 (\u003cem\u003er\u0026nbsp;\u003c/em\u003e= 0.79, \u003cem\u003ep\u0026nbsp;\u003c/em\u003e\u0026lt; 0.001 in 29 countries) (Supplementary Figures 21B-21D). I2a-M26 is a subbranch of I2a-P37.2, which introgressed into the gene pool of West European farmers and is most widespread in Sardinia (38.9%) [33, 40].\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe EHG component dominated east of the Carpathians during the Mesolithic and its main\u0026nbsp;Y haplogroups were Q, R1a, and R1b [15, 25, 41]. Almost all non-Yamnaya EHG ancestry in today\u0026rsquo;s Europe is descended from the mixed BHG population,\u0026nbsp;which can be illustrated by a very high geographical correlation between EHG\u003csup\u003ek=7\u003c/sup\u003e and Villabruna\u003csup\u003ek=7\u003c/sup\u003e (\u003cem\u003er\u0026nbsp;\u003c/em\u003e= 0.87, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001) (Supplementary Figure 21E). 
Consequently, EHG\u003csup\u003ek=7\u003c/sup\u003e shares many relationships with Villabruna\u003csup\u003ek=7\u003c/sup\u003e, including a very strong association with N \u0026amp; R1a (\u003cem\u003er\u003c/em\u003e = 0.94, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001) (Supplementary Figure 21F). The combination of EHG\u003csup\u003ek=7\u0026nbsp;\u003c/sup\u003eand Villabruna\u003csup\u003ek=7\u003c/sup\u003e components is, therefore, practically interchangeable with BHG\u003csup\u003ek=12\u003c/sup\u003e, although their sum is lower than the frequency of BHG\u003csup\u003ek=12\u003c/sup\u003e (Supplementary Figures 22A-22F).\u003c/p\u003e\n\u003cp\u003eDuring the Eneolithic period (5\u003csup\u003eth\u003c/sup\u003e-4\u003csup\u003eth\u003c/sup\u003e millennium BC), the mixture of EHG and CHG in the steppe gave rise to the Yamnaya component, whose paternal signatures were likewise Q, R1a, and R1b. However, in the k=7 model, Yamnaya correlates most strongly with Y hg I1 (\u003cem\u003er\u003c/em\u003e = 0.60, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001) (Supplementary Figure 23A). Although until recently, finds of Y hg I1 prior to the Nordic Bronze Age (1700-500 BC) were very rare and its history in Scandinavia was enigmatic, Posth et al. [32] documented this branch in a male from northern Germany (~3233 cal. BC), who was assigned to the local Funnelbeaker culture and had a Villabruna-like genetic profile. This shows that Yamnaya-associated Y hgs were largely replaced during admixture with the indigenous population of Scandinavia. In addition, Yamnaya was also part of the I2a-P37.2 expansion in the Balkans (Supplementary Figure 23B). As a whole, Yamnaya is almost perfectly linearly correlated with five \u0026lsquo;European\u0026rsquo; Y haplogroups (I, N, Q, R1a, R1b) (\u003cem\u003er\u003c/em\u003e = 0.96, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001) and with indigenous European components EHG and Villabruna (Supplementary Figures 23C-23E). 
On the other hand, it is strongly mutually exclusive with non-European Y haplogroups and non-European ancestry components (Supplementary Figure 23F).\u003c/p\u003e\n\u003cp\u003e\u003cem\u003e\u003cstrong\u003e3.3 Male height vs. Y haplogroups\u003c/strong\u003e\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eThe relationships between male height and Y haplogroups are displayed by factor analyses in Figures 4A-4F, in Table 1, and in great detail in Supplementary Figures 24-32. In Europe, these comparisons confirm previous findings [7] and identify Y hg I as the main predictor of tallness (\u003cem\u003er\u003c/em\u003e = 0.57, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001). At the same time, this is true not only in Europe but even in the entire sample of 60 countries (\u003cem\u003er\u003c/em\u003e = 0.76, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001) (Figure 6A). At present, Y hg I has two main frequency peaks in Sweden (46.3%, mostly I1) and in Bosnia and Herzegovina (55.3%, mostly I2a-P37.2 and its subbranch I2a-M423). The strength of the positive relationship slightly increases when Y hg I is combined with R1b-U106 (\u003cem\u003er\u003c/em\u003e = 0.80, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001) (Figure 6B), whose geographical distribution is more limited, with a frequency peak in the Netherlands (34.2%). A combination of five European Y haplogroups (I, N, Q, R1a, R1b) improves the correlation coefficient even further (\u003cem\u003er\u0026nbsp;\u003c/em\u003e= 0.83, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001) (Supplementary Figure 30A). Among ancestry-specific Y hgs, those associated with Germanic nations (I1, I2a-M223, R1b-U106) are the most noteworthy (\u003cem\u003er\u003c/em\u003e = 0.64, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001) (Supplementary Figure 30E). 
The factor analyses in Figures 4A-4C also help to identify a more subtle combination of six \u0026lsquo;height-related\u0026rsquo; Y haplogroups, which have the most specific relationship to male height in Europe: I1, I2a-P37.2, I2a-M223, N, Q, and R1b-U106 (\u003cem\u003er\u0026nbsp;\u003c/em\u003e= 0.72, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001) (Figure 6C). Nevertheless, the additive effect of I2a-M223 is negligible and ambiguous in other combinations.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eGiven that the relationships between Y haplogroups and height can be influenced by environmental factors, the correlation coefficients had to be adjusted (Table 2). In Europe, these potentially confounding variables consist of nutrition (protein supply) and three socio-economic factors with the most direct causal effect (child mortality, total fertility, inequality-adjusted Human Development Index). Interestingly, the partial correlations are often even stronger than the unadjusted ones: Y hg I increases its predictive power to \u003cem\u003er\u0026nbsp;\u003c/em\u003e= 0.67 (\u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001), I2a-P37.2 to \u003cem\u003er\u0026nbsp;\u003c/em\u003e= 0.64 (\u003cem\u003ep\u0026nbsp;\u003c/em\u003e\u0026lt; 0.001), and R1a to \u003cem\u003er\u0026nbsp;\u003c/em\u003e= 0.42 (\u003cem\u003ep\u003c/em\u003e = 0.011), reflecting the economic underdevelopment of Eastern Europe, which hinders the full expression of the genetic potential. In contrast, I1 loses much of its strength (to \u003cem\u003er\u0026nbsp;\u003c/em\u003e= 0.39, \u003cem\u003ep\u003c/em\u003e = 0.019) and R1b-U106 becomes an insignificant factor (to \u003cem\u003er\u0026nbsp;\u003c/em\u003e= 0.08, \u003cem\u003ep\u003c/em\u003e = 0.64), although its partial correlation is partly retained in the total sample (\u003cem\u003er\u003c/em\u003e = 0.27, \u003cem\u003ep\u003c/em\u003e = 0.049). 
This must be ascribed to the fact that both I1 (\u003cem\u003er\u003c/em\u003e = 0.65, \u003cem\u003ep\u0026nbsp;\u003c/em\u003e\u0026lt; 0.001) and R1b-U106 (\u003cem\u003er\u003c/em\u003e = 0.68, \u003cem\u003ep\u0026nbsp;\u003c/em\u003e\u0026lt; 0.001) are most strongly associated with high protein quality in Europe (the ratio of protein from dairy \u0026amp; pork to protein from wheat). Also noteworthy are the amplified negative tendencies of R1b-S116 (\u003cem\u003er\u003c/em\u003e = -0.39, \u003cem\u003ep\u003c/em\u003e = 0.021) and partly even I2a-M223 (\u003cem\u003er\u003c/em\u003e = -0.28, \u003cem\u003ep\u003c/em\u003e = 0.11), suggesting that their role is influenced by higher living standards in Western European countries.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe most parsimonious combination of Y haplogroups after adjustment is I1 \u0026amp; I2a-P37.2 (\u003cem\u003er\u0026nbsp;\u003c/em\u003e= 0.70, \u003cem\u003ep\u0026nbsp;\u003c/em\u003e\u0026lt; 0.001) and this result improves only slightly (to \u003cem\u003er\u003c/em\u003e = 0.72, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001) with five Y hgs (I1, I2a-P37.2, Q, R1a, R1b-U106) (Figure 6D), not to mention that it is practically the same as the combination of merely four lineages (I1, I2a-P37.2, R1a, R1b-U106). The six \u0026lsquo;height-related\u0026rsquo; Y hgs partly lose their importance (\u003cem\u003er\u003c/em\u003e = 0.66, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001). Although Y hg N does not contribute to these relationships, its outlier frequency in Finland (Figure 6C) indicates that its phenotypic effect in the Finnish population may be highly specific. This would not be surprising given Finland\u0026rsquo;s isolated genetic history [42]. 
After excluding Finland, the combination of I1, I2a-P37.2, N, Q, and R1b-U106 gives the highest partial correlation in 38 European countries (\u003cem\u003er\u003c/em\u003e = 0.81, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001), but I1, I2a-P37.2, N, and R1b-U106 reach nearly the same value (\u003cem\u003er\u003c/em\u003e = 0.80, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001) (Supplementary Figures 31A-31B). Although N and R1a are otherwise mutually complementary, adding R1a to these combinations decreases the partial \u003cem\u003er\u003c/em\u003e-value because it leads to overestimating male height in the Baltic region and Eastern Europe as a whole (Supplementary Figures 31C-31D).\u003c/p\u003e\n\u003cp\u003eIn contrast with European Y haplogroups, non-European Y haplogroups (E, G, J, L, T) correlate strongly negatively with height in the total sample (\u003cem\u003er\u003c/em\u003e = -0.82, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001), which can be ascribed mainly to Y hg J1 (\u003cem\u003er\u003c/em\u003e = -0.80, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001). This negative relationship reaches a maximum when J1 is combined with E1b-M123 and T (\u003cem\u003er\u003c/em\u003e = -0.84, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001) (Figure 6E). As already mentioned, these three Y haplogroups are typical of the Arabian Peninsula and reach the highest frequency in Yemen (78.4%). Interestingly, we can also observe that Y haplogroups J2 and L, which are widespread in the Near East and correlate negatively with male height in Europe (\u003cem\u003er\u003c/em\u003e = -0.57, \u003cem\u003ep\u0026nbsp;\u003c/em\u003e\u0026lt; 0.001), are strongly associated with tall statures in the Near East (\u003cem\u003er\u0026nbsp;\u003c/em\u003e= 0.77, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001). This value increases slightly further, to \u003cem\u003er\u003c/em\u003e = 0.78, when J2 \u0026amp; L are combined with E1b-M78 or G2a* \u0026amp; G2a2. 
\u0026nbsp;After adjusting for variables specific to the Near East, the combination of E1b-M78, J2, and L reaches a very high partial correlation of \u003cem\u003er\u0026nbsp;\u003c/em\u003e= 0.92 (\u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001), although this comparison does not include Bahrain and Qatar.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eWhen similar adjustments are performed in the total sample (in 58 countries, again without Bahrain and Qatar), the importance of European Y haplogroups markedly decreases (to \u003cem\u003er\u003c/em\u003e = 0.50, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.005), confirming the expected influence of better living conditions in Europe. As already mentioned, this confounding effect is most evident in the Western European lineage R1b-S116, whose relationship to height is largely reversed. The role of Y hg I and various Y haplogroup combinations also decreases but still remains highly significant, and the highest partial correlation can be found in four Y haplogroups: I1, I2a-P37.2, Q, R1b-U106 (\u003cem\u003er\u003c/em\u003e = 0.66, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001) (Figure 6F). After the exclusion of Finland, the best result is again achieved with the same five Y haplogroups as in Europe (I1, I2a-P37.2, N, Q, R1b-U106) (\u003cem\u003er\u003c/em\u003e = 0.72, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001). The negative relationship of the three \u0026lsquo;Arabic\u0026rsquo; Y hgs E1b-M123, J1, and T is also retained (\u003cem\u003er\u0026nbsp;\u003c/em\u003e= -0.62, \u003cem\u003ep\u0026nbsp;\u003c/em\u003e\u0026lt; 0.001), but the drop in the \u003cem\u003er\u003c/em\u003e-value is far greater, suggesting that the association between height and these markers may be more strongly distorted by the lower quality of life.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cem\u003e\u003cstrong\u003e3.4 Male height vs. 
ancestry components\u003c/strong\u003e\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eTable 3 shows the relationship between male height and ancestry components according to all three models tested. This comparison identifies two ancestries that are consistently associated with tallness: Villabruna and Yamnaya. A significant positive relationship can be found even with EHG\u003csup\u003ek=11\u003c/sup\u003e (a component in Yamnaya) and BHG\u003csup\u003ek=12\u003c/sup\u003e (a mixed EHG-Villabruna ancestry) (Figures 7A-7D). In all comparisons, Yamnaya or EHG\u003csup\u003ek=11\u003c/sup\u003e appear to be stronger predictors of height than Villabruna or BHG. However, the role of non-Yamnaya EHG\u003csup\u003ek=7\u003c/sup\u003e, which does not include the Villabruna component, is non-significant. Furthermore, in the k=7 model after adjusting, the difference between Villabruna and Yamnaya in Europe nearly disappears (\u003cem\u003er\u003c/em\u003e = 0.58 vs. \u003cem\u003er\u0026nbsp;\u003c/em\u003e= 0.61) (Table 4), and it even reverses (\u003cem\u003er\u003c/em\u003e = 0.57 vs. \u003cem\u003er\u003c/em\u003e = 0.55), when only three elementary factors (nutrition, child mortality, total fertility) are used as potential confounders. Disregarding the k=11 model (in which EHG includes a large proportion of Villabruna), BHG and Villabruna also correlate more strongly in the more developed Western Europe, where we generally observe highly linear relationships approaching \u003cem\u003er\u0026nbsp;\u003c/em\u003e= 0.90 (Table 5). In contrast, correlations between height and ancestry in Eastern Europe are much weaker, suggesting a greater role of the environment. 
This is reminiscent of the observation made in the case of Y hgs I2a-P37.2 and R1a.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThese data suggest that the genetic potential of the Villabruna cluster, which is more represented in Eastern Europe, may actually be higher than that of Yamnaya.\u0026nbsp;This is further supported by the fact that Villabruna\u003csup\u003ek=7\u0026nbsp;\u003c/sup\u003ecorrelates more strongly with key height-related, adjusted combinations of Y haplogroups (Supplementary Table 2). In fact, the deficit of Villabruna ancestry in Western European populations descended from the Bell Beaker culture (Y hg R1b-S116) can explain why they are ~3 cm shorter than other Europeans with a similar proportion of Yamnaya ancestry. This can be illustrated by the example of Ireland on the one hand, and Sweden on the other hand (Figures 7A-7B).\u0026nbsp;The EHG\u003csup\u003ek=7\u003c/sup\u003e component likewise gains in importance after adjusting (\u003cem\u003er\u003c/em\u003e = 0.49, \u003cem\u003ep\u003c/em\u003e = 0.003), but does not combine well with Villabruna or Yamnaya, indicating its secondary role (Supplementary Figures 33A-33B).\u003c/p\u003e\n\u003cp\u003eFigures 7E-7F and Supplementary Figures 33-34 illustrate that height in Europe is in an inverse relationship with all five Near Eastern ancestry components and this remains true even after controlling for the environment (Table 4). In the Near East and North Africa, Anatolian Neolithic predicts tall statures and similar tendencies can be observed in the CHG component. On the other hand, Natufian has a negative role. 
These relationships (albeit weaker) are likewise retained after adjustments.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe situation in the total sample changes more: After controlling for environmental factors, negative partial correlations in the k=12 model can be found in Natufian (\u003cem\u003er\u003c/em\u003e = -0.50, \u003cem\u003ep\u0026nbsp;\u003c/em\u003e\u0026lt; 0.001), WEF Villabruna (\u003cem\u003er\u003c/em\u003e = -0.36, \u003cem\u003ep\u003c/em\u003e = 0.009), and partly in Iranian Neolithic (\u003cem\u003er\u003c/em\u003e = -0.27, \u003cem\u003ep\u003c/em\u003e = 0.057) and Anatolian Neolithic (\u003cem\u003er\u003c/em\u003e = -0.27, \u003cem\u003ep\u003c/em\u003e = 0.055). The \u003cem\u003er\u003c/em\u003e-values in Anatolian Neolithic radically change from positive to negative, which must be ascribed to the high frequencies of this cluster in affluent Western European countries. The fact that WEF Villabruna (Supplementary Figure 34E) decreases a positive correlation coefficient when combined with BHG in Europe suggests that it carries predispositions for short height that were typical of Anatolian Neolithic.\u003c/p\u003e\n\u003cp\u003eA noteworthy anomaly that can be observed in virtually all graphic comparisons in Europe is the eccentric trend in the Western Balkan area of the Dinaric Alps (former Yugoslavia and Albania). Here, we find the tallest statures in the world and the highest frequencies of Villabruna-associated Y hg I, yet Villabruna and Yamnaya ancestry are insufficient to explain this phenomenon and local heights appear to increase with the proportion of Anatolian Neolithic and CHG ancestry (cf. Figure 7E). This is in striking contrast to the situation in other regions of Europe. Inevitably, correlation coefficients in Europe profoundly increase when Western Balkan countries are excluded (Table 5). 
The same anomalous tendency can be seen in Supplementary Figure 30A, where Montenegro is the tallest European country, despite a relative deficit of European Y haplogroups (I, N, Q, R1a, R1b). A lineage that improves the position of Montenegro in this graph and significantly increases the correlation coefficient in Europe (from \u003cem\u003er\u003c/em\u003e = 0.61 to \u003cem\u003er\u003c/em\u003e = 0.68, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001) is E1b-M78. Another slight increase occurs after the inclusion of J2 (\u003cem\u003er\u0026nbsp;\u003c/em\u003e= 0.70, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001). The specific role of E1b-M78 and J2 in the Western Balkans can also be seen in Supplementary Figures 32A-32B, which compare non-European Y haplogroups.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThese findings raise interesting questions regarding the well-known\u0026nbsp;Holocene founder effect of\u0026nbsp;E1b-M78 in the Balkans, which\u0026nbsp;included the subbranch\u0026nbsp;E1b-V13 [43-44].\u0026nbsp;The frequency of E1b-V13 currently reaches its maximum in the Ghegs from North Albania (37.8%) [44]\u0026nbsp;and Kosovo (up to ~44%) [45]. Interestingly, Cruciani et al. [43] demonstrated a high geographical correlation between E1b-V13 and the subbranch J2b in Europe and their similar times to the most recent common ancestor (TMRCA) ~2700-2000 BC, supporting a long shared history. This observation can be confirmed even in the present study because E1b-M78 and J2 mutually overlap both in the Balkans (\u003cem\u003er\u003c/em\u003e = 0.75, \u003cem\u003ep\u003c/em\u003e = 0.008) and in the seven countries of the Dinaric Alps (\u003cem\u003er\u003c/em\u003e = 0.84, \u003cem\u003ep\u003c/em\u003e = 0.017).\u003c/p\u003e\n\u003cp\u003e\u003cem\u003e\u003cstrong\u003e3.5 Lactose tolerance vs. 
Y haplogroups\u003c/strong\u003e\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eLactose tolerance-associated alleles included in this study consist of 13910*T, 13915*G, 14009*G, 13907*G, and 14010*C, but only 13910*T and 13915*G were represented in high frequencies. The typically European allele 13910*T was available for 35 out of 39 European countries and for 52 countries from the total sample (Figure 8A). It has two main frequency peaks in Ireland (86.6%) and Iceland (85.3%) but is also widespread in Scandinavia (~75%) and in the United Kingdom (74.5%). Table 6, Supplementary Table 9, and Figure 9A illustrate that the distribution of 13910*T in Europe is most closely tied to three \u0026lsquo;Germanic\u0026rsquo; Y haplogroups I1, I2a-M223, and R1b-U106, whose combination is complementary (\u003cem\u003er\u003c/em\u003e = 0.79, \u003cem\u003ep\u0026nbsp;\u003c/em\u003e\u0026lt; 0.001) and is clearly responsible for the elevated 13910*T frequency in Eastern Europe. Although the presence of 13910*T in Eastern Europe can also be linked to N and R1a, their role is considerably weaker. Besides the Germanic Y hgs, the most important lineage is obviously R1b-S116 in Western Europe. Six Y haplogroups (I1, I2a-M223, Q, R1a, R1b-S116, R1b-U106) are the strongest correlates of 13910*T both in Europe (\u003cem\u003er\u003c/em\u003e = 0.89, \u003cem\u003ep\u0026nbsp;\u003c/em\u003e\u0026lt; 0.001) and in the entire sample of 52 countries (\u003cem\u003er\u0026nbsp;\u003c/em\u003e= 0.94, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001) (Figure 10A). Although adding Y hg N improves the outlier position of Finland, it overestimates 13910*T frequency in other Baltic countries (Supplementary Figures 35A-35F). 
As expected, all non-European Y hgs have a negative relationship with 13910*T in Europe and this applies especially to E1b-M78 \u0026amp; J2 (\u003cem\u003er\u003c/em\u003e = -0.78, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001), which are concentrated in the Balkans.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eIn the Near East and North Africa, the situation is much simpler. 13910*T occurs at low frequencies, although a notable exception is Northwestern Africa (~10-20% 13910*T). The dominant lactose tolerance-associated allele is 13915*G, whose frequencies are available for 30 countries (13 in the Near East and 16 in the Near East \u0026amp; North Africa). Its occurrence reaches a peak in Saudi Arabia (58.7%) and Yemen (54.9%), but outside the Arabian Peninsula, it abruptly decreases to 3.7% in Egypt and 2.9% in Syria (Figure 8B). The only Y haplogroup correlating consistently with 13915*G is J1 (\u003cem\u003er\u003c/em\u003e = 0.87, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001 in the Near East and \u003cem\u003er\u0026nbsp;\u003c/em\u003e= 0.89, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001 in the complete sample of 30 countries), which is graphically demonstrated in Figure 9B. In the Near East \u0026amp; North Africa, E1b-M123 also gains importance, and Y hg T shows a moderately positive role in the whole sample (Table 6, Supplementary Table 10). Nevertheless, E1b-M123 and T have only a slight additive effect when combined with J1, showing that J1 is clearly the most important factor (Figure 10B, Supplementary Figures 36A-36D). Figure 10B also shows that 13915*G starts to increase in a population only when the cumulative frequency of E1b-M123, J1, and T exceeds 30%. This suggests that not all subbranches of these Y haplogroups are associated with 13915*G and that a higher resolution would be needed. 
In any case, 13915*G is not positively correlated with any other Y haplogroup, confirming that its present distribution is closely related to the expansion of pastoral populations from the Arabian Peninsula.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cem\u003e\u003cstrong\u003e3.6 Lactose tolerance vs. ancestry components\u003c/strong\u003e\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eIn contrast with the diverse spectrum of Y haplogroups connected with 13910*T, there is only one ancestry correlating consistently positively with this allele in Europe: Yamnaya (\u003cem\u003er\u003c/em\u003e = 0.83, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001 in the k=7 model) (Table 7, Figure 11A). However, Villabruna\u003csup\u003ek=11\u003c/sup\u003e and BHG\u003csup\u003ek=12\u0026nbsp;\u003c/sup\u003eare also significantly correlated, and Villabruna\u003csup\u003ek=7\u003c/sup\u003e reaches significance when Europe is divided into a western and eastern half (Figure 11B, Supplementary Figures 37-38). These relationships reflect the involvement of Villabruna-associated lineages I1 and I2a-M223, as well as the weaker 13910*T selection in Eastern Europe (mirrored by the less significant role of Y hgs N and R1a) and the weaker relationship of 13910*T with WEF Villabruna ancestry in Western Europe (which consequently weakens the relationship between 13910*T and Villabruna\u003csup\u003ek=11\u003c/sup\u003e in Western Europe and brings it closer to the Eastern European correlation line). 
\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe Near Eastern allele 13915*G shows the only significant connection with Natufian ancestry, but the correlations are even more linear than those observed in the three \u0026lsquo;Arabic\u0026rsquo; Y haplogroups: In both k=11 and k=12 models, they reach \u003cem\u003er\u0026nbsp;\u003c/em\u003e= 0.91 (\u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001) in the total sample of 27 countries, \u003cem\u003er\u003c/em\u003e = 0.93 (\u003cem\u003ep\u0026nbsp;\u003c/em\u003e\u0026lt; 0.001) in 13 countries from the Near East \u0026amp; North Africa, and \u003cem\u003er\u0026nbsp;\u003c/em\u003e= 0.96 (\u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001) in 10 countries from the Near East (Supplementary Figures 39A-39B). Still, even here, we can see that the selection for 13915*G postdates the spread of the Natufian component, as Levantine countries (Lebanon, Syria) have nearly zero 13915*G levels despite 20-25% Natufian ancestry.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003e\u003cstrong\u003e3.7 Male height vs. lactose tolerance\u0026nbsp;\u003c/strong\u003e\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003ePrevious studies [7-8] documented a paradoxical relationship of lactose tolerance with male height, which is positive in Europe but negative in the Near East. The present study unequivocally confirms these findings (Figures 12A-12B). In Europe, this positive relationship can be explained by the association between 13910*T and the Yamnaya component because Eastern European countries (and particularly those from the Western Balkans) deviate from the correlation line. The negative correlation between 13915*G and male height is likewise easy to explain due to the strong multicollinearity of 13915*G with the \u0026lsquo;Arabic\u0026rsquo; Y haplogroups and Natufian ancestry, which consistently predict the shortest heights.\u003c/p\u003e\u003c/div\u003e"},{"header":"4. 
Discussion","content":"\u003cp\u003eThe data in this study are the result of an intensive collection of information from available literature and internet databases, which, unfortunately, are still not perfect in their representativeness. For this reason, they must be considered only provisional, especially regarding the occurrence of rare Y chromosome subbranches in some regions (e.g., R1b-S116 and R1b-U106 in the Near East and North Africa), for which often only lower-quality sources were available. However, care was taken to ensure that the frequency of the 12 major Y haplogroups was similar across all studies used, which can explain why the correlations between Y haplogroups and other variables are extremely strong (often approaching \u003cem\u003er\u0026nbsp;\u003c/em\u003e= 1.00). The strength of these correlations indirectly suggests that, despite the above-mentioned limitations, the data are more than sufficiently accurate for the study\u0026rsquo;s purpose.\u003c/p\u003e\n\u003cp\u003eAlthough the statistical strength of many findings is extraordinary, it does not by itself prove causal relationships. Similar problems are faced by GWAS, which are essentially based on a similar methodology, i.e., correlations between certain alleles and physical traits in individuals (which, moreover, may not apply to all regions). Extrapolating the current occurrence of genetic factors into the distant past also has its limitations and risks, as evidenced, for example, by the original misidentification of Y hg R1b with the Upper Paleolithic legacy in Western Europe [46]. Nevertheless, the advantage of the present study is the fact that most of its results can be placed in the context of already available knowledge, have a meaningful rationale, and mutually support each other. 
Examining genotype-phenotype relationships at the country level also makes it possible to overcome the fundamental barrier of individual-level GWAS, whose results cannot be reliably applied to different populations.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003e\u003cstrong\u003e4.1 Tallness in Europe can be traced to the heritage of Villabruna and Yamnaya ancestry\u0026nbsp;\u003c/strong\u003e\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eY haplogroups are signatures of male founder effects and their strong (sometimes perfectly linear) correlation with ancestry components indicates that such founder effects played a key role in shaping population history. The present work markedly refines the findings of previous studies [7-8] and corrects the effect of all Y haplogroups for environmental factors. Although this adjustment cannot be perfect, its results are in agreement with expectations that take into account nutritional and socio-economic factors.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eFirst of all, we can see that Y hgs I1 (in Scandinavia) and I2a-P37.2 (in the Western Balkans) are by far the most important predictors of tallness and that the genetic potential in the Western Balkans is still not fully expressed due to suboptimal living conditions. Y haplogroup R1b-U106 has a much weaker effect than the unadjusted correlations indicate because its occurrence peaks in the Netherlands, where protein quality has been the highest over recent decades. Despite that, R1b-U106 has a modest additive role in adjusted combinations, especially in the total sample. Y haplogroup Q has a generally low frequency across the 60 countries examined and its positive contribution is likewise small, but remains significant even after adjustment. 
Y haplogroup N has a very special position: It decreases correlation coefficients because it leads to a marked overestimation of male height in Finland, but after excluding Finland, it profoundly improves the strength of all Y haplogroup combinations. It is unlikely that the Y hg N frequency in Finland (64.2%) is erroneous because it is based on a very large sample (n = 4375) and similar results (~60% Y hg N) are reported by other studies. Therefore, the phenotype reflected by Y hg N in Finland is probably different from that in other Baltic countries, which is also mirrored in correlations with the 13910*T allele. Y haplogroup R1a is practically interchangeable with Y hg N in its relationship to male height and improves partial correlations in some adjusted combinations. However, its combination with Y hg N is not productive because it leads to the overestimation of height in Eastern European countries. Consequently, the use of five Y haplogroups (I1, I2a-P37.2, N, Q, R1b-U106) appears to be the most rational for predicting tallness in the countries studied, although it cannot include Finland. These five lineages also appear in an ideal multiple regression model of male height, which is based on all Y haplogroups with a presumably causative effect (i.e., after adjusting for the environment), but does not include Finland. This model includes seven Y haplogroups (five correlating positively and two correlating negatively) and explains 85.53% of the variance in 59 countries (Figures 12A-12B, Supplementary Tables 15-16).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eAll six height-associated Y haplogroups discussed above were originally connected with either Villabruna/BHG ancestry (I1, I2a-P37.2), or Yamnaya ancestry (Q, R1a, R1b-U106), or started to reflect BHG ancestry due to recent genetic processes (N). The comparison of ancestry components confirms these results and identifies BHG, Villabruna, and Yamnaya as the most important autosomal clusters predicting tallness. 
According to Mathieson et al. [15], Yamnaya is a stronger factor than Villabruna (WHG), whereas Berg et al. [16] attributed greater importance to Villabruna. These authors later retracted their findings due to the possible environmental confounding and ethnic specificity of the height-associated SNPs [10-11]. Based on data in the present study, Yamnaya\u003csup\u003ek=7\u003c/sup\u003e shows a stronger association with height than Villabruna\u003csup\u003ek=7\u003c/sup\u003e, but this finding is influenced by the socio-economic underdevelopment of Eastern European countries, where Villabruna occurs in the highest frequencies. After adjusting for environmental factors, the difference between these two components disappears and Villabruna emerges as visibly more important in Western Europe, despite small population frequencies. The three major sources of expansion of Villabruna ancestry (Figure 12C) also correspond to the three major regional peaks of male height in Europe. Nevertheless, even the best prediction model based on ancestry components (Figure 12D) cannot match prediction models based on Y haplogroups. This is not only due to the lower \u0026lsquo;resolution\u0026rsquo; of ancestry components but also due to the specific roots of the exceptional tallness in the Western Balkans, which deserves a more detailed discussion.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cem\u003e\u003cstrong\u003e4.2 The complex origin of the Dinaric phenomenon\u0026nbsp;\u003c/strong\u003e\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eThe frequency of Y hg I reaches the highest values in the world in the mountainous region of the Dinaric Alps, stretching from Slovenia to northern Albania. The peak of its frequency is in Herzegovina (73.3% in Croats from Herzegovina) and most of this high proportion belongs to I2a-P37.2 (71.1%) [47]. 
Virtually all I2a-P37.2 in the Dinaric Alps is represented by its subbranch I2a-M423 (I2a1a2) [48-49], which was widespread in Mesolithic and Neolithic Europe [33]. According to the YFull database (https://www.yfull.com/), I2a-M423 includes several notable relict lineages in the British Isles but is otherwise concentrated in Eastern Europe. The typically East European subbranches of I2a-M423 share a common root in I2a-S9952 (I2a1a2b1a1a), whose estimated TMRCA is ~1400 BC. \u0026nbsp;The area of the Dinaric Alps is characterized by the strong founder effect of I2a-PH908 (I2a1a2b1a1a1c), a subbranch with a TMRCA in ~300 AD. Its frequencies may be approximately 33% in Bosnia and Herzegovina, 28% in Montenegro, 27% in Serbia, and 26% in Croatia.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe work of our research team [50] identified a geographical nucleus with extraordinarily tall statures (\u0026ge;184 cm), which includes Dalmatia, Herzegovina, and the northwestern parts of Montenegro (Figure 13), and agrees with the regional distribution of Y hg I [45, 47, 49]. However, the Western Balkan area is characterized by several peculiarities: First, the ratio between Y hg I and Villabruna\u003csup\u003ek=7\u003c/sup\u003e or BHG\u003csup\u003ek=12\u003c/sup\u003e ancestry in the Western Balkans is higher than in Scandinavia, indicating a stronger male founder effect over a genetically alien substrate. Second, these genetic factors are surprisingly little represented in Montenegrins (38.2% I; 3.4% Villabruna\u003csup\u003ek=7\u003c/sup\u003e; 10.5% BHG\u003csup\u003ek=12\u003c/sup\u003e) and especially in Kosovar Albanians (8.0% I; 1.2% Villabruna\u003csup\u003ek=7\u003c/sup\u003e), despite the fact that Montenegrins are the tallest in the world (with 182.9 cm in men) and the height of Kosovar Albanian men (179.5 cm) [51] surpasses most Western Europeans with much higher living standards. 
Male height in the once heavily isolated Albania is also rising rapidly due to improved nutrition and is already approaching 177 cm in some coastal regions. Therefore, in the future, we can expect an even greater statistical deviation of the Western Balkan region within Europe.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe graphical comparisons in the present study allow us to put forward the hypothesis that the roots of this paradox lie in a specific genetic component that can be traced to a male founder effect in Albanians. In contrast to the rest of Europe, this height-associated component is based on Anatolian Neolithic and CHG ancestry, and on the Near Eastern lineages E1b-M78 (E1b-V13) and J2 (J2b). Although it could be assumed that these predispositions for tallness spread across the Dinaric Alps due to historical inter-group contacts, it is striking that each of the ancestry components associated with I2a-P37.2 (Villabruna\u003csup\u003ek=7\u003c/sup\u003e, EHG\u003csup\u003ek=7\u003c/sup\u003e, Yamnaya\u003csup\u003ek=7\u003c/sup\u003e) is completely mutually exclusive with Anatolian Neolithic \u0026amp; CHG ancestry\u003csup\u003ek=7\u003c/sup\u003e in the Balkans (\u003cem\u003er\u003c/em\u003e = -1.00, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001) (Supplementary Figure 23F).\u003c/p\u003e\n\u003cp\u003eThe history of I2a-P37.2 in the Dinaric Alps itself remains unclear. At the dawn of population genetics, it was assumed that its high frequency was a relic of the Late Upper Paleolithic Epigravettian culture, which occupied the Adriatic glacial refugium in Italy and the Western Balkans, and was the source of the Villabruna cluster [46].
Indeed, one possibly Mesolithic sample of Y hg I2 with a Villabruna autosomal profile was found in the Vrbička Cave in Montenegro [52].\u0026nbsp;However, based on the internal STR diversity of I2a-P37.2, it was hypothesized that its current distribution is the result of a much more recent expansion from the area east of the Carpathians ~1000 BC\u0026nbsp;[53-54].\u003c/p\u003e\n\u003cp\u003eSolving this problem is fundamentally complicated by the poor preservation of skeletal remains in the limestone bedrock of the Dinaric Alps, as well as by the general scarcity of post-Mesolithic I2a-P37.2 samples in Eastern Europe, which have not been differentiated beyond the level of I2a-S9952 (I2a1a2b1a1a). The key subbranch I2a-M423 starts to appear systematically only at the beginning of the Bronze Age (2500-2000 BC) in Bulgaria, Romania, and Serbia, together with a notable reemergence of Villabruna ancestry [52, 55]. So far, the best explored region of the Dinaric Alps is the Croatian part of the Adriatic coast, where the arrival of I2a-M423 appears to be surprisingly recent and was preceded by groups related to modern Albanians, with a clear dominance of J2b. The two oldest cases of I2a-M423 in the Adriatic area are known from the Bezdanjača Cave (Lika-Senj county) and were indirectly dated to ~1150 cal. BC [33, 52]. The only available paleogenetic sample from Bosnia and Herzegovina (Klakar in northern Bosnia, ~1500 cal. BC) was identified as I2a-M223, and the oldest post-Mesolithic samples from Montenegro come from Velika Gruda in the coastal part of the country (~1350-1140 cal. BC). As in coastal Croatia, the males from Velika Gruda belonged almost exclusively to J2/J2b, except for a single case of Y hg I, which had an outlier autosomal profile with a higher proportion of Villabruna ancestry [52].\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe newest paper addressing this topic [56] analyzed 161 samples from the Balkans covering the period 1-1500 AD.
Out of 78 male specimens older than 1000 AD, 15 were assigned to E1b-V13, but only three (from northern and southeastern Serbia) were assigned to I2a-P37.2 (I2a1a2b1a/I2a1a2b1a1a), and these were dated to 800-1000 AD. Based on the high proportion of Slavic ancestry in these individuals (\u0026gt;50%), the authors assume that this finding supports the recent, Slavic origin of I2a-P37.2 in the Balkans. However, similar to previous studies, this study did not test a single sample from Bosnia \u0026amp; Herzegovina or Montenegro, and only a few from the peripheral areas of the Dinaric Alps. The crucial problem is to explain how the founder effect of this presumably Slavic Y haplogroup produced a phenotype that is very different from that of contemporary Slavic peoples in Eastern Europe \u0026ndash; from much taller height through brachycephalic cranial morphology [57] to noticeably darker pigmentation of hair and eyes in Herzegovina than in Bosnia (Grasgruber et al. \u0026ndash; unpublished data). Although the present paper cannot illuminate the exact origin of I2a-P37.2, it can at least indicate its post-Yamnaya expansion from Bosnia and Herzegovina associated with the spread of EHG, Villabruna, and Yamnaya ancestry in the Balkans.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003e\u003cstrong\u003e4.3 Concordance with the existing literature\u003c/strong\u003e\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eGiven the problematic nature of polygenic height scores, some authors have recently resorted to comparing the distribution of ancestry components with selected physical traits at the individual level.\u0026nbsp;Because the methodology in the current paper is essentially similar (with the only fundamental difference being that it compares population averages and not individuals), the results should, in theory, be identical. One such example is the study by Marnetto et al.
[19],\u0026nbsp;who examined the relationship between four autosomal ancestries (Yamnaya, Villabruna/WHG, Anatolian Neolithic, Siberian) and multiple physical traits in a large sample from the Estonian Biobank. The presence of excess Yamnaya ancestry was documented at certain regions of the genome, which were identified as predictors of tallness by GWAS. Other ancestry components were not significant in this regard and were rather negatively associated. However, when the total (genome-wide) proportion of genetic ancestries in an individual was compared with measured physical characteristics, the relationships were consistent with findings in the present paper: Villabruna ancestry was clearly the strongest predictor of tallness, followed by Yamnaya. The Anatolian Neolithic and Siberian components had a negative effect. The authors consider this second approach to be more susceptible to environmental confounding because the distribution of ancestry components in Estonia differs by geography and may, therefore, be connected with geographical differences in environmental conditions. Nevertheless, based on the results of the present study, environmental relationships with the same ancestry components would have to exist at the European level as well.\u003c/p\u003e\n\u003cp\u003eA subsequent pre-print by the same authors [21], using the genomes of 50,000 European individuals from the UK Biobank, found a good agreement with the Estonian Biobank, as for the relationship between trait-associated genomic regions and three ancestry components. In the case of height, Yamnaya ancestry again predicted tallness, whereas Anatolian Neolithic ancestry was insignificant and Villabruna ancestry had a negative role. In contrast, the relationships of genome-wide ancestry proportions with selected physical traits differed between these biobanks and those regarding height showed no apparent association. Similar findings were also reported by another recent pre-print by Irving-Pease et al. 
[20], who used \u0026gt;400,000 genomes of British individuals from the UK Biobank and examined the presence of ancestry components at trait-associated genomic regions. This work is particularly relevant because the authors differentiated five ancestry components, which they had provided for the present study in the k=7 model. According to their data, above-average predictive power was attributed to Yamnaya, EHG, and CHG, whereas Anatolian Neolithic and especially Villabruna were deeply below-average in this regard.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eDiscrepancies with individual-level relationships based on genome-wide ancestry could be theoretically explained by environmental confounding, but the position of Villabruna ancestry at the bottom of relationships with height-associated loci clearly represents an insurmountable contradiction with the country-level findings and requires clarification. Since most of the donors in the UK Biobank come from the British Isles, it is inevitable that local \u0026apos;Villabruna\u0026apos; ancestry consists of \u0026ge;50% WEF Villabruna, which can be an important confounding element (see Figure 5A). However, these two components were separated in the k=7 ancestry model used by Irving-Pease et al. [20], which suggests that the crucial problem may lie in the method of how the height-associated loci were identified. More concretely, there may exist a strong bias in favor of Yamnaya ancestry and against Villabruna ancestry. This would not be surprising if the relevant GWAS were conducted on Western European populations with a high proportion of Yamnaya and a low proportion of Villabruna with the predominance of the WEF Villabruna fraction. 
Irrespective of the roots of these conflicting results, it is clear that studies of this sort need a sufficiently high resolution of ancestry components or a comparison of their effect across multiple regions, because phenotypic associations can be completely antagonistic even in seemingly identical genetic clusters. This can be demonstrated not only by the polarity between Villabruna and WEF Villabruna in Europe, but also by that between Anatolian Neolithic \u0026amp; CHG components in the Western Balkans and in the rest of Europe.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003e\u003cstrong\u003e4.4 The evolution of lactose tolerance\u003c/strong\u003e\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eVarious speculations have been put forward about the geographical origin\u0026nbsp;of alleles associated with lactose tolerance. Although the practice of dairying is documented in early Anatolian farmers during the 7\u003csup\u003eth\u003c/sup\u003e millennium BC and was widespread in Neolithic Europe, it was not accompanied by the occurrence of the 13910*T allele [58]. At the same time, the short statures of Central and Western European farmers (~161-163 cm in men) [59-60] indirectly suggest that the utilization of high-quality dairy proteins in the diet must have been very limited, so strong evolutionary selection towards lactose tolerance cannot be expected. The first documented \u0026ndash; albeit still questionable \u0026ndash; cases of 13910*T heterozygotes come from the Eneolithic sites of Varna in Bulgaria (~4610 BC) and Alexandria in Ukraine (~3650 BC), and the first verified carrier was an Eneolithic woman from Gura Baciului in Romania (~3440 BC) [58].\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe concentration of these samples in Southeastern Europe accords with recent research, which shows that the origin of 13910*T is connected with the Yamnaya culture from the East European steppe [14, 20].
Nevertheless, the evolution of this allele was long and predated the Yamnaya phase, possibly coinciding with the selection of alleles associated with metabolic adaptations to famine [20].\u0026nbsp;Somewhat surprisingly, samples of steppe populations from the Yamnaya culture (~3300-2600 BC) and the Catacomb culture (~2600-2000 BC), as well as the Corded Ware culture from Central Europe and the Baltic region (~2900-2300 BC) were characterized by a very low frequency of 13910*T (\u0026lt; 1%) [61]. More 13910*T-positive individuals start to emerge in the paleogenetic record during the era of the Bell Beaker and \u0026Uacute;nětice cultures in Central Europe (late 3\u003csup\u003erd\u003c/sup\u003e millennium BC) [58].\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe results of the present study do not contradict this scenario.\u0026nbsp;The high correlation between 13910*T and Yamnaya ancestry, especially in the k=7 model (\u003cem\u003er\u003c/em\u003e = 0.83, \u003cem\u003ep\u0026nbsp;\u003c/em\u003e\u0026lt; 0.001), confirms that Yamnaya was the original source of these evolutionary adaptations [62]. Irrespective of differences in 13910*T frequencies, all Eneolithic and Early Bronze Age populations stemming from the steppe were characterized by high height averages (~168-173 cm in men) [59-60, 63], which shows that in addition to superior genetic predispositions, they also enjoyed much better nutrition than Neolithic farmers, suggesting the existence of a \u0026lsquo;milk-drinking culture\u0026rsquo; [64]. 
The subsequent selection of the 13910*T allele in Europe depended on local evolutionary processes that also included populations of BHG origin and Villabruna/BHG-associated Y haplogroups (I1, I2a-M223).\u0026nbsp;The strongest selection of this kind apparently occurred in Germanic-speaking ethnicities (Y hgs I1, I2a-M223, R1b-U106) and in Celtic groups descending from the Bell Beaker culture (Y hg R1b-S116).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe Germanic lineages I2a-M223 and R1b-U106 reach a frequency peak in the Netherlands and strongly correlate with each other in Europe (\u003cem\u003er\u003c/em\u003e = 0.78, \u003cem\u003ep\u0026nbsp;\u003c/em\u003e\u0026lt; 0.001), indicating that they were part of the same population expansion. Both are also significantly correlated with the Scandinavian lineage I1, but this relationship is substantially weaker and essentially limited to Central Europe and Britain, suggesting that Germanic-speaking peoples arose from an amalgamation of two populations with different paternal origins (Supplementary Figures 40A-40C). On the other hand, the correlations between R1b-S116 and these three Y haplogroups tend to be negative in Western Europe, indicating different population histories (Supplementary Figures 41A-41C).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eIn general, we can see that the areas with the highest occurrence of 13910*T in Europe roughly correspond to areas with poor soil quality or poor recovery capacity (low soil resilience) (Figure 14A). In such an environment, milk could have been a ready source of nutrients in the event of a famine [58]. On the other hand, fermented dairy products in which the content of lactose is reduced (yoghurt) or almost completely eliminated (curd, cheese) require longer and more sophisticated preparation. 
The mutual interaction between the culture of regular milk drinking and the evolution of lactose tolerance can be illustrated by Figure 14B: The presence of 13910*T in a European population predicts the current consumption of dairy products, irrespective of their form. A notable exception is the mountainous region of the southwestern Balkans, where we observe very low soil quality, a long tradition of pastoralism, very high consumption of dairy products (even in the form of liquid milk), but a relatively low frequency of 13910*T.\u003c/p\u003e\n\u003cp\u003eThe selection for 13915*G in the Near East occurred independently of 13910*T, as a genetic adaptation to the consumption of camel\u0026rsquo;s milk. Here, too, the reasons are understandable because milk was a very valuable source of high-quality nutrients and the necessary fluids in the unfavorable environment of the desert [65]. Interestingly, this allele appears to be only ~4100 years old [66] and its evolution coincides with the aridification of the Arabian Peninsula [67]. The fact that lactose tolerance did not contribute to the selection for tall stature in Arab populations is somewhat counterintuitive but the evolutionary pressures associated with Bergmann\u0026apos;s ecological rule (adaptation to hot climates) and the universal scarcity of natural resources [68] may have worked in the opposite direction.\u003c/p\u003e"},{"header":"5. Conclusion","content":"\u003cp\u003eThe purpose of the current work was to shed light on the geographical relationships between genetic factors (Y haplogroups, ancestry components) on the one hand and physical characteristics (body height, lactose tolerance) on the other. This comparison yields a large number of impressive results from which meaningful causal relationships can be inferred. Others offer room for hypotheses that can be confirmed by more sophisticated research methods. 
Due to the limited extent of this text, not all of these findings can be discussed in detail, but in addition to the demonstrated associations regarding height and lactose tolerance, it is important to point out possible implications for genetic research working with individual data, because some of its results cannot be reconciled with country-level findings. In the near future, the collected frequencies of genetic factors could be used for comparison with other physical characteristics or trait-associated alleles such as lean body mass or pigmentation.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eCompeting interests:\u003c/strong\u003e The author declares no competing interests.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData and materials availability:\u003c/strong\u003e Data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials and the Supplementary Dataset.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgments:\u003c/strong\u003e The author would like to thank William Barrie and the team of AncestralWhispers.org for providing data on the proportion of autosomal genetic components.\u0026nbsp;\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eTam, V., Patel, N., Turcotte, M., Boss\u0026eacute;, Y., Par\u0026eacute;, G., Meyre, D. Benefits and limitations of genome-wide association studies. Nature Reviews Genetics. 2019; 20(8):467\u0026ndash;484.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eUffelmann, E., Huang, Q. Q., Munung, N. S., De Vries, J., Okada, Y., Martin, A. R., et al. Genome-wide association studies. Nature Reviews Methods Primers. 2021; 1(1):1\u0026ndash;21.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSilventoinen, K. Determinants of variation in adult body height. Journal of Biosocial Science. 
2003; 35(2):263\u0026ndash;285.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDeaton, A. Height, health, and development. \u003cem\u003eProceedings of the National Academy of Sciences.\u003c/em\u003e 2007; 104(33):13232\u0026ndash;13237.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAkachi, Y., Canning, D. Inferring the economic standard of living and health from cohort height: Evidence from modern populations in developing countries. Economics \u0026amp; Human Biology. 2015; 19:114\u0026ndash;128.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePerkins, J. M., Subramanian, S. V., Davey Smith, G., \u0026Ouml;zaltin, E. Adult height, nutrition, and population health. Nutrition Reviews. 2016; 74(3):149\u0026ndash;165.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGrasgruber, P., Cacek, J., Kalina, T., Sebera, M. The role of nutrition and genetics as key determinants of the positive height trend. Economics \u0026amp; Human Biology. 2014; 15:81\u0026ndash;100.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGrasgruber, P., Sebera, M., Hrazd\u0026iacute;ra, E., Cacek, J., Kalina, T. Major correlates of male height: A study of 105 countries. Economics \u0026amp; Human Biology. 2016; 21:172\u0026ndash;195.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGrasgruber, P., Hrazd\u0026iacute;ra, E. Nutritional and socio-economic predictors of adult height in 152 world populations. Economics \u0026amp; Human Biology. 2020; \u003cem\u003e37\u003c/em\u003e:100848.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBerg, J. J., Harpak, A., Sinnott-Armstrong, N., Joergensen, A. M., Mostafavi, H., Field, Y., et al. Reduced signal for polygenic adaptation of height in UK Biobank. Elife. 2019; 8:e39725.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSohail, M., Maier, R. M., Ganna, A., Bloemendal, A., Martin, A. R., Turchin, M. C., et al. 
Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. Elife. 2019; 8: e39702.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYengo, L., Vedantam, S., Marouli, E., Sidorenko, J., Bartell, E., Sakaue, S., et al. A saturated map of common genetic variants associated with human height. Nature. 2022; 610(7933):704\u0026ndash;712.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCox, S. L., Ruff, C. B., Maier, R. M., Mathieson, I. Genetic contributions to variation in human stature in prehistoric Europe. \u003cem\u003eProceedings of the National Academy of Sciences.\u003c/em\u003e 2019; 116(43):21484\u0026ndash;21492.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAllentoft ME, Sikora M, Refoyo-Mart\u0026iacute;nez A, Irving-Pease EK, Fischer A, Barrie W, et al. Population Genomics of Stone Age Eurasia. \u003cem\u003ebioRxiv.\u003c/em\u003e 2022; 2022-05.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMathieson I, Lazaridis I, Rohland N, Mallick S, Patterson N, Roodenberg SA, et al. Genome-wide patterns of selection in 230 ancient Eurasians. Nature. 2015; 528(7583):499\u0026ndash;503.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBerg JJ, Zhang X, Coop G. Polygenic adaptation has impacted multiple anthropometric traits. BioRxiv. 2017; 167551.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMarciniak, S., Bergey, C. M., Silva, A. M., Hałuszko, A., Furmanek, M., Veselka, B., et al. An integrative skeletal and paleogenomic analysis of stature variation suggests relatively reduced health for early European farmers. \u003cem\u003eProceedings of the National Academy of Sciences.\u003c/em\u003e 2022; 119(15):e2106743119.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCox, S. L., Moots, H. M., Stock, J. T., Shbat, A., Bitarello, B. D., Nicklisch, N., et al. Predicting skeletal stature using ancient DNA. 
American Journal of Biological Anthropology. 2022; 177(1):162\u0026ndash;174.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMarnetto, D., Pankratov, V., Mondal, M., Montinaro, F., P\u0026auml;rna, K., Vallini, L., et al. Ancestral genomic contributions to complex traits in contemporary Europeans. Current Biology. 2022; 32(6):1412\u0026ndash;1419.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eIrving-Pease, E. K., Refoyo-Mart\u0026iacute;nez, A., Ingason, A., Pearson, A., Fischer, A., Barrie, W., et al. The selection landscape and genetic legacy of Ancient Eurasians. \u003cem\u003ebioRxiv.\u003c/em\u003e 2022; 2022-09.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePankratov, V., Mezzavilla, M., Aneli, S., Fusco, D., Wilson, J. F., Metspalu, M., et al. Ancestral genetic components are consistently associated with the complex trait landscape in European biobanks. bioRxiv. 2023; 2023\u0026ndash;10.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eParker, K., Erzurumluoglu, A. M., \u0026amp; Rodriguez, S. The Y chromosome: a complex locus for genetic analyses of complex human traits. Genes. 2020; 11(11):1273.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRhie, A., Nurk, S., Cechova, M., Hoyt, S. J., Taylor, D. J., Altemose, N., et al. The complete sequence of a human Y chromosome. Nature. 2023; 621(7978):344\u0026ndash;354.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFu Q, Posth C, Hajdinjak M, Petr M, Mallick S, Fernandes D, et al. The genetic history of Ice Age Europe. Nature. 2016; 534(7606):200\u0026ndash;205.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMathieson I, Alpaslan-Roodenberg S, Posth C, Sz\u0026eacute;cs\u0026eacute;nyi-Nagy A, Rohland N, Mallick S, et al. The genomic history of southeastern Europe. Nature. 
2018; 555(7695):197\u0026ndash;203.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePapac L, Ern\u0026eacute;e M, Dobeš M, Langov\u0026aacute; M, Rohrlach AB, Aron F, et al. Dynamic changes in genomic and social structures in third millennium BCE central Europe. Science Advances. 2021; 7(35):eabi6941.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOlalde, I., Brace, S., Allentoft, M. E., Armit, I., Kristiansen, K., Booth, T., et al. The Beaker phenomenon and the genomic transformation of northwest Europe. Nature. 2018; \u003cem\u003e555\u003c/em\u003e(7695):190\u0026ndash;196.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTishkoff, S. A., Reed, F. A., Friedlaender, F. R., Ehret, C., Ranciaro, A., Froment, A., et al. The genetic structure and history of Africans and African Americans. Science. 2009; 324(5930):1035\u0026ndash;1044.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eItan Y, Jones BL, Ingram CJ, Swallow DM, Thomas MG. A worldwide correlation of lactase persistence phenotype and genotypes. BMC Evolutionary Biology. 2010; 10(1): 1\u0026ndash;11.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGlobal Lactase persistence Association Database. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.ucl.ac.uk/biosciences/gee/molecular-and-cultural-evolution-lab/global-lactase-persistence-association-database-glad\u003c/span\u003e\u003cspan address=\"https://www.ucl.ac.uk/biosciences/gee/molecular-and-cultural-evolution-lab/global-lactase-persistence-association-database-glad\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMattila, T., Svensson, E., Juras, A., G\u0026uuml;nther, T., Kashuba, N., Ala-Hulkko, T., et al. Genetic continuity, isolation, and gene flow in Stone Age Central and Eastern Europe. Preprint from Research Square, 12 Sep 2022. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.21203/rs.3.rs-1966812/v1\u003c/span\u003e\u003cspan address=\"10.21203/rs.3.rs-1966812/v1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePosth, C., Yu, H., Ghalichi, A., Rougier, H., Crevecoeur, I., Huang, Y., et al. Palaeogenomics of Upper Palaeolithic to Neolithic European hunter-gatherers. Nature. 2023; 615(7950):117\u0026ndash;126.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAllen Ancient DNA Resource. 2023. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://reich.hms.harvard.edu/allen-ancient-dna-resource-aadr-downloadable-genotypes-present-day-and-ancient-dna-data\u003c/span\u003e\u003cspan address=\"https://reich.hms.harvard.edu/allen-ancient-dna-resource-aadr-downloadable-genotypes-present-day-and-ancient-dna-data\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLazaridis, I., Nadel, D., Rollefson, G., Merrett, D. C., Rohland, N., Mallick, S., et al. Genomic insights into the origin of farming in the ancient Near East. Nature. 2016; 536(7617):419\u0026ndash;424.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNarasimhan VM, Patterson N, Moorjani P, Rohland N, Bernardos R, Mallick S, et al. The formation of human populations in South and Central Asia. Science. 2019; 365(6457):eaat7487.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang CC, Reinhold S, Kalmykov A, Wissgott A, Brandt G, Jeong C, et al. Ancient human genome-wide data from a 3000-year interval in the Caucasus corresponds with eco-geographic regions. Nature Communications.
2019; 10(1):1\u0026ndash;13.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRootsi, S., Kivisild, T., Benuzzi, G., Bermisheva, M., Kutuev, I., Barać, L., et al. Phylogeography of Y-chromosome haplogroup I reveals distinct domains of prehistoric gene flow in Europe. The American Journal of Human Genetics. 2004; \u003cem\u003e75\u003c/em\u003e(1):128\u0026ndash;137.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMarcus, J. H., Posth, C., Ringbauer, H., Lai, L., Skeates, R., Sidore, C., et al. Genetic history from the Middle Neolithic to present on the Mediterranean island of Sardinia. Nature Communications. 2020; \u003cem\u003e11\u003c/em\u003e(1): 939. Supplementary material, Supplementary Fig. 8.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMittnik A, Wang CC, Pfrengle S, Daubaras M, Zariņa G, Hallgren F, et al. The genetic prehistory of the Baltic Sea region. Nature Communications. 2018; 9(1):1\u0026ndash;1.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGrugni, V., Raveane, A., Colombo, G., Nici, C., Crobu, F., Ongaro, L., et al. Y-chromosome and surname analyses for reconstructing past population structures: The Sardinian population as a test case. International Journal of Molecular Sciences. 2019; \u003cem\u003e20\u003c/em\u003e(22): 5763.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAnthony, D. W., Khokhlov, A. A., Agapov, S. A., Agapov, D. S., Schulting, R., Olalde, I., Reich, D. The Eneolithic cemetery at Khvalynsk on the Volga River. Praehistorische Zeitschrift. 2022; \u003cem\u003e97\u003c/em\u003e(1):22\u0026ndash;67.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePalo, J. U., Ulmanen, I., Lukka, M., Ellonen, P., Sajantila, A. Genetic markers and population history: Finland revisited. European Journal of Human Genetics. 
2009; \u003cem\u003e17\u003c/em\u003e(10):1336\u0026ndash;1346.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCruciani F, La Fratta R, Trombetta B, Santolamazza P, Sellitto D, Colomb EB, et al. Tracing past human male movements in northern/eastern Africa and western Eurasia: new clues from Y-chromosomal haplogroups E-M78 and J-M12. Molecular Biology and Evolution. 2007; 24(6):1300\u0026ndash;1311.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSarno S, Tofanelli S, De Fanti S, Quagliariello A, Bortolini E, Ferri G, et al. Shared language, diverging genetic histories: high-resolution analysis of Y-chromosome variability in Calabrian and Sicilian Arbereshe. European Journal of Human Genetics. 2016; 24(4):600\u0026ndash;606.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePericic, M., Lauc, L. B., Klaric, I. M., Rootsi, S., Janićijević, B., Rudan, I., et al. High-resolution phylogenetic analysis of southeastern Europe traces major episodes of paternal gene flow among Slavic populations. Molecular Biology and Evolution. 2005; \u003cem\u003e22\u003c/em\u003e(10):1964\u0026ndash;1975.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSemino, O., Passarino, G., Oefner, P. J., Lin, A. A., Arbuzova, S., Beckman, L. E., et al. The genetic legacy of Paleolithic Homo sapiens sapiens in extant Europeans: A Y chromosome perspective. Science. 2000; \u003cem\u003e290\u003c/em\u003e(5494):1155\u0026ndash;1159.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMarjanovic, D., Fornarino, S., Montagna, S., Primorac, D., Hadziselimovic, R., Vidovic, S., et al. The peopling of modern Bosnia-Herzegovina: Y‐chromosome haplogroups in the three main ethnic groups. Annals of Human Genetics. 2005; \u003cem\u003e69\u003c/em\u003e(6):757\u0026ndash;763.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRegueiro, M., Rivera, L., Damnjanovic, T., Lukovic, L., Milasin, J., Herrera, R. J.
High levels of Paleolithic Y-chromosome lineages characterize Serbia. Gene. 2012; \u003cem\u003e498\u003c/em\u003e(1):59\u0026ndash;67.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eŠarac, J., Šarić, T., Havaš Auguštin, D., Novokmet, N., Vekarić, N., Mustać, M., et al. Genetic heritage of Croatians in the Southeastern European gene pool\u0026mdash;Y chromosome analysis of the Croatian continental and Island population. American Journal of Human Biology. 2016; \u003cem\u003e28\u003c/em\u003e(6):837\u0026ndash;845.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGrasgruber, P., Mašanović, B., Prce, S., Popović, S., Arifi, F., Bjelica, D., et al. Mapping the Mountains of Giants: Anthropometric Data from the Western Balkans Reveal a Nucleus of Extraordinary Physical Stature in Europe. Biology. 2022; \u003cem\u003e11\u003c/em\u003e(5):786.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMasanovic, B., Bavcevic, T., Prskalo, I. Regional differences in adult body height in Kosovo. Montenegrin Journal of Sports Science and Medicine. 2019; 8(1):69.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLazaridis I, Alpaslan-Roodenberg S, Acar A, A\u0026ccedil;ıkkol A, Agelarakis A, Aghikyan L, et al. The genetic history of the Southern Arc: A bridge between West Asia and Europe. Science. 2022; 377(6609): eabm4247.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eUtevska OM. \u003cem\u003e[The gene pool of Ukrainians according to different systems of genetic markers: the origin and place in the European genetic landscape].\u003c/em\u003e Doctoral dissertation. National Research Center for Radiation Medicine of National Academy of Sciences of Ukraine. 2017. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://drive.google.com/file/d/0B1bUIW1YACgZaHlTR3NEWlNjUU\u003c/span\u003e\u003cspan address=\"https://drive.google.com/file/d/0B1bUIW1YACgZaHlTR3NEWlNjUU\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e0/view?pli=1\u0026amp;resourcekey=0-TyXs2Z6J3zo5CJBcY7KMyw p. 20\u0026ndash;21.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMihajlovic, M., Tanasic, V., Markovic, M. K., Kecmanovic, M., Keckarevic, D. Distribution of Y-chromosome haplogroups in Serbian population groups originating from historically and geographically significant distinct parts of the Balkan Peninsula. Forensic Science International: Genetics. 2022; 61: 102767.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eŽegarac, A., Winkelbach, L., Bl\u0026ouml;cher, J., Diekmann, Y., Krečković Gavrilović, M., Porčić, M., et al. Ancient genomes provide insights into family structure and the heredity of social status in the early Bronze Age of southeastern Europe. Scientific Reports. 2021; 11(1):10072.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOlalde, I., Carri\u0026oacute;n, P., Mikić, I., Rohland, N., Mallick, S., Lazaridis, I., Mah, M., Korać, M., Golubović, S., Petković, S. and Miladinović-Radmilović, N. A genetic history of the Balkans from Roman frontier to Slavic migrations. Cell. 2023; 186(25):5472\u0026ndash;5485.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCoon, C.S. \u003cem\u003eThe Races of Europe\u003c/em\u003e. New York: The Macmillan Company. 1939; p. 587\u0026ndash;595. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://archive.org/details/racesofeurope031695mbp\u003c/span\u003e\u003cspan address=\"https://archive.org/details/racesofeurope031695mbp\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEvershed, R. P., Davey Smith, G., Roffet-Salque, M., Timpson, A., Diekmann, Y., Lyon, M. S., et al. Dairying, diseases and the evolution of lactase persistence in Europe. Nature. 2022; 608(7922):336\u0026ndash;345.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEhler, E., Vančata, V. Neolithic transition in Europe: evolutionary anthropology study. Anthropologie (1962-), 2009; 47(3):185\u0026ndash;193.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRuff, C. B. (Ed.). \u003cem\u003eSkeletal variation and adaptation in Europeans: Upper Paleolithic to the twentieth century\u003c/em\u003e. John Wiley \u0026amp; Sons. 2017.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBurger, J., Link, V., Bl\u0026ouml;cher, J., Schulz, A., Sell, C., Pochon, Z., et al. Low prevalence of lactase persistence in Bronze Age Europe indicates ongoing strong selection over the last 3,000 years. Current Biology. 2020; 30(21):4307\u0026ndash;4315.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWilkin, S., Ventresca Miller, A., Fernandes, R., Spengler, R., Taylor, W. T. T., Brown, D. R., et al. Dairying enabled Early Bronze Age Yamnaya steppe expansions. Nature. 2021; \u003cem\u003e598\u003c/em\u003e(7882):629\u0026ndash;633.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKruts S.I. \u003cem\u003ePaleoantropologicheskiye issledovaniya Stepnogo Podneprovya (epokha bronzy) [Paleoanthropological studies of the Steppe Dnieper region (Bronze Age)]\u003c/em\u003e. Kiev: Nauk. Dumka. 1984; p. 
22, 43.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eScott, A., Reinhold, S., Hermes, T., Kalmykov, A. A., Belinskiy, A., Buzhilova, A., et al. Emergence and intensification of dairying in the Caucasus and Eurasian steppes. Nature Ecology \u0026amp; Evolution. 2022; 6(6):813\u0026ndash;822.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eReilly, B. J. Revisiting Bedouin Desert Adaptations: Lactase Persistence as a Factor in Arabian Peninsula History. Journal of Arabian Studies. 2012; \u003cem\u003e2\u003c/em\u003e(2):93\u0026ndash;107.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEnattah, N. S., Jensen, T. G., Nielsen, M., Lewinski, R., Kuokkanen, M., Rasinpera, H., et al. Independent introduction of two lactase-persistence alleles into human populations reflects different history of adaptation to milk culture. The American Journal of Human Genetics. 2008; 82(1):57\u0026ndash;72.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eL\u0026uuml;ning, S., Vahrenholt, F. Holocene climate development of North Africa and the Arabian Peninsula. In: Bendaoud, A., et al. (Eds.) \u003cem\u003eThe Geology of the Arab World - An Overview\u003c/em\u003e. Springer. 2019; p. 524\u0026ndash;527.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWells JC, Saunders MA, Lea AS, Cortina-Borja M, Shirley MK. Beyond Bergmann's rule: Global variability in human body composition is associated with annual average precipitation and annual temperature volatility. American Journal of Physical Anthropology.
2019; 170(1):75\u0026ndash;87.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"},{"header":"Tables","content":"\u003cp\u003eTables 1 to 7 are available in the Supplementary Files section.\u003c/p\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Height, BMI, lactose tolerance, Y haplogroups, autosomal DNA","lastPublishedDoi":"10.21203/rs.3.rs-4354427/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-4354427/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eThis study aimed to examine geographical associations of genetic factors (24 Y haplogroups, 10 autosomal ancestry components) with mean male height and the occurrence of lactose tolerance-associated alleles in a sample of 60 genetically interconnected Caucasian populations of Europe, the Near East, and North Africa. The results show that Y haplogroups or their combinations often match almost perfectly the geographical occurrence of a particular autosomal ancestry (correlation coefficients reaching up to \u003cem\u003er\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.99), demonstrating that male founder effects played a crucial role in shaping population history. Male height adjusted for major environmental factors is positively related mainly to ancestry components BHG (Baltic hunter-gatherers), Villabruna, and Yamnaya, and the combined frequency of five Y haplogroups (I1, I2a-P37.2, N, Q, R1b-U106). The frequency of the European lactose tolerance-associated allele 13910*T correlates primarily with Yamnaya ancestry and with the combination of six Y haplogroups (I1, I2a-M223, Q, R1a, R1b-S116, R1b-U106), whereas the Near Eastern allele 13915*G is predicted by Natufian ancestry and three Y haplogroups typical of Arab populations (E1b-M123, J1, T). 
Of further note is the fact that country-level relationships between body height and ancestry components show both concordance and stark differences with genetic studies using individual-level relationships, which can potentially have important implications. In summary, many of the findings achieved are extremely impressive and their causality can often be inferred from already documented findings. Others offer hypotheses that could be tested with more sophisticated research.\u003c/p\u003e","manuscriptTitle":"Genetic ancestry and male founder effects explain differences in height and lactose tolerance in 60 Caucasian populations","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-05-10 21:19:53","doi":"10.21203/rs.3.rs-4354427/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"e36097a7-5419-4462-b063-6ea487e94d15","owner":[],"postedDate":"May 10th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2024-08-14T11:06:10+00:00","versionOfRecord":[],"versionCreatedAt":"2024-05-10 21:19:53","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-4354427","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-4354427","identity":"rs-4354427","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}