Kalpana Kalpana (Editor)

Genetic history of Europe


The genetic history of Europe is complicated because European populations have a complex demographic history, including many successive periods of population growth. The history must be inferred from patterns of genetic diversity across continents and time. The primary data come from sequences of mitochondrial, Y-chromosome, and autosomal DNA (including single-nucleotide polymorphisms) from modern populations and, where available, from ancient DNA.

Classical genetic markers (by proxy)

One of the first scholars to perform genetic studies was Luigi Luca Cavalli-Sforza. He used classical genetic markers to analyse DNA by proxy. This method studies differences in the frequencies of particular allelic traits, namely polymorphisms from proteins found within human blood (such as the ABO blood groups, Rhesus blood antigens, HLA loci, immunoglobulins, G6PD isoenzymes, among others). Subsequently his team calculated genetic distance between populations, based on the principle that two populations that share similar frequencies of a trait are more closely related than populations that have more divergent frequencies of the trait.
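The distance principle described above can be sketched in a few lines. This is a minimal illustration only: the allele frequencies are hypothetical, and the root-mean-square difference used here is a deliberately simple stand-in, not necessarily the distance measure used by Cavalli-Sforza's team.

```python
from math import sqrt

# Hypothetical allele frequencies at three loci (illustrative values only,
# not taken from any study). Each number is the frequency of one allele.
freqs = {
    "pop_A": [0.10, 0.40, 0.75],
    "pop_B": [0.12, 0.38, 0.70],
    "pop_C": [0.45, 0.10, 0.30],
}

def frequency_distance(p, q):
    """Root-mean-square difference between two allele-frequency vectors."""
    if len(p) != len(q):
        raise ValueError("populations must be scored at the same loci")
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)) / len(p))

def distance_matrix(table):
    """Distance for every unordered pair of populations in the table."""
    names = sorted(table)
    return {
        (x, y): frequency_distance(table[x], table[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }

distances = distance_matrix(freqs)
# List pairs from most similar to least similar.
for pair, d in sorted(distances.items(), key=lambda kv: kv[1]):
    print(pair, round(d, 3))
```

In this toy table, pop_A and pop_B have similar frequencies at every locus and so come out as the closest pair, which is the sense in which similar trait frequencies are read as closer relatedness.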

From this, he constructed phylogenetic trees that showed genetic distances diagrammatically. His team also performed principal component analyses, a technique suited to analysing multivariate data with minimal loss of information. The information that is lost can be partly restored by generating a second principal component, and so on. In turn, the information from each individual principal component (PC) can be presented graphically in synthetic maps. These maps show peaks and troughs, which represent populations whose gene frequencies take extreme values compared to others in the studied area.
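As a rough sketch of how a first principal component is obtained from allele frequencies, the example below centres a small table, builds its covariance matrix and finds the leading eigenvector by power iteration. The populations and frequencies are hypothetical, and power iteration is used only to keep the example dependency-free; published analyses would normally rely on a numerical library.

```python
# Hypothetical allele-frequency table: one row per population, one column per locus.
data = {
    "pop_A": [0.10, 0.40, 0.75],
    "pop_B": [0.12, 0.38, 0.70],
    "pop_C": [0.45, 0.10, 0.30],
    "pop_D": [0.40, 0.15, 0.35],
}

def centre(rows):
    """Subtract each column's mean so that only variation remains."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(rows):
    """Sample covariance matrix of already centred columns."""
    n, k = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(k)]
            for i in range(k)]

def first_component(cov, iterations=200):
    """Leading eigenvector of cov (unit length), found by power iteration."""
    v = [1.0] * len(cov)
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

names = list(data)
centred = centre([data[n] for n in names])
pc1 = first_component(covariance(centred))

# Each population's score is its position along the first component;
# these scores are what a synthetic map would plot geographically.
scores = {n: sum(x * w for x, w in zip(row, pc1))
          for n, row in zip(names, centred)}
for name, score in scores.items():
    print(name, round(score, 3))
```

The sign of a principal component is arbitrary, so only the relative positions of the scores are meaningful: here pop_A and pop_B fall on one side of zero and pop_C and pop_D on the other.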

Peaks and troughs are usually connected by smooth gradients called clines. Genetic clines can be generated by adaptation to environment (natural selection), continuous gene flow between two initially different populations, or a demographic expansion into a scarcely populated environment with little initial admixture with existing populations. Cavalli-Sforza connected these gradients with postulated pre-historic population movements, based on archaeological and linguistic theories. However, given that the time depths of such patterns are not known, "associating them with particular demographic events is usually speculative".

Direct DNA analysis

Studies using direct DNA analysis are now abundant and may use mitochondrial DNA (mtDNA), the non-recombining portion of the Y chromosome (NRY), or even autosomal DNA. MtDNA and NRY DNA share some similar features, which have made them particularly useful in genetic anthropology. These properties include the direct, unaltered inheritance of mtDNA and NRY DNA from mother to offspring and father to son, respectively, without the 'scrambling' effects of genetic recombination. We also presume that these genetic loci are not affected by natural selection and that the major process responsible for changes in base pairs has been mutation (which can be calculated).

The smaller effective population size of the NRY and mtDNA enhances the consequences of drift and founder effect, relative to the autosomes, making NRY and mtDNA variation a potentially sensitive index of population composition. These biologically plausible assumptions are not certain, however; Rosser suggests that climatic conditions may affect the fertility of certain lineages.

The underlying mutation rate used by the geneticists is more questionable. They often use different mutation rates and studies frequently arrive at vastly different conclusions. NRY and mtDNA may be so susceptible to drift that some ancient patterns may have become obscured. Another assumption is that population genealogies are approximated by allele genealogies. Guido Barbujani points out that this only holds if population groups develop from a genetically monomorphic set of founders. Barbujani argues that there is no reason to believe that Europe was colonised by monomorphic populations. This would result in an overestimation of haplogroup age, thus falsely extending the demographic history of Europe into the Late Paleolithic rather than the Neolithic era. (See also Genetic drift, Founder effect, Population bottleneck.) Greater certainty about chronology may be obtained from studies of ancient DNA (see below), but so far these have been comparatively few.
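The sensitivity of age estimates to the assumed mutation rate can be shown with a strict molecular-clock calculation. This is a simplified sketch: the function name, the mutation counts, both rates and the 25-year generation time are hypothetical values chosen for illustration, not figures from any of the studies discussed.

```python
def age_in_years(mean_mutations, rate_per_generation, generation_years=25):
    """Age of a lineage under a strict molecular clock.

    mean_mutations: average number of mutations separating sampled
        lineages from the inferred founder type.
    rate_per_generation: assumed mutations per lineage per generation.
    generation_years: assumed length of one generation.
    """
    if rate_per_generation <= 0:
        raise ValueError("mutation rate must be positive")
    return mean_mutations / rate_per_generation * generation_years

# Hypothetical inputs, for illustration only.
observed = 0.6          # mean mutations from the founder type
slow_rate = 0.001       # one assumed rate
fast_rate = 0.003       # another assumed rate, three times higher

older = age_in_years(observed, slow_rate)
younger = age_in_years(observed, fast_rate)
print(round(older), round(younger))
```

With identical data, the slower rate gives an age of roughly 15,000 years and the faster rate roughly 5,000 years, which is enough to move a lineage from the Late Paleolithic to the Neolithic.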

Whereas Y-DNA and mtDNA haplogroups represent but a small component of a person’s DNA pool, autosomal DNA has the advantage of containing hundreds of thousands of examinable genetic loci, thus giving a more complete picture of genetic composition. Descent relationships can only be determined on a statistical basis, because autosomal DNA undergoes recombination. A single chromosome can record a history for each gene. Autosomal studies are much more reliable for showing the relationships between existing populations, but do not offer the possibilities for unravelling their histories in the same way as mtDNA and NRY DNA studies promise, despite their many complications.

Genetic studies operate on numerous assumptions and suffer from methodological limitations such as selection bias. Confounding phenomena like genetic drift, founder effects and bottleneck effects can cause large errors, particularly in haplogroup studies. No matter how accurate the methodology, conclusions derived from such studies depend on how the authors envisage that their data fit with established archaeological or linguistic theories.

Relation between Europeans and other populations

According to Cavalli-Sforza's work, all non-African populations are more closely related to each other than to Africans, supporting the hypothesis that all non-Africans descend from a single old-African population. The genetic distance from Africa to Europe (16.6) was found to be shorter than the genetic distance from Africa to East Asia (20.6), and much shorter than that from Africa to Australia (24.7). He explains:

"... both Africans and Asians contributed to the settlement of Europe, which began about 40,000 years ago. It seems very reasonable to assume that both continents nearest to Europe contributed to its settlement, even if perhaps at different times and maybe repeatedly. It is reassuring that the analysis of other markers also consistently gives the same results in this case. Moreover, a specific evolutionary model tested, i.e., that Europe is formed by contributions from Asia and Africa, fits the distance matrix perfectly (6). In this simplified model, the migrations postulated to have populated Europe are estimated to have occurred at an early date (30,000 years ago), but it is impossible to distinguish, on the basis of these data, this model from that of several migrations at different times. The overall contributions from Asia and Africa were estimated to be around two-thirds and one-third, respectively".

This particular model used an Out of Africa migration 100,000 years ago, which separated Africans from non-Africans, followed by a single admixture event 30,000 years ago leading to the formation of the European population. The admixture event consisted of a source population that was 35% African and 65% East Asian. However, the study notes that a more realistic scenario would include several admixture events occurring over a sustained period. In particular, they cite the spread of farming from a source population in West Asia 5000–9000 years ago that may have played a role in the genetic relatedness of Africans and Europeans, since West Asia is sandwiched in between Africa and Central Asia.
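A single admixture event of this kind has a simple arithmetic consequence: at each locus, the admixed population's expected allele frequency is the weighted average of the two source frequencies. The sketch below applies the 35%/65% split from the model to hypothetical source frequencies and then recovers the mixing share by least squares. The function names and all frequency values are illustrative assumptions, and drift since admixture is ignored.

```python
def admix(p_source_a, p_source_b, share_a):
    """Expected allele frequencies after one pulse of admixture."""
    if not 0.0 <= share_a <= 1.0:
        raise ValueError("share_a must lie between 0 and 1")
    return [share_a * a + (1.0 - share_a) * b
            for a, b in zip(p_source_a, p_source_b)]

def estimate_share(p_mixed, p_source_a, p_source_b):
    """Least-squares estimate of source A's share from allele frequencies."""
    num = sum((m - b) * (a - b)
              for m, a, b in zip(p_mixed, p_source_a, p_source_b))
    den = sum((a - b) ** 2 for a, b in zip(p_source_a, p_source_b))
    if den == 0:
        raise ValueError("sources are identical; the share is not identifiable")
    return num / den

# Hypothetical allele frequencies at four loci (not real data).
african = [0.80, 0.20, 0.55, 0.10]
east_asian = [0.30, 0.60, 0.25, 0.50]

mixed = admix(african, east_asian, share_a=0.35)
recovered = estimate_share(mixed, african, east_asian)
print([round(p, 3) for p in mixed], round(recovered, 2))
```

Because the toy data contain no noise, the estimate returns the 0.35 share exactly (up to floating-point error); with real frequencies, drift and sampling error would blur it, which is one reason a single-pulse model cannot be distinguished from repeated migrations.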

Most contemporary studies, however, place the out of Africa migration more recently, at 50–70 kya, rather than the 100 kya assumed in this model. The study also involved a direct comparison between Sub-Saharan Africans (Central Africans and Senegalese) and Europeans; North African populations were omitted. These considerations might help explain the apparently short genetic distance between Europeans and Africans.

European population sub-structure

Geneticists have found that Europe is relatively genetically homogeneous, but distinct sub-population patterns of various types of genetic markers have been found, particularly along a southeast-northwest cline. For example, Cavalli-Sforza’s principal component analyses revealed five major clinal patterns throughout Europe, and similar patterns have continued to be found in more recent studies.

  1. A cline of genes with highest frequencies in the Middle East, spreading to lowest levels northwest. Cavalli-Sforza originally described this as faithfully reflecting the spread of agriculture in Neolithic times. This has been the general tendency in interpretation of all genes with this pattern.
  2. A cline of genes with highest frequencies among Finnish and Sami in the extreme north east, and spreading to lowest frequencies in the south west.
  3. A cline of genes with highest frequencies in the area of the lower Don and Volga rivers in southern Russia, and spreading to lowest frequencies in Spain, Southern Italy, Greece and the areas inhabited by Saami speakers in the extreme north of Scandinavia. Cavalli-Sforza associated this with the spread of Indo-European languages, which he links in turn to a "secondary expansion" after the spread of agriculture, associated with animal grazing.
  4. A cline of genes with highest frequencies in the Balkans and Southern Italy, spreading to lowest levels in Britain and the Basque country. Cavalli-Sforza associates this with "the Greek expansion, which reached its peak in historical times around 1000 and 500 BC but which certainly began earlier".
  5. A cline of genes with highest frequencies in the Basque country, and lower levels beyond the area of Iberia and Southern France. In perhaps the most well-known conclusion from Cavalli-Sforza, this weakest of the five patterns was described as isolated remnants of the pre-Neolithic population of Europe, "who at least partially withstood the expansion of the cultivators". It corresponds roughly to the geographical spread of rhesus negative blood types. In particular, the conclusion that the Basques are a genetic isolate has been widely discussed, but it remains controversial.

He also created a phylogenetic tree to analyse the internal relationships among Europeans. He found four major 'outliers' (Basques, Sami, Finns and Icelanders), a result he attributed to their relative isolation; with the exception of the Icelanders, these groups speak non-Indo-European languages. Greeks and Yugoslavs represented a second group of less extreme outliers. The remaining populations clustered into several groups: "Celtic", "Germanic", "south-western Europeans", "Scandinavians" and "eastern Europeans".

Genetic studies after Cavalli-Sforza

New technologies have allowed for DNA haplotypes to be studied directly with increasing speed and accuracy, giving more refined data than was available in the original studies of Cavalli-Sforza.

Human Y-chromosome DNA haplogroups

There are four main Y-chromosome DNA haplogroups that account for most of Europe's patrilineal descent.

  • Haplogroup R1b is common all over Europe but especially common in Western Europe. Nearly all of this R1b in Europe is in the form of the R1b1a2 (2011 name) (R-M269) sub-clade, specifically within the R-L23 sub-sub-clade, whereas R1b found in Central Asia, western Asia and Africa tends to be in other clades. It has also been pointed out that outlier types are present in Europe and are particularly notable in some areas such as Sardinia and Armenia. Haplogroup R1b frequencies vary from highs in western Europe in a steadily decreasing cline with growing distance from the Atlantic: 80–90% (Welsh, Basque, Irish, Scots, Bretons), around 70–80% in other areas of Spain, Britain and France, and around 40–60% in most other parts of western Europe, like eastern Germany and northern-central Italy. It drops outside this area and is around 30% or less in areas such as southern Italy, Sweden, Poland, the Balkans and Cyprus. R1b remains the most common clade as one moves east to Germany, while farther east, in Poland, R1a is more common (see below). In southeastern Europe, R1b drops behind R1a in the area in and around Hungary and Serbia but is more common both to the south and north of this region. R1b in Western Europe is dominated by at least two sub-clades: R-U106, which is distributed from the east side of the Rhine into northern and central Europe (with a strong presence in England), and R-P312, which is most common west of the Rhine, including the British Isles. Some have posited that this haplogroup's presence in Europe dates back to the LGM, while others link it to the spread of the Centum branch of the Indo-European languages.
  • Haplogroup I is found in the form of various sub-clades throughout Europe and is found at highest frequencies in Bosnia and Herzegovina (65%), Croatia, Norway, Denmark, Sardinia, Serbia, Sweden, parts of Germany, Romania/Moldova and other countries in the Balkan Peninsula and Scandinavia. This clade is found at its highest expression by far in Europe and may have been there since before the LGM.
  • Haplogroup R1a, almost entirely in the R1a1a sub-clade, is prevalent in much of Eastern and Central Europe (also in South and Central Asia). For example, there is a sharp increase in R1a1 and decrease in R1b1b2 as one goes east from Germany to Poland. It also has a substantial presence in Scandinavia (particularly Norway). In the Baltic countries R1a frequencies decrease from Lithuania (45%) to Estonia (around 30%). Many people link this haplogroup to the spread of the Indo-European languages in Europe, while some limit this connection to the Satem branch of said language family.
  • Haplogroup E1b1b (formerly known as E3b) represents the last major direct migration from Africa into Europe. It is believed to have first appeared in the Horn of Africa approximately 26,000 years ago and dispersed to North Africa and the Near East during the late Paleolithic and Mesolithic periods. E1b1b lineages are closely linked to the diffusion of Afroasiatic languages. Although present throughout Europe, it peaks in the Balkan region. It is also common in Italy and the Iberian peninsula. Haplogroup E1b1b1, mainly in the form of its E1b1b1a2 (E-V13) sub-clade, reaches frequencies above 47% around the area of Kosovo. This clade is thought to have arrived in Europe from western Asia either in the later Mesolithic, or the Neolithic. The North African subclade E-M81 is also present in Spain and Portugal.
Putting aside small enclaves, there are also several haplogroups apart from the above four that are most common in certain areas of Europe.

  • Haplogroup N is common only in the northeast of Europe and in the form of its N1c1 sub-clade reaches frequencies of approximately 60% among Finns and approximately 40% among Lithuanians. This clade is also found far into the east in Siberia, Japan and China.
  • Haplogroup E1b1b1, mainly in the form of its E1b1b1a2 (E-V13) sub-clade, reaches frequencies above 40% around the area of Kosovo. This clade is thought to have arrived in Europe from western Asia either in the later Mesolithic, or the Neolithic.
  • Haplogroup J2, in various sub-clades (J2a, J2b), is found in levels of around 15–30% in parts of the Balkans and Italy and is common all over Europe and especially the Mediterranean basin.
    Human mitochondrial DNA haplogroups

    There have been a number of studies about the mitochondrial DNA haplogroups (mtDNA) in Europe. In contrast to Y DNA haplogroups, mtDNA haplogroups did not show as much geographical patterning, but were more evenly ubiquitous. Apart from the outlying Saami, all Europeans are characterised by the predominance of haplogroups H, U and T. The lack of observable geographic structuring of mtDNA may be due to socio-cultural factors, namely the phenomena of polygyny and patrilocality. According to the University of Oulu Library in Finland:

    Classical polymorphic markers (i.e. blood groups, protein electromorphs and HLA antigenes) have suggested that Europe is a genetically homogeneous continent with a few outliers such as the Saami, Sardinians, Icelanders and Basques (Cavalli-Sforza et al. 1993, Piazza 1993). The analysis of mtDNA sequences has also shown a high degree of homogeneity among European populations, and the genetic distances have been found to be much smaller than between populations on other continents, especially Africa (Comas et al. 1997).

    The mtDNA haplogroups of Europeans are surveyed by using a combination of data from RFLP analysis of the coding region and sequencing of the hypervariable segment I. About 99% of European mtDNAs fall into one of ten haplogroups: H, I, J, K, M, T, U, V, W or X (Torroni et al. 1996a). Each of these is defined by certain relatively ancient and stable polymorphic sites located in the coding region (Torroni et al. 1996a)... Haplogroup H, which is defined by the absence of an AluI site at bp 7025, is the most prevalent, comprising half of all Europeans (Torroni et al. 1996a, Richards et al. 1998)... Six of the European haplogroups (H, I, J, K, T and W) are essentially confined to European populations (Torroni et al. 1994, 1996a), and probably originated after the ancestral Caucasoids became genetically separated from the ancestors of the modern Africans and Asians.

    Genetic studies suggest some maternal gene flow to eastern Europe from eastern Asia or southern Siberia 13,000 – 6,600 years BP. Analysis of Neolithic skeletons in the Great Hungarian Plain found a high frequency of eastern Asian mtDNA haplogroups, some of which survive in modern eastern European populations. Maternal gene flow to Europe from sub-Saharan Africa began as early as 11,000 years BP, although the majority of lineages, approximately 65%, are estimated to have arrived more recently, including during the Romanization period, the Arab conquests of southern Europe, and during the Atlantic slave trade.

    Ancient DNA

    The genetic history of Europe has mostly been reconstructed from the modern populations of Europe, assuming genetic continuity. This is because it is far easier to retrieve DNA from living subjects than ancient human remains. However, a growing number of ancient mtDNA and Y-DNA analyses are available from both the historical and prehistoric periods. 2015 saw an exponential rise in the number of ancient DNA samples available. From September 2014 to November 2015, the number of samples available went from 10 to 230.

    In a 2015 study, researchers reported on the DNA analysis of 94 skeletons (69 from their own analyses and 25 from previously reported results in the literature), mostly 8,000–3,000 years old, from Europe and Russia. In a 2016 study, researchers reported on the DNA analysis of 51 individuals from the Upper Paleolithic to the early Neolithic, ranging from 45,000 to 7,000 years ago.

    Ice Age

    From a study of 51 individuals, researchers were able to identify five separate genetic clusters of ancient Europeans during the Ice Age: the Věstonice Cluster (34,000–26,000 years ago), associated with the Gravettian culture; the Mal'ta Cluster (24,000–17,000 years ago), associated with the Mal'ta-Buret' culture; the El Mirón Cluster (19,000–14,000 years ago), associated with the Magdalenian culture; the Villabruna Cluster (14,000–7,000 years ago); and the Satsurblia Cluster (13,000–10,000 years ago), comprising Caucasus hunter-gatherers.

    From around 37,000 years ago, all ancient Europeans began to share some ancestry with modern Europeans. This founding population is represented by GoyetQ116-1, a 35,000-year-old specimen from Belgium. This lineage disappears from the record and is not found again until 19,000 BP in Spain at El Mirón, which shows strong affinities to GoyetQ116-1. During this interval, the distinct Věstonice Cluster is predominant in Europe, even at Goyet. The re-expansion of the El Mirón Cluster coincided with warming temperatures following the retreat of the glaciers after the Last Glacial Maximum. From 37,000 to 14,000 years ago, the population of Europe was isolated: it descended from a founding population and did not interbreed significantly with other populations.

    Around 14,000 years ago, the Villabruna Cluster shifted away from GoyetQ116-1 affinity and started to show more affinity with the Near East, a shift which coincided with the warming temperatures of the Bølling-Allerød interstadial. This genetic shift shows that Near East populations had probably already begun moving into Europe during the end of the Upper Paleolithic, about 6,000 years earlier than previously thought, before the introduction of farming. A few specimens from the Villabruna Cluster also show genetic affinities for East Asians that are derived from gene flow.

    Post-Ice Age

    Researchers identified three major waves of human migrations into Europe: the original Mesolithic hunter-gatherers, Neolithic farmers from the Levant about 8,000 years ago, and a third wave about 5,000 years ago from the Yamna culture, horse-riding herders from the Pontic–Caspian steppe.

    Yamna component

    The Yamna may have brought Indo-European languages with them. The Yamna altered the gene pools of northern and central Europe; some populations, such as Norwegians, owe around 50% of their ancestry to this group.

    The Yamna component contains partial ancestry from an Ancient North Eurasian component first identified in Mal'ta. According to Iosif Lazaridis, "the Ancient North Eurasian ancestry is proportionally the smallest component everywhere in Europe, never more than 20 percent, but we find it in nearly every European group we’ve studied." This genetic component does not come directly from the Mal'ta lineage itself, but a related lineage that separated from the Mal'ta lineage.

    Half of the Yamna component is derived from a Caucasus hunter-gatherer strand (Satsurblia). On November 16, 2015, in a study published in the journal Nature Communications, geneticists announced that they had found a new fourth ancestral "tribe" or "strand" which had contributed to the modern European gene pool. They analysed genomes from two hunter-gatherers from Georgia which were 13,300 and 9,700 years old, and found that these Caucasus hunter-gatherers were probably the source of the farmer-like DNA in the Yamna.

    According to co-author Dr Andrea Manica of the University of Cambridge: "The question of where the Yamnaya come from has been something of a mystery up to now ... we can now answer that as we've found that their genetic make-up is a mix of Eastern European hunter-gatherers and a population from this pocket of Caucasus hunter-gatherers who weathered much of the last Ice Age in apparent isolation."

    Genetic adaptations

    In a 2015 study based on 230 ancient DNA samples, researchers traced the origins of several genetic adaptations found in Europe. The original Mesolithic hunter-gatherers were dark skinned and blue eyed. The HERC2 and OCA2 variations for blue eyes are derived from the original Mesolithic hunter-gatherers, and the genes were also found in the Yamna people. The HERC2 variation for blue eyes first appears around 13,000 to 14,000 years ago in Italy and the Caucasus.

    The migration of Neolithic farmers into Europe brought along several new adaptations. The variation for light skin colour was introduced to Europe by the Neolithic farmers. After their arrival, a SLC22A4 mutation was selected for; it probably arose to deal with ergothioneine deficiency but increases the risk of ulcerative colitis, coeliac disease, and irritable bowel syndrome.

    The genetic variations for lactase persistence and greater height came with the Yamna people.

    Archaic ancestry

    Due to natural selection, the percentage of Neanderthal DNA in ancient Europeans gradually decreased over time. From 45,000 BP to 7,000 BP, the percentage dropped from around 3–6% to 2%. The removal of Neanderthal-derived alleles occurred more frequently around genes than other parts of the genome.

    Relation between Europeans and other populations

    A 2007 study by Bauchet, which utilised about 10,000 autosomal DNA SNPs, arrived at similar results. Principal component analysis clearly identified four widely dispersed groupings, corresponding to Africa, Europe, Central Asia and South Asia. PC1 separated Africans from the other populations, PC2 divided Asians from Europeans and Africans, whilst PC3 split Central Asians apart from South Asians.

    European population sub-structure

    A study in May 2009 of 19 populations from Europe using 270,000 SNPs highlighted the genetic diversity of European populations corresponding to the northwest to southeast gradient and distinguished four distinct regions within Europe:

  • Finland, showing the greatest distance to the rest of Europeans.
  • the Baltic region (Estonia, Latvia and Lithuania), western Russia and eastern Poland.
  • Central and Western Europe.
  • Italy, "with the southern Italians being more distant".

    In this study, barrier analysis revealed "genetic barriers" between Finland, Italy and other countries; barriers could also be demonstrated within Finland (between Helsinki and Kuusamo) and within Italy (between the northern and southern parts, Fst=0.0050). Fst (fixation index) was found to correlate considerably with geographic distance, ranging from ≤0.0010 for neighbouring populations to 0.0200–0.0230 for Southern Italy and Finland. For comparison, pair-wise Fst values for non-European samples were as follows: Europeans – Africans (Yoruba) 0.1530; Europeans – Chinese 0.1100; Africans (Yoruba) – Chinese 0.1900.

    A study by Chao Tian in August 2009 extended the analysis of European population genetic structure to include additional southern European groups and Arab populations (Palestinians, Druzes...) from the Near-East. This study determined autosomal Fst between 18 population groups and concluded that, in general, genetic distances corresponded to geographical relationships with smaller values between population groups with origins in neighbouring countries/regions (for example, Greeks/Tuscans: Fst=0.0010, Greeks/Palestinians: Fst=0.0057) compared with those from very different regions in Europe (for example Greeks/Swedish: Fst=0.0087, Greeks/Russians: Fst=0.0108).

    Autosomal DNA

    Seldin (2006) used over 5,000 autosomal SNPs. It showed "a consistent and reproducible distinction between ‘northern’ and ‘southern’ European population groups". Most individual participants with southern European ancestry (Italians, Greeks, Portuguese, Spaniards), and Ashkenazi Jews have >85% membership in the southern population; and most northern, western, central, and eastern Europeans (Swedes, English, Irish, Germans, and Ukrainians) have >90% in the northern population group. However, many of the participants in this study were actually American citizens who self-identified with different European ethnicities based on self-reported familial pedigree.

    A similar study in 2007 using samples exclusively from Europe found that the most important genetic differentiation in Europe occurs on a line from the north to the south-east (northern Europe to the Balkans), with another east-west axis of differentiation across Europe. Its findings were consistent with earlier results based on mtDNA and Y-chromosomal DNA that support the theory that modern Iberians (Spanish and Portuguese) hold the most ancient European genetic ancestry, as well as separating Basques and Sami from other European populations.

    It suggested that the English and Irish cluster with other Northern and Eastern Europeans such as Germans and Poles, while some Basque and Italian individuals also clustered with Northern Europeans. Despite these stratifications, it noted the unusually high degree of European homogeneity: "there is low apparent diversity in Europe with the entire continent-wide samples only marginally more dispersed than single population samples elsewhere in the world".

    In 2008, two international research teams published analyses of large-scale genotyping of large samples of Europeans, using over 300,000 autosomal SNPs. With the exception of the usual isolates such as Basques, Finns and Sardinians, the European population lacked the sharp discontinuities (clustering) that previous studies had found (see Seldin et al. 2006 and Bauchet et al. 2007), although there was a discernible south to north gradient. Overall, they found only a low level of genetic differentiation between subpopulations, and the differences that did exist were characterised by a strong continent-wide correlation between geographic and genetic distance. In addition, they found that diversity was greatest in southern Europe, due to a larger effective population size and/or population expansion from southern to northern Europe. The researchers take this observation to imply that, genetically, Europeans are not distributed into discrete populations.

    A study on north-eastern populations, published in March 2013, found that Komi peoples formed a pole of genetic diversity that is distinct from other populations.

    Autosomal genetic distances (Fst) based on SNPs (2009)

    The genetic distance between populations is often measured by Fixation index (Fst), based on genetic polymorphism data, such as single-nucleotide polymorphisms (SNPs) or microsatellites. Fst is a special case of F-statistics, the concept developed in the 1920s by Sewall Wright. Fst is simply the correlation of randomly chosen alleles within the same sub-population relative to that found in the entire population. It is often expressed as the proportion of genetic diversity due to allele frequency differences among populations.

    The values range from 0 to 1. A zero value implies that the two populations are panmictic, that is, that they are interbreeding freely. A value of one would imply that the two populations are completely separate. The greater the Fst value, the greater the genetic distance. Essentially, these low Fst values suggest that the majority of genetic variation is at the level of individuals within the same population group (~ 85%), whilst belonging to a different population group within the same ‘race’/continent, or even to a different racial/continental group, adds a much smaller degree of variation (3–8% and 6–11%, respectively).
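    The definition of Fst as the proportion of diversity due to allele frequency differences can be illustrated with a small calculation on biallelic SNP frequencies. This is a textbook-style sketch that assumes two equally sized populations and applies no sample-size correction; the estimators used in the studies cited here are not specified in this article, and the frequencies below are hypothetical.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic locus."""
    return 2.0 * p * (1.0 - p)

def fst(freqs_1, freqs_2):
    """Fst = (HT - HS) / HT, summing HT and HS over loci.

    HS is the mean within-population heterozygosity and HT the
    heterozygosity of the pooled population (equal weights assumed).
    """
    if len(freqs_1) != len(freqs_2):
        raise ValueError("populations must be scored at the same loci")
    h_s = h_t = 0.0
    for p1, p2 in zip(freqs_1, freqs_2):
        h_s += (heterozygosity(p1) + heterozygosity(p2)) / 2.0
        h_t += heterozygosity((p1 + p2) / 2.0)
    # With no variation at all, differentiation is taken to be zero.
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t

# Hypothetical SNP allele frequencies at two loci (not real data).
base = [0.30, 0.60]
neighbour = [0.32, 0.58]    # nearly identical frequencies
distant = [0.70, 0.20]      # strongly diverged frequencies

near_value = fst(base, neighbour)
far_value = fst(base, distant)
print(round(near_value, 4), round(far_value, 4))
```

    Identical frequencies give an Fst of 0 and fixed differences (0 versus 1 at every locus) give 1, matching the range described above; small frequency differences, as between neighbouring populations, give values close to zero.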

    CEU – Utah residents with ancestry from Northern and Western Europe.

    Apparent migrations into Europe

    The prehistory of the European peoples can be traced by the examination of archaeological sites, linguistic studies and by the examination of the DNA of the people who live in Europe or from ancient DNA. The research continues and so theories rise and fall. Although it is possible to track migrations of people across Europe using founder analysis of DNA, most information on these movements comes from archaeology.

    It is important to note that the colonisation of Europe did not occur in discrete migrations, as might appear to be suggested. Rather, the settlement process was complex and "likely to have occurred in multiple (sic) waves from the east and to have been subsequently obscured by millennia of recurrent gene flow".

    Palaeolithic Era

    Neanderthals inhabited much of Europe and western Asia from as far back as 130,000 years ago. They existed in Europe as late as 30,000 years ago. They were eventually replaced by anatomically modern humans (A.M.H.), Cro-Magnons, who began to appear in Europe c. 40,000 years ago. Given that the two hominid species likely coexisted in Europe, anthropologists have long wondered whether the two interacted. The question was resolved only in 2010, when it was established that Eurasian populations exhibit Neanderthal admixture, estimated at 1.5–2.1% on average. The question now became whether this admixture had taken place in Europe, or rather in the Levant, prior to A.M.H. migration into Europe.

    There has also been speculation about the inheritance of specific genes from Neanderthals. One example is the MAPT locus at 17q21.3, which is split into the deep genetic lineages H1 and H2. Since the H2 lineage seems restricted to European populations, several authors had argued, beginning in 2005, for its inheritance from Neanderthals. However, the preliminary results from the sequencing of the full Neanderthal genome (2009) failed to uncover evidence of interbreeding between Neanderthals and modern humans. By 2010, findings by Svante Pääbo (Max Planck Institute for Evolutionary Anthropology at Leipzig, Germany), Richard E. Green (University of California, Santa Cruz), and David Reich (Harvard Medical School), comparing the genetic material from the bones of three Neanderthals with that from five modern humans, did show a relationship between Neanderthals and modern people outside Africa.

    It is thought that modern humans began to colonise Europe during the Upper Paleolithic about 40,000 years ago. Some evidence shows the spread of the Aurignacian culture. From a Y-chromosome perspective, Semino (2000) proposed that the large haplogroup R1 is an ancient Eurasiatic marker brought in by Homo sapiens who diffused west into Europe ~ 40 ky ago. Haplogroup I might represent another putative Palaeolithic marker whose age has been estimated to ~ 22 kYa. Whilst it is 'unique' to Europe, it probably arose in descendants of men arriving from the Middle East c. 20–25 kYa, arising from parent haplogroup IJ. At this time, another Upper Palaeolithic culture appears, the Gravettian culture.

    Thus the genetic data suggest that, at least from the perspective of patrilineal ancestry, modern humans might have taken two colonising routes, one from the Middle East via the Balkans and another from Central Asia to the north of the Black Sea. It is now believed that haplogroup R1 is substantially younger: a 2008 study dated the most recent common ancestor of haplogroup R1 to 18.5 kYa, and the most recent common ancestor of haplogroup IJ to 38.5 kYa, suggesting that haplogroup IJ colonists formed the first wave and haplogroup R1 arrived much later.

    Martin Richards et al. found that 15–40% of extant mtDNA lineages trace back to the Palaeolithic migrations (depending on whether one allows for multiple founder events). MtDNA haplogroup U5, dated to be ~ 40–50 kYa, arrived during the first early upper Palaeolithic colonisation. Individually, it accounts for 5–15% of total mtDNA lineages. Middle U.P. movements are marked by the haplogroups HV, I and U4. HV split into Pre-V (around 26,000 years old) and the larger branch H, both of which spread over Europe, possibly via Gravettian contacts.

    Haplogroup H accounts for about half the gene lines in Europe, with many subgroups. The above mtDNA lineages or their precursors, are most likely to have arrived into Europe via the Middle East. This contrasts with Y DNA evidence, whereby some 50%-plus of male lineages are characterised by the R1 superfamily, which is of possible central Asian origin. Ornella Semino postulates that these differences "may be due in part to the apparent more recent molecular age of Y chromosomes relative to other loci, suggesting more rapid replacement of previous Y chromosomes. Gender-based differential migratory demographic behaviors will also influence the observed patterns of mtDNA and Y variation".

    Last Glacial Maximum: refugia and re-colonization

    The Last Glacial Maximum ("LGM") started c. 30 ka BC, at the end of MIS 3, leading to a depopulation of Northern Europe. According to the classical model, people took refuge in climatic sanctuaries (or refugia) as follows:

  • Northern Iberia and Southwest France, together making up the "Franco-Cantabrian" refugium
  • The Balkans
  • Ukraine and more generally the northern coast of the Black Sea
  • Italy

    This event decreased the overall genetic diversity in Europe, a "result of drift, consistent with an inferred population bottleneck during the Last Glacial Maximum". As the glaciers receded from about 16,000–13,000 years ago, Europe began to be slowly repopulated by people from refugia, leaving genetic signatures.

    Some Y haplogroup I clades appear to have diverged from their parental haplogroups sometime during or shortly after the LGM. Haplogroup I2 is prevalent in the western Balkans, as well as the rest of southeastern and central-eastern Europe in more moderate frequencies. Its frequency drops rapidly in central Europe, suggesting that the survivors bearing I2 lineages expanded predominantly through south-eastern and central-eastern Europe.

    Cinnioglu sees evidence for the existence of an Anatolian refuge, which also harboured Hg R1b1b2. Today, R1b dominates the Y-chromosome landscape of western Europe, including the British Isles, suggesting that there could have been large changes in population composition caused by migrations after the LGM.

    Semino, Passarino and Pericic place the origins of haplogroup R1a within the Ukrainian ice-age refuge. Its current distribution in eastern Europe and parts of Scandinavia is in part reflective of a re-peopling of Europe from the southern Russian/Ukrainian steppes after the Late Glacial Maximum.

    From an mtDNA perspective, Richards et al. found that the majority of mtDNA diversity in Europe is accounted for by post-glacial re-expansions during the late upper Palaeolithic/ Mesolithic. "The regional analyses lend some support to the suggestion that much of western and central Europe was repopulated largely from the southwest when the climate improved. The lineages involved include much of the most common haplogroup, H, as well as much of K, T, W, and X." The study could not determine whether there were new migrations of mtDNA lineages from the near east during this period; a significant input was deemed unlikely.

    An alternative model with more refugia was discussed by Bilton et al.

    Neolithic migrations

    A big cline in genetic variation that has long been recognised in Europe seems to show important dispersals from the direction of the Middle East. This has often been linked to the spread of farming technology during the Neolithic, which has been argued to be one of the most important periods in determining modern European genetic diversity.

    The Neolithic started with the introduction of farming, beginning in SE Europe approximately 7000–3000 BC, and extending into NW Europe between 4500–1700 BC. During this era, the Neolithic revolution led to drastic economic as well as socio-cultural changes in Europe and this is also thought to have had a big effect on Europe's genetic diversity, especially concerning genetic lineages entering Europe from the Middle East into the Balkans. There were several phases of this period:

  • In a late European Mesolithic prelude to the Neolithic, it appears that Near Eastern peoples from areas that already had farming, and who also had sea-faring technology, had a transient presence in Greece, for example at Franchthi Cave.
  • There is consensus that agricultural technology and the main breeds of animals and plants which are farmed entered Europe from somewhere in the area of the Fertile Crescent, and specifically the Levant region from the Sinai to Southern Anatolia. (Less certainly, this agricultural revolution is sometimes argued to have in turn been partly triggered by movements of people and technology coming across the Sinai from Africa.)
  • A later stage of the Neolithic, the so-called Pottery Neolithic, saw an introduction of pottery into the Levant, Balkans and Southern Italy (it had been present in the area of modern Sudan for some time before it is found in the Eastern Mediterranean but it is thought to have developed independently) and this may have also been a period of cultural transfer from the Levant into the Balkans.

    Spread of Neolithic technology

    An important issue regarding the genetic impact of neolithic technologies in Europe is the manner by which they were transferred into Europe; whether farming was introduced by a significant migration of farmers from the Near East (Cavalli-Sforza's biological demic diffusion model) or a "cultural diffusion" or a combination of the two. Secondarily, population geneticists have tried to clarify whether any genetic signatures of Near Eastern origin correspond to the expansion routes postulated by the archaeological evidence.

    Martin Richards estimated that only 11% of European mtDNA is due to immigration in this period, suggesting that farming was spread primarily by being adopted by indigenous Mesolithic populations, rather than by immigration from the Near East. Gene flow from SE to NW Europe seems to have continued in the Neolithic, the percentage significantly declining towards the British Isles. Classical genetics also suggested that the largest admixture to the European Paleolithic/Mesolithic stock was due to the Neolithic revolution of the 7th to 5th millennia BC. Three main mtDNA gene groups have been identified as contributing Neolithic entrants into Europe: J, T1 and U3 (in that order of importance). With others, they amount to around 20% of the gene pool.

    In 2000, Semino's study on Y DNA revealed the presence of haplotypes belonging to the large clade E1b1b1 (E-M35). These were predominantly found in the southern Balkans, southern Italy and parts of Iberia. Semino identified this pattern, along with J haplogroup subclades, as the Y-DNA component of Cavalli-Sforza's Neolithic demic diffusion of farmers from the Near East. Rosser et al. instead saw it as a (direct) 'North African component' in European genealogy, although they did not propose a timing and mechanism to account for it. Underhill and Kivisild (2007) also described E1b1b as representing a late-Pleistocene migration from Africa to Europe over the Sinai Peninsula in Egypt, evidence for which does not show up in mitochondrial DNA.

    Concerning the timing of the distribution and diversity of V13, however, Battaglia et al. (2008) proposed an earlier movement whereby the E-M78* lineage ancestral to all modern E-V13 men moved rapidly out of a Southern Egyptian homeland and arrived in Europe with only Mesolithic technologies. They then suggest that the E-V13 sub-clade of E-M78 only expanded subsequently, as native Balkan 'foragers-cum-farmers' adopted Neolithic technologies from the Near East. They propose that the first major dispersal of E-V13 from the Balkans may have been in the direction of the Adriatic Sea with the Neolithic Impressed Ware culture, often referred to as Impressa or Cardial. Peričic et al. (2005) instead propose that the main route of E-V13 spread was along the Vardar-Morava-Danube river 'highway' system.

    In contrast to Battaglia, Cruciani et al. (2007) tentatively suggested (i) a different point where the V13 mutation happened on its way from Egypt to the Balkans via the Middle East, and (ii) a later dispersal time. The authors proposed that the V13 mutation first appeared in western Asia, where it is found in low but significant frequencies, whence it entered the Balkans sometime after 11 kYa. It later experienced a rapid dispersal in Europe, which they dated to c. 5300 years ago, coinciding with the Balkan Bronze Age. Like Peričic et al., they consider that "the dispersion of the E-V13 and J-M12 haplogroups seems to have mainly followed the river waterways connecting the southern Balkans to north-central Europe".

    More recently, Lacan et al. (2011) announced that a 7000-year-old skeleton found in a Neolithic context in a Spanish funeral cave was an E-V13 man. (The other specimens tested from the same site were in haplogroup G2a, which has been found in Neolithic contexts throughout Europe.) Using 7 STR markers, this specimen was identified as being similar to modern individuals tested in Albania, Bosnia, Greece, Corsica, and Provence. The authors therefore proposed that, whether or not the modern distribution of E-V13 is a result of more recent events, E-V13 was already in Europe during the Neolithic, carried by early farmers from the Eastern Mediterranean to the Western Mediterranean, much earlier than the Bronze Age. This supports the proposals of Battaglia et al. rather than Cruciani et al., at least concerning the earliest European dispersals, but E-V13 may have dispersed more than once. It has also been proposed that the modern distribution of E-V13 in Europe is at least partly caused by movements of people in the Roman era, more recent still than the Bronze Age. (See below.)
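
    How a specimen typed at a handful of STR markers might be compared with modern samples can be sketched as follows. This is only an illustration under stated assumptions: the text does not describe the procedure actually used, so a simple count of mismatching markers stands in for it. The marker names are real Y-STR loci, but every repeat count and sample label below is hypothetical and chosen purely for demonstration.

```python
from typing import Dict, List, Tuple

# A haplotype maps an STR marker name to its repeat count.
Haplotype = Dict[str, int]


def str_mismatches(a: Haplotype, b: Haplotype) -> int:
    """Count the shared markers at which two STR haplotypes differ."""
    shared = set(a) & set(b)
    if not shared:
        raise ValueError("haplotypes share no markers")
    return sum(1 for marker in shared if a[marker] != b[marker])


def rank_by_similarity(query: Haplotype,
                       panel: Dict[str, Haplotype]) -> List[Tuple[str, int]]:
    """Return (sample name, mismatch count) pairs, most similar first."""
    scored = [(name, str_mismatches(query, hap)) for name, hap in panel.items()]
    # Sort by mismatch count, then by name so that ties are ordered reproducibly.
    return sorted(scored, key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical repeat counts, for illustration only.
    ancient = {"DYS19": 13, "DYS390": 24, "DYS391": 10}
    modern_panel = {
        "sample_A": {"DYS19": 13, "DYS390": 24, "DYS391": 10},
        "sample_B": {"DYS19": 14, "DYS390": 24, "DYS391": 11},
    }
    for name, mismatches in rank_by_similarity(ancient, modern_panel):
        print(name, mismatches)
```

    A mismatch count treats every marker equally and ignores how many repeats separate two alleles; a stepwise distance would be a natural refinement, but it is likewise not something the text specifies.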

    After an initial focus upon E1b1b as a Neolithic marker, a more recent study, published in January 2010, looked at Y haplogroup R1b1b, which is much more common in Western Europe. Mark Jobling said: "We focused on the commonest Y-chromosome lineage in Europe, carried by about 110 million men, it follows a gradient from south-east to north-west, reaching almost 100% frequency in Ireland. We looked at how the lineage is distributed, how diverse it is in different parts of Europe, and how old it is." The results suggested that the lineage R1b1b2 (R-M269), like E1b1b or J lineages, spread together with farming from the Near East. Dr Patricia Balaresque added: "In total, this means that more than 80% of European Y chromosomes descend from incoming farmers. In contrast, most maternal genetic lineages seem to descend from hunter-gatherers. To us, this suggests a reproductive advantage for farming males over indigenous hunter-gatherer males during the switch from hunting and gathering, to farming".

    A more recent article concerning R1b made the counter-claim that "the data are still controversial and the analyses so far performed are prone to a number of biases" and proposed that the data are best explained by "an earlier, pre-Neolithic dispersal of haplogroups from a common ancestral gene pool".

    One hypothesis—the Anatolian hypothesis—suggests an origin of the Indo-Europeans in Anatolia with an expansion due to the Neolithic revolution.

    Bronze and Iron Age migrations

    The Bronze Age saw the development of long-distance trading networks, particularly along the Atlantic Coast and in the Danube valley. There was migration from Norway to Orkney and Shetland in this period (and to a lesser extent to mainland Scotland and Ireland). There was also migration from Germany to eastern England. Martin Richards estimated that there was about 4% mtDNA immigration to Europe in the Bronze Age.

    Another theory about the origin of the Indo-European languages centres on a hypothetical Proto-Indo-European people, who are traced, in the Kurgan hypothesis, to north of the Black and Caspian Seas at about 4500 BC. They domesticated the horse and possibly invented the wheel, and are considered to have spread their culture and genes across Europe. The Y haplogroup R1a is a proposed marker of these "Kurgan" genes, as is the Y haplogroup R1b, although these haplogroups as a whole may be much older than the language family.

    The rate of their physical expansion would have declined at the western edge of the steppe, but carriers of the R1a haplogroup are present in substantial numbers as far west as Germany. The Kurgan culture and language went farther, carried by the R1b haplogroup, eventually replacing most cultures and languages all the way to the Atlantic. During the Iron Age, Celts are recorded as having moved from Gaul into Italy, Eastern Europe and Anatolia. The relationship between the Celts of Gaul and those of Spain is unclear, as any migration occurred before records exist.

    In the far north, carriers of the Y-haplogroup N arrived in Europe from Siberia, eventually expanding as far as Finland, though the specific timing of their arrival is uncertain. The most common North European subclade, N1c1, is estimated to be around 8,000 years old. There is evidence of human settlement in Finland dating back to 8500 BCE, linked with the Kunda culture and its putative ancestor, the Swiderian culture, but the latter is thought to have a European origin. The geographical spread of haplogroup N in Europe is well aligned with the Pit–Comb Ware culture, whose emergence is commonly dated c. 4200 BCE, and with the distribution of Uralic languages. Mitochondrial DNA studies of haplogroup U5 in Sami people are consistent with multiple migrations to Scandinavia from the Volga-Ural region, starting 6,000 to 7,000 years before present.

    The relationship between roles of European and Asian colonists in the prehistory of Finland is a point of some contention, and some scholars insist that Finns are "predominantly Eastern European and made up of people who trekked north from the Ukrainian refuge during the Ice Age". Farther east, the issue is less contentious. Haplogroup N carriers account for a significant part of all non-Slavic ethnic groups in northern Russia, including 37% of Karelians, 35% of Komi people (65% according to another study), 67% of Mari people, as many as 98% of Nenets people, 94% of Nganasans, and 86% to 94% of Yakuts.

    Roman and post-Roman period

    During the period of the Roman Empire, historical sources show that there were many movements of people around Europe, both within and outside the Empire. Historic sources sometimes cite instances of genocide inflicted by the Romans upon rebellious provincial tribes. If this did in fact occur, it would have been limited given that modern populations show considerable genetic continuity in their respective regions. The process of 'romanisation' appears to have been accomplished by the colonisation of provinces by a few Latin speaking administrators, military personnel, settled veterans, and private citizens (merchants, traders) who emanated from the Empire's various regions (and not merely from Roman Italy). They served as a nucleus for the acculturation of local notables.

    Given their small numbers and varied origins, Romanization does not appear to have left distinct genetic signatures in Europe. Indeed, Romance-speaking populations in the Balkans, such as Romanians, Aromanians and Moldovans, have been found to genetically resemble neighbouring Greek and South Slavic-speaking peoples rather than modern Italians, indicating that they are, genetically speaking, mainly native to this area, chiefly through the I2a2 (M-423) and E1b1b1 (V-13) haplogroups.

    Steven Bird has speculated that E1b1b1a was spread during the Roman era through Thracian and Dacian populations from the Balkans into the rest of Europe.

    Concerning the late Roman period of (not only) Germanic "Völkerwanderung", some suggestions have been made, at least for Britain, with Y haplogroup I1a being associated with Anglo-Saxon immigration in eastern England, and R1a being associated with Norse immigration in northern Scotland.

    References

    Genetic history of Europe Wikipedia