Field of Science


Doug McDonald's BioGeographical Ancestry Test

In my quest to explore third-party analyses of my 23andMe raw data, I emailed my data to Doug McDonald for his BGA testing service. I first found out about his service on the 23++ Chrome extension website, although the email address listed there for McDonald is wrong. I finally got the correct address (mcdonald at scs dot uiuc dot edu) through the 23andMe community forums.

Here's what he sent back:

Michelle: The program says you are English, 100%. However, on the chromosomes one sees a small <.5% American block that is fairly strong and likely real, but not 100% sure. The other blocks on the chromosomes are likely noise.

Here's a look at the chromosome painting you get from McDonald (similar to 23andMe's "ancestry painting"):


The red sections are European, the brown sections are no data/not enough data, the two small blue sections are African, the small grey section is Mideastern/South Asian, and the green section is Amerindian. I would like to know which side of my family the Amerindian segment comes from so I can explore this further. My mother's 23andMe results are due any week now, so I will likely email her data to him as soon as I get my hands on it.

You also get a number of what I think are PCA plots, showing where you map out with respect to his reference groups. He only sent one for me, but he sent multiple for my boyfriend. Here's mine:


The crosshairs show where I fall, in the middle of the English cluster, close to the French side. I thought I had more of an Irish minority than a French minority, but hey, whatever. I notice he doesn't have a sample German population, either.

Most of this is not new information to me, but it does provide corroborating evidence for the ~1% Amerindian that shows up for me in Razib's admixture analysis. The results for my boyfriend more or less agree with what came up for him at the Harappa Ancestry Project.

More on ADMIXTURE.

Here's another ADMIXTURE analysis of my heritage that uses the same K=12 but with different sample populations for comparison. You can compare this to my prior post to see how my admixture has 'changed'. This is a good illustration of the limitations of this method: it is highly dependent on what you put in it!
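To see why the comparison populations matter so much, here is a toy Python sketch. It is not how ADMIXTURE itself is run; it just fits one person's split between two reference populations by brute force, using a simple binomial model. The function name `best_fraction` and every number below are invented for illustration and are not real genetic data.

```python
import math

def best_fraction(genotypes, freqs_a, freqs_b, steps=100):
    """Grid-search the share of ancestry from reference A (the rest from B)
    that makes the observed genotypes most likely.

    genotypes: copies (0, 1, or 2) of a chosen allele at each SNP
    freqs_a, freqs_b: that allele's frequency in each reference population
    """
    best_q, best_ll = 0.0, -math.inf
    for i in range(steps + 1):
        q = i / steps
        ll = 0.0
        for g, pa, pb in zip(genotypes, freqs_a, freqs_b):
            f = q * pa + (1 - q) * pb  # blended allele frequency
            ll += g * math.log(f) + (2 - g) * math.log(1 - f)
        if ll > best_ll:
            best_q, best_ll = q, ll
    return best_q

# Made-up person: 10 SNPs, carrying the allele on 14 of 20 chromosomes.
person = [2, 2, 2, 2, 1, 1, 1, 1, 1, 1]
ref_a = [0.9] * 10        # made-up reference population A
ref_b_far = [0.1] * 10    # a very different comparison population
ref_b_near = [0.6] * 10   # a comparison population much closer to A

share_vs_far = best_fraction(person, ref_a, ref_b_far)
share_vs_near = best_fraction(person, ref_a, ref_b_near)
print(share_vs_far, share_vs_near)
```

The same genotypes come out as mostly "A" against the distant comparison group and as mostly "not A" against the closer one, even though nothing about the person changed.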

My Asian admixture is gone, and my Arab/Near Eastern admixture is greatly reduced. But not only that, look at what has happened with even my European admixture! It seems to split three ways now: Western (typified by the Spanish Basque population), Southern (Sardinian; Sardinia is an island off mainland Italy), and Northeastern (closest match appears to be equally Russian and the islands off the northern coast of Scotland). I still have the 1% Amerindian; that doesn't seem to be going anywhere as long as the Pima are in the mix. I'd be interested to see what would happen if other, more northern Native American sample populations were included. (Perhaps I should learn how to run ADMIXTURE myself, but I don't have a computer that runs Linux, and I'm loath to install a dual-boot.) At any rate, this more closely resembles how I had previously thought of my heritage, but is it actually closer to the truth? It's hard to say for certain.

Edit: Another interesting thing is what I seem to lack in comparison to your average white American, which is the Berber (North African, with lots of immigration to Europe; think Othello and Morgan Freeman in Robin Hood) genetic input in bright pink. Huh.

My genotyping results, plus a brief introduction to population genetics

My 23andMe results finally came, and I've spent what little free time I've had this week exploring them. If you are unfamiliar, 23andMe is a personal genotyping service. In short, I sent them some DNA, they identified which variants I carry at a large set of known sites, and they gave me the results. The genotyping method used by 23andMe is different from genome sequencing: instead of determining every single nucleotide in my entire genome, they look only at specific positions known to vary between people, called single-nucleotide polymorphisms (SNPs), and identify which alleles I carry at each one. This is much cheaper and faster than sequencing (you may remember that your DNA contains a lot of non-coding regions, redundancies, etc.), and it still provides plenty of meaningful results.
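To make "one result per SNP" concrete, here is a minimal Python sketch of reading genotype calls from a raw data export. It assumes a tab-separated layout of rsid, chromosome, position, and genotype, with comment lines starting with `#` and `--` for a no-call; check your own file before relying on that. The function name and the rsIDs in the sample are hypothetical placeholders, not real markers from my data.

```python
def parse_raw_genotypes(lines):
    """Return {rsid: (chromosome, position, genotype)} from raw-data lines.

    Assumes tab-separated columns: rsid, chromosome, position, genotype.
    Blank lines and lines starting with '#' are skipped.
    """
    calls = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rsid, chrom, pos, genotype = line.split("\t")
        calls[rsid] = (chrom, int(pos), genotype)
    return calls

# Hypothetical sample lines, for illustration only.
sample = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs0000001\t1\t12345\tAG",
    "rs0000002\t2\t67890\tCC",
    "rs0000003\t7\t13579\t--",
]

calls = parse_raw_genotypes(sample)
# Keep only SNPs where a genotype was actually called.
called = {rsid: c for rsid, c in calls.items() if c[2] != "--"}
print(len(calls), "SNPs read,", len(called), "with a genotype call")
```

Each entry is a single position in the genome with the two alleles found there, which is all a genotyping chip reports; everything between those positions is simply not measured.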

My friend Razib was kind enough to include my raw data in his hobby genealogy dataset that he's been playing with in a program called ADMIXTURE. The idea behind this software is that you input the genetic data for a group of people, and the program determines the relative contribution of hypothetical "parent" populations to each individual's genome. Let's look at some examples:
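For the curious, the idea can be sketched in a few lines of Python. This is a simplified illustration of the kind of model such programs fit, not ADMIXTURE's actual code: each person's allele frequency at a SNP is treated as a weighted blend of the K parent populations' frequencies, and the weights are that person's ancestry fractions. The function names and toy numbers are my own assumptions.

```python
import math

def expected_allele_freq(q, pop_freqs):
    """Blend K parent-population allele frequencies at one SNP using
    one person's ancestry fractions q (which should sum to 1)."""
    return sum(qk * pk for qk, pk in zip(q, pop_freqs))

def log_likelihood(genotypes, q, freqs):
    """Log-likelihood of one person's genotypes (0, 1, or 2 copies of an
    allele per SNP) given ancestry fractions q and freqs[k][j], the allele
    frequency in parent population k at SNP j."""
    total = 0.0
    for j, g in enumerate(genotypes):
        f = expected_allele_freq(q, [pop[j] for pop in freqs])
        # Binomial(2, f) probability of seeing g copies of the allele
        total += math.log(math.comb(2, g)) + g * math.log(f) + (2 - g) * math.log(1 - f)
    return total

# Toy example with K=2 parent populations and 3 SNPs (made-up numbers).
freqs = [
    [0.9, 0.8, 0.1],  # parent population 1
    [0.1, 0.2, 0.9],  # parent population 2
]
person = [2, 2, 0]

mostly_pop1 = log_likelihood(person, [0.95, 0.05], freqs)
even_split = log_likelihood(person, [0.5, 0.5], freqs)
print(mostly_pop1 > even_split)
```

The real software has the harder job of estimating both the ancestry fractions and the parent-population frequencies at once, across everyone in the dataset, which is why the choice of K and of sample populations matters so much.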

The K=2 at the bottom means that this plot was generated assuming only two parent populations. Usually when you set K=2 you wind up separating African vs. non-African genetic components, but this dataset doesn't have any African populations. It is almost entirely Eurasian, so the first split is actually east (Asian) to west (European). The red component shows the proportion of European ancestry, and the teal component shows the proportion of Asian ancestry, if we assume that these are the only two reference populations. The biggest problem with this kind of analysis is that it is often hard to determine which values of K are meaningful. Razib has a good caveat on the limitations of ADMIXTURE here. Anyhow, my data is at the very bottom. As you can see, when K=2 I am overwhelmingly European with a tiny Asian component. This is more or less in line with other white Americans and many Europeans (see the Irish, Swedish, and French samples), but it is also very similar to, say, Palestinians. Clearly we need to add more parent populations.

Now we have K=3, so three theoretical parent populations. The first thing you may notice is that the Pima group separates out from the others as the only group with a large amount of lime green. The Pima are a Native American ethnic group, and therefore the most genetically distant from the other groups because of the amount of time they spent isolated from the rest of the world. So now our three reference groups are European (still red), Asian (now blue), and "Native American" (lime green). However, notice that the group with the second-largest green component is the Yakut, an ethnic group from Siberia, in Russia. So it is more likely that when we set K=3, the third group is actually a Siberian component. This makes sense because Native American ethnic groups are widely believed to be descended from a Siberian group that crossed the Bering Strait into Alaska. The green seen in the other populations would then likely be a result of Siberian groups moving the other way, into western Eurasia. Again, my data is at the bottom. If we assume K=3, I have much less Asian genetic input because most of it has actually segregated out into Siberian input.

You with me so far? Let's get a little crazy.

This time K=12. The parent populations can be roughly described as: Dai (I am actually unsure exactly where this population is from, but signs point to southern China, near Myanmar), Druze (Near East ethnic group, primarily in the countries surrounding Israel), Lahu (Southeast Asian), Southern European, Arab (1), Native American, Northern European, Arab (2), Siberian, Northeast Asian, Native American (2), and South Asian. The two different Native American groups likely represent a north/south split, but that is just speculation on my part.

As you can see, I am, somewhat unsurprisingly, very European. 93% of my genome shuffles out as European, although I am a bit surprised that it is more southern than northern, especially considering that all of my genealogical lines that I can trace back to a country of origin are overwhelmingly from the British Isles, with French and German minorities. This difference is probably due to early northward migrations of southern European populations into the British Isles, although I cannot rule out the possibility of a more recent southern European ancestor. My paternal grandmother's lines are largely unknown with respect to country of origin, and I always thought she was a bit swarthy (see photo). I have had multiple strangers approach me to ask if I'm Italian (I don't see it personally, but whatever), so I think it is fairly plausible that she had more recent Mediterranean ancestry.

I have roughly the same amount of Near East and Arab ancestry as your average white American or European, but I seem to have a bit more Southeast Asian? Not much more, but I'm still not sure how to reconcile that. I have the barest trace of Native American ancestry, a little bit on each side of the split, but not significantly more than your average American. So, much to my family's chagrin, I think it is very unlikely that I will find any recent Native American ancestors in my genealogical searches. Sorry guys.