Field of Science

Showing posts with label genes. Show all posts

Doug McDonald's BioGeographical Ancestry Test

In my quest to explore various aftermarket sources of information on my 23andMe raw data, I emailed my data to Doug McDonald for his BGA testing service. I first found out about his service on the 23++ Chrome extension website, although the email they have listed for McDonald is wrong. I finally got the correct email (mcdonald at scs dot uiuc dot edu) through the 23andMe community forums.

Here's what he sent back:

Michelle: The program says you are English, 100%. However, on the chromosomes one sees a small <.5% American block that is fairly strong and likely real, but not 100% sure. The other blocks on the chromosomes are likely noise.

Here's a look at the chromosome painting you get from McDonald (similar to 23andMe's "ancestry painting"):


The red sections are European, the brown sections are no data/not enough data, the two small blue sections are African, the small grey section is Mideastern South Asian, and the green section is Amerindian. I would like to know which side the Amerindian segment is on so I can explore this further. My mother's 23andMe results are due any week now, so I will likely email her data to him as soon as I get my hands on it.

You also get a number of what I think are PCA plots, showing where you map out with respect to his reference groups. He only sent one for me, but he sent multiple for my boyfriend. Here's mine:


The crosshairs show where I fall, in the middle of the English cluster, close to the French side. I thought I had more of an Irish minority than a French minority, but hey, whatever. I notice he doesn't have a sample German population, either.

Most of this is not new information to me, but it does provide corroborating evidence for the ~1% Amerindian that shows up for me in Razib's admixture analysis. The results for my boyfriend more or less agree with what came up for him at the Harappa Ancestry Project.

My Dodecad identity-by-state neighbors.

Dienekes recently computed an identity-by-state (IBS is his abbreviation, although that means something very different to me) similarity matrix for all the individuals in the Dodecad Ancestry Project. The IBS score is a ratio between individuals that, to my understanding, can be read as a proxy for genetic similarity.
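To make the idea concrete, here is a minimal sketch of one common way to score identity-by-state between two people. I don't know the exact formula Dienekes used, so treat this as an assumption: at each SNP, count how many alleles (0, 1, or 2) the two genotypes share, then divide by the maximum possible. The function name and the toy genotypes are made up for illustration.

```python
def ibs_score(genotypes_a, genotypes_b):
    """Fraction of alleles shared identical-by-state across SNPs called in both samples.

    Each genotype is a two-letter string such as "AG"; "--" marks a no-call.
    """
    shared = 0
    compared = 0
    for a, b in zip(genotypes_a, genotypes_b):
        if len(a) != 2 or len(b) != 2 or "-" in a or "-" in b:
            continue  # skip SNPs without a usable call in both samples
        remaining = list(b)
        for allele in a:
            if allele in remaining:
                remaining.remove(allele)  # each allele in b can match only once
                shared += 1
        compared += 1
    if compared == 0:
        raise ValueError("no SNPs with calls in both samples")
    return shared / (2 * compared)


# Toy genotypes, not real data.
me = ["AA", "AG", "CC", "--", "GT"]
neighbor = ["AA", "GG", "CT", "CC", "TG"]
print(ibs_score(me, neighbor))
```

A score of 1.0 would mean the two samples carry the same alleles at every compared SNP; ranking everyone else by this score is, roughly, what a "closest neighbors" list does.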

He also posted an R object that you can download to find your closest IBS neighbors. I ran the program for myself (DOD493) and added the ancestry of DOD participants that have posted in the ancestry thread. In the interest of saving space, I have removed all the DOD samples that have not disclosed their ancestry (you can see which ones are missing in the "Rank" column). Click to enlarge:


Not much here surprises me. Most of my neighbors have at least one of my ancestral components (British, German, and French). Obvious exceptions are the two Lithuanians, the Spaniard, the Scandinavian, and the Belarusian. My guess is that the Eastern Europeans have unstated Western European admixture and that the Spaniard probably has French admixture specifically.

Something I do find a bit puzzling is that I don't seem to have much genetic similarity with Italians. This has puzzled me ever since I began trying to decipher my own admixture. All through my adult life I have been approached by strangers at the bus stop or in restaurants to tell me that I look identical to their Sicilian (always Sicilian for some reason) niece/grandma/whatever. (I think this may just be the nature of Italians, that they like to seek out their own kind. I've never had someone come up to me and tell me I look like I belong to any other ethnic group.) I expected to find some Italian admixture, probably on my paternal grandmother's side, just because so many Italians seem to think I'm one of them. If I'm not, why is my phenotype so misleading?

Edited to add photo evidence. Separated at birth, obviously. Dude, Snooki doesn't even have brown eyes.

Promethease results

EDIT: I can't believe I forgot to mention that 23andMe is having a sale today! Today only (Monday, April 11th), up-front fees (usually $199) are waived, so that all you have to pay is shipping and the $9/mo subscription service for 12 months. You save 2/3rds of the usual cost!

While browsing the 23andMe community forums this weekend, I found out that there are a bunch of projects and programs you can plug your raw data into to get independent genotype analyses. Some of these projects I'd heard of before by reading genetics blogs, but others I had not. One that caught my eye is Promethease, a program that you run from your computer which gives you results very similar to 23andMe's health section. It compares your raw data to the SNPedia wiki and gives you a summary of your results in an html file that is saved to your computer. Although the program uses information from the internet to interpret your data, your data and results are stored and processed on your computer, so none of your information is sent over the internet. There's an optional $2 donation (paid over Amazon.com) for greater speed and extra features, which I opted to pay because a) I'm impatient and b) I like to support things that I find useful.
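The basic lookup idea can be sketched in a few lines. This is not Promethease's actual code; it assumes the raw data file is tab-separated with `#` comment lines and columns for rsID, chromosome, position, and genotype, and it uses a tiny hand-written dictionary (notes paraphrased from this post) in place of SNPedia. The chromosome/position values and the ID `rs0000000` are placeholders.

```python
import csv
import io

# Illustrative notes keyed by rsID, paraphrased from this post (not SNPedia's data).
NOTES = {
    "rs53576": "OXTR SNP with a reported association with empathy",
    "rs806380": "G allele may protect against cannabis addiction",
    "rs4570625": "G;G associated with panic disorder in females",
}

# Assumed layout: '#' comment lines, then rsid, chromosome, position, genotype.
# Chromosome and position values here are placeholders, not real coordinates.
SAMPLE_RAW = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs53576\t1\t1\tAA\n"
    "rs806380\t1\t2\tAG\n"
    "rs4570625\t1\t3\tGG\n"
    "rs0000000\t1\t4\tCT\n"
)


def read_genotypes(handle):
    """Return {rsid: genotype} from a raw data file, ignoring comment lines."""
    data_lines = (line for line in handle if not line.startswith("#"))
    rows = csv.reader(data_lines, delimiter="\t")
    return {row[0]: row[3] for row in rows if len(row) >= 4}


def annotate(genotypes, notes):
    """Pair each SNP that has a note with the genotype found in the raw data."""
    return [(rsid, genotypes[rsid], notes[rsid])
            for rsid in sorted(notes) if rsid in genotypes]


report = annotate(read_genotypes(io.StringIO(SAMPLE_RAW)), NOTES)
for rsid, genotype, note in report:
    print(f"{rsid}\t{genotype}\t{note}")
```

Everything happens locally on the parsed file, which is the property that matters here: the only thing that would need to come from the internet is the annotation table, not your genotypes.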

The interface is less user-friendly than 23andMe's graphical interface, but it is more in-depth. I doubt that casual users will want to bother with trying to make sense of it, but there are some gems to be found. The report starts with a list of your "most interesting" SNPs (I'd like to know what algorithm they use to determine the relative interestingness of a given SNP), which for me starts off by telling me that I'm female (thanks for noticing), and that I'm a carrier of hemochromatosis, which I already knew from 23andMe. However it does get more interesting from there:


Above I have extracted the analysis for a SNP in the oxytocin receptor gene (OXTR) that has an association with empathy (click to enlarge). As you can see, I am an A;A homozygote at this SNP, rs53576. The name of the SNP on the left is clickable and will take you to the SNPedia page for that SNP. The SNPedia page for rs53576 includes links to papers that have studied the gene, a brief analysis of results, and even a link to Ed Yong's post covering the OXTR gene! On a personal note, I don't want children and have often been told that I have issues with understanding how my actions impact other people's feelings, so I don't find these results surprising. My boyfriend is a G;G homozygote and is much more empathetic than I am (and wants children way more than I do). Anecdotal evidence, yes, but corroborating evidence all the same.

Other things I learned from Promethease that weren't covered by 23andMe:
  • I am 7x less likely to respond favorably to antidepressants (I can vouch that this is true, I have been on several SSRIs and hated all of them).
  • I have a G allele at rs806380 that may protect me from becoming addicted to cannabis. That tremor in the force you just felt was millions of stoners rising up in anger at the idea that cannabis might be addictive, then suddenly silenced by the distractions of Halo Reach and stuffed crust pizza.
  • I am G;G homozygous at rs4570625. The bad news is that this SNP is associated with panic disorder in females, which I definitely, definitely have. The good news is that G;G homozygotes respond favorably to placebo therapy for their anxiety.
Overall, I'd say this was worth my $2. 23andMe does a better job of visualizing the results and combining related information into a cohesive package, but Promethease is so cheap (or even free, if you're willing to let it run for a few hours instead of 5 minutes) that it is worth the price or the wait if you're interested in getting a "second opinion" of sorts.

Yet another ADMIXTURE post.

I've made squinty eyes at enough ADMIXTURE charts by now that I'm starting to see the trends that keep popping up under different types of runs (reference populations, K numbers, etc). While things begin to break down at high K values, they remain pretty consistent at low K values. I am almost always ~10% Near Eastern (or Middle Eastern, or West Asian, or whatever you want to call it), ~40% South European, and ~50% North European. I think I can take at least that much as gospel at this point. When Native Americans are in the mix, I am always at least 1% Native American. I also often get blips of South/East Asian, but I am still not convinced that that one isn't just noise.

Razib's latest ADMIXTURE run, which I've included here, is typical of what I just described above. It is also almost exactly what came up for me at Dodecad (DOD493). The "Brown Dude" I've included at the bottom of the chart is my boyfriend. His parents are from Karnataka, in southwest India, and his results don't surprise me. I am used to looking at Bengali ADMIXTURE because of the samples in Razib's dataset, and BD's results differ mainly in that he has more West Asian and less East Asian. I'd say that's pretty geographically sound.

On a personal note, 23andMe thinks that there's a 0% possibility that BD and I are related, so that's a relief. He's also not a carrier for the two diseases that I'm a carrier for, which is also a relief. We both have the genes for wet earwax, photic sneeze reflex, and fast caffeine metabolism (I know you were dying to know), so our children will likely be brown-eyed, sticky-eared, constantly sneezing coffee guzzlers, right?

More on ADMIXTURE.

Here's another ADMIXTURE analysis of my heritage that uses the same K=12 but with different sample populations for comparison. You can compare this to my prior post to see how my admixture has 'changed'. This is a good illustration of the limitations of this method: it is highly dependent on what you put in it!

My Asian admixture is gone, and my Arab/Near Eastern admixture is greatly reduced. But not only that, look at what has happened with even my European admixture! It seems to split three ways now: Western (typified by the Spanish Basque population), Southern (typified by Sardinians; Sardinia is an island off mainland Italy), and North/eastern (closest match appears to be equally Russian and the islands off the northern coast of Scotland). I still have the 1% Amerindian; that doesn't seem to be going anywhere as long as the Pima are in the mix. I'd be interested to see what would happen if other, more northern Native American sample populations were included. (Perhaps I should learn how to run ADMIXTURE myself, but I don't have a computer that runs Linux, and I'm loath to install a dual-boot.) At any rate, this more closely resembles how I had previously thought of my heritage, but is it actually closer to the truth? It's hard to say for certain.

Edit: Another interesting thing is what I seem to lack in comparison to your average white American, which is the Berber (North African, with lots of immigration to Europe; think Othello and Morgan Freeman in Robin Hood) genetic input in bright pink. Huh.

My genotyping results, plus a brief introduction to population genetics

My 23 and Me results finally came, and I've spent what little free time I've had this week exploring them. If you are unfamiliar, 23 and Me is a personal genotyping service. In short, I sent them some DNA, and they identified which variants I carry at a large set of known sites and gave me the results. The genotyping method used by 23 and Me is different from genome sequencing: instead of determining every single nucleotide in my entire genome, they test specific positions where people are known to differ, called single-nucleotide polymorphisms (SNPs), and identify which alleles I have at each position. This is much cheaper and faster than sequencing (you may remember that your DNA contains a lot of non-coding regions, redundancies, etc), and still provides a good amount of meaningful results.

My friend Razib was kind enough to include my raw data in his hobby genealogy dataset that he's been playing with in a program called ADMIXTURE. The idea behind this software is that you input the genetic data for a group of people, and the program determines the relative contribution of hypothetical "parent" populations to each individual's genome. Let's look at some examples:

Click to enlarge. The K=2 at the bottom means that this plot was generated assuming only two parent populations. Usually when you set K=2 you wind up separating African vs. non-African genetic components, but this dataset doesn't have any African populations. It is almost entirely Eurasian, so the first split is actually east (Asian) to west (European). The red component shows the proportion of European ancestry, and the teal component shows the proportion of Asian ancestry, if we assume that these are the only two reference populations. The biggest problem with this kind of analysis is that it is often hard to determine which K's are meaningful. Razib has a good caveat on the limitations of ADMIXTURE here. Anyhow, my data is at the very bottom. As you can see, when K=2 I am overwhelmingly European with a tiny Asian component. This is more or less in line with other white Americans and many Europeans (see the Irish, Swedish, and French samples), but also very similar to, say, Palestinians. Clearly we need to add more parent populations.
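For the curious, the K=2 idea can be sketched for a single person. This is a simplified illustration, not ADMIXTURE's algorithm: the real program estimates both the ancestry fractions and the parent populations' allele frequencies from the data, whereas this sketch assumes the two parent populations' allele frequencies are already known. It then tries every ancestry fraction from 0 to 1 and keeps the one that makes the observed genotypes most likely. All names and numbers below are made up.

```python
import math


def log_likelihood(q, genotypes, freqs_a, freqs_b):
    """Log-likelihood of genotypes if a fraction q of ancestry comes from population A.

    genotypes: count (0, 1, or 2) of the reference allele at each SNP.
    freqs_a, freqs_b: reference-allele frequencies in parent populations A and B,
    assumed known here and strictly between 0 and 1.
    """
    total = 0.0
    for g, pa, pb in zip(genotypes, freqs_a, freqs_b):
        f = q * pa + (1 - q) * pb  # expected allele frequency for this mix
        total += g * math.log(f) + (2 - g) * math.log(1 - f)
    return total


def estimate_fraction(genotypes, freqs_a, freqs_b, steps=100):
    """Grid search for the ancestry fraction with the highest likelihood."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates,
               key=lambda q: log_likelihood(q, genotypes, freqs_a, freqs_b))


# Made-up allele frequencies for two hypothetical parent populations.
freqs_a = [0.9, 0.8, 0.9, 0.8]
freqs_b = [0.1, 0.2, 0.1, 0.2]
genotypes = [2, 2, 1, 2]  # one hypothetical person

print(estimate_fraction(genotypes, freqs_a, freqs_b))
```

With only four SNPs the estimate is very rough; real runs use hundreds of thousands of SNPs across many people, which is also why the choice of reference samples and K changes the picture so much.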

Now we have K=3, so three theoretical parent populations. The first thing you may notice is that the Pima group separates out from the others as the only group with a large amount of lime green. The Pima are a Native American ethnic group, and therefore the most genetically distant from the other groups because of the amount of time they spent isolated from the rest of the world. So now our three reference groups are European (still red), Asian (now blue), and "Native American" (lime green). However, notice that the group with the second-largest green component is the Yakut, an ethnic group from Siberia, in Russia. It is more likely that when we set K=3, the third group is actually a Siberian component. This makes a bit of sense because Native American ethnic groups are widely believed to be descended from a Siberian group that crossed the Bering Strait into Alaska. And the green seen in the other populations would likely be a result of Siberian groups moving the other way, into western Eurasia. Again, my data is at the bottom. If we assume K=3, I have much less Asian genetic input because most of it has actually segregated out into Siberian input.

You with me so far? Let's get a little crazy.

This time K=12. The parent populations can be roughly described as: Dai (I am actually unsure exactly where this population is from, but signs point to south Asia, China/Myanmar-ish), Druze (Near East ethnic group, primarily in the countries surrounding Israel), Lahu (Southeast Asian), Southern European, Arab (1), Native American, Northern European, Arab (2), Siberian, Northeast Asian, Native American (2), and South Asian. The two different Native American groups are likely representing a north/south split, but that is just speculation on my part.

As you can see, I am somewhat unsurprisingly very European. 93% of my genome shuffles out as European, although I am a bit surprised that it is more southern than northern, especially considering that all of my genealogical lines that I can trace back to country of origin are overwhelmingly from the British Isles, with French and German minorities. This difference is probably due to early northward migrations of southern European populations into the British Isles, although I cannot rule out the possibility of a more recent southern European ancestor. My paternal grandmother's lines are largely unknown with respect to country of origin, and I always thought she was a bit swarthy (see photo). I have had multiple strangers approach me to ask if I'm Italian (I don't see it personally, but whatever), so I think it is fairly likely that she had more recent Mediterranean ancestry.

I have roughly the same amount of Near East and Arab ancestry as your average white American or European, but I seem to have a bit more Southeast Asian? Not much more, but I'm still not sure how to reconcile that. I have only trace amounts of Native American ancestry, a little bit on each side of the split, but not significantly more than your average American. So, much to my family's chagrin, I think it is very unlikely that I will find any recent Native American ancestors in my genealogical searches. Sorry guys.