Not soup yet!!
So… do you remember those Lipton soup commercials? The ones where the kids kept asking if it was “soup yet”?
That’s the question genealogists seem to be asking today about what are called ethnicity estimates: those percentages we see in our autosomal DNA results that suggest how much of our autosomal DNA we’ve inherited from ancestors who lived in different places around the world 500 or 1000 (or even more) years ago.
AncestryDNA has rolled out its new ethnicity estimates to its full database over the past month and — more than anything else — the results seem to be underscoring the reasons why The Legal Genealogist has one thing to say about these deep ancestral percentages:
Forget them.
They’re cocktail party conversation pieces — and little more. The science just isn’t there yet to back them up.
And because the only way to get these percentages is by comparing folks like you and me, alive today, to the test results of other people who are alive today (and not to the actual DNA of our ancient ancestors!!), the science may never really be there.
Just as a few examples, three of us who regularly blog about using DNA for genealogy have been left scratching our heads, and others have raised questions about the reference populations chosen to represent the deep ancestral populations. Those reference populations matter: our percentages come from comparing our own results to them.
• Debbie Cruwys Kennett of Cruwys news, who is British and whose ancestors have lived exclusively in Great Britain for generations, has substantially less British ancestry showing in her results than I do and, indeed, than many Americans do, despite the fact that our documented ancestry has far fewer British roots.[1]
• Roberta Estes of DNAeXplained saw her results go from 80% British Isles to 6% (4% Great Britain and 2% Irish), and from 12% Scandinavian to 10%. “Apparently Ancestry’s V1 was worse than we thought, given that my 80% majority ancestry turned into 6 and my 0% western Europe turned into 79%. Neither of these are correct,” she wrote. “Ancestry’s V2 seems to be somewhat better, but raises the same types of questions about the results.”[2]
• And in my own case, my well-documented Western European paternal side (all German, back hundreds of years) almost completely disappeared in my results, dropping from 43% Central European in the V1 estimates to 4% Europe West in the updated V2 estimates, while the amount of Scandinavian (considered well over-estimated in V1) actually doubled, from 15% in V1 to 31% in V2.[3]
Angie Bush, another genetic genealogist, commented on the DNAeXplained blog that “there is still significant improvement to be made in distinguishing Western European and British Isles admixture. This still seems to be quite problematic for many, including me. It seems that Ancestry has traded their v1 Scandinavian/British Isles problem for a new British Isles/Western Europe problem.”[4]
Now the fact is, what we’re seeing isn’t anybody’s fault. AncestryDNA is using the best available scientific information that it has to analyze the data. So is 23andMe. So are all the companies that do this sort of analysis.
The problem is, it’s all a numbers game based on one fundamental assumption: that people who live in an area today (say, modern Englishmen and -women) and who have all four of their grandparents born in that area are typical of the genetic signature of that population generations ago.
When AncestryDNA estimates that I have 31% Scandinavian ancestry, it does so based on the results of testing samples from 272 people who were born in Scandinavia and whose grandparents were also born there. When it says I have 49% from Great Britain, it does so based on the results of testing samples from 195 people.
The statisticians will tell you that these sample sizes are adequate to the task.
The results suggest that, as the old saying goes, there are three kinds of lies.
Lies, damned lies, and statistics.
SOURCES
1. Debbie Cruwys Kennett, “My updated ethnicity results from AncestryDNA – a British perspective,” Cruwys news, posted 17 Sep 2013 (http://cruwys.blogspot.com/ : accessed 27 Oct 2013).
2. Roberta Estes, “Ancestry’s Updated V2 Ethnicity Summary,” DNAeXplained, posted 17 Oct 2013 (http://dna-explained.com/ : accessed 27 Oct 2013).
3. See Judy G. Russell, “AncestryDNA begins rollout of update,” The Legal Genealogist, posted 13 Sep 2013 (https://www.legalgenealogist.com/blog : accessed 27 Oct 2013).
4. Angie Bush, comment on Roberta Estes, “Ancestry’s Updated V2 Ethnicity Summary,” DNAeXplained, posted 17 Oct 2013 (http://dna-explained.com/ : accessed 27 Oct 2013).
Every genealogist needs to add this 1954 classic volume to their bookshelf: How to Lie with Statistics by Darrell Huff. This ethnicity percentage silliness shows why Huff’s book will never go out of print.
I just wish people would stop being so focused on these percentages! Except for results that DO show some measure of recent origin ethnicity (for example, Native American segments), these percentages are simply NOT the reason to do autosomal testing.
Discussions like these leave me wondering about the reliability of Nat Geo’s Geno 2.0. It seems like it would be more reliable than Ancestry, FTDNA or 23andMe because of the way they are testing indigenous populations. But is it? What are your thoughts?
I suspect that National Geographic’s testing will ultimately give us a lot of answers we just don’t have now, Michelle. I wouldn’t put much stock in their reference population assignments now — but down the road the science underlying this stuff may get to be as good as it can be (note the caveat there) based on the Geno 2.0 results.
Interesting how different people’s percentages are changing with the “new” ancestry results.
Mine slightly decreased my British Isles percentage but essentially eliminated the previous iteration’s “Eastern Europe” percentage – a location from which I have no known ancestors. Now I have a lot of Central Europe – which better fits my known German ancestry.
And on behalf of my statistician significant other – hooey to the last comment. 🙂
There’s no doubt whatsoever that V2 is better by far than V1. Perhaps down the road V3 will be better still. And even your statistician significant other will tell you how easy it is to be misled by the numbers…
Point taken.
Perhaps a better saying might be one he DOES like – Statistics means never having to say you’re certain.
LOVE IT!! That sure works for me!
Hear hear!
About time someone de-emphasized this silliness.
The thing is, some people do the testing specifically for this stuff! So it’s mostly a marketing thing.
It is indeed almost exclusively a marketing thing. There are some results that are really useful: if you have a racial mix (white and African-American) in the last few generations, or Native American in the last few generations, those segments may very well show up and confirm family lore. Other than that, well, it’s effectively the same as what we used to do: “Hi! I’m Pisces! What’s your sign?”
http://www.nytimes.com/2013/09/17/science/dna-double-take.html?_r=0
To add to the confusion it is not as rare as once thought for a person to have more than a single genome.
Already reported on here: Decoding the individual genetic code, posted 22 Sep 2013.
Hi Judy, thanks so much for writing this post! I hear more stories about this function of atDNA testing than anything and I have to keep reminding people that it’s not an exact science!
Nowhere near an exact science, Ginger!
You might have footnoted “lies, damned lies and statistics.” The wikipedia article would be a good place to start –
http://en.wikipedia.org/wiki/Lies,_Damn_Lies_and_Statistics
Your column was enjoyable as always. Thanks!
You’re right that I might have — but even Wikipedia in the more general section on the phrase says it’s a quote of uncertain origins, often attributed to Disraeli but never found in his speeches or writings.
I see a lot of reports of Scandinavian percentages in people whose origins are in the British Isles but who have no documented ancestry from Scandinavia for many generations. Think Vikings!
That was the original excuse… um… explanation for the over-reporting of Scandinavian in v1. Nobody (except maybe AncestryDNA) knows why some few of us are seeing such drastic overreporting in v2. Especially when it means my Germans have all disappeared.
My Germans have also disappeared. As you note, it did confirm my father was Jewish (as my paper trail has documented, though I grew up not knowing), but all of my mother’s German ancestors disappear. I have documented her direct female line back to the 1700s. Her other families are also German, one of them German-speaking in Alsace. The German/Prussia matches are less than 1% in the mtDNA full sequence as well as the FF test. I assume this relates to the limitations of database size and modern population movement.
You got it in one, Carolyn: the ability to distinguish among populations that have been mixed and mixed and mixed by migration, war and more is limited.
The largest problem with the statistical models used is that, with the large and complex “universe” of people they are trying to match, all over the globe, whose ancestors have moved and intermixed with darn near everyone across the globe, and with possibly incorrect geographic assumptions involved, only a truly *minuscule* percentage of people have actually had their DNA tested at all! (I haven’t. Truly wish my brother would.) The numbers and inclusiveness are *way* too small to provide anything more than rough guesstimates. The guesstimates should get better over time, hopefully. “Hi, Pisces. I’m a Virgo, but my moon sign is Pisces. Are we related??”
It’s more a matter of the size of the reference populations — and understanding what those reference populations are really telling us, Linda. That’s going to take time — and may simply not be possible.
Excellent essay highlighting the pitfalls of the DNA for genealogy business. What people don’t seem to want to admit (to themselves or anyone else) is that it is pure entertainment dollars they are spending. That’s perfectly fine, but DNA has too quickly become part of a toolkit of one-upmanship. There are perfectly useful reasons to do DNA testing, even for genealogy. You’ve exposed one of the reasons not to!
James
There are HUGE reasons to do DNA testing for genealogy. It’s one of the tools in our toolbox that can tell us things we’d never find out otherwise. These numbers ain’t in that mix.
I’m confused about which products you are saying are rubbish. Can you list all the ones to avoid? What about FamilyTreeDNA? They say you can transfer the data from the other companies and then compare yourself with ‘62 world reference populations including Native-American, Middle Eastern (including Jewish), African, and European’. Is this scientific or not? Thanks!!
Joe, I’m not saying ANY of the PRODUCTS are rubbish. What I’m saying is that the reason to do these autosomal tests is NOT to get these numbers. We do autosomal testing to test theories about our families and to find cousins we might otherwise never connect with in order to share genealogical information. Anyone who tests just for these percentages is missing the boat in terms of what the tests CAN do.
I got into DNA testing for genealogy, not for ethnicity breakdowns. I think the average person does see it as a conversation piece, and that is all. There are some who use it more dubiously; I see this on forums and message boards. Some of these people are the ones who expect a certain percentage of a specific ethnicity. I also notice that a big deal has been made of this since Ancestry released its new breakdowns, and I can’t help but wonder if a backlash against all things Ancestry is really what is going on. I will admit that Ancestry has its problems, but it is no worse or better than FTDNA or 23andMe.
It may be in part a backlash against Ancestry, but I suspect this sort of criticism will be aimed at any company that gets into this percentage bit because it’s just not soup yet. Some of the results will be “wrong” (as in, perceived to be a mismatch with the paper trail) no matter what.
I fully agree with your advice. Apart from everything else mentioned, these percentages (a) don’t take into account individual migration, and (b) at least for those of us who believe in the scientific account, we all came from Africa in the first place, anyway.
It’s like saying “I’m one-quarter Belgian, two-quarters German, and one-quarter English” (that’s describing my immigrant ancestors, and I say two-quarters German because it accounts for one grandparent on each side). But where did they come from? Belgium has only been a country since 1830, so were they Dutch or French before that? What about a family from an area that changed hands in war every year or so? Yes, I realize the DNA reports aren’t that granular; I’m just trying to emphasize the point.
And, of course, there wasn’t even a Germany before 1871…
Haha! You got it Judy! Because the reference populations are defined differently at each lab (size, population, and algorithm), each one can tell you something different. Thank goodness we can have our raw data to use for other means and through other labs/databases.
My Scandinavian percentage went down with the enhanced v2 from 49% to 24%. My mtDNA is V, which currently has the highest distribution among the Saami people in Scandinavia and in the Cantabrian region of Spain. So far my mtDNA markers match only those in Spain and NOT Scandinavia, which has its own unique set of markers. So even though my mtDNA heritage is not Scandinavian, this creates a higher percentage of Scandinavian/Northern Europe for me in these tests. Since Ancestry has said its atDNA test includes deep/ancient/before-records ancestry, I cannot help but think it has somehow skewed the admixture results depending on how the reference populations are being defined. This renders its use as evidence rather limited. I use them to confirm or eliminate “family lore” and for targeting Native American genes.
Thank you for the great blog post!
Heather
I am using the current National Geographic figures for admixtures. They have tested over 600,000 people. I am saying we must use both our ancestral documentation and DNA. If this changes in the future, so be it. I trust the number of people National Geographic used over the nationalities used by any of the three big DNA companies.
Thank you for the great blog. I will ponder what you have stated.
Even the Genographic Project has issues, Diana. My secondary reference population according to the Genographic Project is Greek. Greek? Not likely!
I’m not sure it’s really the percentages that ought to be the main focus of concern here! Stripping out the ‘pesky percentages’, how can we trust in the credibility of a science that claims that the same individual [me] is simultaneously “British Isles, Western & Central Europe, and Finland” and “Northwest European, Basque-Iberian, Balto-Slavic, Mediterranean, European Jewish, Finnish and Anatolian-Caucasian”? It really doesn’t seem to stack up, and we’re paying out good money for this hooey?
No, actually, that’s NOT why we paid out good money: each of these tests provides us with leads to cousins, cousins we can work with to build our family history. That, frankly, is the only reason to do this sort of testing — and for that, it’s worth every penny.
We each do these things for our own reasons, you for yours and me for mine. The picture is far bigger than just cousins!
Well, if you’re doing it solely for these percentages, then you’re wasting your money, but to each his own…
Having a social science background, I am not really interested in the percentages, but I do seek reliable geographical indicators. This material is clearly not yet reliable, a point that is not made clear in the promotional literature. Naturally I am pleased that you and others feel you are getting what you have paid for, but others of us have to “waste our money”, as you put it, to sample the quality of what’s on offer. It would surely be unscientific to do otherwise.
Right: there are NO reliable geographical indicators below the continental level.