The raw data wall
NOTE: This blog’s recommendation of GEDmatch has been withdrawn due to privacy issues. See “Withdrawing a recommendation,” The Legal Genealogist (https://www.legalgenealogist.com/blog : posted 15 May 2019).
Dear AncestryDNA,
Okay, so it’s not the 12th of June 1987. This isn’t the Brandenburg Gate. I’m not the President of the United States, and you’re not the Premier of the Soviet Union. But hey… c’mon, AncestryDNA…
Tear down this wall!
It’s been just about two months now since I got the results of my AncestryDNA autosomal DNA testing. That’s the kind of test that works across gender lines and helps identify cousins with whom you can share genealogical data to try to find your common ancestors.1 And, I have to say, I’d be a lot happier with what I can do with the results of this test if I actually had the results of this test.
The real underlying raw data results.
You see, there’s only so much you can do with a system that’s built around matching family trees that people have uploaded.
In my case, for example, I have 24 matches in the 4th-6th cousin range with a 95% or 96% confidence level. Five of those 24 haven’t uploaded any family tree at all. Seven have private trees. One has a family tree with exactly three people in it. So for more than half of my matches, I can’t find out anything that’s useful just by opening the link to that match.
I have surnames in common with some of the others. You know, those rare surnames like Jones or Johnson or Robertson or Baker. And we share some locations… like the State of Texas or the State of North Carolina.
These are not things that are moving me very far down the road very quickly.
So let me tell you, AncestryDNA, that I need more from you. I really want my raw data. And, even more, I want all your other tested folks to have theirs.
At Family Tree DNA, my Family Finder raw data is 7.4 MB of compressed data that, when extracted, becomes a 23.9 MB plain text file in CSV format that can be loaded into a text editor or a spreadsheet. The image above is a very small snippet of my own raw data loaded into a spreadsheet program: just a handful of results from a few of the thousands and thousands of spots in my autosomal DNA that were sampled and recorded in this test.
At 23andMe, my Relative Finder raw data is 7.8 MB of compressed data that decompresses into 24.7 MB of plain text: line after line after line of identifiers (the RS ID is a reference SNP ID number2), chromosome numbers, positions and just which two of the four possible results (A, C, G and T) were found at each position.
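For anyone curious what "playing around" with a file like that can look like, here is a minimal sketch in Python. It assumes a tab-separated layout with four columns (RS ID, chromosome, position, result) and comment lines starting with "#", and it assumes "--" marks a position with no reading; your own download may differ. The sample lines and the function names read_raw_data and summarize are made up for illustration and are not real results or any company's tool.

```python
import csv
import io
from collections import Counter

# Made-up sample in the ASSUMED layout: '#' comment lines, then tab-separated
# RS ID, chromosome, position, result. These are not real test results.
SAMPLE = """\
# rsid\tchromosome\tposition\tgenotype
rs0000001\t1\t1000\tAA
rs0000002\t1\t2000\tAG
rs0000003\t2\t1500\tCT
rs0000004\tX\t3000\t--
"""


def read_raw_data(handle):
    """Yield (rsid, chromosome, position, genotype) from a raw-data text file."""
    # Skip comment lines, then let the csv module split each line on tabs.
    data_lines = (line for line in handle if not line.startswith("#"))
    for row in csv.reader(data_lines, delimiter="\t"):
        if len(row) != 4:
            continue  # ignore blank or malformed lines
        rsid, chromosome, position, genotype = row
        yield rsid, chromosome, int(position), genotype


def summarize(records):
    """Count sampled positions per chromosome and positions with no reading."""
    per_chromosome = Counter()
    no_calls = 0
    for _rsid, chromosome, _position, genotype in records:
        per_chromosome[chromosome] += 1
        if "-" in genotype:  # ASSUMPTION: '-' marks a missing result
            no_calls += 1
    return per_chromosome, no_calls


records = list(read_raw_data(io.StringIO(SAMPLE)))
counts, no_calls = summarize(records)
print(dict(counts), "no-calls:", no_calls)
```

To try it on a real download, open the extracted text file and pass the file handle to read_raw_data in place of the io.StringIO sample.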
Now because I have both of those files, I can play around with my results in all kinds of different ways. Most particularly, I can upload them to third-party utility sites and get much more benefit out of having tested my autosomal DNA.
One of my favorite sites — one I’ve written about before3 — is GedMatch.com. It’s become so popular that its hosting company is giving it running fits and it’s trying to raise enough funds through donations to keep all its features available. (I’ve donated — how about you?) But you know what, AncestryDNA? There’s a reason that GedMatch is so popular — and being able to look at the raw data in lots of different ways — including, but not limited to, comparing it to data of people who’ve tested with other companies — is right there at the top of the list.
For more things I can do with raw data that I can’t do without it, check out “Top 10 things to do with your FTDNA raw data”4 and “Autosomal DNA tools.”5
I understand this isn’t a high priority for you, AncestryDNA. But it is important to those of us who are your customers who are interested in genetic genealogy. And I’m not the only one who says so. Read more from CeCe Moore of Your Genetic Genealogist,6 Razib Khan of Gene Expression,7 Debbie Cruwys Kennett of Cruwys news8 and Blaine Bettinger of The Genetic Genealogist,9 and those are just for starters.
So how ’bout it, AncestryDNA?
We know you can do it.
C’mon now… tear down that wall.
SOURCES
1. See generally Judy G. Russell, “Autosomal DNA testing,” National Genealogical Society Magazine, October-December 2011, 38-43.
2. See Wikipedia (http://www.wikipedia.com), “dbSNP,” rev. 18 Sep 2012.
3. Judy G. Russell, “Gedmatch: a DNA geek’s dream site,” The Legal Genealogist, posted 12 Aug 2012 (https://www.legalgenealogist.com/blog : accessed 18 Sep 2012).
4. Lindsay M. Greenawalt, “Top 10 things to do with your FTDNA raw data,” Confessions of a Cryokid, posted 16 Jun 2011 (http://cryokidconfessions.blogspot.com : accessed 18 Sep 2012).
5. ISOGG Wiki (http://www.isogg.org/wiki), “Autosomal DNA tools,” rev. 2 Dec 2011.
6. CeCe Moore, “Follow Up: Lab Error Responsible for Adoptee’s Confusing Match at AncestryDNA,” Your Genetic Genealogist, posted 24 Aug 2012 (http://www.yourgeneticgenealogist.com : accessed 18 Sep 2012).
7. Razib Khan, “Ancestry.com’s AncestryDNA won’t give you your raw data,” Gene Expression, posted 16 Sep 2012 (http://blogs.discovermagazine.com/gnxp/ : accessed 18 Sep 2012).
8. Debbie Cruwys Kennett, “AncestryDNA’s response to my request for my raw genetic data,” Cruwys news, posted 30 Aug 2012 (http://cruwys.blogspot.com : accessed 18 Sep 2012).
9. Blaine Bettinger, “Problems with AncestryDNA’s Genetic Ethnicity Prediction?,” The Genetic Genealogist, posted 19 Jun 2012 (http://www.thegeneticgenealogist.com : accessed 18 Sep 2012).
Amen. Folks pay for the test, why force them to look at results exclusively through AncestryDNA filters?
I know AncestryDNA says they understand why genetic genealogists want this… but it really is time to get it done, Myrt. It’s been out long enough now.
I sooooooo agree. I have exactly one match that has done me any good and I already knew he was there before the test. The only thing I found out is that we are third cousins. Hopefully, they will listen.
Let ’em know this is important, Cyndy. I suspect part of the problem here is they just don’t understand the demand for this.
Couldn’t agree more. I have many matches similar to yours; no trees, private trees, common locations. I’d really like to be able to use GedMatch.com etc… Set my data free!
Tell ’em, Dave. We need to tell ’em!
Yesssss! Ancestry, please give us our data. I have found a few matches that I can actually confirm, but most have the same “no trees, private trees, etc”. I’ve also been contacted by a couple of those 4th cousin matches who somehow believe I’ll be able to hand them their lineage when they’ve done no research past their grandparents – because we have one of those Smith, Jones, Johnson matching surnames. Another problem is that Ancestry has matched me to a person who has an ancestor of the same name – Edward Smith – who is not the same Edward Smith [different dates, different locations] and actually now shows a comparison tree with our relationship to each other based on this false match and there is NO WAY to delete or notate. Obviously, this person and I have some matching DNA somewhere, but it certainly isn’t these two different Edwards. How much more can they mess up! Rumor has it that Ancestry released only what they thought the average subscriber “could understand”. I’ve been very, very disappointed.
It has been disappointing that things haven’t moved faster, Kay… and that some of these tree matches have been problems because of the quality (or lack thereof) of the trees.
Oh hell, I am still waiting for my invite in the mail! I’ll complain about the lack of results AFTER I get the test!
That’s fair, Jeff! (And I’d still be waiting in the queue as well if I weren’t doing this blog. Put my name in right at the beginning, as did a cousin who also hasn’t gotten to the top of the list yet.)
I’m still waiting for the invite, too. What’s up with the delay on getting these out?
No idea, Mike. I was told back in May that the $99 offer would remain open until all current Ancestry customers’ demand had been filled, but…
Thank you Judy for picking up the rallying cry! AncestryDNA, has this moved up the list of priorities yet…?
Maybe if we all keep nagging, they’ll do it just to quiet us down, CeCe!
Wow, I totally agree about the ancestry.com results. Not much to go on, is there? Plus, I have some German and Dutch ancestors and my DNA was only “56% British Isles, 42% Scandinavian, and 2% uncertain.” Where do the Dutch and German ancestors fall in the above categories?
And if I could learn from other DNA tests, I could compare my results with others done on the Foster surname that would help me find ancestors that so far have eluded me…
Yes, I’ve been disappointed with my results. I was hoping for so much more!
You’re not the only one who’s been surprised by the admixture results, Sarah. Mine were pretty good (my German showing up strongly as Central European). But I seem to be the exception, not the rule, on that.
Just got my results back yesterday. At first I was surprised–NO British Isles (I have so many Brit/Scots-Irish ancestors), but 52% Central European (I’m quite German and Celts originated here so perhaps that makes sense for my British Isles folks), 10% Scandinavian (Must be the Vikings who ended up Scots-Irish), 27% Persian/Turkish/Caucasus (this is just about dead on, I’m 25% Armenian) and 11% Southern European, which may be true as I have one part of my tree I know nothing about. So while I feel OK about these results, everyone else seems unhappy. I’m thinking I should try another service!
I also feel badly because I’m one of those folks with almost no family tree… I use TNG and keep everything in my own database, so I haven’t made the effort on Ancestry as I feel it is so overloaded with trees with no citations. Your post has made me rethink this, I may start thinking how I can transfer some data to Ancestry easily.
Without at least a private tree, Jen, the utility of the AncestryDNA test is … well… limited is a good word.
I forgot that I don’t have to go public with all of my data, I do have some–will add more. Thanks for the reminder.
Glad to help, Jen!
It appears from a quick Google that only 5 states require that a person have access to their own genetic information (presumably as collected for any reason). I don’t understand why the number is so low because this would seem to be a fundamental right to me. So does ancestry.com deny providing genotypes to people in those states (e.g., in Delaware)? Could be the basis of a nice lawsuit, especially if filed in Delaware. I had my genotypes done at ancestry.com and requested the raw genotypes since their product, a pretty pie chart telling me where my ancestors came from, was something I already knew to vastly greater geographic resolution before I spent the $99; I didn’t even receive the courtesy of a reply.
I haven’t looked at other state laws, but the Delaware law is very specific to access to genetic information and that’s a term defined by the statute to mean “information about inherited genes or chromosomes, and of alterations thereof, whether obtained from an individual or family member, that is scientifically or medically believed to predispose an individual to disease, disorder or syndrome or believed to be associated with a statistically significant increased risk of development of a disease, disorder or syndrome.” The highlighted information appears to take this outside of the Delaware statute.
My AncestryDNA is so far off from my heritage. On my maternal side I am of 100% Norwegian heritage and I can trace my family back to the early 1600s in Norway; on my paternal side I am 50% Norwegian and 50% Irish. Yet my AncestryDNA comes back with a result of 70% British Isles, 16% Eastern European, 8% Finnish/Volga-Ural and 6% uncertain. Am I to believe that with all those Norwegians in my family I have ZERO Scandinavian DNA? And as for all those distant “cousins” they are 96% certain are matches…not one single common name or ancestor anywhere. I feel very ripped off by AncestryDNA.
We’re all waiting for more information on the admixture stuff, Neila, but remember: the admixture (what percentage from where) is only the smallest part of the results overall. It’s who you match and what paper-trail research you can share that really means the most.
I am tired of waiting on the admixture. What do you mean it is the smallest part of the results, overall? It is a major part of the results to me.
My mother’s heritage was 100% Swedish/Norwegian. We have traced the family lines back with good documentation.
One half of my father’s heritage was German and the other half was heavily Dutch.
Ancestry has my ethnicity as 95% British Isles and 5% uncertain. They just erased my mother and over half of my father! But, I got a match with a 98% probability, and we do have a common ancestor. We are third cousins. Our common ancestor was in Sweden!
I have several other matches that I have connected on my chart that are more distant cousins. They connect with my Dutch ancestors.
I have contacted Ancestry about this and they are so arrogant. Their attitude is that I just don’t understand enough about DNA. But I do. My field of study in college was biotechnology.
I just don’t know how you can defend them. I would be embarrassed to defend them.
Nancy, I absolutely understand your frustration. I too think their admixture results are often just plain wrong — and they’re particularly wrong as between Scandinavian and British Isles. Outside of Ancestry’s own scientists, I don’t know of anybody who thinks they’re right especially on those. But admixture within continental populations (the “exactly where in Europe” question) is in its infancy. Nobody today really knows the answers and it’s going to take time for the data to develop well enough for anybody to do that well. National Geographic’s Geno 2.0 is the best and even there the results can be funny: they nailed my closest population match as German (my father was 100% German back for many generations) but said my second closest match was Greek — and that’s so wrong it’s laughable. But when I said it was the smallest part of the puzzle, think about it: you already know your mother’s and father’s heritages; what you’re getting from Ancestry is links to people who match your DNA. Yes, having accurate admixture info would be helpful if the science ever progresses to the point where it’s reliable from any testing lab or company. But the matches are helpful now. I don’t recommend AncestryDNA as the first choice for anybody doing DNA testing. But $99 to fish in that pond of matches isn’t a bad deal if you’re already an Ancestry subscriber.
Caveat emptor, I guess. I too was shocked to get just a lame pie graph and a map, with no raw data or explanation.
A quarter of my ancestors are French Canadian, so there’s no doubt what they’ve been up to since at least as far back as the early 17th century, yet not a speck of France shows up on my map. By the same token my pie graph shows 30% Eastern European, yet I’ve already extensively fleshed out my tree and found little that would actually corroborate this. And then there’s the whopping 10% “Unknown,” with no explanation as to what that means.
I would have thought that Ancestry’s FAQs would have included commentary from geneticists speaking to just such questions. But no, the FAQs are obvious answers to simple questions about linking those “95%-likely” nonrelatives whom you’ve already ruled out based on paper trail when Ancestry threw them at you as “hints.”
I was especially disappointed that the screening process doesn’t ask test subjects what ethnic issues they’re researching. Clearly one does not pay for DNA testing to find ancestors who left easy paper trails. Genetics speaks up for those who DON’T show up on paper–the Native American ancestor lost to history when they “passed for white” to get railroad work, the Jewish ancestor who converted following a pogrom, etc.
Which is why the raw data, especially those off-the-beaten-path “unknown” sequences, are so important. When combined with test subjects’ oral histories (eg, a French-Canadian grandma who said her family was part Indian) it could help geneticists discover their significance, so that they need not remain “unknown” forever.
I see that the American Society of Human Genetics is cautioning about these tests. I wish I had seen their statement before shelling out.
AncestryDNA, why are you acting like you have something to hide? Please do the right thing and stop dumbing down our data!
I certainly wouldn’t say to someone not to test with AncestryDNA, Katherin — but at the moment it isn’t my first choice generally for some of the reasons you’ve raised.
I had the mega test from FTDNA. When I received the results I was surprised. I thought it would be a lot more information. That was 2008. Fast forward to 2012: they reassigned my haplogroup and all but one of my “snps” have changed.
My HVRI & HVRII and coding region have all changed. That means they have access to my mtDNA and never notified me that due to changes in CRS we now have to re-check your mtDNA and reassign your haplogroup. What??? I paid for it; it belongs to me. That’s how I feel.
There isn’t a lot of information from the so-called mega test. Today’s Family Finder test may give you more of interest to you. As for re-checking your mtDNA, understand that this sort of thing is going to happen from time to time as the haplogroup analysis becomes deeper and richer. They’re not retesting your DNA or changing your results: they’re simply giving you the benefit of the latest thinking, analysis and comparatives.
I’m not happy with Ancestry DNA. I sent my husband’s sample in Dec 2012. Then I received an e-mail saying something was wrong with the sample and they would be sending me a kit free of charge to do this again. They did not send a kit, the results were sent, and the e-mail disappeared. I spoke with a person who said that the test was good and I do not need another sample. Trust is an issue here and now I’m angry. This will be the last year I’m on Ancestry since they auto-bill (which I think is wrong). They take forever to answer an e-mail. We do PAY for a service here that is NOT cheap. I hope another site opens up that treats customers better.
Not knowing for sure what happened here, I can sure understand your frustration, Christina.
Why did AncestryDNA take other “matching” cousins’ ethnicity results off? Does anyone know? I am disappointed, as it was interesting to me that I showed no ethnicity in common with others with whom I “shared ancestors.” Is that why they took others’ ethnicity results off, so a person couldn’t say, “Look, you have me as having no Scandinavian ethnicity, but I match all these other ‘cousins’ who show nearly 99% Scandinavian and 0% of any ethnicity that I show”? Could this be the reason they have done this? Why don’t they explain when they do something like this? Does anyone out there know why they did this? I really liked to look at other “cousins’” ethnicity results to compare theirs with mine.
I suspect this is a temporary interface glitch.
We really like the “cousin matching” part of DNAancestry. It seems to be working very well. We have found many, many shared ancestors in common, including “cousins” who don’t have any of our ETHNICITY. However, we think the DNAancestry ethnicity results are not correct for many people. I think they have a big problem with that part of the test. The “matching cousins ethnicity results” (of other cousins) are still not showing up. We don’t know if it was/is a temporary interface glitch. If it is, why is it taking so long to correct it if it was/is just a glitch? It started a long time before we sent our other message.
I don’t know the answer — it’s something you need to ask AncestryDNA.