The case of the disappearing “matches”
Yes, actually The Legal Genealogist is aware that it isn’t Sunday, so it’s unusual to have a post about DNA appear in this space.
But AncestryDNA hosted an informational session with a number of genetic genealogy bloggers and educators yesterday, and there’s a major development in the works that folks who have tested with AncestryDNA need to know about — and begin to understand — as soon as possible.
Now let’s review what we’re talking about here: the AncestryDNA test is an autosomal DNA test. It looks at the kind of DNA that you inherit equally from both of your parents: you get 22 autosomes[1] (plus one gender-determinative chromosome) from your father and 22 autosomes (plus one gender-determinative chromosome) from your mother, for a total of 23 pairs of chromosomes. So this is a test that works across genders to locate relatives (cousins) from all parts of your family tree.[2]
Ancestry was the last of the three major testing companies to get into the business of testing autosomal DNA, but it has become one of the biggest — if not the biggest — player in that market in the two years since its product launched in May 2012.
One of the things that has set AncestryDNA apart from the other testing companies was a decision to err on the side of inclusion in listing people as matches. Where both Family Tree DNA and 23andMe have fairly tight criteria for declaring two people who’ve tested to be genetic matches, AncestryDNA has had much more liberal criteria.
That has resulted in a much longer match list at AncestryDNA than you’ll find at either of the other two companies — but it’s also resulted in a lot of people being on your match list who really aren’t your genetic relatives at all. The term for those folks: false positives.
So here’s the challenge: how do you eliminate, or at least reduce, the number of false positives without creating the opposite problem — eliminating some people who really are genetic relatives? The term for those folks is false negatives and, from a research standpoint, it’s worse to lose a real match than it is to have to wade through bad ones.
AncestryDNA yesterday outlined its plan for a new matching system which it expects to roll out by the end of 2014. And it’s going to come as a big surprise to a lot of users, especially those of us who have any colonial American ancestry (and who therefore have these long match lists at AncestryDNA):
Our numbers of matches are about to drop like a rock.
And that, folks, is going to be A Good Thing.
The new matching system begins with a better, deeper, more accurate analysis of the data that helps define who is and who really isn’t genetically related.
One part of the new analysis is through the identification of some parts of our genetic code that really don’t mean anything at all. Some pieces of DNA that we once thought meant we were genetic cousins we now understand just mean we’re all human, or all Scandinavian, or all African. By taking those pieces out of the matching system, AncestryDNA will eliminate many of the false positives.
In some cases, someone who has tested with AncestryDNA may lose hundreds of matches by the better definition of the people who really fall into this false positive category. In other cases, the number may drop by thousands. In my own results, for example, AncestryDNA currently reports more than 12,000 matches. I expect the new reporting system to drop that to fewer than 1,000 matches.
The other big part of the new analysis is in the way AncestryDNA looks at the bits and pieces of our genetic code that we use for genetic genealogy testing. Since it doesn’t look at all of our DNA, but only parts of it, autosomal DNA testing relies on making some educated guesses about the parts of it the test itself doesn’t look at.
In a way, it’s like reading a book with only some of the words showing up on the page; the analysis system has to guess at what the missing words are. The better the guesses are, the better the results are. So part of the new system is a better way of thinking about what the missing words are likely to be — and in what language. If I’m 100% European, for example, the system shouldn’t conclude that the missing DNA words are in an Asian language.
Emphasizing the right missing words means the people who show up on our lists as matches will be better matches. More accurate. More likely to really be our cousins.
Now one of the things we hope for is a way to capture the lists of people who are now on our match lists (false positives and all), and especially a way to capture the lists of people who show up in our shaky leaf hint lists even if they are false positives. It’s a data set some of us may want to preserve and use in the future. The AncestryDNA representatives at yesterday’s meeting were receptive to the idea — it hasn’t been promised, but we are keeping our fingers crossed and pushing as hard as we can.
So… what does this all mean?
It means that one day in the not-too-distant future those of us who’ve tested with AncestryDNA are going to log into our accounts, open our DNA results, and the number of pages of matches we see there is going to be an awful lot smaller.
Instead of 260 pages of matches, with most of them being false positives, I expect to see perhaps as few as 20 pages, with most of them being pretty darned good matches — real genetic cousins to work with and, with luck, advance our mutual understanding of our common ancestry.
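For readers who like to see the arithmetic, here is a minimal sketch of what those figures imply. It uses only the numbers quoted in this post, rounded: "more than 12,000" matches is treated as 12,000 and "fewer than 1,000" as 1,000, so the result is a rough estimate, not a prediction of what any one account will show. The function and variable names are made up for illustration.

```python
def reduction_fraction(before: int, after: int) -> float:
    """Return the fraction of the original list that disappears."""
    return (before - after) / before


# Round figures taken from the post (assumptions, not exact counts).
matches_before = 12_000  # "more than 12,000 matches"
matches_after = 1_000    # "fewer than 1,000 matches"
pages_before = 260
pages_after = 20

match_drop = reduction_fraction(matches_before, matches_after)
page_drop = reduction_fraction(pages_before, pages_after)

# Matches per page is not stated in the post; this is only what the
# two quoted totals imply when divided.
implied_per_page = matches_before / pages_before

print(f"Match list shrinks by about {match_drop:.0%}")
print(f"Page count shrinks by about {page_drop:.0%}")
print(f"Implied matches per page: about {implied_per_page:.0f}")
```

On these round numbers, both the match count and the page count fall by a little over nine-tenths.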
Get ready. This change is coming. But this part of what AncestryDNA has planned — the smaller, or shorter, match list — is not a bad change.
This is a case where less is really more: more useful, more powerful, more accurate.
Watch this space for more info …
SOURCES
1. “An autosome is any of the numbered chromosomes, as opposed to the sex chromosomes. Humans have 22 pairs of autosomes and one pair of sex chromosomes (the X and Y).” Glossary, Genetics Home Reference, U.S. National Library of Medicine (http://ghr.nlm.nih.gov/glossary=Glossary : accessed 6 Oct 2014), “autosome.”
2. See generally Judy G. Russell, “Autosomal DNA testing,” National Genealogical Society Magazine, October-December 2011, 38-43.
That will be a wonderful thing. Right now it is so hard to identify clearly good matches, particularly since without knowing the chromosome involved you can’t tell how accurate those “shared” ancestors really are. Can’t wait to see a more reasonable list.
It should make match analysis a lot more manageable, Rob — or at least that’s what we all expect at this point.
This really is a good thing coming out of Monday’s meeting. Thank you for letting us all know. ;o) V.
Anything — anything at all — that helps me cut my matches from 12,000 maybe-good-maybe-not to 1,000 probably-good is going to be a WONDERFUL thing, Valerie!!
When will they roll out a chromosome browser?
In my personal opinion, speaking only for myself and based solely on my own perceptions of the attitudes of some folks at AncestryDNA and not on any specific representations by anyone else, my judgment is that we may get a chromosome browser at AncestryDNA when hell freezes over.
So, Judy, you’re saying there’s still hope? 🙂
The same hope as that pigs will fly past my second story window, sure!
I think you are exactly right. I hate to agree with you on this, but I do.
What Ancestry is tacitly admitting now is that their current system is generating matches that are mostly fictional — junk — pseudoscientific hocus pocus. If arithmetic serves me, the garbage rate is about 90%.
A chromosome browser would have made their failings abundantly clear.
As a GEDmatch user I already knew their system was awful, but it’s nice to see that the need to conserve computational re$ources is forcing AncestryDNA to cut back on the junk science. I just *hope* that the science they’re planning to apply to the cutbacks is better than the science that got us here in the first place.
AncestryDNA owes its customers and the community of genetic genealogists a clear explanation of how they plan to butcher our match lists.
I expect to see a white paper explaining the change in the matching system when it’s rolled out, Jason. But I want to make it clear that I do not regard this change as “butcher(ing) our match lists.” THIS change is one I think is both right and overdue. I may disagree — vehemently — with much of what AncestryDNA is doing both now and into the future. But this particular matching system change is one I think is a good thing.
Thank you for the clarification. I don’t mean to imply that you agree with me.
Maybe Ancestry has something fantastic planned, but I’ll save my applause for when they actually DO something that makes their product better.
I’m not good at seeing the glass half full…especially when the company freely admits the glass is currently 60-90% empty:
http://33.media.tumblr.com/f79b70e4d6130e287c3e6aeaaa41d6c6/tumblr_nd4ils45qD1txt4yho1_1280.png
(screen cap from Julie Granka’s lecture at the International Genetic Genealogy Conference)
It seems that a 60-90% error rate would give Ancestry a bit of a black eye.
Don’t hold your breath on anything fantastic happening. Small incremental changes for sure and some changes we may disagree with every bit as much as we agree with this particular winnowing out of bad matches.
My opinion, for what it’s worth is this. For years, Ancestry.com seems to have had a mantra that says “More Is Better,” which has led to many of those shaky leaves. Now that winter approaches to the north, it’s good that some of them may wither and fall off. Keeping those in a basket somewhere may be appealing to some, so push on with that. We know what happened to the Y and mt data.
Thanks for burning the way-too-early morning oil to share this news, Judy. It is part of how genetic genealogy is coming of age responsibly.
We can certainly hope it’s coming of age responsibly, John, and in this respect — dropping off the false positives — AncestryDNA is moving in the right direction. I am often critical of AncestryDNA and I will be in the future I’m sure — but on this decision they’re right.
So far, on the shaking leaf lists, it just mentions we share an ancestor and names the ancestor. So does that mean it just might be a false positive even if we both have JOE BLOW on our tree?
I do know some of the shaking leaf matches are actual cousins I know about. And, some of the folks who match on the shaking leaf part of Ancestry DNA also match with me on Family Tree so I assume those matches are accurate. Will be interesting to see how many folks are dropped off my list. Most of the time on the other lists, I cannot see where we share an ancestor.
Yes, oh yes! Many times even with the shaky leaf it’s a false positive on TWO sides: you’re really not genetically related at all AND the tree data is just plain wrong.
I, for one, cannot wait for Ancestry to reduce the number of matches. I currently administer 4 tests for my own family there, each having 10,000 or more matches. I literally cannot keep up with the data management, and so I don’t. I barely check my results there at all, relying instead on the fact that I have taken the raw data and uploaded it to both FTDNA and GedMatch. Still a lot of data to manage, but significantly easier than Ancestry. In fact, in starting a new round of testing for my husband’s family I bypassed Ancestry because the hassle of data management was not worth it.
This will be a good step towards making things better, but data management for multiple kits is never easy. We do have to spend some real effort on this.
Will the Ancestry.com DNA matches I selected as “Favorites” go away? I hope not since this is how I reduce the total matches into smaller better matches for possible further researching.
There’s been no definitive statement on that, Paul, but I would certainly plan for a worst case scenario and do as much data capture as you can, including screen captures, to ensure you don’t lose data.
This is great news, I spent a lot of time searching through trees and messaging other members never being able to find a common name much less a common ancestor.
The problem I have with ancestry dna (and ftdna, for that matter), is that it is hard to get the raw data. I’m no dna expert, but without being able to share the data itself you have to rely on the search results to find your matches. So what happens when you get that unexpected match from a relative you cannot find the paper trail for? You can’t really cite the dna; it seems like you have to cite search results.
Another issue I have run into along those same lines is that I have a cousin who is a direct male descendant of an ancestor whose parents are unknown. He got some results that point to those unknown parents. If I’m to include this in my research I am stuck taking his word for it, as a secondary source at best. It would be nice to have some way of getting primary information out of these tests to be able to pass along.
I hope that makes sense.
I understand your point, Jason, and it’s the reason why we try and try and try and try to explain to AncestryDNA that giving us more data and more tools is the way to go.
Jason,
It is quite easy to get the raw data at FTDNA and if you are having an issue contact customer service as they are very helpful. I use the raw data for third party tools – I really like using http://www.dnagedcom.com/ to analyze my results with their ADSA tool. Right now they are working on an issue so be patient.
Thanks so much for the update on this change. Are they looking for beta testers? I am so ready for this! Besides getting my 18,000+ matches reduced, I’m particularly interested in help to sort out which of my 26 3rd cousin matches are likely to be real, and on to the 4th cousin matches.
Your third cousin matches are VERY likely to be real genetic cousins. Of course, they may turn out to be second cousins, or fourth cousins, as well as third cousins. But at that level it’s not likely you’ll see any drop off.
I think it’s probably more likely that many of these 3rd cousin matches are actually 5th or 6th cousins, if not further back. That goes well beyond where my paper trail can go (or where I really have any interest in pursuing). Naturally some will be off a generation or so either way no matter what algorithm is used. My zillions of 4th cousin matches cannot possibly be real even as 5th or 6th cousins – the math doesn’t work. From my reading I don’t think we’ll be culling just the distant matches, but it will be fascinating to find out how it all works out when this arrives!
They may well be further back in endogamous (intermarried) populations, Michael. If you have multiple colonial American lines or any Ashkenazi Jewish ancestry, then you may well be right that many of the apparently closer cousins are really further removed.
100% Ashkenazi here, which is one reason I’m so impatient to get this!
Hey! How come you have 260 pages of matches and I only have 129? Oh, wait. Less is better! Great to hear this news, Judy. Thanks for reporting it to all of us. Here’s hoping the results clear up some too – all of my matches have been predicted 3rd or 4th to 6th cousins, but have actually been 2nd cousins or 1st cousins #x removed. Will the accuracy or “narrowing” predictions improve with better test results?
We hope so, Doug, but until we see the actual roll-out we won’t know for sure. Remember of course that autosomal DNA inheritance follows purely random patterns, so no autosomal test will ever definitively tell you someone is a — say — second cousin versus a first cousin once removed.
Wonderful! I’m looking forward to this change.
You and me both!
At AncestryDNA I have 14 estimated 4th cousins.
Not 1 is anywhere near advancing to even finding we’re within 20th cousins…
I have over 5000 matches last I checked.
Not one has ever contacted me – not one – NONE of the 5000+
My tree is PUBLIC with 2000+ “leaves”, it’s readily available for others to search.
–
I have 1 – 3rd cousin estimated and it’s not bad. At least I know it’s a Swede on my maternal side.
I have 1 – 2nd cousin match, and I know who they are, already in my family tree as 2nd cousin or 2nd 1X removed. Just not sure from memory here.
At least I know my DNA has tested correctly by this….
When I do the shaky leaf DNA match search, out of my 5000+ matches it returns an astounding 2 matches (estimated 5th – 8th cousins) that MIGHT have something interesting that matches to my tree.
Ah NO NO NO NO NO. Not in a million years.
I’d be happy if Ancestry dropped my 4984 useless matches and leaves me with the top 16 estimated 4th-6th cousins.
Still – I wonder – 2 years into this, when are they going to open their so called DNA product WORLDWIDE!!!!!! We are all immigrants here.
Testing USA only is like testing who’s standing in your kitchen right now.
Get out there and sell the product globally!
Family Tree DNA = 100% by far the BEST autosomal DNA product out there.
Leaps and bounds far far better than Ancestry DNA.
23andMe? Also FAR better than Ancestry. Much better tools provided to the serious DNA searcher.
AncestryDNA, my “estimate for them” is they may have a decent DNA product in 2024 if they decide to develop it.
You bet less is more, get rid of this garbage called my match list.
No testing company can make your matches better at responding, Mark, but we’d all like to see DNA testing be more globally inclusive. Boy do I ever — I need a lot of help with my father’s German side!!!
I think probably the best way to save the full archive of Ancestry DNA before the purge is to download a free copy of the Google Chrome Browser and load the free “AncestryDNA Helper” add in. It will completely download over many hours in the background all of your records including the names and birth/death dates of the synopsis pedigree tree that appears when you open up a potential match in Ancestry. The results can be easily saved as an Excel spreadsheet for eternity if you wish.
I have 6 autosomal Ancestry DNA tests under management with a number of the individuals related. I often combine and compare on a third clean spreadsheet the various info that is collected to compare and find common Ancestry “hits” between related parties to increase my confidence of matches. Remember though, false negatives are pretty frequent, i.e., except for potential 1st and 2nd cousins the lack of a match does NOT prove a non-relationship. For 3rd cousins, there is about a 10% chance for a true 3rd cousin not to appear as a match. For 4th cousins, the false negative percentage increases to 50%! That is, if you have 100 established 4th cousins all take an autosomal test, you will only see about 50% of those people in your list of matches! I am not sure how well this is appreciated. I discovered it about 3 weeks ago when I compared the matches of my wife and her full biological sister for 3rd and 4th cousins. At the 4th cousin level, they showed identical matches on less than 50% of the total 4th cousin (95% probable) matches. This is all just how the statistics of recombination work. At any rate, if you are concerned about losing your historical match data, I suggest you use the Google Chrome browser with the “AncestryDNA Helper” add in. Regards, Bill Rothwell
Bill, there is no matching-system cure for the random recombination of DNA that causes us not to match 10% of our third cousins or 50% of our fourth cousins. It’s not a false negative in the sense of misunderstanding the data; it’s the simple scientific fact that my third cousin and I didn’t happen to win the recombination lottery and inherit the same DNA segments. This is why no autosomal DNA test can ever disprove a relationship at that level of cousinship. So I don’t think of this as a false negative as much as an unfortunate scientific fact.
Judy, That was precisely what I was trying to convey. Somehow I did not do a good job of stating things or you misread my entry. However scientists (I am one) typically use the terminology “false positive” and “false negative” to describe the two possible error outcomes in tests of all sorts. In simple terms, I tell people, “the absence of a demonstrated match on autosomal DNA, is NOT proof of a lack of relationship.” The one exception to this concerns siblings and probably first and second cousins where the probability of a statistical mismatch is vanishingly small. I lament the fact that some people view autosomal DNA as some sort of pseudoscience or something mysterious since I think most anomalies can probably be explained by the statistics of recombination.
I may have misunderstood, too, Bill.
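The percentages in this exchange can be turned into a small worked example. This is only a sketch under a loud simplifying assumption: it treats each sibling's chance of matching a given true cousin as independent of the other sibling's, which real recombination does not guarantee, since full siblings share a good deal of DNA. The detection rates (90% for third cousins, 50% for fourth cousins) are the round figures quoted above, and the function name is made up for illustration.

```python
from fractions import Fraction


def sibling_overlap(p_detect: Fraction) -> dict:
    """Overlap between two siblings' match lists for one true cousin.

    Assumes (as a simplification) that each sibling matches the cousin
    independently with probability p_detect.
    """
    both = p_detect * p_detect            # cousin appears on both lists
    either = 1 - (1 - p_detect) ** 2      # cousin appears on at least one list
    return {
        "both": both,
        "either": either,
        "shared_given_either": both / either,
    }


p_third = Fraction(9, 10)   # about 10% of true 3rd cousins do not match
p_fourth = Fraction(1, 2)   # about 50% of true 4th cousins do not match

third = sibling_overlap(p_third)
fourth = sibling_overlap(p_fourth)

# Of 100 established 4th cousins who test, how many show up at all?
expected_visible = 100 * p_fourth

print("4th cousins visible out of 100:", expected_visible)
print("3rd cousin on both sibling lists, given on either:", third["shared_given_either"])
print("4th cousin on both sibling lists, given on either:", fourth["shared_given_either"])
```

Under that independence assumption, a fourth cousin who appears on at least one sibling's list would appear on both only about a third of the time, which is at least in the same direction as the "less than 50%" overlap Bill describes. It is a back-of-the-envelope check, not a model of how any testing company actually computes matches.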
re AncestryDNA Helper mentioned above.
Isn’t this the helper to use when you are entering Ancestry matches into Genome Mate?
Has there been any information on the proportion of lost matches that will be the result of dumping segments from excess IBD regions — vs matches lost as a result of conversion to better measurement, i.e., converting from megabases to cM and excluding tiny segments in the 1-4 cM range?
Can someone (hint:Judy G. Russell) please ask ancestry what is the median and average size in cM of the segments that will be thrown out with the disappearing matches?
Whether that specific information will be disclosed or not I can’t say, but I do expect a white paper on the matching system when it’s rolled out.
This is really good news. I have over 15,000 matches. Not in my wildest dreams do I have time to look at each tree even for the ones that aren’t private. Cutting that to about 1,500 would certainly bring it in line with the number of matches I have at 23 and FF. And yes, providing a chromosome browser would reveal just how much of this is smoke and mirrors. Perhaps, once they cut out all the dead wood, a next step would be chromosome information and hopefully before hell does freeze over.
I agree it should be the next step, Ruth, but I truly do not ever expect it from the folks currently in charge of this particular test.
I have kept a running spreadsheet of matches where I can identify the common ancestors, but I still hope that I don’t lose my “starred” matches in this process. And I wonder what will happen to one of my matches that Ancestry indicates “might” be a 5th to 8th cousin with a Very Low Confidence level – I just happen to know that the person in question is my 3rd cousin.
At present you can use the AncestryDNA Helper extension for the Google Chrome browser to capture a csv file of all of the matches for each test you have, which you can then open in a spreadsheet program like Excel. This includes the tested person’s ID and the administrator’s ID, so you know who to contact through the Ancestry messaging system.
After you run the extension’s initial scan (which can take a long time), you can download your matches. My two tests took 12 and 13 hours the first time I ran them. (Hint: let it run overnight, though you may need to turn off your computer’s sleep setting to keep it running.)
The extension also adds some other search capabilities and the ability to compare multiple tests you administer or phase them to see those matches in common. Oh, and the data/info does not leave your computer.
I would suggest with any match that you have starred, to screen capture or “print/save” a pdf of that individual’s match page and view of the tree. And if they have a public tree, you might create a quick link or a browser bookmark to that tree for future reference. That way when your match total “shrinks” you have the info in case some of your starred matches disappear.
I have tests at Ancestry and FamilyTreeDNA. I have confirmed zero matches at FamilyTreeDNA from fewer than 90 matches. (My tree goes back many generations and is well researched/sourced with only a couple stumped branches so I know my ancestors.) I have between a dozen and two dozen confirmed matches at Ancestry from about 8000 or so matches. And some of those matches come from the distant cousin range.
I agree Ancestry’s matches are excessive and more than likely include a lot of false matches, but Ancestry’s real failure is assuming its customers can’t handle the tools that are really necessary to truly add DNA testing as a tool in your genealogy toolbox. Just seeing that we share something, and that these are the people we have in common, is not enough. Tools that answer how much DNA we have in common, what the longest segment we share is, which chromosome(s) we share, etc. are what is needed. These tools are available at the other two DNA companies already. And from a few third party sites.
When you download your DNA data, Ancestry warns you about privacy at other websites. Well, if it provided the tools necessary I would not have to download my data and take it elsewhere where the necessary tools are available to truly analyze my match with someone (assuming the person puts his/her data at the same site.)
I like Ancestry for the matches but I dislike that Ancestry withholds the tools that are necessary to analyze these matches I don’t have elsewhere.
Jeff Snavely’s extension is the best method available now, for sure.
I am extremely excited about this development! My mom has about 354 pages of matches (she has very deep colonial American roots) and I welcome any way to winnow down the amount of people to search through. Thank you for the heads up!
I think winnowing out the bad matches will help a bit. Hope it does at any rate!
Have they started already? I went to my AncestryDNA results page after I read this and I only had 117 matches, down from hundreds. And for the first three that I checked, I found my ancestral connection. I found two 2nd cousins and one 2nd cousin 2x removed. Wow!
Until now, I have never found a confirmed match on any of the sites I have tested at.
Thanks for letting us know about this change.
Eileen
It’s definitely not being rolled out now, Eileen: I still have the same 260 pages of results I had yesterday, and last week. The launch isn’t even targeted for a specific date yet. You might want to try refreshing your results page to see what happens, and then asking Ancestry if those results are still missing in action.
I haven’t done the Ancestry DNA testing; I have done the Family Tree DNA test, though. If what you say is true it must be a real pain looking through all of those names. I’m not sure how many names I have at FTDNA, but out of all of them I’ve emailed not one has been a match. While I’ve not emailed all of them, even the 5-10 that I’ve been in contact with couldn’t find any connections. Hopefully these tests will get better over time.
With that said, I have had good luck with the male DNA testing, that has connected me to many of the people I know to be in my direct male ancestor’s line.
DNA testing works best when it’s targeted: you and someone you hope to match both test to see if you have genetic evidence to support your theory. In every case, it works only when combined with the paper trail evidence.
Judy,
I want to let you know that your blog post is listed in today’s Fab Finds post at http://janasgenealogyandfamilyhistory.blogspot.com/2014/10/follow-friday-fab-finds-for-october-10.html
Have a wonderful weekend!
Thanks, Jana!
One thing folks won’t tell you, though, about DNA matching is that when your ancestors come from “endogamous regions” you are still going to have a lot of matches. This has created a LOT of problems for me, as autosomal DNA often shows up as far back as 10th cousins in one branch of my tree where intermarriage was very common. Those ancestors in my case come from the earliest settlers in New England, where there were so few people to marry that this DNA still shows up today. Most folks you talk to will tell you it is not possible to go further back than 6 generations, but that just is not so if you come from a line where families married into the same families over and over…a lot of that DNA still persists today. It is not just Ancestry where I have tons of matches…I also have tons of matches on 23andMe and FTDNA…and this really makes things complicated for people trying to trace their DNA when their ancestry comes from endogamous populations. I actually get hit with this in 2 ways in my DNA, as my Italian ancestors often married into the same families over and over too. BUT if you come from ancestors of the first settlements in the US you need to be aware of this, as people are often listed as 6th cousins and such (sometimes even closer) when indeed, if you are able to track it, as I have done with several, they are your 10th cousins or some such distant relationship. I doubt these new changes will “fix” that, so people who go back to those areas in the 1600s-1700s in New England need to be aware of that. There have been a few articles written on that phenomenon. Here is one of them: http://dna-explained.com/2013/10/21/why-are-my-predicted-cousin-relationships-wrong/
Endogamy is certainly an issue. Wrote about it two years and more ago: Endogamy and you. Really.
Reading the blog post and comments have convinced me that I’m glad I’m waiting to do anything with DNA. No one here talked about what Ancestry.com has said they would do with these samples if “the market place changes.” How can anyone trust them to be good stewards of these DNA collections?
I always think about my time and what a new line of genealogy research or method will consume. It doesn’t sound like DNA matches warrant the time it takes to wade through yet. I know that less will be more in this case but I’m going to wait until I hear that working with DNA will be a good use of my precious research time.
Some may not warrant the time, others pay off instantly. As much as I wouldn’t run the circus the way AncestryDNA does, I can’t deny that I’ve made a couple of research breakthroughs because of cousins who tested there. It’s not a perfect system, but the depth of that database is too deep to ignore.
I had my test done at Ancestry and the composition was NOT as reported by my family. Specifically I AM SURE that my great great grandmother on my mother’s side was of Irish ancestry and equally AS SURE that my father’s father is a Cherokee! Well when I got my results from ancestry.com I was reported as less than 1 percent native american, and none of my european ancestry was Irish!
Not convinced of Ancestry.com’s accuracy, I did the free upload and free first 20 matches from Family Tree DNA and VOILÀ, one of my first two matches is of total Irish ancestry, and a few further down show the North Carolina/Virginia area, which is traditionally Cherokee Nation. Whilst DNA is DNA, ancestry.com’s racial breakdown information was ALL WRONG for me!
You can transfer your raw data to FTDNA using this link for free https://www.familytreedna.com/autosomalTransfer?atdna=mGuGbpp1PgJxZC6kBS9UEg%3d%3d
Please do tell if your information is more accurate with FTDNA than with Ancestry.com