Wednesday, April 30, 2014

Million Man Fraud?

Are these the same people:
John Smith birthdate: 1-1-1900 social security number 999-99-9999 living in Kansas
John Smith birthdate: 1-1-1900 social security number 999-99-9999 living in North Carolina?

Some politicians looked at studies conducted by Kansas and North Carolina and estimated that the 35,000 duplicate voter records found there might mean there are a million duplicate voter records across all the states. But some of the people who are turning this figure into a tremendous voter fraud controversy don't understand databases.

Many people in a variety of states share the same name. In fact, I've met some people who have the same name I do. Because a name alone is not enough, the programs that search the databases for matches also try to match birth dates and social security numbers. But many databases cannot have blank fields. If the information is not known or was never gathered for the voter record databases, a proxy date and number are used to fill in the blanks, such as the ones shown at the beginning.

I wish (and maybe they did, but it's not clear from the news reports) that the people who wrote the queries to screen the database records had included filters to ignore matches based on the various proxies used by the states. My guess is that the 35,000 or so would be reduced quite a bit. In fact, when Kansas and Nebraska researched their matches in more detail, their numbers were quite a bit lower.
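Here is a rough sketch of the kind of filter I have in mind, written in Python rather than as a database query. Everything in it is an assumption for illustration: the field names, the placeholder values (taken from the example at the top of this post), the made-up Jane Doe record, and the function names. The actual state databases and their proxy values are surely different.

```python
from collections import defaultdict

# Assumed placeholder values, based on the example at the top of this post.
# Real proxies would differ from state to state.
PLACEHOLDER_BIRTHDATES = {"1900-01-01"}
PLACEHOLDER_SSNS = {"999-99-9999"}


def has_real_identifiers(record):
    """Return True if neither the birthdate nor the SSN is a known placeholder."""
    return (record["birthdate"] not in PLACEHOLDER_BIRTHDATES
            and record["ssn"] not in PLACEHOLDER_SSNS)


def find_cross_state_matches(records, skip_placeholders=True):
    """Group records by (name, birthdate, ssn) and keep groups spanning 2+ states."""
    groups = defaultdict(list)
    for record in records:
        if skip_placeholders and not has_real_identifiers(record):
            continue  # a proxy value says nothing about who this person is
        key = (record["name"], record["birthdate"], record["ssn"])
        groups[key].append(record)
    return [g for g in groups.values() if len({r["state"] for r in g}) > 1]


# Made-up sample records for illustration only.
records = [
    {"name": "John Smith", "birthdate": "1900-01-01", "ssn": "999-99-9999", "state": "Kansas"},
    {"name": "John Smith", "birthdate": "1900-01-01", "ssn": "999-99-9999", "state": "North Carolina"},
    {"name": "Jane Doe", "birthdate": "1975-06-15", "ssn": "123-45-6789", "state": "Kansas"},
    {"name": "Jane Doe", "birthdate": "1975-06-15", "ssn": "123-45-6789", "state": "North Carolina"},
]

naive = find_cross_state_matches(records, skip_placeholders=False)
filtered = find_cross_state_matches(records)
print(len(naive), "matches without the filter,", len(filtered), "with it")
```

Without the filter, the two John Smith records count as a match even though the only things they really share are a common name and two blanks. With the filter, only the pair with real identifiers is left for someone to investigate further.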

If it's not a database 'problem' then it's a wonderful anomaly that I'm going to use in my next statistics class: "Hey, did you guys know that 1 in every 10,000 people was born on the same day, given the same name and assigned the same social security number by the government? Your homework tonight is to calculate the probability of that happening by random chance."

