Monday, March 25, 2013

Needles in a Haystack

Big Data is useful, but calling it 'big data' is like saying an elephant is in the same family as a dormouse (it's not, by the way, nor are they in the same taxonomic order). If every byte were worth a penny, Wal-Mart's yearly expansion of retail data would wipe out the US government's deficit. Add Kroger, Target, Walgreens, Amazon.com, and others: the government would have a surplus, even if data were taxed at 0.1 cents per byte. But I digress.

Lots of companies are using this data to focus marketing efforts on their customers. They're looking for patterns. Netflix developed a successful show based on its viewers' habits. Target sends out baby product promotions when it notices women following a pattern of buying multivitamins and then, a few months later, skin lotion. It found that pattern in earlier customers and now applies it to current ones.

A worry about this analysis of 'big data' is that there's no effort to look into causes, only correlations. I've blogged before about the dangers of correlations that are unrelated to causes. Analysis of data is one step toward getting information, but it's not the final step. Even one of the authors points out a correlation that seems strange and struggles to find the cause: orange cars have proportionally fewer maintenance problems than cars of any other color. There's just too much data and too little time to both find the patterns and correlations and analyze the data for causes.
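To make the worry concrete, here is a minimal sketch (my own illustration, not anyone's production analysis) of how easily a pile of data throws up correlations that have no cause at all. It generates a couple hundred columns of pure random noise and then hunts for the most strongly correlated pair. The function names, the number of columns, and the seed are all assumptions chosen for the example.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_correlation(n_series=200, n_points=20, seed=0):
    """Return the most correlated pair among unrelated random series.

    Every series is independent noise, so any correlation found here
    is coincidence by construction. All parameters are illustrative.
    """
    rng = random.Random(seed)
    series = [
        [rng.gauss(0, 1) for _ in range(n_points)]
        for _ in range(n_series)
    ]
    # Compare every pair of series and keep the one with the largest |r|.
    best_pair = max(
        itertools.combinations(range(n_series), 2),
        key=lambda p: abs(pearson(series[p[0]], series[p[1]])),
    )
    return best_pair, pearson(series[best_pair[0]], series[best_pair[1]])


if __name__ == "__main__":
    pair, r = strongest_chance_correlation()
    print(f"Series {pair[0]} and {pair[1]} correlate at r = {r:.2f}")
```

With nearly twenty thousand pairs to compare, some pair will look strongly related even though nothing connects them. The more columns you collect, the more of these coincidences you should expect, which is exactly why a pattern alone isn't a cause.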

What about the accuracy of the data? Couples might share online retail accounts. If the account is in the guy's name, Target might miss out on marketing baby products. In a family or a group of close friends, swapped purchases would really distort the analysis or make it moot. Societal norms change, too. Those changes might invalidate what we research, conclude, and develop for marketing plans. Non-profits are discovering this as charitable giving trends have evolved over the last twenty years. Workplace policies have been modified as new generations enter the labor market with different expectations and work styles.

If we collect too much information, are we burying the really useful parts? Are we hiding the needles within the haystack?
