There is a lot of data out there.
This fact has pushed a few related notions out of the realm of research and into the world of business and the popular consciousness. The first such notion is that the amount of data out there is a bit frightening – there is a staggering amount of it residing in the digital storehouses of government and industry, and it is growing bigger every minute. The second notion is that there is a desperate need to harness this data both for the betterment of society and the growth of business. The third is that this desperate need is starting to be met by emerging disciplines, tools, and methods.
I can tick off several books that fall into one or more of the above categories. “Beyond the Tsunami,” an online book from Microsoft Research mentioned in a post in this blog, is one such book. “Visualizing Data,” a title from O'Reilly Press, falls into the third category.
But the book I want to talk about now is “SuperCrunchers,” a 2007 book by Ian Ayres, a professor at both the Yale School of Law and the Yale School of Management.
SuperCrunchers is a book about how statistical analysis of large datasets (that is, “supercrunching”) is beginning to profoundly affect our lives. The book illustrates this effect with a series of anecdotal case studies. In one example, a computer science professor started the website farecast.com after being annoyed to learn that the people in the seat next to him on a plane had paid less for their tickets simply by waiting longer before buying them. Farecast is a site that predicts when tickets for your desired flight will be cheapest.
Ayres also describes the application of statistical methods to societal problems in the fields of medicine (evidence-based medicine) and education (direct instruction). He details the sometimes innovative and sometimes creepy use of such methods by large corporations as well.
The anecdotes in this book are compellingly written, and since the book is for a popular audience, the statistical methods are described in a way that is easily digested by the average reader.
This is an important book: it clearly describes how the abundance of data and the statistical methods used to manipulate it are changing society. This change is sometimes clearly for the better, and sometimes less so.