Friday, November 16, 2012

The Role of Data Analysis in "Get Out The Vote" Operations

In this article in The Atlantic, Ben Jacobs describes the failure of the Romney campaign's "ORCA" software platform. ORCA was supposed to help the campaign identify which likely Romney voters had already cast their ballots, so that it could focus its get-out-the-vote resources on voters who hadn't gone to the polls yet.
It turns out that the Obama campaign had tried something similar in 2008, and that system had also crashed. The Obama campaign learned its lesson from the experience, concluding that such an election-day system was not feasible. For 2012 it went with a system called "Narwhal" that targeted likely voters before election day.

So in a sense the Romney campaign was one cycle behind from a technical perspective -- they learned a lesson that the Obama campaign had learned four years earlier.

Thursday, November 8, 2012

Nate Silver and Big Data

There are a host of post-election stories on Nate Silver's success in using statistical analysis to predict the outcome of the election overall as well as the outcome of the voting on a state-by-state basis. For example, in this article from InformationWeek, Eric Lundquist ruminates on how to apply Silver's success to business.

If you are interested in the nuts and bolts of how Silver uses data to make predictions, here are a few links from Silver's blog on the New York Times site.  One is an explanation of his methodology, the other a glossary of terms:

Or, you can go right to the source and read Silver's book The Signal and the Noise:

Thursday, October 11, 2012

Two Congressional Investigations Into Activities of Data Companies

There are now two Congressional investigations into the activities of data-gathering companies. One investigation is targeting so-called "data sellers," while the other is targeting the similarly termed "data brokers." Both investigations target some of the same companies.

The "data sellers" investigation is being run by Representative Edward J. Markey, Democrat of Massachusetts, and Representative Joe L. Barton, Republican of Texas. The investigation was prompted by this article on data-gathering company Acxiom in the New York Times and targets companies such as Acxiom and Experian.

The "data brokers" investigation was opened this week by Senator John D. Rockefeller IV of West Virginia, and targets some of the same companies as the other investigation.

You can read more about the investigations at the links below:

Wednesday, October 10, 2012

Visualization of what AT&T's network looked like on 9/11/01

CNN has a slideshow of graphs that show what AT&T network traffic looked like on days when events triggered a high volume of calls and texts. Included are visualizations of network traffic on 9/11/01 as well as on the days of the East Coast earthquake and Hurricane Katrina.

See the article on the CNN site:

Friday, September 28, 2012

How Facebook uses Big Data

To get an idea of where Big Data is headed, "there are worse places to look than Facebook," says Derrick Harris, the author of a GigaOM article on Facebook's use of Big Data technologies.

In the article, Facebook's VP of Infrastructure Engineering, Jay Parikh, talks about the technologies that Facebook is using to harness its mountains of data. These include the usual suspects, like Hadoop and Hive, in addition to MySQL and some home-brewed backend and interface pieces.

Read the article at:

Tuesday, September 11, 2012

Dark side of big data: "terms of service" agreements

Time Magazine has an article on how the way big data is collected may affect us all. Before companies can analyze all that data, they have to collect it first -- from you, as you browse and click.


Friday, September 7, 2012

Harnessing the body's data

The New York Times profiles several current and upcoming devices that promise to harvest biometric data from the body, for the benefit of the wearer and perhaps the populace in general.

See the story here:

Thursday, May 31, 2012

Today's Big Data links

Here are a few posts of interest from the last few days regarding Big Data.  In the Columbia Journalism Review, "When Big Data is Bad Data" concerns the controversy over public posting of teacher rankings based on questionable metrics.  From the Wall Street Journal's "CIO Report" blog, "Taking Small Steps to Big Data" presents views on Big Data that were aired during the recent MIT Sloan CIO Symposium.

"When Big Data is Bad Data"

"Taking Small Steps to Big Data"

Thursday, May 24, 2012

White House Launches New Digital Initiative

The White House recently announced a new digital strategy, which addresses a broad array of topics such as open government, data quality, and attracting talent, among others.

This strategy is the work of Steven VanRoekel, CIO of the federal government, and Todd Park, the CTO.

VanRoekel assumed the role of Federal CIO after Vivek Kundra, the first person to hold the position (which was newly created under the Obama administration), left last year. Similarly, Park assumed the CTO role after Aneesh Chopra departed.

Here's the blog entry on O'Reilly's "Radar" site (btw, as a longtime M*A*S*H fan, I love that the URL is radar.oreilly.com):

Monday, May 21, 2012

Science and private "big data"

When research is conducted using data from private sources, should the underlying data be made public?  See the New York Times article at

Thursday, May 17, 2012

Coming soon: the Google Knowledge Graph

Google will release a new "Knowledge Graph" feature this week. For certain searches, the graph will pop up in the right-hand column of your search results and provide data that can, for example, disambiguate the search -- if you searched for Taj Mahal, you might see data for the musician as well as the building.

For more information, see the Atlantic article as well as the entry on Google "Inside Search" blog:

Google Gets Back to Its Roots With New Search Update - The Atlantic

Inside Search: Introducing the Knowledge Graph: things, not strings

Thursday, May 3, 2012

Fighting more than crime with CompStat

CompStat, a data-driven process first used in New York City and credited with a drastic decline in crime there, is being used by the city of Baltimore to do more than just fight crime. It's now a tool for improving a variety of city services.

Read the blog entry on the New York Times site:

Monday, April 30, 2012

Harvard pushing for more open access?

See this article in The Atlantic, which talks about how Harvard's library is "encouraging a broad range of measures to support open access journals."

Harvard vs. Yale: Open-Access Publishing Edition - The Atlantic

Sunday, January 15, 2012

New "big data" book in the works from Manning Press

I received an email this morning from Manning Press announcing an early access edition of their upcoming book "Big Data."

"Big Data" is by Nathan Marz and Sam Ritchie, both engineers at a little startup called "Twitter," of which you may have heard. The book is a survey of the practical aspects of implementing a Big Data solution. As an early access edition, currently only the first chapter is written (and freely available online). However, promised chapters with titles like "MapReduce and Batch Processing" will cover topics such as Hadoop installations, NoSQL databases, and the like.
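For readers who haven't met MapReduce before, here's a minimal sketch of the idea behind that chapter title -- not code from the book, just an illustration of the map/reduce pattern using the classic word-count example, written in plain Python rather than against a real Hadoop cluster:

```python
from collections import defaultdict

def map_phase(documents):
    # Mapper: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Reducer: sum the counts emitted for each distinct word.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data big ideas", "data data everywhere"]
print(reduce_phase(map_phase(docs)))
# {'big': 2, 'data': 3, 'ideas': 1, 'everywhere': 1}
```

In a real Hadoop deployment the map and reduce phases run in parallel across many machines, with the framework grouping each word's pairs together between the two phases; the batch-processing systems the book promises to cover are built on exactly this split.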

For the uninitiated, early access editions (MEAPs, or "Manning Early Access Program" in Manning parlance) allow you to purchase the book before it is entirely written. The purchaser is granted access to each chapter as soon as it is completed. For more information on MEAPs, see Manning's site.

Thursday, January 5, 2012

David Weinberger Big Data excerpt in The Atlantic

David Weinberger, a Senior Researcher at Harvard University's Berkman Center for Internet & Society, has authored a new book titled "Too Big To Know." The book is concerned with how the size of data affects the way in which we use it to forge new understanding.

The Atlantic has published an excerpt here: