Real-time data contains a mother lode of information, but until privacy worries are put to rest, that wealth of data will, and probably should, go untapped, says a Carnegie Mellon professor.
The time has come for a frank discussion, says Tom Mitchell, head of the Machine Learning Department in Carnegie Mellon’s School of Computer Science. At issue: would we want to reap the benefits of correlating data about flu carriers and their location via the GPS in their smartphones? Or would we prefer not to have this information – which could help us avoid getting sick – for the sake of preserving personal privacy? If the massive quantity of data about us in different silos were correlated, we would confront many more morally ambiguous choices, says Mitchell.
“The potential benefits of mining such data range from reducing traffic congestion and pollution, to limiting the spread of disease, to better using public resources such as parks, buses and ambulance services. But risks to privacy from aggregating these data are on a scale that humans have never before faced,” Mitchell wrote in a recent issue of the journal Science.
Congressional hearings might be required to fully air the pros and cons, although some issues will be resolved over time as police and courts review electronic information as evidence, the professor said.
“We need to, as a society, recognize that this is happening and figure out what government policies might be helpful,” said Mitchell.
In some instances, data is being gathered anonymously and the privacy implications are minor. Currently, cell phone carriers can see where phones happen to be across the different cells in a network. Thus, if cars are stuck in traffic, it’s possible to see that many cell phones are in a given cell or series of cells and aren’t moving. Although this kind of data gathering appears to be innocuous from the standpoint of privacy, other kinds are far more ambiguous and possibly troubling.
The Google Flu Trends website publishes statistics related to queries about flu-related topics. The results map very closely to statistics compiled by the Centers for Disease Control and Prevention (CDC) in Atlanta. However, the Google information is available more or less instantaneously, while the CDC takes longer to compile its data. “They can predict the CDC report a week before it’s published,” said Mitchell.
If contagious disease information were to be correlated with cellular phone or GPS location data, it would be possible to know where infectious persons might be. It would therefore also be possible to call persons who might be near infectious persons and warn them. While such an application of technology might help stop the spread of a pandemic, it would also call into use perhaps more information about us than we would be comfortable sharing.
In some cases, technology might be enlisted to preserve anonymity. For example, data could be mined from many different organizations but not aggregated into a central repository without first being encrypted to protect the privacy of individuals, Mitchell pointed out.
But the thorny ethical questions that are beginning to present themselves are not likely to be answered overnight. “It will take a while,” said the professor, adding, “The public has to recognize this is a trend and that it is a big deal. There is a lot of data out there about us and there is a lot of good or bad that can be done with it.”
