Data is one of the biggest challenges facing companies today. Technologies such as social media mean more data is available than ever before, and researchers are struggling to analyze it effectively. User bias and prior knowledge have come to play a large but dangerous role in data analysis. Now, a new tool could help identify relationships between the variables in a dataset without input from researchers.
Finding most mathematical relationships, such as which conditions make a disease outbreak more likely, requires some knowledge and guesswork about potential associations. A scientist might assume, for example, that a salmonella outbreak came from a supply of tainted eggs. But such assumptions can lead to false results. A team of researchers from MIT, Harvard and the Broad Institute has developed a method of removing such prejudice from data analysis.
The team is working on algorithms capable of identifying all relationships in a dataset while accounting for “noise.” On a graph of a worker’s pay, for example, such “noise” could include bonuses and overtime.
“I had started trying to think about some large epidemiological datasets, and since I wasn’t an epidemiologist, I didn’t really know what to look for,” David Reshef, a joint MD-PhD student in the Harvard-MIT Division of Health Sciences and Technology (HST), said in a statement. “I just kind of wanted to know, ‘What are the variables in these datasets that are most associated?’ Being as naïve as I was, I hadn’t quite realized how difficult of a question that was to answer.”
To address this problem, Reshef developed an algorithm that plots every variable in a dataset against each of the other variables. It then lays grids over each plot to assess how closely the two variables are related.
“The fundamental idea behind this approach is that if a pattern exists in the data, there will be some gridding that can capture it,” Reshef said. The algorithm works with a curve as well as it does with a straight line, meaning that it is not limited to linear relationships.
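The gridding idea can be illustrated with a rough sketch. This is not the team's published algorithm. It is a simplified stand-in that assumes equal-width square grids and scores each grid by its mutual information, and the function names (`bin_index`, `grid_score`, `best_grid_score`, `rank_pairs`) are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def bin_index(value, lo, hi, bins):
    """Map a value in [lo, hi] to one of `bins` equal-width cells."""
    if hi == lo:  # constant column: everything falls in one cell
        return 0
    return min(int((value - lo) / (hi - lo) * bins), bins - 1)


def grid_score(xs, ys, bins):
    """Mutual information of a bins-by-bins grid, divided by log(bins)."""
    n = len(xs)
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    cells, cols, rows = Counter(), Counter(), Counter()
    for x, y in zip(xs, ys):
        i = bin_index(x, x_lo, x_hi, bins)
        j = bin_index(y, y_lo, y_hi, bins)
        cells[(i, j)] += 1
        cols[i] += 1
        rows[j] += 1
    # Sum over occupied cells only; empty cells contribute nothing.
    mi = sum(
        (c / n) * math.log(c * n / (cols[i] * rows[j]))
        for (i, j), c in cells.items()
    )
    return mi / math.log(bins)


def best_grid_score(xs, ys, max_bins=8):
    """Try several grid resolutions and keep the highest score."""
    return max(grid_score(xs, ys, b) for b in range(2, max_bins + 1))


def rank_pairs(columns, max_bins=8):
    """Score every pair of named columns, strongest relationship first."""
    scores = {
        (a, b): best_grid_score(columns[a], columns[b], max_bins)
        for a, b in combinations(sorted(columns), 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling `rank_pairs` on a dictionary that maps variable names to equal-length lists of numbers returns every pair with its score, without the analyst choosing in advance which pairs to examine. Scores near 1 suggest a tight relationship, whether straight or curved, and scores near 0 suggest none. Because the sketch keeps the best score over several grids, fine grids on small samples can inflate the scores of unrelated variables, so `max_bins` should stay small relative to the number of data points.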
Eventually, this type of algorithm could be used to remove researcher bias from analysis. If a scientist assumes that a disease outbreak, for example, is caused by a water supply, other variables—like food or weather—could be ignored. The new algorithm could help identify unlikely relationships in datasets.
The team’s research was recently published in Science.
