Machine analysis of hundreds of thousands of Twitter messages is helping scientists learn how to use social networks to combat the spread of disease. Marcel Salathé, assistant professor of biology at Penn State, came up with the idea of combing tweets to determine people's attitudes about the H1N1 swine-flu vaccine. Mapping negative attitudes about a vaccine could potentially help identify geographic areas or population clusters that are unvaccinated and, therefore, theoretically more susceptible to a contagious outbreak.
While the H1N1 pandemic was sweeping the nation, Salathé started by collecting all English-language tweets that mentioned a relevant keyword, such as "vaccine," "vaccinate" or "immunize." The original data set of nearly 500,000 tweets was too large to classify by hand, so the professor sliced out about 10 percent of them and had student volunteers read and classify each message in that subset as:
1. positive ("I'm off to get a swine-flu vaccination");
2. negative ("The H1N1 'vaccine' is dirty. DontGetIt!");
3. neutral ("Health dept offering the flu vaccine on Monday"); or
4. irrelevant ("Senate says no to funding for malaria vaccine").
Then, programmer/analyst Shashank Khandelwal developed an algorithm to classify the remaining mountain of uncategorized messages. "The human-rated tweets served as a 'learning set' that we used to 'teach' the computer how to rate the tweets accurately," Salathé says.
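The article does not say which algorithm Khandelwal used, so the following is only a minimal sketch of the general idea, assuming a simple naive Bayes text classifier. The class name `NaiveBayes`, the `tokenize` helper and the tiny learning set (the four example tweets quoted above) are illustrative choices, not the researchers' code; a real learning set would hold tens of thousands of human-rated tweets.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a tweet and split it into simple word tokens (hypothetical helper)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (an assumed technique)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter()            # label -> number of tweets
        self.vocab = set()

    def fit(self, texts, labels):
        # "Teach" the model from the human-rated learning set.
        for text, label in zip(texts, labels):
            self.label_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        # Pick the label with the highest log-probability for this tweet.
        total = sum(self.label_counts.values())
        tokens = tokenize(text)
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            score = math.log(n / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Learning set: the four human-rated examples quoted in the article.
learning_set = [
    ("I'm off to get a swine-flu vaccination", "positive"),
    ("The H1N1 'vaccine' is dirty. DontGetIt!", "negative"),
    ("Health dept offering the flu vaccine on Monday", "neutral"),
    ("Senate says no to funding for malaria vaccine", "irrelevant"),
]
texts, labels = zip(*learning_set)
model = NaiveBayes().fit(texts, labels)
```

Calling `model.predict(tweet)` on each unrated message would then assign it one of the four categories. With only four training tweets the predictions are not meaningful; the point is the shape of the workflow, in which people rate a sample and the machine extends those ratings to the rest.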
Once the machines had sorted the messages and eliminated those classified as irrelevant, the researchers found that tweeted pro-vaccine sentiment correlated with actual vaccination rates. New England, for example, had the highest number of positive attitudes and the highest number of people vaccinated. As Salathé told a Penn State publication, "These results could be used strategically to develop public-health initiatives. ... targeted campaigns could be designed according to which region needs more prevention education."
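To make the comparison concrete, here is a rough sketch of how a regional sentiment measure might be set against vaccination rates. The article does not specify the statistic the researchers used; the share of positive tweets among relevant ones and a Pearson correlation are both assumptions here, and the function names `positive_share` and `pearson` are hypothetical.

```python
import math


def positive_share(labels):
    """Fraction of positive tweets among those not classified as irrelevant."""
    relevant = [label for label in labels if label != "irrelevant"]
    if not relevant:
        return 0.0
    return sum(label == "positive" for label in relevant) / len(relevant)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)
```

In use, one would compute `positive_share` from the classified tweets of each region, pair each value with that region's reported vaccination rate, and pass the two lists to `pearson`. A result near 1 would indicate that regions with more positive tweets also tend to have higher vaccination rates.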
The biologist says one of his goals is to use social-network data to also investigate other health threats, like obesity and heart disease.
The beauty of tweets for research is that they are public, concise and sometimes include location information. And because they show who is following whom, they can enable a researcher to pinpoint pockets of, say, people avoiding the vaccine, the assumption being that people who follow X tend to share X's views. As Salathé says in the research paper he co-wrote with Khandelwal, they found that "opinions are clustered," and that "most communities were dominated by either positive or negative sentiments towards the novel vaccine."
Note to CDC: Maybe for the next outbreak, you could find out if Lady Gaga, Justin Bieber or one of the other most-followed people is positive or negative about a new vaccine and then target that person accordingly.
