Twitter used to predict future

Researchers have for the first time used social networks to predict which hashtags will trend more than a week in advance.

Using data from thousands of Twitter users, researchers developed a model that analyzes the connections between pairs of individuals and how topics flow through the most connected parts of the network. The work holds promise for predicting global epidemics, said Nicholas Christakis ’84, a professor of sociology at Yale and co-senior author of the paper.

“Basically what we’ve done is figured out a way to use Twitter to predict the future,” said James Fowler, a professor of medical genetics and political science at the University of California, San Diego and co-senior author of the paper.

The friendship paradox holds that, on average, a person’s friends have more friends than that person does. Using Twitter data from 2009, the researchers randomly selected 50,000 users and 50,000 of their followers for the network analysis.

Consistent with the paradox, the researchers found that the second group was better connected within the network and, on average, used hashtags nine days before those hashtags went viral among the original 50,000 users.

“It’s mathematically the case that your friends have more friends than you do,” Christakis said. “In fact, your sexual partners have more sexual partners than you do, and if you’re a scientist, your co-authors have more co-authors than you do. It’s a general property of social networks.”
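The arithmetic behind Christakis’s point can be illustrated with a short Python sketch. The five-person network and its names are hypothetical, invented purely for illustration, and the code is not the researchers’ model; it only shows why counting people through their friendships favors the well connected.

```python
def mean_degree(friends: dict[str, list[str]]) -> float:
    """Average number of friends per person."""
    return sum(len(fs) for fs in friends.values()) / len(friends)


def mean_friend_degree(friends: dict[str, list[str]]) -> float:
    """Average number of friends that a friend has.

    Every (person, friend) pair is counted once, so a popular person is
    sampled once for each friendship they belong to.
    """
    friend_degrees = [len(friends[f]) for fs in friends.values() for f in fs]
    return sum(friend_degrees) / len(friend_degrees)


# Hypothetical toy network (friendships are mutual); "ana" is the popular hub.
friends = {
    "ana": ["ben", "cal", "dee", "eve"],
    "ben": ["ana"],
    "cal": ["ana", "dee"],
    "dee": ["ana", "cal"],
    "eve": ["ana"],
}

print(mean_degree(friends))         # 2.0
print(mean_friend_degree(friends))  # 2.6
```

Ana appears four times when friends are counted, once per friendship, which pulls the friends’ average (2.6) above the population average (2.0). Sampling followers rather than random users draws on the same effect.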

The result surprised Fowler, who had expected the Twitter connections to predict trending topics only hours in advance, certainly not more than a week beforehand. Fowler said that if the study were repeated today, the model might yield a shorter lead time because the Twitter network has grown more complex.

Andrew Papachristos, a professor of sociology at Yale who was not involved in the study, said that the Twitter data allowed the researchers to examine connections over time. By contrast, many network studies rely on static networks, which do not allow for causal analysis, Papachristos said.

“We’ve studied most of the network we have actual data on,” Papachristos said. “Twitter [is] in real time and there is a massive amount of it.”

Fowler and Christakis have collaborated on network studies before, and the pair said they are looking to shift their work from descriptive science toward application.

The duo plans to apply the Twitter model to disease outbreaks. Google currently uses the geographic location of web searches to compile data on the sources of disease outbreaks. Using a network design similar to that of the Twitter study, Fowler and Christakis plan to ask volunteers and their friends for continual reports on their flu symptoms. They hope to use these data to predict outbreaks before they happen.

“If the most popular person in the group gets the flu, then soon everybody will be getting the flu,” Christakis said.

The study was published on April 9 in the journal PLOS ONE.
