The Yale Center for Research Computing is helping solve big problems — big data problems.
On Nov. 2, the center joined a consortium of universities and industry partners, headed by Columbia University, on a $1.25 million research project funded by the U.S. National Science Foundation to tackle challenges facing data analysis in the northeastern United States. Yale’s center, a new facility unveiled just this semester, is equipped with sustainable, state-of-the-art computational infrastructure that supports advanced computing and data processing throughout Yale’s research community.
“Yale, alongside 40 other universities and industry partners, will form a consortium through which data, tools and ideas will be shared in domain-specific areas,” said Kiran Keshav, executive director of the Yale Center for Research Computing. He added that three Yale faculty members and researchers will serve on the project’s steering committee and in two focus areas: energy generation and storage, and discovery science.
“Big data” is a term for massive sets of information collected from a variety of sources, including weather sensors, lab machines and observational equipment, said Andrew Sherman ’71 GRD ’75, senior computational research scientist in the Yale Computer Science Department. The project aims to correct inefficiencies in data collaboration in order to improve solutions to current challenges in science.
A Nov. 2 media release from the Data Science Institute at Columbia University announced that the NSF had proposed dividing the country into four regional hubs. The NSF Program Solicitation outlines the budget for these regional hubs: four projects, one for each region, are each awarded a maximum of $1.25 million for up to three years. Columbia and Yale’s partnership in this larger consortium represents the Northeast.
Kathleen McKeown, a computer science professor at Columbia and director of the Data Science Institute, described the northeastern region of the U.S. as “an ideal laboratory for testing the potential for data science to improve lives.” She said the Northeast Hub will extract insights from large amounts of data to eventually bring tangible results.
The Northeast Hub has six areas of focus: health, energy, cities and regions, finance, education and discovery science. Sherman will represent Yale in the area of discovery science, which uses powerful computing to analyze large-scale data, discover relationships within data sets and bring insight into scientific problems in fields such as genomics, high-energy physics, astronomy, climatology and the study of the human brain.
However, the project’s aim is not to deploy large-scale solutions just yet, Sherman added. There are still many difficulties with big-data acquisition and application, he said.
“Big data is really big, so it’s difficult, time-consuming and often expensive to store the data, compute with the data or move the data around,” Sherman said in a Tuesday email to the News. “We’re really in the infancy of development of the techniques to analyze such huge quantities of data [and] so many of the techniques we have today … may not be able to scale up to handle the sizes of data sets required for discovery science in the future.”
In the health sector, the project intends to better harness data from social media, patients, environmental sensors and other sources to improve individualized treatment, the press release stated.
Just as discoveries about the human genome have begun to transform health care, Sherman said, scientists will be able to make rapid progress in discovery science with the sharing of data using computers, networks and data storage facilities.
The press release also highlighted other initiatives that aim to make current governmental systems more efficient. It further noted that public services can be optimized to make cities and regions more equitable, sustainable and resilient. Data analytics can also facilitate a better understanding of financial markets and improve education through feedback on teaching techniques and online courses, according to the release.
The Northeast Hub will hold its first workshop on Dec. 16 at Columbia to discuss corporate data analytics and how companies in the Northeast can benefit from it.