A recently awarded grant may help Yale resolve the technical challenges associated with transferring the large volumes of data its researchers study every day.

Earlier this month the National Science Foundation awarded Yale a $496,253 grant to improve the process of transporting large-scale scientific data. David Galassi, the director of network services for ITS, said the University plans to use these funds to build a dedicated science network that will be 10 times faster than the current system. While the project schedule in the grant proposal spans two years, Galassi said he expects that researchers will be able to use parts of the developing network within six to nine months.

Until the new network launches, scientists working with massive data sets must rely on the same system used by the rest of the University. Though this primary network is sufficient for typical use — browsing the web, sending email and other day-to-day activities — it becomes a significant bottleneck for scientific applications that must transfer far more data.

Daisuke Nagai, a professor of physics and astronomy and one of the authors of the NSF grant application, said he frequently encounters these network bottlenecks in his research on the formation of galaxies and stars and on the structure of the universe. His group runs simulations that generate upward of 200 terabytes of data — roughly equivalent to 42 million MP3s — which can take months to send across campus from research storage to the astronomy department's local servers.

“When we have to transfer one set of simulations that we are analyzing, it takes a couple of months to transfer the data or to retrieve the data from West Campus,” Nagai said. “If you want more data, it takes another couple of months.”
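For a sense of scale, a rough back-of-the-envelope sketch shows why transfers of this size stretch into months on a shared network and how a tenfold speedup changes the picture. The throughput figures below are illustrative assumptions only; the article does not report the networks' actual sustained rates.

```python
# Illustrative transfer-time estimate (not from the article): the
# throughput figures are hypothetical assumptions, since the story does
# not give the networks' actual sustained speeds.

DATASET_BYTES = 200e12  # 200 terabytes, one of Nagai's simulation sets

def transfer_days(sustained_gbps: float) -> float:
    """Days needed to move the dataset at a given sustained rate (gigabits/s)."""
    bits = DATASET_BYTES * 8
    return bits / (sustained_gbps * 1e9) / 86_400  # 86,400 seconds per day

# Shared links rarely sustain their nominal rate under contention, so
# assume ~0.1 Gbps effective on the shared campus network and ~1 Gbps on
# a dedicated science path (both assumptions, not reported figures).
for label, rate in [("shared campus network, ~0.1 Gbps", 0.1),
                    ("dedicated science network, ~1 Gbps", 1.0)]:
    print(f"{label}: about {transfer_days(rate):.0f} days")

# Output:
# shared campus network, ~0.1 Gbps: about 185 days
# dedicated science network, ~1 Gbps: about 19 days
```

Under those assumed rates, the arithmetic lines up with the researchers' experience: roughly half a year on the shared network versus a few weeks on a dedicated one.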

Nagai said he expects these multi-month transfer times to shrink to several weeks once the new network is deployed. The upgrades may also eliminate the need to move the data to the astronomy department's servers in the first place, allowing his group to "interact with this data online" and process it in real time.

Nagai said the network upgrade will be "the key for finding new trends" by allowing scientists to work more efficiently and reducing delays caused by file transfers. He added that the need to move massive data sets quickly is common to scientific research across institutions.

Andrew Sherman, a researcher in the Computer Science Department and a member of the grant-writing committee, said that many scientists at Yale collaborate on projects with national laboratories or other universities that require sharing large data sets. The planned science network will facilitate these external collaborations by creating a DMZ, a specialized network segment through which researchers can securely exchange data without traversing the less flexible firewall that protects the main campus network.

Large data downloads from outside the University are especially important for projects such as Yale’s Center for Earth Observation, a research lab that analyzes satellite images of the Earth. The Center’s director, Ronald Smith, explained that the project’s data sets consist of “a huge amount of data per second,” most of which must be “downloaded from external archiving sites around the world.”

Smith, a professor in the Geology and Geophysics Department, said he hopes that the new network resources will prevent researchers from abandoning data-intensive projects because the University lacks the "network capability to bring it in."

Yale was one of 33 campuses to receive network funding from the NSF under the Campus Cyberinfrastructure-Network Infrastructure and Engineering program.

MICHAEL DISCALA
Yale 2014