New storage system to fill research needs

Though the University announced in mid-October the creation of a new mass storage system to meet the needs of faculty researchers, the new backup option is only a temporary boost to Yale’s current storage systems.

Yale Information Technology Services and the Provost’s Office announced on Oct. 11 the creation of a “Research Storage Solution,” a mass data repository accessible through the Internet, to meet the increasingly urgent need for additional research storage space among faculty. The University’s current storage systems are coordinated by individual departments and researchers and have often been prone to overcrowding and damage. The newest addition is meant to bolster and streamline existing backup spaces across the University.

The research storage solution is addressing an urgent need for data space that will continue to grow over the next few years, said Deputy Provost for Science and Technology Steve Girvin, who coordinated the initiative on behalf of the Provost’s Office. The system is a temporary fix until the University can implement a storage method that will better sort and categorize data, he added.

“The issue has been that researchers on campus, regardless of what discipline they’re in, have to store greater quantities of data than ever before,” said Charles Powell, associate chief information officer for Yale ITS.

The new system likely marks a step toward increased use of “cloud” storage in coming years, which could eventually lead Yale to use more off-campus storage units, Powell said.

Under the new system, faculty data is stored in two pairs of six-foot data storage towers located on central campus and West Campus, Powell said. Each pair of storage towers functions as a single unit, he added, and the pairs sync with each other every hour.

Data storage costs the user $360 per terabyte, roughly the storage capacity of two MacBook Pro laptops, and faculty interested in using the system must purchase at least half a terabyte for $180, according to a letter about the program that Girvin sent to science department chairs. Though faculty must pay for their own storage space, the per-terabyte costs have already been partially subsidized by the University, Powell said.

Since the system launched, about 20 groups of researchers have signed up for storage space, Powell said, adding that he hopes to fill the unit halfway within six months.

The system also allows non-Yale members who are collaborating on research with faculty to sign up for storage accounts, Girvin said. Since the National Science Foundation — which funds much of Yale’s science research — requires that grant proposals list how a project’s data will be stored and shared, Girvin said this aspect of the new storage system should help the University submit stronger proposals.

Researchers in Yale’s Astronomy Department and the Center for Genome Analysis at West Campus said they have perceived a greater need for data storage.

“In astronomy we’re seeing a huge data swell coming our way, and we’re worried about how to handle it,” said Pieter van Dokkum, chair of the Astronomy Department, adding that he has resorted to storing some of his research data in four boxes of hard drives on his desk.

Though the Astronomy Department created its own central data storage system four years ago, van Dokkum said that system has been full for several months. The storage system also crashed in fall 2010, he said, leaving some professors unable to access their research data for about three weeks while the backups were retrieved.

Daisuke Nagai, a physics and astronomy professor conducting galaxy simulations that take up large amounts of data space, is testing the new storage system for his department at no cost. While Nagai has simulated the formation of isolated galaxies through a supercomputer at West Campus, the computer still does not have the memory space to complete and store larger simulations that involve several galaxies and account for black holes.

“It’s going to provide a huge amount of space that we really need,” Nagai said. “Our research is significantly limited by the [current] storage.”

Nagai said he tested the new system and found that it could retrieve archived projects within hours, instead of the days required by the department’s current storage methods. The new system is slow, Nagai said, and the cost of the program is fairly steep compared to the prices of retail hard drives. But he added that a benefit of the system is its ability to sync between two locations.

Shrikant Mane, director of Yale’s Center for Genome Analysis, which opened on West Campus in January 2010, said researchers at the center would likely choose to migrate their data to the new system over the next few months. He added that several researchers have already organized their own hard drive backup systems as available storage at the center has dwindled, even though the center’s risk of running out of data space is not immediate.

“We cannot keep building more and more service capacity,” Mane said. “There is a limit on how long we can store this data.”

The new storage system can hold a total of one petabyte — or about 1,000 terabytes — of data.
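
For readers who want to check the figures, the pricing reported above can be worked through in a few lines of Python. This is only a rough sketch: it assumes the $360-per-terabyte rate scales linearly with the amount purchased, which the article does not confirm, and the function names are made up for illustration.

```python
# Figures quoted in the article. Linear pricing is an assumption.
PRICE_PER_TB_USD = 360      # cost to the user per terabyte
MIN_PURCHASE_TB = 0.5       # smallest allotment faculty can buy
TOTAL_CAPACITY_TB = 1000    # about one petabyte


def storage_cost(terabytes: float) -> float:
    """Return the cost in dollars, assuming price scales linearly with size."""
    if terabytes < MIN_PURCHASE_TB:
        raise ValueError("minimum purchase is half a terabyte")
    return terabytes * PRICE_PER_TB_USD


def share_of_capacity(terabytes: float) -> float:
    """Return the fraction of the whole system that an allotment occupies."""
    return terabytes / TOTAL_CAPACITY_TB


if __name__ == "__main__":
    print(storage_cost(0.5))        # the minimum purchase
    print(storage_cost(1))          # one terabyte
    print(share_of_capacity(500))   # the halfway mark Powell hopes to reach
```

Under that assumption, the minimum purchase comes to the $180 cited in Girvin’s letter, and the halfway mark Powell hopes to reach within six months corresponds to about 500 terabytes.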
