As digitization of library holdings becomes increasingly in vogue, advancements have been made at Yale in analytical research tools for these digital archives.

This fall, Librarian for Digital Humanities Research Peter Leonard and Public Services Librarian Lindsay King released results from an ongoing project that aims to employ data mining tools — algorithms that extract information from digital sources — in the analysis of Vogue magazine’s sprawling digital archives. The project is the first large-scale digital humanities project in periodicals at Yale and will precipitate other “experiments” to explore the potential of technology to answer research questions in the humanities.

“We think of libraries as buying physical books, and Yale libraries will never stop,” Leonard said. “But we also want to develop ways of making sense of large cultural collections.”

Computer science professor Holly Rushmeier said the projects involving data mining tools expand, rather than replace, traditional methods for studying the humanities. In the past, large collections could not be completely analyzed by individuals.

The Vogue project came about because ProQuest, a library resource to which Yale has a perpetual access license, made the entire archive of Vogue, spanning 122 years, available online in 2011, King said.

Leonard said the Vogue research serves as a test case for exploring data mining technology. King added that Vogue was specifically chosen because of its relevance in popular culture.

“Everyone relates to it,” King said. “It’s a way to talk about the technology without academic discussions about the significance of the material.”

With assistance from students and professors within the computer science department, King and Leonard customized free, open-source data mining tools — including Bookworm and topic modeling — to fit the online interface.

Bookworm, a word search tool, displays trends in word usage over time, along with sources that might have led to changes in these trends, Leonard and King said. In topic modeling, computers “read” the text within the archive to find recurring groups of words that tend to appear together.
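The two ideas described above can be illustrated with a minimal sketch. This is not the code used by Bookworm or by the Yale project. The toy corpus and the function names are hypothetical, and the second function only counts which words appear together in the same document, a much simpler stand-in for real topic modeling, which typically relies on statistical methods such as latent Dirichlet allocation.

```python
from collections import Counter
from itertools import combinations
import re

# Hypothetical toy corpus: (year, text) pairs standing in for digitized articles.
CORPUS = [
    (1965, "silk gown and fur coat for the evening"),
    (1966, "fur stole with a silk evening dress"),
    (1975, "health and exercise for the working woman"),
    (1976, "exercise, diet and health for every woman"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def relative_frequency_by_year(corpus, word):
    """Return {year: share of that year's tokens that equal `word`}."""
    hits, totals = Counter(), Counter()
    for year, text in corpus:
        tokens = tokenize(text)
        totals[year] += len(tokens)
        hits[year] += tokens.count(word.lower())
    # Skip years with no tokens to avoid dividing by zero.
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}


def cooccurrence_counts(corpus):
    """Count the documents in which each unordered word pair appears together.

    Assumption: this is a crude proxy for topic modeling, not a topic model.
    """
    pairs = Counter()
    for _, text in corpus:
        vocab = sorted(set(tokenize(text)))  # sorted so each pair has one form
        pairs.update(combinations(vocab, 2))
    return pairs
```

Calling `relative_frequency_by_year(CORPUS, "health")` gives the share of each year's words that are "health," which is the kind of series a trend chart would plot. Calling `cooccurrence_counts(CORPUS)` shows which word pairs share documents. A real tool would also filter out common words such as "and" and "for," which otherwise dominate the counts.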

“We think that only humans can read a novel — these computers don’t understand English or women’s fashion,” Leonard said. “But there were uncanny results with robots reading and making decisions about broad themes.”

King said the data mining tools found a shift toward an emphasis on women’s health in the magazine beginning in the 1970s. She said these results are consistent with the efforts of Grace Mirabella, Vogue’s editor-in-chief at the time, to make the magazine less materialistic.

Beyond insights into fashion and culture, the project is just one example of how to best implement digital technology to answer humanities-based research questions, Leonard said.

Assistant professor in Slavic Languages and Literatures Marijeta Bozovic said she focuses on using digitization and search tools to analyze the Beinecke Rare Book and Manuscript Library’s collection of the works of Joseph Brodsky, which is too vast to study in its entirety by hand.

The process of digitizing library holdings, Leonard said, also speaks to a greater trend in the digital humanities.

“The library is exploring what types of infrastructure to use to get better [tools] for more intelligent queries and for finding patterns latent in the data,” he said.

He said one of the main challenges in building these tools is working with items that are still under copyright — essentially all works published after 1923.

A joint project between Yale University Library and Microsoft from 2008 to 2009 digitized 100,000 books from the library’s holdings.