Sonia Ruiz

A team of researchers, including a professor and a postdoctoral researcher at Yale, recently published a paper with troubling findings: The vast majority of a sample of studies conducted with data from a widely used public data set do not meet required methodological standards.

According to the study, which was published in the Nov. 28 issue of JAMA, 102 of 120 studies — selected at random and intended as a representative sample of the 1,084 studies published using an open-source database in 2015–16 — did not adhere to one or more required research practices. These studies were evaluated based on seven required research practices addressing the interpretation of data, research design and the appropriateness of the data analysis used.

“Increasingly, researchers are leveraging databases that were created by others,” wrote Harlan Krumholz, a professor of cardiology at the School of Medicine and a co-author of the study, in a statement to the News. “These databases provide an immense opportunity, but often have features that require specific approaches to their analysis. We found that the vast majority of a sample of published papers using a publicly available data resource had made errors.”

The database used in the studies analyzed by Krumholz and his team — the National Inpatient Sample — is publicly accessible and widely used in health care research. According to two co-authors of the study — Rohan Khera, a cardiology research fellow at the University of Texas Southwestern, and Suveen Angraal, a postdoctoral researcher at the Center for Outcomes Research and Evaluation at Yale New Haven Hospital — the National Inpatient Sample is helpful in analyzing health care services and patient outcomes across the United States.

That said, “It is a complex database and has design characteristics that require appropriate interpretation to reach the right conclusions,” wrote Khera and Angraal in a joint statement to the News.

For example, the team found that some studies had misinterpreted structural changes in the data set, erroneously treating them as actual differences in the data collected.

In response to the study, the Agency for Healthcare Research and Quality, which is responsible for collecting and disseminating the National Inpatient Sample data, has released a checklist of major methodological considerations for researchers working with that data, according to Khera and Angraal. The two noted that the study “highlights an opportunity for improvement.”

Khera and Krumholz also published a companion piece to the paper in “Circulation: Cardiovascular Quality and Outcomes,” a journal sponsored by the American Heart Association.

According to the piece, use of the National Inpatient Sample has grown rapidly in recent years due to its accessibility and affordability. As a result, researchers and readers may not yet be familiar with its nuances and complexities.

Khera and Angraal said they had anecdotally encountered “misinterpretation of the data in published literature” and hoped, in their study, “to systematically assess the burden of such misinterpretations.”

Their findings were worrisome. Although documentation of the required methodology accompanied the data from the National Inpatient Sample, the team found that 85 percent of the studies they assessed did not adhere to one or more of the seven required research practices, and 62 percent did not adhere to two or more.

These results have important consequences for policy decisions, Khera and Angraal said.

Because studies using national health care data play a role in “designing treatment recommendations and designing health policy interventions, such inaccuracies in published studies have major implications for the success of such interventions,” they wrote.

Even though the Agency for Healthcare Research and Quality has released its checklist, researchers still need to invest more time in understanding the nuances of the data at hand before conducting their studies, Khera and Angraal said. Analysis must come only after addressing the suitability and complexities of the data set.

Krumholz also emphasized the need for researchers to take the initiative in ensuring their methods meet required standards.

“Now is the time for all of us who use these types of data to be sure our papers are correct — and ensure that future research takes into account the importance of adhering to the methodological standards specific to the data that are being used,” he said.

Max Graham | max.m.graham@yale.edu