Publishing Research Data on a Professional Basis
Green, Toby, Bulletin of the World Health Organization
As Pisani & AbouZahr have identified, there are many obstacles to the publishing of data: social (incentives for researchers to make the effort to publish), financial (having adequate financing to cover short-term publishing and long-term curation costs), and technical (standards and systems). (1) This paper looks at some of the technical challenges of publishing data professionally and describes the discoverability and citability benefits that follow.
Let's take it as read that publishing research data is a "good thing", that researchers are as willing to publish data as they are research papers, and that funding is in place to keep the data available online in the long run. Job done? Well, no, not by a long chalk.
Just as loading a journal article onto a web site somewhere isn't the same as publishing it properly, so the same is true for data. To be as discoverable and as citable as research articles, data sets need to be published using an infrastructure that is compatible with the one used for research articles. It is not enough that data sets hang like dongles off a research article; they need to be discoverable and citable in their own right, just like a journal article. This means the metadata must be compatible with existing bibliographic management and citation systems like RefWorks® and CrossRef®. Users will expect search engines, abstracting and indexing services and library catalogues to reference data sets, so, for example, librarians will need MARC (MAchine-Readable Cataloging) records.
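To make "compatible metadata" concrete, here is a minimal sketch of a data set described as a citable object and exported in the RIS tagged format that reference managers commonly import. The DatasetRecord class, the to_ris function, and every value in the example (title, author, publisher, DOI) are hypothetical illustrations, not OECD's actual records or systems.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical minimal citation metadata for a stand-alone data set."""
    title: str
    publisher: str
    year: int
    doi: str
    authors: List[str] = field(default_factory=list)


def to_ris(record: DatasetRecord) -> str:
    """Render the record as an RIS entry (each line is 'TAG  - value')."""
    lines = ["TY  - DATA"]  # RIS reference type for a data file
    lines += [f"AU  - {author}" for author in record.authors]
    lines += [
        f"TI  - {record.title}",
        f"PB  - {record.publisher}",
        f"PY  - {record.year}",
        f"DO  - {record.doi}",
        "ER  - ",  # end-of-record marker
    ]
    return "\n".join(lines)


# Illustrative values only; the DOI below is a made-up placeholder.
example = DatasetRecord(
    title="Example Health Indicators",
    publisher="Example Statistics Office",
    year=2009,
    doi="10.1234/example-dataset",
    authors=["Example Statistics Office"],
)
print(to_ris(example))
```

The point of the sketch is that the data set carries the same kind of fields a journal article would (creator, title, publisher, year, persistent identifier), so the tools that already handle articles can handle it too.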
Is this overkill? Well, the Organisation for Economic Co-operation and Development (OECD) doesn't think so. OECD publishes more than 390 data sets as stand-alone objects, as well as thousands of data sets as supplemental data to its books and journal articles. Sub-sets of the data sets are also posted on the web as stand-alone objects. So it is no surprise that, in the absence of good discovery metadata and systems, the number one complaint from users is the challenge of finding a relevant data set. They know the data is there, but they can't find it, even with Google's help.
To solve this problem, OECD's Publishing Division has spent the past three years grappling with the challenge of how to publish these many thousands of data objects so that users can not only find the data they need, but can then cite and manage the data sets using the same tools that they already use to manage journal articles or book chapters. …