Semantic Web for Reliable Citation Analysis in Scholarly Publishing

Article excerpt

Analysis of the impact of scholarly artifacts is constrained by unreliable current practices in cross-referencing, citation discovery, and citation indexing and analysis, which have not kept pace with technological advances in areas such as knowledge management and security. Because citation analysis has become the primary component of scholarly impact factor calculation, and considering the relevance of this metric both within the scholarly publishing value chain and, especially, in the evaluation of the curricula of scholarly professionals, we argue that current practices need to be revised. This paper describes a reference architecture that aims to bring openness and reliability to the citation-tracking lifecycle. The solution relies on digitally signed semantic metadata at the different stages of the scholarly publishing workflow, so that authors, publishers, repositories, and citation-analysis systems have access to independent, reliable evidence that is resistant to forgery, impersonation, and repudiation. As far as we know, this is the first paper to combine Semantic Web technologies and public-key cryptography to achieve reliable citation analysis in scholarly publishing.

**********

In recent years, the amount of scholarly communication brought into the digital realm has increased exponentially. (1) This irreversible process is fostering the exploitation of large-scale digitized scholarly repositories for analysis tasks, especially those related to impact factor calculation. The prospect of automating the contribution-relevance calculation for scholarly artifacts and scholarly professionals has attracted the interest of several parties within the scholarly environment, and even outside it. For example, Spanish legislation on the certification of scholarly personnel requires that the papers listed in candidates' curricula appear in the Subject Category Listing of the Journal Citation Reports of the Science Citation Index. (2) This example shows the growing relevance of these systems today.

Nevertheless, current practices in citation analysis entail serious problems, including security flaws related to the publishing process (e.g., repudiation, impersonation, and privacy of paper contents) and defects related to citation analysis, such as the following:

* Confusion among nonidentical instances of the same paper

* Author naming conflicts

* Lack of machine-readable citation metadata

* Fake citing papers

* Inability of authors to control the citation data related to their work

* Inability of citation-analysis systems to verify the provenance and trustworthiness of citation data, in both the short and the long term

Besides lacking any security features, current citation-analysis systems such as ISI Citation Index, Citeseer (http://citeseer.ist.psu.edu/), and Google Scholar share one main shortcoming: they count multiple copies or versions of the same paper as separate papers. In addition, they split the citations of a paper among its copies or versions, thus decreasing the visibility of the specific work. Moreover, because these systems rely on different analysis databases, with different indexing policies and different collections of papers, they produce very different results. (3)
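The effect of counting copies separately can be illustrated with a minimal sketch. The record identifiers, the sample citations, and the `work_of` mapping from each copy to its underlying work are hypothetical and serve only as an illustration. In particular, the sketch assumes that such a mapping is already available, which is precisely what current systems lack.

```python
from collections import defaultdict

# Hypothetical citation records: (citing paper, cited copy or version).
citations = [
    ("c1", "paperA-preprint"),
    ("c2", "paperA-journal"),
    ("c3", "paperA-journal"),
    ("c4", "paperA-repository"),
    ("c5", "paperB-journal"),
]

# Hypothetical mapping from each copy or version to the work it represents.
# This mapping is an assumption of the sketch, not part of any real system.
work_of = {
    "paperA-preprint": "paperA",
    "paperA-journal": "paperA",
    "paperA-repository": "paperA",
    "paperB-journal": "paperB",
}


def count_per_copy(citations):
    """Count citations per copy, treating every copy as a separate paper."""
    counts = defaultdict(int)
    for _citing, cited in citations:
        counts[cited] += 1
    return dict(counts)


def count_per_work(citations, work_of):
    """Count citations per work, merging all copies of the same paper."""
    counts = defaultdict(int)
    for _citing, cited in citations:
        # Fall back to the copy identifier when no mapping is known.
        counts[work_of.get(cited, cited)] += 1
    return dict(counts)


if __name__ == "__main__":
    print(count_per_copy(citations))
    print(count_per_work(citations, work_of))
```

With this sample data, the per-copy count spreads the four citations of paperA over three separate entries, whereas the per-work count reports them as a single total.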

To remedy these shortcomings, this paper proposes a reference architecture for reliable citation analysis based on semantic trust mechanisms. It is important to note that adopting the ideas proposed in this paper, whether completely or partially, will require changes within the publishing lifecycle. We believe that these changes are justified considering the serious flaws of the established solutions and the relevance that citation-analysis systems are acquiring in our society. …