Academic journal article Information Technology and Libraries

High-Performance Annotation Tagging over Solr Full-Text Indexes

Article excerpt

INTRODUCTION

Tags are generally conceived as nonhierarchical terms (or keywords) assigned to an information object (e.g., a digital image, a document, a metadata record) in order to enrich its description beyond the one provided by object properties. The enrichment is intended to improve the way end users (or machines) can search, browse, evaluate, and select the objects they are looking for. Examples are qualificative terms, i.e., terms associating the object with a class (e.g., biology, computer science, literature), or qualitative terms, i.e., terms associating the object with a given measure of value (e.g., rank in a range, opinion). (1) Approaches differ in the way tags are generated. In some cases users (or machines) (2) freely and collaboratively produce tags, (3) thereby generating so-called folksonomies. The natural heterogeneity of folksonomies calls for solutions, such as tag clouds, that harmonise their usage and make it more effective. (4) In other approaches users pick tags from a given set of values (e.g., vocabulary, ontology, range), while hybrid solutions still permit a degree of freedom. (5, 6) A further differentiation is introduced by semantically enriched tags, which are tags contextualized by a label or prefix that provides an interpretation for the tag. (7) For example, in the digital library world, scientific article objects could be annotated with subject tags whose values are drawn from two tag interpretations, ACM scientific disciplines and the Dewey Decimal Classification, whose term ontologies differ. (8)
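The notion of a semantically enriched tag can be illustrated with a minimal Python sketch. It is not taken from the article: the `interpretation:value` text syntax, the names `SemanticTag`, `parse_tag`, `is_valid`, and `VOCABULARIES`, and the sample vocabulary entries are all assumptions introduced here for illustration, and the sample values are not checked against the actual ACM or Dewey schemes.

```python
from typing import Dict, FrozenSet, NamedTuple


class SemanticTag(NamedTuple):
    """A tag value contextualized by the label (interpretation) that gives it meaning."""
    interpretation: str  # hypothetical prefix, e.g. "acm" or "ddc"
    value: str           # the term chosen within that interpretation


def parse_tag(text: str) -> SemanticTag:
    """Parse 'interpretation:value' (an assumed syntax) into a SemanticTag."""
    prefix, sep, value = text.partition(":")
    prefix, value = prefix.strip().lower(), value.strip()
    if not sep or not prefix or not value:
        raise ValueError(f"expected 'interpretation:value', got {text!r}")
    return SemanticTag(prefix, value)


def is_valid(tag: SemanticTag, vocabularies: Dict[str, FrozenSet[str]]) -> bool:
    """A tag is valid only if its value belongs to the vocabulary of its interpretation."""
    return tag.value in vocabularies.get(tag.interpretation, frozenset())


# Illustrative sample entries only; real vocabularies would be far larger.
VOCABULARIES: Dict[str, FrozenSet[str]] = {
    "acm": frozenset({"Information systems"}),
    "ddc": frozenset({"025.04"}),
}

acm_tag = parse_tag("acm:Information systems")
ddc_tag = parse_tag("ddc:025.04")
```

Keeping the interpretation separate from the value means the same string can be accepted under one interpretation and rejected under another, which is the point of contextualizing tags whose term ontologies differ.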

The action of tagging is commonly understood as the practice by which end users or machines assign tags to, or remove tags from, the objects of an information space. An information space is a digital space that a user community populates with information objects for the purpose of enabling content sharing and providing integrated access to different but related collections of information objects. (9) The effect of tagging information objects in an information space may be private, i.e., visible to the users who tagged the objects or to a group of users sharing the same right, or public, i.e., visible to all users. (10) Many well-known websites allow end users to tag web resources. For example, Delicious (11) (http://delicious.com) allows users to tag web links with free and public keywords; Stack Overflow (http://stackoverflow.com), which lets users ask and answer questions about programming, allows tagging of question threads with free and public keywords; Gmail (12) (http://mail.gmail.com) allows users to tag emails, and tags are also transparently used to encode email folders. In the digital library context, the portal Europeana (http://www.europeana.eu) allows authenticated end users to tag metadata records with free keywords to create a private set of annotations.

In this work we shall focus on annotation tagging--that is, tagging used as a manual data curation technique to classify (i.e., attach semantics to) the objects of an information space. In such a scenario, tags are defined as controlled vocabularies whose purpose is classification. (13, 14) Unlike semantic annotation scenarios, where semantic tags may be semiautomatically generated and assigned to objects, (15) in annotation tagging authorized data curators are equipped with search tools to identify the sets of objects they believe should or should not belong to a given category (identified by a tag), and eventually to perform the tagging or untagging actions required to apply the intended classification. In general, such operations may assign tags to, or remove tags from, an arbitrarily large subset of the objects of the information space. It is therefore hard to predict the quality and consistency of the combined effect of a number of such actions. As a consequence, data curators must rely on virtual tagging functionalities that allow them to bulk (un)tag sets of objects in temporary work sessions, where they can preview in real time and experiment with (do/undo) the effects of their actions before making the changes visible to end users. …
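The idea of a temporary work session with bulk (un)tagging, preview, undo, and a final commit can be sketched in a few lines of Python. This is only an in-memory illustration under stated assumptions, not the approach described in the article: it does not touch Solr, and the names `TaggingSession`, `tag`, `untag`, `undo`, `preview`, and `commit`, as well as the sample object identifiers, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# One pending action: ("tag" or "untag", tag name, affected object ids).
Action = Tuple[str, str, FrozenSet[str]]


@dataclass
class TaggingSession:
    """Hypothetical work session layered over the tags visible to end users."""
    committed: Dict[str, Set[str]]                      # object id -> visible tags
    pending: List[Action] = field(default_factory=list)  # not yet visible

    def tag(self, tag: str, object_ids: Iterable[str]) -> None:
        self.pending.append(("tag", tag, frozenset(object_ids)))

    def untag(self, tag: str, object_ids: Iterable[str]) -> None:
        self.pending.append(("untag", tag, frozenset(object_ids)))

    def undo(self) -> None:
        """Drop the most recent pending action, if any."""
        if self.pending:
            self.pending.pop()

    def preview(self) -> Dict[str, Set[str]]:
        """Replay pending actions on a copy; the committed state is not modified."""
        view = {oid: set(tags) for oid, tags in self.committed.items()}
        for kind, tag, object_ids in self.pending:
            for oid in object_ids:
                tags = view.setdefault(oid, set())
                if kind == "tag":
                    tags.add(tag)
                else:
                    tags.discard(tag)
        return view

    def commit(self) -> None:
        """Make the previewed state visible and close the pending work."""
        new_state = self.preview()  # compute before mutating the committed state
        self.committed.clear()
        self.committed.update(new_state)
        self.pending.clear()


# Usage with made-up objects: bulk tag, bulk untag, undo the untag, then commit.
store = {"doc1": {"biology"}, "doc2": set()}
session = TaggingSession(store)
session.tag("computer science", ["doc1", "doc2"])
session.untag("biology", ["doc1"])
session.undo()             # the untag is discarded
draft = session.preview()  # what end users would see after commit
session.commit()
```

Recording actions as a log and replaying them on demand keeps undo trivial and leaves the visible state untouched until commit. Replaying over a large information space would be costly, so a production system would need an index-backed strategy rather than this copy-and-replay loop.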
