Archaeology, E-Publication and the Semantic Web
Richards, Julian D., Antiquity
In an article in the May 2001 edition of Scientific American, Tim Berners-Lee, 'inventor' of the World Wide Web, outlined his vision of how the Web would evolve (Berners-Lee et al. 2001). The problem with the present-day Web, according to Berners-Lee, is that most of the information is designed for human consumption, and even if the content of a web page is derived from a structured database, its meaning and structure are not evident to an automated search program or robot browsing the Web. The drawback of this is apparent from any Google search. Type in 'barrow', for example, and despite over seven million hits (as of March 2006), it is not until the eighty-eighth entry that the first archaeological result is reached, a website about the West Kennet long barrow, by which time most users will have given up. The main reason, of course, is that words carry a multiplicity of meanings according to context. Google is unable to differentiate between archaeological monuments, the various places called Barrow (and their football teams), Isaac Barrow (seventeenth-century mathematician and fellow of the Royal Society), or wheelbarrow sellers, and will invariably produce large numbers of false hits.
For Berners-Lee, the Web was conceived as an information space in which machines should be able to participate and help (Berners-Lee 1998; 1999). Rather than just containing instructions for how they should be displayed, web documents should also include structured and machine-readable content. The Semantic Web can be summed up as 'technologies for enabling machines to make more sense of the Web, with the result of making the Web more useful for humans' (Dumbill 2000). According to Berners-Lee's vision, automated text scanning programs or agents would be able to parse web pages, breaking them down into strings of information, and could then offer clinic appointments or sell insurance, without false hits, according to the personal preferences of the user.
For archaeological research, this has tremendous potential. If a search engine knew which barrows were archaeological, and which were Bronze Age rather than Neolithic, then web searching would be much more effective. The Semantic Web therefore has great potential to help locate more accurately the information in which we are really interested. It also has massive implications for how we publish. As the distinction between publication and online archives becomes blurred (Richards 2002; 2004) there is a strong argument that we should also structure both publications and archives to be machine-readable.
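The contrast between keyword matching and a search over machine-readable descriptions can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the `Page` record, its `subject_type` and `period` fields, the two search functions and the sample entries are all hypothetical, and do not represent any actual Semantic Web standard or search engine.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Page:
    """A web page plus hypothetical machine-readable annotations."""
    title: str
    text: str
    subject_type: Optional[str] = None  # assumed label, e.g. "archaeological monument"
    period: Optional[str] = None        # assumed label, e.g. "Bronze Age"


# Made-up sample pages echoing the kinds of 'barrow' hits described in the text.
PAGES = [
    Page("Barrow AFC fixtures", "Barrow football club results", "football club"),
    Page("Isaac Barrow", "Isaac Barrow, seventeenth-century mathematician", "person"),
    Page("Garden wheelbarrows for sale", "Buy a wheelbarrow online", "product"),
    Page("West Kennet long barrow", "A Neolithic long barrow in Wiltshire",
         "archaeological monument", "Neolithic"),
    Page("Round barrow survey", "Survey of a Bronze Age round barrow",
         "archaeological monument", "Bronze Age"),
]


def keyword_search(pages: List[Page], word: str) -> List[str]:
    """Plain text matching: every page containing the string is a hit."""
    needle = word.lower()
    return [p.title for p in pages if needle in p.text.lower()]


def semantic_search(pages: List[Page], subject_type: str,
                    period: Optional[str] = None) -> List[str]:
    """Match on the annotations rather than on the words of the page."""
    return [
        p.title
        for p in pages
        if p.subject_type == subject_type and (period is None or p.period == period)
    ]


if __name__ == "__main__":
    print(keyword_search(PAGES, "barrow"))
    print(semantic_search(PAGES, "archaeological monument", "Bronze Age"))
```

In this toy data set the keyword search returns every page, football club and wheelbarrow seller included, whereas the annotated search returns only the Bronze Age monument. Real annotations would of course depend on agreed vocabularies rather than the ad hoc labels assumed here.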
As more journals, such as Antiquity, become available online, it becomes possible to search and harvest the titles and even the text of articles for occurrences of specific terms. Library catalogue records including title, author and subject keywords can already be searched online, but article content could also be indexed comprehensively. Scholars would then simultaneously be able to locate all monument database records and web sites referencing those Bronze Age round barrows within 20km of Barrow-in-Furness, and could also find all published references to them, in both primary and secondary literature. This is a separate argument from the debate about open access and e-repositories (Day 2001). The Semantic Web does not require that information is freely available. Indeed its development will undoubtedly be driven by the commercial advantages to advertisers and sellers. Nonetheless there is an area of overlap in that whilst the Semantic Web does not require open access, it would arguably be greatly facilitated by it. If all archaeologists placed their articles in an online repository, those articles would undoubtedly be easier to find. Therefore the Semantic Web also needs to be considered alongside current debates about scholarly publication. In his editorial in the December 2005 issue of Antiquity, Martin Carver painted a rather bleak picture of a geography of knowledge in which: 'You do not need to read anything you didn't intend, you remain in control, finding what you "need" using keywords and search engines' (Carver 2005: 757). …