Although HTML 4.0 has introduced a number of features that allow the creation of highly structured tables, the resulting encoding is still too rigid and unstructured. The tree representation devised here is meant to provide a more flexible encoding of the structural and semantic information contained in the table. The tree representation will be created explicitly and kept associated with the document. It will be encoded as RDF meta-data (Brickley et al. 1998).
Two-Dimensional and Multi-Dimensional Tables: The first issue in designing the structural representation of a table is identifying its descriptive information, of which we distinguish several types. Global and local information provide a textual description of the table and of each cell (e.g., using the summary and title attributes). Indexing information marks rows/columns of the table as headers (using the THEAD and TH HTML elements), thus describing the dimensions of the table. In general, any cell can be defined as a header and placed anywhere inside the table, and the effect of a header element can be controlled using the scope and headers attributes. Most tables (e.g., HTML 3.2 compliant ones) are regular, i.e., they explicitly identify header rows/columns and do not use irregular HTML constructions (e.g., scope, headers). The explicit presence of header information makes it easy to construct the indexing component of the tree structure (as in Figure 1). The position of the header rows/columns may also carry semantic meaning and will be used to partition the table into sub-tables. In the absence of additional semantic information on the content of the table, the headers provide indexing and navigation: the user is given a generic search strategy that uses the header information as indices into the table, i.e., selecting dimensions and cells by scanning the header rows/columns.
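To make the indexing role of headers in a regular table concrete, the following is a minimal sketch in Python; it is not the representation used in this work. It assumes a table whose first row holds the column headers and whose rows each begin with a row header, and it ignores rowspan/colspan as well as the scope and headers attributes. The names TableGridParser, build_header_index, and SAMPLE are hypothetical, as is the sample table.

```python
from html.parser import HTMLParser


class TableGridParser(HTMLParser):
    """Collect the rows of a simple table as lists of (tag, text) cells."""

    def __init__(self):
        super().__init__()
        self.rows = []         # one list of (tag, text) pairs per <tr>
        self._cell_tag = None  # "th" or "td" while inside a cell, else None
        self._text = []        # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td"):
            self._cell_tag = tag
            self._text = []

    def handle_data(self, data):
        if self._cell_tag is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell_tag == tag:
            self.rows[-1].append((tag, "".join(self._text).strip()))
            self._cell_tag = None


def build_header_index(html):
    """Index data cells by (row header, column header).

    Assumption: the table is regular, i.e. the first row contains the
    column headers and the first cell of every later row is its row header.
    """
    parser = TableGridParser()
    parser.feed(html)
    parser.close()
    header_row, *body = parser.rows
    # Skip the corner cell; the remaining cells name the columns.
    col_headers = [text for _tag, text in header_row[1:]]
    index = {}
    for row in body:
        row_header = row[0][1]
        for col_header, (_tag, text) in zip(col_headers, row[1:]):
            index[(row_header, col_header)] = text
    return index


# Hypothetical sample table, used only for illustration.
SAMPLE = """
<table summary="Quarterly sales by region">
  <thead><tr><th></th><th>Q1</th><th>Q2</th></tr></thead>
  <tbody>
    <tr><th>North</th><td>10</td><td>12</td></tr>
    <tr><th>South</th><td>7</td><td>9</td></tr>
  </tbody>
</table>
"""

index = build_header_index(SAMPLE)
```

With this index, selecting a cell amounts to choosing one header per dimension: index[("South", "Q1")] yields "7". An irregular table, in which headers may appear anywhere, would not fit this simple row/column scheme.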
HTML 4.0 allows the creation of irregular/multi-dimensional tables characterized by the presence of indexing/header information irregularly located