Magazine article Information Today

TextWare: Fast Indexing and Searching

Software Review

TextWare is designed to make the indexing and retrieval of information from large collections of existing, possibly non-uniform, documents as simple and rapid as possible.

TextWare provides satisfyingly rapid and flexible access to collections of text already on magnetic media or of scannable quality. As such, it would be a most useful way to organize and access a large collection of library bibliographies/guides. In a very busy reference/telephone service area, TextWare could be used not only for online versions of library documentation, but also to create an ongoing database of quick reference information, in effect, "making notes" to keep on file.

Fast Indexing Algorithm

Databases make fast retrieval possible because they are indexed. Generally, the number of indexed fields must be limited to prevent the index from taking up a disproportionate amount of memory. Before information can be entered into a database, it must be converted to the format the database structure requires and then read or keyed in.

TextWare uses an algorithm that greatly reduces both the size of the index and the time needed to create it. With the TextWare algorithm, the size of the index virtually stops growing as the amount of text continues to increase. For example, one megabyte of data may result in a TextWare index of 200 to 400 kilobytes, while five megabytes of data will result in only 500 to 750 kilobytes of index entries. It is thus possible and practical to use TextWare to index every unique word in very large documents.
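One reason a unique-word index can grow more slowly than the text it covers is that repeated words add no new vocabulary entries. The following minimal Python sketch of a generic inverted index illustrates that point. It is not TextWare's algorithm, which the review does not describe; the function name `build_index` and the sample text are assumptions made for illustration only.

```python
import re
from collections import defaultdict


def build_index(cards):
    """Map each unique lowercase word to the sorted card numbers that contain it."""
    index = defaultdict(set)
    for card_no, text in enumerate(cards):
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(card_no)
    return {word: sorted(nums) for word, nums in index.items()}


# Hypothetical sample text, two "cards" long.
sample = [
    "the library keeps a guide to the catalog",
    "the guide to the catalog is kept at the reference desk",
]

small = build_index(sample)
large = build_index(sample * 5)  # five times the text, same vocabulary

# The number of unique index entries is the same for both collections.
print(len(small), len(large))
```

In this sketch only the vocabulary stops growing; the list of card numbers stored for each word still lengthens as text is added, so a real index would need some further technique to keep those lists compact.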

Text Organization

The terms Card (field), Document (record), and CardFile (database) are used by TextWare to describe its organization of data. The size of a Card, TextWare's basic unit of text, is determined by the user, and may be set to a page, a paragraph, or any user-defined amount of text. The user sets Card size according to the amount of information that will be most useful during text retrieval and display.

There can be only one Card size defined for each database (a database is called a CardFile). A CardFile is a set of Cards. Records (called Documents) provide an optional mid-point in the hierarchy: sets of Cards representing individual text files may be designated as separate Documents, or Cards from several text files may all be given a Document name. Having the CardFile subdivided into Documents enhances text retrieval, since a search can be limited to specific Documents. The Document name can also be part of the hit-list display for a search, and is very useful for immediately identifying what file the hit is from and thus how useful it is likely to be.
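The Card, Document, and CardFile hierarchy described above can be sketched as a small Python data model. This is an illustrative assumption about how such a structure might look, not TextWare's internal format: the class and method names (`Card`, `CardFile`, `add_document`, `search`) are hypothetical, the Card size is fixed at one paragraph, and the search is a plain scan rather than an indexed lookup.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Card:
    document: str  # name of the Document this Card belongs to
    text: str


@dataclass
class CardFile:
    cards: list = field(default_factory=list)

    def add_document(self, name, text):
        """Split text into one Card per paragraph (blank-line separated)."""
        for para in text.split("\n\n"):
            if para.strip():
                self.cards.append(Card(name, para.strip()))

    def search(self, word, documents=None):
        """Return (document name, card text) hits, optionally limited to named Documents."""
        word = word.lower()
        return [
            (c.document, c.text)
            for c in self.cards
            if (documents is None or c.document in documents)
            and word in re.findall(r"[a-z0-9]+", c.text.lower())
        ]


# Hypothetical quick-reference notes filed under two Document names.
cf = CardFile()
cf.add_document("hours", "The library opens at nine.\n\nThe reference desk closes at five.")
cf.add_document("loans", "Reference books do not circulate.")

all_hits = cf.search("reference")
hours_hits = cf.search("reference", documents={"hours"})
```

Each hit carries its Document name, so a hit list built from these tuples shows at a glance which file a Card came from, and passing `documents` narrows the search in the way the review describes.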

Format Variety

TextWare can index and retrieve from documents in a variety of formats. Certain external document formats are automatically converted to TextWare's internal format: Microsoft Word 5.0, WordPerfect 4.2, and WordStar 5.0 and 5.5. Other external formats, such as PC Write, Volkswriter 3 and 4, WordPerfect 5.0 and 5.1, and ASCII files, can either be converted to TextWare's internal format or left in their original formats.

Since TextWare accesses files by pathname, documents can be located on any drive, including CD-ROM or optical disk drives.

Text Retrieved from Files

The TextWare main menu screen provides three choices: Text-Retrieval, Text-Indexing, and CardFile-Utilities. A help line prompts the user to use the arrow and Enter keys to select and activate the desired function.

Choosing Text-Retrieval brings up a list of the CardFiles in the current directory, along with options to change directory or drive in order to access other CardFiles.

When a CardFile is chosen, a query window appears, with the number of unique words and total number of Cards in the file noted at the top.

Powerful Search Capabilities

TextWare is capable of extremely powerful and flexible searching. …
