Academic journal article
By Pealer, Lisa N.; Dorman, Steve M.
Journal of School Health, Vol. 68, No. 8
Because of the growing number of documents on the Internet, search engines and Web directories can be useful tools for locating specific information. A search engine is a tool designed to search the Internet for keywords or phrases that users designate as search terms. Users can reach search engines from a Web browser by clicking the Search button on the menu bar. A list of search engines will appear, and the user may select the preferred engine.
While a search engine does not search the Internet itself, it does search a database of information. These databases are assembled when search engines send out applications called spiders (also called robots, bots, or crawlers) to look for new, changed, or defunct sites. Spiders are software programs designed to gather information automatically and continually from all over the Internet. AltaVista and HotBot, for example, search more than 10 million pages per day. Both Infoseek and Excite claim to have more than 50 million total pages indexed. When sites are found, the spiders send the information back to the site index, where it is used to update the search engine database. Each engine searches a different database, which accounts for the diverse results returned by each search engine even when identical terms have been entered.
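The spider process described above can be illustrated with a minimal sketch. It assumes a hypothetical three-page "Web" held in memory (the `TOY_WEB` dictionary and its addresses are invented for illustration) rather than real network requests, and it is not the method of any particular search engine. The spider follows links from a starting page, records the text of each page it finds in a database, and notes links that lead to defunct pages.

```python
from collections import deque

# Hypothetical miniature "Web": each address maps to page text and outgoing links.
TOY_WEB = {
    "a.example/home": {"text": "school health resources",
                       "links": ["a.example/heart", "b.example/news"]},
    "a.example/heart": {"text": "heart disease prevention",
                        "links": ["a.example/home"]},
    "b.example/news": {"text": "health news for teachers",
                       "links": ["c.example/missing"]},
}

def crawl(start, web):
    """Visit every page reachable from start, breadth first.

    Returns the database of page text and the set of defunct addresses.
    """
    database = {}
    defunct = set()
    queue = deque([start])
    seen = {start}
    while queue:
        url = queue.popleft()
        page = web.get(url)
        if page is None:
            # The link points to a page that no longer exists.
            defunct.add(url)
            continue
        database[url] = page["text"]
        for link in page["links"]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return database, defunct

database, defunct = crawl("a.example/home", TOY_WEB)
```

Two spiders that start from different pages, or that revisit pages on different schedules, would assemble different databases, which is one way to picture why engines return different results for the same terms.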
The degree of detail used in searching varies. Most search engines index sites either by keyword indexing or by concept-based indexing. Keyword indexing searches for significant words in the title, in HTML tags, or throughout the entire document of a site. Concept-based indexing uses complex algorithms to determine concepts within articles, which allows the engine to distinguish between different uses of the same word. Using concept-based indexing, a user entering the word "heart," for example, would be able to distinguish between the organ of the body and the Valentine symbol.
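Keyword indexing can be sketched as follows. The example assumes three invented pages and a small hypothetical list of insignificant words (`STOP_WORDS`); real engines use far larger lists and more elaborate rules. Each significant word is mapped to the set of pages that contain it.

```python
import re
from collections import defaultdict

# Hypothetical pages, invented for illustration.
PAGES = {
    "page1": "Heart disease is a leading health concern.",
    "page2": "Send a heart card for Valentine's Day.",
    "page3": "School health programs promote exercise.",
}

# Assumed list of words too common to be worth indexing.
STOP_WORDS = {"a", "is", "for", "the"}

def build_keyword_index(pages):
    """Map each significant word to the set of pages containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOP_WORDS:
                index[word].add(page_id)
    return index

index = build_keyword_index(PAGES)
print(sorted(index["heart"]))  # prints ['page1', 'page2']
```

A keyword index of this kind returns both the medical page and the Valentine page for "heart," because it records only which words appear, not what they mean. Separating the two senses is the task that concept-based indexing takes on.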
Databases produced by search engines are very large, and search results may vary in relevancy. Most search engines will return a ranked list of hits to the user once the engine has completed its search. Hits are ranked according to the relevance assigned to the site by the search engine. While each search engine uses a different technique to determine relevance, generally, relevancy is determined by the frequency and positioning of keywords in a site.
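A rough sketch of ranking by keyword frequency and position appears below. The weights (three points for each appearance in the title, one point for each appearance in the body, and a small bonus when the keyword appears early) are assumptions chosen for illustration; each real engine uses its own technique. The pages are invented.

```python
import re

def relevance(keyword, title, body):
    """Score a page by how often and how early the keyword appears."""
    keyword = keyword.lower()
    title_words = re.findall(r"[a-z]+", title.lower())
    body_words = re.findall(r"[a-z]+", body.lower())
    score = 3.0 * title_words.count(keyword)   # assumed weight for title hits
    score += 1.0 * body_words.count(keyword)   # assumed weight for body hits
    if keyword in body_words:
        # Assumed position bonus: earlier first appearance scores higher.
        score += 1.0 / (1 + body_words.index(keyword))
    return score

def rank(keyword, pages):
    """Return the names of matching pages, most relevant first."""
    scored = [(relevance(keyword, p["title"], p["body"]), name)
              for name, p in pages.items()]
    hits = [(score, name) for score, name in scored if score > 0]
    return [name for score, name in sorted(hits, key=lambda h: (-h[0], h[1]))]

# Hypothetical pages, invented for illustration.
PAGES = {
    "nutrition": {"title": "School Nutrition",
                  "body": "Good nutrition supports a healthy heart."},
    "cardio": {"title": "Heart Health Basics",
               "body": "The heart pumps blood. Heart disease is preventable."},
    "sports": {"title": "Team Sports",
               "body": "Soccer and basketball schedules."},
}

ranked = rank("heart", PAGES)
print(ranked)  # prints ['cardio', 'nutrition']
```

The page that uses the keyword in its title and early in its text ranks first, the page that mentions it once near the end ranks second, and the page that never mentions it is not returned as a hit.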
A broad search term may return too many hits to be of value. If the first few ranked sites are not helpful, it is usually better to redo the search with a more specific keyword than to sort through a list of irrelevant sites. Developing search skills can help the user make more efficient use of search tools.
Most search engines use a form of Boolean logic (named after the 19th-century English mathematician George Boole). The Boolean operators AND, OR, and NOT are used in conjunction with keywords to include or exclude terms in a search. Engines with advanced searches or advanced levels allow the user to refine and restrict searches in such a way as to increase the relevancy of hits. Some engines allow users to restrict the search by media type (eg, sound, video, graphic files), language, date range, or location. Most engines can search for a phrase, which greatly increases the likelihood that relevant hits will be returned. In addition, some engines allow users to truncate a word, using an asterisk or other symbol that tells the engine to fill in the space with any letters. This practice allows users to find hits that contain spelling variations of the same word.
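The Boolean operators and truncation can be sketched with set operations. The example assumes a small hypothetical keyword index (`INDEX`) that maps each word to the pages containing it, and it treats a trailing asterisk as "match any word that begins with these letters," which is one common interpretation rather than the rule of any specific engine.

```python
# Hypothetical keyword index: word -> pages that contain it.
INDEX = {
    "heart": {"p1", "p2", "p4"},
    "disease": {"p1", "p3"},
    "valentine": {"p2"},
    "teach": {"p3"},
    "teacher": {"p4"},
    "teaching": {"p1"},
}

def lookup(term, index):
    """Return pages matching a term; a trailing * matches any word ending."""
    term = term.lower()
    if term.endswith("*"):
        stem = term[:-1]
        matches = set()
        for word, pages in index.items():
            if word.startswith(stem):
                matches |= pages
        return matches
    return set(index.get(term, set()))

def search_and(a, b, index):
    """Pages containing both terms."""
    return lookup(a, index) & lookup(b, index)

def search_or(a, b, index):
    """Pages containing either term."""
    return lookup(a, index) | lookup(b, index)

def search_not(a, b, index):
    """Pages containing the first term but not the second."""
    return lookup(a, index) - lookup(b, index)

print(sorted(search_and("heart", "disease", INDEX)))    # prints ['p1']
print(sorted(search_or("heart", "disease", INDEX)))     # prints ['p1', 'p2', 'p3', 'p4']
print(sorted(search_not("heart", "valentine", INDEX)))  # prints ['p1', 'p4']
print(sorted(lookup("teach*", INDEX)))                  # prints ['p1', 'p3', 'p4']
```

AND narrows a search, OR broadens it, and NOT excludes unwanted hits, while the truncated term "teach*" gathers pages that use "teach," "teacher," or "teaching."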
While search engines build their indexes automatically, a Web directory allows users to browse through predetermined categories until a site of interest is found. Web directories are assembled by people and often contain reviews or recommendations to guide users through the content of a site. Directories depend on descriptors provided by the directory creators; consequently, users may find the results limited and not specific. …