Academic journal article Communications of the IIMA

The Syllabus Based Web Content Extractor (SBWCE)




The Syllabus Based Web Content Extractor (SBWCE) introduces a new technique of Syllabus Based Web Content Mining. It makes Syllabus Based Web Content Extraction easy and creates an instant online book view from the links relevant to a given Syllabus. The current work makes three important contributions. First, because Syllabus based content requires educational information in multiple formats, the technique makes such content easier to find. Second, it uses a new approach for capturing and recording the heuristics that experts apply while searching. Third, it exploits the grouping of Syllabus Words for precise extraction. This paper introduces SBWCE and presents the related details.
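The excerpt does not say how SBWCE represents or groups Syllabus Words, so the following Python is only a hypothetical sketch of the general idea, not the paper's method. It assumes each syllabus unit is described by groups of words that must all appear on a page, and it collects matching pages into chapter-like link lists that resemble a book view. `SyllabusUnit`, `Chapter`, `build_book_view` and the "every word in a group must occur" rule are illustrative assumptions.

```python
from dataclasses import dataclass, field


# Hypothetical representation: a syllabus unit plus groups of words that
# should co-occur on a relevant page, e.g. {"binary", "tree"}.
@dataclass
class SyllabusUnit:
    title: str
    word_groups: list[set[str]]


# One "chapter" of the assumed book view: a unit title and its ranked links.
@dataclass
class Chapter:
    title: str
    links: list[str] = field(default_factory=list)


def tokenize(text: str) -> set[str]:
    """Lower-case word set; any non-alphanumeric character acts as a separator."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return set(cleaned.split())


def group_score(words: set[str], unit: SyllabusUnit) -> int:
    """Count the unit's word groups whose words ALL occur in the page."""
    return sum(1 for group in unit.word_groups if group <= words)


def build_book_view(units: list[SyllabusUnit], pages: dict[str, str]) -> list[Chapter]:
    """Build one chapter per unit, listing matching page URLs by descending score."""
    chapters = []
    for unit in units:
        scored = []
        for url, text in pages.items():
            score = group_score(tokenize(text), unit)
            if score > 0:
                scored.append((score, url))
        # Higher score first; the URL breaks ties so the order is deterministic.
        scored.sort(key=lambda item: (-item[0], item[1]))
        chapters.append(Chapter(unit.title, [url for _, url in scored]))
    return chapters
```

Requiring a whole group of words, rather than any single word, is one way grouping can make extraction more precise: under this assumed rule, a page that merely mentions "tree" in an unrelated sense would not be placed in a chapter on binary trees.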


According to the U.S. Web-Based Education Commission (2000), "The Internet is perhaps the most transformative technology in history, reshaping business, media, entertainment, and society in astonishing ways. But for all its power, it is just now being tapped to transform education". Still, the Internet provides a strong platform for e-learning, including educational software, programming languages, educational content websites, school websites, virtual courses, Learning Management Systems, Asynchronous Learning Networks and Collaborative Learning Environments. According to the ALN report (Allen & Seaman, 2007), based on responses from over 2,500 colleges and universities, "Nearly 3.5 million students were taking at least one online course during the fall 2006 term and nearly twenty percent of all U.S. higher education students were taking at least one online course." This points to a strong need for tailor-made services and content designed around the needs of the individual learner: content that is available at a time, in a place and in a form that suits the learner.

Searching for educational information, content and material therefore requires better web content finding tools and techniques. Most of the time the required content is available on the Web, but finding it is difficult. Search engines help extract bundles of information from the vast ocean of the Web; however, finding the correct collection remains an unsolved problem. Moreover, a search engine is often not designed for a purpose that matches the user's search perspective, as happens whenever a user searches for Syllabus Based Content. Such situations call for finding content from a focused point of view. Web Mining is the application of Data Mining techniques to discover patterns from the Web (Etzioni, 1996). It can be used effectively both to learn the behaviour of online learners and to mine content from the Web according to the learner's demands. SBWCE was developed on the basis of this motivation.



The development of SBWCE was based on the study of several important issues, including the following:

Web Content Mining

Researchers are exploring ways to build systems that automatically gather and manipulate Web-based information on the user's behalf. However, because the relevant content is embedded in HTML pages, extracting it is difficult. A wrapper is a procedure for extracting a particular resource's content. Kushmerick, N., Weld, D., and Doorenbos, R. (1997) introduced wrapper induction, a technique for automatically constructing wrappers, and applied it to information extraction. Another related study, by Crescenzi, V., Mecca, G. and Merialdo, P. (2001), describes RoadRunner, a project investigating techniques for extracting data from HTML sites through automatically generated wrappers. Buttler, D., Liu, L., and Pu, C. (2001) present Omini, a fully automated extraction system for the World Wide Web. Omini parses web pages into tree structures and performs object extraction. Another important study by Chang, C-H. …
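To make the idea of a wrapper concrete, the Python below sketches a delimiter-based wrapper in the spirit of the left/right (LR) wrappers associated with wrapper induction. It is a simplified illustration, not the algorithm of any of the cited studies. A wrapper is assumed to be a list of (left, right) delimiter pairs, one per attribute. `induce_lr` takes the longest common text around hand-labelled values and accepts the resulting wrapper only if it reproduces those labels on the page. The function names and the sample page are hypothetical.

```python
import os.path


def execute_lr(page, delimiters):
    """Apply an LR-style wrapper: extract one tuple per repetition of the delimiters."""
    tuples, pos = [], 0
    while True:
        row = []
        for left, right in delimiters:
            start = page.find(left, pos)
            if start == -1:
                return tuples  # no further complete tuple on the page
            start += len(left)
            end = page.find(right, start)
            if end == -1:
                return tuples
            row.append(page[start:end])
            # Resume at the end of the value rather than past `right`, because
            # this simplified induction lets right_k and left_{k+1} overlap.
            pos = end
        tuples.append(tuple(row))


def spans_from_values(page, rows):
    """Locate each labelled value in reading order; return a flat list of (start, end)."""
    spans, pos = [], 0
    for row in rows:
        for value in row:
            start = page.find(value, pos)
            if start == -1:
                raise ValueError(f"label {value!r} not found in page")
            spans.append((start, start + len(value)))
            pos = start + len(value)
    return spans


def common_suffix(strings):
    """Longest common suffix, via commonprefix on reversed strings."""
    return os.path.commonprefix([s[::-1] for s in strings])[::-1]


def induce_lr(page, rows):
    """Induce (left, right) delimiters per attribute; return None if none fit."""
    k = len(rows[0])
    spans = spans_from_values(page, rows)
    delimiters = []
    for attr in range(k):
        before, after = [], []
        for j in range(attr, len(spans), k):
            start, end = spans[j]
            prev_end = spans[j - 1][1] if j > 0 else 0
            next_start = spans[j + 1][0] if j + 1 < len(spans) else len(page)
            before.append(page[prev_end:start])  # text preceding the value
            after.append(page[end:next_start])   # text following the value
        left, right = common_suffix(before), os.path.commonprefix(after)
        if not left or not right:
            return None  # empty delimiters cannot anchor an extraction
        delimiters.append((left, right))
    # Accept the wrapper only if it reproduces the labelled tuples exactly.
    if execute_lr(page, delimiters) != [tuple(r) for r in rows]:
        return None
    return delimiters
```

For a page such as `<html><b>Congo</b> <i>242</i><br><b>Egypt</b> <i>20</i><br></html>`, labelling the country/code pairs once yields delimiters that can then extract the same kind of pairs from other pages with that layout. Real wrapper-induction systems search a space of candidate delimiters under validity constraints instead of simply taking the longest common strings.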
