A Design of Information Extraction System

J. Pavithra
R. Monisa, G. Ramya


This paper deals with the problem of extracting specific information from a collection of documents. Information extraction has become an essential task because of the vast growth of online information. It is defined as the process of selectively structuring and combining data that are stated in one or more documents or databases. The paper describes the basic system architecture involved in designing an information extraction system and the basic steps of web information extraction: organizing the web page, generating extraction rules, and displaying the result. XML technology is a suitable approach for information extraction because it reduces the difficulty of extracting information from large amounts of data. XML offers various techniques and concepts that can be used to extract information from a document or web page. For extraction, an XML template must be designed to capture the information needed; the extracted information is then placed in the template. This paper focuses on extracting information from semi-structured data.

Key Terms: Information extraction, XML, DOM, XSLT, XPath.
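
The template idea described in the abstract can be illustrated with a minimal sketch. It is not the system presented in the paper: the sample catalog, the template layout, the `path` attribute and the `extract` function are all hypothetical names chosen for illustration. The sketch uses Python's standard `xml.etree.ElementTree` module, which supports only a limited subset of XPath, and it assumes the source has already been organized into well-formed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical semi-structured input: records share a shape,
# but some fields are missing (the second book has no author).
SOURCE = """
<catalog>
  <book id="b1"><title>XML Basics</title><author>A. Rao</author><price>12.50</price></book>
  <book id="b2"><title>Web Mining</title><price>30.00</price></book>
</catalog>
"""

# Hypothetical template: each slot carries a "path" attribute holding the
# XPath expression (relative to one record) that fills the slot.
TEMPLATE = """
<record>
  <title path="title"/>
  <author path="author"/>
  <price path="price"/>
</record>
"""


def extract(source_xml, template_xml, record_path):
    """Fill one copy of the template for every record found in the source."""
    source = ET.fromstring(source_xml.strip())
    template = ET.fromstring(template_xml.strip())
    results = ET.Element("results")
    for node in source.findall(record_path):
        filled = ET.SubElement(results, template.tag)
        for slot in template:
            out = ET.SubElement(filled, slot.tag)
            # findtext returns None when the field is absent, so a missing
            # field becomes an empty slot instead of raising an error.
            out.text = (node.findtext(slot.get("path")) or "").strip()
    return results


results = extract(SOURCE, TEMPLATE, ".//book")
print(ET.tostring(results, encoding="unicode"))
```

Keeping the extraction rules in the template rather than in the code means that a new field can be captured by adding one slot. A missing field leaves its slot empty, which is one simple way to handle the irregularity of semi-structured data.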

