Design And Development Of Hybrid Approach For Constructing Gene/Protein Names Dictionary
Abstract
Identifying gene and protein names in biomedical texts is an important challenge in bioinformatics, and several approaches have been proposed to tackle it. Machine learning and statistical techniques have proved useful. Other methods focus on linguistic techniques, or rely on dictionaries extracted from databases, ontologies, and other data sources. Some methods combine dictionaries with linguistic or machine learning techniques. This paper focuses on the development of a hybrid method that combines rule-based and n-gram statistical techniques to identify and extract gene and protein names and to construct a dictionary of them.
Keywords: Information Extraction, Gene name, Protein name, Regular Expression, Medline abstracts, Dictionary.
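As an illustration only, the combination of rules and n-gram statistics named in the abstract might be sketched as follows. This is not the authors' implementation: the regular expression, the head-word and stop-word lists, the function names, and the frequency threshold are all hypothetical choices made for the example.

```python
import re
from collections import Counter

# Hypothetical rule: a short token mixing letters and digits, e.g. "BRCA1", "p53", "IL-2".
SYMBOL_RULE = re.compile(r"\b[A-Za-z]{1,6}-?\d{1,3}[A-Za-z]?\b")
# Hypothetical head words that often end a multi-word protein name.
HEAD_WORDS = {"protein", "kinase", "receptor"}
# Hypothetical stop words that should not start a candidate name.
STOP_WORDS = {"the", "a", "an", "of", "with", "and", "in"}


def tokenize(text):
    """Split text into alphanumeric tokens, keeping internal hyphens."""
    return re.findall(r"[A-Za-z0-9-]+", text)


def candidate_ngrams(tokens, max_n=3):
    """Yield n-grams (2 <= n <= max_n) that end in a head word
    and do not start with a stop word."""
    for n in range(2, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[-1].lower() in HEAD_WORDS and gram[0].lower() not in STOP_WORDS:
                yield " ".join(gram)


def build_dictionary(abstracts, min_count=2):
    """Count rule matches and candidate n-grams across all abstracts,
    keeping only names seen at least min_count times."""
    counts = Counter()
    for text in abstracts:
        counts.update(SYMBOL_RULE.findall(text))          # rule-based step
        counts.update(candidate_ngrams(tokenize(text)))   # n-gram step
    return {name: c for name, c in counts.items() if c >= min_count}
```

The frequency threshold is what makes the n-gram step statistical in this sketch: a candidate enters the dictionary only if it recurs across the corpus, which filters out one-off matches of the surface rules.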
DOI: https://doi.org/10.26483/ijarcs.v3i7.1412
Copyright (c) 2016 International Journal of Advanced Research in Computer Science

