Data mining refers to extracting knowledge from large amounts of data. It is used in medical and bioinformatics applications such as tumor clustering, protein structure prediction, gene selection, cancer classification based on microarray data, clustering of gene expression data, and statistical modeling of protein-protein interactions. This work analyzes four clustering algorithms, namely K-means, fuzzy c-means, hierarchical clustering, and Partitioning Around Medoids (PAM), on an in vitro time-course microarray data set of the effect of HIV-1 infection on macrophages. The algorithms are compared using internal validation measures (the Dunn index, Dunn index 2, the Calinski-Harabasz index, and average silhouette width) in order to identify the best of the four. Finally, the work identifies the genes common to each cluster across the four clustering algorithms.


Data mining, Microarray, Preprocessing, Clustering algorithm, Finding common genes cluster wise.
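As a rough illustration of the evaluation described in the abstract, the sketch below computes two of the named internal validation measures (the Dunn index and average silhouette width) for two candidate partitions, and then intersects cluster memberships across algorithms. It is a minimal sketch, not the authors' implementation: the toy points, the gene names (g1, g2, g3), the function names, and the assumption that cluster labels have already been matched across algorithms are all hypothetical.

```python
from itertools import combinations
from math import dist


def dunn_index(points, labels):
    """Smallest between-cluster distance divided by the largest cluster diameter.

    Assumes at least two clusters and at least one cluster with two or
    more distinct points (otherwise the diameter is zero).
    """
    min_between = float("inf")
    max_diameter = 0.0
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        d = dist(p, q)
        if lp == lq:
            max_diameter = max(max_diameter, d)
        else:
            min_between = min(min_between, d)
    return min_between / max_diameter


def average_silhouette_width(points, labels):
    """Mean of (b - a) / max(a, b) over all points; assumes two or more clusters."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    widths = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        widths.append((b - a) / max(a, b))
    return sum(widths) / len(widths)


def common_genes_per_cluster(assignments):
    """Genes placed in the same cluster label by every algorithm.

    assignments maps algorithm name -> {gene: cluster label}. Labels are
    assumed to be aligned across algorithms, which real clustering output
    does not guarantee.
    """
    per_algorithm = []
    for gene_to_cluster in assignments.values():
        groups = {}
        for gene, label in gene_to_cluster.items():
            groups.setdefault(label, set()).add(gene)
        per_algorithm.append(groups)
    shared_labels = set.intersection(*(set(g) for g in per_algorithm))
    return {
        label: set.intersection(*(g[label] for g in per_algorithm))
        for label in shared_labels
    }


# Hypothetical toy data: two tight pairs of points far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = [0, 0, 1, 1]  # partition that matches the two pairs
bad = [0, 1, 0, 1]   # partition that splits each pair

assignments = {
    "kmeans": {"g1": 0, "g2": 0, "g3": 1},
    "pam": {"g1": 0, "g2": 1, "g3": 1},
}

if __name__ == "__main__":
    for name, labels in (("good", good), ("bad", bad)):
        print(name, dunn_index(points, labels), average_silhouette_width(points, labels))
    print(common_genes_per_cluster(assignments))
```

Higher values of both measures indicate more compact, better separated clusters, so the partition that scores best on them would be preferred. On real microarray data the points would be gene expression profiles and the labels would come from the four clustering algorithms.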





DOI: https://doi.org/10.26483/ijarcs.v8i5.4049



Copyright (c) 2017 International Journal of Advanced Research in Computer Science