PROBABILISTIC TOPIC MODELING AND ITS VARIANTS – A SURVEY

Padmaja CH V R
S Lakshmi Narayana
Divakar CH

Abstract

Topic modeling is a fast-growing research area, driven by the sharp rise in internet users, who generate large volumes of electronic data in the form of documents, tweets, messages, and so on. Collecting, organizing, storing, and retrieving data in text format is becoming increasingly common. Topic modeling focuses on classifying textual data into groups. In this study, we present a survey of the advanced algorithms used in topic modeling. The main purpose of this survey is to provide a brief overview of current topic models, to help new researchers select the most suitable algorithm for their work.

 


References

A. Daud, J. Li, L. Zhou, and F. Muhammad, “Knowledge discovery through directed probabilistic topic models: a survey,” Frontiers of Computer Science in China, vol. 4, no. 2, pp. 280–301, Jun. 2010.

David M. Blei. Introduction to Probabilistic Topic Models. Communications of the ACM, 2011

Steyvers, M. and Griffiths, T., Probabilistic Topic Models. In T. Landauer, D. S. McNamara, S. Dennis, & W. Kintsch (Eds.), A handbook of Latent Semantic Analysis. Hillsdale, NJ: Erlbaum, 2007

Jelisavcic, V., Furlan, B., Protic, J., & Milutinovic, V. M., “Topic Models and Advanced Algorithms for Profiling of Knowledge in Scientific Papers,” 35th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO’2012), 1030–1035.

Evangelopoulos, N., Zhang, X., and Prybutok, V. Latent semantic analysis: Five methodological recommendations. European Journal of Information Systems 21, 1 (Jan. 2012), 70–86.

S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, “Indexing by latent semantic analysis,” Journal of the American Society for Information Science, vol. 41, pp. 391–407, 1990.

Hofmann, T., Probabilistic Latent Semantic Indexing. In Proceedings of the 22nd ACM SIGIR Conference on Research & Development on Information Retrieval, Berkeley, CA, USA, 1999.

D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” Journal of Machine Learning Research, vol. 3, pp. 993–1022, Jan. 2003.

T. L. Griffiths and M. Steyvers, “Finding scientific topics,” Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. Suppl 1, pp. 5228–5235, Apr. 2004.

D. Blei, T. Griffiths, M. Jordan, and J. Tenenbaum, “Hierarchical topic models and the nested Chinese restaurant process,” 2003.

D. M. Blei and J. D. Lafferty, “Dynamic topic models,” in Proceedings of the 23rd international conference on Machine learning, ser. ICML ’06. New York, NY, USA: ACM, 2006, pp. 113–120.

X. Wang and A. McCallum, “Topics over time: a non-Markov continuous-time model of topical trends,” in Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, ser. KDD ’06. New York, NY, USA: ACM, 2006, pp. 424–433.

David M. Blei, John D. Lafferty, “A Correlated Topic Model of Science,” Annals of Applied Statistics, Vol. 1, No. 1, 17–35, 2007.

W. Li and A. McCallum, “Pachinko allocation: DAG-structured mixture models of topic correlations,” in Proceedings of the 23rd international conference on Machine learning, ser. ICML ’06. New York, NY, USA: ACM, 2006, pp. 577–58.

M. R. Zvi, C. Chemudugunta, T. Griffiths, P. Smyth, and M. Steyvers, “Learning author-topic models from text corpora,” ACM Trans. Inf. Syst., vol. 28, no. 1, pp. 1–38, Jan. 2010.