PROBABILISTIC TOPIC MODELING AND ITS VARIANTS – A SURVEY
Abstract
Topic modeling is a fast-growing research area, driven in part by the sharp increase in internet users. These users are the main source of large volumes of electronic data in the form of documents, tweets, messages, and so on. Collecting, organizing, storing, and retrieving data in text format is becoming increasingly common. Topic modeling is a research area that focuses on classifying textual data into groups. In this study, we present a survey of the advanced algorithms used in topic modeling. The main purpose of this survey is to give a brief overview of current topic models, to help new researchers select the most suitable algorithm for their work.