The Latent Dirichlet Allocation Model: Foundations, Trends, and Challenges
DOI:
https://doi.org/10.22201/fesa.29928273e.2026.12.103
Keywords:
Latent Dirichlet Allocation, machine learning, thematic modeling, Web of Knowledge, bibliometric analysis.
Abstract
This article presents research progress on the theoretical framework and state of the art for a master’s-level engineering thesis that proposes an open, cloud-based tool enabling users without advanced technical or programming knowledge to conduct topic modeling projects based on the Latent Dirichlet Allocation algorithm. To this end, the theoretical foundations are described, and a bibliometric analysis is performed on 2,113 documents published between 2002 and 2024, retrieved from the Web of Knowledge platform, in order to identify trends, applications, and challenges. The analysis highlights a growing use of this model, attributed to its accuracy and reliability. Furthermore, its implementation in the social sciences and humanities is identified as a relevant area of opportunity for promoting its use in interdisciplinary contexts.
License
Copyright (c) 2025 Universidad Nacional Autónoma de México

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.