The Latent Dirichlet Allocation Model: Foundations, Trends, and Challenges

DOI:

https://doi.org/10.22201/fesa.29928273e.2026.12.103

Keywords:

Latent Dirichlet Allocation, machine learning, topic modeling, Web of Knowledge, bibliometric analysis.

Abstract

This article reports research progress on the theoretical framework and state of the art for a master’s-level engineering thesis. The thesis proposes an open, cloud-based tool that enables users without advanced technical or programming knowledge to carry out topic modeling projects based on the Latent Dirichlet Allocation algorithm. To this end, the article describes the theoretical foundations of the model and presents a bibliometric analysis of 2,113 documents published between 2002 and 2024, retrieved from the Web of Knowledge platform, in order to identify trends, applications, and challenges. The analysis highlights a growing use of the model, attributed to its accuracy and reliability. Furthermore, its application in the social sciences and humanities is identified as a relevant area of opportunity for promoting its use in interdisciplinary contexts.

Published

2025-10-01

How to Cite

Franco Salido, G., & Macedo Chagolla, F. (2025). The Latent Dirichlet Allocation Model: Foundations, Trends, and Challenges. Revista Digital De Posgrado, (12), 27–53. https://doi.org/10.22201/fesa.29928273e.2026.12.103

Section

Articles