Topic Modelling of Disaster Based on Indonesia Tweet Using Latent Dirichlet Allocation

Aninditya Anggari Nuryono, Iswanto Iswanto, Alfian Ma'arif, Rizal Kusuma Putra, Yabes Dwi Nugroho H, Muhammad Iman Nur Hakim

Abstract


Twitter (now X) is a critical social media platform for disseminating information during crises. This study models disaster-related topics in Indonesian-language tweets using Latent Dirichlet Allocation (LDA). A dataset of 8,718 tweets was collected from official accounts such as “BMKG” (Meteorology, Climatology, and Geophysical Agency) and “BNPB” (National Disaster Management Agency) using keywords such as “banjir” (flood), “kecelakaan” (accident), and “tanah longsor” (landslide). The tweets were preprocessed with case folding, stop-word removal, stemming, and normalization of slang and abbreviations. The optimal number of topics was determined using coherence scores, with the model reaching a peak coherence value of approximately 0.57. The most frequently discussed topics with high coherence values were “angin topan” (cyclone), “topan” (typhoon), “virus corona” (coronavirus), “kecelakaan” (accident), “tenggelam” (drowning), “badai” (storm), and “angin puting” (whirlwind). A word cloud was used to visualize these disaster-related topics.
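The preprocessing steps listed in the abstract can be sketched in plain Python. This is a minimal illustration, not the authors' code: the slang dictionary and stop-word list are tiny hypothetical samples, and `light_stem` only removes the enclitic -nya, whereas a full Indonesian stemmer (for example the Sastrawi library) also strips prefixes and derivational suffixes.

```python
import re

# Hypothetical sample resources for illustration only; the study's actual
# slang dictionary and stop-word list are not given in the abstract.
SLANG = {"yg": "yang", "dgn": "dengan", "gk": "tidak", "tdk": "tidak",
         "bgt": "sekali", "jkt": "jakarta"}
STOPWORDS = {"yang", "di", "dan", "dengan", "ke", "dari", "ini", "itu",
             "tidak", "sekali"}


def light_stem(word, min_stem=3):
    """Strip the enclitic -nya (e.g. "banjirnya" -> "banjir").

    A deliberately light stand-in for a real Indonesian stemmer, which would
    also remove prefixes and derivational suffixes. min_stem keeps short
    words such as "hanya" intact.
    """
    if word.endswith("nya") and len(word) - 3 >= min_stem:
        return word[:-3]
    return word


def preprocess(tweet):
    text = tweet.lower()                                # case folding
    text = re.sub(r"https?://\S+|@\w+", " ", text)      # drop URLs and @mentions
    text = re.sub(r"[^a-z\s]", " ", text)               # keep letters only ("#banjir" -> "banjir")
    tokens = [SLANG.get(t, t) for t in text.split()]    # normalize slang and abbreviations
    tokens = [t for t in tokens if t not in STOPWORDS]  # stop-word removal
    return [light_stem(t) for t in tokens]              # (light) stemming
```

Slang normalization runs before stop-word removal so that an abbreviation such as "yg" is first expanded to "yang" and then dropped as a stop word. For example, `preprocess("Banjirnya parah bgt di Jkt @BNPB_Indonesia https://t.co/xyz #banjir")` yields `["banjir", "parah", "jakarta", "banjir"]`.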
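Selecting the number of topics by coherence can be sketched as a loop that fits one LDA model per candidate K and keeps the best-scoring one. This sketch rests on stated assumptions: LDA is fitted with a small collapsed Gibbs sampler written from scratch (the study's own implementation and hyperparameters are not given here), and coherence is the UMass measure, which is one standard choice. The abstract does not name the measure behind its 0.57 peak, and UMass scores live on a different scale, so they are not comparable with that value. The `alpha`, `beta`, `iters`, and `n_top` defaults are illustrative.

```python
import math
import random


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA to tokenized docs by collapsed Gibbs sampling.

    Returns the topic-word count table (k rows) and the sorted vocabulary.
    alpha, beta and iters are illustrative assumptions, not the study's settings.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * k for _ in docs]      # tokens of each topic in each document
    nkw = [[0] * V for _ in range(k)]  # occurrences of each word in each topic
    nk = [0] * k                       # total tokens assigned to each topic
    z = []                             # current topic of every token

    def update(d, t, wi, delta):
        ndk[d][t] += delta
        nkw[t][wi] += delta
        nk[t] += delta

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            update(d, t, index[w], 1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi = index[w]
                update(d, z[d][i], wi, -1)  # take the token out of the counts
                # Full conditional p(z = j | all other assignments), unnormalized.
                weights = [(ndk[d][j] + alpha) * (nkw[j][wi] + beta) / (nk[j] + V * beta)
                           for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                update(d, t, wi, 1)
    return nkw, vocab


def top_words(nkw, vocab, n=3):
    """Return the n most frequent words of each topic, most probable first."""
    topics = []
    for row in nkw:
        order = sorted(range(len(vocab)), key=lambda i: row[i], reverse=True)
        topics.append([vocab[i] for i in order[:n]])
    return topics


def umass_coherence(topics, docs):
    """Mean UMass coherence over topics (higher is better).

    For top words w1..wN ordered by probability, a topic scores the sum over
    i > j of log((D(wi, wj) + 1) / D(wj)), where D counts documents.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        return sum(all(w in s for w in words) for s in doc_sets)

    total = 0.0
    for words in topics:
        for i in range(1, len(words)):
            for j in range(i):
                total += math.log((doc_freq(words[i], words[j]) + 1) / doc_freq(words[j]))
    return total / len(topics)


def choose_k(docs, candidates, n_top=3):
    """Fit one model per candidate K and keep the K with the highest coherence."""
    scores = {}
    for k in candidates:
        nkw, vocab = lda_gibbs(docs, k)
        scores[k] = umass_coherence(top_words(nkw, vocab, n_top), docs)
    return max(scores, key=scores.get), scores
```

Calling `choose_k(docs, range(2, 8))` on a list of preprocessed tweets would refit the model for each candidate and return the best K together with all scores, which can then be plotted as a coherence curve. A production pipeline would more likely use an established LDA library and average over several random seeds, since a single Gibbs run can settle in a poor local mode.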

Keywords


Latent Dirichlet Allocation; Twitter; Disaster; Topic Modelling; Word Cloud


DOI: https://doi.org/10.31763/simple.v7i1.132



Copyright (c) 2025 Aninditya Anggari Nuryono, Iswanto Iswanto, Alfian Ma'arif, Rizal Kusuma Putra, Yabes Dwi Nugroho H, Muhammad Iman Nur Hakim

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

 


Signal and Image Processing Letters
ISSN Online: 2714-6677 | Print: 2714-6669
Published by Association for Scientific Computing Electrical and Engineering (ASCEE)
Website: https://simple.ascee.org/index.php/simple/
Email: simple@ascee.org


 
