Letter Detection : An Empirical Comparative Study of Different ML Classifier and Feature Extraction

Aji Prasetya Wibawa, Nastiti Susetyo Fanany Putri, Prasetya Widiharso

Abstract


Work and communication are inextricably linked, and letters remain a widely used communication medium. For important business matters, however, only an official letter is acceptable, so official and private (personal) letters must be distinguished and classified. To detect the two letter types, this study applies two feature extraction methods, the count vectorizer and the TF-IDF vectorizer, and classifies the letters with three machine learning (ML) algorithms: Naïve Bayes, support vector machine (SVM), and AdaBoost. Performance is measured using accuracy, F1-score, recall, and precision. Naïve Bayes performs best with both vectorizers, reaching an accuracy of 98%.
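The pipeline summarized in the abstract (count or TF-IDF features, a Naïve Bayes classifier, then accuracy, precision, recall, and F1) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the toy letters, the function names, the smoothed IDF formula, and the Laplace smoothing value are all assumptions made for illustration, and the toy data is far too small to say anything about the 98% figure reported above.

```python
import math
from collections import Counter, defaultdict

# Toy letters invented for illustration; NOT the paper's dataset.
TRAIN = [
    ("with reference to your application we hereby inform you of the decision", "official"),
    ("we hereby request your attendance at the official meeting", "official"),
    ("the committee hereby confirms the schedule of the meeting", "official"),
    ("hi dear friend i miss you and hope you are well", "personal"),
    ("dear mom i hope you are happy love you", "personal"),
    ("hey friend thanks for the lovely gift miss you", "personal"),
]
TEST = [
    ("we hereby inform you of the meeting schedule", "official"),
    ("dear friend i hope you are well love you", "personal"),
]

def count_features(docs):
    """Raw term counts per document (count-vectorizer style)."""
    return [Counter(d.lower().split()) for d in docs]

def fit_idf(count_docs):
    """Smoothed inverse document frequency (assumed form: ln((1+n)/(1+df)) + 1)."""
    n = len(count_docs)
    df = Counter(term for doc in count_docs for term in doc)
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf_features(count_docs, idf):
    """Scale counts by IDF; terms unseen during training are dropped."""
    return [{t: c * idf[t] for t, c in doc.items() if t in idf} for doc in count_docs]

def train_nb(features, labels, alpha=1.0):
    """Multinomial Naive Bayes with Laplace smoothing (alpha is an assumption)."""
    vocab = {t for doc in features for t in doc}
    weights = defaultdict(lambda: defaultdict(float))
    for doc, y in zip(features, labels):
        for t, w in doc.items():
            weights[y][t] += w
    model = {}
    for y, n_docs in Counter(labels).items():
        total = sum(weights[y].values()) + alpha * len(vocab)
        log_like = {t: math.log((weights[y][t] + alpha) / total) for t in vocab}
        model[y] = (math.log(n_docs / len(labels)), log_like)
    return model

def predict_nb(model, doc):
    """Return the class with the highest log posterior score."""
    def score(y):
        log_prior, log_like = model[y]
        return log_prior + sum(w * log_like[t] for t, w in doc.items() if t in log_like)
    return max(model, key=score)

def evaluate(y_true, y_pred, positive="official"):
    """Accuracy, precision, recall, and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def run(vectorizer):
    """Train and evaluate Naive Bayes with 'count' or 'tfidf' features."""
    train_x = count_features([d for d, _ in TRAIN])
    test_x = count_features([d for d, _ in TEST])
    if vectorizer == "tfidf":
        idf = fit_idf(train_x)
        train_x, test_x = tfidf_features(train_x, idf), tfidf_features(test_x, idf)
    model = train_nb(train_x, [y for _, y in TRAIN])
    preds = [predict_nb(model, x) for x in test_x]
    return preds, evaluate([y for _, y in TEST], preds)

results = {name: run(name) for name in ("count", "tfidf")}
for name, (preds, scores) in results.items():
    print(name, preds, scores)
```

The same `run` structure would extend to other classifiers by swapping `train_nb` and `predict_nb`; the support vector machine and AdaBoost models named in the abstract are omitted here to keep the sketch short.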

Keywords


Official and private letter classification; Machine learning classifier; Accuracy measure


DOI: https://doi.org/10.31763/simple.v5i1.45


Copyright (c) 2023 Aji Prasetya Wibawa

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


Signal and Image Processing Letters

ISSN Online: 2714-6677 | Print: 2714-6669
Published by Association for Scientific Computing Electrical and Engineering (ASCEE)
Website : https://simple.ascee.org/index.php/simple/
Email 1 : simple@ascee.org
Email 2 : azhari@ascee.org


 
