Identification of Infant Crying Using Mel-Frequency Cepstral Coefficient (MFCC) and Artificial Neural Network (ANN) Methods

Ahmad Azhari, Intan Destiyanti

Abstract


Crying in infants aged 0-3 months can be classified according to the infant's needs, as identified by Dunstan Baby Language (DBL), which associates specific sounds with different needs: "eairh" for discomfort caused by flatulence, "neh" for hunger, "heh" for general discomfort, "owh" for tiredness or sleepiness, and "eh" for the need to burp. The baby crying sound data were obtained from the DBL database, which includes educational videos about infants and a collection of baby crying sounds. The recordings were converted to *.wav audio format and divided into 5-second segments, yielding a total of 188 audio segments. The research employed the Artificial Neural Network (ANN) classification method and the Mel-Frequency Cepstral Coefficient (MFCC) feature extraction method. The collected data underwent feature extraction using the librosa library in the Python programming language, which allowed distinctive characteristics to be obtained from each sound segment. The resulting classifier achieved an accuracy of 90%. This research contributes to the understanding and classification of infant crying based on Dunstan Baby Language, offering insight into infants' various needs. The combination of ANN and MFCC demonstrates the effectiveness of this approach in classifying infant cries and provides a foundation for further research in the field of infant communication.
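The MFCC feature extraction described above can be sketched as follows. The paper uses librosa's implementation; this is a minimal numpy-only illustration of the same pipeline (framing and windowing, power spectrum, mel filter bank, log compression, DCT), with illustrative parameter values that are assumptions, not the paper's settings.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_mfcc=13):
    """Minimal MFCC pipeline sketch (parameters are illustrative)."""
    # 1. Frame the signal and apply a Hamming window
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] for i in range(n_frames)])
    frames = frames * np.hamming(n_fft)
    # 2. Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 3. Triangular mel filter bank (mel scale: 2595*log10(1 + f/700))
    mel = lambda f: 2595 * np.log10(1 + f / 700.0)
    inv_mel = lambda m: 700 * (10 ** (m / 2595.0) - 1)
    pts = inv_mel(np.linspace(mel(0), mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # 4. Log mel energies, then DCT-II to decorrelate -> cepstral coefficients
    logmel = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), (2 * n + 1) / (2 * n_mels)))
    return logmel @ dct.T  # shape: (n_frames, n_mfcc)

# A synthetic 1-second tone stands in for a 5-second cry segment
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feat = mfcc(sig)
```

Averaging `feat` over the time axis gives one fixed-length 13-dimensional vector per segment, which is a common way to feed variable-length audio into an ANN classifier.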

Keywords


Dunstan Baby Language; Artificial Neural Networks; Mel-Frequency Cepstral Coefficient; Feature extraction



DOI: https://doi.org/10.31763/simple.v4i3.70



Copyright (c) 2023 Ahmad Azhari, Intan Destiyanti

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


Signal and Image Processing Letters

ISSN Online: 2714-6677 | Print: 2714-6669
Published by the Association for Scientific Computing Electronics and Engineering (ASCEE)
Website : https://simple.ascee.org/index.php/simple/
Email 1 : simple@ascee.org
Email 2 : azhari@ascee.org


 
