International Journal of Emerging Research in Engineering, Science, and Management
Vol. 4, Issue 2, pp. 14-22, Apr-Jun 2025.
https://doi.org/10.58482/ijeresm.v4i2.3

Enhancing Biometric Security: A Robust Voice Frequency Detector with CNN-BiLSTM and Anti-Spoofing Mechanisms

Mahfudz Ahnan Al Faruq

Mohammad Givi Efgivia

Informatics Engineering, Muhammadiyah Prof. Dr. Hamka University, Jakarta, Indonesia.

Abstract: This study introduces a Voice Frequency Detector (VFD) framework that strengthens biometric password authentication against three key challenges: spoofing attacks, environmental noise, and natural variation in a speaker's voice due to health, emotion, or aging. The system extracts dynamic vocal features, including fundamental frequency (F0), Mel-Frequency Cepstral Coefficients (MFCCs), and formant structures, and feeds them to a hybrid CNN-BiLSTM deep learning model with attention mechanisms for robust spectral-temporal analysis. An anti-spoofing subsystem uses spectral flatness and phase distortion features to detect synthetic and replayed voices. The methodology comprises signal preprocessing (Wiener filtering and voice activity detection), feature extraction, and score fusion, which combines the deep learning outputs with the anti-spoofing results. Experiments on a dataset of 100 speakers and 1,000 spoofed samples yield an equal error rate (EER) of 2.8% in controlled conditions and 5.0% in noisy environments, with over 91% accuracy against replay, synthetic, and voice conversion attacks. Statistical analysis indicates that MFCCs are the most discriminative feature, accounting for 62% of the variance. The VFD framework offers a secure, adaptive, and practical voice authentication solution suited to finance, IoT, and access control applications. Future work may explore multi-modal integration and transformer-based architectures for broader applicability.

Keywords: Anti-Spoofing, CNN-BiLSTM, Deep Learning, MFCCs, Voice Biometrics.
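
Illustrative sketch: the abstract names spectral flatness as an anti-spoofing feature, score fusion as the final decision step, and EER as the headline metric, but it does not give formulas for them. The Python below is a minimal sketch of the standard definitions, not the authors' implementation. The function names, the weighted-sum fusion rule, the weight of 0.7, and the convention that a spoof score near 1 means "likely spoofed" are all assumptions made for illustration.

```python
import math


def spectral_flatness(power_spectrum):
    """Geometric mean over arithmetic mean of a power spectrum.

    Values near 1 indicate a flat, noise-like spectrum; values near 0
    indicate a peaky, tonal spectrum.
    """
    eps = 1e-12  # guards against log(0) on empty frequency bins
    n = len(power_spectrum)
    log_mean = sum(math.log(p + eps) for p in power_spectrum) / n
    arith_mean = sum(power_spectrum) / n
    return math.exp(log_mean) / (arith_mean + eps)


def fuse_scores(verification_score, spoof_score, weight=0.7):
    """Weighted-sum fusion of two scores in [0, 1].

    Assumption: spoof_score is the estimated probability that the sample
    is spoofed, so (1 - spoof_score) rewards bona fide speech. The weight
    is a hypothetical value, not one reported in the paper.
    """
    return weight * verification_score + (1.0 - weight) * (1.0 - spoof_score)


def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by scanning every observed score as a threshold.

    FAR is the share of impostor scores accepted (score >= threshold);
    FRR is the share of genuine scores rejected (score < threshold).
    The EER is taken where the two rates are closest.
    """
    best_gap, eer = float("inf"), None
    for t in sorted(set(genuine_scores) | set(impostor_scores)):
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Toy scores, invented purely to exercise the functions.
    genuine = [0.9, 0.8, 0.7, 0.4]
    impostor = [0.1, 0.2, 0.3, 0.6]
    print(equal_error_rate(genuine, impostor))
```

On the toy scores above, one genuine and one impostor score overlap, so the false acceptance and false rejection rates meet at 0.25. A real evaluation would use fused scores from many trials per speaker, and the fusion weight would normally be tuned on held-out data.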

© 2025 The Author(s). Published by IJERESM. This work is licensed under the Creative Commons Attribution 4.0 International License.

Archiving: All articles are permanently archived in Zenodo IJERESM Community.