A Study on Automatic Speech Recognition

Saliha Benkerzaz, Youssef Elmir, Abdeslam Dennai
Department of Exact Sciences, University of TAHRI Mohamed, Smart Grid & Renewable Energies Laboratory, Computer Science & Sciences Didactic Team, Bechar, Algeria

Abstract: Speech is a natural and convenient means of communication between humans. Today, however, humans communicate not only with each other but also with the various machines in our lives, the most important of which is the computer. Speech can therefore serve as a means of communication between humans and computers. This interaction takes place through interfaces, in a field known as Human-Computer Interaction (HCI). This paper gives an overview of the main concepts of Automatic Speech Recognition (ASR), an important domain of artificial intelligence, and of the factors that any related research should take into account (type of speech, vocabulary size, etc.). It also summarizes important research on speech processing from recent years, presents a general idea of our proposal, which could be considered a contribution to this area of research, and concludes with possible enhancements for future work.

Keywords: Automatic Speech Recognition, Human Computer Interaction, Computer Vision, Artificial Intelligence
DOI: https://doi.org/10.6025/jitr/2019/10/3/77-85