A Study on Automatic Speech Recognition

Saliha Benkerzaz, Youssef Elmir, Abdeslam Dennai
Department of Exact Sciences, University of TAHRI Mohamed, Smart Grid & Renewable Energies Laboratory, Computer Science & Sciences Didactic Team, Bechar, Algeria

Abstract: Speech is a natural and convenient means of communication between humans. Today, however, humans communicate not only with each other but also with the various machines in our lives, the most important of which is the computer. Speech can therefore serve as a means of communication between humans and computers. This interaction takes place through interfaces, in a field known as Human-Computer Interaction (HCI). This paper gives an overview of the main concepts of Automatic Speech Recognition (ASR), an important domain of artificial intelligence, and of the factors that any related research should take into account (type of speech, vocabulary size, etc.). It also summarizes important research on speech processing from recent years, presents a general idea of our proposal, which could be considered a contribution to this area of research, and concludes with possible enhancements for future work.

Keywords: Automatic Speech Recognition, Human Computer Interaction, Computer Vision, Artificial Intelligence
DOI: https://doi.org/10.6025/jitr/2019/10/3/77-85