
Print ISSN: 0976-898X
Online ISSN: 0976-8998


Journal of Information Technology Review
 

A Study on Automatic Speech Recognition
Saliha Benkerzaz, Youssef Elmir, Abdeslam Dennai
Department of Exact Sciences, TAHRI Mohamed University, Smart Grid & Renewable Energies Laboratory, Computer Science & Sciences Didactic Team, Bechar, Algeria
Abstract: Speech is a natural and convenient means of communication between humans, but people today interact not only with each other but also with the many machines in their lives, foremost among them the computer. Speech can therefore also serve as a channel of communication between humans and computers. Such interaction takes place through interfaces, an area known as Human-Computer Interaction (HCI). This paper gives an overview of the main concepts of Automatic Speech Recognition (ASR), an important domain of artificial intelligence, and of the factors that any related research must take into account (type of speech, vocabulary size, etc.). It also summarizes important speech-processing research from recent years, outlines our proposal as a contribution to this area, and concludes with suggestions for future work.
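As an illustration of the kind of acoustic front-end an ASR system of the type surveyed here typically begins with, the sketch below computes log-mel filterbank features from a waveform using only NumPy. It is a minimal, self-contained example for illustration, not the authors' implementation; the frame length, hop size, and filter count are common assumed defaults for 16 kHz speech.

```python
import numpy as np

def frame_signal(signal, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

def mel_filterbank(n_filters=26, n_fft=512, sr=16000):
    """Triangular filters spaced evenly on the mel scale."""
    hz_to_mel = lambda hz: 2595.0 * np.log10(1.0 + hz / 700.0)
    mel_to_hz = lambda mel: 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fbank[i - 1, center:right] = (right - np.arange(center, right)) / max(right - center, 1)
    return fbank

def log_mel_features(signal, sr=16000, n_fft=512):
    """Window each frame, take the power spectrum, and apply the mel filterbank."""
    frames = frame_signal(signal) * np.hamming(400)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    return np.log(power @ mel_filterbank(n_fft=n_fft, sr=sr).T + 1e-10)

# One second of a 440 Hz tone stands in for recorded speech.
sr = 16000
t = np.arange(sr) / sr
feats = log_mel_features(np.sin(2 * np.pi * 440 * t), sr=sr)
print(feats.shape)  # (98, 26): 98 frames, 26 mel filters
```

Features of this kind (or MFCCs, obtained by applying a DCT to these log-mel energies) feed the HMM- and neural-network-based recognizers discussed in the paper.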
Keywords: Automatic Speech Recognition, Human Computer Interaction, Computer Vision, Artificial Intelligence
DOI: https://doi.org/10.6025/jitr/2019/10/3/77-85


 

Copyright © 2011 dline.info