Pooja Kumari Jha, Sheetal Giri, Rajan Karmacharya (Supervisor)
St. Xavier's College
08 July 16
Thesis or project
Recurrent Neural Network, Connectionist Temporal Classification (CTC), N-gram Language Model, Automatic Speech Recognition
Speech recognition is the process of enabling a computer to identify and respond to the
sounds of human speech. Hamro Awaaz, a Nepali Automated Speech Recognizer (ASR),
performs speaker-independent, computer-driven transcription of spoken Nepali into
readable Devanagari text in real time.
The project is built around an Android application through which users send their voice
recordings to a server, where the recordings are transcribed into text and the result is
sent back to the application. The foundation of any speech recognition system is its
acoustic model and language model.
These models, in turn, depend on the amount and quality of the collected data and on the
training algorithms. The acoustic model is trained as a deep recurrent neural network with a
Connectionist Temporal Classification (CTC) output layer.
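To illustrate what a CTC output layer optimizes, here is a minimal sketch, not the project's implementation. It assumes the network already emits, for every frame, a probability distribution over the label set plus a blank symbol (index 0 here, an assumption). The CTC forward recursion sums the probability of every frame-level path that collapses to the target labels, and the negative log of that sum is the training loss. The names `ctc_collapse`, `ctc_forward` and `ctc_loss` are hypothetical.

```python
import math


def ctc_collapse(path, blank=0):
    """Map a frame-level path to a label sequence: merge repeats, then drop blanks."""
    out, prev = [], None
    for k in path:
        if k != prev and k != blank:
            out.append(k)
        prev = k
    return out


def ctc_forward(probs, labels, blank=0):
    """Total probability of `labels` given per-frame distributions `probs` (T x K)."""
    # Interleave blanks: [blank, l1, blank, l2, ..., blank]
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    S = len(ext)

    # alpha[s]: probability of all path prefixes that end at position s of ext
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for frame in probs[1:]:
        new_alpha = [0.0] * S
        for s in range(S):
            total = alpha[s]                      # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]             # advance by one position
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]             # skip a blank between distinct labels
            new_alpha[s] = total * frame[ext[s]]
        alpha = new_alpha

    # Valid paths end on the last label or on the trailing blank
    return alpha[S - 1] + (alpha[S - 2] if S > 1 else 0.0)


def ctc_loss(probs, labels, blank=0):
    """CTC loss: negative log-likelihood of the target label sequence."""
    return -math.log(ctc_forward(probs, labels, blank))
```

For two frames with a uniform distribution over {blank, 1}, the paths `1 1`, `_ 1` and `1 _` all collapse to `[1]`, so `ctc_forward([[0.5, 0.5], [0.5, 0.5]], [1])` returns 0.75. Practical systems run this recursion in log space to avoid underflow on long utterances; plain probabilities keep the sketch short.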
The language model is an n-gram model estimated from the collected data.
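A minimal sketch of an n-gram language model, restricted here to bigrams (n = 2) with add-k smoothing. Both the order and the smoothing method are assumptions, since the exact settings are not stated above. Counts come from whitespace-tokenized sentences padded with `<s>` and `</s>` boundary markers; the class name `BigramLM` and the two toy Nepali sentences are illustrative only.

```python
import math
from collections import Counter


class BigramLM:
    """Bigram language model with add-k smoothing (smoothing choice is an assumption)."""

    def __init__(self, sentences, k=1.0):
        self.k = k
        self.history = Counter()   # how often each token appears as a left context
        self.bigrams = Counter()   # counts of (previous token, token) pairs
        for sent in sentences:
            tokens = ["<s>"] + sent.split() + ["</s>"]
            self.history.update(tokens[:-1])
            self.bigrams.update(zip(tokens[:-1], tokens[1:]))
        # Tokens the model can predict: every word plus the end marker
        self.vocab = {w for s in sentences for w in s.split()} | {"</s>"}

    def prob(self, prev, word):
        """Smoothed P(word | prev); sums to 1 over self.vocab for any prev."""
        v = len(self.vocab)
        return (self.bigrams[(prev, word)] + self.k) / (self.history[prev] + self.k * v)

    def log_prob(self, sentence):
        """Log-probability of a whole sentence, including the end marker."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        return sum(math.log(self.prob(p, w)) for p, w in zip(tokens[:-1], tokens[1:]))
```

Trained on `["म घर जान्छु", "म स्कूल जान्छु"]`, `prob("म", "घर")` is (1 + 1) / (2 + 5) = 2/7, because "म" is followed by "घर" once in two occurrences and the vocabulary holds five predictable tokens. The model then scores the seen word order "म घर जान्छु" higher than "घर म जान्छु".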
Finally, a search algorithm finds the transcription that best matches the user's speech.
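The search algorithm is not named here. One common choice for CTC acoustic models, sketched under that assumption, is prefix beam search: at each frame it extends the `beam_width` best label prefixes, merges frame-level paths that collapse to the same prefix, and can weight each new label by a language-model probability through a hypothetical `lm` hook. The defaults `beam_width=8` and `alpha=0.5` are arbitrary illustrative values.

```python
from collections import defaultdict


def prefix_beam_search(probs, beam_width=8, blank=0, lm=None, alpha=0.5):
    """CTC prefix beam search over per-frame distributions `probs` (T x K).

    lm(prefix, label) -> P(label | prefix) is a hypothetical language-model hook;
    its probability is raised to the (assumed) weight `alpha`.
    """
    # prefix (tuple of labels) -> [p_blank, p_nonblank]: score of paths that
    # produce this prefix and end in a blank / in the prefix's last label
    beams = {(): [1.0, 0.0]}
    for frame in probs:
        nxt = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for label, p in enumerate(frame):
                if label == blank:
                    # a blank never changes the prefix
                    nxt[prefix][0] += (p_b + p_nb) * p
                    continue
                extended = prefix + (label,)
                lm_weight = lm(prefix, label) ** alpha if lm else 1.0
                if prefix and label == prefix[-1]:
                    # a repeated label starts a new symbol only after a blank,
                    nxt[extended][1] += p_b * p * lm_weight
                    # otherwise it merges into the current last symbol
                    nxt[prefix][1] += p_nb * p
                else:
                    nxt[extended][1] += (p_b + p_nb) * p * lm_weight
        # prune to the beam_width highest-scoring prefixes
        ranked = sorted(nxt.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = dict(ranked[:beam_width])
    best, (p_b, p_nb) = max(beams.items(), key=lambda kv: sum(kv[1]))
    return list(best), p_b + p_nb
```

With no `lm` and a beam wide enough that nothing is pruned, the returned score is the exact CTC probability of the best label sequence; a small `beam_width` trades that exactness for speed. With an `lm`, the score is a combined acoustic and language-model score rather than a probability. Plugging in a word-level n-gram model would mean applying it at word boundaries; the label-level hook here is a simplification.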
Through these techniques, we aim to achieve a speech recognition model that can help
anyone seeking better speech recognition for Nepali and other languages. The resulting
application can also serve as a platform for developing other Nepali voice-based
applications.