Impulse-like characteristics of excitation occur at the glottal closure instant (GCI) due to the sharp closure of the vibrating vocal folds in each glottal cycle. GCIs are detected from the excitation component of the speech signal, which is derived using inverse filtering or its variants. In this paper we propose a method for GCI detection based on single frequency filtering (SFF) of the speech signal. The SFF output has a high signal-to-noise ratio (SNR) in speech regions. The variance (across frequency) contour computed from the SFF output shows rapid changes around the GCIs, and these rapid changes can be observed even when the speech signal is degraded. Thus the GCI locations can be extracted even from degraded speech using SFF analysis. The robustness of the method is demonstrated for several cases of degradation of the speech signal.
Added on December 19, 2018
Contributed by : Individual
Product Type : Research Paper
License Type : Freeware
System Requirement :
Author : G. Aneeja, Sudarsana Reddy Kadiri, B. Yegnanarayana
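The abstract of the SFF-based GCI paper above does not spell out the filtering steps, so the following is only a minimal sketch under stated assumptions. It follows the commonly described SFF formulation: difference the signal, shift each frequency of interest to half the sampling rate, and pass the result through a single-pole filter whose pole lies close to the unit circle. The function names, the pole radius `r = 0.995`, and the choice of frequencies are illustrative assumptions, not details taken from the paper, and the step that picks GCI locations from the variance contour is not shown.

```python
import cmath
import math


def sff_envelopes(signal, fs, freqs_hz, r=0.995):
    """Return one amplitude envelope per frequency in freqs_hz.

    Assumed formulation (not taken from the abstract): frequency shift
    followed by a single-pole filter with its pole at z = -r.
    """
    # Difference the signal to suppress any slowly varying trend.
    diff = [0.0] + [signal[t] - signal[t - 1] for t in range(1, len(signal))]
    envelopes = []
    for f in freqs_hz:
        # Shift the frequency of interest to half the sampling rate (omega = pi).
        shift = math.pi - 2.0 * math.pi * f / fs
        y_prev = 0j
        env = []
        for t, x in enumerate(diff):
            # Single-pole recursion: y[n] = -r * y[n-1] + x[n] * exp(j * shift * n).
            y_prev = -r * y_prev + x * cmath.exp(1j * shift * t)
            env.append(abs(y_prev))
        envelopes.append(env)
    return envelopes


def variance_across_frequency(envelopes):
    """Variance of the envelopes across frequency at every time instant."""
    n_freqs = len(envelopes)
    contour = []
    for column in zip(*envelopes):
        mean = sum(column) / n_freqs
        contour.append(sum((v - mean) ** 2 for v in column) / n_freqs)
    return contour
```

In this sketch, rapid changes in the list returned by `variance_across_frequency` would be the candidate regions to examine for GCIs. A practical version would use many closely spaced frequencies and a vectorized filter rather than Python loops.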
In this paper, a DNN-based keyword spotting framework that utilizes both the spectral and the prosodic information present in the speech signal is proposed. A DNN is first trained to learn a set of hierarchical non-linear transformation parameters that project the original spectral and prosodic feature vectors onto a feature space where the distance between similar syllable pairs is small and the distance between dissimilar syllable pairs is large. These transformed features are then fused using an attention-based long short-term memory (LSTM) network. In addition, a deep denoising autoencoder-based fine-tuning technique is used to improve the performance of sequence predictions.
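The abstract above does not name the training objective, so the sketch below uses a standard contrastive loss purely as an illustration of "small distance for similar pairs, large distance for dissimilar pairs". The function names and the `margin` parameter are assumptions, and the embeddings stand in for the DNN outputs.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_syllable, margin=1.0):
    """Contrastive loss for one pair of embeddings (illustrative choice).

    Similar pairs are penalized by their squared distance; dissimilar
    pairs are penalized only while they are closer than `margin`.
    """
    d = euclidean(emb_a, emb_b)
    if same_syllable:
        return 0.5 * d * d
    return 0.5 * max(0.0, margin - d) ** 2
```

Minimizing this quantity over many labeled pairs pulls same-syllable embeddings together and pushes different-syllable embeddings at least `margin` apart, which matches the behavior the abstract describes for the learned feature space.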
Many Inscript-based standalone keyboard applications are available online for typing Punjabi, but they are restrictive in nature and most of them do not offer formatting of the keyed-in content in the text area provided by the application. The main problem with these keyboard applications is that, in the absence of audio feedback for the keys pressed on the Punjabi Inscript keyboard, visually impaired users cannot tell whether the content is being typed correctly unless they have been trained to use that keyboard. To overcome this problem, we developed the Punjabi Unicode Inscript Keyboard with sound embedded on every keystroke. This keyboard can be used for typing directly in Microsoft Word.
OCR technology for Indian documents is at an emerging stage, and most Indian OCR systems can read documents written in only a single script. Because many commercial and official documents of different states of India are tri-lingual, identification of the script and/or language is one of the elementary tasks in multi-script document recognition. A script recognizer simplifies the task of multi-lingual OCR by improving accuracy and reducing computational complexity. Script recognition may be performed at the line, word or character level, depending on how the different scripts are interlaced.
Script identification is one of the challenging steps in an Optical Character Recognition system for multi-script documents. Some results have been reported in both Indian and non-Indian contexts, but research in this field is still emerging. This paper presents work on the identification of Gurmukhi and English scripts at the word level. It also identifies English numerals within Gurmukhi text. Gabor feature extraction is one of the most popular methods for script recognition. This paper presents a zone-based Gabor feature extraction technique.
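The abstract above gives no details of the zoning scheme or filter parameters, so the following is a simplified sketch of one way zone-based Gabor features could be computed, not the paper's method. The word image is split into a square grid of zones, and each zone contributes the magnitude of its inner product with a complex Gabor kernel for each orientation. All names and parameter values (`zones_per_side`, `wavelength`, a Gaussian width of a quarter of the zone size) are assumptions.

```python
import cmath
import math


def gabor_kernel(size, wavelength, theta, sigma):
    """Complex Gabor kernel sampled on a size x size grid."""
    half = (size - 1) / 2.0
    kernel = []
    for row in range(size):
        y = row - half
        line = []
        for col in range(size):
            x = col - half
            # Coordinate along the direction in which the carrier oscillates.
            xr = x * math.cos(theta) + y * math.sin(theta)
            gauss = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            line.append(gauss * cmath.exp(2j * math.pi * xr / wavelength))
        kernel.append(line)
    return kernel


def zone_gabor_features(image, zones_per_side, thetas, wavelength=4.0):
    """Return one feature per (zone, orientation) for a square image.

    Each feature is the magnitude of the zone's inner product with a
    Gabor kernel of the same size (a simplification of full filtering).
    """
    zone = len(image) // zones_per_side
    kernels = [gabor_kernel(zone, wavelength, t, zone / 4.0) for t in thetas]
    features = []
    for zr in range(zones_per_side):
        for zc in range(zones_per_side):
            for kernel in kernels:
                resp = sum(
                    image[zr * zone + i][zc * zone + j] * kernel[i][j]
                    for i in range(zone)
                    for j in range(zone)
                )
                features.append(abs(resp))
    return features
```

The resulting feature vector would then be passed to a classifier to label each word as Gurmukhi, English, or numeral. Using the magnitude of the complex response keeps each feature insensitive to the exact position of strokes within a zone.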