Deep Learning Speech Recognition: Input Representation Perspective

EasyChair Preprint 15235
5 pages • Date: October 18, 2024

Abstract

Convolutional neural networks are becoming the state-of-the-art models in many applications. With deep architectures, they can learn speech patterns effectively. An open design decision is whether to use raw signals, spectrograms, or another input representation. In this paper, deep convolutional architectures for speech recognition are designed, implemented, and evaluated on both raw signals and spectrogram representations. Each architecture consists of two stages: a self-extracting (feature extraction) network and a classification network. The first architecture takes a spectrogram as input to the feature extraction stage and then classifies the speech patterns into the appropriate class. The second architecture takes the raw signal as input to the extraction stage. Both approaches apply minimal preprocessing to the speech signal. The architectures recognize the speech patterns in the TI46 corpus. Extensive experiments were conducted to find the best design for each approach, and we present the best results among the many convolutional architectures tested. The architecture operating on the raw signal produced the better recognition rate and achieves excellent performance compared with previously reported results.

Keyphrases: Convolution Neural Network, deep learning, pattern recognition, speech recognition
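To make the two input representations concrete, here is a minimal sketch of how a raw signal and a spectrogram of the same utterance could be prepared as network inputs. Everything in it is an assumption for illustration: the function name `magnitude_spectrogram`, the frame length, the hop size, the 8000 Hz sample rate, and the synthetic test tone are hypothetical and are not taken from the paper, which does not state its preprocessing settings in the abstract.

```python
import numpy as np


def magnitude_spectrogram(signal, frame_len=256, hop=128):
    """Short-time magnitude spectrum, shape (n_frames, frame_len // 2 + 1).

    Hypothetical helper: frame_len and hop are illustrative defaults,
    not values reported in the paper.
    """
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Real-input FFT per frame; keep only the magnitude.
    return np.abs(np.fft.rfft(frames, axis=1))


# Synthetic stand-in for an utterance: a 1-second, 1000 Hz tone (assumed data).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
raw = np.sin(2 * np.pi * 1000 * t).astype(np.float32)

# Raw-signal representation: one channel of waveform samples.
raw_input = raw[np.newaxis, :]

# Spectrogram representation: one channel of (time, frequency) magnitudes.
spec_input = magnitude_spectrogram(raw)[np.newaxis, :, :]

print(raw_input.shape, spec_input.shape)
```

In this sketch the raw input would feed a one-dimensional convolutional front end, while the spectrogram would feed a two-dimensional one; both would then pass to a classification stage.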