Deep Learning Speech Recognition: Input Representation Perspective

EasyChair Preprint 15235

5 pages • Date: October 18, 2024

Abstract

Convolutional neural networks are becoming the state-of-the-art models in many applications. With deep architectures, convolutional neural networks can learn speech patterns effectively. There remains the decision of whether to use raw signals, spectrograms, or another input representation. In this paper, deep convolutional architectures for speech recognition are designed, implemented, and developed. The architectures are applied to raw data and to spectrogram representations. Each architecture is composed of two stages: a self-extracting (feature extraction) network and a classification network. The first architecture uses the spectrogram as input to the feature extraction stage and then classifies the speech patterns into the appropriate class. The second architecture uses the raw signal as input to the extraction stage. Both approaches apply minimal preprocessing to the speech signal. The architectures recognize the speech patterns in the TI46 corpus. Extensive experiments were conducted to reach the best design in both approaches, and we present the best results among the many convolutional architectures tested. The architecture on the raw signal produced the better recognition rate and achieves excellent performance compared with reported results.
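To make the two input representations concrete, the following is a minimal sketch of how a raw waveform and a magnitude spectrogram of the same signal might be prepared before being passed to a feature extraction network. It is not the authors' pipeline: the helper names `frame_signal` and `magnitude_spectrogram`, the frame length of 64, the hop of 32, the Hann window, and the synthetic 1 kHz test tone are all assumptions chosen for illustration, and the abstract does not state the actual settings used.

```python
import cmath
import math


def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames (no padding at the end)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def magnitude_spectrogram(signal, frame_len=64, hop=32):
    """Return one list of DFT magnitudes (bins 0..frame_len//2) per frame.

    frame_len and hop are illustrative defaults, not values from the paper.
    """
    # Hann window: reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    spec = []
    for frame in frame_signal(signal, frame_len, hop):
        windowed = [x * w for x, w in zip(frame, window)]
        bins = []
        # Direct DFT over the non-negative frequencies of a real signal.
        for k in range(frame_len // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(windowed))
            bins.append(abs(acc))
        spec.append(bins)
    return spec


# Hypothetical input: a 1 kHz sine sampled at 8 kHz (256 samples),
# standing in for a recorded utterance.
sample_rate = 8000
raw = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(256)]

# Representation 1: the raw signal itself, a 1-D sequence of samples.
# Representation 2: a 2-D time-frequency array computed from the same samples.
spec = magnitude_spectrogram(raw)
```

With these assumed settings, `raw` is a sequence of 256 samples, while `spec` holds 7 frames of 33 frequency bins each. The raw-signal approach leaves it to the network to learn its own filters from the samples, whereas the spectrogram approach fixes the time-frequency analysis in advance.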

Keyphrases: Convolution Neural Network, deep learning, pattern recognition, speech recognition

Links:

https://easychair.org/publications/preprint/68Lj

BibTeX entry

BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:

@booklet{EasyChair:15235,
  author    = {Elsadig Babiker and Hanan Adlan},
  title     = {Deep Learning Speech Recognition: Input Representation Perspective},
  howpublished = {EasyChair Preprint 15235},
  year      = {EasyChair, 2024}}
