The End-to-End Speech Synthesis System for the VLSP Campaign 2019

EasyChair Preprint 1742

3 pages • Date: October 22, 2019

Abstract

Traditional speech synthesis systems are typically built from multiple components, such as a text analysis front-end, an acoustic model, and an audio synthesis module. Building these components often requires many people with extensive domain expertise, and the resulting systems may contain brittle design choices. In this paper, we describe how we built a Vietnamese text-to-speech (TTS) system based on deep learning techniques. We built two speech synthesis systems for the text-to-speech shared tasks of VLSP 2019: one trained on BigCorpus (Mean Opinion Score of 3.47) and one trained on SmallCorpus (Mean Opinion Score of 4.13). In addition, we applied transfer learning and fine-tuning to address the noisy training data in BigCorpus and the shortage of data in SmallCorpus.

Keyphrases: Tacotron2, Vietnamese speech synthesis, deep learning, speech synthesis, speech synthesis system, text-to-speech

Links:

https://easychair.org/publications/preprint/cwj5

BibTeX entry

BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:

@booklet{EasyChair:1742,
  author    = {Quang Pham Huu},
  title     = {The End-to-End Speech Synthesis System for the VLSP Campaign 2019},
  howpublished = {EasyChair Preprint 1742},
  year      = {EasyChair, 2019}}
