In this post, we introduce a Korean singing voice synthesis system based on an auto-regressive boundary equilibrium GAN. The work was accepted to ICASSP 2020.
Singing voice synthesis is a complicated task that involves multi-dimensional control of a singer model, including phonemic modulation by lyrics, pitch control by the music score, and natural elements such as breath sounds and vibrato expressions. GAN-based end-to-end learning models have recently drawn much interest as a way to overcome the limitations of concatenative synthesis and statistical parametric models. Applying a GAN to the audio domain raises several issues: choosing the audio representation to generate, handling temporal continuity between two adjacent outputs, and finding an effective loss metric for the audio representation. The proposed system addresses these issues using an auto-regressive GAN that generates spectrograms with the boundary equilibrium objective.
Figure 1. Overview of the proposed singing voice synthesis system.
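To make the boundary equilibrium objective more concrete, here is a minimal sketch of one update step in the standard BEGAN formulation, where the discriminator is an autoencoder and the losses are reconstruction errors. The function name `began_step` and the values of `gamma` and `lambda_k` are illustrative assumptions, not the settings of the proposed system, which may differ in its details.

```python
def began_step(loss_real, loss_fake, k, gamma=0.5, lambda_k=0.001):
    """One step of the standard BEGAN objective (illustrative sketch).

    loss_real: discriminator reconstruction loss on real data, L(x)
    loss_fake: discriminator reconstruction loss on generated data, L(G(z))
    k:         balance term k_t, kept in [0, 1]
    gamma:     assumed target ratio between loss_fake and loss_real
    lambda_k:  assumed learning rate for the balance term
    """
    # Discriminator: reconstruct real data well, generated data poorly.
    d_loss = loss_real - k * loss_fake
    # Generator: make generated data easy for the discriminator to reconstruct.
    g_loss = loss_fake
    # Proportional control that keeps the two losses near equilibrium.
    k_next = k + lambda_k * (gamma * loss_real - loss_fake)
    k_next = min(max(k_next, 0.0), 1.0)
    # Global convergence measure from the BEGAN formulation.
    convergence = loss_real + abs(gamma * loss_real - loss_fake)
    return d_loss, g_loss, k_next, convergence


d_loss, g_loss, k_next, convergence = began_step(1.0, 0.25, k=0.0, lambda_k=0.1)
```

In practice `loss_real` and `loss_fake` would come from a spectrogram autoencoder evaluated on real and generated segments; here they are plain numbers so the update rule can be read in isolation.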
A fundamental issue in applying an image-based approach to audio data is that the model can span only a short audio segment, so successive segments generated over time can be discontinuous. To address this problem, we propose an auto-regressive conditional GAN that takes the spectrogram from the previous time step as input to produce the spectrogram for the current time step. The following figure shows how the auto-regressive (AR) method helps generate a continuous spectrogram. Without the AR method, the model generates each spectrogram image independently; with the AR method, each spectrogram is generated with reference to the previous one.
Figure 2. Ground-truth spectrogram and spectrograms generated by the proposed system.
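The auto-regressive generation described above can be sketched as a simple loop in which each generated segment is fed back as input for the next one. Everything in this example is hypothetical: `toy_generator` is a stand-in for a trained conditional generator, the sizes `N_BINS` and `SEG_LEN` are arbitrary, and the random `conditions` only play the role of conditioning features (for example, features derived from lyrics and pitch).

```python
import numpy as np

N_BINS = 8   # hypothetical number of frequency bins
SEG_LEN = 4  # hypothetical number of frames per generated segment


def toy_generator(prev_segment, condition):
    """Stand-in for a trained conditional generator (not the actual network).

    It mixes the last frame of the previous segment with the conditioning
    input, so each output depends on what was generated before it.
    """
    last_frame = prev_segment[:, -1:]           # shape (N_BINS, 1)
    return 0.5 * last_frame + 0.5 * condition   # broadcasts to (N_BINS, SEG_LEN)


def generate_autoregressive(generator, conditions):
    """Generate segments one at a time, feeding each output back as input."""
    prev = np.zeros((N_BINS, SEG_LEN))          # assumed silent initial context
    segments = []
    for cond in conditions:
        current = generator(prev, cond)
        segments.append(current)
        prev = current                          # becomes the previous time step
    return np.concatenate(segments, axis=1)     # join along the time axis


rng = np.random.default_rng(0)
conditions = [rng.random((N_BINS, SEG_LEN)) for _ in range(3)]
spectrogram = generate_autoregressive(toy_generator, conditions)
print(spectrogram.shape)  # (8, 12)
```

Dropping the `prev` argument from the generator gives the non-AR case, where every segment is produced without reference to its neighbor and nothing encourages continuity at segment boundaries.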
We compared samples generated by the proposed model with ground-truth samples and reconstructed samples. The ground-truth samples are taken from the original recordings, and the reconstructed samples are processed in the same way as the generated samples so that the loss in sound quality caused by signal processing can be evaluated.
작은 별 (Little Star)
거미가 줄을 타고 (Itsy Bitsy Spider)
마법의 성 (Magic Castle)
알파벳 송 (Alphabet Song)