In this post, we introduce a Korean singing voice synthesis system based on an auto-regressive boundary equilibrium GAN, accepted to ICASSP 2020.
Overview
Singing voice synthesis is a complicated task that involves multi-dimensional control of a singer model, including phonemic modulation by lyrics, pitch control by the music score, and natural elements such as breath sounds and vibrato. Recently, end-to-end learning models based on GANs have drawn much interest as a way to overcome the limitations of concatenative synthesis and statistical parametric models. Applying a GAN to the audio domain raises several issues: choosing the audio representation to generate, handling temporal continuity between adjacent outputs, and finding an effective loss metric for that representation. The proposed system addresses these issues with an auto-regressive GAN that generates spectrograms under the boundary equilibrium objective.
Figure.1 Overview of the proposed singing voice synthesis system.
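To make the boundary equilibrium objective more concrete, here is a minimal sketch of the standard BEGAN loss terms, in which the discriminator is an autoencoder and the losses are built from its reconstruction errors. This is not the exact training code of the proposed system: the function names `reconstruction_loss` and `began_step` are hypothetical, and the default values of `gamma` and `lambda_k` are illustrative assumptions only.

```python
import numpy as np


def reconstruction_loss(x, x_recon):
    """L1 autoencoder reconstruction error, averaged over all spectrogram bins."""
    return float(np.mean(np.abs(x - x_recon)))


def began_step(loss_real, loss_fake, k, gamma=0.5, lambda_k=0.001):
    """One step of the standard BEGAN objective (hypothetical helper).

    loss_real: reconstruction error of the discriminator on real spectrograms
    loss_fake: reconstruction error of the discriminator on generated spectrograms
    k:         current balancing coefficient, kept in [0, 1]
    gamma, lambda_k: illustrative defaults, not values taken from the post
    """
    # The discriminator reconstructs real data well and generated data poorly.
    d_loss = loss_real - k * loss_fake
    # The generator tries to make its outputs easy to reconstruct.
    g_loss = loss_fake
    # Equilibrium term: positive when the generator is ahead of the target ratio.
    balance = gamma * loss_real - loss_fake
    # Proportional control of k, clipped to the valid range.
    k_next = float(np.clip(k + lambda_k * balance, 0.0, 1.0))
    # Global convergence measure commonly used to monitor BEGAN training.
    convergence = loss_real + abs(balance)
    return d_loss, g_loss, k_next, convergence
```

In a training loop, `loss_real` and `loss_fake` would come from `reconstruction_loss` applied to the discriminator's output on real and generated spectrograms, and `k_next` would be carried over to the next iteration.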
Auto-Regressive Method
A fundamental issue in applying an image-based approach to audio data is that the model can span only a short audio segment, so successive segments generated over time can be discontinuous. To address this problem, we propose an auto-regressive conditional GAN that takes the spectrogram from the previous time step as input to produce the spectrogram for the current time step. The following figure shows how the auto-regressive (AR) method helps generate a continuous spectrogram. Without the AR method, the model generates separate, disconnected spectrogram images; with the AR method, each spectrogram is generated with reference to the previous one.
Figure.2 Spectrogram from ground truth and generated spectrograms from the proposed system.
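The auto-regressive idea can be sketched as a simple generation loop in which each segment is produced from its conditioning features and the previously generated segment. The code below is only an illustration under stated assumptions: `generate_song` and `toy_generator` are hypothetical names, the all-zero initial context is an assumption, and `toy_generator` is a stand-in that merely ramps from the last frame of the previous segment toward a target value. It does not reproduce the actual network or its lyrics and pitch conditioning.

```python
import numpy as np


def generate_song(generator, conditions, n_bins, seg_len):
    """Generate a spectrogram segment by segment, feeding each output back in.

    generator:  callable (condition, previous_segment) -> current_segment
    conditions: array of shape (n_segments, cond_dim), one row per segment
    """
    # Assumption: the first segment is conditioned on an all-zero "previous" segment.
    prev = np.zeros((n_bins, seg_len))
    segments = []
    for cond in conditions:
        cur = generator(cond, prev)  # current segment depends on the previous one
        segments.append(cur)
        prev = cur                   # auto-regressive feedback
    # Concatenate along the time axis to form the full spectrogram.
    return np.concatenate(segments, axis=1)


def toy_generator(cond, prev):
    """Hypothetical stand-in for the trained generator.

    Starts from the last frame of the previous segment and moves linearly
    toward the mean of the condition vector, so segment boundaries stay smooth.
    """
    seg_len = prev.shape[1]
    start = prev[:, -1:]                           # shape (n_bins, 1)
    target = float(np.mean(cond))
    ramp = np.linspace(0.0, 1.0, seg_len + 1)[1:]  # shape (seg_len,)
    return start + (target - start) * ramp         # shape (n_bins, seg_len)


conditions = np.array([[1.0], [3.0]])
spectrogram = generate_song(toy_generator, conditions, n_bins=4, seg_len=5)
print(spectrogram.shape)
```

Because every call receives the previous segment, the first frame of a new segment can be made to follow on from the last frame of the preceding one, which is the continuity property the AR method is meant to provide.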
Results
We compared samples generated by the proposed model with ground truth samples and reconstructed samples. The ground truth samples are taken from the original recordings, and the reconstructed samples are processed in the same way as the generated samples so that the loss in sound quality caused by signal processing can be evaluated.
Song | Generated | Reconstruction | Ground Truth |
---|---|---|---|
작은 별 (Twinkle Twinkle Little Star) | | | |
거미가 줄을 타고 (Itsy Bitsy Spider) | | | |
퐁당퐁당 (Plop Plop) | | | |
마법의 성 (Magic Castle) | | | |
빙고 (Bingo) | | | |
나비야 (Butterfly) | | | |
알파벳 송 (Alphabet Song) | | | |
솜사탕 (Cotton Candy) | | | |