HierSpeech++
Sang-Hoon Lee, Ha-Yeong Choi, Seung-Bin Kim, Seong-Whan Lee
Korea University
Overall framework of HierSpeech++
On this page, all audio samples are generated by HierSpeech++ (v1). We will soon release HierSpeech++ (v2), a multilingual version of HierSpeech++.
We use the LibriTTS dataset to train the TTS model.
An online TTS demo is available on [Hugging Face Spaces].
Zero-shot TTS with Expressive Dataset
Abstract
In this work, we once again significantly improve the naturalness and speaker similarity of synthetic speech, even in zero-shot speech synthesis scenarios.

| Prompt 1 (Whisper) | Prompt 2 (Angry) | Prompt 3 (Laughing) | Prompt 4 (Sleepy) | Prompt 5 (Steve Jobs) |
|---|---|---|---|---|
| HierSpeech++ | HierSpeech++ | HierSpeech++ | HierSpeech++ | HierSpeech++ |
All speakers are unseen during training
Sentence 1 (121_121726_000029_000003): In Germany, they generally "Hock the Kaiser."

| GT | YourTTS | HierSpeech | Vall-E-X | XTTS (v1) |
|---|---|---|---|---|
| HierSpeech++ | HierSpeech++ | HierSpeech++ | | |
Sentence 2 (237_126133_000047_000000): "Don't mind it, Polly," whispered Jasper; "twasn't her fault."

| GT | YourTTS | HierSpeech | Vall-E-X | XTTS (v1) |
|---|---|---|---|---|
| HierSpeech++ | HierSpeech++ | HierSpeech++ | | |
Sentence 3 (260_123288_000020_000000): "The sail! the sail!" I cry, motioning to lower it."

| GT | YourTTS | HierSpeech | Vall-E-X | XTTS (v1) |
|---|---|---|---|---|
| HierSpeech++ | HierSpeech++ | HierSpeech++ | | |
Sentence 4 (908_31957_000017_000001): A ring of amethyst I could not wear here, plainer to my sight, Than that first kiss.

| GT | YourTTS | HierSpeech | Vall-E-X | XTTS (v1) |
|---|---|---|---|---|
| HierSpeech++ | HierSpeech++ | HierSpeech++ | | |
Sentence 5 (5683_32879_000025_000000): 'No, indeed, Dorcas--never, and never will; and I think, though I have learned to fear death, I would rather die than let Stanley even suspect it.'

| GT | YourTTS | HierSpeech | Vall-E-X | XTTS (v1) |
|---|---|---|---|---|
| HierSpeech++ | HierSpeech++ | HierSpeech++ | | |
Sentence 6 (7021_85628_000026_000000): The Princess certainly was beautiful, and he would have dearly liked to be kissed by her, but the cap which his mother had made he would not give up on any condition.

| GT | YourTTS | HierSpeech | Vall-E-X | XTTS (v1) |
|---|---|---|---|---|
| HierSpeech++ | HierSpeech++ | HierSpeech++ | | |
All speakers are unseen during training
Sentence 1: And lay me down in my cold bed and leave my shining lot.

| GT | Prompt | | |
|---|---|---|---|
| Vall-E | NaturalSpeech 2 | StyleTTS 2 | HierSpeech++ |
Sentence 2: Yea, his honourable worship is within, but he hath a godly minister or two with him, and likewise a leech.

| GT | Prompt | | |
|---|---|---|---|
| Vall-E | NaturalSpeech 2 | StyleTTS 2 | HierSpeech++ |
Sentence 3: The army found the people in poverty and left them in comparative wealth.

| GT | Prompt | | |
|---|---|---|---|
| Vall-E | NaturalSpeech 2 | StyleTTS 2 | HierSpeech++ |
Sentence 4: Thus did this humane and right minded father comfort his unhappy daughter, and her mother embracing her again, did all she could to soothe her feelings.

| GT | Prompt | | |
|---|---|---|---|
| Vall-E | NaturalSpeech 2 | StyleTTS 2 | HierSpeech++ |
The following audio samples are taken from the Mega-TTS demo page.
Sentence 1: Let's go drink until we can't feel feelings anymore.

| Prompt (SpongeBob) | Mega-TTS | HierSpeech++ |
|---|---|---|
Sentence 2: Uh, it's not like the internet to go crazy about something small and stupid.

| Prompt (Peter Griffin) | Mega-TTS | HierSpeech++ |
|---|---|---|
Sentence 3: Then I would never talk to that person about boa constrictors, or primeval forests, or stars. I would bring myself down to his level.

| Prompt (Rick) | Mega-TTS | HierSpeech++ |
|---|---|---|
Sentence 4: In what a disgraceful light might it not strike so vain a man!

| Prompt (Morty) | Mega-TTS | HierSpeech++ |
|---|---|---|
All speakers are unseen during training
| Source Speaker | Target Speaker | Converted | | | |
|---|---|---|---|---|---|
| GT (p228) | GT (p233) | AutoVC | VoiceMixer | DiffVC | |
| | | Diff-HierVC | DDDM-VC | YourTTS | HierVST (Ours) |
| | | HierSpeech++ | HierSpeech++ | HierSpeech++ | |
| GT (p233) | GT (p227) | AutoVC | VoiceMixer | DiffVC | |
| | | Diff-HierVC | DDDM-VC | YourTTS | HierVST (Ours) |
| | | HierSpeech++ | HierSpeech++ | HierSpeech++ | |
| GT (p240) | GT (p236) | AutoVC | VoiceMixer | DiffVC | |
| | | Diff-HierVC | DDDM-VC | YourTTS | HierVST (Ours) |
| | | HierSpeech++ | HierSpeech++ | HierSpeech++ | |
Speech Super-resolution (16 kHz → 48 kHz)
For more accurate listening, we recommend first running a simple audible-frequency test. WARNING: HIGH-FREQUENCY SAMPLES PLAYED AT LOUD VOLUME MAY BE PAINFUL.

| 2000 Hz | 22000 Hz |
|---|---|
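To run a similar audible-frequency check offline, the sketch below writes quiet sine tones at the two test frequencies to WAV files. It is an illustration under our own assumptions, not the test audio used on this page: a 48 kHz sampling rate (so a 22000 Hz tone stays below the 24 kHz Nyquist limit), 16-bit mono PCM, an amplitude of 10% of full scale, a 2-second duration with a short fade in/out, and the hypothetical file names `tone_2000hz.wav` and `tone_22000hz.wav`.

```python
import math
import struct
import wave

# Assumed settings (not taken from the demo page).
SAMPLE_RATE = 48_000   # 48 kHz can represent tones up to 24 kHz (Nyquist limit)
AMPLITUDE = 0.1        # 10% of full scale (about -20 dBFS) to keep the tones quiet
FADE_SECONDS = 0.02    # short linear fade in/out to avoid clicks at the edges


def sine_tone(freq_hz, seconds=2.0, sample_rate=SAMPLE_RATE, amplitude=AMPLITUDE):
    """Return a faded sine tone as a list of signed 16-bit PCM sample values."""
    if not 0 < freq_hz < sample_rate / 2:
        raise ValueError("frequency must be positive and below the Nyquist limit")
    n = int(seconds * sample_rate)
    fade = min(int(FADE_SECONDS * sample_rate), n // 2)
    peak = amplitude * 32767
    samples = []
    for i in range(n):
        # Linear ramp up over the first `fade` samples and down over the last ones.
        gain = 1.0 if fade == 0 else min(1.0, i / fade, (n - 1 - i) / fade)
        value = peak * gain * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        samples.append(int(round(value)))
    return samples


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono 16-bit little-endian PCM samples to a WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack(f"<{len(samples)}h", *samples))


if __name__ == "__main__":
    # Hypothetical output names, one file per test frequency.
    for freq in (2000, 22000):
        write_wav(f"tone_{freq}hz.wav", sine_tone(freq))
```

Start playback at a low volume with the 2000 Hz file and raise it gradually. If the 22000 Hz tone is inaudible on your playback setup, high-frequency differences near that range between the 48 kHz samples are likely to be inaudible as well.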
Sentence 1: "No," said Trot, positively, there's been enough patching in this country and I won't have any more of it.

| GT (16 kHz) | AudioSR | SpeechSR (Ours) |
|---|---|---|
| HierSpeech++ (16 kHz) | HierSpeech++ (+AudioSR) | HierSpeech++ (+SpeechSR) |
Sentence 2: But tell me, please, what you intend to do with this new lot of the Powder of Life, which Dr. Pipt is making.

| GT (16 kHz) | AudioSR | SpeechSR (Ours) |
|---|---|---|
| HierSpeech++ (16 kHz) | HierSpeech++ (+AudioSR) | HierSpeech++ (+SpeechSR) |
Sentence 3: The end he had been born to serve yet did not see had led him to escape by an unseen path and now it beckoned to him once more and a new adventure was about to be opened to him.

| GT (16 kHz) | AudioSR | SpeechSR (Ours) |
|---|---|---|
| HierSpeech++ (16 kHz) | HierSpeech++ (+AudioSR) | HierSpeech++ (+SpeechSR) |