SPEECHSPLIT2: DISENTANGLING SPEECH INFORMATION STREAMS WITHOUT EXHAUSTIVE BOTTLENECK FINE-TUNING
Chan, Chak Ho
Permalink
https://hdl.handle.net/2142/124961
Description
Title
SPEECHSPLIT2: DISENTANGLING SPEECH INFORMATION STREAMS WITHOUT EXHAUSTIVE BOTTLENECK FINE-TUNING
Author(s)
Chan, Chak Ho
Issue Date
2021-05-01
Keyword(s)
Voice Conversion; Speech Disentanglement; Signal Processing
Abstract
SpeechSplit is among the first algorithms to successfully disentangle speech into four components: rhythm, content, pitch, and timbre. However, the model requires exhaustive fine-tuning of the bottleneck dimensions of its encoders, which is a daunting task and limits its ability to generalize. In this work, we propose SpeechSplit2, an improved version of SpeechSplit that uses simple signal processing methods to alleviate the laborious bottleneck fine-tuning problem. We show that by feeding different inputs to each encoder, we can control the input space of the neural networks so that each component contains only the information we want to extract, provided that the bottleneck is sufficiently large to encode that information. With the same neural network architecture as SpeechSplit, SpeechSplit2 achieves comparable performance in disentangling speech components when the bottlenecks are carefully fine-tuned, and it outperforms the baseline when the bottleneck size varies.