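Lately it feels like the number of End-to-End papers in this field has suddenly shot up, so here are some notes for myself before I forget, including a few papers that look interesting. Honestly, I've had my fill already, but there's no going against the tide. Most of these are on arXiv, so their reliability isn't guaranteed; treat them as reference only. I'll add a one-line comment when I feel like it.
Note: speech recognition papers are deliberately left out.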
Paper
- Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders
  - URL https://arxiv.org/abs/1704.01279
  - Blog & Demo NSynth: Neural Audio Synthesis
  - Google Brain and DeepMind’s work
- Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model
  - URL https://arxiv.org/abs/1703.10135
  - Demo https://google.github.io/tacotron/
  - Google’s work, "submitted to Interspeech 2017"
- MidiNet: A Convolutional Generative Adversarial Network for Symbolic-domain Music Generation using 1D and 2D Conditions
  - URL https://arxiv.org/abs/1703.10847
  - Academia Sinica’s work
- SEGAN: Speech Enhancement Generative Adversarial Network
  - URL https://arxiv.org/abs/1703.09452
  - Demo http://veu.talp.cat/segan/
  - Code https://github.com/santi-pdp/segan
  - a method of end-to-end speech enhancement
- Raw Waveform-based Speech Enhancement by Fully Convolutional Networks
  - URL https://arxiv.org/abs/1703.02205
  - a method of end-to-end speech enhancement
- Deep Voice: Real-time Neural Text-to-Speech
  - URL https://arxiv.org/abs/1702.07825
  - Demo http://research.baidu.com/deep-voice-production-quality-text-speech-system-constructed-entirely-deep-neural-networks/
  - Baidu’s work; a method of end-to-end speech synthesis
- Char2Wav: End-to-End Speech Synthesis
- SampleRNN: An Unconditional End-to-End Neural Audio Generation Model
- WaveNet: A Generative Model for Raw Audio
For GANs, here are a few to start with. GAN papers, too, keep popping up one after another like bamboo shoots.
- Towards Principled Methods for Training Generative Adversarial Networks
- Wasserstein GAN
- Improved Training of Wasserstein GANs
- Voice Conversion from Unaligned Corpora using Variational Autoencoding Wasserstein Generative Adversarial Networks
- BEGAN: Boundary Equilibrium Generative Adversarial Networks
- Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
The following are also listed for reference.
Slide
- [DL輪読会] Wasserstein GAN/Towards Principled Methods for Training Generative Adversarial Networks (DL reading group slides from DeepLearningJP2016, www.slideshare.net)
- Generative Model-Based Text-to-Speech Synthesis
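- 音響分野におけるブラインド適応信号処理の展開 (Developments in Blind Adaptive Signal Processing in Acoustics)
- 音声信号の分析と加工 ― 音声を自在に変換するには? (Analysis and Manipulation of Speech Signals: How to Convert Speech Freely)
- 音声変換技術の進展と課題 (Advances and Challenges in Voice Conversion Technology)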
Website
- Fantastic GANs and where to find them