Speech Prosody in Speech Synthesis (2015), a specialist book

Published by Springer in 2015.


This book is one volume in the series "Prosody, Phonology and Phonetics". Here is the stated purpose of the series:

The series will publish studies in the general area of Speech Prosody with a particular (but non-exclusive) focus on the importance of phonetics and phonology in this field. The topic of speech prosody is today a far larger area of research than is often realised. The number of papers on the topic presented at large international conferences such as Interspeech and ICPhS is considerable and regularly increasing. The proposed book series would be the natural place to publish extended versions of papers presented at the Speech Prosody Conferences, in particular the papers presented in Special Sessions at the conference. This could potentially involve the publication of 3 or 4 volumes every two years ensuring a stable future for the book series. If such publications are produced fairly rapidly, they will in turn provide a strong incentive for the organisation of other special sessions at future Speech Prosody conferences.

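So the series is also meant to serve as a venue for publishing extended versions of papers presented at international conferences. Fair enough, the intent makes sense. The plan is to publish three or four volumes every two years. As of March 2016, four volumes have appeared, so the next one, if there is one, will probably come out in 2017 or later.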

The aim of this particular book is as follows (from Speech Prosody in Speech Synthesis: Modeling and generation | Keikichi Hirose | Springer):

The volume addresses issues concerning prosody generation in speech synthesis, including prosody modeling, how we can convey para- and non-linguistic information in speech synthesis, and prosody control in speech synthesis (including prosody conversions). A high level of quality has already been achieved in speech synthesis by using selection-based methods with segments of human speech. Although the method enables synthetic speech with various voice qualities and speaking styles, it requires large speech corpora with targeted quality and style.
Accordingly, speech conversion techniques are now of growing interest among researchers. HMM/GMM-based methods are widely used, but entail several major problems when viewed from the prosody perspective; prosodic features cover a wider time span than segmental features and their frame-by-frame processing is not always appropriate. The book offers a good overview of state-of-the-art studies on prosody in speech synthesis.

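I skimmed through the contents. Rather than filling in foundational knowledge, the book has individual researchers introduce and explain research topics on prosody in speech synthesis in light of recent results, so it is aimed at graduate students and researchers working in related fields. The range of topics is certainly rich. Personally, I am glad that topics related to HMM-based speech synthesis are covered.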

Part I Modeling of Prosody
  • Chapter 1 ProZed: A Speech Prosody Editor for Linguists, Using Analysis-by-Synthesis (by Daniel J. Hirst)
  • Chapter 2 Degrees of Freedom in Prosody Modeling (by Yi Xu and Santitham Prom-on)
  • Chapter 3 Extraction, Analysis and Synthesis of Fujisaki model Parameters (by Hansjörg Mixdorff)
  • Chapter 4 Probabilistic Modeling of Pitch Contours Toward Prosody Synthesis and Conversion (by Hirokazu Kameoka)
Part II Para- and Non-Linguistic Issues of Prosody
  • Chapter 5 Communicative Speech Synthesis as Pan-Linguistic Prosody Control (by Yoshinori Sagisaka and Yoko Greenberg)
  • Chapter 6 Mandarin Stress Analysis and Prediction for Speech Synthesis (by Ya Li and Jianhua Tao)
  • Chapter 7 Expressivity in Interactive Speech Synthesis; Some Paralinguistic and Nonlinguistic Issues of Speech Prosody for Conversational Dialogue Systems (by Nick Campbell and Ya Li)
  • Chapter 8 Temporally Variable Multi attribute Morphing of Arbitrarily Many Voices for Exploratory Research of Speech Prosody (by Hideki Kawahara)
Part III Control of Prosody in Speech Synthesis
  • Chapter 9 Statistical Models for Dealing with Discontinuity of Fundamental Frequency (by Kai Yu)
  • Chapter 10 Use of Generation Process Model for Improved Control of Fundamental Frequency Contours in HMM-Based Speech Synthesis (by Keikichi Hirose)
  • Chapter 11 Tone Nucleus Model for Emotional Mandarin Speech Synthesis (by Miaomiao Wang)
  • Chapter 12 Emphasis, Word Prominence, and Continuous Wavelet Transform in the Control of HMM-Based Synthesis (by Martti Vainio, Antti Suni and Daniel Aalto)
  • Chapter 13 Exploiting Alternatives for Text-To-Speech Synthesis: From Machine to Human (by Nicolas Obin, Christophe Veaux and Pierre Lanchantin)
  • Chapter 14 Prosody Control and Variation Enhancement Techniques for HMM-Based Expressive Speech Synthesis (by Takao Kobayashi)