Speech Prosody in Speech Synthesis (2015), a specialist book



This book is one volume in the series "Prosody, Phonology and Phonetics". Let us look at the stated aims of the series:

The series will publish studies in the general area of Speech Prosody with a particular (but non-exclusive) focus on the importance of phonetics and phonology in this field. The topic of speech prosody is today a far larger area of research than is often realised. The number of papers on the topic presented at large international conferences such as Interspeech and ICPhS is considerable and regularly increasing. The proposed book series would be the natural place to publish extended versions of papers presented at the Speech Prosody Conferences, in particular the papers presented in Special Sessions at the conference. This could potentially involve the publication of 3 or 4 volumes every two years ensuring a stable future for the book series. If such publications are produced fairly rapidly, they will in turn provide a strong incentive for the organisation of other special sessions at future Speech Prosody conferences.


The aims of the book itself are as follows (from Speech Prosody in Speech Synthesis: Modeling and generation | Keikichi Hirose | Springer):

The volume addresses issues concerning prosody generation in speech synthesis, including prosody modeling, how we can convey para- and non-linguistic information in speech synthesis, and prosody control in speech synthesis (including prosody conversions). A high level of quality has already been achieved in speech synthesis by using selection-based methods with segments of human speech. Although the method enables synthetic speech with various voice qualities and speaking styles, it requires large speech corpora with targeted quality and style.
Accordingly, speech conversion techniques are now of growing interest among researchers. HMM/GMM-based methods are widely used, but entail several major problems when viewed from the prosody perspective; prosodic features cover a wider time span than segmental features and their frame-by-frame processing is not always appropriate. The book offers a good overview of state-of-the-art studies on prosody in speech synthesis.


Part I Modeling of Prosody
  • Chapter 1 ProZed: A Speech Prosody Editor for Linguists, Using Analysis-by-Synthesis (by Daniel J. Hirst)
  • Chapter 2 Degrees of Freedom in Prosody Modeling (by Yi Xu and Santitham Prom-on)
  • Chapter 3 Extraction, Analysis and Synthesis of Fujisaki model Parameters (by Hansjörg Mixdorff)
  • Chapter 4 Probabilistic Modeling of Pitch Contours Toward Prosody Synthesis and Conversion (by Hirokazu Kameoka)
Part II Para- and Non-Linguistic Issues of Prosody
  • Chapter 5 Communicative Speech Synthesis as Pan-Linguistic Prosody Control (by Yoshinori Sagisaka and Yoko Greenberg)
  • Chapter 6 Mandarin Stress Analysis and Prediction for Speech Synthesis (by Ya Li and Jianhua Tao)
  • Chapter 7 Expressivity in Interactive Speech Synthesis; Some Paralinguistic and Nonlinguistic Issues of Speech Prosody for Conversational Dialogue Systems (by Nick Campbell and Ya Li)
  • Chapter 8 Temporally Variable Multi attribute Morphing of Arbitrarily Many Voices for Exploratory Research of Speech Prosody (by Hideki Kawahara)
Part III Control of Prosody in Speech Synthesis
  • Chapter 9 Statistical Models for Dealing with Discontinuity of Fundamental Frequency (by Kai Yu)
  • Chapter 10 Use of Generation Process Model for Improved Control of Fundamental Frequency Contours in HMM-Based Speech Synthesis (by Keikichi Hirose)
  • Chapter 11 Tone Nucleus Model for Emotional Mandarin Speech Synthesis (by Miaomiao Wang)
  • Chapter 12 Emphasis, Word Prominence, and Continuous Wavelet Transform in the Control of HMM-Based Synthesis (by Martti Vainio, Antti Suni and Daniel Aalto)
  • Chapter 13 Exploiting Alternatives for Text-To-Speech Synthesis: From Machine to Human (by Nicolas Obin, Christophe Veaux and Pierre Lanchantin)
  • Chapter 14 Prosody Control and Variation Enhancement Techniques for HMM-Based Expressive Speech Synthesis (by Takao Kobayashi)