2019-04-30

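Improved the simple voice changer script: pitch up/down and formant adjustment via key input

Audio python

As the title says.
Since the previous post made it possible to monitor key input, I added features that are triggered by specific keys, such as raising and lowering the pitch ('u' and 'd').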

gist.github.com
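As a rough illustration of the idea (not the code in the gist), key presses can be mapped to multiplicative changes in the conversion parameters. In this sketch `VoiceParams`, `f0_rate`, `sp_rate`, and `apply_key` are hypothetical names, both step sizes are assumptions, and only the 'u' and 'd' keys come from this post; the formant keys 'f' and 'b' are made up for the example.

```python
from dataclasses import dataclass

PITCH_STEP = 2 ** (1 / 12)   # one semitone per key press (assumed step size)
FORMANT_STEP = 1.05          # 5 % per key press (assumed step size)


@dataclass
class VoiceParams:
    f0_rate: float = 1.0   # multiplier applied to the F0 contour
    sp_rate: float = 1.0   # multiplier for the frequency-axis warp of the envelope


def apply_key(params: VoiceParams, char: str) -> VoiceParams:
    """Update the parameters in place for one pressed character."""
    if char == "u":        # pitch up (key named in the post)
        params.f0_rate *= PITCH_STEP
    elif char == "d":      # pitch down (key named in the post)
        params.f0_rate /= PITCH_STEP
    elif char == "f":      # formants up (hypothetical key)
        params.sp_rate *= FORMANT_STEP
    elif char == "b":      # formants down (hypothetical key)
        params.sp_rate /= FORMANT_STEP
    return params


params = VoiceParams()
for ch in "uud":           # simulate: up, up, down
    apply_key(params, ch)
print(round(params.f0_rate, 4))
```

In a pynput `on_press` callback, one would pass `key.char` (when that attribute exists) to `apply_key`, and the audio loop would read `f0_rate` and `sp_rate` before resynthesizing each buffer.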

2019-04-30

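Improved the PyWorld simple voice changer script: it now exits on key input

Audio python

The PyWorld script introduced in an earlier post could only be stopped by forcing it to quit, for example with Control + C, which was not a very elegant approach.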
tam5917.hatenablog.com

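So I improved it to monitor key input and exit the script when a specific key (ESC or 'q') is pressed.

The environment is macOS 10.13.6 with Python 3.7.3. PyAudio, PyWorld, and NumPy are already installed.
Key input is monitored with the pynput library.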

pip3 install pynput

The improved script (real_time_vc_keylog.py) is below.
gist.github.com

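Two things to note when running the script above in the macOS Terminal:

Run it with sudo: this is due to macOS security restrictions.
A flood of warnings on standard error: these seem to come from the threading library. For now, they are discarded to /dev/null.

So, running it with the following command line converts the voice while continuously monitoring key input. As mentioned at the start, ESC or 'q' quits.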

sudo python3 real_time_vc_keylog.py 2> /dev/null

It relies on a global variable, which is also not elegant, but it works, so I will call it good for now.
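One way to avoid the global variable would be to keep the quit flag inside an object. The following is a minimal sketch under that assumption, not the code in the gist: `QuitWatcher` is a hypothetical class name, and the callback identifies keys by duck typing (`char` for character keys, `name` for special keys such as `Key.esc`) so the logic can be tried without a keyboard.

```python
import threading
from types import SimpleNamespace


class QuitWatcher:
    """Holds a quit flag that is set when ESC or 'q' is pressed."""

    def __init__(self):
        self._quit = threading.Event()

    def on_press(self, key):
        # pynput passes KeyCode objects (with .char) for character keys
        # and Key enum members (with .name) for special keys.
        if getattr(key, "char", None) == "q" or getattr(key, "name", None) == "esc":
            self._quit.set()
            return False  # returning False stops a pynput listener
        return None

    def should_quit(self):
        return self._quit.is_set()

    def start(self):
        # Imported lazily so the class can be tried without pynput installed.
        from pynput import keyboard
        listener = keyboard.Listener(on_press=self.on_press)
        listener.start()
        return listener


# Dry run with stand-in key objects instead of a real keyboard.
watcher = QuitWatcher()
watcher.on_press(SimpleNamespace(char="a"))
pressed_a = watcher.should_quit()
watcher.on_press(SimpleNamespace(name="esc"))
pressed_esc = watcher.should_quit()
print(pressed_a, pressed_esc)
```

With this shape, the audio loop would call `watcher.start()` once and then run `while not watcher.should_quit(): ...`, so no module-level flag is needed.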

The key-input monitoring implemented in this script is based on the following. Thank you.
PyAudioとpyworldの連携ができますた＾＾ · GitHub

2019-04-28

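A Python script that translates Google speech recognition results with Google Translate and turns them back into speech with Google Text-to-Speech

Audio python

The titles are getting rather long.
I did speech translation using a library for calling Google Translate from Python.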

pip3 install SpeechRecognition
pip3 install gTTS
pip3 install googletrans

Install the libraries with the commands above. The script follows.

#!/usr/bin/env python3

# pip3 install SpeechRecognition
# pip3 install gTTS
# pip3 install googletrans

import speech_recognition as sr
import subprocess
from gtts import gTTS
from googletrans import Translator

out_mp3 = '/tmp/synth.mp3'

if __name__ == "__main__":

    # Capture speech from the microphone
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Say something!")
        r.adjust_for_ambient_noise(source)
        audio = r.listen(source)

    try:
        # Google speech recognition in Japanese
        recognized = r.recognize_google(audio, language="ja")
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand the audio.")
    except sr.RequestError as e:
        print("Could not request results from the Google Speech "
              "Recognition service; {0}".format(e))
    else:
        # Google Translate (googletrans translates to English by default)
        print("Before translation: " + recognized)
        translator = Translator()
        translated = translator.translate(recognized).text
        print("After translation: " + translated)

        # Google TTS
        tts = gTTS(text=translated, lang='en')
        tts.save(out_mp3)  # gTTS saves the audio as mp3
        subprocess.run(['afplay', out_mp3])  # afplay is the macOS audio player
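The script above chains three stages: recognition, translation, and synthesis. As a sketch of that flow only, the stages can be passed in as functions so the chain can be checked without a microphone or network. `speech_to_speech` and the stand-in stages below are hypothetical names introduced for illustration.

```python
from typing import Callable


def speech_to_speech(audio,
                     recognize: Callable,
                     translate: Callable,
                     synthesize: Callable):
    """Run recognition -> translation -> synthesis and return all results."""
    recognized = recognize(audio)
    translated = translate(recognized)
    return recognized, translated, synthesize(translated)


# Stand-in stages so the flow can be checked offline.
fake_dictionary = {"こんにちは": "Hello"}
result = speech_to_speech(
    audio=b"\x00\x00",
    recognize=lambda audio: "こんにちは",
    translate=lambda text: fake_dictionary[text],
    synthesize=lambda text: "/tmp/synth.mp3",
)
print(result)
```

In the real script, the three stages would correspond to `r.recognize_google(audio, language="ja")`, `translator.translate(recognized).text`, and the gTTS `save` call respectively.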

2019-04-28

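A Python script that turns Google speech recognition results back into speech with Google Text-to-Speech

Audio python

This time I used the gTTS library to turn Google speech recognition results back into speech with Google Text-to-Speech. In other words, both speech recognition and speech synthesis (TTS) are handled by online services.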

pip3 install SpeechRecognition
pip3 install gTTS

After installing the libraries with the commands above, here is the script.
One unusual design choice: the audio is saved as an mp3 file.

#!/usr/bin/env python3

# pip3 install SpeechRecognition
# pip3 install gTTS

import speech_recognition as sr
import subprocess
from gtts import gTTS

out_mp3 = '/tmp/synth.mp3'

if __name__ == "__main__":

    # Capture speech from the microphone
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Say something!")
        r.adjust_for_ambient_noise(source)
        audio = r.listen(source)

    try:
        # Google speech recognition in Japanese
        text = r.recognize_google(audio, language="ja")
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand the audio.")
    except sr.RequestError as e:
        print("Could not request results from the Google Speech "
              "Recognition service; {0}".format(e))
    else:
        print(text)

        # Google TTS
        tts = gTTS(text=text, lang='ja')
        tts.save(out_mp3)  # gTTS saves the audio as mp3
        subprocess.run(['afplay', out_mp3])  # afplay is the macOS audio player

2019-04-28

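A Python script that continuously displays the waveform of audio as it changes in real time

Audio python

This displays the waveform in real time. It is fun to watch the waveform change as you say the Japanese vowels "a, i, u, e, o" (あいうえお).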

#!/usr/local/bin/python3
# -*- coding:utf-8 -*-

import numpy as np
import sys

import pyqtgraph as pg
from pyqtgraph.Qt import QtCore, QtGui

import pyaudio

sample_rate = 16000
frame_length = 1024
frame_shift = 80


class PlotWindow:
    def __init__(self):

        self.win = pg.GraphicsWindow()
        self.win.setWindowTitle(u"Real-time waveform plot")
        self.win.resize(1100, 800)
        self.plt = self.win.addPlot()  # plot item that holds the visuals
        self.ymin = -1.0
        self.ymax = 1.0
        self.plt.setYRange(self.ymin, self.ymax)  # y-axis lower/upper limits
        self.curve = self.plt.plot()  # curve that receives the plot data

        # Microphone settings
        self.CHUNK = frame_length  # number of samples read at a time
        self.RATE = sample_rate  # sampling frequency
        self.audio = pyaudio.PyAudio()
        self.stream = self.audio.open(format=pyaudio.paInt16,
                                      channels=1,
                                      rate=self.RATE,
                                      input=True,
                                      output=True,
                                      frames_per_buffer=self.CHUNK)

        # Update interval
        self.timer = QtCore.QTimer()
        self.timer.timeout.connect(self.update)
        self.timer.start(5)  # call update every 5 ms

        self.data = np.zeros(self.CHUNK)

    def update(self):
        self.data = self.AudioInput()
        self.curve.setData(self.data)

    def AudioInput(self):
        ret = self.stream.read(self.CHUNK)
        ret = np.frombuffer(ret, dtype="int16") / 32768
        return ret


if __name__ == "__main__":
    plotwin = PlotWindow()

    if (sys.flags.interactive != 1) or not hasattr(QtCore, 'PYQT_VERSION'):
        QtGui.QApplication.instance().exec_()

I referred to the following article. Thank you.
takeshid.hatenadiary.jp
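The `AudioInput` method above turns the raw bytes from the stream into floats by dividing by 32768, the magnitude of the most negative 16-bit sample. Here is a small offline sketch of just that step; `pcm16_to_float` is a hypothetical helper name, and the four-sample buffer stands in for what `stream.read` would return.

```python
import numpy as np


def pcm16_to_float(raw: bytes) -> np.ndarray:
    """Convert raw 16-bit PCM bytes to floats in [-1.0, 1.0)."""
    samples = np.frombuffer(raw, dtype=np.int16)
    return samples / 32768.0


# Stand-in for one buffer returned by stream.read(): four known samples.
raw = np.array([0, 16384, -32768, 32767], dtype=np.int16).tobytes()
print(pcm16_to_float(raw))
```

Because the largest positive sample is 32767, the result never quite reaches +1.0, which is why a y-range of -1.0 to 1.0 is enough for the plot.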

2019-04-28

A Python script that displays the FFT spectrum of audio as it changes in real time

Audio python

STFT and so on. The spectrum looks jagged, doesn't it?

#!/usr/local/bin/python3
# -*- coding:utf-8 -*-

# Plotting libraries
import pyqtgraph as pg
from pyqtgraph.Qt import QtCore, QtGui
import numpy as np
import sys

# Audio libraries
import pyaudio

sample_rate = 16000
frame_length = 1024
frame_shift = 80


class PlotWindow:
    def __init__(self):

        # Initial plot settings
        self.win = pg.GraphicsWindow()
        self.win.setWindowTitle(u"Real-time FFT spectrum plot")
        self.win.resize(1100, 800)
        self.plt = self.win.addPlot()
        self.ymin = -100
        self.ymax = 80
        self.plt.setYRange(self.ymin, self.ymax)

        # Thanks to yukara (frequency-axis setup)
        self.plt.setXRange(0, frame_length / 2, padding=0)
        specAxis = self.plt.getAxis("bottom")
        specAxis.setLabel("Frequency [Hz]")
        specAxis.setScale(sample_rate / 2. / (frame_length / 2 + 1))
        hz_interval = 500
        newXAxis = (
            np.arange(int(sample_rate / 2 / hz_interval)) + 1) * hz_interval
        oriXAxis = newXAxis / (sample_rate / 2. / (frame_length / 2 + 1))
        # list(): in Python 3 a bare zip iterator is exhausted after one pass
        specAxis.setTicks([list(zip(oriXAxis, newXAxis))])

        self.curve = self.plt.plot()  # curve that receives the plot data
        self.epsiron = 0.0001

        # Microphone settings
        self.CHUNK = frame_length
        self.RATE = sample_rate
        self.audio = pyaudio.PyAudio()
        self.stream = self.audio.open(format=pyaudio.paInt16,
                                      channels=1,
                                      rate=self.RATE,
                                      input=True,
                                      output=True,
                                      frames_per_buffer=self.CHUNK)

        # Update interval
        self.timer = QtCore.QTimer()
        self.timer.timeout.connect(self.update)
        self.timer.start(10)  # call update every 10 ms

        # Storage for the audio data (plot data)
        self.data = np.zeros(self.CHUNK)

    def update(self):
        self.data = self.AudioInput()

        y = np.fft.fft(self.data[0:self.CHUNK])
        y = np.abs(y) ** 2  # power spectrum
        y = y[0:int(self.CHUNK / 2)]  # keep the non-negative frequencies
        y = 10 * np.log10(y + self.epsiron)  # power in dB: 10 * log10

        self.curve.setData(y)  # pass the data to the plot

    def AudioInput(self):
        ret = self.stream.read(self.CHUNK)
        ret = np.frombuffer(ret, dtype="int16") / 32768
        return ret


if __name__ == "__main__":
    plotwin = PlotWindow()

    if (sys.flags.interactive != 1) or not hasattr(QtCore, 'PYQT_VERSION'):
        QtGui.QApplication.instance().exec_()

I referred to the following article. Thank you.
takeshid.hatenadiary.jp
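For reference, here is a minimal offline sketch of the per-frame computation done in `update`, under a few assumptions of my own: it uses `np.fft.rfft` and `np.fft.rfftfreq` instead of slicing a full FFT, it applies a Hann window (the script above does not), and it runs on a synthetic 1 kHz tone instead of microphone input. The level is computed as 10·log10 of the power, which equals 20·log10 of the magnitude. `log_power_spectrum` is a hypothetical helper name.

```python
import numpy as np

sample_rate = 16000
frame_length = 1024


def log_power_spectrum(frame, eps=1e-4):
    """Return (frequencies in Hz, power in dB) for one frame."""
    windowed = frame * np.hanning(len(frame))      # reduce spectral leakage
    spectrum = np.fft.rfft(windowed)               # bins 0 .. N/2
    power_db = 10 * np.log10(np.abs(spectrum) ** 2 + eps)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    return freqs, power_db


# Synthetic test tone: 1 kHz falls exactly on bin 64 (64 * 16000 / 1024).
t = np.arange(frame_length) / sample_rate
frame = 0.5 * np.sin(2 * np.pi * 1000 * t)
freqs, power_db = log_power_spectrum(frame)
peak_hz = freqs[np.argmax(power_db)]
print(peak_hz)
```

A window may remove some of the jaggedness that comes from leakage, but for voiced speech much of the ripple is the harmonic structure of the pitch, so the spectrum will likely still look spiky.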

2019-04-28

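A Python script that displays the spectral envelope of audio as it changes in real time

Audio python

Requires pyqtgraph, numpy, pyaudio, and pysas.
Testing it with the vowels (a, i, u, e, o), you can watch the shape of the spectral envelope change in real time, which is fun.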

#!/usr/local/bin/python3
# -*- coding:utf-8 -*-

# Plotting libraries
import pyqtgraph as pg
from pyqtgraph.Qt import QtCore, QtGui
import numpy as np
import sys

# Audio libraries
import pyaudio
from pysas import World


sample_rate = 16000
frame_length = 1024
frame_shift = 80


class PlotWindow:
    def __init__(self):

        # Initial plot settings
        self.win = pg.GraphicsWindow()
        self.win.setWindowTitle(u"Spectral envelope")
        self.plt = self.win.addPlot()
        self.ymin = -100
        self.ymax = 0
        self.plt.setYRange(self.ymin, self.ymax)  # y-axis lower/upper limits

        # Thanks to yukara (frequency-axis setup)
        self.plt.setXRange(0, frame_length / 2, padding=0)
        specAxis = self.plt.getAxis("bottom")
        specAxis.setLabel("Frequency [Hz]")
        specAxis.setScale(sample_rate / 2. / (frame_length / 2 + 1))
        hz_interval = 500
        newXAxis = (
            np.arange(int(sample_rate / 2 / hz_interval)) + 1) * hz_interval
        oriXAxis = newXAxis / (sample_rate / 2. / (frame_length / 2 + 1))
        # list(): in Python 3 a bare zip iterator is exhausted after one pass
        specAxis.setTicks([list(zip(oriXAxis, newXAxis))])

        self.curve = self.plt.plot()  # curve that receives the plot data
        self.epsiron = 0.0001

        # Microphone settings
        self.CHUNK = frame_length  # number of samples read at a time
        self.RATE = sample_rate  # sampling frequency
        self.audio = pyaudio.PyAudio()
        self.stream = self.audio.open(format=pyaudio.paInt16,
                                      channels=1,
                                      rate=self.RATE,
                                      input=True,
                                      output=True,
                                      frames_per_buffer=self.CHUNK)

        # Update interval
        self.timer = QtCore.QTimer()
        self.timer.timeout.connect(self.update)
        self.timer.start(10)  # call update every 10 ms

        # Storage for the audio data (plot data)
        self.data = np.zeros(self.CHUNK)

    def update(self):
        self.data = self.AudioInput()

        world = World(sample_rate)
        _, spec_env = world.spectral_envelope(self.data.astype(np.float64))
        spec = np.mean(spec_env, axis=0)
        spec = 20 * np.log10(spec + self.epsiron)
        self.curve.setData(spec)  # pass the data to the plot

    def AudioInput(self):
        ret = self.stream.read(self.CHUNK)  # read audio (raw bytes)
        ret = np.frombuffer(ret, dtype="int16") / 32768
        return ret


if __name__ == "__main__":
    plotwin = PlotWindow()

    if (sys.flags.interactive != 1) or not hasattr(QtCore, 'PYQT_VERSION'):
        QtGui.QApplication.instance().exec_()

The audio capture part is based on the following article. Thank you.
takeshid.hatenadiary.jp
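To show what "spectral envelope" means relative to a raw FFT spectrum, here is a NumPy-only sketch that smooths the log spectrum by cepstral liftering. This is a swapped-in textbook technique, not the WORLD-based estimator that the script above gets from pysas, and the results will differ from it. `cepstral_envelope_db`, the choice of 30 cepstral coefficients, and the synthetic test frame are all assumptions made for illustration.

```python
import numpy as np

sample_rate = 16000
frame_length = 1024


def cepstral_envelope_db(frame, n_coef=30, eps=1e-8):
    """Return (raw log-magnitude spectrum, smoothed envelope), both in dB."""
    n = len(frame)
    spectrum = np.fft.rfft(frame * np.hanning(n))
    log_mag = np.log(np.abs(spectrum) + eps)

    # Real cepstrum: inverse FFT of the log-magnitude spectrum.
    cepstrum = np.fft.irfft(log_mag, n)

    # Keep only low quefrencies (symmetric lifter); this drops the fine
    # harmonic ripple and leaves the slowly varying envelope.
    lifter = np.zeros(n)
    lifter[:n_coef] = 1.0
    lifter[n - n_coef + 1:] = 1.0
    envelope_log = np.fft.rfft(cepstrum * lifter).real

    to_db = 20.0 / np.log(10.0)  # natural-log magnitude -> dB
    return log_mag * to_db, envelope_log * to_db


# Synthetic "voiced" frame: harmonics of 125 Hz with decaying amplitude.
t = np.arange(frame_length) / sample_rate
frame = sum(np.sin(2 * np.pi * 125 * k * t) / k for k in range(1, 40))
raw_db, env_db = cepstral_envelope_db(frame)
print(raw_db.shape, env_db.shape)
```

Plotting `raw_db` and `env_db` on the same axes shows the jagged harmonic peaks of the raw spectrum and the smoother curve that follows their overall shape.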