Reading text aloud from a web page with AivisSpeech

Goal
Prerequisites
Environment
Starting AivisSpeech-Engine (the server)
Getting the speaker ID
Code
Result
References

Goal

I previously read text aloud using SpeechSynthesis, the speech API built into browser JavaScript, but the pronunciation was not very good.

This time I use AivisSpeech, which pronounces text more naturally.

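Prerequisites

This example uses a local web app and assumes the page is served from http://localhost/. Opening the page via file:// does not work because of the CORS policy. Changing the engine settings apparently allows access from other PCs as well, but I have not tried this.

Environment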

AivisSpeech Engine version 1.0.0

Starting AivisSpeech-Engine (the server)

If AivisSpeech is not installed yet, install it by following the page below.

Run the following command from the command line to start the server.

"C:\Program Files\AivisSpeech\AivisSpeech-Engine\run.exe"

If you chose a per-user install, the path looks like this instead.

\AppData\Local\Programs\AivisSpeech\AivisSpeech-Engine\run.exe

The first launch takes a while because the models are downloaded. (Even if you have run AivisSpeech before, the download happens the first time you launch AivisSpeech-Engine directly.)

[2025/01/27 09:02:52] INFO:     Started server process [19536]
[2025/01/27 09:02:52] INFO:     Waiting for application startup.
[2025/01/27 09:02:52] INFO:     Application startup complete.
[2025/01/27 09:02:52] INFO:     Uvicorn running on http://localhost:10101 (Press CTRL+C to quit)

Once the log above appears, open http://localhost:10101 in a browser such as Chrome.

If the page is displayed as shown below, the server has started correctly.

Getting the speaker ID

Open the following URL.

http://localhost:10101/speakers

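Tick the browser's "Pretty print" checkbox to make the JSON easier to read.

Note down the id of the style you want to use (the id, not the speaker_uuid).

Here I use Anneli's Normal (ノーマル) style, so the id is 888753760.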
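Instead of reading the JSON by eye, the ids can also be listed with a few lines of JavaScript. The following is only a sketch: it assumes the /speakers response is an array of speakers, each with a name and a styles array of { name, id } objects (the VOICEVOX-compatible shape). The helper names listStyles and fetchStyles are made up for this example.

```javascript
// Base URL of the local engine, as shown in the startup log.
const ENGINE_URL = "http://localhost:10101";

// Flatten the speaker list into one row per style.
// Assumption: each speaker looks like { name, speaker_uuid, styles: [{ name, id }] }.
function listStyles(speakers) {
  return speakers.flatMap((speaker) =>
    (speaker.styles ?? []).map((style) => ({
      speaker: speaker.name,
      style: style.name,
      id: style.id, // this is the value to pass as "speaker" in later requests
    }))
  );
}

// Fetch /speakers from the local engine and return the flattened rows.
async function fetchStyles(engineUrl = ENGINE_URL) {
  const response = await fetch(`${engineUrl}/speakers`);
  if (!response.ok) {
    throw new Error(`GET /speakers failed: ${response.status}`);
  }
  return listStyles(await response.json());
}
```

For example, running fetchStyles().then(console.table) in the browser console of a page served from http://localhost should print one row per style, with the id in the last column.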

Code

Create the HTML and JS files. Requests to AivisSpeech-Engine fail when the page is opened via file://, so you need to serve the files from a local server using Node.js or Live Server.
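If you go the Node.js route, a static file server along the following lines should be enough. This is a minimal sketch, not a hardened server: the file name server.js, port 8080, the --serve flag, and the helper names are all assumptions chosen for this example, and the MIME map covers only the two file types used here.

```javascript
const http = require("node:http");
const fs = require("node:fs");
const path = require("node:path");

// Minimal extension-to-MIME map for the files used in this article.
const MIME_TYPES = {
  ".html": "text/html; charset=utf-8",
  ".js": "text/javascript; charset=utf-8",
};

// Map a request URL to a file path under rootDir.
// Returns null when the resolved path would escape rootDir.
function resolveFilePath(rootDir, requestUrl) {
  const pathname = decodeURIComponent(new URL(requestUrl, "http://localhost").pathname);
  const relative = pathname.endsWith("/") ? pathname + "index.html" : pathname;
  const root = path.resolve(rootDir);
  const filePath = path.resolve(root, "." + relative);
  return filePath.startsWith(root + path.sep) ? filePath : null;
}

// Create an HTTP server that serves files from rootDir.
function createStaticServer(rootDir) {
  return http.createServer((req, res) => {
    let filePath = null;
    try {
      filePath = resolveFilePath(rootDir, req.url);
    } catch {
      // Malformed percent-encoding in the URL: treat it as not found.
    }
    if (!filePath) {
      res.writeHead(404).end("Not found");
      return;
    }
    fs.readFile(filePath, (err, data) => {
      if (err) {
        res.writeHead(404).end("Not found");
        return;
      }
      const type = MIME_TYPES[path.extname(filePath)] ?? "application/octet-stream";
      res.writeHead(200, { "Content-Type": type }).end(data);
    });
  });
}

// Start listening only when asked to, e.g. `node server.js --serve`.
if (process.argv.includes("--serve")) {
  createStaticServer(".").listen(8080, () => {
    console.log("Serving on http://localhost:8080/");
  });
}
```

Saved next to index.html and started with node server.js --serve, this would serve the page at http://localhost:8080/.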

For how to use Live Server, see the page below.

index.html

<!doctype html>
<html lang="ja">

<head>
  <meta charset="UTF-8" />
  <title>TEST</title>
</head>

<body>
  <input type="text" id="text" class="txt" value="読み上げます" required><br><br>
  <input type="button" value="読み上げ" id="execute"><br><br>
  <script type="module" src="/src/main.js"></script>
</body>

</html>

The HTML contains a text box for the text to read and a button that starts the reading.

main.js (replace 888753760 in the code with the id you noted in "Getting the speaker ID".)

// Read the contents of the text box aloud when the button is clicked.
document.getElementById("execute").onclick = function (event) {
  readText(document.getElementById("text").value);
};

function readText(text) {
  // Step 1: ask the engine for an audio query (JSON) describing the text.
  // encodeURIComponent keeps characters such as & or # from breaking the query string.
  const xhr = new XMLHttpRequest();
  const url = 'http://localhost:10101/audio_query?speaker=888753760&text=' + encodeURIComponent(text);
  xhr.open("POST", url, false); // false = synchronous request
  xhr.send();
  const res_str = xhr.responseText;

  // Step 2: send the audio query JSON to /synthesis and receive WAV data.
  const xhr_synth = new XMLHttpRequest();
  const url_synth = 'http://localhost:10101/synthesis?speaker=888753760';
  xhr_synth.open("POST", url_synth);
  xhr_synth.setRequestHeader("Content-Type", "application/json");
  xhr_synth.responseType = "arraybuffer";
  xhr_synth.onreadystatechange = async () => {
    if (xhr_synth.readyState === XMLHttpRequest.DONE && xhr_synth.status === 200) {
      // Step 3: decode the WAV data and play it with the Web Audio API.
      const context = new AudioContext();
      const audioBuffer = await (new Promise((res, rej) => {
        context.decodeAudioData(xhr_synth.response, res, rej);
      }));
      const source = context.createBufferSource();
      source.buffer = audioBuffer;
      source.connect(context.destination);
      source.start(0);
    }
  };
  xhr_synth.send(res_str);
}

The rough flow is as follows.

First, send a POST request to http://localhost:10101/audio_query to pass the text to be read to the server.

  const xhr = new XMLHttpRequest();
  const url = 'http://localhost:10101/audio_query?speaker=888753760&text=' + encodeURIComponent(text);
  xhr.open("POST", url, false);
  xhr.send();
  const res_str = xhr.responseText;

Next, send the JSON returned by that request in a POST request to http://localhost:10101/synthesis to generate the WAV data.

  const xhr_synth = new XMLHttpRequest();
  const url_synth = 'http://localhost:10101/synthesis?speaker=888753760';
  xhr_synth.open("POST", url_synth);
  xhr_synth.setRequestHeader("Content-Type", "application/json");
  xhr_synth.responseType = "arraybuffer";
  xhr_synth.send(res_str);

Once the WAV data has finished downloading, the following code plays it.

  const context = new AudioContext();
  const audioBuffer = await (new Promise((res, rej) => {
    context.decodeAudioData(xhr_synth.response, res, rej);
  }));
  const source = context.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(context.destination);
  source.start(0);
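The same three steps can also be written with fetch and async/await, which avoids the synchronous XMLHttpRequest. This is an alternative sketch rather than the code used in this article: the names buildAudioQueryUrl and readTextAsync are made up for the example, and it assumes the same endpoints and speaker id as above.

```javascript
const ENGINE_URL = "http://localhost:10101";
const SPEAKER_ID = 888753760; // replace with the id you noted earlier

// Build the /audio_query URL; URLSearchParams percent-encodes the text.
function buildAudioQueryUrl(text, speaker = SPEAKER_ID, engineUrl = ENGINE_URL) {
  const params = new URLSearchParams({ speaker: String(speaker), text });
  return `${engineUrl}/audio_query?${params}`;
}

async function readTextAsync(text, speaker = SPEAKER_ID) {
  // Step 1: ask the engine for an audio query (JSON) describing the text.
  const queryRes = await fetch(buildAudioQueryUrl(text, speaker), { method: "POST" });
  if (!queryRes.ok) {
    throw new Error(`audio_query failed: ${queryRes.status}`);
  }
  const audioQuery = await queryRes.text();

  // Step 2: send the audio query JSON to /synthesis and receive WAV data.
  const synthRes = await fetch(`${ENGINE_URL}/synthesis?speaker=${speaker}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: audioQuery,
  });
  if (!synthRes.ok) {
    throw new Error(`synthesis failed: ${synthRes.status}`);
  }
  const wav = await synthRes.arrayBuffer();

  // Step 3: decode the WAV data and play it with the Web Audio API.
  const context = new AudioContext();
  const audioBuffer = await context.decodeAudioData(wav);
  const source = context.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(context.destination);
  source.start(0);
}
```

To try it, the click handler in main.js could call readTextAsync(document.getElementById("text").value).catch(console.error) instead of readText. Checking response.ok also surfaces engine errors that the XMLHttpRequest version silently ignores.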

Result

I was able to read text aloud using AivisSpeech.

References

GitHub - Aivis-Project/AivisSpeech-Engine: AivisSpeech Engine: AI Voice Imitation System - Text to Speech Engine
