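ElevenLabs Text-to-Speech (TTS)

Introduction

ElevenLabs provides natural-sounding speech synthesis software built on deep learning. Its AI audio models generate realistic, versatile, and context-aware speech, voices, and sound effects across 32 languages. The ElevenLabs Text-to-Speech API lets users bring any book, article, PDF, newsletter, or text to life with ultra-realistic AI narration.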

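Prerequisites

Create an ElevenLabs account and obtain an API key. You can sign up on the ElevenLabs signup page. After you log in, your API key is available on your profile page.
Add the spring-ai-elevenlabs dependency to your project's build file. For more information, see the Dependency Management section.

Auto-configuration

Spring AI provides Spring Boot auto-configuration for the ElevenLabs Text-to-Speech client. To enable it, add the following dependency to your project's Maven pom.xml file: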

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-elevenlabs</artifactId>
</dependency>

or to your Gradle build.gradle build file:

dependencies {
    implementation 'org.springframework.ai:spring-ai-starter-model-elevenlabs'
}

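Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Speech Properties

Connection Properties

The prefix spring.ai.elevenlabs is used as the property prefix for all ElevenLabs-related configuration, covering both connection and TTS-specific settings. It is defined in ElevenLabsConnectionProperties.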

Property	Description	Default
spring.ai.elevenlabs.base-url	The base URL for the ElevenLabs API.	api.elevenlabs.io
spring.ai.elevenlabs.api-key	Your ElevenLabs API key.	-
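
As an illustration, the connection properties might be set in application.properties as sketched below. The environment variable name ELEVEN_LABS_API_KEY is an assumption borrowed from the manual configuration example later on this page; any variable name works, and the ${...} placeholder is standard Spring Boot property resolution.

```properties
# Read the key from the environment rather than committing it to source control.
# ELEVEN_LABS_API_KEY is an assumed variable name, not one mandated by Spring AI.
spring.ai.elevenlabs.api-key=${ELEVEN_LABS_API_KEY}
```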

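Configuration Properties

Enabling and disabling the audio speech auto-configuration is now done through top-level properties with the prefix spring.ai.model.audio.speech.

To enable, set spring.ai.model.audio.speech=elevenlabs (enabled by default).

To disable, set spring.ai.model.audio.speech=none (or any value that does not match elevenlabs).

This change was made to allow multiple models to be configured.

The prefix spring.ai.elevenlabs.tts is used as the property prefix specifically for configuring the ElevenLabs Text-to-Speech client. It is defined in ElevenLabsSpeechProperties.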

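Property	Description	Default
spring.ai.model.audio.speech	Enables the audio speech model.	elevenlabs
spring.ai.elevenlabs.tts.options.model-id	The ID of the model to use.	eleven_turbo_v2_5
spring.ai.elevenlabs.tts.options.voice-id	The ID of the voice to use. This is the voice ID, not the voice name.	9BWtsMINqrJLrRacOk9x
spring.ai.elevenlabs.tts.options.output-format	The output format of the generated audio. See Available output formats below.	mp3_22050_32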

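The base URL and API key can also be configured specifically for TTS using spring.ai.elevenlabs.tts.base-url and spring.ai.elevenlabs.tts.api-key. However, for simplicity it is generally recommended to use the global spring.ai.elevenlabs prefix unless you have a specific reason to use different credentials for different ElevenLabs services. The more specific tts properties override the global ones.

All properties prefixed with spring.ai.elevenlabs.tts.options can be overridden at runtime.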
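
A minimal sketch of how the TTS properties could look together in application.properties. The model and voice IDs are the defaults from the table; the output format shown is simply another value from the list of available formats, chosen for illustration.

```properties
# Select ElevenLabs as the audio speech model (this is already the default).
spring.ai.model.audio.speech=elevenlabs

# Defaults applied to every request unless overridden at runtime.
spring.ai.elevenlabs.tts.options.model-id=eleven_turbo_v2_5
spring.ai.elevenlabs.tts.options.voice-id=9BWtsMINqrJLrRacOk9x
spring.ai.elevenlabs.tts.options.output-format=mp3_44100_128
```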

Table 1. Available output formats
Enum value	Description
MP3_22050_32	MP3, 22.05 kHz, 32 kbps
MP3_44100_32	MP3, 44.1 kHz, 32 kbps
MP3_44100_64	MP3, 44.1 kHz, 64 kbps
MP3_44100_96	MP3, 44.1 kHz, 96 kbps
MP3_44100_128	MP3, 44.1 kHz, 128 kbps
MP3_44100_192	MP3, 44.1 kHz, 192 kbps
PCM_8000	PCM, 8 kHz
PCM_16000	PCM, 16 kHz
PCM_22050	PCM, 22.05 kHz
PCM_24000	PCM, 24 kHz
PCM_44100	PCM, 44.1 kHz
PCM_48000	PCM, 48 kHz
ULAW_8000	µ-law, 8 kHz
ALAW_8000	A-law, 8 kHz
OPUS_48000_32	Opus, 48 kHz, 32 kbps
OPUS_48000_64	Opus, 48 kHz, 64 kbps
OPUS_48000_96	Opus, 48 kHz, 96 kbps
OPUS_48000_128	Opus, 48 kHz, 128 kbps
OPUS_48000_192	Opus, 48 kHz, 192 kbps
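
The format values follow a codec_sampleRate[_bitrate] pattern, so a client can derive the audio parameters from the string itself. The sketch below assumes the string value is the lowercase form of the enum name (as the mp3_22050_32 default suggests); OutputFormatInfo is a hypothetical helper and not part of Spring AI.

```java
import java.util.Locale;

/** Hypothetical helper: splits a value such as "mp3_44100_128" into its parts. */
record OutputFormatInfo(String codec, int sampleRateHz, int bitrateKbps) {

    /** Marker used when the format has no bitrate component (PCM, µ-law, A-law). */
    static final int NO_BITRATE = -1;

    static OutputFormatInfo parse(String value) {
        String[] parts = value.toLowerCase(Locale.ROOT).split("_");
        if (parts.length < 2 || parts.length > 3) {
            throw new IllegalArgumentException("Unexpected output format: " + value);
        }
        int sampleRate = Integer.parseInt(parts[1]);
        int bitrate = parts.length == 3 ? Integer.parseInt(parts[2]) : NO_BITRATE;
        return new OutputFormatInfo(parts[0], sampleRate, bitrate);
    }
}
```

For example, OutputFormatInfo.parse("pcm_16000") yields the codec pcm, a sample rate of 16000 Hz, and no bitrate, which is the information an audio player typically needs to interpret the returned bytes.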

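Runtime Options

The ElevenLabsTextToSpeechOptions class provides the options to use when making a text-to-speech request. At startup, the options specified under spring.ai.elevenlabs.tts are used, but you can override them at runtime. The following options are available:

modelId: The ID of the model to use.
voiceId: The ID of the voice to use.
outputFormat: The output format of the generated audio.
voiceSettings: An object containing voice settings such as stability, similarityBoost, style, useSpeakerBoost, and speed.
enableLogging: A boolean that enables or disables logging.
languageCode: The language code of the input text (for example, "en" for English).
pronunciationDictionaryLocators: A list of pronunciation dictionary locators.
seed: A seed for random number generation, for reproducibility.
previousText: The text before the main text, used as context in multi-turn conversations.
nextText: The text after the main text, used as context in multi-turn conversations.
previousRequestIds: Request IDs from previous turns in the conversation.
nextRequestIds: Request IDs from subsequent turns in the conversation.
applyTextNormalization: Whether to apply text normalization ("auto", "on", or "off").
applyLanguageTextNormalization: Whether to apply language text normalization.

For example: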

ElevenLabsTextToSpeechOptions speechOptions = ElevenLabsTextToSpeechOptions.builder()
    .model("eleven_multilingual_v2")
    .voiceId("your_voice_id")
    .outputFormat(ElevenLabsApi.OutputFormat.MP3_44100_128.getValue())
    .build();

TextToSpeechPrompt speechPrompt = new TextToSpeechPrompt("Hello, this is a text-to-speech example.", speechOptions);
TextToSpeechResponse response = elevenLabsTextToSpeechModel.call(speechPrompt);
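
The previousText and nextText options need the text surrounding each request. The following sketch shows one possible way to prepare that context when a longer text is sent as several requests. SegmentContext and Segment are hypothetical names introduced here for illustration; they are not part of Spring AI.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that pairs each text segment with its neighbours. */
final class SegmentContext {

    /** One request's text plus its surrounding context (null at either edge). */
    record Segment(String previousText, String text, String nextText) {}

    static List<Segment> withNeighbours(List<String> parts) {
        List<Segment> segments = new ArrayList<>();
        for (int i = 0; i < parts.size(); i++) {
            String previous = i > 0 ? parts.get(i - 1) : null;
            String next = i < parts.size() - 1 ? parts.get(i + 1) : null;
            segments.add(new Segment(previous, parts.get(i), next));
        }
        return segments;
    }
}
```

Each Segment could then be mapped onto one request, with its previousText and nextText values passed through the corresponding options. This assumes the options builder exposes setters named after the options listed above, which should be checked against the version in use.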

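Using Voice Settings

You can customize the voice output by providing VoiceSettings in the options. This lets you control properties such as stability and similarity.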

var voiceSettings = new ElevenLabsApi.SpeechRequest.VoiceSettings(0.75f, 0.75f, 0.0f, true);

ElevenLabsTextToSpeechOptions speechOptions = ElevenLabsTextToSpeechOptions.builder()
    .model("eleven_multilingual_v2")
    .voiceId("your_voice_id")
    .voiceSettings(voiceSettings)
    .build();

TextToSpeechPrompt speechPrompt = new TextToSpeechPrompt("This is a test with custom voice settings!", speechOptions);
TextToSpeechResponse response = elevenLabsTextToSpeechModel.call(speechPrompt);

Manual Configuration

Add the spring-ai-elevenlabs dependency to your project's Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-elevenlabs</artifactId>
</dependency>

or to your Gradle build.gradle build file:

dependencies {
    implementation 'org.springframework.ai:spring-ai-elevenlabs'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Next, create an ElevenLabsTextToSpeechModel:

ElevenLabsApi elevenLabsApi = ElevenLabsApi.builder()
    .apiKey(System.getenv("ELEVEN_LABS_API_KEY"))
    .build();

ElevenLabsTextToSpeechModel elevenLabsTextToSpeechModel = ElevenLabsTextToSpeechModel.builder()
    .elevenLabsApi(elevenLabsApi)
    .defaultOptions(ElevenLabsTextToSpeechOptions.builder()
        .model("eleven_turbo_v2_5")
        .voiceId("your_voice_id") // e.g. "9BWtsMINqrJLrRacOk9x"
        .outputFormat("mp3_44100_128")
        .build())
    .build();

// The call will use the default options configured above.
TextToSpeechPrompt speechPrompt = new TextToSpeechPrompt("Hello, this is a text-to-speech example.");
TextToSpeechResponse response = elevenLabsTextToSpeechModel.call(speechPrompt);

byte[] responseAsBytes = response.getResult().getOutput();
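
The returned byte array holds the encoded audio, so a common next step is to write it to disk. A minimal sketch using only the JDK follows; AudioFiles is a hypothetical helper name, and the target path is up to the caller.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for persisting synthesized audio. */
final class AudioFiles {

    private AudioFiles() {}

    /** Writes the audio bytes to the target path, creating parent directories if needed. */
    static Path save(byte[] audio, Path target) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // Creates the file, or truncates it if it already exists.
        return Files.write(target, audio);
    }
}
```

With the example above, AudioFiles.save(responseAsBytes, Path.of("speech.mp3")) would store the result; the file extension should match the configured output format.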

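Streaming Real-time Audio

The ElevenLabs Speech API supports real-time audio streaming using chunked transfer encoding. This allows audio playback to begin before the entire audio file has been generated.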

ElevenLabsApi elevenLabsApi = ElevenLabsApi.builder()
    .apiKey(System.getenv("ELEVEN_LABS_API_KEY"))
    .build();

ElevenLabsTextToSpeechModel elevenLabsTextToSpeechModel = ElevenLabsTextToSpeechModel.builder()
    .elevenLabsApi(elevenLabsApi)
    .build();

ElevenLabsTextToSpeechOptions streamingOptions = ElevenLabsTextToSpeechOptions.builder()
    .model("eleven_turbo_v2_5")
    .voiceId("your_voice_id")
    .outputFormat("mp3_44100_128")
    .build();

TextToSpeechPrompt speechPrompt = new TextToSpeechPrompt("Today is a wonderful day to build something people love!", streamingOptions);

Flux<TextToSpeechResponse> responseStream = elevenLabsTextToSpeechModel.stream(speechPrompt);

// Process the stream, e.g., play the audio chunks
responseStream.subscribe(speechResponse -> {
    byte[] audioChunk = speechResponse.getResult().getOutput();
    // Play the audioChunk
});
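
If the chunks should also be kept, for instance to save the complete audio once streaming ends, they can be appended to a buffer as they arrive. The sketch below uses only the JDK; AudioChunkCollector is a hypothetical class introduced for illustration.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical accumulator that joins streamed audio chunks in arrival order. */
final class AudioChunkCollector {

    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    /** Appends one chunk; synchronized because chunks may arrive on another thread. */
    synchronized void accept(byte[] chunk) {
        buffer.write(chunk, 0, chunk.length);
    }

    /** Number of bytes collected so far. */
    synchronized int size() {
        return buffer.size();
    }

    /** Returns a copy of everything collected so far. */
    synchronized byte[] toByteArray() {
        return buffer.toByteArray();
    }
}
```

Inside the subscribe callback above, calling collector.accept(audioChunk) would feed the buffer. Because the subscription runs asynchronously, toByteArray() only returns the full audio after the stream has completed.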

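Voices API

The ElevenLabs Voices API lets you retrieve information about the available voices, their settings, and the default voice settings. You can use this API to discover the voiceId values to use in your speech requests.

To use the Voices API, create an instance of ElevenLabsVoicesApi: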

ElevenLabsVoicesApi voicesApi = ElevenLabsVoicesApi.builder()
        .apiKey(System.getenv("ELEVEN_LABS_API_KEY"))
        .build();

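You can then use the following methods:

getVoices(): Retrieves a list of all available voices.
getDefaultVoiceSettings(): Gets the default settings for voices.
getVoiceSettings(String voiceId): Returns the settings for a specific voice.
getVoice(String voiceId): Returns metadata about a specific voice.

Example: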

// Get all voices
ResponseEntity<ElevenLabsVoicesApi.Voices> voicesResponse = voicesApi.getVoices();
List<ElevenLabsVoicesApi.Voice> voices = voicesResponse.getBody().voices();

// Get default voice settings
ResponseEntity<ElevenLabsVoicesApi.VoiceSettings> defaultSettingsResponse = voicesApi.getDefaultVoiceSettings();
ElevenLabsVoicesApi.VoiceSettings defaultSettings = defaultSettingsResponse.getBody();

// Get settings for a specific voice
ResponseEntity<ElevenLabsVoicesApi.VoiceSettings> voiceSettingsResponse = voicesApi.getVoiceSettings(voiceId);
ElevenLabsVoicesApi.VoiceSettings voiceSettings = voiceSettingsResponse.getBody();

// Get details for a specific voice
ResponseEntity<ElevenLabsVoicesApi.Voice> voiceDetailsResponse = voicesApi.getVoice(voiceId);
ElevenLabsVoicesApi.Voice voiceDetails = voiceDetailsResponse.getBody();
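
Since speech requests take a voice ID rather than a name, a typical use of the voice list is a lookup by name. The sketch below is self-contained and therefore uses a hypothetical VoiceSummary record as a stand-in for the library's voice type; it assumes each voice exposes an ID and a display name, which should be verified against ElevenLabsVoicesApi.Voice.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical lookup helper; not part of Spring AI. */
final class VoiceLookup {

    /** Stand-in for the library's voice type; only the two fields used here are modelled. */
    record VoiceSummary(String voiceId, String name) {}

    /** Returns the ID of the first voice whose name matches, ignoring case. */
    static Optional<String> findVoiceId(List<VoiceSummary> voices, String name) {
        return voices.stream()
                .filter(v -> v.name() != null && v.name().equalsIgnoreCase(name))
                .map(VoiceSummary::voiceId)
                .findFirst();
    }
}
```

The resulting ID can be passed to voiceId(...) on the options builder. Returning an Optional makes the case of an unknown name explicit instead of failing later with an invalid request.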

Example Code

The ElevenLabsTextToSpeechModelIT.java test provides some general examples of how to use the library.
The ElevenLabsApiIT.java test provides examples of using the low-level ElevenLabsApi.