Voice Generation using ElevenLabs
In this post we explore text-to-speech with ElevenLabs, a generative voice AI service, by calling its REST API from Python.
Here is an example of what can be generated in a few lines of code:
Start by creating a Python virtual environment and activating it:
python -m venv /path/to/new/virtual/environment
source /path/to/new/virtual/environment/bin/activate
Then install the requirements (the code below uses requests and python-dotenv):
pip install -r requirements.txt
Create a .env file with the following contents:
ELEVENLABS_API_KEY="Your API KEY"
Import the necessary libraries:
import dotenv
import os
import requests
First, select the voice to use. For a list of all available voices, see https://api.elevenlabs.io/v1/voices.
In this example we use the voice Joseph (voice_id 'Zlb1dXrM653N07WRdFW3').
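To look up a voice_id from code instead of reading the JSON by hand, you could query the same voices endpoint with requests. This is a minimal sketch under assumptions: it assumes the endpoint accepts the same xi-api-key header used later in this post and returns a JSON object with a "voices" list whose entries carry "name" and "voice_id" fields. The helper names list_voices and parse_voices are hypothetical.

```python
def parse_voices(payload):
    """Extract (name, voice_id) pairs from a /v1/voices JSON payload.

    Assumes the payload looks like {"voices": [{"voice_id": ..., "name": ...}]}.
    """
    return [(voice.get("name"), voice["voice_id"]) for voice in payload.get("voices", [])]


def list_voices(api_key):
    """Fetch (name, voice_id) pairs for the voices available to this API key."""
    # Imported here so parse_voices can be used (and tested) without requests
    import requests

    response = requests.get(
        "https://api.elevenlabs.io/v1/voices",
        headers={"xi-api-key": api_key},
        timeout=30,
    )
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
    return parse_voices(response.json())
```

For example, once the API key is in the environment, list_voices(os.environ['ELEVENLABS_API_KEY']) should return a list of (name, voice_id) tuples you can search for the voice you want.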
Second, build the URL of the text-to-speech API endpoint in streaming mode:
voice_id = 'Zlb1dXrM653N07WRdFW3'
url = 'https://api.elevenlabs.io/v1/text-to-speech/' + voice_id + '/stream?optimize_streaming_latency=0'
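The same URL can be assembled with the standard library, which keeps the voice ID and the query parameters separate instead of concatenating strings. A small sketch; build_stream_url is a hypothetical helper, and it only reproduces the optimize_streaming_latency parameter already used above.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.elevenlabs.io/v1/text-to-speech"


def build_stream_url(voice_id, optimize_streaming_latency=0):
    """Build the streaming text-to-speech URL for a given voice.

    urlencode handles the query string, so further parameters can be added
    to the dict later without manual escaping.
    """
    query = urlencode({"optimize_streaming_latency": optimize_streaming_latency})
    # quote() escapes the voice ID in case it ever contains URL-unsafe characters
    return f"{BASE_URL}/{quote(voice_id, safe='')}/stream?{query}"
```

With the defaults, build_stream_url('Zlb1dXrM653N07WRdFW3') produces exactly the url string shown above.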
Third, load the .env file to read the ELEVENLABS_API_KEY variable, and pass its value in the request headers:
dotenv.load_dotenv('.env')
ELEVENLABS_API_KEY = os.environ.get('ELEVENLABS_API_KEY')
headers = {
    'accept': '*/*',
    'xi-api-key': ELEVENLABS_API_KEY,
    'Content-Type': 'application/json'
}
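If the .env file is missing or the variable name is misspelled, os.environ.get returns None and the request goes out without a usable key, so it is likely to be rejected by the API with a less obvious error. A minimal fail-fast sketch, assuming the same three headers as above; build_headers is a hypothetical helper.

```python
import os


def build_headers(api_key=None):
    """Return the request headers used in this post.

    Falls back to the ELEVENLABS_API_KEY environment variable and raises a
    clear error instead of sending a request without a key.
    """
    key = api_key or os.environ.get("ELEVENLABS_API_KEY")
    if not key:
        raise RuntimeError("ELEVENLABS_API_KEY is not set; check your .env file")
    return {
        "accept": "*/*",
        "xi-api-key": key,
        "Content-Type": "application/json",
    }
```

Call it after dotenv.load_dotenv('.env') so the variable from the .env file is already in the environment.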
Fourth, write the script (the text to be spoken) and build the request body:
script = """Good evening, Gotham City! It is I, the Joker here (ha ha ha!)
"""
data = {
    "text": script,
    "model_id": "eleven_monolingual_v1",
    "voice_settings": {
        "stability": 0,
        "similarity_boost": 0,
        "style": 0.5,
        "use_speaker_boost": True
    }
}
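To experiment with different voice settings without editing the dictionary each time, the body can be produced by a function whose defaults reproduce the data dictionary above. This sketch assumes that stability, similarity_boost and style each take values between 0 and 1 (check the ElevenLabs API reference for the exact ranges); build_payload is a hypothetical helper.

```python
def build_payload(text, model_id="eleven_monolingual_v1", stability=0.0,
                  similarity_boost=0.0, style=0.5, use_speaker_boost=True):
    """Assemble the JSON body for the text-to-speech request.

    Assumes the three numeric voice settings must lie in [0, 1] and rejects
    out-of-range values before any API call is made.
    """
    settings = {
        "stability": stability,
        "similarity_boost": similarity_boost,
        "style": style,
    }
    for name, value in settings.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    settings["use_speaker_boost"] = use_speaker_boost
    return {"text": text, "model_id": model_id, "voice_settings": settings}
```

For example, build_payload(script, stability=0.5) keeps every other setting from the original example and changes only stability.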
Finally, call the ElevenLabs API and process the result. In this case we save it as an audio file:
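CHUNK_SIZE = 1024  # number of bytes read from the response per iteration

# stream=True lets requests read the audio body incrementally instead of all at once
response = requests.post(url, headers=headers, json=data, stream=True)
if response.status_code == 200:
    print('Generated, saving...')
    # Write the streamed audio bytes to disk chunk by chunk
    with open('output.mp3', 'wb') as f:
        for chunk in response.iter_content(chunk_size=CHUNK_SIZE):
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
else:
    print("Error:", response.text)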
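The sample run below prints two timestamps and an elapsed time, which the request code above does not produce on its own. One way to get similar lines, assuming you wrap the request and the file-writing loop in a function, is a small timing helper. The name timed is hypothetical, and the print format simply mirrors the sample output.

```python
import time
from datetime import datetime


def timed(func, *args, **kwargs):
    """Call func, printing wall-clock start/end times and the elapsed seconds."""
    print("Current Time is :", datetime.now().strftime("%H:%M:%S"))
    start = time.perf_counter()  # monotonic clock, suited to measuring durations
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print("Current Time is :", datetime.now().strftime("%H:%M:%S"))
    print(f"Elapsed time: {elapsed:.4f} s")
    return result
```

For example, if the requests.post call and the saving loop live in a function named generate(), calling timed(generate) prints lines in the same shape as the sample output.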
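Run your script and you should see output like the following (the timestamps and elapsed time come from timing code around the request that is not part of the listing above):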
python .\test_elevenlabs_stt.py
Current Time is : 20:57:06
Current Time is : 20:57:16
Elapsed time: 10.0059 s
The generated audio is saved as output.mp3.
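Before playing the file, you may want to confirm that output.mp3 actually contains audio, for example if you adapt the code to save whatever the server returns. This heuristic sketch checks the first bytes for an ID3v2 tag or an MPEG frame sync (11 set bits); it assumes the API returned MP3 audio, as the .mp3 file name in this post implies, and it is not a full validator. looks_like_mp3 is a hypothetical helper.

```python
def looks_like_mp3(path):
    """Heuristically check that a file starts like an MP3 stream.

    MP3 files usually begin with an ID3v2 tag ("ID3") or directly with an
    MPEG audio frame whose first 11 bits (the frame sync) are all ones.
    """
    with open(path, "rb") as f:
        header = f.read(3)
    if header.startswith(b"ID3"):
        return True
    # Frame sync: first byte 0xFF, top three bits of the second byte set
    return len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0
```

A JSON error body saved by mistake, such as {"detail": ...}, fails this check because it starts with a "{" character.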