Aug 20, 2023 2 min read

Voice Generation using ElevenLabs

In this post we explore text-to-speech using the Generative Voice AI technology from ElevenLabs.

Here is an example of what can be generated in a few lines of code:

[Audio sample: Aigaze elevenlabs output 001, about 31 seconds]

Start by creating a Python virtual environment and activating it

python -m venv /path/to/new/virtual/environment
source /path/to/new/virtual/environment/bin/activate

Then install the requirements

[Attachment: requirements.txt, 35 bytes]

pip install -r requirements.txt

Create a .env file with the following contents

ELEVENLABS_API_KEY="Your API KEY"

Import the necessary libraries

import dotenv 
import os
import requests

First, select the voice to use. The list of all available voices is at https://api.elevenlabs.io/v1/voices.
In this case we are going to use Joseph (voice_id = 'Zlb1dXrM653N07WRdFW3').
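
To look up a voice_id by name instead of reading the list by hand, the voices endpoint can be queried from Python. This is a minimal sketch: it assumes the response is a JSON object with a "voices" list whose entries carry "name" and "voice_id" fields, and the helper names find_voice_id and fetch_voices are illustrative, not part of any ElevenLabs library.

```python
def find_voice_id(voices_payload, name):
    """Return the voice_id of the first voice with the given name, or None."""
    # Assumed payload shape: {"voices": [{"name": ..., "voice_id": ...}, ...]}
    for voice in voices_payload.get("voices", []):
        if voice.get("name") == name:
            return voice.get("voice_id")
    return None


def fetch_voices(api_key=None):
    """Download the voice list (performs a network request)."""
    import requests  # imported lazily so find_voice_id works without it

    headers = {"xi-api-key": api_key} if api_key else {}
    response = requests.get(
        "https://api.elevenlabs.io/v1/voices", headers=headers, timeout=30
    )
    response.raise_for_status()
    return response.json()
```

With these helpers, find_voice_id(fetch_voices(), "Joseph") would return the ID used in this post, provided that voice is still listed.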

Second, we need the URL of the text-to-speech API endpoint in streaming mode

voice_id = 'Zlb1dXrM653N07WRdFW3'
url = 'https://api.elevenlabs.io/v1/text-to-speech/' + voice_id + '/stream?optimize_streaming_latency=0'
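
If you plan to switch voices or change the latency setting, the same URL can be built by a small helper rather than by string concatenation. The sketch below is one possible way to do it; build_stream_url and BASE_URL are names introduced here for illustration.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.elevenlabs.io/v1/text-to-speech"


def build_stream_url(voice_id, optimize_streaming_latency=0):
    """Build the streaming text-to-speech URL for the given voice."""
    # urlencode escapes the query string; quote escapes the path segment.
    query = urlencode({"optimize_streaming_latency": optimize_streaming_latency})
    return f"{BASE_URL}/{quote(voice_id, safe='')}/stream?{query}"


url = build_stream_url("Zlb1dXrM653N07WRdFW3")
```

For the voice used in this post, this produces the same URL as the concatenation above.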

Third, load the environment file to access the value of the ELEVENLABS_API_KEY variable and pass it in the headers of the request

dotenv.load_dotenv('.env')
ELEVENLABS_API_KEY = os.environ.get('ELEVENLABS_API_KEY')
headers = {
    'accept': '*/*',
    'xi-api-key': ELEVENLABS_API_KEY,
    'Content-Type': 'application/json'
}
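
Note that os.environ.get returns None when the variable is missing, in which case the request would be sent without a usable key. A hedged sketch of a helper that fails early instead is shown below; build_headers is a hypothetical name, and the header fields are the ones used in this post.

```python
def build_headers(api_key):
    """Return the request headers, failing early if the API key is missing."""
    if not api_key:
        raise ValueError("ELEVENLABS_API_KEY is not set; check your .env file")
    return {
        "accept": "*/*",
        "xi-api-key": api_key,
        "Content-Type": "application/json",
    }
```

It can be used as headers = build_headers(os.environ.get('ELEVENLABS_API_KEY')) after the .env file has been loaded.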

Fourth, write the script (the text to be spoken) and build the request body

script = "Good evening, Gotham City! It is I, the Joker here (ha ha ha!)"
    
data = {
    "text": script,
    "model_id": "eleven_monolingual_v1",
    "voice_settings": {
        "stability": 0,
        "similarity_boost": 0, 
        "style": 0.5,
        "use_speaker_boost": True
    }
}
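
To experiment with different voice settings, the request body can be assembled by a function. The sketch below assumes that stability, similarity_boost and style are expected to lie between 0 and 1 (an assumption to check against the ElevenLabs API reference); build_payload is a name introduced here, and its defaults mirror the values used in this post.

```python
def build_payload(text, stability=0.0, similarity_boost=0.0, style=0.5,
                  use_speaker_boost=True, model_id="eleven_monolingual_v1"):
    """Assemble the JSON body for the text-to-speech request."""
    # Assumption: the numeric voice settings must be within [0, 1].
    for name, value in (("stability", stability),
                        ("similarity_boost", similarity_boost),
                        ("style", style)):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return {
        "text": text.strip(),  # drop stray leading/trailing whitespace
        "model_id": model_id,
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
            "style": style,
            "use_speaker_boost": use_speaker_boost,
        },
    }
```

Calling build_payload(script) with no other arguments yields the same settings as the dictionary above.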

Finally, call the ElevenLabs API and process the result. In this case we are going to save the result as an audio file

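# Number of bytes to read per chunk (an assumed value; adjust as needed)
CHUNK_SIZE = 1024

# stream=True lets us write the audio as it arrives instead of buffering it all
response = requests.post(url, headers=headers, json=data, stream=True)

if response.status_code == 200:
    # Streaming response
    print('Generated, saving...')
    with open('output.mp3', 'wb') as f:
        for chunk in response.iter_content(chunk_size=CHUNK_SIZE):
            if chunk:
                f.write(chunk)
else:
    print("Error:", response.text)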
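
The file-writing loop can also be factored into a function that works on any iterable of byte chunks, which makes it easy to try without calling the API. This is a sketch only; save_chunks is a hypothetical helper name.

```python
def save_chunks(chunks, path):
    """Write an iterable of byte chunks to path and return the bytes written."""
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
                total += len(chunk)
    return total
```

With a streamed requests response, it could be called as save_chunks(response.iter_content(chunk_size=1024), 'output.mp3').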

Run your code and you should see some output like:

python .\test_elevenlabs_stt.py
Current Time is : 20:57:06
Current Time is : 20:57:16
Elapsed time: 10.0059 s
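
The snippets in this post do not include the code that prints the timestamps and the elapsed time. As a rough sketch of how lines in that format could be produced, assuming the generation step is wrapped in a callable, something like the following would work; run_timed is a name made up for this example.

```python
import time
from datetime import datetime


def run_timed(task):
    """Run task(), print the time before and after, and return (result, seconds)."""
    print("Current Time is :", datetime.now().strftime("%H:%M:%S"))
    start = time.perf_counter()
    result = task()
    elapsed = time.perf_counter() - start
    print("Current Time is :", datetime.now().strftime("%H:%M:%S"))
    print(f"Elapsed time: {elapsed:.4f} s")
    return result, elapsed
```

The callable passed in would contain the requests.post call and the file-writing loop.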

The audio file is saved as output.mp3 in the directory where you ran the script.