Text-To-Audio with AudioLDM
The @isolated
decorator lets you to run Python functions in a serverless manner. This means that you can execute your functions in a dedicated environment in the cloud, freeing up local resources and improving performance. In this example, we will build a Text-to-Audio (TTA) function using the @isolated
decorator.
TTA is a technology that generates audio from a written text prompt. We will use a modified version of AudioLDM for this task.
Step 1: Install fal-serverless and authenticate
pip install fal-serverless
fal-serverless auth login
Step 2: Import the isolated
decorator
Create a file named tta.py
and import the isolated
decorator.
from fal_serverless import isolated
Step 3: Define the TTA function
The TTA function will take in a string of text (prompt
) along with other parameters and convert it into audio.
@isolated(
requirements=["git+https://github.com/fal-ai/AudioLDM.git@mederka/add-cache-dir-env-var"],
machine_type="GPU",
keep_alive=30)
def tta(
prompt,
duration=5,
guidance_scale=2.5,
random_seed=random.randint(0, 100000),
n_candidates=3):
import io
import os
import numpy as np
from scipy.io.wavfile import write
# Set cache directory to avoid redownloading model weights
os.environ['TRANSFORMERS_CACHE'] = '/data/hfcache'
os.environ['AUDIOLDM_CACHE_DIR'] = '/data/audioldm'
# Import audioldm methods (this also initializes adioldm)
from audioldm import text_to_audio, build_model
model = build_model()
# Run inference
waveform = text_to_audio(
model,
prompt,
random_seed,
duration=duration,
guidance_scale=guidance_scale,
n_candidate_gen_per_text=int(n_candidates),
)
# Store audio in an IO buffer
buff = io.BytesIO()
write(buff, 16000, waveform[0][0])
return output
The tta
function above has the isolated
decorator, that has three arguments: requirements
, machine_type
and keep_alive
. The requirements
argument sets our modification of AudioLDM as a package dependency. machine_type
specifies that the function should run on a GPU
machine, and keep_alive
makes sure that the target environment stays alive for 30 seconds after the function execution is complete. So if tta
is called again within 30 seconds, the target environment is reused.
Inside the tta
function definition, we use the text_to_audio
method from audioldm
to run inference. text_to_audio
accepts a set of arguments that can be tuned for better results.
Step 4: Call the TTA function and save the result
res = tta("A hammer hitting a wooden surface.")
with open("result.wav", mode="wb") as f:
f.write(res.getbuffer())
As a result, the audio is saved in a result.wav
file.
And that's it! You now have a TTA function using the @isolated
decorator. The function will run in a dedicated environment with all the necessary dependencies already installed.
To demonstrate this function, we have built a Streamlit app for it.