Generate Images from Text with Stable Diffusion
In this example, we will deploy Stable Diffusion using fal-serverless. Along the way, we will learn about the most important fal-serverless concepts.
Step 1: Install fal-serverless and authenticate
pip install fal-serverless
fal-serverless auth login
Step 2: Define the requirements
First, we need to define the Python package requirements for our project:
requirements = [
    "accelerate",
    "diffusers[torch]>=0.10",
    "ftfy",
    "torch",
    "torchvision",
    "transformers",
    "triton",
    "safetensors",
    "xformers==0.0.16",
]
Step 3: Define the generate function
Next, we will define the generate function, which will be responsible for generating an image using Stable Diffusion:
from fal_serverless import isolated

@isolated(requirements=requirements, machine_type="GPU-T4", keep_alive=30)
def generate(prompt: str):
    import io
    import os

    import torch
    from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

    model_id = "runwayml/stable-diffusion-v1-5"
    # Cache model weights under /data so they persist between runs
    os.environ["TRANSFORMERS_CACHE"] = "/data/hugging_face_cache"

    pipe = StableDiffusionPipeline.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        cache_dir=os.environ["TRANSFORMERS_CACHE"],
    )
    pipe = pipe.to("cuda")
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

    # Run inference and serialize the resulting image to a PNG buffer
    image = pipe(prompt, num_inference_steps=20).images[0]
    buf = io.BytesIO()
    image.save(buf, format="PNG")
    return buf
The @isolated decorator is the most important building block in fal-serverless. It lets you run any Python function in the cloud instantly, on many types of GPUs. In this example, the decorator accepts a requirements argument that lists the libraries needed to run the function, a machine_type argument that specifies the machine we want to run the function on, and a keep_alive argument that specifies the number of seconds to keep the underlying machine alive after a call completes, so that subsequent calls can reuse it instead of starting cold.
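To see the decorator on its own, here is a minimal sketch using the same signature shown above; the function body and the numpy requirement are purely illustrative, not part of this tutorial's pipeline:

from fal_serverless import isolated

# Illustrative example: any Python function body can run remotely
@isolated(requirements=["numpy"], machine_type="GPU-T4", keep_alive=10)
def matrix_trace() -> float:
    import numpy as np
    return float(np.trace(np.eye(4)))

print(matrix_trace())  # prints 4.0, computed on a remote machine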
Step 4: Generate the image
Now that we have defined the generate function, we can use it to generate an image by passing a prompt. In fal-serverless, you call an @isolated function just as if it were a local Python function.
image_data = generate("Donkey walking on clouds")
This will generate an image based on the given prompt "Donkey walking on clouds" and store it in image_data.
To save this image locally:
with open("test.png", "wb") as f:
    f.write(image_data.getvalue())
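If you would rather send the image over a JSON API instead of writing it to disk, you can encode the returned buffer as base64. This is a small sketch using only the standard library; the helper name is hypothetical:

import base64

# Hypothetical helper: turn the PNG buffer into a JSON-safe string
def to_base64(image_data) -> str:
    return base64.b64encode(image_data.getvalue()).decode("utf-8")

encoded = to_base64(image_data)  # embed this string in a JSON response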
Step 5: Make it faster with @cached
You may notice that we load the model into GPU VRAM every time we call the generate function. We will now introduce another building block in fal-serverless: the @cached decorator. This decorator keeps the result of an expensive operation in memory, so the model is loaded only once for as long as the underlying machine stays alive. By caching the model, repeated calls get noticeably faster. Our code now looks like:
from fal_serverless import isolated, cached
requirements = [
    "accelerate",
    "diffusers[torch]>=0.10",
    "ftfy",
    "torch",
    "torchvision",
    "transformers",
    "triton",
    "safetensors",
    "xformers==0.0.16",
]
@cached
def model():
    import os

    import torch
    from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

    model_id = "runwayml/stable-diffusion-v1-5"
    os.environ["TRANSFORMERS_CACHE"] = "/data/hugging_face_cache"

    pipe = StableDiffusionPipeline.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        cache_dir=os.environ["TRANSFORMERS_CACHE"],
    )
    pipe = pipe.to("cuda")
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
    return pipe
@isolated(requirements=requirements, machine_type="GPU", keep_alive=30)
def generate(prompt: str):
import io
pipe = model()
image = pipe(prompt, num_inference_steps=50).images[0]
buf = io.BytesIO()
image.save(buf, format="PNG")
return buf
image_data = generate("astronaut riding a horse")
with open("test.png", "wb") as f:
    f.write(image_data.getvalue())
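To confirm the effect of caching, you can time two consecutive calls. While the machine is kept alive, the second call should skip model loading entirely. This is a sketch; actual timings depend on the machine type and cold-start behavior:

import time

# First call: may cold-start the machine and load the model into VRAM
start = time.perf_counter()
generate("astronaut riding a horse")
print(f"first call:  {time.perf_counter() - start:.1f}s")

# Second call within keep_alive: reuses the cached pipeline
start = time.perf_counter()
generate("donkey walking on clouds")
print(f"second call: {time.perf_counter() - start:.1f}s")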