Core Concepts
Understanding these essential terms will help you follow the tutorials and deploy your first model successfully.
App
An App is a Python class that wraps your AI model for deployment. Your app defines what packages it needs, how to load your model, and how users interact with it.
import fal

class MyApp(fal.App):
    machine_type = "GPU-H100"  # Choose your hardware

    def setup(self):
        # Load your model here; executed on each runner
        ...

    @fal.endpoint("/")
    def generate(self, input_data):
        # Your endpoint logic here, usually a model call
        ...
Machine Type
Machine Type specifies the hardware (CPU or GPU) your app runs on. Choose based on your model’s needs: "CPU" for lightweight models, "GPU-H100" for most AI models, or "GPU-B200" for large models.
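The guidance above can be written down as a small lookup. This is only a sketch: the workload labels ("lightweight", "typical", "large") and the helper name pick_machine_type are made up for illustration, while the machine type strings are the ones named in this section.

```python
# Workload labels are hypothetical; the machine type strings come from the text above.
MACHINE_TYPE_BY_WORKLOAD = {
    "lightweight": "CPU",
    "typical": "GPU-H100",
    "large": "GPU-B200",
}


def pick_machine_type(workload: str) -> str:
    """Return the machine type string suggested for a workload label."""
    try:
        return MACHINE_TYPE_BY_WORKLOAD[workload]
    except KeyError:
        expected = sorted(MACHINE_TYPE_BY_WORKLOAD)
        raise ValueError(
            f"unknown workload {workload!r}; expected one of {expected}"
        ) from None
```

The returned string is what you would assign to machine_type on your App class.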
Runner
A Runner is a compute instance that executes your app using your chosen machine type. Runners automatically start when requests arrive and shut down when idle to save costs.
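To make the start-on-request and stop-when-idle behaviour concrete, here is a toy simulation in plain Python. It does not use or reproduce fal's scheduler; ToyRunner, ToyPool, and the idle_timeout value are hypothetical names chosen for this sketch, and time is passed in explicitly so the logic is easy to follow.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyRunner:
    machine_type: str
    last_used: float
    model_loaded: bool = False

    def setup(self) -> None:
        # Stand-in for loading a model; runs once when the runner starts.
        self.model_loaded = True


@dataclass
class ToyPool:
    machine_type: str
    idle_timeout: float  # seconds; a hypothetical setting for this sketch
    runners: List[ToyRunner] = field(default_factory=list)

    def handle_request(self, now: float) -> ToyRunner:
        """Reuse a running runner if one exists, otherwise start a new one."""
        if not self.runners:
            runner = ToyRunner(self.machine_type, last_used=now)
            runner.setup()
            self.runners.append(runner)
        runner = self.runners[0]
        runner.last_used = now
        return runner

    def reap_idle(self, now: float) -> int:
        """Stop runners idle for longer than idle_timeout; return how many stopped."""
        before = len(self.runners)
        self.runners = [
            r for r in self.runners if now - r.last_used <= self.idle_timeout
        ]
        return before - len(self.runners)
```

In this sketch the first request pays the cost of setup, later requests reuse the loaded runner, and a quiet period longer than idle_timeout brings the pool back to zero runners.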
Endpoint
An Endpoint is a function in your app that users can call via API. It defines how your model processes inputs and returns outputs.
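One way to picture an endpoint is as a function registered under a path, with incoming requests dispatched to it. The sketch below imitates that idea with a hand-written decorator; it is not how fal.endpoint is implemented, and ROUTES, endpoint, and call are hypothetical names used only here.

```python
from typing import Callable, Dict

# Maps a URL path to the function that handles requests for it.
ROUTES: Dict[str, Callable[[dict], dict]] = {}


def endpoint(path: str):
    """Register the decorated function as the handler for `path`."""
    def register(func: Callable[[dict], dict]) -> Callable[[dict], dict]:
        ROUTES[path] = func
        return func
    return register


@endpoint("/")
def generate(input_data: dict) -> dict:
    # Stand-in for a model call: return the prompt in upper case.
    return {"output": input_data["prompt"].upper()}


def call(path: str, payload: dict) -> dict:
    """Dispatch a request payload to the function registered for `path`."""
    if path not in ROUTES:
        raise KeyError(f"no endpoint registered at {path!r}")
    return ROUTES[path](payload)
```

Calling call("/", {"prompt": "hi"}) runs generate on that payload, which mirrors how an API request to an endpoint ends up as a call to your function.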
Playground
Each endpoint automatically gets a Playground: a web interface where you can test your model with different inputs before integrating it into your application.
fal run vs fal deploy
- fal run: Test your app on a single cloud GPU during development. Creates a temporary URL that disappears when you stop the command.
- fal deploy: Deploy your app to production. Creates a permanent URL that stays available until you delete it.
Use fal run while building and testing, then fal deploy when ready for production use.