Accessing Persistent Storage
Each fal app runner runs in an isolated environment that is discarded when the runner is released. This includes wiping the filesystem, except for the /data volume.
This volume is a shared network drive that is mounted on every runner of every app linked to your fal account. You can use
this volume to persist data across requests, runners, and apps. For example, to cache a torchvision dataset in your app, you can do the following:
import fal
from pathlib import Path

DATA_DIR = Path("/data/mnist")

class FalModel(
    fal.App,
    requirements=["torch>=2.0.0", "torchvision"],
):
    machine_type = "GPU"

    def setup(self):
        import torch
        from torchvision import datasets

        already_present = DATA_DIR.exists()
        if already_present:
            print("Test data is already downloaded, skipping download!")

        test_data = datasets.FashionMNIST(
            root=DATA_DIR,
            train=False,
            download=not already_present,
        )
        ...
When you invoke this app for the first time, you will notice that torchvision downloads the test dataset. Subsequent invocations, even those that are served by a new runner because they fall outside the keep_alive window, will skip the download and proceed directly to your logic.
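The setup() example above treats the existence of the dataset directory as proof that the download finished. A directory can exist while still being incomplete, for instance if a runner was released partway through a download. One possible refinement, sketched below, is to write a marker file only after the data is fully in place. The helper name ensure_populated and the marker name .complete are hypothetical and are not part of the fal API.

```python
from pathlib import Path
from typing import Callable

MARKER_NAME = ".complete"  # hypothetical marker file name, chosen for this sketch


def ensure_populated(target_dir: Path, populate: Callable[[Path], None]) -> bool:
    """Run populate(target_dir) unless a completion marker already exists.

    Returns True if populate ran, False if the existing copy was reused.
    """
    marker = target_dir / MARKER_NAME
    if marker.exists():
        return False

    target_dir.mkdir(parents=True, exist_ok=True)
    populate(target_dir)

    # The marker is created only after populate() returned without raising,
    # so a crashed or interrupted download leaves no marker behind.
    marker.touch()
    return True
```

Inside setup(), this could be called as ensure_populated(DATA_DIR, lambda d: datasets.FashionMNIST(root=d, train=False, download=True)). The sketch does not coordinate between runners that start at the same time, so two of them may still download in parallel.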
Usage Considerations
Since /data is a distributed filesystem, there are a couple of caveats to keep in mind.
Concurrency
/data is shared across all runners, so be mindful of how you access common files from your runners to avoid race conditions. For example, when creating or
downloading a file, write it to a unique temporary path beside the final destination and move it into place only once it is fully downloaded or created.
This makes the operation quasi-atomic and prevents another runner from using an incomplete file.
import os
import tempfile

import fal
from pathlib import Path

WEIGHTS_URL = "https://example.com/weights.safetensors"
WEIGHTS_FILE = Path("/data/weights.safetensors")

class FalModel(
    fal.App,
    ...,
):
    def setup(self):
        # Create temporary file right beside the final destination, so that we can
        # use os.rename to move the file into place with 1 system call within the
        # same filesystem.
        with tempfile.NamedTemporaryFile(delete=False, dir="/data") as temp_file:
            # download the weights
            ...

        # Move the weights to the final destination.
        os.rename(temp_file.name, WEIGHTS_FILE)
        ...
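The write-then-move pattern can also be factored into a small reusable helper. The following is a minimal sketch using only the Python standard library. The names atomic_write and download_weights are hypothetical, and the sketch uses os.replace instead of os.rename (both overwrite an existing destination on POSIX). It also removes the temporary file if writing fails, so that failed attempts do not accumulate on the shared volume.

```python
import contextlib
import os
import shutil
import tempfile
import urllib.request
from pathlib import Path
from typing import BinaryIO, Callable


def atomic_write(dest: Path, write_fn: Callable[[BinaryIO], None]) -> None:
    """Write a file via a temporary sibling, then move it into place."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Same directory as dest, so the final move stays within one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            write_fn(f)
            f.flush()
            os.fsync(f.fileno())
        # Other readers see either no file or the complete file.
        os.replace(tmp_name, dest)
    except BaseException:
        # Remove the partial temporary file, then re-raise the original error.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise


def download_weights(url: str, dest: Path) -> None:
    """Stream url into dest using atomic_write; skip if dest already exists."""
    if dest.exists():
        return

    def write(f: BinaryIO) -> None:
        with urllib.request.urlopen(url) as response:
            shutil.copyfileobj(response, f)

    atomic_write(dest, write)
```

In setup(), a call such as download_weights(WEIGHTS_URL, WEIGHTS_FILE) would replace the inline temporary-file handling. If two runners start together, both may still download, but the second move simply replaces the first with an identical complete file.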
Caching
/data is a distributed filesystem that uses regional and node caching to speed up access to files. If your runner lands on a node
whose cache does not yet hold a file, the file is downloaded over the network, which can increase latency. Once the file is cached, all subsequent
accesses to it are served from the cache at native storage speed.
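One way to keep that first-access latency out of request handling is to read large files once during setup(). The sketch below reads a file sequentially and reports how long it took, which also makes a cold read visible in the logs. The helper name warm_file is hypothetical, and it is an assumption that one full sequential read is enough to populate the node cache.

```python
import time
from pathlib import Path


def warm_file(path: Path, chunk_size: int = 8 * 1024 * 1024) -> tuple[int, float]:
    """Read path once from start to end; return (bytes_read, elapsed_seconds)."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large files are never held in memory whole.
        while chunk := f.read(chunk_size):
            total += len(chunk)
    return total, time.perf_counter() - start
```

Calling warm_file on the same path twice in a row and comparing the elapsed times gives a rough indication of whether the first read was served from the network or from the cache.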