
Accessing Persistent Storage

Each fal app runner runs in an isolated environment that is discarded when the runner is released. This includes wiping the filesystem, except for the /data volume. This volume is a shared network drive mounted on every runner of every app linked to your fal account. You can use it to persist data across requests, runners, and apps. For example, to cache a torchvision dataset in your app, you can do the following:

    import fal
    from pathlib import Path

    DATA_DIR = Path("/data/mnist")


    class FalModel(
        fal.App,
        requirements=["torch>=2.0.0", "torchvision"],
    ):
        machine_type = "GPU"

        def setup(self):
            import torch
            from torchvision import datasets

            already_present = DATA_DIR.exists()
            if already_present:
                print("Test data is already downloaded, skipping download!")

            test_data = datasets.FashionMNIST(
                root=DATA_DIR,
                train=False,
                download=not already_present,
            )
            ...

When you invoke this app for the first time, you will notice that torchvision downloads the test dataset. Subsequent invocations, including those that arrive after the keep_alive window has passed and are served by a fresh runner, will skip the download and proceed directly to your logic.

Usage Considerations

Since /data is a distributed filesystem, there are a couple of caveats to keep in mind.

Concurrency

/data is shared across all runners, so be mindful of how your runners access common files to avoid race conditions. For example, when creating or downloading a file, write it to a unique temporary path beside the final destination, and move it into place only once it is complete. This makes the operation quasi-atomic and prevents another runner from reading an incomplete file.

    import os
    import tempfile
    from pathlib import Path

    import fal

    WEIGHTS_URL = "https://example.com/weights.safetensors"
    WEIGHTS_FILE = Path("/data/weights.safetensors")


    class FalModel(
        fal.App,
        ...,
    ):
        def setup(self):
            # Create temporary file right beside the final destination, so that we can
            # use os.rename to move the file into place with 1 system call within the
            # same filesystem.
            with tempfile.NamedTemporaryFile(delete=False, dir="/data") as temp_file:
                # download the weights
                ...

            # The file is closed (and flushed) once the with block exits.
            # Move the weights to the final destination.
            os.rename(temp_file.name, WEIGHTS_FILE)
            ...
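The download step elided above can be filled in many ways. The following is a minimal sketch of one option, using only the Python standard library. The helper name download_atomically and its parameters are hypothetical and not part of the fal API. It assumes that the temporary file and the destination sit in the same directory, so the final os.replace stays within one filesystem.

```python
import os
import tempfile
import urllib.request
from pathlib import Path


def download_atomically(url: str, dest: Path, chunk_size: int = 1 << 20) -> Path:
    """Download url to dest through a temporary file in dest's directory.

    Hypothetical helper for illustration; not part of the fal API.
    """
    if dest.exists():
        # Another runner (or an earlier run) already finished the download.
        return dest

    dest.parent.mkdir(parents=True, exist_ok=True)
    # A unique temp path beside the destination keeps the rename on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp, urllib.request.urlopen(url) as response:
            while chunk := response.read(chunk_size):
                tmp.write(chunk)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Readers see either no file or the complete file, never a partial one.
        os.replace(tmp_name, dest)
    except BaseException:
        # Do not leave a half-written temp file behind on the shared volume.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
    return dest
```

Inside setup(), this could be called as download_atomically(WEIGHTS_URL, WEIGHTS_FILE). If two runners start at the same time, both may download the file, but each writes to its own temporary path and the last rename wins, so neither ever reads a truncated file.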

Caching

/data is a distributed filesystem that uses regional and node-level caching to speed up file access. If your runner lands on a node whose cache does not yet hold a file, the file is fetched over the network, which increases latency for that first access. Once the file is cached, all subsequent accesses to it are served from the cache at native storage speed.
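One way to observe this behavior, and to pay the cold-read cost during setup() rather than on the first request, is to read the file once up front. The sketch below assumes that a plain sequential read is enough to populate the cache, which the description above suggests but does not state explicitly. The helper name warm_file is hypothetical and not part of the fal API.

```python
import time
from pathlib import Path


def warm_file(path: Path, chunk_size: int = 8 * 1024 * 1024) -> tuple[int, float]:
    """Read path once, sequentially, and return (bytes_read, seconds_elapsed).

    Hypothetical helper for illustration; not part of the fal API.
    """
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        # Read in chunks and discard the data so memory use stays bounded.
        while chunk := f.read(chunk_size):
            total += len(chunk)
    return total, time.perf_counter() - start
```

Calling warm_file twice on the same file and comparing the elapsed times gives a rough indication of whether the first read was served from the network or from the cache.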