Download Model Weights and Files

When building serverless applications with fal, you may need to download model weights, files, and datasets from external sources.

Downloading Files

You can download files from external sources using the download_file function. This function handles downloading files from URLs with built-in caching and error handling.

This is particularly useful for downloading datasets, configuration files, or any external resources your application needs.

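import fal
from fal.toolkit import FAL_PERSISTENT_DIR, download_file


class MyApp(fal.App):
    def setup(self):
        # Downloads into the "mydir" folder on the persistent volume
        # and returns the local path of the downloaded file
        path = download_file(
            "https://example.com/myfile.txt",
            FAL_PERSISTENT_DIR / "mydir",
        )
        ...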
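To make "built-in caching" concrete, here is a minimal standard-library sketch of download-with-cache behavior. It is an illustration under stated assumptions, not fal's implementation of download_file: the hypothetical cached_download helper takes the file name from the URL path, reuses an existing file unless force=True, and writes to a temporary file that is renamed into place so an interrupted download never leaves a truncated file behind.

```python
import os
import tempfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def cached_download(url, target_dir, force=False):
    """Download url into target_dir, skipping the request if the file exists.

    Hypothetical helper for illustration; not fal's download_file.
    """
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    name = Path(urlparse(url).path).name or "download"
    target = target_dir / name
    if target.exists() and not force:
        return target  # cache hit: reuse the file from a previous run
    # Stream into a temporary file in the same directory, then rename it,
    # so the final path only ever holds a complete file.
    fd, tmp = tempfile.mkstemp(dir=str(target_dir), prefix=".partial-")
    try:
        with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url) as resp:
            while chunk := resp.read(1 << 20):  # 1 MiB at a time
                out.write(chunk)
        os.replace(tmp, target)
    except BaseException:
        os.unlink(tmp)  # clean up the partial file before re-raising
        raise
    return target
```

Because the target directory lives on the persistent volume, a second runner calling the same helper finds the file already present and returns immediately.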

Downloading Model Weights

You can download model weights from external sources using the download_model_weights function. This function is specifically designed for downloading model weights and provides several useful features:

  • Predefined storage location: Automatically stores weights in an optimized directory structure
  • Smart caching: Avoids re-downloading weights that are already present unless forced
  • Authentication support: Supports custom request headers for private repositories

This function is particularly useful for downloading pre-trained model weights from Hugging Face, custom model repositories, or private storage locations.

import fal
from fal.toolkit import download_model_weights


class MyApp(fal.App):
    def setup(self):
        path = download_model_weights(
            "https://example.com/myfile.txt",
            # Optional: force download even if the weights are already present
            force=False,
            # Optional: specify request headers
            request_headers={
                "Authorization": "Bearer <token>",
            },
        )
        ...
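One way a "predefined storage location" can work is to derive the directory from a hash of the URL, so two different files that share a name (for example, two model.safetensors checkpoints) never collide, while the same URL always maps to the same path. The sketch below assumes this hashing scheme and a hypothetical WEIGHTS_ROOT; fal's actual directory layout may differ.

```python
import hashlib
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical root directory; fal chooses its own location for weights.
WEIGHTS_ROOT = Path("/data/model_weights")


def weights_path(url, root=WEIGHTS_ROOT):
    """Map a URL to a stable, collision-resistant path under root."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    name = Path(urlparse(url).path).name or "weights"
    return Path(root) / digest / name
```

A downloader built on this mapping can skip the request whenever weights_path(url) already exists, unless force=True.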

Improving Hugging Face download speeds

The Hugging Face library caches files locally to prevent duplicate downloads. Within fal, this cache is automatically placed on the /data persistent volume: the HF_HOME environment variable is set to /data/.cache/huggingface.
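To check whether a repository has already been cached on /data, it helps to know where huggingface_hub puts it. The sketch below mirrors the library's default cache layout as we understand it (HF_HUB_CACHE if set, otherwise $HF_HOME/hub, with one folder per repo such as models--org--name); the hf_repo_cache_dir helper itself is hypothetical.

```python
import os
from pathlib import Path


def hf_repo_cache_dir(repo_id, repo_type="model", env=None):
    """Return the folder where huggingface_hub caches a repository."""
    env = os.environ if env is None else env
    xdg_cache = env.get("XDG_CACHE_HOME", os.path.join(os.path.expanduser("~"), ".cache"))
    hf_home = env.get("HF_HOME", os.path.join(xdg_cache, "huggingface"))
    hub_cache = env.get("HF_HUB_CACHE", os.path.join(hf_home, "hub"))
    # e.g. "org/model" -> "models--org--model"
    folder = "--".join([f"{repo_type}s", *repo_id.split("/")])
    return Path(hub_cache) / folder
```

With HF_HOME=/data/.cache/huggingface, hf_repo_cache_dir("org/model") resolves to /data/.cache/huggingface/hub/models--org--model.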

The steps below offer additional speedups.

Set Hugging Face token

Make sure your Hugging Face token is available to fal runs. Authenticated downloads appear to be faster than anonymous ones:

fal secrets set HF_TOKEN=xxx
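huggingface_hub picks up HF_TOKEN from the environment automatically. For downloads made through download_model_weights, the token has to be passed explicitly via request_headers. A minimal sketch, assuming the secret is exposed to the runner as the HF_TOKEN environment variable (hf_auth_headers is a hypothetical helper):

```python
import os


def hf_auth_headers(env=None):
    """Build an Authorization header from HF_TOKEN, or {} if it is unset."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN")
    if not token:
        return {}  # fall back to an anonymous (possibly slower) download
    return {"Authorization": f"Bearer {token}"}
```

For example: download_model_weights("https://huggingface.co/<repo>/resolve/main/<file>", request_headers=hf_auth_headers()).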

Save weights to the /data volume

Store Hugging Face weights within the /data volume. This:

  • Speeds up runner starts by removing the need for files to be reconstructed from the cache (which is already on /data)
  • Ensures enough disk space is available

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id=model_id,
    local_dir="/data/models/deepseek-ai",
    ...
)
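To act on the disk-space point before starting a long download, the target directory can be created and its free space checked up front. This is a sketch under assumptions: the prepare_model_dir helper, the /data/models root, and the one-folder-per-repo layout are illustrative choices, and required_bytes is an estimate you supply.

```python
import shutil
from pathlib import Path


def prepare_model_dir(repo_id, required_bytes, root="/data/models"):
    """Create root/<repo_id> and fail early if the volume lacks free space."""
    target = Path(root) / repo_id  # e.g. /data/models/org/name
    target.mkdir(parents=True, exist_ok=True)
    free = shutil.disk_usage(target).free
    if free < required_bytes:
        raise OSError(f"need {required_bytes} bytes in {target}, only {free} free")
    return target
```

The result can be passed as local_dir=str(prepare_model_dir(model_id, required_bytes)) to snapshot_download.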

Speed up initial weights downloading

Depending on the size of the weights, the initial call to snapshot_download can take a while.

Hugging Face seems to reduce download speeds for large models after some time. While individual transfer threads often start at 50+ MB/s, after a while the speed drops to 5-6 MB/s.

There are three ways to speed up downloads:

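  • Increase concurrency, max_workers, and the chunk cache size:

    import os

    # Set these before importing huggingface_hub to be sure they take effect
    os.environ["HF_XET_HIGH_PERFORMANCE"] = "1"
    os.environ["HF_XET_CHUNK_CACHE_SIZE_BYTES"] = "1000000000000"  # 10^12 bytes (1 TB)
    os.environ["HF_XET_NUM_CONCURRENT_RANGE_GETS"] = "32"

    from huggingface_hub import snapshot_download

    snapshot_download(
        repo_id=model_id,
        local_dir=model_dir,
        max_workers=32,
    )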
  • Download many models in parallel: when downloading multiple models, start a separate fal run for each one. Each run is likely to get a different source IP address, which reduces the risk of rate limiting.

  • Restart slow downloads: if a download slows down, restart the run (for example, after 10 minutes). The new run will likely land on a different physical server, and the download will resume at a higher speed.
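The slowdown matters: for a hypothetical 100 GB checkpoint, 50 MB/s means roughly 33 minutes, while 5 MB/s means roughly 5.5 hours. Instead of restarting on a fixed timer, a run can detect the drop itself. The sketch below is a hypothetical watchdog: it measures progress by summing file sizes under the download directory and flags the download when the average rate over a sliding window falls below a threshold you choose.

```python
import os
import time


def dir_size(path):
    """Total size in bytes of all files under path (a rough progress measure)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # file was renamed or removed while walking
    return total


class ThroughputWatchdog:
    """Flag a download whose average speed over a recent window is too low."""

    def __init__(self, min_bytes_per_s, window_s=60.0, clock=time.monotonic):
        self.min_rate = min_bytes_per_s
        self.window = window_s
        self.clock = clock
        self.samples = []  # (timestamp, total bytes downloaded so far)

    def record(self, total_bytes):
        now = self.clock()
        self.samples.append((now, total_bytes))
        # Drop the oldest sample while the next one still spans the full window.
        while len(self.samples) > 2 and now - self.samples[1][0] >= self.window:
            self.samples.pop(0)

    def too_slow(self):
        if len(self.samples) < 2:
            return False
        (t0, b0), (t1, b1) = self.samples[0], self.samples[-1]
        if t1 - t0 < self.window:
            return False  # not enough history yet
        return (b1 - b0) / (t1 - t0) < self.min_rate
```

In a background thread, call watchdog.record(dir_size(model_dir)) every few seconds and end the run when too_slow() returns True so it can be restarted; the thresholds (for example, 20 MB/s over 600 seconds) are placeholders to tune, and how the run is restarted depends on how you launch it.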

Speed up model files check

Even after fully caching weights, calling snapshot_download with a large model can sometimes take 45+ seconds.

The Hugging Face library takes this time to do API calls and metadata checks. You can use local_files_only=True to skip this step, which typically makes the call return in less than 1 second.

Using local_files_only=True raises an error if the files are not fully cached. To handle that, wrap the call in a try/except block and retry without local_files_only:

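from huggingface_hub import snapshot_download
from huggingface_hub.errors import LocalEntryNotFoundError

try:
    # Fast path: use only the local cache and skip Hub API calls
    snapshot_download(
        ...
        local_files_only=True,
    )
except LocalEntryNotFoundError:
    # Cache is incomplete: fall back to a regular download
    snapshot_download(
        ...
    )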
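When several models are loaded this way, the fallback can be factored into a small reusable helper. In this sketch, cache_first is a hypothetical name; download is any function that accepts local_files_only (such as snapshot_download), and not_found is the exception type raised on a cache miss (such as LocalEntryNotFoundError).

```python
def cache_first(download, not_found, **kwargs):
    """Try a cache-only call first; on a cache miss, do a normal download."""
    try:
        return download(local_files_only=True, **kwargs)
    except not_found:
        return download(**kwargs)
```

Usage: cache_first(snapshot_download, LocalEntryNotFoundError, repo_id=model_id, local_dir=model_dir).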

Speed up model loads

Hugging Face models are typically split into multiple files, which are loaded one after another. This is suboptimal when the files are not cached locally, because they are then fetched one by one.

It is advisable to pre-read all the files in parallel, which populates the missing caches concurrently and at much higher speeds.

Please refer to Sequential vs parallel reading.
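A minimal sketch of such a parallel pre-read, assuming the weights are .safetensors files under a single directory: each file is read to the end and the data is discarded, so the only effect is warming the caches. Threads are sufficient here because Python file reads release the GIL; preread_dir, the chunk size, and the worker count are illustrative assumptions to tune.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

CHUNK = 16 * 1024 * 1024  # 16 MiB per read; an assumed, tunable value


def _read_file(path):
    """Read a file to the end, discarding the data, and return its size."""
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            total += len(chunk)
    return total


def preread_dir(root, pattern="*.safetensors", max_workers=16):
    """Read every matching file under root concurrently; return bytes read."""
    files = [p for p in sorted(Path(root).rglob(pattern)) if p.is_file()]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(_read_file, files))
```

Calling preread_dir(model_dir) just before loading the model (for example, with from_pretrained) means the subsequent serial load reads from warm caches.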