Creating fal Python models
With version `0.4.0`, fal supports a new building block: pure Python models. This type of model allows you to represent a dbt model purely in Python code, leveraging Python's rich ecosystem of libraries. With Python models, you can build:
- Data transformations that Python is better suited for (e.g. text manipulation, leveraging external Python modules)
- ML model artifacts (like predictions) as dbt models, using data science libraries such as `sklearn` and `xgboost`.
From version `0.7.0` onwards, to start using fal Python models, add a dbt variable `fal-models-paths` with a list of directories in which to look for Python models. Think of it like dbt's `model-paths`, but for fal.

This folder must not be in the dbt `model-paths` list, because dbt has now added its own implementation of Python models for some adapters.
```yaml
# dbt_project.yml
name: "jaffle_shop"
# ...
model-paths: ["models"]
# ...
vars:
  # Add this to your dbt_project.yml
  fal-models-paths: ["fal_models"]
```
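The rule that the fal models folder must stay out of `model-paths` can be checked mechanically. The sketch below is a hypothetical helper (`overlapping_fal_paths` is not part of fal or dbt) that works on the project file after it has been parsed into a dict, for example with a YAML library. It only compares normalised path strings and does not detect folders nested inside a dbt model path.

```python
from pathlib import PurePosixPath


def overlapping_fal_paths(project: dict) -> list[str]:
    """Return fal-models-paths entries that also appear in model-paths.

    Hypothetical helper: `project` is dbt_project.yml already parsed into a dict.
    """
    dbt_paths = {PurePosixPath(p) for p in project.get("model-paths", [])}
    fal_paths = project.get("vars", {}).get("fal-models-paths", [])
    # PurePosixPath normalises trailing slashes, so "fal_models/" == "fal_models"
    return [p for p in fal_paths if PurePosixPath(p) in dbt_paths]


project = {
    "name": "jaffle_shop",
    "model-paths": ["models"],
    "vars": {"fal-models-paths": ["fal_models"]},
}
print(overlapping_fal_paths(project))  # []
```

A non-empty result means a folder is listed in both places and should be removed from `model-paths`.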
Then, to create a Python model, create a Python (`.py`) or Notebook (`.ipynb`) file in your fal models folder and make sure your file calls the `write_to_model` function.
```python
df = ref('model_a')
df['col'] = 1

# write to data warehouse
write_to_model(df)
```
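The script above never imports `ref` or `write_to_model`; it expects whatever runs it to provide them. The sketch below is not how fal executes models. It only illustrates that contract, assuming the two names are supplied as globals: it runs the same script body with `exec` and hypothetical stand-ins (`fake_ref`, `fake_write`) so the transformation can be dry-run without a warehouse.

```python
# Minimal sketch: run a fal-style model script with stand-in ref/write_to_model.
# Assumption: the script finds ref and write_to_model as globals (no imports).
# fake_ref and fake_write are hypothetical stand-ins, not fal APIs.

MODEL_SCRIPT = """
df = ref('model_a')
df['col'] = 1
write_to_model(df)
"""

written = []


def fake_ref(name):
    # A real run would return a pandas DataFrame; a dict of columns is enough
    # here because the script only assigns a new column.
    return {"id": [1, 2, 3]}


def fake_write(df):
    # Record what the script tried to write instead of touching a warehouse.
    written.append(df)


exec(MODEL_SCRIPT, {"ref": fake_ref, "write_to_model": fake_write})
print(written)  # [{'id': [1, 2, 3], 'col': 1}]
```

Swapping the dict for a real pandas DataFrame would exercise the logic more faithfully; the dict keeps the sketch dependency-free.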
Dependencies on other models
`fal` will resolve any usage of `source` or `ref` functions and generate the appropriate dependencies for the dbt DAG.
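fal's exact detection logic is not described on this page. As a rough illustration, assuming a static scan over the script's syntax tree, the sketch below uses Python's standard `ast` module to collect `ref(...)` and `source(...)` calls whose arguments are string literals. A call such as `ref(name)` is skipped because its argument is only known at run time, which is the kind of case the next paragraph addresses. `find_dependencies` is a hypothetical name.

```python
import ast

SCRIPT = """
df_a = ref('model_a')
df_b = source('database_b', 'table_b')
name = 'model_c'
df_c = ref(name)  # not a literal: a static scan cannot resolve it
"""


def find_dependencies(source_code: str) -> list[tuple]:
    """Collect ref(...) and source(...) calls whose arguments are string literals."""
    deps = []
    for node in ast.walk(ast.parse(source_code)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in ("ref", "source")
        ):
            args = node.args
            # Only resolve calls where every argument is a plain string constant
            if args and all(
                isinstance(a, ast.Constant) and isinstance(a.value, str) for a in args
            ):
                deps.append((node.func.id, *(a.value for a in args)))
    return deps


print(find_dependencies(SCRIPT))
# [('ref', 'model_a'), ('source', 'database_b', 'table_b')]
```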
Certain complex expressions in the Python AST may not be picked up by `fal`. In that case, you can specify dependencies at the top of the Python script as a module docstring:
```python
"""Generates Python model with forecast data
For fal to pick up these dependencies:
- ref('model_a')
- source('database_b', 'table_b')
"""
from prophet import Prophet

# fal will pick up this dependency
df_c = ref('model_c')

# calculate dataframe (some_calc stands in for your own transformation)
df = some_calc(df_c)

# write to data warehouse
write_to_model(df)
```
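The docstring lines follow the same `ref(...)` and `source(...)` syntax as the script body. Assuming dependencies are declared exactly in that form, a minimal sketch for extracting them could combine `ast.get_docstring` with a regular expression. `docstring_dependencies` is hypothetical, and fal may parse the docstring differently.

```python
import ast
import re

SCRIPT = '''"""Generates Python model with forecast data
For fal to pick up these dependencies:
- ref('model_a')
- source('database_b', 'table_b')
"""
df_c = ref('model_c')
'''

# Matches ref(...) or source(...) and captures the raw argument text.
CALL = re.compile(r"\b(ref|source)\(([^)]*)\)")
# Extracts each quoted argument (single or double quotes).
ARG = re.compile(r"""['"]([^'"]+)['"]""")


def docstring_dependencies(source_code: str) -> list[tuple]:
    """Return (kind, *args) tuples declared in the module docstring only."""
    doc = ast.get_docstring(ast.parse(source_code)) or ""
    return [(m.group(1), *ARG.findall(m.group(2))) for m in CALL.finditer(doc)]


print(docstring_dependencies(SCRIPT))
# [('ref', 'model_a'), ('source', 'database_b', 'table_b')]
```

Note that `ref('model_c')` in the script body is not returned here; body calls are the job of the AST scan, and the docstring only supplements it.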
Under the hood
When running `fal flow run`, `fal` will automatically generate an ephemeral dbt model for each Python model. This lets dbt know about the existence of the Python model, which enables some useful properties, such as Python models being available in dbt docs and the ability to refer to Python models from other dbt models by using `ref`.
❗️ NOTE: Generated files should be committed to your repository.
These generated files should not be modified directly by the user. They will look similar to this:
```sql
{{ config(materialized='ephemeral') }}
/*
FAL_GENERATED f3d686c040e94a5b33aa082f0ddcd6d3
Script dependencies:
{{ ref('model_a') }}
{{ source('database_b', 'table_b') }}
{{ ref('model_c') }}
*/
SELECT * FROM {{ target.schema }}.{{ model.name }}
```
The `FAL_GENERATED <checksum>` line is there to make sure that the file is not being directly modified.
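This page does not say what the checksum covers or how it is computed. The sketch below assumes an MD5 hex digest (the example checksum is 32 hex characters, which matches that length) taken over every line except the `FAL_GENERATED` line itself. `render_ephemeral_model`, `is_unmodified`, and the hashing scheme are hypothetical, meant only to show how a checksum line can flag hand edits.

```python
import hashlib

MARKER = "FAL_GENERATED"


def _digest(text: str) -> str:
    # Assumption: hash every line except the marker line, so writing the
    # checksum into the file does not change the checksum itself.
    body = "\n".join(line for line in text.splitlines() if not line.startswith(MARKER))
    return hashlib.md5(body.encode("utf-8")).hexdigest()


def render_ephemeral_model(deps: list[str]) -> str:
    """Build an ephemeral dbt model listing a Python model's dependencies."""
    dep_lines = "\n".join("{{ " + d + " }}" for d in deps)
    text = (
        "{{ config(materialized='ephemeral') }}\n"
        "/*\n"
        f"{MARKER} PLACEHOLDER\n"
        "Script dependencies:\n"
        f"{dep_lines}\n"
        "*/\n"
        "SELECT * FROM {{ target.schema }}.{{ model.name }}\n"
    )
    # The placeholder line is excluded from the hash, so swapping it is safe.
    return text.replace(f"{MARKER} PLACEHOLDER", f"{MARKER} {_digest(text)}")


def is_unmodified(text: str) -> bool:
    """True if the stored checksum still matches the rest of the file."""
    for line in text.splitlines():
        if line.startswith(MARKER):
            stored = line[len(MARKER):].strip()
            return stored == _digest(text)
    return False  # no marker line: not a generated file


model = render_ephemeral_model(["ref('model_a')", "source('database_b', 'table_b')"])
print(is_unmodified(model))  # True
print(is_unmodified(model.replace("SELECT *", "SELECT id")))  # False
```

Any edit outside the marker line changes the digest, so a hand-modified file no longer matches its stored checksum.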