This example demonstrates how to build a 3D reconstruction pipeline that streams
voxel data in real time during diffusion. Watch your 3D models take shape
progressively as the geometry and appearance diffusion stages run.
🚀 Try this Example
View the complete source code on GitHub: https://github.com/rehan-remade/Manifold
Live Demo: manifold-jet.vercel.app
Steps to run:
Install fal:
pip install fal
Authenticate:
fal auth login
Clone the repository with submodules:
git clone --recurse-submodules https://github.com/rehan-remade/Manifold.git
cd Manifold
Set up the frontend:
cd frontend
npm install
Configure environment variables in frontend/.env.local:
FAL_KEY=your_fal_api_key_here
FAL_ENDPOINT_ID=rehan/sam-3d-stream
GROQ_API_KEY=your_groq_api_key_here  # Optional, for prompt enhancement
Run the development server:
npm run dev
Open http://localhost:3000 to see the demo.
You can use the hosted endpoint rehan/sam-3d-stream directly, or deploy your own custom endpoint to modify the reconstruction pipeline.
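If you want to call the hosted endpoint from a script rather than the web app, here is a minimal sketch in Python. It assumes the app's /stream route is served at fal.run with standard fal key auth and uses input fields from the parameter table further below; error handling is omitted.

import json
import os
import requests

resp = requests.post(
    "https://fal.run/rehan/sam-3d-stream/stream",  # assumed route; see "Deploy Your Own Endpoint"
    headers={"Authorization": f"Key {os.environ['FAL_KEY']}"},
    json={"image_url": "https://example.com/car.png", "prompt": "car"},
    stream=True,
)
# The endpoint replies with Server-Sent Events; each frame is a "data: {...}" line
for line in resp.iter_lines(decode_unicode=True):
    if line.startswith("data: "):
        event = json.loads(line[len("data: "):])
        print(event.get("stage"), event.get("step"))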
How it Works
Prompt Enhancement — Groq LLM (llama-3.3-70b) rewrites your text into an optimized image prompt + segmentation label
Image Generation — fal-ai/z-image/turbo generates a 3D-ready image in ~1s
3D Reconstruction — SAM-3D runs geometry and appearance diffusion on H100, streaming voxel data via SSE callbacks at each denoising step
Live Visualization — React Three Fiber renders voxels/mesh/GLB in real-time as data streams in
For image-to-3D, a vision model analyzes the uploaded image to generate the segmentation prompt.
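For a concrete sense of step 1, here is a sketch of prompt enhancement via Groq's OpenAI-compatible API. The system prompt, JSON shape, and example input are illustrative assumptions, not the repo's actual code.

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)
resp = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        # Assumed system prompt; the repo's actual instructions may differ
        {"role": "system", "content": (
            "Rewrite the user's idea as (1) a detailed image-generation prompt "
            "and (2) a short segmentation label. Reply as JSON with keys "
            "'image_prompt' and 'label'."
        )},
        {"role": "user", "content": "a vintage red sports car"},
    ],
)
print(resp.choices[0].message.content)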
Deploy Your Own Endpoint
To customize the SAM-3D backend, deploy your own:
cd serverless
fal deploy app.py::SAM3DStreamApp
Then update FAL_ENDPOINT_ID in .env.local with your new endpoint ID.
Key Features
Progressive Rendering: See voxels appear during both geometry and appearance diffusion stages
Binary Streaming Protocol: Efficient base64-encoded voxel data (xyz + rgb per voxel)
Multiple Output Formats: Voxels, vertex-colored mesh preview, and final GLB model
Custom Container: Complex CUDA dependencies handled via Dockerfile
H100 GPU: High-performance inference for fast reconstruction
Backend Architecture
The serverless endpoint uses Server-Sent Events (SSE) for streaming progressive updates:
import queue

import fal
from fal.container import ContainerImage
from fastapi import Request
from fastapi.responses import StreamingResponse


class SAM3DStreamApp(
    fal.App,
    keep_alive=600,
    kind="container",
    image=ContainerImage.from_dockerfile_str(dockerfile_str, builder="depot"),
):
    machine_type = "GPU-H100"

    def setup(self):
        # Download model weights and initialize the pipeline once per worker
        from huggingface_hub import snapshot_download

        snapshot_download(
            "jetjodh/sam-3d-objects",
            local_dir=str(CACHE_DIR),
        )
        self.pipeline = self._create_pipeline(config_path)

    @fal.endpoint("/stream")
    def stream_3d_reconstruction(self, input: SAM3DStreamInput, request: Request):
        """Stream 3D reconstruction with real-time voxel visualization."""
        progress_queue: queue.Queue = queue.Queue()

        def geometry_callback(stage, step, total_steps, coords, **kwargs):
            # Encode and queue voxel data for streaming
            voxel_data = encode_voxels_binary(coords)
            progress_queue.put({
                "stage": "geometry",
                "step": step,
                "voxel_data": voxel_data,
            })

        def appearance_callback(stage, step, total_steps, coords, colors, **kwargs):
            # Stream colored voxels during appearance diffusion
            voxel_data = encode_voxels_binary(coords, colors)
            progress_queue.put({
                "stage": "appearance",
                "step": step,
                "voxel_data": voxel_data,
            })

        # Run the pipeline with streaming callbacks. In practice this runs in a
        # background thread so event_stream() can drain the queue concurrently
        # (see the glue sketch below); it is shown inline here for brevity.
        outputs = self.pipeline.run(
            merged_image,  # segmented input image (preparation elided)
            geometry_callback=geometry_callback,
            appearance_callback=appearance_callback,
        )

        return StreamingResponse(
            event_stream(),
            media_type="text/event-stream",
        )
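The excerpt above elides event_stream() and the threading glue. A minimal sketch of that glue, under the assumption that the pipeline runs in a worker thread while a generator drains the queue (names here are illustrative, not the repo's exact code):

import json
import queue
import threading

def make_event_stream(run_pipeline):
    """Bridge blocking pipeline callbacks to an SSE generator (illustrative)."""
    progress_queue: queue.Queue = queue.Queue()

    def worker():
        try:
            run_pipeline(progress_queue)  # callbacks call progress_queue.put(...)
        finally:
            progress_queue.put(None)  # sentinel: pipeline finished

    threading.Thread(target=worker, daemon=True).start()

    def event_stream():
        while True:
            event = progress_queue.get()
            if event is None:
                break
            # Each SSE frame is "data: <json>" followed by a blank line
            yield f"data: {json.dumps(event)}\n\n"

    return event_stream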
SSE Event Types
The streaming endpoint emits these event types:
Event         Description
loading       Initial setup and image loading
geometry      Voxel coordinates during geometry diffusion (Stage 1)
appearance    Voxel coordinates + colors during appearance diffusion (Stage 2)
mesh_preview  Vertex-colored mesh (instant preview before texture baking)
glb_ready     Final textured GLB model data
complete      Final URLs for Gaussian splat and GLB files
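On the wire, a geometry frame looks roughly like this (stage, step, and voxel_data come from the callback payloads above; the real schema may carry additional fields):

data: {"stage": "geometry", "step": 12, "voxel_data": "<base64 xyz+rgb bytes>"}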
Binary Voxel Encoding
Voxels are packed as uint8 arrays for efficient streaming:
import base64
import numpy as np

def encode_voxels_binary(coords_np, colors_list=None):
    """Pack voxels as uint8: [x, y, z, r, g, b] per voxel."""
    # Normalize coordinates to the 0-255 range
    coords_min = coords_np.min(axis=0)
    coords_range = np.maximum(coords_np.max(axis=0) - coords_min, 1e-6)
    coords_normalized = ((coords_np - coords_min) / coords_range * 255).astype(np.uint8)
    # No colors yet during the geometry stage; fall back to placeholder gray (assumption)
    colors_arr = (np.asarray(colors_list, dtype=np.uint8) if colors_list is not None
                  else np.full((len(coords_np), 3), 128, dtype=np.uint8))
    # Pack coordinates and colors side by side
    packed = np.empty((len(coords_np), 6), dtype=np.uint8)
    packed[:, :3] = coords_normalized  # xyz
    packed[:, 3:] = colors_arr[:, :3]  # rgb
    return base64.b64encode(packed.tobytes()).decode("ascii")
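The frontend's decodeVoxels performs the inverse in TypeScript; for illustration, here is the same inverse in Python (function name assumed):

import base64
import numpy as np

def decode_voxels_binary(b64_data: str):
    """Inverse of encode_voxels_binary: recover unit-cube coords and RGB colors."""
    packed = np.frombuffer(base64.b64decode(b64_data), dtype=np.uint8).reshape(-1, 6)
    coords = packed[:, :3].astype(np.float32) / 255.0  # xyz back to [0, 1]
    colors = packed[:, 3:].astype(np.float32) / 255.0  # rgb back to [0, 1]
    return coords, colors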
Frontend Integration
The React frontend uses a custom hook to consume the SSE stream:
import { useCallback, useState } from "react";
// Voxel, MeshData, RenderMode, decodeVoxels, and decodeMesh live in app/lib

export function useSAM3DStream() {
  const [voxels, setVoxels] = useState<Voxel[]>([]);
  const [meshData, setMeshData] = useState<MeshData | null>(null);
  const [renderMode, setRenderMode] = useState<RenderMode>("voxels");

  const startStream = useCallback(async (imageUrl: string, prompt: string) => {
    const response = await fetch("/api/stream-3d", {
      method: "POST",
      body: JSON.stringify({ imageUrl, prompt }),
    });
    const reader = response.body!.getReader();
    const decoder = new TextDecoder();
    let buffer = "";

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // Accumulate chunks and split into newline-delimited SSE lines
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split("\n");
      buffer = lines.pop() ?? "";

      for (const line of lines) {
        if (!line.startsWith("data: ")) continue;
        const event = JSON.parse(line.slice(6));
        if (event.stage === "geometry" || event.stage === "appearance") {
          const decoded = decodeVoxels(event);
          setVoxels(decoded);
          setRenderMode("voxels");
        } else if (event.stage === "mesh_preview") {
          const mesh = decodeMesh(event);
          setMeshData(mesh);
          setRenderMode("mesh");
        }
      }
    }
  }, []);

  return { voxels, meshData, renderMode, startStream };
}
Voxel Rendering with Three.js
Use React Three Fiber to render the streaming voxels:
import * as THREE from "three";
import { useLayoutEffect, useRef } from "react";
import { Canvas } from "@react-three/fiber";

function VoxelViewer({ voxels }: { voxels: Voxel[] }) {
  const meshRef = useRef<THREE.InstancedMesh>(null);

  // Write one transform matrix and one color per voxel into the instanced buffers
  useLayoutEffect(() => {
    const mesh = meshRef.current;
    if (!mesh) return;
    const matrix = new THREE.Matrix4();
    const color = new THREE.Color();
    voxels.forEach((voxel, i) => {
      matrix.setPosition(voxel.x, voxel.y, voxel.z);
      mesh.setMatrixAt(i, matrix);
      mesh.setColorAt(i, color.set(voxel.color));
    });
    mesh.instanceMatrix.needsUpdate = true;
    if (mesh.instanceColor) mesh.instanceColor.needsUpdate = true;
  }, [voxels]);

  return (
    <Canvas>
      <ambientLight intensity={1} />
      {/* InstancedMesh count is fixed at construction; keying on length remounts it as the stream grows */}
      <instancedMesh key={voxels.length} ref={meshRef} args={[undefined, undefined, voxels.length]}>
        <boxGeometry args={[0.1, 0.1, 0.1]} />
        <meshStandardMaterial />
      </instancedMesh>
    </Canvas>
  );
}
Custom Container Image
The backend requires complex CUDA dependencies. Use a Dockerfile for full control:
dockerfile_str = r"""
FROM nvidia/cuda:12.8.0-cudnn-devel-ubuntu22.04

# Install Python and system dependencies
RUN apt-get update && apt-get install -y python3.11 python3.11-dev \
    libgl1 libosmesa6 libosmesa6-dev

# PyTorch with CUDA support
RUN pip install torch==2.8.0 torchvision --index-url https://download.pytorch.org/whl/cu128

# SAM-3D Objects with streaming callback support
RUN git clone https://github.com/rehan-remade/sam-3d-objects.git && \
    cd sam-3d-objects && pip install -e .

# Additional 3D libraries
RUN pip install kaolin pytorch3d gsplat
"""
class SAM3DStreamApp(
    fal.App,
    kind="container",
    image=ContainerImage.from_dockerfile_str(dockerfile_str, builder="depot"),
):
    machine_type = "GPU-H100"
Input Parameters
Parameter              Type          Default   Description
image_url              string        required  URL of the image to reconstruct
mask_urls              list[string]  []        Optional mask URLs for segmentation
prompt                 string        "car"     Text prompt for auto-segmentation
seed                   int           random    Random seed for reproducibility
stream_geometry_every  int           1         Emit geometry updates every N steps
stream_colors_every    int           1         Emit color updates every N steps
Project Structure
manifold/
├── frontend/ # Next.js web application
│ ├── app/
│ │ ├── page.tsx # Main orchestrator
│ │ ├── api/ # Server-side API routes
│ │ ├── components/ # React Three Fiber components
│ │ ├── hooks/ # useSAM3DStream SSE hook
│ │ └── lib/ # Types, decoders, constants
│ └── public/
│
├── serverless/ # fal.ai serverless endpoint
│ ├── app.py # SAM-3D streaming endpoint
│ └── pyproject.toml # fal app config
│
└── sam-3d/ # Git submodule (forked SAM-3D Objects)
Streaming Frequency
Control the streaming frequency to balance smoothness vs. performance:
from pydantic import BaseModel, Field

class SAM3DStreamInput(BaseModel):
    stream_geometry_every: int = Field(
        default=1,
        ge=1,
        le=10,
        description="Emit geometry updates every N steps (1=smoothest, 10=fastest)",
    )
    stream_colors_every: int = Field(
        default=1,
        ge=1,
        le=10,
        description="Emit color updates every N steps",
    )
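Inside the streaming callbacks, these settings would gate emissions roughly like so (a sketch; the repo's exact gating logic may differ):

def should_emit(step: int, total_steps: int, every: int) -> bool:
    # Emit every N-th step, but always the final step so the
    # client ends the stage with the finished voxel set
    return step % every == 0 or step == total_steps - 1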
Binary Encoding
The binary encoding reduces payload size significantly compared to JSON:
6 bytes per voxel (xyz + rgb as uint8)
~10,000 voxels = ~60KB per frame
Base64 encoding adds ~33% overhead
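To make those numbers concrete (the JSON figure is a rough assumption for comparison):

raw_bytes = 10_000 * 6               # 60,000 bytes ≈ 60 KB per frame
b64_bytes = -(-raw_bytes // 3) * 4   # base64 emits 4 chars per 3 bytes → ~80 KB (+33%)
json_bytes = 10_000 * 40             # naive JSON at ~40 chars/voxel → ~400 KB (assumption)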
Key Takeaways
SSE streaming enables real-time visualization of long-running 3D reconstruction
Binary encoding makes voxel streaming efficient over HTTP
Custom containers handle complex GPU dependencies (CUDA, PyTorch3D, kaolin)
Progressive rendering gives users immediate visual feedback during generation
H100 GPUs provide the compute power needed for fast 3D diffusion
This pattern demonstrates how to build interactive 3D AI applications that provide real-time feedback, combining the power of diffusion models with efficient streaming protocols and modern web rendering.