Quickstart with Compute
Get up and running with fal Compute in minutes. This guide will walk you through provisioning your first GPU instance and connecting to it.
Prerequisites
Before you begin, make sure you have:
- A fal.ai account with Compute access
- An SSH key pair for secure instance access
- Basic familiarity with SSH and command line tools
Generate SSH Key (if needed)
If you don’t have an SSH key pair, generate one:
```bash
# Generate a new SSH key pair (writes ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub by default)
ssh-keygen -t rsa -b 4096

# Display your public key (you'll need this for instance creation)
cat ~/.ssh/id_rsa.pub
```
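A common mistake is pasting the private key, or a truncated copy of the public key, into the dashboard. The sketch below checks the shape of an OpenSSH public-key line (`<type> <base64 blob> [comment]`), where the decoded blob begins with a 4-byte length followed by the key type string. It uses only the Python standard library; `looks_like_openssh_pubkey` is a hypothetical helper that checks structure only and does not validate the key cryptographically.

```python
import base64
import binascii
import struct


def looks_like_openssh_pubkey(line: str) -> bool:
    """Return True if `line` has the shape of an OpenSSH public key entry.

    Hypothetical helper: a format check, not a cryptographic validation.
    """
    parts = line.strip().split()
    if len(parts) < 2:
        return False  # need at least "<type> <base64 blob>"
    key_type, blob_b64 = parts[0], parts[1]
    try:
        blob = base64.b64decode(blob_b64, validate=True)
    except binascii.Error:
        return False  # not valid base64, e.g. a truncated copy-paste
    if len(blob) < 4:
        return False
    # The blob starts with a big-endian uint32 length, then the key type string
    (type_len,) = struct.unpack(">I", blob[:4])
    embedded_type = blob[4:4 + type_len]
    return embedded_type == key_type.encode()
```

Usage: `looks_like_openssh_pubkey(open(os.path.expanduser("~/.ssh/id_rsa.pub")).read())` should return `True` for a key generated as above.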
Step 1: Create Your Instance
- Access the Dashboard
  - Navigate to the fal Compute Dashboard
  - Click the “Create” button
- Configure Your Instance
  - Instance Type: Choose between:
    - 1xH100-SXM: Single GPU for development and smaller workloads
    - 8xH100-SXM: Eight GPUs for large-scale training and inference
  - Sector Selection:
    - Default: For single-instance workloads
    - Specific Sector: For multi-node clusters with InfiniBand connectivity
  - SSH Key: Paste your public SSH key (the contents of ~/.ssh/id_rsa.pub) for secure access
- Launch Instance
  - Review your configuration
  - Click “Create” to provision your instance
  - Wait for the instance to reach the “ready” state (typically 2-3 minutes)
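The right instance type depends mostly on memory. As a rough first pass, the sketch below estimates whether a model fits on one GPU or needs all eight. It assumes about 80 GB per H100 SXM GPU and two common rules of thumb: about 2 bytes per parameter for fp16/bf16 inference weights, and about 16 bytes per parameter for mixed-precision training with Adam (fp16 weights and gradients, an fp32 master copy, and two fp32 optimizer moments). It ignores activations, KV caches and framework overhead, which the `headroom` factor only loosely covers. The 8-GPU case also assumes the weights or optimizer state are sharded across GPUs. `estimate_memory_gb` and `suggest_instance` are hypothetical helpers, not part of any fal tool.

```python
GPU_MEMORY_GB = 80  # assumption: ~80 GB of HBM per H100 SXM GPU
GPUS_PER_LARGE_INSTANCE = 8

# Rule-of-thumb bytes per parameter (assumptions, not measurements)
BYTES_PER_PARAM = {
    "inference_fp16": 2,     # fp16/bf16 weights only
    "train_adam_mixed": 16,  # weights + grads + fp32 master copy + Adam m and v
}


def estimate_memory_gb(num_params: float, mode: str) -> float:
    """Approximate memory in GB (1e9 bytes) for weights and, when training, optimizer state."""
    return num_params * BYTES_PER_PARAM[mode] / 1e9


def suggest_instance(num_params: float, mode: str, headroom: float = 0.8) -> str:
    """Hypothetical helper: pick the smallest option whose usable memory covers the estimate.

    `headroom` keeps part of each GPU free for activations and overhead.
    """
    needed = estimate_memory_gb(num_params, mode)
    if needed <= GPU_MEMORY_GB * headroom:
        return "1xH100-SXM"
    if needed <= GPUS_PER_LARGE_INSTANCE * GPU_MEMORY_GB * headroom:
        return "8xH100-SXM"  # assumes sharding (e.g. FSDP or DeepSpeed ZeRO)
    return "multi-node cluster"
```

For example, `suggest_instance(7e9, "train_adam_mixed")` estimates 112 GB and returns `"8xH100-SXM"`.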
Step 2: Connect to Your Instance
Once your instance is running, you’ll receive connection details:
```bash
# Connect via SSH (replace with your actual connection details)
ssh ubuntu@your-instance-ip

# Example connection
```
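If you connect often, an entry in `~/.ssh/config` lets you use a short alias instead of the full address. The sketch below renders such an entry using the standard OpenSSH client options `Host`, `HostName`, `User`, `IdentityFile` and `IdentitiesOnly`. The alias `fal-h100` and the helper `render_ssh_config` are hypothetical, and `your-instance-ip` remains a placeholder for the address in your connection details.

```python
def render_ssh_config(alias: str, host: str, user: str = "ubuntu",
                      identity_file: str = "~/.ssh/id_rsa") -> str:
    """Return an OpenSSH client config block for one host (hypothetical helper)."""
    lines = [
        f"Host {alias}",
        f"    HostName {host}",
        f"    User {user}",
        f"    IdentityFile {identity_file}",
        # Offer only the IdentityFile above, not every key loaded in the agent
        "    IdentitiesOnly yes",
    ]
    return "\n".join(lines) + "\n"


print(render_ssh_config("fal-h100", "your-instance-ip"))
```

After appending the printed block to `~/.ssh/config` (with the real address filled in), `ssh fal-h100` should open the same session as the full command.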
Step 3: Verify Your Setup
After connecting, check your GPU resources:
```bash
# Check GPU availability
nvidia-smi

# Verify CUDA installation
nvcc --version

# Check storage
df -h

# View system resources
htop
```
Expected output for 1xH100-SXM:
```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.xx.xx   Driver Version: 525.xx.xx   CUDA Version: 12.x       |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA H100-SXM...  Off  | 00000000:01:00.0 Off |                    0 |
| N/A   27C    P0    68W / 700W |      0MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
```
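The table above is meant for people; scripts are easier to write against `nvidia-smi`'s query mode. The sketch below parses the output of `nvidia-smi --query-gpu=index,name,memory.total,memory.used --format=csv,noheader,nounits`, where memory values are in MiB. `parse_gpu_query` and `query_gpus` are hypothetical helpers, and the sample line in the test is illustrative rather than captured from a fal instance.

```python
import csv
import io
import subprocess
from typing import Dict, List

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.total,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_query(output: str) -> List[Dict[str, object]]:
    """Turn the CSV produced by QUERY into one dict per GPU."""
    gpus = []
    for row in csv.reader(io.StringIO(output)):
        if not row:
            continue  # skip blank lines
        index, name, total, used = (field.strip() for field in row)
        gpus.append({
            "index": int(index),
            "name": name,
            "memory_total_mib": int(total),
            "memory_used_mib": int(used),
        })
    return gpus


def query_gpus() -> List[Dict[str, object]]:
    """Run nvidia-smi on the instance and parse its output (requires the NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_query(result.stdout)
```

On a 1xH100-SXM instance, `len(query_gpus())` should be 1; on 8xH100-SXM it should be 8.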
Step 4: Install Your Dependencies
Install your required software stack:
```bash
# Update system packages
sudo apt update && sudo apt upgrade -y

# Install Python and pip (if not already installed)
sudo apt install python3 python3-pip -y

# Install common ML libraries
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Verify PyTorch can see your GPU
python3 -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'GPU count: {torch.cuda.device_count()}')"
```
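If your stack includes more than PyTorch, it can help to confirm every package is present before starting a long job. The sketch below uses only the standard library (`importlib.util.find_spec` and `importlib.metadata.version`). `missing_modules` and `installed_version` are hypothetical helpers, and the package list is just the one installed in this step.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version
from typing import Iterable, List, Optional


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it isn't installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# The packages installed above (their module and distribution names match)
for pkg in ["torch", "torchvision", "torchaudio"]:
    print(pkg, installed_version(pkg) or "NOT INSTALLED")
```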
Step 5: Run Your First Workload
Test your setup with a simple GPU workload. Save the following script as test_gpu.py:
```python
# test_gpu.py
import time

import torch

# Check if CUDA is available
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU count: {torch.cuda.device_count()}")

if torch.cuda.is_available():
    # Create large tensors on GPU
    device = torch.device("cuda")

    # Simple matrix multiplication test
    print("Running GPU compute test...")
    start_time = time.time()

    a = torch.randn(10000, 10000, device=device)
    b = torch.randn(10000, 10000, device=device)
    c = torch.matmul(a, b)
    # CUDA kernels launch asynchronously; wait for them to finish before reading the clock
    torch.cuda.synchronize()

    end_time = time.time()
    print(f"Matrix multiplication completed in {end_time - start_time:.2f} seconds")
    print(f"GPU memory used: {torch.cuda.memory_allocated(device) / 1024**3:.2f} GB")
```
Run the test:
```bash
python3 test_gpu.py
```
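The printed time can be turned into a rough throughput figure. Multiplying two n×n matrices takes n³ multiply-adds, or about 2n³ floating-point operations, so n = 10000 is 2×10¹² FLOPs. Each 10000×10000 float32 tensor holds 4×10⁸ bytes (about 0.37 GiB), so `a`, `b` and `c` together need about 1.12 GiB; `torch.cuda.memory_allocated` may report slightly more because of allocator rounding. The timer in test_gpu.py starts before the random inputs are generated, so treat the result as a lower bound on matmul throughput. The helper names below are hypothetical.

```python
def matmul_flops(n: int) -> int:
    """FLOPs for an (n x n) @ (n x n) product: n^3 multiply-adds = 2 * n^3."""
    return 2 * n ** 3


def achieved_tflops(n: int, seconds: float) -> float:
    """Throughput in TFLOPS (1e12 FLOP/s) for one n x n matmul taking `seconds`."""
    return matmul_flops(n) / seconds / 1e12


def tensor_gib(rows: int, cols: int, bytes_per_element: int = 4) -> float:
    """Memory of a dense tensor in GiB; float32 uses 4 bytes per element."""
    return rows * cols * bytes_per_element / 1024 ** 3


print(matmul_flops(10_000))                         # 2000000000000
print(f"{3 * tensor_gib(10_000, 10_000):.2f} GiB")  # 1.12 GiB
```

For example, if the script reports 0.50 seconds, `achieved_tflops(10_000, 0.5)` gives 4.0 TFLOPS.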
Step 6: Transfer Your Data
For training workloads, you’ll need to transfer your datasets:
```bash
# Using scp to transfer files
scp -r /local/path/to/dataset ubuntu@your-instance-ip:/remote/path/

# Using rsync for large datasets (-P shows progress and keeps partial files so interrupted transfers can resume)
rsync -avz -P /local/path/to/dataset/ ubuntu@your-instance-ip:/remote/path/dataset/

# Or download directly on the instance
wget https://example.com/dataset.tar.gz
tar -xzf dataset.tar.gz
```
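For large datasets you may want an independent check that every file arrived intact. The sketch below builds a SHA-256 manifest of a directory; run it on both machines, copy one manifest across (for example as JSON), and compare. `build_manifest` and `diff_manifests` are hypothetical helpers, not a fal tool; `sha256sum` on the command line is an alternative.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files never need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: str) -> Dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its SHA-256 digest."""
    base = Path(root)
    return {
        p.relative_to(base).as_posix(): sha256_file(p)
        for p in sorted(base.rglob("*"))
        if p.is_file()
    }


def diff_manifests(local: Dict[str, str], remote: Dict[str, str]) -> List[str]:
    """Return files that are missing on the remote side or whose contents differ."""
    return sorted(name for name, digest in local.items() if remote.get(name) != digest)
```

An empty list from `diff_manifests(local_manifest, remote_manifest)` means every local file exists remotely with identical contents.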
Next Steps
Now that your instance is running, you can:
For Machine Learning
- Training: Start your training scripts with dedicated GPU resources
- Fine-tuning: Adapt pre-trained models with your custom datasets
- Inference: Deploy models for batch or real-time inference
For Multi-GPU Workloads (8xH100)
- Distributed Training: Use frameworks like DeepSpeed, Horovod, or PyTorch DDP
- Model Parallelism: Split large models across multiple GPUs
- Data Parallelism: Process multiple batches simultaneously
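In data parallelism, each GPU (rank) trains on a different slice of the data each epoch. The sketch below deals sample indices out round-robin and pads by wrapping around so every rank gets the same count, which keeps ranks in lockstep. This is similar in spirit to PyTorch's `DistributedSampler` without shuffling, but it is an illustration, not PyTorch's code; `shard_indices` is a hypothetical helper. With this scheme, the global batch per step is the per-GPU batch size times the number of GPUs.

```python
import math
from typing import List


def shard_indices(num_samples: int, rank: int, world_size: int) -> List[int]:
    """Indices assigned to `rank` when samples are dealt out round-robin.

    Pads by wrapping around to the start of the dataset so every rank
    receives exactly ceil(num_samples / world_size) indices.
    """
    per_rank = math.ceil(num_samples / world_size)
    total = per_rank * world_size
    padded = [i % num_samples for i in range(total)]
    return padded[rank:total:world_size]
```

For example, `shard_indices(10, rank=1, world_size=8)` returns `[1, 9]` on an 8xH100-SXM instance.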
For Multi-Node Clusters
- InfiniBand Setup: Configure high-speed inter-node communication
- Cluster Management: Use tools like SLURM or Kubernetes for job scheduling
- Distributed Computing: Scale workloads across multiple instances
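When a job spans several instances, each process is usually identified by a node rank (which instance) and a local rank (which GPU on that instance). With a fixed number of GPUs per node, the conventional mapping, and the one launchers such as `torchrun` use in the simple static case, is `global_rank = node_rank * gpus_per_node + local_rank`. The sketch below spells this out; `ProcessPlacement`, `global_rank` and `world_size` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessPlacement:
    node_rank: int   # which instance, 0 .. num_nodes - 1
    local_rank: int  # which GPU on that instance, 0 .. gpus_per_node - 1


def global_rank(p: ProcessPlacement, gpus_per_node: int) -> int:
    """Unique rank of this process across all nodes."""
    if not 0 <= p.local_rank < gpus_per_node:
        raise ValueError("local_rank out of range for this node")
    return p.node_rank * gpus_per_node + p.local_rank


def world_size(num_nodes: int, gpus_per_node: int) -> int:
    """Total number of processes, assuming one process per GPU."""
    return num_nodes * gpus_per_node
```

For example, two 8xH100-SXM instances give a world size of 16, and GPU 3 on the second instance (node rank 1) is global rank 11.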
Managing Your Instance
```bash
# Monitor GPU usage
watch -n 1 nvidia-smi

# Check disk usage
df -h

# Monitor system resources
htop

# Check network connectivity (for multi-node)
ibstatus  # InfiniBand status
```
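For unattended jobs, a disk check inside the script can stop a run before checkpoints fail to save. The sketch below uses `shutil.disk_usage`, which returns total, used and free bytes for the filesystem containing a path. It reports used space as a fraction of total capacity, which can differ slightly from the `Use%` column of `df -h`. The 90% threshold and the helper names are assumptions.

```python
import shutil


def disk_usage_fraction(path: str = "/") -> float:
    """Fraction of the filesystem's total capacity that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def check_disk(path: str = "/", threshold: float = 0.9) -> bool:
    """Return True if usage at `path` is below `threshold` (hypothetical alert rule)."""
    fraction = disk_usage_fraction(path)
    if fraction >= threshold:
        print(f"Warning: {path} is {fraction:.0%} full")
        return False
    return True
```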
Troubleshooting
Common Issues
SSH Connection Failed
- Verify that the public key you pasted when creating the instance matches the private key you are connecting with (by default ~/.ssh/id_rsa)
- Check instance status in the dashboard
- Ensure your IP is not blocked by firewalls
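To tell a network problem from a key problem, you can check whether the instance accepts TCP connections on the SSH port at all. If the connection fails, look at instance status and firewalls; if it succeeds but `ssh` still rejects you, look at your key. The sketch assumes SSH listens on the default port 22; `port_is_open` is a hypothetical helper.

```python
import socket


def port_is_open(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, DNS failures and timeouts
        return False
```

Usage: `port_is_open("your-instance-ip")`, with the placeholder replaced by your instance's address.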
GPU Not Detected
- Run `nvidia-smi` to check GPU status
- Verify CUDA installation with `nvcc --version`
- Restart the instance if GPU drivers aren’t loaded
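On a typical Linux host, two quick signs that the NVIDIA kernel driver is loaded are the file `/proc/driver/nvidia/version`, whose first line includes the module version, and one device file per GPU such as `/dev/nvidia0`. The sketch below checks both without calling `nvidia-smi`. The paths are parameters so the logic can be tested elsewhere; containers may expose these files differently, and both helper names are hypothetical.

```python
import re
from pathlib import Path
from typing import List, Optional


def nvidia_driver_version(proc_file: str = "/proc/driver/nvidia/version") -> Optional[str]:
    """Return the loaded NVIDIA kernel module version, or None if it isn't loaded."""
    path = Path(proc_file)
    if not path.exists():
        return None  # the kernel module provides this file only while it is loaded
    lines = path.read_text().splitlines()
    first_line = lines[0] if lines else ""
    # First dotted number on the line, e.g. "525.85.12"
    match = re.search(r"\b(\d+\.\d+(?:\.\d+)?)\b", first_line)
    return match.group(1) if match else None


def gpu_device_nodes(dev_dir: str = "/dev") -> List[str]:
    """List per-GPU device files such as nvidia0, nvidia1 (ignores nvidiactl, nvidia-uvm)."""
    names = [p.name for p in Path(dev_dir).glob("nvidia*") if re.fullmatch(r"nvidia\d+", p.name)]
    return sorted(names, key=lambda name: int(name[len("nvidia"):]))
```

On a healthy 8xH100-SXM instance you would expect a version string and eight entries from `gpu_device_nodes()`.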
Out of Memory Errors
- Monitor GPU memory with `nvidia-smi`
- Reduce batch sizes in your training scripts
- Use gradient checkpointing to trade extra computation for lower activation memory
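Reducing the batch size also changes the effective batch size, which can affect convergence. Gradient accumulation compensates by summing gradients over several smaller micro-batches before each optimizer step, so the effective batch is micro-batch × accumulation steps × number of GPUs. In the training loop, divide each micro-batch loss by the number of accumulation steps before calling backward, and step the optimizer once per group. The sketch below only computes the plan; `accumulation_plan` is a hypothetical helper, and when the target does not divide evenly the effective batch can slightly exceed it.

```python
import math
from typing import Tuple


def accumulation_plan(target_global_batch: int, max_micro_batch: int,
                      num_gpus: int = 1) -> Tuple[int, int, int]:
    """Return (micro_batch, accumulation_steps, effective_global_batch).

    `max_micro_batch` is the largest per-GPU batch that fits in memory.
    """
    per_gpu = math.ceil(target_global_batch / num_gpus)
    steps = math.ceil(per_gpu / max_micro_batch)  # fewest steps that respect the memory limit
    micro = math.ceil(per_gpu / steps)            # spread the per-GPU batch evenly over steps
    return micro, steps, micro * steps * num_gpus
```

For example, `accumulation_plan(512, 48, num_gpus=8)` returns `(32, 2, 512)`: micro-batches of 32 per GPU, two accumulation steps, and an effective batch of 512.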
Getting Help
- Check the fal.ai documentation for detailed guides
- Contact support through the dashboard for technical issues
- Join the community forums for user discussions