Build a SAM2-Powered Roto Assistant for VFX

Learn how to integrate Meta's Segment Anything Model 2 (SAM2) into a production compositing workflow. You'll build an interactive rotoscoping tool that reduces manual segmentation time while handling motion blur and occlusion.

Nov 19, 2024 · 11 min read · Segmentation & Roto · SAM2 · Nuke · Python
Prerequisites

Skills: Python (intermediate), basic compositing concepts, familiarity with Nuke or similar tools

Software: Python 3.9+, PyTorch 2.0+, Nuke 13+ (or After Effects), CUDA-capable GPU (recommended)

Time: 2-3 hours for full implementation

Introduction

Rotoscoping is one of the most time-intensive tasks in VFX production. Artists spend hours manually drawing and refining masks for complex plates—particularly when dealing with hair, motion blur, or occlusion. Meta's Segment Anything Model 2 (SAM2) offers a breakthrough: interactive segmentation that can handle these edge cases while integrating into production pipelines.

In this tutorial, you'll build a production-aware roto assistant that:

  • Accepts interactive user prompts (clicks, boxes) for segmentation
  • Handles temporal consistency across frames
  • Exports masks compatible with Nuke and other compositing tools
  • Manages GPU memory efficiently for high-resolution plates

Set Up Your Environment

First, let's set up the Python environment with all required dependencies:

# Create a new virtual environment
python -m venv sam2-roto-env
source sam2-roto-env/bin/activate  # On Windows: sam2-roto-env\Scripts\activate

# Install PyTorch (CUDA 11.8 version)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# Install SAM2
pip install git+https://github.com/facebookresearch/segment-anything-2.git

# Install additional dependencies
pip install opencv-python numpy pillow tqdm

Download SAM2 Model Checkpoints

SAM2 provides multiple model sizes. For production work, we recommend the sam2_hiera_large checkpoint, which balances quality and performance:

# Download checkpoint
mkdir checkpoints
cd checkpoints
wget https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_large.pt
cd ..

Prepare Your VFX Plates

SAM2 works best with properly prepared input sequences. Here's how to prepare your footage:

Convert to Image Sequence

SAM2 expects individual frames. If you're working with a video file, extract frames first:

# Using FFmpeg to extract frames
ffmpeg -i input_plate.mov -qscale:v 2 frames/frame_%04d.png

# Or export directly from Nuke as an image sequence (EXR or PNG)

Downscale for Processing (Optional)

For 4K+ plates, consider downscaling to 2K for initial segmentation, then upscale the mask:

import cv2
import os

input_dir = 'frames_4k'
output_dir = 'frames_2k'
os.makedirs(output_dir, exist_ok=True)

for filename in os.listdir(input_dir):
    img = cv2.imread(os.path.join(input_dir, filename))
    img_2k = cv2.resize(img, (1920, 1080), interpolation=cv2.INTER_AREA)
    cv2.imwrite(os.path.join(output_dir, filename), img_2k)
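
After propagating masks at 2K, scale them back up to the plate's native resolution before compositing. Here's a minimal sketch of that upscale step, assuming the 2K masks and 4K plates share filenames (masks_2k, frames_4k, and masks_4k are placeholder directory names):

import cv2
import os

mask_dir = 'masks_2k'      # 2K masks exported from SAM2 (placeholder)
plate_dir = 'frames_4k'    # original 4K plates (placeholder)
output_dir = 'masks_4k'
os.makedirs(output_dir, exist_ok=True)

for filename in sorted(os.listdir(mask_dir)):
    mask = cv2.imread(os.path.join(mask_dir, filename), cv2.IMREAD_GRAYSCALE)
    plate = cv2.imread(os.path.join(plate_dir, filename))
    h, w = plate.shape[:2]
    # Linear interpolation keeps soft edges; use INTER_NEAREST for a hard matte
    mask_4k = cv2.resize(mask, (w, h), interpolation=cv2.INTER_LINEAR)
    cv2.imwrite(os.path.join(output_dir, filename), mask_4k)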

Wire Up SAM2 for Segmentation

Now let's build the core segmentation engine. We'll create a class that handles SAM2 inference with interactive prompts:

import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor


class SAM2RotoAssistant:
    def __init__(self, checkpoint_path, model_cfg='sam2_hiera_l.yaml'):
        """Initialize the SAM2 video predictor."""
        self.predictor = build_sam2_video_predictor(model_cfg, checkpoint_path)
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        print(f"Using device: {self.device}")

    def load_sequence(self, frame_dir):
        """Load a directory of frames and return the SAM2 inference state."""
        with torch.inference_mode(), torch.autocast(self.device.type, dtype=torch.float16):
            state = self.predictor.init_state(video_path=frame_dir)
        return state

    def add_click_prompt(self, state, frame_idx, points, labels):
        """
        Add interactive click prompts on a single frame.

        points: list of (x, y) coordinates
        labels: list of 1 (foreground) or 0 (background), one per point
        """
        points = np.array(points, dtype=np.float32)
        labels = np.array(labels, dtype=np.int32)
        _, out_obj_ids, out_mask_logits = self.predictor.add_new_points(
            inference_state=state,
            frame_idx=frame_idx,
            obj_id=0,
            points=points,
            labels=labels,
        )
        return out_mask_logits

    def propagate_masks(self, state):
        """Propagate masks across all frames; returns {frame_idx: binary mask}."""
        masks = {}
        for frame_idx, obj_ids, mask_logits in self.predictor.propagate_in_video(state):
            masks[frame_idx] = (mask_logits[0] > 0.0).cpu().numpy()
        return masks
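
Before building any UI, it's worth a quick headless sanity check. A minimal sketch, assuming the checkpoint downloaded earlier and a frames_2k directory (both paths are placeholders):

assistant = SAM2RotoAssistant('checkpoints/sam2_hiera_large.pt')
state = assistant.load_sequence('frames_2k')  # placeholder frame directory

# One foreground click on frame 0 (coordinates are placeholders)
assistant.add_click_prompt(state, frame_idx=0, points=[[960, 540]], labels=[1])

masks = assistant.propagate_masks(state)
print(f"Generated masks for {len(masks)} frames")
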
Note: GPU Memory Management

SAM2 can consume significant GPU memory for high-resolution sequences. If you encounter out-of-memory errors:

  • Use the sam2_hiera_small checkpoint instead
  • Process sequences in smaller chunks (e.g., 50-frame batches), as sketched after this list
  • Enable gradient checkpointing in the model config
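
One way to process in chunks is to stage each batch of frames into a temporary directory and run the predictor per batch. This is a rough sketch, assuming the SAM2RotoAssistant class defined above; the process_in_chunks helper and its re-prompting strategy (re-applying the same clicks on each chunk's first frame) are illustrative, not part of SAM2:

import os
import shutil
import tempfile

import torch

def process_in_chunks(assistant, frame_dir, points, labels, chunk_size=50):
    """Run SAM2 over a long sequence in fixed-size chunks to bound GPU memory."""
    frames = sorted(os.listdir(frame_dir))
    all_masks = {}
    for start in range(0, len(frames), chunk_size):
        chunk = frames[start:start + chunk_size]
        with tempfile.TemporaryDirectory() as tmp_dir:
            # Stage this chunk's frames so init_state only loads chunk_size frames
            for name in chunk:
                shutil.copy(os.path.join(frame_dir, name), os.path.join(tmp_dir, name))
            state = assistant.load_sequence(tmp_dir)
            # Re-apply the prompts on the chunk's first frame (assumes the subject
            # stays roughly under the same click positions)
            assistant.add_click_prompt(state, 0, points, labels)
            for local_idx, mask in assistant.propagate_masks(state).items():
                all_masks[start + local_idx] = mask
        torch.cuda.empty_cache()  # release cached GPU memory between chunks
    return all_masks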

Integrate With Your Compositing Workflow

Now let's connect this to Nuke. We'll create a Python panel (a minimal sketch follows the list below) that allows artists to:

  • Load a plate sequence
  • Add interactive foreground/background clicks
  • Preview the segmentation
  • Export masks as a Nuke-compatible image sequence
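
Here's a minimal sketch of such a panel using Nuke's built-in nuke.Panel, assuming SAM2RotoAssistant and the export_masks_for_nuke function defined below are importable inside Nuke (the sam2_roto module name is hypothetical); in practice you may want to push the heavy inference to a separate process rather than run it inside Nuke's interpreter:

import nuke

from sam2_roto import SAM2RotoAssistant, export_masks_for_nuke  # hypothetical module

def launch_roto_assistant():
    """Ask the artist for paths, run SAM2, and bring the masks back as a Read node."""
    panel = nuke.Panel('SAM2 Roto Assistant')
    panel.addFilenameSearch('Plate frames', '/path/to/frames')       # placeholder default
    panel.addFilenameSearch('Mask output', '/path/to/output_masks')  # placeholder default
    if not panel.show():
        return

    frame_dir = panel.value('Plate frames')
    mask_dir = panel.value('Mask output')

    assistant = SAM2RotoAssistant('checkpoints/sam2_hiera_large.pt')
    state = assistant.load_sequence(frame_dir)
    # Prompts would normally come from an interactive viewer; a single
    # centre-frame click is a stand-in here
    assistant.add_click_prompt(state, 0, points=[[960, 540]], labels=[1])
    masks = assistant.propagate_masks(state)
    export_masks_for_nuke(masks, mask_dir)

    # Preview: load the exported masks back into the script as a Read node
    read = nuke.createNode('Read')
    read['file'].setValue(mask_dir + '/frame_%04d.png')
    read['first'].setValue(1)
    read['last'].setValue(len(masks))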

Export Masks for Nuke

import os

import cv2

def export_masks_for_nuke(masks, output_dir, frame_format='frame_%04d.png'):
    """Export binary masks as an 8-bit PNG sequence."""
    os.makedirs(output_dir, exist_ok=True)
    for frame_idx, mask in masks.items():
        # Convert boolean mask to 8-bit (0 or 255)
        mask_8bit = (mask[0] * 255).astype('uint8')
        # Write to disk
        filename = frame_format % (frame_idx + 1)
        cv2.imwrite(os.path.join(output_dir, filename), mask_8bit)
    print(f"Exported {len(masks)} masks to {output_dir}")

Create a Simple Interactive UI

For production use, you'll want a GUI. Here's a minimal example using OpenCV's window system:

import os

import cv2

def interactive_segmentation(assistant, frame_dir):
    """Simple click-based interface for segmentation."""
    state = assistant.load_sequence(frame_dir)

    # Load the first frame for display
    first_frame_path = os.path.join(frame_dir, sorted(os.listdir(frame_dir))[0])
    display_frame = cv2.imread(first_frame_path)

    points = []
    labels = []

    def mouse_callback(event, x, y, flags, param):
        if event == cv2.EVENT_LBUTTONDOWN:
            # Foreground click
            points.append([x, y])
            labels.append(1)
            cv2.circle(display_frame, (x, y), 5, (0, 255, 0), -1)
        elif event == cv2.EVENT_RBUTTONDOWN:
            # Background click
            points.append([x, y])
            labels.append(0)
            cv2.circle(display_frame, (x, y), 5, (0, 0, 255), -1)

    cv2.namedWindow('SAM2 Roto Assistant')
    cv2.setMouseCallback('SAM2 Roto Assistant', mouse_callback)

    while True:
        cv2.imshow('SAM2 Roto Assistant', display_frame)
        key = cv2.waitKey(1)
        if key == ord('q'):
            # Quit
            break
        elif key == ord('p'):
            # Propagate
            if points:
                mask_logits = assistant.add_click_prompt(state, 0, points, labels)
                masks = assistant.propagate_masks(state)
                export_masks_for_nuke(masks, 'output_masks')
                print("Masks exported!")

    cv2.destroyAllWindows()
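
Tying it together, a minimal driver might look like this (the checkpoint and frame directory paths are placeholders):

if __name__ == '__main__':
    assistant = SAM2RotoAssistant('checkpoints/sam2_hiera_large.pt')
    interactive_segmentation(assistant, 'frames_2k')  # placeholder frame directory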

Visualize and Iterate

Once you've generated masks, bring them back into Nuke to refine (a script sketch for these steps follows the list):

  1. Import the mask sequence as a Read node
  2. Use a Premult node to apply the mask to your original plate
  3. Add a RotoPaint node for manual refinements on problem frames
  4. Use edge operations (Dilate/Erode) to fine-tune the mask edge
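
If you prefer to script those steps rather than build the graph by hand, here is a rough sketch in Nuke's Python API (the file paths and frame range are placeholders):

import nuke

# 1. Read nodes for the plate and the exported mask sequence
plate = nuke.nodes.Read(file='frames_2k/frame_%04d.png', first=1, last=100)
mask = nuke.nodes.Read(file='output_masks/frame_%04d.png', first=1, last=100)

# 2. Copy the mask into the plate's alpha, then premultiply
copy = nuke.nodes.Copy(inputs=[plate, mask])
copy['from0'].setValue('rgba.red')   # the 8-bit mask reads in as luminance
copy['to0'].setValue('rgba.alpha')
premult = nuke.nodes.Premult(inputs=[copy])

# 3. RotoPaint for manual cleanup on problem frames
roto = nuke.nodes.RotoPaint(inputs=[premult])

# 4. Dilate with a negative size erodes; use it to fine-tune the matte edge
edge = nuke.nodes.Dilate(inputs=[roto])
edge['size'].setValue(-1)
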
Tip: Handling Motion Blur

SAM2 can struggle with heavy motion blur. For best results:

  • Add more interactive prompts on blurred frames
  • Use Nuke's MotionBlur node to match the original plate blur on the mask
  • Consider temporal smoothing with a FrameBlend node, or in Python as sketched below
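
As a Python-side alternative to FrameBlend, the sketch below averages each mask with its neighbours and re-thresholds; the smooth_masks_temporally helper, window size, and threshold are assumptions, not SAM2 functionality:

import numpy as np

def smooth_masks_temporally(masks, window=3, threshold=0.5):
    """Average each frame's mask with its neighbours, then re-binarize.

    masks: {frame_idx: boolean array}, as returned by propagate_masks.
    """
    half = window // 2
    smoothed = {}
    for idx in sorted(masks.keys()):
        neighbours = [masks[i].astype(np.float32)
                      for i in range(idx - half, idx + half + 1) if i in masks]
        smoothed[idx] = np.mean(neighbours, axis=0) > threshold
    return smoothed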

Production Considerations

Before deploying this in a production pipeline, consider:

  • Latency: SAM2 inference on a 100-frame sequence at 2K resolution takes ~2-3 minutes on an A100 GPU. Budget accordingly for interactive sessions.
  • Quality assurance: Always review masks frame-by-frame. SAM2 is excellent but not perfect—plan for manual cleanup time.
  • Version control: Save prompt coordinates and model versions so artists can reproduce results (see the sketch after this list).
  • Batch processing: For large shows, create a job submission system that queues SAM2 inference on render nodes.
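
A lightweight way to capture that is a JSON sidecar written next to the exported masks; the save_roto_metadata helper and its fields are only a suggestion:

import json
import os
import time

def save_roto_metadata(output_dir, points, labels, checkpoint_path, frame_dir):
    """Record everything needed to reproduce a mask pass."""
    metadata = {
        'checkpoint': checkpoint_path,
        'model_cfg': 'sam2_hiera_l.yaml',
        'frame_dir': frame_dir,
        'prompts': [{'point': p, 'label': l} for p, l in zip(points, labels)],
        'created': time.strftime('%Y-%m-%d %H:%M:%S'),
    }
    with open(os.path.join(output_dir, 'roto_metadata.json'), 'w') as f:
        json.dump(metadata, f, indent=2)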

Next Steps

Now that you've built a basic SAM2 roto assistant, consider these extensions: