We tested SAM2 for rotoscoping. This is what we found.

SAM2, Meta’s latest Segment Anything Model, was released last month. With claims of single-click video segmentation, could it be the silver bullet for VFX rotoscoping? Could this labour-intensive task be cut from days to mere minutes?

As a company in the trenches of accelerating rotoscoping, we take a balanced look at the pros and cons of SAM2, focusing on the areas that matter to VFX artists, and compare it with our own tool, Spotlight V2. All our conclusions are drawn from the outputs shown in this article.

Table of contents

Object selection

Workflow

Speed

Temporal consistency (boiling)

Edge fidelity

Complex shots

Results

Conclusion

Usability

Object selection

Selecting the object you want to rotoscope

SAM2: Number of clicks = 1

As claimed, SAM2 provides a powerful single-click segmentation feature that outperforms existing methods in the industry. The single-click mechanism lets users select their target object in record time, making it fantastic for rapid prototyping.

Spotlight: Number of clicks = 4

Spotlight takes more clicks to get a good lock on the subject.

Comparing the two, SAM2 consistently requires fewer clicks than Spotlight and is better at selecting the target object.

Winner: SAM2

Workflow

Compatibility with the professional workflow

SAM2: The user interface of SAM2 is sleek and intuitive, particularly in its demo version. One feature we liked was the swim-lanes that visually represent when segmentation layers are in shot. However, it does not support exporting usable outputs directly from the interface, which can be a significant drawback for professionals.

Advanced users can matte images manually by passing click coordinates to the code in the SAM2 GitHub repository. This offers a high degree of control, but it also requires technical expertise and a bit more time on your hands. This route exports PNGs.

Spotlight V2: Spotlight has been built for professionals. Users can upload and output all professional file formats (EXRs, ProRes, PNG, JPG, JPEG, TIFF, TIF, QuickTime MOVs) and colour spaces (ACES, Linear and Rec709), and manage and organise files in the media pool, all within a UI built to handle complex segmentation tasks such as multiple layers.

Winner: Spotlight V2

Speed

SAM2: SAM2 is the faster of the two tools, working at or above real time, which makes it ideal when rapid turnaround is more critical than output quality.

Spotlight V2: Spotlight is slower than SAM2, taking 6 seconds to track 1 second (24 frames) of footage, so a 5-second clip takes about 30 seconds to track.
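The arithmetic above can be written out as a quick estimate. This is a minimal sketch that assumes tracking time scales linearly with clip length at the quoted rate of 6 seconds per second of 24 fps footage; the function names and parameters are illustrative and not part of either tool.

```python
def estimated_tracking_seconds(clip_seconds: float,
                               seconds_per_footage_second: float = 6.0) -> float:
    """Estimate tracking time, assuming it scales linearly with clip length."""
    if clip_seconds < 0:
        raise ValueError("clip_seconds must be non-negative")
    return clip_seconds * seconds_per_footage_second


def frames_in_clip(clip_seconds: float, fps: int = 24) -> int:
    """Number of frames in a clip at the given frame rate."""
    return round(clip_seconds * fps)


print(estimated_tracking_seconds(5))  # 30.0
print(frames_in_clip(5))              # 120
```

Real tracking times will likely vary with resolution and shot complexity, so treat this as a rough planning figure only.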

Winner: SAM2

While the experience is important, if the output is unusable, it is irrelevant. We explore this next.

Output

Temporal consistency (boiling)

Once an object is selected, how are the edges “tracked” through frames?

SAM2: A common issue for AI tools is temporal consistency, that is, keeping edges consistent across frames. SAM2, as you can see above, is no different. Meta’s model uses a new architecture that lets it draw on a window of around 7 nearby frames when analysing the current frame. Unfortunately, the edges still flicker in and out (or “boil”, as it’s also known). This isn’t obvious in the public demo, where thick outlines cover the detail. SAM2 is not a good fit where consistent edge definition across frames is critical.

Spotlight V2: The edges on the right are stable. Over the last year, the Spotlight team has been meticulously working away to reduce “boiling” edges. This was achieved, in part, by analysing each frame in the context of all other frames.
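One rough way to put a number on boiling is to count how many matte pixels flip between consecutive frames. The sketch below is an illustrative metric of our own, not something either tool reports. It assumes binary mattes stored as nested lists of 0/1 and a locked-off shot with a static subject, so that any flipped pixel is flicker rather than real motion.

```python
from typing import List

Matte = List[List[int]]  # one frame: rows of 0/1 pixels


def flip_fraction(prev: Matte, curr: Matte) -> float:
    """Fraction of pixels whose matte value changes between two frames."""
    total = 0
    flipped = 0
    for prev_row, curr_row in zip(prev, curr):
        for a, b in zip(prev_row, curr_row):
            total += 1
            flipped += a != b
    return flipped / total if total else 0.0


def boiling_score(frames: List[Matte]) -> float:
    """Mean flip fraction over consecutive frame pairs (0.0 = perfectly stable)."""
    pairs = list(zip(frames, frames[1:]))
    if not pairs:
        return 0.0
    return sum(flip_fraction(a, b) for a, b in pairs) / len(pairs)


# Hypothetical one-row mattes: one stable sequence, one with a flickering edge pixel.
stable = [[[0, 1, 1, 0]], [[0, 1, 1, 0]], [[0, 1, 1, 0]]]
boiling = [[[0, 1, 1, 0]], [[0, 1, 0, 0]], [[0, 1, 1, 0]]]
print(boiling_score(stable))   # 0.0
print(boiling_score(boiling))  # 0.25
```

On footage where the subject moves, the mattes would need to be motion-compensated first, otherwise genuine movement is counted as flicker.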

Winner: Spotlight V2

Edge fidelity

How do the edges look under the microscope, especially for motion blur and hair detail?

SAM2: SAM2 outputs a binary (hard black/white) matte, which causes aliasing and limits its usefulness in high-end production environments where detail and fidelity are critical. This makes SAM2 more suitable for quick edits and content creation where speed is prioritised over precision.

Left: SAM2, right: Spotlight V2

Spotlight V2: V2 can produce tighter edges and soft mattes (i.e. pixel values between 0 and 1), preserving intricate details. V2 handles motion blur with greater accuracy and smoothness, so the segmentation remains consistent even in fast-moving scenes. This capability is particularly important in professional settings. That said, V2 still struggles with hair detail, which is something the team is addressing.
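To illustrate why a binary matte loses information at soft edges, here is a minimal sketch. The alpha ramp is made-up sample data standing in for a motion-blurred edge, and the 0.5 threshold is an assumption for illustration, not SAM2’s actual post-processing.

```python
from typing import List


def binarize(alpha_row: List[float], threshold: float = 0.5) -> List[int]:
    """Collapse a soft matte row to hard 0/1 values."""
    return [1 if a >= threshold else 0 for a in alpha_row]


def mean_abs_error(soft: List[float], hard: List[int]) -> float:
    """Average coverage lost or gained per pixel by binarizing."""
    return sum(abs(s - h) for s, h in zip(soft, hard)) / len(soft)


# Hypothetical motion-blurred edge: alpha ramps from background to subject.
soft_edge = [0.0, 0.25, 0.5, 0.75, 1.0]
hard_edge = binarize(soft_edge)

print(hard_edge)                             # [0, 0, 1, 1, 1]
print(mean_abs_error(soft_edge, hard_edge))  # 0.2
```

The partially covered pixels (0.25, 0.5, 0.75) are exactly the ones that carry motion blur and fine edge detail, and they are the ones a hard matte snaps to 0 or 1.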

Winner: Spotlight V2

Complex shots

Ability to matte intricate detail and/or multiple objects


SAM2: SAM2 has a fantastic grasp of image context, evident in its object selection. Where it appears to fall short, however, is when objects have small, intricate details or when two mattes interact.

For example, in the scene involving two pairs of hands above, the holes between the fingers are lost. Moreover, it creates gaps between overlapping mattes. This limits the usefulness of SAM2 for jobs with fine details and/or multiple mattes - often required in VFX rotoscoping.

Left: SAM2, right: Spotlight V2. Top: RGB on black, bottom: luma mattes

Spotlight V2: V2 handles finer detail better. In the example above, it accurately captures the small spaces between fingers and the fingers that stick out, and it understands that these mattes should sit flush against each other.
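A quick way to check for the gap problem described above is to scan each row for background pixels sandwiched between two different mattes. This is an illustrative heuristic of our own, assuming each frame is a label map where 0 is background and 1 and 2 are the two mattes. It is not how either tool works internally, and it only looks along rows.

```python
from typing import List

LabelMap = List[List[int]]  # 0 = background, 1 and 2 = two separate mattes


def count_gap_pixels(labels: LabelMap, max_gap: int = 2) -> int:
    """Count background pixels in short horizontal runs bounded by two different mattes."""
    gaps = 0
    for row in labels:
        last_label = 0  # most recent non-background label seen in this row
        run = 0         # length of the current background run
        for value in row:
            if value == 0:
                run += 1
                continue
            # A short background run between two different mattes counts as a gap.
            if last_label and value != last_label and 0 < run <= max_gap:
                gaps += run
            last_label = value
            run = 0
    return gaps


# Hypothetical one-row label maps: mattes touching, and mattes split by one pixel.
flush = [[1, 1, 2, 2]]
gappy = [[1, 1, 0, 2]]
print(count_gap_pixels(flush))  # 0
print(count_gap_pixels(gappy))  # 1
```

The `max_gap` parameter is a guess at how wide a seam can be before it is more likely to be genuine background between two objects; a fuller check would scan columns as well.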

Winner: Spotlight V2

Results

| Feature | SAM2 | Spotlight V2 | Winner |
| --- | --- | --- | --- |
| Object selection | Often single-click, generally fewer clicks | Auto-detects humans/objects, labels | SAM2 |
| Workflow | Sleek UI, but limited export options | User-friendly, tailored for professionals | Spotlight V2 |
| Speed | Faster | Slower | SAM2 |
| Temporal consistency (boiling) | Inconsistent edges between frames | Consistent, reduced artifacts | Spotlight V2 |
| Edge fidelity | Adequate but not the best | Smooth and realistic | Spotlight V2 |
| Complex shots | Doesn’t handle intricate details or multiple overlapping mattes well | Superior handling, consistent results, multiple mattes work well together | Spotlight V2 |

Conclusion

SAM2 is a powerful and fast tool. It has an incredible ability to distinguish one object from another during the selection process. Where it falls short, however, is in making this usable for the professional VFX market. Without more detail, higher-resolution edges and better temporal consistency, it will have limited uses. SAM2 is ideal where speed or budget is of the utmost importance.

In the time it took us to write this article, our tech team built SAM2’s point-and-click functionality into our pipeline. This is just one example. We’ve researched and collated the best models in the world, fine-tuned them on Hollywood-grade training data and built an interface compatible with professional workflows.

For those who value quality, Spotlight V2 is the better choice.

Rotoscoping is inherently a tough job and our tool, like any machine learning tool, will not produce perfect results every time. But we’re getting closer. We’re already trialling promising solutions for improving hair detail and, by popular demand, splines.

We’re on a mission to provide VFX professionals with the accuracy and control they need to tell spectacular stories.