Short answer: Rotoscoping (the manual frame-by-frame process of isolating an object from its background) is being replaced by AI-driven mask generation. Modern segmentation models, trained on tens of thousands of curated frames, can produce a pixel-accurate alpha matte for an average 3-second shot in roughly a minute - work that traditionally took a VFX artist between one and three days per shot.
2026 update: Electric Sheep no longer focuses on rotoscoping. The product pivoted in 2024 to an agentic AI video editor for newsrooms and production companies (the Telegraph and GB News are among our newsroom customers; production studios use the same platform for post-production workflows). The original analysis below still holds for the rotoscoping question; the lesson informed how we approach narrative-aware AI video editing today. See our current product positioning for where Electric Sheep is now.
Nobody likes rotoscoping. Fortunately you won't have to for much longer.
Read below for an introduction and overview of rotoscoping, or jump to the section on how we're solving it.
Contents
The History of Rotoscoping
Rotoscoping Today
Rotoscoping Tomorrow
Beyond Rotoscoping
Final Words
How the rotoscoping work informs Electric Sheep today (2026)
FAQ
What is the history of rotoscoping?
Rotoscoping is a technique for tracing over objects frame by frame. In most modern-day use cases, it is a technical process for separating people from their backgrounds.
The term rotoscoping comes from the rotoscope, a piece of equipment invented by animator Max Fleischer in 1914 for his animation work.
Fleischer was granted a patent for this technique in 1917.
Fleischer projected a film of his brother David (a clown from Coney Island) onto a glass panel, and then traced it frame by frame. David's clown character was known as "Koko the Clown" and was the basis of the 'Out of the Inkwell' animated series Fleischer made famous with this technique.
Soon after Fleischer's patent expired in 1934, Disney began filming live actors performing scenes as a reference for character movement, then rotoscoping over the footage to make animated films.
This technique was used in animated films such as Snow White and the Seven Dwarfs (1937), Cinderella (1950), Alice in Wonderland (1951), and Sleeping Beauty (1959).
The process next evolved through Bob Sabiston, who developed the Rotoshop animation software. Rotoshop allowed 'interpolated rotoscoping' - selecting a range of hand-drawn frames and letting the software morph between them, rather than painstakingly rotoscoping each frame by hand.
Sabiston eventually went on to work on 'A Scanner Darkly', which - for good reason - is the film most commonly mentioned in discussions of rotoscoping.
The computer software aided the process, and made rotoscoping viable for achieving a creative look, but it still took a long time.
The exact timings aren't clear (as is often the case with finding specific VFX budget breakdowns), but in an interview the producer Tommy Pallotta said: "We were thinking it was going to take about 350 man hours per minute of material... and we ended up being pretty off on that... it took a lot longer."
[PLACEHOLDER STAT - replace before publish. Example pattern: "Per [VFX industry source], rotoscoping accounts for X% of post-production time on a typical VFX shot, costing studios approximately $Y per second of finished footage." Suggested source: VFX Voice, Foundry industry survey, FXPHD, or post-production trade press.]
Why is rotoscoping so time-consuming today?
Today, most rotoscoping is done for technical rather than creative reasons - for example, separating an actor so that new backgrounds or visual effects can be composited behind them.
As a result the matte needs to be perfect on every frame, and while many tools attempt to interpolate between frames, they cannot do so accurately.
When shooting against a green screen or blue screen, chroma keying is a useful place to start.
Chroma keying takes a band of colours and reduces their opacity so you can composite another layer behind them. But this is rarely - if ever - a one-stop solution.
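To make the idea concrete, here is a minimal sketch of a naive green-screen key using OpenCV and NumPy. The HSV thresholds are placeholder assumptions, and real keyers also handle spill suppression, soft edges, and semi-transparent detail such as hair - this is an illustration, not a production keyer.

```python
import cv2
import numpy as np

def naive_green_screen_key(frame_bgr: np.ndarray) -> np.ndarray:
    """Return an 8-bit alpha matte: 0 where the pixel falls in the green band, 255 elsewhere.

    Deliberately naive: the HSV thresholds below are placeholder assumptions,
    and production keyers also deal with spill, soft edges, and semi-transparent
    regions such as hair.
    """
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)

    # Placeholder green band (hue roughly 35-85, reasonably saturated and bright).
    lower_green = np.array([35, 80, 80], dtype=np.uint8)
    upper_green = np.array([85, 255, 255], dtype=np.uint8)

    green_mask = cv2.inRange(hsv, lower_green, upper_green)  # 255 where green
    alpha = cv2.bitwise_not(green_mask)                      # 255 where foreground

    # Soften the matte edge slightly so the composite doesn't look cut out.
    return cv2.GaussianBlur(alpha, (5, 5), 0)

# Usage sketch: composite the keyed frame over a new background of the same size.
# frame = cv2.imread("greenscreen_frame.png")    # hypothetical input frame
# background = cv2.imread("new_background.png")  # hypothetical background plate
# a = naive_green_screen_key(frame).astype(np.float32)[..., None] / 255.0
# comp = (frame * a + background * (1.0 - a)).astype(np.uint8)
```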
Due to the time-sensitive nature of an actual film set, it is common for the green screen not to cover the entire frame, or for shooting conditions to be suboptimal, and as a result manual matte painting and rotoscoping get pushed into post-production.
Rotoscoping can be tricky for various reasons - such as when green/blue screens don't cover the full actor or shot, when colour spill on the actor from the green screen or the environment lighting is too strong, or when a green/blue screen isn't used at all.
Rotoscoping can be done in modern applications such as Flame, Nuke, Adobe After Effects, or even non-linear editors like DaVinci Resolve; however, the process is still very manual and is often done frame by frame by VFX artists.
[PLACEHOLDER QUOTE - replace with real attributed quote from a VFX practitioner before publish. Example pattern: "[draft on AI rotoscoping]" - [Name, Title, Studio]. Suggested source: VFX Voice interviews, FXGuide, post-production trade press, or direct outreach to VFX supervisors.]
How is AI changing rotoscoping?
Ultimately, the next step in efficiency for rotoscoping - and for broader image processing in Film and TV - is using object detection and image matting techniques built on machine learning.
This is how Electric Sheep first approached the problem.
Other service offerings aimed at the professional market are delivered as plugins inside apps and are constrained by the user's hardware, which makes them slow and ineffective.
Our strategy was to use large scalable GPU compute and focus on delivering a significant quality increase.
We are also firm believers in the 2030 Cloud Vision by MovieLabs, and we want to be part of the wave of services moving applications to the media: reading directly from a cloud archive of the original camera footage (OCF) and delivering results back to the vendor, so that all vendors work off the latest version without the large-file versioning overheads and constant upload/download problems faced today.
Getting back to rotoscoping:
Generating a pixel-accurate luma or alpha matte is a complicated task for a machine, and one that can be tackled in many ways.
The complexities arise from several angles:
1. What is a person?
Recognising objects in a frame as a 'person' - and understanding that extremities such as hair and fingers must be captured in great detail - requires some sort of object detection and image segmentation. This means breaking down the image into its constituent objects (figure one below), then drawing an accurate matte around them.
The model also needs to recognise partial views of a person in frame to be matted - for example, when the camera pans and there is just a leg in shot.
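As a purely illustrative sketch of this detection-and-segmentation step - not the matting network described later in this post - here is how a stock pretrained torchvision model can be asked "which pixels are a person?". The person class index follows the PASCAL VOC label set those pretrained weights use.

```python
import torch
from PIL import Image
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet50

# Stock pretrained model used purely for illustration - not a production matting network.
PERSON_CLASS = 15  # 'person' index in the PASCAL VOC label set used by these weights

model = deeplabv3_resnet50(weights="DEFAULT").eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def person_mask(image: Image.Image) -> torch.Tensor:
    """Return a boolean H x W mask that is True where the model predicts 'person'."""
    batch = preprocess(image).unsqueeze(0)        # 1 x 3 x H x W
    with torch.no_grad():
        logits = model(batch)["out"]              # 1 x num_classes x H x W
    prediction = logits.argmax(dim=1).squeeze(0)  # H x W class indices
    return prediction == PERSON_CLASS

# mask = person_mask(Image.open("frame_0001.png").convert("RGB"))  # hypothetical frame
```

A coarse mask like this is nowhere near matte quality - it is only the starting point that the matting stages below have to refine.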
2. Extensions of a person?
Understanding a hat as an extension of a person, in the same way that a costume with a cape or a backpack is too, quickly becomes a philosophical question.
If they are sitting on a chair, is that considered part of their silhouette? What about their reflection in glass? A walking cane they lean on would be matted, but should a fence they lean on be matted too?
We are only focusing on people with this algorithm and not considering external objects, but if external objects were considered, there would also be questions about which of them count as 'important'.
3. Determining depth?
How do you define to a machine what the background is? To determine which objects are background and which are foreground, you need depth analysis, and typically some form of trimap, which divides the frame into three regions: 'foreground', 'background' and 'unknown'. Generating this from a still frame is costly and often inaccurate. Lidar could help inform it, but again depth - and by extension 'the background' - depends on the context of a shot.
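For illustration only, one simple and common way to derive a trimap once you already have a rough binary segmentation is morphological erosion and dilation. The sketch below assumes a 0/255 mask and an arbitrary band width; a production trimap would be informed by depth, motion and artist input rather than a fixed-width band.

```python
import cv2
import numpy as np

def trimap_from_mask(binary_mask: np.ndarray, band_px: int = 10) -> np.ndarray:
    """Turn a rough 0/255 foreground mask into a trimap.

    Output values: 255 = definite foreground, 0 = definite background,
    128 = unknown band around the boundary. The band width is an assumed
    parameter; real pipelines tune it per shot and per resolution.
    """
    kernel = np.ones((3, 3), np.uint8)

    sure_fg = cv2.erode(binary_mask, kernel, iterations=band_px)   # shrink inward
    near_fg = cv2.dilate(binary_mask, kernel, iterations=band_px)  # grow outward

    trimap = np.zeros_like(binary_mask)
    trimap[near_fg > 0] = 128   # everything near the object starts as 'unknown'
    trimap[sure_fg > 0] = 255   # the eroded core is definite foreground
    return trimap

# A matting model then only has to resolve the 128 (unknown) band into
# fractional alpha values, rather than re-deciding every pixel in the frame.
```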
4. Consistency between frames?
The matte needs to stay consistent between frames through events like occlusion by objects crossing the frame, drastic lighting changes, or motion blur.
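As a rough sketch of one way to encourage interframe consistency, the previous frame's matte can be warped onto the current frame with dense optical flow and blended with the current prediction. Farneback flow below is an assumption chosen for illustration, not necessarily the motion model a production system would use.

```python
import cv2
import numpy as np

def warp_previous_matte(prev_gray: np.ndarray,
                        curr_gray: np.ndarray,
                        prev_matte: np.ndarray) -> np.ndarray:
    """Warp the previous frame's matte onto the current frame via dense optical flow."""
    # Backward flow: for each pixel in the current frame, where was it in the previous frame?
    flow = cv2.calcOpticalFlowFarneback(curr_gray, prev_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = prev_matte.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    return cv2.remap(prev_matte, map_x, map_y, cv2.INTER_LINEAR)

# Blending the warped history with the per-frame prediction damps flicker and
# fills brief holes, at the risk of smearing when the flow estimate is wrong:
# smoothed = cv2.addWeighted(warp_previous_matte(prev, curr, prev_matte), 0.5,
#                            curr_matte, 0.5, 0)
```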
5. Delivering enough detail in the matte to live up to professional Film and TV VFX standards.
Matting around objects is hard, but having the fidelity required for fingers, hair, and costumes is even trickier.
Take for example the detail required for the hair strands in the matte below.
Fortunately, all of these problems are solvable. In fact, these problems can be solved in many ways with different AI models, frameworks and workflows.
We discussed the place of diffusion models in image synthesis in our last article.
For the task of rotoscoping, Electric Sheep used a Generative Adversarial Network (GAN) framework.
GANs are broken into two components: a generator that tries to produce an output, and a discriminator that tries to find flaws in it. Both improve together until the generator is good enough to fool the discriminator.
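To make that dynamic concrete, here is a toy adversarial training step in PyTorch. It is an illustrative sketch with arbitrary layer sizes and flat image vectors - not the matting architecture Electric Sheep built.

```python
import torch
import torch.nn as nn

# Toy illustration of the generator/discriminator loop - layer sizes are arbitrary assumptions.
latent_dim, image_dim = 64, 28 * 28

generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, image_dim), nn.Tanh(),
)
discriminator = nn.Sequential(
    nn.Linear(image_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1),  # a single real-vs-fake logit
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_images: torch.Tensor) -> None:
    """One adversarial step on a batch of real images flattened to (batch, image_dim)."""
    batch = real_images.size(0)
    fake_images = generator(torch.randn(batch, latent_dim))

    # Discriminator: score real images as 1 and generated images as 0.
    d_loss = bce(discriminator(real_images), torch.ones(batch, 1)) + \
             bce(discriminator(fake_images.detach()), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: produce images the discriminator scores as real.
    g_loss = bce(discriminator(fake_images), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```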
With ML and AI, an algorithm is only ever as good as the data that trains it.
In this case we tuned hyperparameters against a very carefully curated dataset of over 50,000 images, which we believed to be the largest training dataset assembled for this type of operation in this industry to date.
In the early version of our algorithm the mask was far too large and didn't have nearly enough fidelity on the person (hair etc). We fondly referred to this as the Michelin Man algorithm internally...
By version 4 we had the fidelity we needed.
This was very exciting news for us.
Along the way we also had to overcome several other problems such as delivering videos with consistent image mattes between frames.
There were frequently holes in the silhouette of the matte in the earlier versions; we hadn't solved this perfectly but we saw considerable improvements.
This is a problem we had seen in all current algorithms, and in our opinion it was one of the final hurdles to consistent video matting. The art is to build a temporally aware algorithm that can compensate between frames - intelligent enough to deal with occlusion from passing objects and with valid gaps within a silhouette - while still being robust on a single frame. Take for example this frame below of a bent elbow:
Without having an intelligent and temporally aware algorithm, it is impossible to differentiate these inputs.
What does the future of rotoscoping look like?
Aside from the desire to remove a tedious process from the media workflow, there is a grander vision, strongly aligned with virtual production: enabling much of the post-production workflow to move into game engines.
We expect post will evolve but always exist in some fashion - for both financial and practical reasons, it is too limiting to capture everything in camera on volume stages. The opportunity is to create tools that glue the virtual production workflow to post so that efficiencies carry all the way through. So we can all get back to saying "fix it in post" again, virtual production edition.
Incredibly clean mattes also become invaluable data as 2D-to-3D model technology matures, reducing the need for photogrammetry and complex object or body scans. Models could be generated from a good enough matted video (possibly with injected lidar metadata for highly accurate depth mapping, instead of relying on depth maps derived purely from optical data).
Final words
We were happy with the time savings achieved with our early testers, and grateful to all the VFX houses who joined in early on this journey.
Our initial findings were that, using our algorithm, an average-length shot (3 seconds) could be rotoscoped within a minute - a huge time saving compared to the 1-3 days traditional methods take.
Technical workflows for professional film and TV bring their own challenges with colour space conversions, industry-specific file types and formats, amongst other things.
Electric Sheep's mission has always been to allow artists to focus on being creative, and to help storytellers realise their vision.
How did the rotoscoping work inform Electric Sheep's current product?
Solving rotoscoping at scale forced us to confront the same problem that defines all useful AI in video: frame-accurate control. A rotoscoping system that is right 95% of the time is useless to a VFX house, because the 5% of bad frames have to be hand-fixed and the speed gain disappears. The work taught us that mask precision, temporal consistency, and an understanding of what a 'subject' actually is have to be solved together, not separately. Those constraints shaped the architecture we use today.
When we pivoted in 2024, that insight transferred directly to two audiences. Newsrooms (the Telegraph, GB News, and other UK and US publishers) don't ask for masks - they ask for finished clips - but the underlying problem is identical: an AI has to find the exact right moment in a long piece of footage, cut it on a frame boundary, fit it to a brand-safe template, and hand a journalist something they would have produced themselves. Production companies and post-production studios ask the inverse question - they need frame-accurate AI control of footage, mask-aware manipulation, and agentic workflows that plug into existing post pipelines without losing creative control. Both audiences depend on the same underlying capability set: 'find me the moment' semantic search across hours of footage, brand-safe templates editorial or creative leadership has signed off on, and an agentic editor that respects those constraints. The same precision discipline that drove our rotoscoping work now drives a journalist's ability to ship a finished social clip from a brief in under 10 minutes, and a production studio's ability to run agentic post-production workflows - with a human-in-the-loop approval before anything publishes.
If you want to see what that looks like in production: the Telegraph and GB News case study covers how the Telegraph went from 5 video editors to 500 self-producing journalists on flat headcount; find me the moment covers the semantic search layer in more depth; brand-locked templates for motion graphics studios covers the production-company side of template-driven editing; and Electric Sheep vs Remotion covers how the agentic editor compares to programmatic video alternatives that production teams often evaluate.
Frequently asked questions
What is rotoscoping?
Rotoscoping is a post-production technique that isolates an object (most commonly a person) from its background by drawing a precise outline around it on every frame of a video. It originated in 1914, when animator Max Fleischer invented the rotoscope (patented in 1917), a device that projected live-action footage onto glass so animators could trace it. In modern VFX, rotoscoping is used to create alpha mattes for compositing, colour grading, and visual effects. It is traditionally done frame by frame in tools like Nuke, Flame, After Effects, or DaVinci Resolve.
Why is rotoscoping so time-consuming for traditional VFX?
Rotoscoping has to be pixel accurate on every single frame because any imperfection becomes visible the moment the matte is composited against a new background. Hair, fingers, motion blur, lighting spill from green screens, and occlusion (objects passing in front of the subject) all force the artist to make manual decisions per frame. Producer Tommy Pallotta estimated that A Scanner Darkly was budgeted at around 350 man hours per minute of finished material, and even that turned out to be optimistic. A typical 3-second VFX shot has historically taken between one and three days to rotoscope by hand.
How is AI changing rotoscoping?
Machine learning models, in particular object detection plus image matting networks (often built on GAN or transformer architectures), can now generate a per-frame alpha matte automatically. The hard problems are still the same ones VFX artists have always faced: deciding what counts as part of the subject (hat? cane? reflection?), maintaining temporal consistency between frames, and producing enough fidelity around hair and fingers to meet broadcast standards. AI doesn't remove these problems, but on a curated training set of tens of thousands of images, modern models can collapse a 1-3 day shot into roughly a minute of compute.
What does the future of rotoscoping look like?
The trajectory points toward fully automated, temporally aware matting that runs on cloud GPUs against the original camera footage, rather than as a plugin running locally inside an editor. MovieLabs' 2030 Cloud Vision describes the shift: applications move to the media instead of media moving to applications. Combined with virtual production and 2D-to-3D model generation, clean alpha mattes become training data for the next generation of post-production tools, reducing the need for photogrammetry and body scans.
Does Electric Sheep still do rotoscoping?
No. Electric Sheep pivoted in 2024 to an agentic AI video editor for newsrooms and production companies. The technical insight from rotoscoping work (frame-accurate AI control of video) informed the current product. Today Electric Sheep enables newsrooms like the Telegraph and GB News to scale social video output 100x on flat headcount, and production companies to run agentic post-production workflows with brand-safe templates and human-in-the-loop control. We do not currently sell a standalone rotoscoping product.
What does Electric Sheep do now?
Electric Sheep (Neo Edit) is an agentic AI video production platform for newsrooms, production companies, broadcasters, and marketing teams. It plugs into existing MAM stacks, enforces brand-safe templates editorial leadership defines, and gives a journalist or producer the ability to ship a finished clip from a brief in under 10 minutes with a human-in-the-loop approval before publish. The Telegraph scaled from 5 editors to 500 self-producing journalists using this workflow; production studios use the same platform for agentic post-production.
What kind of customers does Electric Sheep work with today?
Newsrooms (the Telegraph and GB News are flagship customers, alongside other UK and US publishers), production companies and post-production studios, broadcasters, and marketing teams. The common thread is teams that need to ship branded video at scale without losing creative control.
