Deep Dive · Reframing

How does AI-powered video reframing actually work?

AI video reframing rebuilds a 16:9 master as a 9:16, 1:1 or 4:5 cut by analysing each shot, locating the subject, then choosing one of four reframe methods - crop, pan with subject tracking, blur letterbox, or stack split-screen - per segment. Electric Sheep runs that analysis with our Agent Vision on every shot, persists the result per aspect ratio, and lets the editor override any keyframe by hand. The output is a platform-native edit, not a dumb centre-crop.

Why centre-crop ruins your edit

Most "auto-reframing" tools take a 16:9 master, slice the middle out, and call it 9:16. That works for talking-head footage where the speaker happens to be dead-centre. It fails the moment your subject moves, the action is on the right of frame, or there is on-screen text - chyrons, lower-thirds, OCR captions - outside the centre band.

The newsroom version of this problem is brutal. A VOSOT cut for 16:9 broadcast might place the reporter on the left third with a wide street behind them. Centre-crop that to 9:16 and you lose the reporter entirely. Centre-crop a football clip and the ball leaves frame on every pass. Centre-crop a presenter standing next to a chart and the chart vanishes. Producers end up re-editing the same package three or four times - once per platform - and the social team becomes a bottleneck. Other tools are simple face trackers, so the moment someone walks past on screen, or the shot becomes ambiguous, the track is lost. Electric Sheep's Agent Vision has contextual understanding of every frame: it focuses on whoever is speaking or is most contextually important in that moment, the way a human editor would.
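To make the geometry concrete, here is a minimal sketch of the centre-crop arithmetic (plain Python, illustrative frame sizes; not Electric Sheep's actual code):

    def centre_crop_window(src_w: int, src_h: int, target_ar: float) -> tuple[int, int]:
        # Width of a crop at the target aspect ratio, cut from the middle.
        crop_w = round(src_h * target_ar)   # 1080 * 9/16 = 608 px
        x0 = (src_w - crop_w) // 2          # 656
        return x0, x0 + crop_w              # (656, 1264)

    x0, x1 = centre_crop_window(1920, 1080, 9 / 16)
    reporter_x = 320   # reporter framed on the left third of a 1920 px master
    print("kept" if x0 <= reporter_x <= x1 else "lost")   # -> lost

A 9:16 window cut from the middle of a 1920x1080 master is only 608 pixels wide; anything outside that centre band - the reporter on the left third, the ball on a long pass - is simply discarded.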

NARRATIVES NOT MOMENTS
Find narratives, not moments. The agent reframes per shot, not per video - because the subject of shot one is rarely in the same place as the subject of shot four.

The four reframe methods (and when each one wins)

Electric Sheep ships four named reframe methods. The agent picks the right one per segment based on what Gemini Vision sees in the shot; a rough data-model sketch follows the list.

CROP
A static reframe to a region of the source frame. Best when the subject is stationary and well-placed - a locked-off interview two-shot, a presenter at a desk, a product on a table. Cheapest and cleanest.
PAN
Motion keyframes that follow the subject across the frame. Faces, bodies, balls, vehicles - the agent generates a smooth path so a 16:9 wide becomes a 9:16 follow-shot without losing the action. Used for sports, walk-and-talks, and any clip where the subject moves more than a third of frame width.
BLUR
Source frame is scaled to fit, with a blurred copy of itself filling the bars. Used when the composition truly cannot be cropped - wide group shots, sweeping landscape b-roll, branded graphics that need to stay whole. Ugly when overused, essential when nothing else preserves meaning.
STACK
Two regions of the source stacked vertically - for example, a 50/50 split with the presenter on top and the slide they are referencing below. Used for explainer content and reaction edits where the relationship between two parts of frame is the point. Custom layouts are also supported - picture-in-picture, T-crop, triple stack, and so on.
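As a rough sketch of how these four methods might be modelled per segment (hypothetical type names, not our actual schema; Python 3.10+):

    from dataclasses import dataclass

    @dataclass
    class Crop:                     # static region, normalised 0..1 coordinates
        x: float; y: float; w: float; h: float

    @dataclass
    class Pan:                      # motion path following the tracked subject
        keyframes: list[tuple[float, float, float]]   # (time_s, centre_x, centre_y)

    @dataclass
    class Blur:                     # scale to fit, blurred self-copy fills the bars
        blur_radius: int = 40

    @dataclass
    class Stack:                    # two regions stacked vertically, e.g. 50/50
        top: Crop
        bottom: Crop
        split: float = 0.5          # fraction of output height for the top region

    ReframeMethod = Crop | Pan | Blur | Stack

    @dataclass
    class SegmentReframe:
        shot_id: str
        aspect: str                 # "9:16", "1:1", "4:5" or "16:9"
        method: ReframeMethod

The point of the tagged-union shape is that every segment carries exactly one method, and a renderer can branch on the type without guessing.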

What happens under the hood, per segment

When a timeline saves, the reframe service receives the exact frame ranges of every video overlay - verified down to the word boundary by the audio timing tool, not loose scene boundaries. Agent Vision then analyses each segment and returns crop coordinates, motion keyframes, or split-screen regions for every requested aspect ratio.
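A simplified sketch of that per-segment loop (the function and client names here are illustrative, not the service's real API):

    ASPECTS = ("9:16", "1:1", "4:5")

    def analyse_timeline(segments, vision):
        # For each overlay segment - exact frame ranges, not loose scene
        # boundaries - ask the vision model for one decision per aspect ratio.
        decisions = {}
        for seg in segments:
            frames = seg.sample_frames()      # representative frames for analysis
            for aspect in ASPECTS:
                # vision.reframe(...) stands in for the Agent Vision call; it
                # returns crop coordinates, motion keyframes or stack regions.
                decisions[(seg.shot_id, aspect)] = vision.reframe(frames, aspect)
        return decisions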

"Reframe analysis runs in under 60 seconds per shot on a typical newsroom package, vs 7 minutes per video for manual re-cutting." Electric Sheep rollout benchmarks 2026.

The result is stored per aspect ratio. A single source clip can carry one set of reframe parameters for 9:16 TikTok, another for 1:1 Instagram, another for 4:5 Reels, and the original 16:9 untouched. Switching the timeline between aspect ratios is a metadata flip - no re-render, no re-analysis. The editor sees a real-time multi-aspect preview while they work.
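Because the stored result is plain metadata, flipping a timeline to another aspect ratio is roughly a keyed lookup (a sketch, assuming the decisions map from the previous example):

    def switch_aspect(timeline, decisions, aspect: str):
        # No re-render, no re-analysis: each clip just points at the reframe
        # parameters already stored for the requested aspect ratio.
        for clip in timeline.clips:
            clip.active_reframe = decisions[(clip.shot_id, aspect)]
        timeline.aspect = aspect    # the multi-aspect preview updates in real time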

WHAT'S MORE
A four-shot package gets four independent reframe decisions. Shot one might pan-track a reporter; shot two might crop tight on a chart; shot three might letterbox a wide crowd; shot four might stack a graph above a quote. One video, four methods, no compromises.

Human override on every keyframe

The agent proposes; the editor disposes. Every reframe decision is exposed in a dedicated reframe editor with a timeline UI for adjusting timing and coordinates. Drag the crop box. Move a tracking keyframe. Switch a shot from pan to stack. Change the safe zone for a lower-third. Nothing is locked.

This is the human-in-the-loop pattern applied to reframing specifically: the AI does the boring 80% - every shot, every aspect ratio, every keyframe - and the editor spends their time on the 20% that actually needs taste. It is the same loop our agent uses for prompt-driven cuts: change-list approval before commit, undo on every step, full audit log of who changed what and when. (See human-in-the-loop in production AI for the wider pattern.)

Aspect ratios, safe zones, and the captions problem

Reframing only matters if captions, lower-thirds and brand graphics survive the move. The reframe service receives every text overlay's safe-zone offsets - for example, bottom 10% for subtitles - and rescales them per platform. A caption sized for 16:9 gets re-positioned for 9:16 so it sits inside the platform-safe area, above the TikTok UI, below the YouTube Shorts progress bar.
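A hedged sketch of that safe-zone arithmetic (the 10% subtitle zone comes from the example above; the per-platform margins here are illustrative, not our shipped values):

    # Bottom margin reserved for platform UI, as a fraction of output height.
    SAFE_BOTTOM = {"16:9": 0.05, "9:16": 0.18, "1:1": 0.08, "4:5": 0.10}

    def caption_baseline(aspect: str, out_h: int, subtitle_zone: float = 0.10) -> int:
        # The caption block sits above whichever is larger: the template's
        # subtitle zone or the platform-safe margin for this aspect ratio.
        margin = max(subtitle_zone, SAFE_BOTTOM[aspect])
        return round(out_h * (1.0 - margin))

    print(caption_baseline("16:9", 1080))   # 972  - near the bottom of the frame
    print(caption_baseline("9:16", 1920))   # 1574 - lifted clear of the TikTok UI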

The supported targets are the four that matter for social and broadcast: 16:9 (landscape), 9:16 (vertical), 1:1 (square), 4:5 (portrait). Per-platform safe zones, hook windows and caption styles are applied at render. (See per-platform video optimisation for what video style each platform wants.)

REFRAME METHODS: CROP, PAN WITH SUBJECT TRACKING, BLUR LETTERBOX, STACK - PICKED PER SHOT, NOT PER VIDEO

What this unlocks for a real newsroom

Pete Fergusson, formerly Head of Commercial Video at The Telegraph, put it directly: "Electric Sheep understands story-based editing at a level that genuinely surprised me." Reframing is one of the reasons. When five editors expand to 500 self-producing journalists, you cannot afford a workflow where each platform export is a manual re-edit. Per-segment subject-tracked reframing is what makes one source video viable for every social channel without a producer in the loop.

"Per-shot reframing is the difference between shipping one platform a day and shipping every platform at once."

75% INCREASE IN OUTPUT VOLUME / REDUCTION IN DELIVERY TIME ACROSS ENTERPRISE NEWSROOM ROLLOUTS

And because reframe parameters are stored per aspect ratio, the social team can revisit a year-old story, flip the timeline to 9:16, and republish for a new trend window - without re-cutting from rushes. That is the archive monetisation play.


Frequently asked

Is this just a centre-crop with extra steps?
No. Centre-crop is one possible outcome of the "crop" method when Agent Vision finds the subject in the centre. Most shots are not centred, so the agent picks a different region - or switches to pan-tracking, blur, or stack. The decision is per segment.

Which aspect ratios do you support?
16:9 (landscape), 9:16 (vertical), 1:1 (square) and 4:5 (portrait). Each is stored independently against the same source, so you can render any subset without re-analysing.

Can the editor override a tracked subject?
Yes. Every reframe is exposed in a dedicated reframe editor with timeline UI for timing and coordinates. Editors can move keyframes, redraw crop boxes, switch methods or lock a shot to manual.

How does it handle on-screen text and lower-thirds?
Text overlays are repositioned using safe-zone offsets defined in the template - for example bottom 10% for subtitles - so captions stay inside the platform-safe area on every aspect ratio.

Does it re-render for every aspect ratio?
No. Reframe parameters are stored as metadata per aspect ratio. Switching aspect ratios in the editor is instant. Renders are batched in parallel per platform when you export.

Where do face detection and OCR live?
Face detection runs inside the reframe service for subject tracking - it is not exposed as a general agent tool, by design. OCR of on-screen text is captured at scene-analysis time and used to keep chyrons inside the cropped region.

Is my footage used to train your models?
No training on customer data. Standard enterprise practice with clear DPAs, full audit logging and traceability of all AI-assisted edits, and configurable data residency for regulatory compliance.

How is this different from an Opus-Clip-style reframer?
Opus-Clip-class tools score whole clips for virality and apply one reframe per output. Our agent reframes per shot inside an edit and lets the editor override every keyframe. (See story-arc detection vs clip generation.)