Editing video by prompt: how natural language becomes a frame-accurate cut
Short answer: Prompt-driven video editing turns a sentence like "make me a 45-second highlight reel" into a real, frame-accurate cut on your existing footage. At Electric Sheep, the agent reads your media, plans the edit, calls tools in parallel, snaps every clip to the nearest word, and hands you a change list to approve before anything commits to the timeline.
"Editors using prompt-based assembly cut average highlight-reel turnaround from 1.5 hours to 15 minutes across 10,000 reels." - Electric Sheep data analysis on over 10,000 videos edited through the platform
What does "edit by prompt" actually mean?
Most video tools take a prompt and generate a clip. That is not what we do. The agent does not hallucinate footage. It edits the footage you already have - your rushes, your archive, your wire feed - and treats the prompt as a brief, the same way you would brief a junior editor.
You talk to the agent like a colleague. You say what you want. The agent makes a plan, executes it against your media, and shows you the result before saving. You stay in charge of the timeline; the agent does the scrubbing, the marking, the assembly, and the boring bits.
How does the agent go from prompt to plan to tool calls to a frame-accurate edit?
Under the hood, the agent runs a LangGraph loop governed by a team of expert editorial prompts. Every prompt you send goes through the same four-step shape: plan, execute, verify, present. None of those steps is skippable, and that is the point.
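Before stepping through them, here is that shape as a minimal sketch - not our production graph; the node names, state fields, and stub bodies are illustrative:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class EditState(TypedDict):
    brief: str        # the user's prompt
    plan: list[str]   # discrete tasks derived from the brief
    results: dict     # tool-call outputs, keyed by task
    valid: bool       # set by the verify step

def plan_step(state: EditState) -> dict:
    # A real planner decomposes the brief with an LLM call.
    return {"plan": ["find goals", "find celebrations", "sequence", "add lower thirds"]}

def execute_step(state: EditState) -> dict:
    # Batched tool calls go here; stubbed for the sketch.
    return {"results": {task: "done" for task in state["plan"]}}

def verify_step(state: EditState) -> dict:
    return {"valid": all(v is not None for v in state["results"].values())}

def present_step(state: EditState) -> dict:
    return {}  # surface the pending change list for human approval

g = StateGraph(EditState)
for name, fn in [("plan", plan_step), ("execute", execute_step),
                 ("verify", verify_step), ("present", present_step)]:
    g.add_node(name, fn)
g.add_edge(START, "plan")
g.add_edge("plan", "execute")
g.add_edge("execute", "verify")
# A failed verification loops back to planning; a pass goes to review.
g.add_conditional_edges("verify", lambda s: "present" if s["valid"] else "plan")
g.add_edge("present", END)
app = g.compile()
```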
Step one is the plan. The agent breaks your brief into discrete tasks - find the goals, find the celebrations, find the manager reaction, build the order, add lower thirds, set the aspect ratio. The plan is visible. You can interrupt it.
Step two is the tool calls. The agent has roughly twenty tools grouped into media search, semantic search, timeline edit, history, validation, and worker delegation. Crucially, independent tool calls go out in parallel - every LLM turn costs 15 to 30 seconds, so batching is not an optimisation; it is the difference between an edit that feels alive and one that drags.
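To see what batching buys, here is a toy sketch - the tool names and latencies are made up - of three searches sharing one turn:

```python
import asyncio, time

async def tool(name: str, latency_s: float) -> str:
    await asyncio.sleep(latency_s)  # stand-in for a real tool call
    return f"{name}: done"

async def main():
    start = time.perf_counter()
    # Sequential calls would cost the sum of the latencies;
    # one batched turn costs roughly the slowest of them.
    results = await asyncio.gather(
        tool("find_goals", 1.0),
        tool("find_crowd_energy", 1.0),
        tool("find_manager_quote", 1.0),
    )
    print(results, f"{time.perf_counter() - start:.1f}s")  # ~1.0s, not ~3.0s

asyncio.run(main())
```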
Step three is the frame-accurate edit. The agent never guesses where a clip starts or ends. It reads word-level audio timings and snaps each boundary from milliseconds to frames, on every clip. If the brief says "end on the manager's reaction", the cut lands on the last word of the manager's sentence, not half a syllable into the next one.
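The snap itself is simple arithmetic. A sketch, assuming word timings arrive as millisecond offsets and the timeline runs at a fixed frame rate (25 fps here is illustrative):

```python
def snap_out_point(word_end_ms: int, fps: float = 25.0) -> int:
    """Frame index for a cut that ends exactly on a word boundary."""
    # Floor rather than round up: the cut must never extend past the
    # verified word boundary, or it bleeds into the next syllable.
    return int(word_end_ms / 1000.0 * fps)

# "GOAL!" ends at 1,234,567 ms -> frame 30,864 at 25 fps
print(snap_out_point(1_234_567))  # 30864
```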
Step four is verify. Before the agent presents anything, it runs validation - duration against the brief, clip boundaries against word timings, aspect ratio against the target - so the cut you see already passes the agent's own checks.
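A sketch of what those checks can look like - the specific rules and tolerances are illustrative, not our validator:

```python
def validate_cut(clips: list[dict], target_s: float, aspect: str) -> list[str]:
    """Return a list of problems; an empty list means the cut passes."""
    problems = []
    total_s = sum(c["out_ms"] - c["in_ms"] for c in clips) / 1000.0
    if abs(total_s - target_s) > 1.0:  # tolerance is illustrative
        problems.append(f"runtime {total_s:.1f}s misses target {target_s}s")
    for c in clips:
        if c["out_ms"] <= c["in_ms"]:
            problems.append(f"clip {c['id']} has zero or negative duration")
    if aspect not in {"16:9", "9:16", "1:1", "4:5"}:
        problems.append(f"unsupported aspect ratio {aspect}")
    return problems
```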
Worked example: how does a 45-second highlight reel get assembled?
Imagine you have uploaded ninety minutes of match footage with commentary and a few sideline interviews. You type one line into the chat: "make a 45-second highlight reel from this match, vertical for TikTok, with our house lower thirds and a punchy hook in the first three seconds."
Plan.
The agent breaks that into tasks: find moments of high crowd energy, find every goal, find the manager post-match quote, sequence chronologically, add intro hook, add lower thirds with player names, target 9:16, validate against the 45-second duration, prepare for reframe.
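Rendered as data - the task and field names here are illustrative, not the agent's actual schema - that plan might look like:

```python
plan = [
    {"task": "find_moments", "query": "high crowd energy"},
    {"task": "find_moments", "query": "every goal"},
    {"task": "find_moments", "query": "manager post-match quote"},
    {"task": "sequence", "order": "chronological"},
    {"task": "add_hook", "window_s": 3},
    {"task": "add_overlay", "template": "house lower thirds"},
    {"task": "set_aspect", "ratio": "9:16"},
    {"task": "validate", "target_duration_s": 45},
    {"task": "prepare_reframe", "outputs": ["9:16", "4:5", "1:1"]},
]
```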
Tool calls in parallel.
In a single turn, the agent runs its searches across all five clips at once - crowd energy, goals, and the manager quote come back together rather than one turn at a time.
Frame-accurate edit.
Each candidate moment comes back with millisecond precision, so a goal call ends precisely on the last syllable of "GOAL!" and not in the middle of the crowd noise.
Assembly.
The agent appends each clip to a new timeline tagged 9:16, drops the lower-third template from your brand profile onto an overlay row, and places the hook clip in the first three seconds. Then it validates the cut and tightens any clip that pushes the runtime over the 45-second target.
Reframe.
The agent then submits the timeline to our reframe service, which subject-tracks each shot - players, ball, manager - and outputs a per-shot crop matrix for 9:16, 4:5, and 1:1. You get platform edits that stay on the action, not on whatever happened to sit in the centre of the original 16:9 frame.
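A toy version of the per-shot crop, assuming the tracker has already returned a subject centre for the shot; the real service also tracks vertically and smooths the crop per frame:

```python
def crop_for_aspect(src_w: int, src_h: int, subject_x: float,
                    target_w: int, target_h: int) -> tuple[int, int, int, int]:
    """Return (x, y, w, h) of a target-aspect crop centred on the subject.

    Assumes a horizontal crop from a wider source, e.g. 16:9 -> 9:16.
    """
    crop_h = src_h
    crop_w = round(src_h * target_w / target_h)
    # Centre on the subject, then clamp so the crop stays inside the frame.
    x = round(subject_x - crop_w / 2)
    x = max(0, min(x, src_w - crop_w))
    return (x, 0, crop_w, crop_h)

# Ball at x=1400 in a 1920x1080 frame, cropped to 9:16:
print(crop_for_aspect(1920, 1080, 1400, 9, 16))  # (1096, 0, 608, 1080)
```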
"Prompt editing changed how my team scopes a reel - we brief in a sentence and review in a change list, instead of building from scratch." Anthony Steward - JHT group.
How does approval work before edits commit?
The agent does not write directly to your timeline. Edits land in a pending queue of operations - and you approve, reject, or modify them before they commit. This is the non-negotiable mechanism behind "human in the loop". It is not a slogan; it is a structural pattern in the app.
If you accept, the agent saves a named version. If you reject, the change list is discarded and the timeline is untouched. If you say "good, but tighten the manager quote and add a 1-second freeze on the goal", the agent reads your review comments and runs the loop again - same plan-execute-verify shape, smaller scope.
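In spirit - the types and names here are illustrative, not the app's API - the pending change list is an explicit commit gate:

```python
from dataclasses import dataclass, field

@dataclass
class Operation:
    kind: str      # e.g. "append_clip", "trim", "add_overlay"
    payload: dict

@dataclass
class PendingChangeList:
    operations: list[Operation] = field(default_factory=list)

    def review(self, timeline: list[Operation], approved: bool) -> list[Operation]:
        """Human approval is the commit step: approved operations apply;
        a rejection discards them and leaves the timeline untouched."""
        if approved:
            timeline.extend(self.operations)  # stand-in for applying each edit
        self.operations.clear()
        return timeline
```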
Why does this work at newsroom scale?
Because the heavy lifting is mechanical: semantic search finds the right moment in an archive of any size, and reframing turns one approved cut into every platform format. If you want the longer story on how a sentence becomes a clip, see "how AI video reframing actually works" and "find me the moment: semantic search across video archives." They are the two halves that make prompt editing land on the right frame, every time.
Frequently asked
What does "edit video by prompt" mean at Electric Sheep?
It means you brief the agent in natural language and it produces a frame-accurate edit on your existing footage. The agent plans, calls tools in parallel, lines up clips, then hands you a change list to approve before anything commits to the timeline.
Does the agent generate footage from the prompt?
No. Prompt-driven video editing at Electric Sheep edits your rushes, archive, or wire feed. Generative footage is a separate capability. For prompt editing, the agent is finding, ordering, and trimming what you already have.
How does the agent stay frame accurate?
Every clip boundary is measured against a real word or visual action; the agent is instructed never to extend a clip beyond a verified word boundary, which prevents accidental phrase truncation.
What does the agent do during planning?
It breaks your brief into discrete tasks (find moments, sequence them, add overlays, set aspect ratio, validate duration), batches independent tool calls into a single turn, and shows the plan so you can interrupt or redirect it before any tool fires.
How does approval work before edits commit?
Edits land in a pending change list, not directly on your timeline. You accept, reject, or amend each operation. Timeline history supports unlimited undo steps and named checkpoints, so any change is reversible.
Can the agent edit without a human in the loop?
No. The change-list pattern is structural, not a setting. Even when the agent self-corrects after a review comment, it produces a fresh change list for you to approve. Human approval is the commit step.
Why parallel tool calls?
Every LLM turn costs roughly 15 to 30 seconds. The agent batches independent searches, validations, and edits into a single turn so a 45-second highlight reel does not take five minutes to assemble.
What about house style and brand safety?
Brand profiles enforce templates, colours, and safe zones. Editor-defined rule constraints sit on top of the agent's output. All AI-assisted edits are logged with full audit traceability, and the platform supports SSO, role-based access controls, configurable data residency, and clear data processing agreements.
Does prompt editing work for vertical, square, and landscape at the same time?
Yes. Each timeline carries its own aspect ratio. On save, the reframe service subject-tracks each shot and outputs per-aspect crop coordinates for 16:9, 9:16, 1:1, and 4:5, so one prompt produces multiple platform-native cuts.
