Generative video in 2026: a field guide

The models change weekly. A tool that failed last month is good enough today, and any leaderboard you memorize will be wrong by the time you finish reading. So this is not a ranking. It is a map: how to think about generative video right now, which models are worth testing, and the part of the work that no model has taken.

Adapted from a talk at Bezalel. I have spent about fifteen years making video, and I now build Tonpit.

01 · First principle

There is no best model

The first thing to unlearn is the search for the single best model. Different models have different taste, motion, defaults, constraints, and failure modes. One renders skin beautifully and falls apart on fast motion. Another nails camera movement and mangles text. The honest answer to "which one should I use" is almost always: test a few on this project, for this shot.

So pick per shot, not per subscription. And when something fails, try it again in a month. The pace of improvement means a tool that disappointed you is often quietly fixed by the next release.
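Picking per shot can be sketched as a small test matrix. This is a minimal sketch, not a tool anyone ships: the function names, the placeholder model names, and the 0 to 10 rating scale are all assumptions for illustration. The rating is your own judgment after watching the takes.

```python
from itertools import product


def build_test_matrix(shots, models, takes=2):
    """Return one job per (shot, model, take) so each shot is judged on its own."""
    return [
        {"shot": shot, "model": model, "take": take}
        for shot, model in product(shots, models)
        for take in range(1, takes + 1)
    ]


def pick_per_shot(scores):
    """scores maps (shot, model) to your own rating; return the top model per shot."""
    best = {}
    for (shot, model), score in scores.items():
        if shot not in best or score > best[shot][1]:
            best[shot] = (model, score)
    return {shot: model for shot, (model, _) in best.items()}


# Hypothetical shots and placeholder model names, not real products.
jobs = build_test_matrix(["close-up", "chase"], ["model_a", "model_b"], takes=2)
picks = pick_per_shot({
    ("close-up", "model_a"): 8,
    ("close-up", "model_b"): 6,
    ("chase", "model_a"): 3,
    ("chase", "model_b"): 7,
})
```

Keeping the scores keyed by shot is the point: a model that loses the chase can still win the close-up, and the same table is what you rerun a month later.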

02 · How to work

Three ways in

Every generation, no matter the tool, starts from one of three modes. The more of yourself you put into the starting point, the less you leave to luck.

Text to image / video

Describe the result and let the model invent the visual structure. Most freedom, least control.

Image to image / video

Start from a frame, reference, sketch, or design. You set the look; the model moves it.

Video to video

Use existing motion, camera, timing, or performance as the control signal.

Inside those modes, generation has many shades. These are the dials worth knowing:

References. Style, character, product, composition, or mood. The strongest, and the most legally sensitive, lever you have.
Start and end frames. Give the model a first frame, a last frame, or both, and let it fill the motion between.
Motion cues. Guide camera movement, character action, rhythm, and timing.
Parameters. Aspect ratio, duration, seed, strength, resolution, variants.
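The modes and dials above can be written down as one request object. This is a hedged sketch: GenerationSpec and every field name in it are hypothetical, not any vendor's schema, and the 0 to 1 range for strength is an assumption. Real tools name and scale these differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GenerationSpec:
    """Hypothetical container for one generation request."""
    prompt: str
    references: List[str] = field(default_factory=list)  # style, character, product, mood
    start_frame: Optional[str] = None    # path to a first frame
    end_frame: Optional[str] = None      # path to a last frame
    source_video: Optional[str] = None   # existing footage used as the control signal
    motion_cue: str = ""                 # e.g. "slow push-in"
    aspect_ratio: str = "16:9"
    duration_s: float = 5.0
    seed: Optional[int] = None           # fix it to make a take repeatable
    strength: float = 0.5                # assumed 0..1: how far to move from the input
    resolution: str = "1080p"
    variants: int = 1

    def mode(self) -> str:
        """Classify the request into one of the three ways in."""
        if self.source_video:
            return "video-to-video"
        if self.start_frame or self.end_frame:
            return "image-to-video"
        return "text-to-video"


spec = GenerationSpec(prompt="rain on a tin roof", start_frame="frames/roof.png")
```

Writing the spec down before generating makes it obvious how much of the starting point is yours and how much is left to the model.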

03 · The landscape

A snapshot of the field

Here is the working map as of mid-2026. Read the names as a snapshot and the categories as the durable part. By the time you act on this, a version number or two will have moved. The split that matters more than any single model is closed (convenient, hosted) versus open (controllable, yours to run).

Image models.

Closed / API (4)

GPT Image 2 (OpenAI). Best overall image quality and editing.
Gemini Flash Image / Nano Banana (Google). Fast editing, text, infographics.
Seedream 4.0 (ByteDance). Strong generation and editing.
Midjourney v7. Aesthetic, artistic output.

Open weights (3)

HiDream. Strong open image tier.
FLUX.2 (BFL). Practical open ecosystem.
Qwen Image (Alibaba). Text and structured images.

Video models.

Closed / API (5)

Veo 3.1 (Google). Reliable production and API video.
Seedance (ByteDance). Top benchmark quality, limited access.
Kling 3.0 (Kuaishou). Motion, control, creator workflows.
Runway Gen-4.5 (Runway). Pro creator and studio workflow.
Sora 2 (OpenAI). Available via API.

Open weights (2)

LTX-2 (Lightricks). Best open video-with-audio tier.
HunyuanVideo / Wan (Tencent / Alibaba). Open ecosystem options.

Language models. Easy to forget in a visual workflow, but this is the layer that turns AI from a generator into a system: a collaborator, a search tool, and a way to build small tools around your own process.

Closed / API (4)

GPT-5.x (OpenAI). All-around, coding, agents.
Claude Opus (Anthropic). Coding, writing, long reasoning.
Gemini Pro / Flash (Google). Long context, multimodal, agentic.
Grok (xAI). Fast, cheap, live/X-connected.

Open weights (4)

DeepSeek (DeepSeek). Strong open-weight general tier.
Qwen3 (Alibaba). Practical open coding and agents.
Gemma (Google). Smaller, local, edge use.
gpt-oss (OpenAI). Open reasoning, local specialization.

04 · Open vs closed

Control or convenience

The trade is simple once you name it. Open weights buy you control: specialization, pipeline access, privacy, the ability to run and tune the model yourself. The cost is setup, hardware, and maintenance. Closed models buy you convenience: better UX, managed infrastructure, faster starts. The cost is lock-in and less say over what happens inside. Neither is virtuous. Pick the one that fits the job.

05 · Where to use them

The same model, different surfaces

A model is not a website. The same model shows up in three kinds of places, each with a different reason to choose it.

Aggregators

Many models behind one interface or API: fal, Replicate, Krea.

Native apps

The maker's own polished surface: Runway, Kling, Midjourney.

Developer APIs

For building it into your own product or pipeline: OpenAI, Google AI Studio, fal.
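If you do build a model into your own pipeline, one way to keep "a model is not a website" true in code is a thin interface between the pipeline and the surface. This is a sketch under stated assumptions: VideoBackend, FakeAggregator, FakeLocalModel, and render_shot are hypothetical names, and both backends are stand-ins that return a string instead of calling any real service.

```python
from typing import Protocol


class VideoBackend(Protocol):
    """Anything that can turn a prompt into a result, wherever the model lives."""
    name: str

    def generate(self, prompt: str) -> str: ...


class FakeAggregator:
    """Stand-in for an aggregator client; a real one would call a hosted API."""

    def __init__(self, model: str):
        self.name = f"aggregator:{model}"

    def generate(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class FakeLocalModel:
    """Stand-in for open weights running on your own hardware."""
    name = "local:open-weights"

    def generate(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


def render_shot(backend: VideoBackend, prompt: str) -> str:
    """The pipeline depends only on the interface, never on one surface."""
    return backend.generate(prompt)


hosted = render_shot(FakeAggregator("model_a"), "rain on a tin roof")
local = render_shot(FakeLocalModel(), "rain on a tin roof")
```

Swapping the surface then means swapping one object, which matters when the model you rely on moves or a better one appears somewhere else.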

06 · Working practice

Prompting is not a magic formula

There is no universal prompt. Different situations need different prompting, and the first result is rarely the final one. The real skill is reading how the model interpreted you and adjusting, the same way you would redirect a collaborator. One prompt will not save you. Iteration is the work, not the obstacle.

Output quality follows input quality.

The model amplifies your brief, your taste, your references, your constraints, your clarity. Give it mush and it returns polished mush. That is also why the model deserves suspicion, not just trust. A few patterns to watch for:

It flatters

It tends to lift your work up rather than challenge it. Praise is not feedback.

It compromises

It may pick the locally convenient answer over the objectively better one.

It overgeneralizes

Thin evidence gets presented as broad truth. Check the sources behind a confident claim.

It solves the wrong problem

It will patch a symptom, or answer the question you asked instead of the one you meant.

07 · What stays human

The idea stays in the center

Here is the part I am most sure of, and it runs against the loudest version of the hype. The means of production are getting radically cheaper. That does not make the maker less important. It makes the reason to make something more important, because everyone now has the means.

Story is not solved by scale. Good story, humor, timing, taste, and emotional nuance still depend on things we understand mostly by being alive. What changes is the form the human signature takes:

More access. A strong idea can reach production without the old budgets and crews.
Input still decides. Taste, references, constraints, and judgment are the difference between two people using the same model.
Editing becomes central. When generating is cheap, selection (deciding what is good and what to cut) becomes the creator's main tool.

A concrete bet, so this ages honestly: by the end of 2026 I expect to sit in a cinema and watch a feature that was AI-generated but human-written and human-edited. I could be wrong on the timing. I do not think I am wrong on the shape.

The tools change. The work is still making choices.

Written by itay rose ari, builder and creative director, founder of Tonpit. Fifteen years in video across writing, directing, editing, motion, and sound. roseari.com

This is a living snapshot. The model names will date faster than the ideas around them, so treat the categories as the lasting part.