ControlVideo: Training-free Controllable Text-to-Video Generation
Outline
Video visualizations
ControlVideo on depth maps
"A charming flamingo gracefully wanders in the calm and serene water, its delicate neck curving into an elegant shape."
"A striking mallard floats effortlessly on the sparkling pond."
"A gigantic yellow jeep slowly turns on a wide, smooth road in the city."
"A sleek boat glides effortlessly through the shimmering river, van gogh style."
"A majestic sailing boat cruises along the vast, azure sea."
"A contented cow ambles across the dewy, verdant pasture."
ControlVideo on canny edges
"A young man riding a sleek, black motorbike through the winding mountain roads."
"A white swan moving on the lake, cartoon style."
"A dusty old jeep was making its way down the winding forest road, creaking and groaning with each bump and turn."
"A shiny red jeep smoothly turns on a narrow, winding road in the mountains."
"A majestic camel gracefully strides across the scorching desert sands."
"A fit man is leisurely hiking through a lush and verdant forest."
ControlVideo on human poses
"James bond moonwalk on the beach, animation style."
"Hulk is jumping on the street, cartoon style."
"Goku in a mountain range, surreal style."
"A man, wearing pink clothes, moonwalk at sunset."
"The Simpsons in the city, Hockney style."
"Wonder Woman in a desert, Pop Art style."
Long video generation
"A steamship on the ocean, at sunset, sketch style."
"Hulk is dancing on the beach, cartoon style."
"An airplane flying on the grasslands."
"Towers on grasslands, cartoon style."
"A beautiful bird flying in the clear sky."
Novel view generation
Depth Maps | A turtle, high defination, 4k.
Depth Maps | A plush teddy bear, high defination, 4k.
Limitations
Source Video | Structure Sequence | "Iron man runs in the road."
Qualitative comparisons
Depth map
Text Prompt: A daring man is scaling a treacherous and jagged peak in the alpine wilderness.
Source Video | Structure Sequence | Tune-A-Video | Text2Video-Zero | Vid2Vid-Zero | FateZero | ControlVideo (Ours)
Text Prompt: A daring man performing gravity-defying stunts on a high-speed, blue motorbike in an empty parking lot.
Source Video | Structure Sequence | Tune-A-Video | Text2Video-Zero | Vid2Vid-Zero | FateZero | ControlVideo (Ours)
Text Prompt: A dusty old jeep was making its way down the winding forest road, creaking and groaning with each bump and turn.
Source Video | Structure Sequence | Tune-A-Video | Text2Video-Zero | Vid2Vid-Zero | FateZero | ControlVideo (Ours)
Text Prompt: A gigantic yellow jeep slowly turns on a wide, smooth road in the city.
Source Video | Structure Sequence | Tune-A-Video | Text2Video-Zero | Vid2Vid-Zero | FateZero | ControlVideo (Ours)
Text Prompt: A contented cow ambles across the dewy, verdant pasture.
Source Video | Structure Sequence | Tune-A-Video | Text2Video-Zero | Vid2Vid-Zero | FateZero | ControlVideo (Ours)
Canny edge
Text Prompt: A curious golden dog curiously wanders on the rocky mountain trail.
Source Video | Structure Sequence | Tune-A-Video | Text2Video-Zero | Vid2Vid-Zero | FateZero | ControlVideo (Ours)
Text Prompt: A mighty elephant marches steadily through the rugged terrain.
Source Video | Structure Sequence | Tune-A-Video | Text2Video-Zero | Vid2Vid-Zero | FateZero | ControlVideo (Ours)
Text Prompt: A shiny silver vehicle gracefully maneuvers towards a modern glass building.
Source Video | Structure Sequence | Tune-A-Video | Text2Video-Zero | Vid2Vid-Zero | FateZero | ControlVideo (Ours)
Text Prompt: A yellow duck moving on the river, anime style.
Source Video | Structure Sequence | Tune-A-Video | Text2Video-Zero | Vid2Vid-Zero | FateZero | ControlVideo (Ours)
Text Prompt: A lone camel strolls leisurely through the vast, arid expanse of the desert.
Source Video | Structure Sequence | Tune-A-Video | Text2Video-Zero | Vid2Vid-Zero | FateZero | ControlVideo (Ours)
Human Pose
Text Prompt: Iron man does the moonwalk in the road.
Source Video | Structure Sequence | Tune-A-Video | Text2Video-Zero | Follow-Your-Pose | Vid2Vid-Zero | FateZero | ControlVideo (Ours)
Text Prompt: A robot dances on a road, animation style.
Source Video | Structure Sequence | Tune-A-Video | Text2Video-Zero | Follow-Your-Pose | Vid2Vid-Zero | FateZero | ControlVideo (Ours)
Text Prompt: The astronaut dances in futuristic city, cyberpunk style.
Source Video | Structure Sequence | Tune-A-Video | Text2Video-Zero | Follow-Your-Pose | Vid2Vid-Zero | FateZero | ControlVideo (Ours)
Text Prompt: James bond moonwalk on the beach, animation style.
Source Video | Structure Sequence | Tune-A-Video | Text2Video-Zero | Follow-Your-Pose | Vid2Vid-Zero | FateZero | ControlVideo (Ours)
Ablation studies
Non-deterministic DDPM sampler
Text Prompt: A striking mallard floats effortlessly on the sparkling pond.
Structure Sequence | lambda=0.0 | lambda=0.5 | lambda=1.0
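The page does not define lambda. The sketch below assumes it plays the role of the stochasticity coefficient in the standard DDIM formulation, where 0 gives a deterministic sampler and 1 recovers DDPM-level noise. The function name `ddim_sigma` and the example cumulative alpha values are illustrative, not taken from the ControlVideo code.

```python
import math


def ddim_sigma(alpha_bar_t: float, alpha_bar_prev: float, lam: float) -> float:
    """Standard deviation of the noise injected in one reverse step.

    alpha_bar_t and alpha_bar_prev are the cumulative noise-schedule
    products at the current and the previous (less noisy) timestep.
    lam is assumed to correspond to the lambda values shown above.
    """
    ratio = (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t)
    beta_t = 1.0 - alpha_bar_t / alpha_bar_prev
    return lam * math.sqrt(ratio * beta_t)


if __name__ == "__main__":
    # Illustrative schedule values only.
    for lam in (0.0, 0.5, 1.0):
        print(lam, ddim_sigma(alpha_bar_t=0.5, alpha_bar_prev=0.6, lam=lam))
```

Under this assumption, lambda=0.0 injects no noise at all, and intermediate values scale the injected noise linearly.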
Trade-off between text prompt and motion
Text Prompt: A rabbit walks in the grasslands.
Text Prompt: A mallard swims in the river.
Input Video | Structure Sequence | control_scale=1.0 (by default) | control_scale=0.3
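As a rough illustration of what a control scale usually does in ControlNet-style pipelines, the sketch below multiplies the structure branch's residuals by the scale before adding them to the denoiser's features. Whether ControlVideo applies `control_scale` exactly this way is an assumption. The helper `add_control` is hypothetical, and plain lists of floats stand in for feature tensors.

```python
def add_control(unet_features, control_residuals, control_scale=1.0):
    """Add scaled structure residuals to the denoiser features.

    A smaller control_scale weakens the influence of the structure
    sequence, leaving more freedom to follow the text prompt.
    """
    return [f + control_scale * r for f, r in zip(unet_features, control_residuals)]


if __name__ == "__main__":
    features = [1.0, 2.0]
    residuals = [10.0, 20.0]
    print(add_control(features, residuals, control_scale=1.0))
    print(add_control(features, residuals, control_scale=0.3))
```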
Effect of fully cross-frame interaction and interleaved-frame smoother
(Different number of key frames)
Text Prompt: A mighty elephant marches steadily through the rugged terrain.
Source video | Individual (k=0) | First-only (k=1) | Sparse-Causal (k=2)
Frame_ids={0,4,8,12} (k=4) | Frame_ids={0,2,4,6,8,10,12,14} (k=8) | Fully Cross-frame (k=15) | Fully + Smoother (k=15)
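The key-frame sets listed above are consistent with uniform striding over a 15-frame clip. The sketch below reproduces them under that assumption. The function `key_frame_ids` is hypothetical, and the clip length of 15 is inferred from the k=15 label for the fully cross-frame case. It does not cover the Sparse-Causal (k=2) variant, which typically attends to the first and the previous frame and therefore differs per frame.

```python
import math


def key_frame_ids(num_frames: int, k: int) -> list[int]:
    """Pick key frames at a uniform stride, starting from frame 0.

    May return fewer than k ids when the stride does not divide the
    clip evenly.
    """
    if not 1 <= k <= num_frames:
        raise ValueError("k must be between 1 and num_frames")
    stride = math.ceil(num_frames / k)
    return list(range(0, num_frames, stride))


if __name__ == "__main__":
    for k in (1, 4, 8, 15):
        print(k, key_frame_ids(15, k))
```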
Text Prompt: A dusty old jeep was making its way down the winding forest road, creaking and groaning with each bump and turn.
Structure Sequence | w/o smoother | Timesteps {0,1} | Timesteps {30,31} | Timesteps {48,49}
How many timesteps are used in interleaved-frame smoother?
Text Prompt: A sleek black jeep was speeding along the narrow forest road, dodging trees and rocks.
Structure Sequence | 0 step | 2 steps | 4 steps | 6 steps | 8 steps
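The timestep pairs above, such as {30,31}, suggest that the smoother runs on consecutive timesteps so that both interleaved sets of frames are covered. The sketch below illustrates that idea under stated assumptions: one pass replaces every second middle frame with a blend of its two neighbors, and the next pass handles the frames in between. Simple averaging is a stand-in for a real frame-interpolation model. The function `smooth_interleaved` and the list-of-floats frame format are hypothetical.

```python
def smooth_interleaved(frames, start):
    """Replace frames start, start+2, ... with the mean of their neighbors.

    frames is a list of equal-length lists of floats (one per frame).
    start=1 smooths frames 1, 3, 5, ...; start=2 smooths frames 2, 4, 6, ...
    The first and last frames are never modified.
    """
    if start not in (1, 2):
        raise ValueError("start must be 1 or 2")
    out = [list(f) for f in frames]
    for i in range(start, len(frames) - 1, 2):
        # Neighbors have the opposite parity, so they are untouched in this pass.
        out[i] = [(a + b) / 2.0 for a, b in zip(frames[i - 1], frames[i + 1])]
    return out


if __name__ == "__main__":
    clip = [[0.0], [10.0], [2.0], [10.0], [4.0]]
    once = smooth_interleaved(clip, start=1)   # e.g. at the first chosen timestep
    twice = smooth_interleaved(once, start=2)  # e.g. at the following timestep
    print(once)
    print(twice)
```

With this reading, "2 steps" corresponds to one pass over each interleaved set, and larger step counts repeat the pair.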