Abstract

Sketch animations offer a powerful medium for visual storytelling, from simple flip-book doodles to professional studio productions. While traditional animation requires teams of skilled artists to draw key frames and in-between frames, existing automation attempts still demand significant artistic effort through precise motion paths or keyframe specification. We present FlipSketch, a system that brings back the magic of flip-book animation -- just draw your idea and describe how you want it to move! Our approach harnesses motion priors from text-to-video diffusion models, adapting them to generate sketch animations through three key innovations: (i) fine-tuning for sketch-style frame generation, (ii) a reference frame mechanism that preserves the visual integrity of the input sketch through noise refinement, and (iii) a dual-attention composition that enables fluid motion without losing visual consistency. Unlike constrained vector animations, our raster frames support dynamic sketch transformations, capturing the expressive freedom of traditional animation. The result is an intuitive system that makes sketch animation as simple as doodling and describing, while maintaining the artistic essence of hand-drawn animation.

Generation Pipeline
We use a text-to-video (T2V) diffusion model fine-tuned on sketch animations and condition it to follow an input sketch. We then perform attention composition with reference noise derived from the input sketch; both mechanisms are sketched below.
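The reference noise is a noise latent that, when denoised, reproduces the input sketch, so every generated frame starts anchored to the drawing. One standard way to obtain such sketch-aligned noise is DDIM inversion; the snippet below is a generic illustration of that loop in PyTorch rather than the paper's released code, and eps_model (a noise predictor over latents and timesteps) and alphas_cumprod (the scheduler's cumulative alpha products) are stand-in names.

import torch

@torch.no_grad()
def ddim_invert(x0, eps_model, alphas_cumprod, num_steps=50):
    # Walk a clean latent x0 forward toward noise along the deterministic
    # DDIM trajectory, so the result denoises back to (approximately) x0.
    timesteps = torch.linspace(0, len(alphas_cumprod) - 1, num_steps).long()
    x = x0
    for i in range(num_steps - 1):
        t, t_next = timesteps[i], timesteps[i + 1]
        a_t, a_next = alphas_cumprod[t], alphas_cumprod[t_next]
        eps = eps_model(x, t)  # predicted noise at the current step
        x0_pred = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()  # implied clean latent
        x = a_next.sqrt() * x0_pred + (1 - a_next).sqrt() * eps  # re-noise to t_next
    return x  # sketch-aligned reference noise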
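The dual-attention composition then balances each frame's own self-attention against attention into the reference frame, so strokes can move freely while the appearance stays tied to the input sketch. The exact formulation is given in the paper; the snippet below is only a minimal illustration of composing two attention streams (PyTorch 2.x), where the linear blend and ref_weight are assumptions made for the example.

import torch.nn.functional as F

def dual_attention(q, k, v, k_ref, v_ref, ref_weight=0.5):
    # Stream 1: self-attention within the current frame,
    # which leaves the model free to move strokes over time.
    out_self = F.scaled_dot_product_attention(q, k, v)
    # Stream 2: attention into the reference frame's keys/values,
    # which pulls every frame toward the input sketch's appearance.
    out_ref = F.scaled_dot_product_attention(q, k_ref, v_ref)
    # Compose the streams; ref_weight trades visual fidelity against
    # motion freedom (the 0.5 default is illustrative, not from the paper).
    return (1.0 - ref_weight) * out_self + ref_weight * out_ref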
Video Extrapolation

We extrapolate longer animations by using the last frame of one generated clip as the first frame of the next; a sketch of this chaining follows.
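A minimal sketch of the chaining, where generate_clip(frame, prompt) is a hypothetical stand-in for the sketch-conditioned T2V pipeline described above:

def extrapolate(generate_clip, sketch, prompt, num_clips=3):
    # Autoregressive chaining: each clip is conditioned on the final
    # frame of the previous one, so motion continues across clip
    # boundaries. generate_clip is a hypothetical callable returning
    # a list of frames that starts from the frame it was given.
    frames = [sketch]
    for _ in range(num_clips):
        clip = generate_clip(frames[-1], prompt)
        frames.extend(clip[1:])  # drop the seed frame to avoid duplicates
    return frames

Each hand-off frame is treated as a fresh input sketch, so the same conditioning mechanism drives every clip.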
BibTeX

@misc{bandyopadhyay2024flipsketch,
  title={FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations},
  author={Hmrishav Bandyopadhyay and Yi-Zhe Song},
  year={2024},
  eprint={2411.10818},
  archivePrefix={arXiv},
  primaryClass={cs.GR},
  url={https://arxiv.org/abs/2411.10818},
}