
Multimodal Avatar Generation and Animation

Bytedance Inc.   *Equal Contribution   +Project Lead

Introducing MagicAvatar, a multi-modal framework that converts various input modalities (text, video, and audio) into motion signals, which are then used to generate or animate an avatar.
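The two-stage flow described above (input modality to motion signal, then motion signal to avatar) can be illustrated with a minimal sketch. Every name here (`MotionSignal`, `text_to_motion`, `video_to_motion`, `motion_to_avatar`, `generate`) is a hypothetical placeholder for illustration and is not MagicAvatar's actual API; the functions return strings rather than running any model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# All names below are hypothetical placeholders, not MagicAvatar's API.


@dataclass
class MotionSignal:
    """Stand-in for the intermediate motion representation."""
    source_modality: str
    frames: List[str]


def text_to_motion(prompt: str) -> MotionSignal:
    # Placeholder: a real system would run a text-to-motion model here.
    return MotionSignal("text", [f"motion for: {prompt}"])


def video_to_motion(path: str) -> MotionSignal:
    # Placeholder: a real system would extract motion from the source video.
    return MotionSignal("video", [f"motion from: {path}"])


# Stage 1: one converter per input modality. Audio is left out because the
# page lists audio-guided generation as coming soon.
CONVERTERS: Dict[str, Callable[[str], MotionSignal]] = {
    "text": text_to_motion,
    "video": video_to_motion,
}


def motion_to_avatar(motion: MotionSignal, description: str) -> str:
    # Stage 2 placeholder: returns a description instead of rendering video.
    return f"avatar video ({description}) driven by {motion.source_modality} motion"


def generate(modality: str, source: str, description: str) -> str:
    """Route the input through stage 1, then hand the motion to stage 2."""
    if modality not in CONVERTERS:
        raise ValueError(f"unsupported modality: {modality}")
    motion = CONVERTERS[modality](source)
    return motion_to_avatar(motion, description)
```

The point of the sketch is the shared intermediate: because every modality is reduced to the same motion representation, the second stage does not need to know which kind of input produced it.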


Text-guided Avatar Generation

Create avatar(s) with simple text prompts.

Text prompt Generated motion Generated video Text prompt Generated motion Generated video
"An astronaut, kicking, in volcano" "A rugged mountaineer, doing moonwalk, on the foot of the hill"
"A brave firefighter, punching a bag, in a burning building" "A female dancer, crossing arms, in a grand ballroom"
"A female astronaut, practicing yoga poses, undersea" "A yoga practitioner, practicing yoga poses, peaceful garden view"
"A group of k-pop stars, dancing, in volcano" "A group of children, dancing, in a park"

"An astronaut, doing handstand, undersea" "A basketball player, doing moonwalk, on a basketball court" "A soldier, saluting, in a military base"

Video-guided Avatar Generation

Given a source video, create avatar(s) that follow the given motion.

Source video Motion Generated video Source video Motion Generated video
"A boy running, red jacket" "A child, playing toy blocks"
"A man, dancing under water" "A boy, freestyle singing, fire in the background"
"A girl, in yellow clothes, walking on the road, raining" "A baby, in green clothes, dancing on the bed"

Source video & Motion Generated video Source video & Motion Generated video
"Two people, Judo, on the fire" "Four girls, dancing, nighttime, blue suits"
"Three girls, dancing, on the grassland" "Girls dancing"
"Three girls, dancing, raining, flowers" "Two persons, fencing, outdoor"

Multimodal Avatar Animation

Animate an avatar of a specific subject.

Identity Driving signal Motion Generated video Driving signal Motion Generated video
"A girl is dancing" "A girl is waving hands"

Audio-guided Avatar Generation (coming soon)

Create an avatar based on audio input.


    author = {Zhang, Jianfeng and Yan, Hanshu and Xu, Zhongcong and Feng, Jiashi and Liew, Jun Hao},
    title = {MagicAvatar: Multi-modal Avatar Generation and Animation},
    year = {2023}