2026/03/09

The Guide to Prompting Seedance 2.0: Tips, Techniques & Prompt Templates

Everything you need to know about Seedance AI — ByteDance's video generation model. Covers Seedance 1.0 vs 2.0, the bytedance/seedance-v1-pro-i2v-480p API model, and expert prompting techniques with copyable templates.

ByteDance's Seedance 2.0 is a professional-grade AI video generation model with native audio-visual joint generation. Built on a dual-branch Diffusion Transformer architecture with cross-modal joint modules, Seedance 2.0 unifies visual, speech, and rhythm modeling — delivering videos with millisecond-precise lip-sync, emotionally expressive characters, and cinematic-grade motion quality.

This guide covers everything you need to know about prompting Seedance 2.0 effectively, with ready-to-copy prompt blocks you can use right away.

Key Advantages of Seedance 2.0

High-Fidelity Audio-Visual Sync: Generates video with integrated audio output — environmental sounds, action sounds, synthesized audio, instruments, background music, and human voice.
Multi-Person, Multi-Language Dialogue: Supports monologues and multi-person conversations with millisecond-precision lip-sync. Covers Mandarin, Chinese dialects, English, and other languages — delivering natural, realistic dialogue.
Cinematic Narrative Quality: Natural motion amplitude, strong rhythmic pacing, precise action details. Expressive character emotions and facial expressions elevate the output to cinematic-grade quality.

Prompt Parameters

When using the video generation API, the key prompt-related parameters are:

Required: Text Prompt

Supports both Chinese and English input.

Optional: Output Parameters

Control the video output specifications with: resolution, ratio, duration, seed, camera_fixed, watermark.

Method 1 (Recommended): Pass parameters in the request body

{
    "model": "seedance-2-0-pro",
    "content": [
        {
            "type": "text",
            "text": "A kitten yawns at the camera"
        }
    ],
    "resolution": "720p",
    "ratio": "16:9",
    "duration": 5,
    "seed": 11,
    "camera_fixed": false,
    "watermark": true
}

Method 2: Append parameters to the text prompt

A kitten yawns at the camera --rs 720p --rt 16:9 --dur 5 --seed 11 --cf false --wm true

Or with full parameter names:

A kitten yawns at the camera --resolution 720p --ratio 16:9 --duration 5 --seed 11 --camerafixed false --watermark true
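
If you assemble prompts in code, the flag syntax above can be generated rather than typed by hand. The following is a minimal sketch that only reproduces the flag spellings shown in this guide; `appendParams` and `FLAGS` are hypothetical names, not part of any SDK.

```javascript
// Flag spellings copied from the two examples above: [full name, short name].
const FLAGS = {
  resolution: ["--resolution", "--rs"],
  ratio: ["--ratio", "--rt"],
  duration: ["--duration", "--dur"],
  seed: ["--seed", "--seed"],
  camera_fixed: ["--camerafixed", "--cf"],
  watermark: ["--watermark", "--wm"],
};

// Append output parameters to a text prompt as "--flag value" pairs.
// Undefined values are skipped; keys not listed in FLAGS are rejected.
function appendParams(prompt, params = {}, { short = false } = {}) {
  const parts = [prompt.trim()];
  for (const [key, value] of Object.entries(params)) {
    if (value === undefined) continue;
    const names = FLAGS[key];
    if (!names) throw new Error(`Unknown parameter: ${key}`);
    parts.push(`${short ? names[1] : names[0]} ${value}`);
  }
  return parts.join(" ");
}

console.log(
  appendParams("A kitten yawns at the camera", { resolution: "720p", duration: 5 })
);
```

Passing `{ short: true }` as the third argument switches to the abbreviated flags.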

Parameter	Description	Accepted Values
`resolution`	Video resolution	`480p`, `720p`
`ratio`	Aspect ratio	`16:9`, `1:1`, `21:9`, `4:3`, `3:4`, `9:16`
`duration`	Video length in seconds	`4`, `5`, `8`, `12`
`seed`	Random seed for reproducibility	Any integer
`camera_fixed`	Lock the camera in place	`true`, `false`
`watermark`	Include watermark	`true`, `false`
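
Checking parameters against the table before sending a request can catch typos early. This is a rough sketch that encodes only the accepted values listed above; `validateOutputParams` is a hypothetical helper, and the API itself may accept or reject values differently.

```javascript
// Accepted values as listed in the table above.
const ACCEPTED = {
  resolution: ["480p", "720p"],
  ratio: ["16:9", "1:1", "21:9", "4:3", "3:4", "9:16"],
  duration: [4, 5, 8, 12],
};

// Return a list of human-readable problems; an empty list means the
// parameters match the table. Parameters that are absent are not checked.
function validateOutputParams(params) {
  const errors = [];
  for (const [key, allowed] of Object.entries(ACCEPTED)) {
    if (key in params && !allowed.includes(params[key])) {
      errors.push(`${key}: ${JSON.stringify(params[key])} is not one of ${allowed.join(", ")}`);
    }
  }
  if ("seed" in params && !Number.isInteger(params.seed)) {
    errors.push("seed: must be an integer");
  }
  for (const key of ["camera_fixed", "watermark"]) {
    if (key in params && typeof params[key] !== "boolean") {
      errors.push(`${key}: must be true or false`);
    }
  }
  return errors;
}

console.log(validateOutputParams({ resolution: "1080p", duration: 5 }));
```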

The Prompt Formula

Subject + Motion + Environment (optional) + Camera Work (optional) + Aesthetic Description (optional) + Audio (optional)

By describing dialogue content, language type, emotional changes, camera movement, and narrative structure, you can achieve professional-level audio-visual consistency.
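
One way to keep prompts consistent is to treat the formula as a small template. The sketch below is illustrative only: `buildPrompt` and its field names are hypothetical, and the only rule it enforces is the one stated above, that subject and motion are required while the other parts are optional.

```javascript
// Make sure a fragment ends with sentence punctuation.
function asSentence(text) {
  const t = text.trim();
  return /[.!?]["”)]?$/.test(t) ? t : t + ".";
}

// Subject + Motion are required; the remaining parts are optional
// and are simply omitted when not provided.
function buildPrompt({ subject, motion, environment, camera, aesthetic, audio }) {
  if (!subject || !motion) throw new Error("subject and motion are required");
  return [`${subject.trim()} ${motion.trim()}`, environment, camera, aesthetic, audio]
    .filter(Boolean)
    .map(asSentence)
    .join(" ");
}

console.log(
  buildPrompt({
    subject: "A kitten",
    motion: "yawns at the camera",
    environment: "A sunlit windowsill",
    camera: "Fixed camera, close-up",
    audio: "Soft purring",
  })
);
```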

Core Principles

1. Describe Essential Information

There are three sub-principles to follow:

a) Specify the subject and movement clearly

A man with a weathered face, dressed in medieval pirate attire, stands on black rocks by the sea. His expression is passionate — he raises his hands powerfully toward the sky, his movements revealing a deep longing for freedom.

b) Describe what the scene should visually convey

In a raging storm, enormous waves roll across the sea. The water crashes into a city, smashing buildings along the shore. Hundreds of citizens flee in terror. Finally, the tsunami engulfs everything.

c) Use adverbs of degree wisely

The doll first rotates slowly, then she stops spinning and faces the camera to show her cuteness.

2. Describe Clear, Unambiguous Information

a) Ensure proper correspondence between prompt, video, and audio

A model showcases her qipao (cheongsam), exuding elegance and charm.

b) Use visual features to identify subjects — and keep identifiers consistent throughout

A film shoot at a race track. From left to right in the frame: a race car driver, a director, and a cameraman. The person on the far left wearing the racing suit is the race car driver; the young Chinese man in the middle is the director; the Black man on the far right holding the camera is the cameraman. Wide shot: the Black cameraman looks toward the director and asks in English with a puzzled expression: "We got it?" The camera slowly pushes in to a medium shot of the director and race car driver. The race car driver says confidently in Spanish: "Perfecto" with a proud smile. The director hears this, nods, lowers his OK hand gesture, and says in Sichuan dialect with satisfaction: "有了有了，这条过" (Got it, got it, that's a wrap).

3. Write Precise Shot Transition Descriptions

a) Clearly distinguish each shot and tell the model exactly when to cut

Shot 1 is a side-angle medium shot — the boy looks out the window. He says: "大丈夫だと思ってた……" (I thought I was fine...) Then cut to Shot 2, a close-up of the boy's face. In Shot 2, he says: "でも、たぶん自分に嘘ついてただけだ。" (But I was probably just lying to myself.)

b) Write precise timing for shot transitions

The shot starts with a medium shot of three people in frame. The Black man in the middle speaks: "We need to clear this up." Cut to Shot 2 — a close-up of the woman on the left, who calmly responds: "I've already made my choice." Cut to Shot 3 — a close-up of the white man, who lets out a soft sigh: "The problem is, your choice affects all of us." Finally, the shot returns to the medium three-person frame, the tension clearly escalating.

c) Ensure clear visual distinction between shots (different framing / content)

Shot 1: Front-facing medium shot. An ordinary bedroom at night, faint city light through the window. An adult male faces the camera, wearing a plain T-shirt and jeans. He frowns at his hands — tiny energy particles begin appearing in the air, and the room lights flicker once.
Shot 2: Cut to hand close-up. Blue-white energy rapidly envelops his hands, flowing like liquid metal mixed with light, spreading from fingertips to arms.
Shot 3: Cut to face close-up. The energy climbs along his neck and jawline; clear hero-armor patterns emerge on his skin, and his eyes glow cold white.
Shot 4: Cut to front medium-wide shot. Energy bursts across his entire body, clothing consumed and reconstructed by light-energy. A complete superhero suit rapidly forms: metallic armor, sleek design, chest emblem glowing bright. He says: "Guess there's no going back."
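
The numbered-shot layout above can be generated from structured data, which also makes it easy to check principle (c). This is a sketch under assumptions: `buildShotPrompt` is a hypothetical helper, and comparing framing strings is only a crude stand-in for "clear visual distinction".

```javascript
// Build a numbered shot list with an explicit "Cut to" at every transition.
// Consecutive shots must use different framing text.
function buildShotPrompt(shots) {
  if (shots.length === 0) throw new Error("at least one shot is required");
  return shots
    .map((shot, i) => {
      const previous = shots[i - 1];
      if (previous && previous.framing.toLowerCase() === shot.framing.toLowerCase()) {
        throw new Error(`Shot ${i + 1} repeats the framing of shot ${i}`);
      }
      const lead = i === 0
        ? `Shot 1: ${shot.framing}.`
        : `Shot ${i + 1}: Cut to ${shot.framing}.`;
      return `${lead} ${shot.action}`;
    })
    .join("\n");
}

console.log(
  buildShotPrompt([
    { framing: "Front-facing medium shot", action: "A man frowns at his hands." },
    { framing: "hand close-up", action: "Blue-white energy envelops his hands." },
  ])
);
```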

Audio Generation Guide

Seedance 2.0 excels at generating synchronized audio. Here's how to control each type.

Voice & Dialogue

Fixing Voice Timbre with Detailed Descriptions

Use this formula: Gender + Age Range + Voice Attributes + Speech Rate + Emotional Baseline

Single-person scene:

A female, approximately 18-22 years old. Her voice range is on the higher side but not shrill; her delivery is light and quick with a moderate breathy quality; her tone is bright and elastic. Speech rate is medium-fast with noticeable intonation variation. Emotional baseline: positive, outgoing, with a slight sense of excitement and youthful vitality. She speaks in Mandarin Chinese. She says: "如果有变动，记得第一时间跟我说一声。" (If there are any changes, make sure to tell me right away.)

Dialogue scene:

Two men face each other in an office area, the overall atmosphere is relaxed, the shot remains stable with no cuts. The first male speaks — his voice is mid-range, natural and unforced, speech rate medium-fast, emotional baseline casual with a hint of concern — he says: "你现在主要卡在哪一块？" (What's the main thing you're stuck on right now?) The other male responds — his voice is mid-range to low, steady delivery, medium speech rate, emotional baseline calm and cooperative — he says: "核心部分已经处理好了，就是细节还要再对一遍。" (The core part is done; I just need to double-check the details.)
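
The voice formula (Gender + Age Range + Voice Attributes + Speech Rate + Emotional Baseline) lends itself to a reusable template, so the same character keeps the same description across prompts. A minimal sketch follows; `describeVoice` and its field names are hypothetical, and the sentence wording is one possible phrasing rather than a required format.

```javascript
// Render the five formula fields as prose; language and line are optional extras.
function describeVoice({ gender, ageRange, attributes, speechRate, emotionalBaseline, language, line }) {
  const required = { gender, ageRange, attributes, speechRate, emotionalBaseline };
  for (const [name, value] of Object.entries(required)) {
    if (!value) throw new Error(`Missing voice field: ${name}`);
  }
  const parts = [
    `A ${gender}, approximately ${ageRange} years old.`,
    `Voice: ${attributes}.`,
    `Speech rate: ${speechRate}.`,
    `Emotional baseline: ${emotionalBaseline}.`,
  ];
  if (language) parts.push(`Speaks in ${language}.`);
  if (line) parts.push(`Says: "${line}"`);
  return parts.join(" ");
}

console.log(
  describeVoice({
    gender: "female",
    ageRange: "18-22",
    attributes: "higher range but not shrill, bright and elastic tone",
    speechRate: "medium-fast",
    emotionalBaseline: "positive and outgoing",
    language: "Mandarin Chinese",
    line: "如果有变动，记得第一时间跟我说一声。",
  })
);
```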

Multi-Language & Dialect Support

Seedance 2.0 supports precise lip-sync across multiple languages and dialects:

Chinese: Mandarin, Shaanxi dialect, Sichuan dialect, Cantonese, and more
Foreign languages: English, Japanese, Korean, Spanish, Indonesian, and more

Cantonese example:

He says in Cantonese: "你好靓呀！我好中意你呀！" (You're so pretty! I really like you!)

Sichuan dialect example:

A scorching summer day. Under the shade of trees in an old residential area in Sichuan, two middle-aged men sit on small stools fanning themselves — one in a white tank top, one in a gray tank top, with an electric scooter parked nearby. Fixed camera, medium shot, slow pace. The man in the white tank top says: "这天气热得遭不住哦。" (This heat is unbearable.) The man in the gray tank top replies: "忍一哈嘛，等会儿就凉快咯。" (Just bear with it — it'll cool down soon.)

English multi-accent example:

In an office break room, the atmosphere is light with a touch of humor. A middle-aged Indian man and a young Japanese male colleague stand by the coffee machine. The Japanese man asks calmly: "What materials need to be prepared for this afternoon's project?" The Indian man immediately replies in a noticeably accented, rapid tone: "Why are you only asking now? Where is the competing product analysis report that the client wants? Hurry up and get it, it's due at two o'clock!" The Japanese man answers, flustered and a little helpless: "I'll go right away, I've been editing the PPT just now..." He nods and exits the frame.

Lip-Sync Matching for Multi-Person Dialogue

For dialogue scenes, precisely define each character's unique features (gender/age/clothing/actions) so the model can match lip movements to the correct speaker.

Two-person dialogue:

The apprentice asks hesitantly: "师傅，这里…角度再大一点会不会更牢固？" (Master, would a slightly bigger angle here make it sturdier?) The old master gently shakes his head, runs his rough fingers over the wood grain, and says slowly: "不不不，孩子，过刚易折。你看这木头的性子，得顺着它来。" (No, no, no, child — too rigid and it'll snap. See the nature of this wood? You have to follow its grain.) His voice is steady and full of wisdom.

Multi-person dialogue:

The boy resting his chin on his hand sighs loudly: "已经四十分钟了。" (It's been forty minutes.) The girl with her hand raised keeps trying to flag a waiter while muttering: "我发誓他们忘了我们的单。" (I swear they forgot our order.) The boy holding the water pitcher pours water calmly: "没事，今天是周六晚上。" (It's fine, it's Saturday night.) The girl on her phone looks up and jokes: "照这个速度，这将是早餐了。" (At this rate, it'll be breakfast.)
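
Because lip-sync depends on each speaker being identifiable, it can help to define every character's visual identifier once and reuse it verbatim on each line. The sketch below assumes that workflow; `buildDialogue` is a hypothetical helper, and its uniqueness check is a simple text comparison.

```javascript
// speakers: map of speaker id -> visual identifier used in the prompt.
// turns: ordered lines, each referencing a speaker id.
function buildDialogue(speakers, turns) {
  const seen = new Set();
  for (const [id, identifier] of Object.entries(speakers)) {
    const key = identifier.trim().toLowerCase();
    if (seen.has(key)) throw new Error(`Identifier reused for speaker "${id}": ${identifier}`);
    seen.add(key);
  }
  return turns
    .map(({ speaker, manner, line }) => {
      const identifier = speakers[speaker];
      if (!identifier) throw new Error(`Unknown speaker: ${speaker}`);
      // The same identifier text is repeated every time this speaker talks.
      return `${identifier} ${manner}: "${line}"`;
    })
    .join(" ");
}

const speakers = {
  chin: "The boy resting his chin on his hand",
  raised: "The girl with her hand raised",
};
console.log(
  buildDialogue(speakers, [
    { speaker: "chin", manner: "sighs loudly", line: "已经四十分钟了。" },
    { speaker: "raised", manner: "mutters", line: "我发誓他们忘了我们的单。" },
  ])
);
```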

Voiceover Support

Seedance 2.0 supports controlling voiceover timbre, emotion, tone, and speech rate.

Documentary style:

Generate a video with voiceover: A deep, calm male voice says: "在宇宙浩瀚的寂静中，我们的世界不过是一个短暂的瞬间。然而，在其中，生命不顾一切地繁荣。" (In the vast silence of the universe, our world is merely a fleeting moment. Yet within it, life flourishes against all odds.) The scene should slowly transition from night to dawn, stars gradually fading as the sun rises from behind the mountains.

Commercial/Advertisement:

Based on the input lipstick product first-frame image, generate a video maintaining realistic lipstick and character appearance, proportions, and materials. Overall style: high-end beauty e-commerce ad — clean, refined, premium. The video consists of three consecutive shots: Shot 1 — an extremely slow, steady push-in showcasing the lipstick's overall design, rich color saturation, soft diffused warm light, clean blurred background. Shot 2 — cut to model face close-up, model holds lipstick near face in a portrait-style composition, natural skin tone, calm and confident expression. Shot 3 — cut to lipstick close-up on a clean surface, restrained surroundings, soft lighting, stable camera, emphasizing product texture. Voiceover: clear, confident, premium female commercial voice, moderate pace, elegant and restrained tone, synced with visuals: "Rich color. Smooth texture. One swipe delivers radiant lips. Lightweight, comfortable, and effortlessly elegant."

Sound Effects (SFX)

Seedance 2.0 generates contextually appropriate sound effects automatically.

Generate a video showing the rain outside the window getting slightly heavier, raindrops gathering into streams on the glass, and a person with an umbrella hurrying past outside the window.

At dusk, a large fuel depot explodes, a fireball rising into the sky.

Background Music (BGM)

Seedance 2.0 generates background music that matches the prompt by default. You can control it with specific descriptions.

Control music style:

A magnificent epic aerial video — the camera sweeps through mist-shrouded majestic mountains and ancient castles. The background music is awe-inspiring orchestral symphony, full of power and hope.

Control rhythm through prompt:

A cartoon character in the center of the screen. The background music is an upbeat pop song. The cartoon character claps along to the beat of the music.

Control BGM emotion:

Fingers gently brush across each face in the photographs. The background music should be a warm, nostalgic, beautifully melodic guitar or piano solo. The music should carry a complex emotion of gentle reminiscence intertwined with happiness — warmth with a soft undercurrent of the passage of time.

Shot Transition Techniques

Consistent Style Across Cuts

Seedance 2.0 maintains visual style consistency across shot transitions. Here are examples in different styles:

Disney-style animation:

The shot starts with a medium shot of both characters. The girl turns to look at the boy and says confidently with a smile: "我们一定能做到！" (We can definitely do this!) Cut to a close-up of the boy, who responds hesitantly: "你确定吗？" (Are you sure?) Cut back to the girl in medium close-up — she turns and opens her arms, saying brightly: "当然！因为我们已经走到这一步了。" (Of course! Because we've already come this far.) The camera naturally settles with her gesture, the mood bright and resolute.

Pixar style:

The shot starts with a medium shot of father and son. The boy lowers his head and says quietly: "我是不是让你失望了？" (Did I disappoint you?) Cut to Shot 2 — a close-up of the father, who pauses briefly before softly answering: "不。" (No.) Cut to Shot 3 — a close-up of the boy looking up. Finally, cut back to the father, who smiles and says: "我只是担心你会先对自己失望。" (I was just worried you'd be disappointed in yourself first.)

Realistic style:

The shot starts with a close-up of the cat food bowl. Cut to Shot 2 — a front close-up of the cat, which doesn't approach immediately but watches quietly. Cut to Shot 3 — a medium close-up of the owner, who says softly: "这次换了新的。" (I changed to a new one this time.) Cut to Shot 4 — a side close-up of the cat, which slowly approaches, lowers its head to sniff, and finally begins eating. Final cut — a close-up of the owner, who says softly: "看来合你胃口。" (Looks like it suits your taste.) End with a stable wide shot of cat and food bowl together, pace slowing down.

Shot-Reverse-Shot for Dialogue

Two-person:

The shot starts with a medium close-up of the detective, who says evenly: "你在十一点四十七分回到那条巷子，这不是巧合。" (You returned to that alley at 11:47 — that's no coincidence.) Cut to a close-up of the suspect, who smirks slightly and looks away: "巧合这种东西，事后看起来总是很有计划。" (Coincidences always look planned in hindsight.) Cut back to an even tighter close-up of the detective — his expression noticeably colder, but he remains silent. The tension builds with each cut.

Three-person:

The shot starts with a medium three-person frame. The shorter man glances at his watch and says: "It's about time." Cut to Shot 2 — a medium close-up of the girl in the middle, who furrows her brow and responds: "Let's wait a little longer. They might arrive any minute." Cut to Shot 3 — a close-up of the tall man looking toward the end of the street, his tone calm but impatient: "We've been waiting for a long time." Final cut back to the three-person medium shot — their gazes briefly intersect, no one speaks, an awkward silence.

Advanced Techniques

Reference Specific Visual Styles

Specifying aesthetic references can produce videos with distinctive style and harmonious audio-visual elements.

Japanese film "Little Forest" style:

In the style of the Japanese film "Little Forest," generate a video of a girl picking apples in an orchard. The girl wears a pink plaid headscarf, has a sweet appearance, and carries a small canvas bag. She picks an apple from the tree, carefully wipes it clean, then tastes it.

Miyazaki/Studio Ghibli style:

In the style of Miyazaki anime, generate a video of a girl picking apples in an orchard. The girl wears a pink plaid headscarf, has a sweet appearance, and carries a small canvas bag. She picks an apple from the tree, carefully wipes it clean, then tastes it.

Disney 2D animation style:

In the style of Disney 2D animated films, generate a video of a girl picking apples in an orchard. The girl wears a pink plaid headscarf, has a sweet appearance, and carries a small canvas bag. She picks an apple from the tree, carefully wipes it clean, then tastes it.

Use Cinematography Terminology

Professional camera terminology significantly improves the quality of camera movement and overall visual impact.

Camera Angles

Camera position terms: High angle / Low angle / Bird's eye / Worm's eye / Eye level / Top-down / Straight-up

High angle:

A high-angle bird's eye view of a serene forest clearing. Autumn wind sweeps ginkgo leaves across slate stone paths. The camera slowly pushes in to focus on a bronze key half-buried in a pile of fallen leaves.

Eye-level:

Eye-level medium shot tracking a skateboarder, 45mm wide-angle lens held at his shoulder height. Water splashes sideways across the frame as the front wheels hit a puddle.

Low angle:

In heavy rain, a homeless man hugs his knees and curls up under a fire escape. A low-angle shot from between his knees looking up at his face — his features exaggerated in the wide-angle distortion. A clap of thunder startles him and he raises his head to look at the sky; the camera follows his movement, tilting up to the dark, rain-filled sky above.

Narrative Perspectives

Over-the-shoulder:

The camera shoots from behind Character A's shoulder, focusing on Character B's facial expression across the table. The two converse by a café window — B slowly lowers the coffee cup. The camera subtly pushes forward following B's body language, with blurred pedestrians visible through the window in the background. Side light from the window creates a rim light on A's shoulder.

Telescope perspective:

Through the telescope, you see a bard in a black cloak walking toward you from across the bridge. First you notice a deep scar running through his eye and left cheek. A sun-totem ceramic pendant hangs from his neck. A well-worn water flask is clipped to his belt. Looking at his boots — covered in heavy dust and scratches, as if he's walked a very long way. Behind him, his horse's eyes are deep and wise.

Surveillance fisheye perspective:

Fixed fisheye surveillance camera. In the center of the frame, a person anxiously paces inside a sealed room — their figure relatively normal in the center area. But the room's four walls, ceiling, and floor compress and curve inward under the fisheye effect, as if the entire space is collapsing toward them.

Subject Angles

Side view / Profile:

A young woman draws open the curtains — thick striped fabric curtains that almost completely block the floor-to-ceiling window, surrounded by many green plants. The gallery interior is dim and cool-toned. Window side close-up, shallow depth of field. Through the gap in the curtains, morning sunlight is visible — Tyndall rays streaming in.

Back view:

A woman in a white long trench coat with shoulder-length short hair stands at the edge of a city rooftop, overlooking the night view, her back to the camera. The shot starts from a close-up of her back, gradually pulling away to reveal more of the rooftop and city lights. The pull-back should be smooth, the lighting should not overexpose, and the trench coat hem should sway gently in the breeze while her posture remains unchanged.

Front view:

A man with slightly curly black hair, approximately 35 years old, stands in front of neon lights on a street, facing the camera directly. He wears a knee-length black leather trench coat (with subtle surface wear), a dark gray turtleneck sweater, leather gloves with metal buckles. His expression is calm yet imposing. Neon reflections from the city night cast across his facial contours. The camera starts from a medium shot (chest up) and slowly pushes in at a steady dolly speed toward a close-up (eyes and bridge of nose only).

Shot Scales

When using shot scale terms, attach each one to a clearly identified subject: Shot Scale + Subject (e.g., "close-up of the man on the left", "half-body portrait of the woman in red").

Wide shot / Establishing shot:

In the yellow sand of a desert, a traveler wearing a long robe and goggles walks alone carrying a canvas backpack. The camera captures the vast horizon in a wide shot, panning steadily from left to right, keeping the traveler small in the frame.

Medium shot:

A short-haired girl sketches at a street corner, wearing a light-colored blouse and suspender skirt. The camera maintains a medium shot, slowly orbiting from directly in front to the right front.

Close-up / Extreme close-up:

The camera focuses on a woman's lip area — the full triangle from the corners of her mouth to below her nose. She wears a rose-bean shade lipstick with a slight moist highlight on the surface. She speaks a line — the lip movement is subtle but naturally paced. The background is a blurred spotlight, not distracting. The camera maintains its close-up composition and slowly pushes in.

Camera Movement

Use this formula: Starting composition + Movement type + Movement magnitude + Ending composition

Movement types: Push / Pull / Pan / Track / Follow / Rise / Descend / Whip pan / Orbit / Rotate / Zoom
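
The four-part movement formula can be filled in programmatically. The following is a sketch only: `describeCameraMove` is a hypothetical helper, the movement list is copied from the line above, and the output wording is one possible phrasing.

```javascript
// Movement types as listed in this guide, lowercased for comparison.
const MOVEMENT_TYPES = [
  "push", "pull", "pan", "track", "follow", "rise",
  "descend", "whip pan", "orbit", "rotate", "zoom",
];

// Starting composition + Movement type + Movement magnitude + Ending composition.
function describeCameraMove({ start, type, magnitude, end }) {
  const movement = type.trim().toLowerCase();
  if (!MOVEMENT_TYPES.includes(movement)) {
    throw new Error(`Unknown movement type: ${type}`);
  }
  return `The camera starts on ${start}. Movement: ${magnitude} ${movement}. It ends on ${end}.`;
}

console.log(
  describeCameraMove({
    start: "a medium shot (chest up)",
    type: "Push",
    magnitude: "slow, steady",
    end: "an extreme close-up of his eyes",
  })
);
```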

Rise (camera moves upward):

An elf boy stands beneath a giant glowing tree, points upward, and raises his head to look at the canopy. The camera rises — between the main branches of the tree, there's a nest made from branches, and inside it sits a large dinosaur egg emanating a mysterious glow.

Hitchcock zoom (dolly zoom):

Close-up of a Chinese girl with angular features wearing glasses, with dyed red short hair, frowning and looking directly at the camera. The background is a dilapidated amusement park — a stopped carousel and water slides in the distance, the ground covered in dirt and overgrown with weeds. The girl stubbornly chews her fingernails. Hitchcock camera movement: maintain the girl's framing composition while pulling the dolly back and increasing the focal length simultaneously.

Dolly-in (push-in):

Setting: a damp, late-night back alley with water-stained reflections on the ground and neon signs alternating blue and red. A man, approximately 35, with short messy black hair and slight stubble, stands with his back against a brick wall, facing the camera directly. He wears a deep black leather trench coat (with subtle surface wear) and a dark gray turtleneck sweater. His gaze is alert, brows furrowed, nostrils slightly flaring. The camera starts from a medium shot (chest up) and steadily pushes in at a smooth dolly speed, approaching his face, ultimately reaching an extreme close-up (only eyes and bridge of nose visible).

Special Effects

Playful Transformation Effects

Precisely describe the trigger moment:

She inadvertently touches an old Christmas ornament with her finger. The inside of the ornament instantly lights up with a soft golden glow, like a snow globe. This light ripples outward from the ornament — wherever it reaches, tiny points of light crystallize in the air. The light first wraps around the girl — her clothes reshape into Christmas attire with exquisite makeup. Simultaneously, a Christmas tree grows from the ground, its lights turning on one by one, snowflakes materialize and drift outside the window. The entire scene transforms into a Christmas-themed bedroom.

Precisely describe the transformation process:

The kitten is enveloped in a soft, warm bubble of glowing light. Its body gradually elongates and stands up. The fur evolves into fluffy orange short hair; the ears remain as cute cat ears; the tail sways gently. The outfit transforms into a Japanese-style casual hoodie and short skirt. The final form: an anime-style girl with cat pupils and cat ears, a playful expression, making a "meow" pose at the camera. Focus on the cute continuity from kitten to human expression — the transformation process should be cotton-candy smooth.

Cinematic Visual Effects

Precisely describe post-transformation details:

Her pupil color changes from blue to red. Starting from the corner of her eye, the previously delicate skin begins to harden and bulge. Dark black dragon scales seem to break through from beneath the skin, spreading rapidly along the cheekbone towards the neck. Accompanied by a small amount of dark red sparks leaking from the gaps between scales, half of her face completes the material transition from human skin to hardened dragon armor within two seconds. Dark Fantasy style, Cthulhu aesthetic, body horror beauty, extremely realistic 8K material detail.

Audio-enhanced transformation:

A beam of warm sunlight breaks through the dark clouds, shining directly at the center of a concrete wall. With the light spot as the epicenter, the gray concrete surface instantly fades and softens. Fresh green moss and vines radiate outward at time-lapse speed. Immediately, countless colorful wildflowers burst open on the vines. The previously lifeless wall transforms into a vertical sea of flowers swaying in the breeze within seconds. Solarpunk, Ghibli-style, vibrant and alive, colors shift from monotone gray to high-saturation brilliance.

Pro Tips for Better Results

Follow the formula: Structure your prompts as Subject + Motion + Environment + Camera Work + Aesthetics + Audio.
Be specific about characters: Use clear identifiers (clothing, features) and keep them consistent throughout.
Describe audio explicitly: Specify voice timbre, accent, emotion, and speech rate for each character.
Use cinematic terminology: Terms like "medium shot", "dolly-in", "tracking shot", "over-the-shoulder" give the model precise camera direction.
Mark shot transitions clearly: Number your shots and describe each cut point distinctly.
Reference well-known styles: Mentioning "Miyazaki anime style" or "Disney 2D animation" effectively sets the aesthetic direction.
Control BGM via descriptions: Specify music genre, instruments, rhythm, and emotional tone in your prompt.
Iterate and refine: Generate, review, and make small adjustments to your prompt for the best results.

Frequently Asked Questions

What is Seedance AI?

Seedance AI is the flagship AI video generation platform from ByteDance, the company behind TikTok and CapCut. It produces high-quality video from text prompts (text-to-video) and from images (image-to-video). Seedance has ranked first on the Artificial Analysis leaderboards for both text-to-video and image-to-video, ahead of models such as Veo 3 and Kling 2.0, with particular strength in complex prompt following and multi-scene storytelling.

The Seedance family of models is accessible via:

BytePlus API (official ByteDance developer platform)
fal.ai (serverless inference, popular with developers)
WaveSpeedAI and AIMLAPI (third-party providers)

What is Seedance 1.0? How Does It Compare to Seedance 2.0?

Seedance 1.0 is the first-generation flagship model from ByteDance. It introduced multi-shot storytelling, high-resolution generation, and strong semantic understanding — and it's still widely used today. Here's how it compares to Seedance 2.0 (the subject of this guide):

Feature	Seedance 1.0	Seedance 2.0
Architecture	Time-causal VAE + Spatio-temporal Transformer	Dual-branch Diffusion Transformer + cross-modal joint modules
Native Audio	❌ No native audio	✅ Full audio-visual joint generation (voice, SFX, BGM)
Resolution	Up to 1080p (24 FPS)	Up to 720p (with 480p option)
Clip Length	5 or 10 seconds	4, 5, 8, or 12 seconds
Multi-shot	✅ Native multi-shot storytelling	✅ Advanced multi-shot with audio sync
Lip Sync	Basic	Millisecond-precision lip sync
Language Support	Chinese & English	Mandarin, dialects, English, Japanese, Korean, Spanish, and more
Best for	Fast prototyping, 1080p output	Cinematic dialogue scenes, audio-visual productions

Seedance 1.0 is offered as Seedance 1.0 Pro (the full version) and Seedance 1.0 Lite (the faster, browser-based version). Designed for speed and efficiency, it generates a 5-second 1080p video in approximately 41.4 seconds on NVIDIA L20 hardware.

Seedance 2.0 focuses on audio-visual unity: if your project requires native character voice, multi-language dialogue, lip-sync, or synchronized sound effects and music, Seedance 2.0 is the right choice.

What is `bytedance/seedance-v1-pro-i2v-480p`?

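bytedance/seedance-v1-pro-i2v-480p is a model identifier for the Seedance 1.0 Pro Image-to-Video (480p) model. Identifier formats vary by provider: the fal.ai example in this section reaches the same model family through the endpoint path fal-ai/bytedance/seedance/v1/pro/image-to-video and selects 480p with the resolution parameter.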

This model takes a static image + text prompt and generates a short animated video clip at 480p resolution. It's optimized for speed and cost-efficiency — ideal for rapid prototyping, high-volume social media content, and turning static assets into motion-ready video.

Key API parameters:

Parameter	Description	Example
`prompt`	Text description guiding motion	`"The cat stretches and yawns"`
`image_url`	URL of the source image (first frame)	`"https://..."`
`resolution`	Output resolution	`"480p"`, `"720p"`
`duration`	Clip length in seconds	`5`, `10`
`camera_fixed`	Lock the camera position	`true`, `false`
`seed`	Random seed for reproducibility	Any integer
`end_image_url`	Optional end frame for continuity	`"https://..."`

Example fal.ai usage (Node.js):

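// Top-level await below requires an ES module (.mjs or "type": "module" in package.json).
import * as fal from "@fal-ai/serverless-client";

// Credentials are read from the FAL_KEY environment variable.
// subscribe() submits the request and resolves once the video is ready.
const result = await fal.subscribe("fal-ai/bytedance/seedance/v1/pro/image-to-video", {
  input: {
    prompt: "The cat slowly stretches and yawns, blinking into the morning light.",
    image_url: "https://your-image-url.com/cat.jpg", // source image (first frame)
    resolution: "480p",
    duration: 5, // clip length in seconds
    camera_fixed: false, // allow camera movement
    seed: 42, // fixed seed for reproducibility
  },
});

// URL of the generated video file.
console.log(result.video.url);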

When to use bytedance/seedance-v1-pro-i2v-480p:

You have an existing image and want to animate it
Speed and cost matter more than maximum resolution
You're prototyping or generating high volumes of short clips
You want to chain clips using end_image_url for visual continuity
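
Chaining with end_image_url can be planned before any request is made: each clip ends on the frame that the next clip starts from. The sketch below only builds the input objects using the parameter names from the table above; `planClipChain` is a hypothetical helper and the example URLs are placeholders.

```javascript
// Given N keyframe image URLs and N-1 prompts, build one input per clip so
// that clip i runs from keyframe i to keyframe i+1.
function planClipChain(keyframeUrls, prompts, sharedOptions = {}) {
  if (keyframeUrls.length < 2) throw new Error("at least two keyframes are required");
  if (prompts.length !== keyframeUrls.length - 1) {
    throw new Error("expected exactly one prompt per pair of adjacent keyframes");
  }
  return prompts.map((prompt, i) => ({
    ...sharedOptions,
    prompt,
    image_url: keyframeUrls[i],
    end_image_url: keyframeUrls[i + 1],
  }));
}

const inputs = planClipChain(
  ["https://example.com/k0.jpg", "https://example.com/k1.jpg", "https://example.com/k2.jpg"],
  ["The cat stretches and yawns", "The cat jumps onto the windowsill"],
  { resolution: "480p", duration: 5 }
);
console.log(inputs);
```

Each object in `inputs` can then be passed as the `input` of a separate generation request.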

For higher resolution output, use the 720p or 1080p variants. For text-only generation (no image input), use the text-to-video endpoint instead.

Conclusion

Seedance 2.0 represents a significant leap in AI video generation, combining native audio-visual joint generation with cinematic-quality motion, multi-language dialogue support, and sophisticated camera control — all in a single model.

The key to getting great results is writing detailed, structured prompts that specify the subject, motion, camera work, environment, dialogue, and audio design. Whether you're creating short films, commercials, or creative content, mastering Seedance 2.0 prompting will unlock a new level of creative output.

Start with the ready-to-copy templates above, experiment with different styles and camera techniques, and let your creative vision guide the way.

Author: Accept Prompt

Waitlist

Early Access

Be the first to know when AcceptPrompt launches. Sign up to get early access and exclusive updates.

Be the first to join. Free early access, 50% off when subscribe. No spam, ever.

Photo by Nick Morrison on Unsplash

