What this feature does
With Reference Image Clips you can take a single photo (of a character, spokesperson, historical figure, or even yourself) and generate up to five short video clips starring that same person in different settings, outfits, or time periods. It works well for dialogue projects, lessons, product promos, or any story that needs the same face across multiple scenes.
📺 Watch the 4‑minute walkthrough video at the bottom of this article to see the entire workflow in action.
What you need
- A clear image (JPEG/PNG) – Face forward is best; at least 512 × 512 px.
- Scene ideas – One sentence per scene is ideal.
- 5–7 minutes – Rendering happens in the cloud.
Step‑by‑step
- Open or create a project in Kukarella and click Create Clip.
- Upload your reference image. Drag‑and‑drop the file or click Browse to select it.
- Describe each scene (up to five; each scene becomes its own clip).
  - Keep it short (1–2 sentences).
  - Mention setting, outfit, and key action.
  - Example: “Young lady in a Victorian dress walks slowly along a foggy London street, dipping her head as a carriage passes.”
- Click Create Clip. A status bar shows progress. Typical render time: 5–7 minutes.
- Preview & download. Hover over each thumbnail to play it. Click Download to save MP4 files individually, or Download All for a ZIP.
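If you draft scene descriptions in bulk (for example with a chatbot, as in the walkthrough), a small script can check them against the limits in the steps above before you paste them in. This is only an illustrative sketch: it runs on your own computer, does not connect to Kukarella, and the names `check_scenes` and `count_sentences` are made up for this example. The sentence count is a rough heuristic.

```python
import re

MAX_SCENES = 5      # scene limit stated in this article
MAX_SENTENCES = 2   # "1-2 sentences" guideline from this article


def count_sentences(text: str) -> int:
    """Rough count: split on ., ! or ? followed by whitespace or end of text."""
    parts = re.split(r"[.!?]+(?:\s+|$)", text.strip())
    return len([p for p in parts if p.strip()])


def check_scenes(scenes: list[str]) -> list[str]:
    """Return a list of readable problems; an empty list means all checks passed."""
    problems = []
    if not scenes:
        problems.append("Add at least one scene.")
    if len(scenes) > MAX_SCENES:
        problems.append(f"Too many scenes: {len(scenes)} (limit is {MAX_SCENES}).")
    for i, scene in enumerate(scenes, start=1):
        n = count_sentences(scene)
        if n == 0:
            problems.append(f"Scene {i} is empty.")
        elif n > MAX_SENTENCES:
            problems.append(f"Scene {i} has {n} sentences; aim for 1-2.")
    return problems
```

Calling `check_scenes` with your list of descriptions returns an empty list when everything fits, or one message per problem otherwise.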
Tips for best results
- Be specific, yet concise. The AI focuses on nouns and verbs.
- Stick to one main subject per scene. Crowded prompts can confuse the model.
- Use era‑appropriate details, e.g., “peasant dress, 19th‑century French village.”
- Expect small imperfections. We’re improving realism every week—your feedback helps!
Frequently asked questions
How long can each clip be? 3–6 seconds. Shorter clips keep render times fast.
Can I create more than five scenes? Not yet. Let us know if you’d like a higher limit.
Does the same person always appear? Yes. The engine locks onto the face in your reference photo.
Will this work with group photos? Only the most prominent face is used; for multiple actors, upload separate reference images and generate clips individually.
Walkthrough video (full transcript below)
<iframe width="560" height="315" src="https://www.youtube.com/embed/vK_mWCOk7e4?si=OmeN1bGPA6ndfMrT" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
0:00 Hello. In Kukarella, we have an option to create images and short video clips to illustrate your projects. So, let's say that you created a dialogue between two people at a market, or between a professor and students, and so on,
0:22 and you need some visuals. You can do that, but today we released a new feature where you can use a reference image and create multiple scenes with the person from that image.
0:36 So, let's say that we have a photo of this girl. We can add this image as a reference, and now we can describe scenes.
0:50 I've played with ChatGPT and created a few descriptions so that we could test. So, let's say that we want to see this person in different countries, in
1:02 different outfits. Maybe we make a narration about fashion. Okay, so that was the first one: peasant girl, France, 19th century. Second one: lady in London.
1:17 So, the second scene will show the same person in a Victorian dress, walking slowly down a foggy London street. And the next one: First Nations warrior, Canada, early 20th century.
1:36 Let's add a scene. Young Indigenous woman in traditional warrior clothing walks through a snowy forest, then stops and raises her spear.
1:45 Astronaut: a girl in an astronaut suit floats in zero gravity, spins gently, then waves towards a window with Earth behind her.
1:58 That will be scene four. And the last one: tour guide, Rome, 50s. Right now, we set a limit of five scenes, but if there is demand we can increase it. I think that for now it should be enough.
2:20 So, we created five scenes, descriptions for five video clips, and now we click Create Clip and wait for five, maybe seven minutes.
2:40 Okay, so it's done. Young girl in a simple 1800s French dress walks through a village field, holding a basket and looking around.
2:56 So, this is the first video clip. Second one: young lady in a Victorian dress walks slowly along a foggy London street, dipping her head as a carriage passes.
3:12 Yeah, kind of. It's an AI, so we need to excuse it. It's not perfect, but it's getting there.
3:22 So, third one: young Indigenous woman in traditional warrior clothing. So, it's the same image, the same person from that photo, and look, it did great, I think.
3:41 Okay, so it looks like it created... ah, maybe I messed up somewhere, but it created two video clips from the same prompt.
3:57 Now, guide in Rome. So, as you see, with just a few simple prompts you can create images and video clips from one image.
4:16 So you can use a person from that image for demonstrating your dialogue, narration, voiceover, or any educational materials.
4:27 So feel free to give it a try and share your thoughts. Thanks for watching. Bye.
We’d love your feedback
Try the feature and tell us what you create! Share clips or questions at support@kukarella.com or via the in‑app chat.
Happy voicing! 🎙️