I started this account at 18 years old. From Sweden. Not a native English speaker.
If I'd needed to sit in front of a camera and talk on day one, I'd have quit before I posted the first video. The friction was going to kill the project.
So I built it faceless. AI avatar talking to camera, voice generated, script delivered without me ever filming. That account is now at 270k.
The original setup was rough. Barely any mouth movement. Looked like an animated PowerPoint. But the workflow has improved a lot since then, and there's a new reason this matters.
Even creators with millions of followers are cloning themselves now. Not because they're shy. Because recording, retakes, lighting, and editing eat most of the week. Once you have a clone of yourself, your "filming time" drops to zero.
Faceless or full clone, here is the 2026 setup.
What you need:
ElevenLabs - Text-to-speech voiceover - $22/mo
Nano Banana - Generates your avatar still image - Free
HeyGen - Turns avatar + audio into video - $29/mo
Topaz Video AI - Upscales resolution and boosts frame rate to 60fps - $299/year
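If you want to sanity-check the budget, here is a rough sketch that adds up the stack. The prices are the ones listed above. Spreading the Topaz yearly fee evenly over 12 months is an assumption made only for comparison, and the function and variable names are purely illustrative.

```python
# Prices copied from the tool list above; names are illustrative only.
MONTHLY_TOOLS = {"ElevenLabs": 22.00, "Nano Banana": 0.00, "HeyGen": 29.00}
YEARLY_TOOLS = {"Topaz Video AI": 299.00}


def monthly_cost(monthly, yearly):
    """Per-month spend, with yearly plans spread evenly over 12 months."""
    return sum(monthly.values()) + sum(yearly.values()) / 12


def yearly_cost(monthly, yearly):
    """Per-year spend: 12 months of subscriptions plus the yearly plans."""
    return sum(monthly.values()) * 12 + sum(yearly.values())


if __name__ == "__main__":
    print(f"Per month: ${monthly_cost(MONTHLY_TOOLS, YEARLY_TOOLS):.2f}")
    print(f"Per year:  ${yearly_cost(MONTHLY_TOOLS, YEARLY_TOOLS):.2f}")
```

With the listed prices that works out to roughly $75.92 a month, or $911 a year, for the full stack including Topaz.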
Step 1. Generate your voiceover in ElevenLabs
Drop your script into ElevenLabs, pick a voice you like, and hit go.
Straightforward, but here are three tricks that change everything…
First, add an exclamation mark at the end of every sentence. Sounds wrong on paper. Works on output. The "!" punches energy into the delivery, raises words-per-minute, and pulls the voice out of flat, monotone reading mode. Most voices default to a soft documentary cadence. Exclamations flip them into "creator talking to camera" mode.
Second, play with capitalization to control emphasis. Write the words you want hit harder in ALL CAPS. Write the throwaway connectors in lowercase. ElevenLabs reads case as intensity, so you can sculpt the delivery line by line without ever touching settings.
Third, increase the speed slider. Most voices are slow by default, and a slight speed bump lets your content deliver more value per second, which means better view retention, which means more views. Just make sure it doesn't get so fast that it's hard to follow.
Example:
Without: "this new AI is completely taking over social media right now, because Google…"
With: "This NEW AI is COMPLETELY taking over SOCIAL MEDIA right NOW! because Google…"
Generate. Listen back. Tweak the caps and exclamations until it grabs attention.
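If you script in bulk, the first two tricks can be roughed in before you paste anything into ElevenLabs. The sketch below is plain text processing, not an ElevenLabs feature: punch_up is a made-up helper name, you pick the emphasis words yourself, and it assumes sentences end in a single period. Abbreviations like "e.g." will trip it up, so read the result before generating.

```python
import re


def punch_up(script, emphasis):
    """Uppercase the chosen emphasis words and end each sentence with '!'."""
    targets = {word.lower() for word in emphasis}

    def upper_if_target(match):
        word = match.group(0)
        return word.upper() if word.lower() in targets else word

    # Uppercase every whole word that appears in the emphasis set.
    text = re.sub(r"[A-Za-z']+", upper_if_target, script)
    # Turn a lone sentence-ending period into "!" (runs like "..." are left alone).
    return re.sub(r"(?<!\.)\.(?=\s|$)", "!", text)


if __name__ == "__main__":
    draft = "This new AI is completely taking over social media right now. It is wild."
    print(punch_up(draft, ["new", "completely", "social", "media", "now"]))
    # This NEW AI is COMPLETELY taking over SOCIAL MEDIA right NOW! It is wild!
```

Treat the output as a first pass. You still listen back and tweak by ear.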

Pro tip: Cut your audio in your editor before doing anything else. Trim dead space and, if you ran multiple generations, keep the best take of each line. Doing this before the video step will save you credits and ensure you won’t have any visible cuts in your final video.
Step 2. Generate your avatar in GPT Image 2 or Nano Banana
You need a single high-quality still image of your avatar. This is the face HeyGen will animate.
Here is the exact prompt I use. Fill in the bracketed variables to match what you want:
Cinematic photoreal studio portrait of a [GENDER] in their [AGE_RANGE],
[ETHNICITY] with [HAIR], wearing [GLASSES], [FACIAL_HAIR], [BUILD] build,
dressed in a [OUTFIT]. The subject is seated at a warm wooden desk in a
creator studio, leaning slightly forward with arms relaxed near a silver
laptop on the desk in front of them. A professional podcast condenser
microphone on a black articulating boom arm with a red XLR cable enters
from the right side of the frame and sits just in front of the subject's
shoulder. The subject's expression: [EXPRESSION], looking directly into
the lens with quiet presence.
Background (heavily out of focus, shallow depth of field): a [BACKDROP]
forms the immediate backdrop, accented by [PRACTICALS] casting warm pools
of light from the left and right side of the frame. Two or three blurred
computer monitors are visible in the deep background showing soft glowing
editing-software interfaces. Studio monitor speakers on stands flank the
back wall. Dark acoustic foam panels are subtly visible on the walls.
A wash of cool [ACCENT_COLOR] light bathes the back wall, contrasting
against the warm practical lights in the foreground.
Lighting: classic cinematic two-color setup, warm tungsten key light from
camera-left (key + practical glow), cool [ACCENT_COLOR] rim and fill from
camera-right and behind, with a soft front fill that keeps the subject's
face well-exposed but contoured. Subtle catchlights in the eyes. Skin
reads natural with visible pores and realistic texture. No plastic
smoothing, no beauty filter.
Camera: shot on a full-frame cinema camera with an 85mm portrait lens at
f/2.0. Subject sharp, mic slightly out of focus in foreground, background
creamy bokeh. Eye-level framing, slight 3/4 angle to camera.
[ASPECT_RATIO] aspect ratio.
Style: photoreal, sharp detail, cinematic teal-and-amber color grade with
restraint, subtle natural film grain, no stylization, no AI smoothing, no
glamour lighting, no makeup filter. Reads like a real photograph taken in
a working creator studio at night.
Negative: no logos on clothing, no text overlays, no watermarks, no
plastic skin, no AI artifacts, no extra fingers, no warped hands, no
oversaturation, no glow, no halo edges, no stylized illustration look.
Generate a few. Pick the one where the eyes are sharpest and the lighting matches what you'd actually film in.
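If you reuse the prompt a lot, filling the brackets by hand gets error-prone. Here is a minimal sketch of one way to do it, assuming you keep the prompt in a text string with the same [UPPER_CASE] placeholders. The fill_prompt helper and the example values are hypothetical, and the excerpt is shortened from the full prompt above.

```python
import re

# Shortened excerpt of the prompt above; paste the full text in practice.
EXCERPT = (
    "Cinematic photoreal studio portrait of a [GENDER] in their [AGE_RANGE], "
    "dressed in a [OUTFIT]. [ASPECT_RATIO] aspect ratio."
)


def fill_prompt(template, values):
    """Replace [UPPER_CASE] placeholders; fail loudly if one has no value."""

    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"No value supplied for [{key}]")
        return values[key]

    return re.sub(r"\[([A-Z_]+)\]", substitute, template)


if __name__ == "__main__":
    # Example values are made up for illustration.
    example = {
        "GENDER": "woman",
        "AGE_RANGE": "late 20s",
        "OUTFIT": "charcoal crewneck sweater",
        "ASPECT_RATIO": "9:16",
    }
    print(fill_prompt(EXCERPT, example))
```

Raising an error on a missing value is deliberate: a leftover "[HAIR]" in the prompt tends to produce odd images, so it is better to catch it before generating.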

Step 3. Generate your video in HeyGen
Open HeyGen, upload your avatar image, upload your audio file.
Critical for 2026: select the latest avatar model. The one with hand gestures and natural body motion. This is the upgrade that pulled the output out of uncanny-valley territory and into "could pass for real on a phone screen." If you used HeyGen a year ago and dismissed it, this is the model that changes your mind.
Hit generate. Wait. You now have a talking-head video of your AI avatar saying your script.

Step 4. Upscale through Topaz Video AI
Most people stop at step 3.
That is fine for starters…but…
If you want the highest-quality AI avatar, run your HeyGen output through Topaz Video AI. Upscale to 4K, or at minimum to 1080p, at 60fps. The difference is dramatic. HeyGen output looks fine. Topaz output looks like it was shot on a camera.
The 60fps step alone is the single biggest "this feels real" upgrade you can make. AI-generated motion at 24-30fps reads as AI-generated motion. At 60fps, your brain stops flagging it.
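To put numbers on the 60fps step, here is a small sketch of the frame math. It assumes the clip keeps its duration and the upscaler adds in-between frames. The 45-second length and the 30fps and 24fps sources are example values, not HeyGen specs, and interpolation_plan is an illustrative name.

```python
from fractions import Fraction


def interpolation_plan(duration_s, source_fps, target_fps=60):
    """Frame counts before and after raising the frame rate of a clip."""
    source_frames = round(duration_s * source_fps)
    target_frames = round(duration_s * target_fps)
    return {
        "source_frames": source_frames,
        "target_frames": target_frames,
        # Extra frames the tool has to generate on top of the originals.
        "added_frames": target_frames - source_frames,
        # Exact output-to-input frame ratio, e.g. 2 for 30 -> 60.
        "ratio": Fraction(target_fps, source_fps),
    }


if __name__ == "__main__":
    print(interpolation_plan(45, 30))  # 1350 -> 2700 frames, ratio 2
    print(interpolation_plan(45, 24))  # 1080 -> 2700 frames, ratio 5/2
```

A 30fps source doubles cleanly, one new frame between each original pair. A 24fps source lands on a 5/2 ratio, so more of the final frames are generated rather than original.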

Pro tip: If you decide to invest in Topaz, don’t just use it for your avatar. It’s a get-out-of-jail-free card for any low-resolution or low-frame-rate footage, and it will make the rest of your videos look higher quality too.
Variant: Clone yourself instead
If you want your real face in the video instead of an AI-generated one, skip the image-generation step entirely.
HeyGen has a "real avatar" feature where you record yourself once, on camera, reading their training script. Takes about 5 minutes. From then on, you can generate unlimited videos of you saying anything, in any language, without ever filming again.
The rest of the flow stays the same. Script in ElevenLabs (or use HeyGen's voice clone of your real voice), generate the video, run it through Topaz.
This is the route most established creators are migrating to. It keeps your face and presence on camera while killing the recording bottleneck.

Faceless got me from zero to 270k followers.
The clone route is what most full-time creators are running on now.
Either way the bottleneck disappears. Recording isn't the constraint anymore, and your output rate stops being capped by how much you feel like sitting in front of a camera this week.
Catch ya in the next one!
Best,
Melvin

Melvin Hagström
@ai.volve.news

