Content Engine · Deep Dive

Content Pipeline

script · voice · avatar · render · publish — 5 AI services, 1 command

Clickable: each stage jumps straight to its detail section.

40+
Videos produced
5
AI services
0
Manual steps
The flow

The pipeline in detail

Click a stage to jump straight to it — or scroll through all five.

Autonomous Agent
Voice Synthesis
Visual Generation
Video Compositing
Distribution

Script Generation

An OpenClaw instance runs around the clock — a 24/7 agent runtime that keeps a Claude-based agent permanently active. It collects information, tracks trends, and builds context continuously. When a new video is needed, it doesn't start from scratch — it draws on accumulated knowledge to generate scripts that are informed, structured, and ready for production.

No manual research. No blank-page problem. The agent works in the background so the pipeline starts with substance.

pipeline/generate-script.ts
// OpenClaw agent — 24/7 autonomous runtime.
// `topic` and `accumulatedKnowledge` are free variables here: the topic
// comes from upstream selection, the context from the notes below.
const script = await openClaw.agent.generate({
  task: "write-video-script",
  context: accumulatedKnowledge,
  params: {
    topic,
    tone: "professional, direct",
    structure: "hook → problem → solution → CTA",
    duration: "60s",
  },
});

// The agent has been collecting context for weeks:
// - Industry trends from RSS feeds
// - Competitor content analysis
// - Performance data from previous videos
// Output: structured script with timing markers

Voice Synthesis

ElevenLabs converts each script into a consistent voice track. Same voice across all videos — recognizable, professional, multilingual. The API integration is fully automated: script text in, MP3 out.

The integration handles voice configuration, pacing control, and format optimization. Each script is processed with consistent voice settings — same speaker, same tone, every time.
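
A minimal sketch of that step, using ElevenLabs' public text-to-speech REST endpoint; the file name, voice ID, and voice settings below are placeholders rather than the production values:

pipeline/synthesize-voice.ts
// Sketch: script text in, MP3 out, via the ElevenLabs TTS endpoint.
// Voice ID and settings are placeholders; the real pipeline keeps them
// fixed so every episode uses the same speaker and tone.
import { writeFile } from "node:fs/promises";

export async function synthesizeVoice(scriptText: string): Promise<void> {
  const voiceId = "VOICE_ID"; // placeholder
  const res = await fetch(`https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`, {
    method: "POST",
    headers: {
      "xi-api-key": process.env.ELEVENLABS_API_KEY!,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      text: scriptText,
      model_id: "eleven_multilingual_v2", // multilingual, consistent timbre
      voice_settings: { stability: 0.5, similarity_boost: 0.75 },
    }),
  });
  if (!res.ok) throw new Error(`ElevenLabs request failed: ${res.status}`);
  await writeFile("out/voice.mp3", Buffer.from(await res.arrayBuffer()));
}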

pipeline_voice.mp3 · ElevenLabs
"One script. Five AI services. Zero manual steps. From idea to YouTube in a single command."

Avatar Generation

This is the magic of the pipeline: a single photo is enough. The AI analyzes the facial structure, generates matching mouth movements in sync with the audio track, and produces a complete video — no motion capture, no green screen, no video shoot.

Depending on the requirements, different engines come into play: open-source solutions for local processing, cloud APIs for production-grade quality.

Reference photo → audio track → local GPU → head motion + lip sync

Input
1 photo + audio
Output
Head motion + lip sync

How it works
Audio-driven animation: the audio drives head and mouth movement. Runs on your own GPU, no cloud service needed.
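
For the local path, a sketch of how the pipeline might shell out to SadTalker; the flags follow the upstream inference script and may differ between versions, and the SADTALKER_DIR environment variable is an assumption:

pipeline/generate-avatar.ts
// Sketch: one reference photo + the voice track in, a lip-synced
// talking-head video out. Shells out to SadTalker on the local GPU.
// Flag names follow the upstream inference.py and may vary by version.
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

export async function generateAvatar(photo: string, audio: string, outDir: string) {
  await run(
    "python",
    [
      "inference.py",
      "--source_image", photo,  // the single reference photo
      "--driven_audio", audio,  // the ElevenLabs voice track
      "--result_dir", outDir,
      "--still",                // keep head pose stable for talking-head shots
    ],
    { cwd: process.env.SADTALKER_DIR }, // local SadTalker checkout (assumed env var)
  );
}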

Remotion Studio

Toggle layers to see how the composition builds up. Switch templates to see different video formats.

[Interactive demo: Remotion Studio. Timeline segments: Hook → Avatar → CTA → Outro. Layers: Avatar Video, Overlays, Lower Third, Intro/Outro, Audio Track. Preview overlays: "DID YOU KNOW? AI produces 40+ videos autonomously", lower third @luccafaust · Follow for more, badge EP. 12. Status bar: React 19 + Remotion 4 · 1920×1080 · 30fps · 5 layers active.]

Distribution

The last mile is automated too. Once the render completes, the pipeline hands off directly to the YouTube Data API — no manual intervention from script to published video.

Thumbnail generation extracts key frames from the rendered video, composites the episode title and channel branding, and exports an optimized 1280×720 JPEG ready for upload.

SEO metadata — title, description, and tags — is derived directly from the script context that was generated in Phase 1, keeping keyword intent consistent across every asset.

Scheduling lets each video be queued for a specific publish time so releases hit peak audience windows without any manual calendar work.
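
A sketch of that handoff with the googleapis Node client; OAuth setup is omitted, and the metadata object is assumed to carry the title, description, tags, and publish time derived from the Phase 1 script context:

pipeline/publish.ts
// Sketch of the upload step using the googleapis Node client. OAuth
// setup is omitted; `meta` is assumed to come from the script context,
// the thumbnail from the render.
import { createReadStream } from "node:fs";
import { google } from "googleapis";
import type { OAuth2Client } from "google-auth-library";

interface Meta {
  title: string;
  description: string;
  tags: string[];
  publishAt: string; // ISO timestamp for the scheduled release
}

export async function publish(auth: OAuth2Client, videoPath: string, thumbPath: string, meta: Meta) {
  const youtube = google.youtube({ version: "v3", auth });

  // Upload as private with publishAt set; YouTube releases it on schedule.
  const { data } = await youtube.videos.insert({
    part: ["snippet", "status"],
    requestBody: {
      snippet: { title: meta.title, description: meta.description, tags: meta.tags },
      status: { privacyStatus: "private", publishAt: meta.publishAt },
    },
    media: { body: createReadStream(videoPath) },
  });

  // Attach the generated 1280×720 thumbnail.
  await youtube.thumbnails.set({
    videoId: data.id!,
    media: { body: createReadStream(thumbPath) },
  });
}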

Analytics feedback closes the loop: view velocity, CTR, and watch-time are periodically pulled from the YouTube Analytics API and fed back into the script agent as context for the next episode.
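
A sketch of that periodic pull, assuming the YouTube Analytics API v2 via googleapis; the metric set is illustrative:

pipeline/feedback.ts
// Sketch of the periodic analytics pull (YouTube Analytics API v2 via
// googleapis). The metric set is illustrative; the rows are handed back
// to the script agent as plain context.
import { google } from "googleapis";
import type { OAuth2Client } from "google-auth-library";

export async function pullPerformance(auth: OAuth2Client, startDate: string, endDate: string) {
  const analytics = google.youtubeAnalytics({ version: "v2", auth });
  const { data } = await analytics.reports.query({
    ids: "channel==MINE",
    startDate,  // e.g. "2024-01-01"
    endDate,
    dimensions: "video",
    metrics: "views,averageViewDuration,averageViewPercentage",
    sort: "-views",
  });
  return data.rows ?? []; // fed into the agent's accumulated knowledge
}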

Remotion
React
ElevenLabs
SadTalker
LivePortrait
HeyGen
Claude
YouTube API
Node.js
40+

Videos Produced

5

AI Services

3

Avatar Engines

0

Manual Steps

What I Built

A fully automated content factory. Five AI services chained into a single pipeline: Claude writes the scripts, ElevenLabs generates the voice, avatar engines produce lip-synced video, Remotion composites everything with React components, and YouTube API handles the upload. One command. Zero manual steps. 40+ videos produced.
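
That single command, sketched end to end; every import below refers to the illustrative sketches on this page (generateScript is assumed to be exported by pipeline/generate-script.ts), and file paths and the script object's shape are placeholders:

pipeline/run.ts
// Sketch: the whole pipeline as one sequential run. Imports refer to
// the illustrative sketches in the stage sections; paths and the
// script shape (text, meta) are assumptions.
import type { OAuth2Client } from "google-auth-library";
import { generateScript } from "./generate-script";
import { synthesizeVoice } from "./synthesize-voice";
import { generateAvatar } from "./generate-avatar";
import { renderVideo } from "./render-video";
import { publish } from "./publish";

export async function run(auth: OAuth2Client, topic: string) {
  const script = await generateScript(topic);                                 // 1. OpenClaw agent
  await synthesizeVoice(script.text);                                         // 2. ElevenLabs voice
  await generateAvatar("assets/reference.jpg", "out/voice.mp3", "out/avatar"); // 3. avatar engine
  await renderVideo({ script, voiceTrack: "out/voice.mp3", avatarVideo: "out/avatar/result.mp4" }); // 4. Remotion render
  await publish(auth, "out/episode.mp4", "out/thumbnail.jpg", script.meta);   // 5. YouTube upload
}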

The key insight: video production is a pipeline problem, not a creativity problem. Once each stage is automated and composable, scaling content becomes a function of configuration — not effort.

pipeline/render.tsx
// Remotion composition — video as React components
import React from "react";
import { Composition } from "remotion";
// VideoTemplate and the VideoProps type are assumed to live alongside
import { VideoTemplate, type VideoProps } from "./VideoTemplate";

export const VideoComposition: React.FC<VideoProps> = ({
  script, voiceTrack, avatarVideo, template
}) => {
  return (
    <Composition
      id={template.id}
      component={VideoTemplate}
      durationInFrames={template.frames}
      fps={30}
      width={1920}
      height={1080}
      defaultProps={{
        script,
        voiceTrack,
        avatarVideo,
        layers: template.layers,
      }}
    />
  );
};
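
Rendering that composition headlessly is a small sketch of its own, assuming @remotion/bundler and @remotion/renderer; the entry point and composition id are illustrative:

pipeline/render-video.ts
// Sketch of the headless render with @remotion/bundler + @remotion/renderer.
// Entry point and composition id are illustrative.
import { bundle } from "@remotion/bundler";
import { renderMedia, selectComposition } from "@remotion/renderer";

export async function renderVideo(inputProps: Record<string, unknown>) {
  const serveUrl = await bundle({ entryPoint: "src/index.ts" }); // Remotion root (assumed path)
  const composition = await selectComposition({
    serveUrl,
    id: "shorts-template", // matches the Composition id (illustrative)
    inputProps,
  });
  await renderMedia({
    composition,
    serveUrl,
    codec: "h264",
    inputProps,
    outputLocation: "out/episode.mp4",
  });
}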

Key Decisions

  • Remotion over traditional editors

    Programmatic rendering. Videos are code — versionable, templatable, reproducible. No timeline dragging. Change a prop, re-render.

  • Multiple avatar engines

    Different tools for different jobs. SadTalker for batch processing, LivePortrait for quick iterations, HeyGen for production-grade output. The pipeline picks the right one (see the selection sketch after this list).

  • OpenClaw for scripts

    Permanent knowledge, not one-shot prompts. The 24/7 agent runtime accumulates context over time — industry trends, competitor analysis, past performance. Scripts get better as the system learns.

  • YouTube API over manual upload

    Automating the last mile matters. Thumbnail generation, SEO metadata, tags — all derived from the script context. No final manual step to break the chain.
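
A minimal sketch of what that engine routing could look like; the job flags and the mapping are illustrative, mirroring the trade-offs above:

pipeline/pick-engine.ts
// Sketch: route each avatar job to the engine that fits it.
// Flags and mapping are illustrative.
type AvatarEngine = "sadtalker" | "liveportrait" | "heygen";

export function pickEngine(job: { draft: boolean; batch: boolean }): AvatarEngine {
  if (job.draft) return "liveportrait"; // quick iterations
  if (job.batch) return "sadtalker";    // local GPU batch processing
  return "heygen";                      // production-grade cloud output
}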