Content Engine · Deep Dive

Content Pipeline

script · voice · avatar · render · publish — 5 AI services, 1 command

Clickable: each stage jumps straight to its detail section.

40+
Videos produced
5
AI services
0
Manual steps
The flow

The pipeline in detail

Click a stage to jump straight to it — or scroll through all five.

Autonomous Agent
Voice Synthesis
Visual Generation
Video Compositing
Distribution

Script Generation

An OpenClaw instance runs around the clock — a 24/7 agent runtime that keeps a Claude-based agent permanently active. It collects information, tracks trends, and builds context continuously. When a new video is needed, it doesn't start from scratch — it draws on accumulated knowledge to generate scripts that are informed, structured, and ready for production.

No manual research. No blank-page problem. The agent works in the background so the pipeline starts with substance.

pipeline/generate-script.ts
// OpenClaw agent — 24/7 autonomous runtime.
// `topic` and `accumulatedKnowledge` are free variables here: the topic
// comes from upstream selection, the context from the notes below.
const script = await openClaw.agent.generate({
  task: "write-video-script",
  context: accumulatedKnowledge,
  params: {
    topic,
    tone: "professional, direct",
    structure: "hook → problem → solution → CTA",
    duration: "60s",
  },
});

// The agent has been collecting context for weeks:
// - Industry trends from RSS feeds
// - Competitor content analysis
// - Performance data from previous videos
// Output: structured script with timing markers

Voice Synthesis

ElevenLabs converts each script into a consistent voice track. Same voice across all videos — recognizable, professional, multilingual. The API integration is fully automated: script text in, MP3 out.

The integration handles voice configuration, pacing control, and format optimization. Each script is processed with consistent voice settings — same speaker, same tone, every time.
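
A minimal sketch of that step, using ElevenLabs' public text-to-speech REST endpoint; the file name, voice ID, and voice settings below are placeholders rather than the production values:

pipeline/synthesize-voice.ts
// Sketch: script text in, MP3 out, via the ElevenLabs TTS endpoint.
// Voice ID and settings are placeholders; the real pipeline keeps them
// fixed so every episode uses the same speaker and tone.
import { writeFile } from "node:fs/promises";

export async function synthesizeVoice(scriptText: string): Promise<void> {
  const voiceId = "VOICE_ID"; // placeholder
  const res = await fetch(`https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`, {
    method: "POST",
    headers: {
      "xi-api-key": process.env.ELEVENLABS_API_KEY!,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      text: scriptText,
      model_id: "eleven_multilingual_v2", // multilingual, consistent timbre
      voice_settings: { stability: 0.5, similarity_boost: 0.75 },
    }),
  });
  if (!res.ok) throw new Error(`ElevenLabs request failed: ${res.status}`);
  await writeFile("out/voice.mp3", Buffer.from(await res.arrayBuffer()));
}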

pipeline_voice.mp3 · ElevenLabs
"One script. Five AI services. Zero manual steps. From idea to YouTube in a single command."

Avatar Generation

This is the magic of the pipeline: a single photo is enough. The AI analyzes the facial structure, generates matching mouth movements in sync with the audio track, and produces a complete video — no motion capture, no green screen, no video shoot.

Depending on the requirements, different engines come into play: open-source solutions for local processing, cloud APIs for production-grade quality.

Reference photo → audio track → local GPU → head motion + lip sync

Input
1 photo + audio
Output
Head motion + lip sync

How it works
Audio-driven animation: the audio drives head and mouth movement. Runs on your own GPU, no cloud service needed.
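
For the local path, a sketch of how the pipeline might shell out to SadTalker; the flags follow the upstream inference script and may differ between versions, and the SADTALKER_DIR environment variable is an assumption:

pipeline/generate-avatar.ts
// Sketch: one reference photo + the voice track in, a lip-synced
// talking-head video out. Shells out to SadTalker on the local GPU.
// Flag names follow the upstream inference.py and may vary by version.
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

export async function generateAvatar(photo: string, audio: string, outDir: string) {
  await run(
    "python",
    [
      "inference.py",
      "--source_image", photo,  // the single reference photo
      "--driven_audio", audio,  // the ElevenLabs voice track
      "--result_dir", outDir,
      "--still",                // keep head pose stable for talking-head shots
    ],
    { cwd: process.env.SADTALKER_DIR }, // local SadTalker checkout (assumed env var)
  );
}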

Remotion Studio

Toggle layers to see how the composition builds up. Switch templates to see different video formats.

[Interactive demo: Remotion Studio. Timeline segments: Hook → Avatar → CTA → Outro. Layers: Avatar Video, Overlays, Lower Third, Intro/Outro, Audio Track. Preview overlays: "DID YOU KNOW? AI produces 40+ videos autonomously", lower third @luccafaust · Follow for more, badge EP. 12. Status bar: React 19 + Remotion 4 · 1920×1080 · 30fps · 5 layers active.]

Distribution

The last mile is automated too. Once the render completes, the pipeline hands off directly to the YouTube Data API — no manual intervention from script to published video.

Thumbnail generation extracts key frames from the rendered video, composites the episode title and channel branding, and exports an optimized 1280×720 JPEG ready for upload.

SEO metadata — title, description, and tags — is derived directly from the script context that was generated in Phase 1, keeping keyword intent consistent across every asset.

Scheduling lets each video be queued for a specific publish time so releases hit peak audience windows without any manual calendar work.
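
A sketch of that handoff with the googleapis Node client; OAuth setup is omitted, and the metadata object is assumed to carry the title, description, tags, and publish time derived from the Phase 1 script context:

pipeline/publish.ts
// Sketch of the upload step using the googleapis Node client. OAuth
// setup is omitted; `meta` is assumed to come from the script context,
// the thumbnail from the render.
import { createReadStream } from "node:fs";
import { google } from "googleapis";
import type { OAuth2Client } from "google-auth-library";

interface Meta {
  title: string;
  description: string;
  tags: string[];
  publishAt: string; // ISO timestamp for the scheduled release
}

export async function publish(auth: OAuth2Client, videoPath: string, thumbPath: string, meta: Meta) {
  const youtube = google.youtube({ version: "v3", auth });

  // Upload as private with publishAt set; YouTube releases it on schedule.
  const { data } = await youtube.videos.insert({
    part: ["snippet", "status"],
    requestBody: {
      snippet: { title: meta.title, description: meta.description, tags: meta.tags },
      status: { privacyStatus: "private", publishAt: meta.publishAt },
    },
    media: { body: createReadStream(videoPath) },
  });

  // Attach the generated 1280×720 thumbnail.
  await youtube.thumbnails.set({
    videoId: data.id!,
    media: { body: createReadStream(thumbPath) },
  });
}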

Analytics feedback closes the loop: view velocity, CTR, and watch-time are periodically pulled from the YouTube Analytics API and fed back into the script agent as context for the next episode.
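
A sketch of that periodic pull, assuming the YouTube Analytics API v2 via googleapis; the metric set is illustrative:

pipeline/feedback.ts
// Sketch of the periodic analytics pull (YouTube Analytics API v2 via
// googleapis). The metric set is illustrative; the rows are handed back
// to the script agent as plain context.
import { google } from "googleapis";
import type { OAuth2Client } from "google-auth-library";

export async function pullPerformance(auth: OAuth2Client, startDate: string, endDate: string) {
  const analytics = google.youtubeAnalytics({ version: "v2", auth });
  const { data } = await analytics.reports.query({
    ids: "channel==MINE",
    startDate,  // e.g. "2024-01-01"
    endDate,
    dimensions: "video",
    metrics: "views,averageViewDuration,averageViewPercentage",
    sort: "-views",
  });
  return data.rows ?? []; // fed into the agent's accumulated knowledge
}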

Remotion
React
ElevenLabs
SadTalker
LivePortrait
HeyGen
Claude
YouTube API
Node.js
40+

Videos Produced

5

AI Services

3

Avatar Engines

0

Manual Steps

What I Built

A fully automated content factory. Five AI services chained into a single pipeline: Claude writes the scripts, ElevenLabs generates the voice, avatar engines produce lip-synced video, Remotion composites everything with React components, and YouTube API handles the upload. One command. Zero manual steps. 40+ videos produced.
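
That single command, sketched end to end; every import below refers to the illustrative sketches on this page (generateScript is assumed to be exported by pipeline/generate-script.ts), and file paths and the script object's shape are placeholders:

pipeline/run.ts
// Sketch: the whole pipeline as one sequential run. Imports refer to
// the illustrative sketches in the stage sections; paths and the
// script shape (text, meta) are assumptions.
import type { OAuth2Client } from "google-auth-library";
import { generateScript } from "./generate-script";
import { synthesizeVoice } from "./synthesize-voice";
import { generateAvatar } from "./generate-avatar";
import { renderVideo } from "./render-video";
import { publish } from "./publish";

export async function run(auth: OAuth2Client, topic: string) {
  const script = await generateScript(topic);                                 // 1. OpenClaw agent
  await synthesizeVoice(script.text);                                         // 2. ElevenLabs voice
  await generateAvatar("assets/reference.jpg", "out/voice.mp3", "out/avatar"); // 3. avatar engine
  await renderVideo({ script, voiceTrack: "out/voice.mp3", avatarVideo: "out/avatar/result.mp4" }); // 4. Remotion render
  await publish(auth, "out/episode.mp4", "out/thumbnail.jpg", script.meta);   // 5. YouTube upload
}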

The key insight: video production is a pipeline problem, not a creativity problem. Once each stage is automated and composable, scaling content becomes a function of configuration — not effort.

pipeline/render.tsx
// Remotion composition — video as React components
import React from "react";
import { Composition } from "remotion";
// VideoTemplate and the VideoProps type are assumed to live alongside
import { VideoTemplate, type VideoProps } from "./VideoTemplate";

export const VideoComposition: React.FC<VideoProps> = ({
  script, voiceTrack, avatarVideo, template
}) => {
  return (
    <Composition
      id={template.id}
      component={VideoTemplate}
      durationInFrames={template.frames}
      fps={30}
      width={1920}
      height={1080}
      defaultProps={{
        script,
        voiceTrack,
        avatarVideo,
        layers: template.layers,
      }}
    />
  );
};
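
Rendering that composition headlessly is a small sketch of its own, assuming @remotion/bundler and @remotion/renderer; the entry point and composition id are illustrative:

pipeline/render-video.ts
// Sketch of the headless render with @remotion/bundler + @remotion/renderer.
// Entry point and composition id are illustrative.
import { bundle } from "@remotion/bundler";
import { renderMedia, selectComposition } from "@remotion/renderer";

export async function renderVideo(inputProps: Record<string, unknown>) {
  const serveUrl = await bundle({ entryPoint: "src/index.ts" }); // Remotion root (assumed path)
  const composition = await selectComposition({
    serveUrl,
    id: "shorts-template", // matches the Composition id (illustrative)
    inputProps,
  });
  await renderMedia({
    composition,
    serveUrl,
    codec: "h264",
    inputProps,
    outputLocation: "out/episode.mp4",
  });
}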

Key Decisions

  • Remotion over traditional editors

    Programmatic rendering. Videos are code — versionable, templatable, reproducible. No timeline dragging. Change a prop, re-render.

  • Multiple avatar engines

    Different tools for different jobs. SadTalker for batch processing, LivePortrait for quick iterations, HeyGen for production-grade output. The pipeline picks the right one (see the selection sketch after this list).

  • OpenClaw for scripts

    Permanent knowledge, not one-shot prompts. The 24/7 agent runtime accumulates context over time — industry trends, competitor analysis, past performance. Scripts get better as the system learns.

  • YouTube API over manual upload

    Automating the last mile matters. Thumbnail generation, SEO metadata, tags — all derived from the script context. No final manual step to break the chain.
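
A minimal sketch of what that engine routing could look like; the job flags and the mapping are illustrative, mirroring the trade-offs above:

pipeline/pick-engine.ts
// Sketch: route each avatar job to the engine that fits it.
// Flags and mapping are illustrative.
type AvatarEngine = "sadtalker" | "liveportrait" | "heygen";

export function pickEngine(job: { draft: boolean; batch: boolean }): AvatarEngine {
  if (job.draft) return "liveportrait"; // quick iterations
  if (job.batch) return "sadtalker";    // local GPU batch processing
  return "heygen";                      // production-grade cloud output
}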