Content Pipeline
script · voice · avatar · render · publish — 5 AI services, 1 command
The pipeline in detail
Script Generation
An OpenClaw instance runs around the clock — a 24/7 agent runtime that keeps a Claude-based agent permanently active. It collects information, tracks trends, and builds context continuously. When a new video is needed, it doesn't start from scratch — it draws on accumulated knowledge to generate scripts that are informed, structured, and ready for production.
No manual research. No blank-page problem. The agent works in the background so the pipeline starts with substance.
// OpenClaw agent — 24/7 autonomous runtime
const script = await openClaw.agent.generate({
  task: "write-video-script",
  context: accumulatedKnowledge,
  params: {
    topic,
    tone: "professional, direct",
    structure: "hook → problem → solution → CTA",
    duration: "60s",
  },
});
// The agent has been collecting context for weeks:
// - Industry trends from RSS feeds
// - Competitor content analysis
// - Performance data from previous videos
// Output: structured script with timing markers

Voice Synthesis
ElevenLabs converts each script into a consistent voice track. Same voice across all videos — recognizable, professional, multilingual. The API integration is fully automated: script text in, MP3 out.
The integration handles voice configuration, pacing control, and format optimization. Each script is processed with consistent voice settings — same speaker, same tone, every time.
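The "script text in, MP3 out" flow can be sketched as a request builder. This is a minimal sketch, not the pipeline's actual code: the `buildTtsRequest` helper and the concrete voice-settings values are assumptions, while the endpoint and header shape follow ElevenLabs' public text-to-speech API.

```typescript
// Hypothetical helper: builds one ElevenLabs text-to-speech request per
// script. Voice ID and voice_settings values are illustrative, not the
// pipeline's real configuration.
interface TtsRequest {
  url: string;
  headers: Record<string, string>;
  body: {
    text: string;
    model_id: string;
    voice_settings: { stability: number; similarity_boost: number };
  };
}

function buildTtsRequest(scriptText: string, voiceId: string, apiKey: string): TtsRequest {
  return {
    // The same voiceId for every video keeps the channel voice consistent
    url: `https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`,
    headers: { "xi-api-key": apiKey, "Content-Type": "application/json" },
    body: {
      text: scriptText,
      model_id: "eleven_multilingual_v2",
      voice_settings: { stability: 0.5, similarity_boost: 0.75 },
    },
  };
}

// Usage: POST the body and write the MP3 response to disk, e.g.
// const res = await fetch(req.url, { method: "POST",
//   headers: req.headers, body: JSON.stringify(req.body) });
```

Pinning the voice settings in one place is what guarantees "same speaker, same tone" across every episode.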
Avatar Generation
This is the magic of the pipeline: a single photo is enough. The AI analyzes the facial structure, generates matching mouth movements in sync with the audio track, and produces a complete video — no motion capture, no green screen, no video shoot.
Depending on the requirements, different engines are used: open-source solutions for local processing, cloud APIs for production-grade quality.
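That engine choice can be sketched as a simple selector. The engine names match the pipeline's actual options; the `AvatarJob` type and the selection criteria here are assumptions for illustration.

```typescript
// Hypothetical selector for the avatar stage. Engine names are real
// options in the pipeline; the decision criteria are illustrative.
type AvatarEngine = "sadtalker" | "liveportrait" | "heygen";

interface AvatarJob {
  production: boolean; // is production-grade output required?
  batchSize: number;   // number of clips in this run
}

function pickEngine(job: AvatarJob): AvatarEngine {
  if (job.production) return "heygen";       // cloud API, production quality
  if (job.batchSize > 1) return "sadtalker"; // open source, suited to batches
  return "liveportrait";                     // fastest for quick iterations
}
```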
Remotion Studio
Remotion composites the voice track, avatar video, and template layers into the final render. Videos are React components, so switching video formats is just switching templates.
Distribution
The last mile is automated too. Once the render completes, the pipeline hands off directly to the YouTube Data API — no manual intervention from script to published video.
Thumbnail generation extracts key frames from the rendered video, composites the episode title and channel branding, and exports an optimised 1280×720 JPEG ready for upload.
SEO metadata — title, description, and tags — is derived directly from the script context that was generated in Phase 1, keeping keyword intent consistent across every asset.
Scheduling lets each video be queued for a specific publish time so releases hit peak audience windows without any manual calendar work.
Analytics feedback closes the loop: view velocity, CTR, and watch-time are periodically pulled from the YouTube Analytics API and fed back into the script agent as context for the next episode.
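The metadata and scheduling steps above can be sketched as one mapping from Phase 1 script context to an upload payload. This is a sketch under assumptions: `ScriptContext` and `deriveMetadata` are hypothetical names, though the `snippet`/`status` shape mirrors the YouTube Data API's videos.insert resource.

```typescript
// Hypothetical mapping from Phase 1 script context to YouTube upload
// metadata. Field names on ScriptContext are illustrative; the actual
// upload goes through the YouTube Data API.
interface ScriptContext {
  title: string;
  summary: string;
  keywords: string[];
}

interface UploadMetadata {
  snippet: { title: string; description: string; tags: string[] };
  status: { privacyStatus: "private"; publishAt: string };
}

function deriveMetadata(ctx: ScriptContext, publishAt: Date): UploadMetadata {
  return {
    snippet: {
      title: ctx.title,
      description: ctx.summary,
      tags: ctx.keywords, // keyword intent stays consistent with the script
    },
    status: {
      privacyStatus: "private", // scheduled videos start private,
      publishAt: publishAt.toISOString(), // then go live at publishAt
    },
  };
}
```

Because every field is derived from the same script context, title, description, and tags can never drift apart between assets.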
40+ Videos Produced · 5 AI Services · 3 Avatar Engines · 0 Manual Steps
What I Built
A fully automated content factory. Five AI services chained into a single pipeline: Claude writes the scripts, ElevenLabs generates the voice, avatar engines produce lip-synced video, Remotion composites everything with React components, and YouTube API handles the upload. One command. Zero manual steps. 40+ videos produced.
The key insight: video production is a pipeline problem, not a creativity problem. Once each stage is automated and composable, scaling content becomes a function of configuration — not effort.
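That composability can be sketched end to end as sequential stages, each consuming the previous stage's output. The stage functions below are placeholders for the real service integrations, not the pipeline's actual code.

```typescript
// Hypothetical end-to-end chain: each stage consumes the previous stage's
// output. Stage functions stand in for the real services (OpenClaw,
// ElevenLabs, avatar engine, Remotion, YouTube API).
type Stage = (input: string) => Promise<string>;

interface PipelineStages {
  script: Stage;  // topic        -> script text
  voice: Stage;   // script text  -> voice track path
  avatar: Stage;  // voice track  -> avatar video path
  render: Stage;  // avatar video -> final render path
  publish: Stage; // final render -> YouTube video ID
}

async function runPipeline(topic: string, stages: PipelineStages): Promise<string> {
  const script = await stages.script(topic);
  const voiceTrack = await stages.voice(script);
  const avatarVideo = await stages.avatar(voiceTrack);
  const finalRender = await stages.render(avatarVideo);
  return stages.publish(finalRender); // zero manual steps in between
}
```

Because each stage only depends on the previous stage's output, any engine can be swapped without touching the rest of the chain.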
// Remotion composition — video as React components
export const VideoComposition: React.FC<VideoProps> = ({
  script, voiceTrack, avatarVideo, template
}) => {
  return (
    <Composition
      id={template.id}
      component={VideoTemplate}
      durationInFrames={template.frames}
      fps={30}
      width={1920}
      height={1080}
      defaultProps={{
        script,
        voiceTrack,
        avatarVideo,
        layers: template.layers,
      }}
    />
  );
};

Key Decisions
- Remotion over traditional editors
Programmatic rendering. Videos are code — versionable, templatable, reproducible. No timeline dragging. Change a prop, re-render.
- Multiple avatar engines
Different tools for different jobs. SadTalker for batch processing, LivePortrait for quick iterations, HeyGen for production-grade output. The pipeline picks the right one.
- OpenClaw for scripts
Permanent knowledge, not one-shot prompts. The 24/7 agent runtime accumulates context over time — industry trends, competitor analysis, past performance. Scripts get better as the system learns.
- YouTube API over manual upload
Automating the last mile matters. Thumbnail generation, SEO metadata, tags — all derived from the script context. No final manual step to break the chain.