The 7 Steps to Launch a Global Podcast — Powered Almost Entirely by AI
No Studio. No Team. No Problem. How One Creator Launched a Multilingual Kids Podcast with Just AI—and a $100 Budget
Want to create a global podcast without a studio, voice actors, or an editing team?
In just 7 weeks, David Cross launched 16 podcast episodes — in 7 languages. All powered by AI.
Creating the first 2 episodes cost only $82.
Below are the 7 steps he followed to create Kids Chatterbox.
If you want to take a deep dive into how he did it, check out David’s “How I Create a Wholly AI-Generated Children’s Story Podcast and YouTube Channel”. David was also kind enough to record this shorter summary.
Each step below includes tools used, time spent, and a shortcut to “Watch How David Did It”.

1. Write a Brand & Story Guide
Tool: Google Docs
Time: 1–2 hours
🎥 Watch How David Did It → 5:48
Before touching any AI tools, David created a 2–3 page document defining the voice, values, tone, character archetypes, and story structure of the podcast.
This acted as the North Star for everything that followed.
"Treat it like you're briefing a creative agency."
This human-crafted foundation ensured that the AI-generated content stayed true to the spirit of the show.
Here’s part of the brand guide (it’s 3 pages long):
2. Build a Story Prompt for AI
Tool: ChatGPT (GPT-4.5) and later a Custom GPT
Time: 30–45 minutes upfront, then reused
🎥 Watch How David Did It → 8:01
David translated his brand guide into a rich, reusable AI prompt.
It defined the word count (1,000 words), tone (playful, gentle), and reading speed (100 WPM), and it asked the AI to confirm its understanding before proceeding.
Here’s an example prompt he put into ChatGPT:
Here’s a snippet of what ChatGPT gave him back:
"The key to getting quality from AI is to spend more time on the prompt than the output."
He later wrapped this into a Custom GPT in ChatGPT Pro to scale story generation.
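The numbers in David's prompt are worth unpacking: 1,000 words read at 100 WPM comes to roughly 10 minutes of narration. The Python sketch below shows one hypothetical way those parameters could be folded into a reusable prompt template. The function names and the prompt wording are illustrative assumptions, not David's actual prompt or Custom GPT instructions.

```python
# Hypothetical sketch: the function names and prompt wording are illustrative
# assumptions, not David's actual prompt or Custom GPT instructions.

TARGET_WORDS = 1_000     # word count from the article
READING_SPEED_WPM = 100  # narration speed from the article


def narration_minutes(word_count: int, wpm: int) -> float:
    """Estimated listening time: words divided by words per minute."""
    return word_count / wpm


def build_story_prompt(
    brand_guide: str,
    word_count: int = TARGET_WORDS,
    wpm: int = READING_SPEED_WPM,
    tone: str = "playful, gentle",
) -> str:
    """Combine the brand guide text with the story parameters into one prompt."""
    minutes = narration_minutes(word_count, wpm)
    return (
        f"{brand_guide.strip()}\n\n"
        f"Write a children's story of about {word_count:,} words, "
        f"roughly {minutes:g} minutes when read aloud at {wpm} words per minute. "
        f"Tone: {tone}.\n"
        # Mirrors the "confirm understanding before proceeding" step.
        "Before writing, summarize these instructions and wait for my confirmation."
    )


print(narration_minutes(TARGET_WORDS, READING_SPEED_WPM))  # 10.0
```

Keeping the brand guide as the first part of the prompt means the same template can be reused for every story, which is roughly what wrapping it in a Custom GPT achieves.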
3. Generate & Edit the Script
Tools: ChatGPT (Custom GPT) + Hemingway App
Time: ~10 minutes to generate; ~20 minutes to edit per story
🎥 Watch How David Did It → 10:52
Using his Custom GPT, David generated each story draft. He then edited it in the Hemingway App (a free app) to improve readability, aiming for a Flesch-Kincaid grade level below 8.5, the sweet spot for accessibility:
He read each script aloud to ensure it passed the "bar stool test" (engaging, not boring).
"You’ll know it’s working when you hear the story in your head and it feels like someone’s talking with you — not at you."
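For readers curious what the 8.5 target measures: the Flesch-Kincaid grade level is 0.39 × (words ÷ sentences) + 11.8 × (syllables ÷ words) − 15.59. The Python sketch below computes it with a crude vowel-group syllable counter, so treat its output as a rough estimate. It is not the Hemingway App's implementation, and Hemingway may score the same text differently.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, dropping a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # "make" has two vowel groups but one syllable; "apple" keeps its "-le".
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Standard Flesch-Kincaid grade-level formula over a block of text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * (len(words) / len(sentences))
        + 11.8 * (syllables / len(words))
        - 15.59
    )


print(round(flesch_kincaid_grade("The cat sat on the mat."), 2))  # -1.45
```

Very simple text can score below zero; what matters for a script is that long sentences and many-syllable words push the grade up, which is exactly what the Hemingway pass is meant to catch.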
4. Convert Script to Voiceover
Tool: ElevenLabs Studio
Time: ~10 minutes per story, plus tweaking
🎥 Watch How David Did It → 12:08
David tested all the major text-to-speech (TTS) systems and chose ElevenLabs for its humanlike emotion and multilingual capabilities.
He pasted each story into ElevenLabs Studio, chose a warm voice, and generated narration.
When needed, he adjusted phrasing or punctuation to fine-tune tone and pacing.
"Don’t be afraid to tweak your script to guide the AI voice. A well-placed em dash or ellipsis can fix an awkward pause."
5. Add Music & Visuals
Tools: Suno AI (music), Midjourney (images)
Time: ~15 minutes per story
🎥 Watch How David Did It → 15:03
David used Suno AI to create a 15-second music track for each episode's intro and outro (he uses the same music in every episode).
He used Midjourney to create picture-book-style visuals:
The result: a podcast that helps kids wind down, not rev up.
"We deliberately avoided AI video to help kids settle, not spin up."
6. Compile and Export Episodes
Tools: Pro Tools (audio), iMovie (video/slideshow)
Time: ~20 minutes per episode
🎥 Watch How David Did It → 16:39
David's son, a professional audio engineer, created a Pro Tools template with intro/outro fades.
David used iMovie's Ken Burns effect to turn still images into gentle motion video.
The final result was a clean, calming video podcast.
"You don’t need a studio — just a simple rhythm. Voice, music, image. Repeat."
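The Ken Burns effect is a slow pan and zoom across a still image. One common way to implement it, sketched here in Python as an assumption rather than a description of how iMovie does it, is to interpolate a crop rectangle from a start position to an end position and scale each frame's crop to the output size. The `Box` type, the smoothstep easing, and the frame counts below are hypothetical choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Crop rectangle in source-image pixels (hypothetical helper type)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation from a (t=0) to b (t=1)."""
    return a + (b - a) * t


def ease_in_out(t: float) -> float:
    """Smoothstep easing so the motion starts and stops gently."""
    return t * t * (3 - 2 * t)


def ken_burns_boxes(start: Box, end: Box, frames: int) -> list[Box]:
    """One crop box per frame; scaling each crop to the output size gives the pan/zoom."""
    if frames < 2:
        raise ValueError("need at least two frames")
    boxes = []
    for f in range(frames):
        t = ease_in_out(f / (frames - 1))
        boxes.append(
            Box(
                lerp(start.x, end.x, t),
                lerp(start.y, end.y, t),
                lerp(start.w, end.w, t),
                lerp(start.h, end.h, t),
            )
        )
    return boxes


# Zoom from a full 1920x1080 frame into the centered 80% over 10 s at 30 fps.
boxes = ken_burns_boxes(Box(0, 0, 1920, 1080), Box(192, 108, 1536, 864), 300)
```

Keeping both boxes at the same aspect ratio avoids stretching the picture, and a slow, eased move suits the calming tone the show is going for.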
7. Publish and Regionalize
Tools: Spotify, Apple Podcasts, YouTube; ElevenLabs Multilingual + Custom GPT
Time: Initial setup: 1 day; Ongoing: 10 minutes per language per episode
🎥 Watch How David Did It → 18:13
David published episodes on all major platforms and, with ElevenLabs’ multilingual voices, produced each episode in Hindi, Spanish, Swedish, French, Arabic, and Portuguese.
He even created a separate podcast feed for each language.
"No one asked me if it was AI. They just asked, 'Is that your wife's voice?'"
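A separate feed per language mostly comes down to separate channel metadata. As a hypothetical sketch (David's hosting setup isn't described here, and the URLs are placeholders), the Python below builds a bare-bones RSS 2.0 channel for each edition, using the show titles listed later in this article and standard ISO 639-1 language codes. Real podcast directories such as Apple Podcasts expect more than this, including iTunes-namespace tags, artwork, and an enclosure for each episode.

```python
import xml.etree.ElementTree as ET

# Show titles as listed in this article; keys are ISO 639-1 language codes.
SHOWS = {
    "en": "Kids Chatterbox",
    "es": "Niños Parlanchines",
    "sv": "Barnens Chatterbox",
    "fr": "Les petits bavards",
    "pt": "Tagarelinhas",
}


def build_feed(title: str, language: str, site_url: str) -> str:
    """Return a minimal RSS 2.0 channel (no episodes) for one language edition."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = f"{title}: a children's story podcast"
    # The <language> tag is how apps tell one edition's feed from another's.
    ET.SubElement(channel, "language").text = language
    return ET.tostring(rss, encoding="unicode")


# Placeholder URLs; a real setup would point at each feed's hosting page.
feeds = {
    lang: build_feed(title, lang, f"https://example.com/{lang}")
    for lang, title in SHOWS.items()
}
```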
The Finished Product
You can see David’s final Kids Chatterbox episodes in any of these places:
English on Spotify and Apple and YouTube as “Kids Chatterbox”
Spanish on Spotify and Apple and YouTube as “Niños Parlanchines”
Swedish on Spotify and Apple and YouTube as “Barnens Chatterbox”
French on Spotify and Apple and YouTube as “Les petits bavards”
Portuguese on Spotify and Apple and YouTube as “Tagarelinhas”
Sounding Human?
David’s #1 goal was “…that the stories would need to ‘feel correct’. They needed to be emotionally engaging content and feel the same as if a human had created them.”
He asked friends in India, France, and Sweden, as well as local Spanish-speaking friends, whether the audio in their language sounded accurate and, beyond that, whether it felt right.
The feedback has been positive.
Cost (the $82)
It cost just $82 to produce the first two podcast episodes — and you could probably do it for free if you're just testing ideas.
Here’s how the $82 broke down: ChatGPT ($20), Midjourney ($30), Suno ($10), and ElevenLabs ($22). They’ve since upgraded to the $99 ElevenLabs plan because they now produce more volume (especially with regionalized voices).
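A quick check of that arithmetic, in Python. Dividing by two episodes is a simplification: these figures appear to be monthly subscription prices, so the real per-episode cost drops as more episodes are produced within the same billing period.

```python
# Tool costs for the first two episodes, as itemized in this article (USD).
FIRST_TWO_EPISODES = {
    "ChatGPT": 20,
    "Midjourney": 30,
    "Suno": 10,
    "ElevenLabs": 22,
}
EPISODES = 2

total = sum(FIRST_TWO_EPISODES.values())
per_episode = total / EPISODES  # naive split; ignores subscription periods

print(f"Total: ${total}")                  # Total: $82
print(f"Per episode: ${per_episode:.2f}")  # Per episode: $41.00
```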
Across all 17 podcast episodes and 14 videos, they still haven’t spent $500 total. And for music?
You don’t need to spend a dime. A simple setup with free tools like Cakewalk, GarageBand, Reaper, Audacity, Pro Tools Intro, or BandLab Online Studio can get the job done — even just using a basic “fade-out music / voice / fade-in music” flow.
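The “fade-out music / voice / fade-in music” flow is simple enough to express in a few lines. This Python sketch is a toy that assumes plain lists of float samples, with sine waves standing in for real recordings; a DAW like the ones above does the same job with gain envelopes. It reuses one 15-second clip for both intro and outro, mirroring how David uses the same music in every episode. The sample rate and the 2-second fade length are arbitrary assumptions.

```python
import math

SAMPLE_RATE = 8_000  # low rate keeps this toy example small


def tone(freq_hz: float, seconds: float, amp: float = 0.3) -> list[float]:
    """Placeholder audio: a sine wave standing in for real music or narration."""
    n = int(SAMPLE_RATE * seconds)
    return [amp * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


def fade(samples: list[float], seconds: float, fade_in: bool) -> list[float]:
    """Linear gain ramp over the first (fade-in) or last (fade-out) `seconds`."""
    n = min(len(samples), int(SAMPLE_RATE * seconds))
    out = list(samples)
    for i in range(n):
        gain = i / n  # 0.0 at the outer edge, approaching 1.0 toward the middle
        if fade_in:
            out[i] *= gain       # ramp up from silence at the start
        else:
            out[-1 - i] *= gain  # ramp down to silence at the end
    return out


def assemble_episode(
    music: list[float], voice: list[float], fade_seconds: float = 2.0
) -> list[float]:
    """Intro music fading out, then narration, then outro music fading in."""
    intro = fade(music, fade_seconds, fade_in=False)
    outro = fade(music, fade_seconds, fade_in=True)
    return intro + voice + outro


music = tone(440, 15)  # stand-in for the 15-second theme
voice = tone(220, 5)   # stand-in for narration
episode = assemble_episode(music, voice)
print(len(episode) / SAMPLE_RATE)  # 35.0
```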
Contacting David
In addition to Kids Chatterbox, David does marketing, consulting and content development. You can reach him at david@davidcross.com.
Final Takeaways
For Content & Media Execs
Creating a new audio property has never been cheaper (hundreds of dollars)
Speed to Market — With tools like ElevenLabs and Midjourney, you can produce pro-quality audio + visuals in under 30 minutes.
Localization through AI is now fast and real (it passes the test with local listeners)
For AI Execs
Creators want control over tone, pacing, and language — not just speed.
Human-Like Wins — David chose ElevenLabs for emotional range — that’s what won his trust.
The tools that offer clean handoffs (script → voice → visuals) will power the next wave of solo creators.
Thanks for reading!
Rob Kelly
Creator & Host of Media & the Machine
p.s.: How to reach me:
If you want to reach me, the best way is to subscribe below for free and reply to my weekly emails.