From EPUB, DOCX or PDF to a retail-ready audiobook on Audible, Apple Books and Spotify. Every step, every cost, how to choose between AI and human narration, and how to keep your rights at every stage.
June 29, 2026 · 11 min read
Most self-published authors already have what they need to enter the audio market: the finished manuscript. The conversion gap is not the writing - it is the production. A decade ago, closing that gap required booking studio time or hiring a narrator at $300 per finished hour. The options have changed significantly, and the cost barrier has dropped enough that skipping audio no longer makes financial sense for most titles.
Audiobook listeners are loyal buyers who pay full price and rarely return titles. If your eBook is already listed on Amazon, adding an audio edition means the same reader can buy the same story twice - and a different buyer, one who prefers audio, can find you for the first time. This guide covers the full production process so you can make that happen without a studio booking or an audio engineering background.
Not all source formats are equal when it comes to extracting clean, narration-ready text. Choosing the right source file saves significant cleanup time downstream.
EPUB is the best source for conversion. Chapter structure maps to the spine index automatically. Text flows cleanly without page-break artefacts. Footnotes are separated from body text. Start here if you have it.
Word documents convert reliably when Heading 1 and Heading 2 styles mark the chapter breaks. Inconsistent heading styles are the main failure mode - check them before uploading.
PDF extraction merges page headers, footers, running titles and body text into one stream. Column layouts break into fragments. Use PDF only if you have no other option, and plan to review the extracted text carefully.
If your publishing workflow ends at PDF, go back one step: most word processors can export DOCX, and most ebook tools (Vellum, Atticus, Sigil) can export EPUB. The extra two minutes of export time will save hours of text correction later.
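To make the "spine index" concrete, here is a minimal sketch that lists an EPUB's content files in reading order using only the Python standard library. It relies on the standard EPUB container layout (`META-INF/container.xml` pointing at the package file); the function name `spine_files` is made up for this illustration, and this is not a description of how any particular conversion tool does it.

```python
import zipfile
import xml.etree.ElementTree as ET
from posixpath import dirname, join, normpath

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def spine_files(epub_source):
    """Return the content documents of an EPUB in spine (reading) order.

    epub_source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(epub_source) as zf:
        # META-INF/container.xml names the package (OPF) file.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", CONTAINER_NS).get("full-path")

        package = ET.fromstring(zf.read(opf_path))
        # The manifest maps item ids to paths relative to the OPF file.
        manifest = {
            item.get("id"): item.get("href")
            for item in package.findall("opf:manifest/opf:item", OPF_NS)
        }
        base = dirname(opf_path)
        # The spine lists manifest ids in reading order.
        return [
            normpath(join(base, manifest[ref.get("idref")]))
            for ref in package.findall("opf:spine/opf:itemref", OPF_NS)
        ]
```

Each returned path is one content document, often (but not always) one chapter, which is why a well-built EPUB needs so little structural cleanup before narration.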
Whatever narration method you choose, the production process follows the same sequence. Understanding each step tells you where the time and cost actually go - and where automation changes the equation.
Clean text narrates well; unclean text causes mispronunciations and pace problems. Fix chapter headings, remove running headers and page numbers, expand abbreviations (Dr. to Doctor, St. to Street or Saint depending on context), and write a pronunciation guide for character names, invented terms and foreign words. A 30-minute prep pass here pays back in every chapter.
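Part of that prep pass can be scripted. The sketch below drops lines that contain nothing but a page number and expands a few unambiguous abbreviations, while only flagging "St." for a human decision. The abbreviation table and both function names are hypothetical; treat this as a starting point, not a complete cleaner.

```python
import re

# Hypothetical abbreviation table; extend it for your own manuscript.
ABBREVIATIONS = {
    "Dr.": "Doctor",
    "Mr.": "Mister",
    "vs.": "versus",
}


def prepare_text(text):
    """Drop bare page-number lines and expand unambiguous abbreviations."""
    kept = [
        line for line in text.splitlines()
        if not re.fullmatch(r"\s*\d+\s*", line)
    ]
    cleaned = "\n".join(kept)
    for abbr, full in ABBREVIATIONS.items():
        # \b keeps the match from starting inside a longer word.
        cleaned = re.sub(r"\b" + re.escape(abbr), full, cleaned)
    return cleaned


def ambiguous_st(text):
    """Return 1-based line numbers containing "St." (Street or Saint?)."""
    return [
        n for n, line in enumerate(text.splitlines(), start=1)
        if re.search(r"\bSt\.", line)
    ]
```

Note that a line holding only digits might be a numbered chapter heading rather than a page number, so review what gets removed before trusting the output.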
Three options: record yourself, hire a professional narrator through ACX or Findaway Voices, or use AI audiobook software. Cost, time and quality differ substantially. The next section covers this comparison in detail. Pick your method before starting production - switching midway means restarting from scratch.
Whether you are recording in a booth or generating with AI, each chapter is produced as a separate audio file. For self-recording, aim for one chapter per session to keep your voice consistent. For AI production, chapter detection handles the split automatically from the manuscript structure.
ACX requires MP3 files at 192 kbps or higher, consistently mono or stereo across the book, with RMS loudness between -23 and -18 dBFS and a peak ceiling of -3 dBTP. Each file also needs a short stretch of room tone at the head and tail: ACX asks for 0.5 to 1 second at the start and 1 to 5 seconds at the end. Apple Books and Spotify have similar but slightly different targets. Mastering software can hit these specs, but the process takes time per chapter if you are doing it manually.
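For a rough sense of what the loudness numbers mean, the sketch below measures RMS and peak level for a list of samples and checks them against the RMS window and peak ceiling quoted above. It assumes samples already decoded to floats between -1.0 and 1.0, and it measures plain sample peak rather than true peak, so it is a simplification of what mastering software does. The function names are made up for this example.

```python
import math

RMS_RANGE_DB = (-23.0, -18.0)  # RMS window quoted in the text
PEAK_CEILING_DB = -3.0         # peak ceiling quoted in the text


def to_db(amplitude):
    """Convert a linear amplitude (1.0 = full scale) to decibels."""
    return 20 * math.log10(amplitude) if amplitude > 0 else float("-inf")


def measure(samples):
    """Return (rms_db, peak_db) for floats in the range [-1.0, 1.0]."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    peak = max(abs(s) for s in samples)
    return to_db(rms), to_db(peak)


def meets_levels(samples):
    """True if RMS sits inside the window and the peak is under the ceiling."""
    rms_db, peak_db = measure(samples)
    low, high = RMS_RANGE_DB
    return low <= rms_db <= high and peak_db <= PEAK_CEILING_DB
```

A chapter that fails on RMS needs gain or compression; one that fails only on peak needs limiting. A dedicated mastering tool should have the final word on either.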
Upload to ACX for Audible, directly to Apple Books via Apple Books for Authors, or use an aggregator like Findaway Voices to reach Spotify, Kobo, Scribd and others simultaneously. Each retailer requires an AI-generated audio disclosure if the narration was AI-produced. Budget two to four weeks for ACX review.
The five steps above are the same whether you spend $129 or $5,000. The difference is how much of the work is automated.
AudioBook Factory handles steps 1 through 4 automatically. Upload your EPUB or DOCX, choose a voice, and get a retail-ready audiobook in under an hour - ACX mastering and AI disclosure included, files yours to keep.
The narration decision is the biggest variable in the conversion process. Here is what the numbers actually look like for a typical 70,000-word novel (roughly eight to ten hours of finished audio):
| Method | Cost per book | Time to retail | You own the files? |
|---|---|---|---|
| Self-recording (DIY) | $200-500 setup + 40-60 hours of work | 6-12 weeks | Yes |
| Human narrator (ACX / Findaway) | $1,600-5,000 | 6-12 weeks | Yes (contract permitting) |
| AI - AudioBook Factory | $129-499 per book | Under 1 hour | Yes, all files |
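The human-narrator row is simple arithmetic: finished hours multiplied by a per-finished-hour rate. The sketch below uses the $200-500 per finished hour range quoted in this guide and derives a narration pace from the "70,000 words is roughly eight to ten hours" figure. The function names are hypothetical, and real quotes vary by narrator.

```python
# Paces implied by "70,000 words is roughly eight to ten hours".
FAST_WPM = 70_000 / (8 * 60)   # about 146 words per minute
SLOW_WPM = 70_000 / (10 * 60)  # about 117 words per minute

LOW_RATE = 200   # dollars per finished hour
HIGH_RATE = 500  # dollars per finished hour


def finished_hours(word_count, words_per_minute):
    """Length of the finished audio in hours at a given narration pace."""
    return word_count / words_per_minute / 60


def narration_cost_range(word_count):
    """Return (low, high) narration cost in dollars for a manuscript."""
    shortest = finished_hours(word_count, FAST_WPM)
    longest = finished_hours(word_count, SLOW_WPM)
    return shortest * LOW_RATE, longest * HIGH_RATE


low, high = narration_cost_range(70_000)
print(f"${low:,.0f}-{high:,.0f}")
```

For 70,000 words this reproduces the $1,600-5,000 range in the table; swap in your own word count to scale the estimate.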
For authors converting a backlist of five or more titles, monthly plans from $29 bring the per-book cost lower still. The quality argument that once made AI narration a last resort has weakened considerably since 2024. For non-fiction and genre fiction (thriller, romance, sci-fi, fantasy), the leading AI voice engines produce narration that retail listeners regularly cannot distinguish from a mid-tier human narrator. The gap remains most noticeable in literary fiction and memoir, where the author's own vocal performance is part of what the reader expects.
For a detailed look at what separates voice engines in 2026, read our guide to the best AI voice generators for audiobooks. It covers prosody, long-form consistency and multi-voice casting - the three criteria that determine whether an AI voice will hold a listener's attention across a 10-hour book.
One practical note: AI narration and human narration are not mutually exclusive across your catalog. Many authors use AI for backlist conversion - titles that were never commercially narrated - and reserve the budget for a professional narrator on a new frontlist title where they want the highest quality ceiling.
PDF extraction tools pull text in visual reading order, not logical document order. Multi-column layouts, text boxes and decorative chapter headers all end up inline with the body text. Before narrating from a PDF extract, read through the first and last chapter carefully - what you see in the PDF and what the extraction tool produces are often different things. Correct the text before generating audio, not after you have dozens of chapters of mispronounced passages to redo.
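One cleanup that is easy to automate is removing running titles and page numbers from an extract. The sketch below assumes you already have the text split into pages, each page a list of lines (how you get there depends on your extraction tool). It drops any line that repeats on more than half the pages, plus lines that are only digits. The function name and the 50% threshold are illustrative choices.

```python
from collections import Counter


def strip_running_lines(pages, min_share=0.5):
    """Remove lines repeated across pages and bare page numbers.

    pages: list of pages, each a list of text lines.
    min_share: a line seen on more than this share of pages is treated
    as a running header or footer.
    """
    counts = Counter()
    for page in pages:
        # Count each distinct non-empty line once per page.
        counts.update({line.strip() for line in page if line.strip()})

    threshold = len(pages) * min_share
    repeated = {line for line, n in counts.items() if n > threshold}

    return [
        [
            line for line in page
            if line.strip() not in repeated and not line.strip().isdigit()
        ]
        for page in pages
    ]
```

On very short documents a repeated line of dialogue could cross the threshold too, so this does not replace the read-through recommended above.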
Every manuscript has terms a TTS engine or a narrator from outside your genre will mispronounce by default: character names with non-standard spellings, invented words in fantasy and sci-fi, titles in foreign languages, scientific terminology in non-fiction. A two-page pronunciation guide prepared before production starts costs less time than retakes or regenerations after the fact. List each term with a phonetic spelling or a sound-alike word (for example: "Aelindra - sounds like Belinda with an A").
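If your narration workflow accepts respelled text, the guide can also be applied mechanically. The sketch below replaces whole-word matches of each guide term with its phonetic respelling. The guide entries, the respellings and the function name are all hypothetical, and whether a given voice engine reads respellings well is something to test on a short passage first.

```python
import re

# Hypothetical guide: term -> phonetic respelling.
PRONUNCIATIONS = {
    "Aelindra": "uh-LIN-druh",
    "Qeth": "keth",
}


def apply_guide(text, guide):
    """Replace whole-word occurrences of guide terms with their respellings."""
    if not guide:
        return text
    # Longest terms first, so a multi-word term wins over a shorter one
    # that it contains.
    terms = sorted(guide, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b")
    return pattern.sub(lambda match: guide[match.group(1)], text)
```

The word-boundary match means "Qeth" is respelled but "Qethian" is left alone; derived forms need their own guide entries.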
Some services offering free or subsidised AI narration include a clause that grants the platform rights to the resulting audio, restricts distribution to their own store, or requires exclusivity for a period of years. Read the output ownership clause before you upload your manuscript. "Free" narration that locks you into one retailer can cost more in long-term revenue than paying $129 upfront for files you own outright and can sell anywhere.
Your rights situation after conversion has two distinct parts: who owns the audio files produced, and what distribution terms you accept at the retail step. They are separate decisions.
Audio ownership is determined by your production tool. With AudioBook Factory, you own all output files. There is no claim on the audio, no exclusivity baked into the production contract, and no restriction on which retailers you use. This is the same expectation you would have from hiring a studio: you pay for production, you own the result.
Distribution exclusivity is a separate decision you make when you upload to retailers. ACX (Audible) offers two options: exclusive (40% royalty, your audiobook only appears on Audible and Amazon) or non-exclusive (25% royalty, you can also sell on Apple Books, Spotify, Kobo and elsewhere). That exclusivity decision is yours to make at the distribution step and is not a consequence of which production tool you used.
One practical note: the ACX exclusive agreement runs for seven years from the date of the first sale. If your book sells at all on Audible, that is a long commitment to one retailer. Many authors with established audiences choose non-exclusive distribution precisely so they can reach buyers on Apple Books and Spotify without waiting seven years for the window to open.
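The exclusivity trade-off can be put in numbers. The sketch below uses the 40% and 25% royalty rates quoted above and asks how much income the other retailers would need to bring in for non-exclusive distribution to break even. It assumes your Audible sales would be the same under either option, which is a simplification; the function names are made up for this example.

```python
EXCLUSIVE_RATE = 0.40      # ACX exclusive royalty
NON_EXCLUSIVE_RATE = 0.25  # ACX non-exclusive royalty


def exclusive_income(audible_sales):
    """Author income when the title is exclusive to Audible and Amazon."""
    return EXCLUSIVE_RATE * audible_sales


def non_exclusive_income(audible_sales, other_income):
    """Author income when the title is also sold through other channels."""
    return NON_EXCLUSIVE_RATE * audible_sales + other_income


def break_even_other_income(audible_sales):
    """Income needed from other channels to match the exclusive deal."""
    return (EXCLUSIVE_RATE - NON_EXCLUSIVE_RATE) * audible_sales
```

On $10,000 of Audible sales, exclusive distribution pays $4,000 and non-exclusive pays $2,500, so Apple Books, Spotify and the rest would need to contribute $1,500 in royalties to break even.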
Audible and Amazon (ACX) - The largest audiobook market by volume. Submit through ACX with a 15-minute opening-chapter sample for quality review. Review typically takes two to four weeks. ACX requires the AI-generated audio disclosure if applicable.
Apple Books - The second-largest market, with a strong presence in the US, UK and Australia. Submit through Apple Books for Authors directly, or via an aggregator. Apple offers its own built-in AI narration tool as an alternative to producing independently, but that option grants Apple rights to the resulting audio.
Spotify and Amazon Music - Both platforms carry audiobooks in the US as of 2025. Reach both through an aggregator such as Findaway Voices, or through AudioBook Factory's distribution chain.
Direct sales - Selling from your own website via Payhip, Gumroad or your own store keeps 95% or more of the revenue and builds a direct customer relationship. It works best when you already have an audience or an email list driving traffic.
Podcast distribution - Publishing the opening chapters as a free podcast feed is a proven way to build an audience before the full audiobook goes on sale. AudioBook Factory's AI podcast generator creates a podcast feed from the same production run that generates the audiobook, so both outputs come from a single workflow.
EPUB is the cleanest format: chapter structure is detected automatically and text flows without layout artefacts. DOCX works well when Heading 1 and Heading 2 styles mark the chapter breaks. PDF is the least reliable because extraction merges page headers, footers and body text into the same stream. Use EPUB or DOCX whenever you have the choice - if you only have a PDF, export from your original word processor or ebook tool first.
AI production via AudioBook Factory starts at $129 per book for Studio voice and $499 for Premium actor-grade narration. Human narrators charge $200-500 per finished hour; a typical 70,000-word novel runs $1,600-5,000 for narration alone. Self-recording costs $200-500 in equipment plus four to six hours of work per finished hour of audio. Monthly plans from $29 per month are available for authors producing multiple titles.
AI software like AudioBook Factory converts a 50,000-word manuscript to a retail-ready audiobook in under an hour. Human narrators take four to eight weeks from booking through production and approval. Self-recording a novel typically takes 40-60 hours of recording and editing time, plus mastering.
Yes. ACX has accepted AI-narrated audiobooks since 2024. You must declare in the upload form that the audio was generated with AI. AudioBook Factory includes the required disclosure language in file metadata and in the upload checklist delivered with every finished book - so you do not have to write the disclosure text yourself.
Audio ownership depends on your production tool. With AudioBook Factory you own all output files with no exclusivity requirements. Distribution exclusivity - such as the ACX exclusive agreement - is a separate decision you make at the retail step and is not tied to which production tool you used. Read the terms of any free narration service carefully: some require you to grant the platform rights to the resulting audio as part of the offer.
Ready to take your manuscript from eBook to a published audiobook on Audible and Apple Books?
Studio voice from $129 per book. ACX mastering included. You own all files - every retailer, no exclusivity required.