Auto vs designer generated ebook conversion

Or why it takes more than just a quick tickle to make an eBook

Recently, we were commissioned to do a couple of poetry ebook conversions. The sensible route was to use a reflowable layout rather than a fixed one. This blog post has come out of wrangling with InDesign and the fairly horrendous CSS and XHTML it spits out. It’s fair to say that you can get an ebook export out of InDesign, and it may even validate. However, there’s a good chance that the moment Amazon or EPUB standards change, it will break. And the messier it is, the harder it will be to fix.

Of course, you may wonder ‘why use InDesign?’ Short answer: that’s the file format we were given. We don’t get every conversion request as INDD files, though. We’ve also had Scrivener, PDF and (Noodly Appendage help us) MS Word. However, we run them all through InDesign to get the .opf and toc files, along with the directory structure. Then we set about trashing most of it and replacing it with nice, clean XHTML.

Although a novel is much less faff than a poetry book, you can still expect it to take us a minimum of a day or two to clean up the cruft, then test, fine-tune and validate the CSS/XHTML it takes to put an ebook together. Hopefully this post will go some way to explaining why.

To begin with, I’ll look at an export with no attention paid to removing cruft at the InDesign stage, and at what we can do to cut some of that rubbish out.

The Paragraph Overrides, they do nothing!

[Screenshot: InDesign EPUB export options, tagging dialogue box]
Above we have the Paragraph Style options.

This dialogue is not part of the export process, but it is vital for making sure you don’t get a horrendous slew of paragraph overrides in the XHTML. Removing these could mean you lose some styling; however, the trade-off is a lot less work doing find/replace to get rid of them individually.

Why individually? Because if you do a find/replace all, you can only match the first half of the span mark-up, and then you’re left with a mass of orphaned </span> closing tags and a big ol’ mess of errors preventing you from previewing the page.
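
One way round the orphaned-tag problem is to match the opening and closing halves of each span together, so both go in one pass. The following is only a minimal Python sketch, not the process we actually used: the function name `unwrap_spans` is made up for illustration, and it assumes the override spans are never nested and never contain other spans.

```python
import re


def unwrap_spans(xhtml: str, class_prefix: str) -> str:
    """Remove <span class="PREFIX-N">...</span> wrappers, keeping the text inside.

    Assumption: override spans are not nested and contain no other spans.
    """
    pattern = re.compile(
        r'<span class="' + re.escape(class_prefix) + r'-\d+">(.*?)</span>',
        re.DOTALL,
    )
    # \1 is the text between the opening and closing tag, so both halves
    # of the span disappear together and no stray </span> is left behind.
    return pattern.sub(r"\1", xhtml)


sample = (
    '<p class="Poem-text">'
    '<span class="CharOverride-6">Lorem ipsum dolor sit amet,</span>'
    "</p>"
)
print(unwrap_spans(sample, "CharOverride"))
# <p class="Poem-text">Lorem ipsum dolor sit amet,</p>
```

Spans with a different class are left alone, so anything you added on purpose survives. For mark-up where spans can nest, a proper XML parser is the safer tool.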

As you can see, the options are somewhat limited. You can have <p> tags or heading tags. That’s it. That’s all you get. My advice? Ignore this and leave it on automatic, because you’ll be stripping this shit out somewhere along the line anyway.

The Reflowable layout Export Options dialogue

[Screenshot: InDesign EPUB export options, Generate CSS dialogue box]
Not much to it really, is there?

I’m not going to cover all of them, as my main concern here is the CSS and XHTML that Indesign spits out. I’ve found that on average I throw away almost all of the CSS and start from scratch. The XHTML can be cleaned up, but it can be time-consuming, particularly in a more complex book. A novel is a much simpler task than a manual (potentially a lot of tables and table styles) or a poetry book. Poets will quite often utilise white space for structure, and that does require span styles. Those span styles are best created in the CSS.

When I export, I uncheck both “Generate CSS” and “Preserve local overrides”, for reasons which I will show as well as tell.

“Preserve local overrides” left checked

Below is an example of CSS that has been generated with overrides intact. It’s not pretty.

This is just a sample. There was a whooole lot more.

p.ParaOverride-1 {
p.ParaOverride-2 {
p.ParaOverride-3 {
p.ParaOverride-4 {
p.ParaOverride-5 {
p.ParaOverride-6 {
p.ParaOverride-7 {
p.ParaOverride-8 {
p.ParaOverride-9 {
p.ParaOverride-10 {
p.ParaOverride-11 {
p.ParaOverride-12 {
p.ParaOverride-13 {
p.ParaOverride-14 {
p.ParaOverride-15 {
p.ParaOverride-16 {
p.ParaOverride-17 {
p.ParaOverride-18 {
p.ParaOverride-19 {
p.ParaOverride-20 {
p.ParaOverride-21 {margin-left:11px;
p.ParaOverride-22 {
p.ParaOverride-23 {
p.ParaOverride-24 {
p.ParaOverride-25 {

Even if you strip everything out at the export stage, you still end up with this in many of the paragraphs (lorem ipsum substituted for the original content for copyright reasons):


<div id="_idContainer009" class="Basic-Text-Frame">
<p class="Basic-Paragraph"><span xml:lang="en-GB">Lorem ipsum dolor sit amet,</span></p>
<p class="Basic-Paragraph"><span xml:lang="en-GB"> consectetur adipiscing elit.</span></p>
<p class="Basic-Paragraph"><span xml:lang="en-GB">Nulla ut augue in dolor pellentesque</span></p>
<p class="Basic-Paragraph"><span xml:lang="en-GB"> egestas sit amet egestas velit.</span></p>
<p class="Basic-Paragraph"><span xml:lang="en-GB">Vivamus posuere nisl a justo lacinia ultricies'</span></p>
<p class="Basic-Paragraph"><span xml:lang="en-GB">Integer nec ante id justo</span></p>
<p class="Basic-Paragraph"><span xml:lang="en-GB">pretium finibus eu eu leo</span></p>
<p class="Basic-Paragraph"><span xml:lang="en-GB">Cras iaculis tellus ac ex rutrum dignissim.</span></p>
<p class="Basic-Paragraph"><span xml:lang="en-GB">Donec at leo sit amet leo feugiat</span></p>
<p class="Basic-Paragraph"><span xml:lang="en-GB"> dapibus sed volutpat dui.</span></p>
<p class="Basic-Paragraph"><span xml:lang="en-GB">Sed iaculis est nec mi auctor,</span></p>
<p class="Basic-Paragraph"><span xml:lang="en-GB">ac semper ex posuere.</span></p>

This is the same section of text after it’s been cleaned up:


<div id="_idContainer009" class="Basic-Text-Frame">
<p>Lorem ipsum dolor sit amet,</p>
<p> consectetur adipiscing elit.</p>
<p>Nulla ut augue in dolor pellentesque</p>
<p> egestas sit amet egestas velit.</p>
<p>Vivamus posuere nisl a justo lacinia ultricies'</p>
<p>Integer nec ante id justo</p>
<p>pretium finibus eu eu leo</p>
<p>Cras iaculis tellus ac ex rutrum dignissim.</p>
<p>Donec at leo sit amet leo feugiat</p>
<p> dapibus sed volutpat dui.</p>
<p>Sed iaculis est nec mi auctor,</p>
<p>ac semper ex posuere.</p>

A note about the above: the xml:lang="en-GB" attribute does not need to be wrapped around every line in a span. We usually declare the language in the content.opf so it’s only needed the once. It could also be set as an attribute on the body element of each page.
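
As a quick sanity check before stripping those spans, you can confirm that the package file really does declare a language. In an EPUB package document the language lives in a `dc:language` element in the metadata. The sketch below is a minimal illustration in Python: the function name `opf_language` and the trimmed-down sample OPF are assumptions made up for the example, not taken from a real book.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Dublin Core namespace used by the metadata elements in content.opf
DC_NS = "http://purl.org/dc/elements/1.1/"


def opf_language(opf_xml: str) -> Optional[str]:
    """Return the first dc:language value in an OPF document, or None."""
    root = ET.fromstring(opf_xml)
    node = root.find(f".//{{{DC_NS}}}language")
    if node is None or not node.text:
        return None
    return node.text.strip()


# Hypothetical, heavily trimmed content.opf for illustration only
sample_opf = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Sample Poems</dc:title>
    <dc:language>en-GB</dc:language>
  </metadata>
</package>"""

print(opf_language(sample_opf))  # en-GB
```

If this comes back as `None`, sort out the metadata first, otherwise removing the spans leaves the text with no language declared at all.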

InDesign-generated div boxes

The following seems to be unavoidable. These ids are generated for each ‘page’ in a print book. The XHTML export from InDesign is one single, very long page. If we are converting a novel, we split this up into a page for each chapter. In the poetry books we used a page for each poem, which gave us natural page breaks rather than forcing a page break with CSS.


#_idContainer004 {
-webkit-transform:translate(0.000px,0.000px) rotate(0.000deg) skew(0.000deg) scale(1.000,1.000);
-webkit-transform-origin:0% 0%;
transform:translate(0.000px,0.000px) rotate(0.000deg) skew(0.000deg) scale(1.000,1.000);
transform-origin:0% 0%;
}
#_idContainer005 {
-webkit-transform:translate(7.157px,6.012px) rotate(0.000deg) skew(0.000deg) scale(1.000,1.000);
-webkit-transform-origin:0% 0%;
transform:translate(7.157px,6.012px) rotate(0.000deg) skew(0.000deg) scale(1.000,1.000);
transform-origin:0% 0%;
}
#_idContainer006 {
-webkit-transform:translate(24.857px,18.459px) rotate(0.000deg) skew(0.000deg) scale(1.000,1.000);
-webkit-transform-origin:0% 0%;
transform:translate(24.857px,18.459px) rotate(0.000deg) skew(0.000deg) scale(1.000,1.000);
transform-origin:0% 0%;
}
#_idContainer007 {
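
Since these container rules all follow the same pattern, they can be cleared out of the stylesheet in one sweep. Here is a rough Python sketch of that idea, under a couple of assumptions: each rule is a standalone `#_idContainerNNN { ... }` block with its closing brace, the selectors are not grouped with others, and the name `strip_container_rules` plus the `p.Poem-text` rule in the sample are made up for the example.

```python
import re

# Matches one complete "#_idContainerNNN { ... }" rule plus trailing whitespace.
# Assumption: no nested braces inside the rule and no grouped selectors.
CONTAINER_RULE = re.compile(r"#_idContainer\d+\s*\{[^}]*\}\s*")


def strip_container_rules(css: str) -> str:
    """Drop every InDesign-generated #_idContainerNNN rule from a stylesheet."""
    return CONTAINER_RULE.sub("", css)


css = """#_idContainer004 {
    transform:translate(0.000px,0.000px);
    transform-origin:0% 0%;
}
p.Poem-text {
    margin:0;
}
"""

print(strip_container_rules(css))
```

Only the `p.Poem-text` rule should be left in the output. The matching ids in the XHTML still need dealing with separately.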

The following is part of a poem with a standard export from InDesign, with overrides intact. The poetry has been replaced with lorem ipsum.


<div id="_idContainer009" class="Basic-Text-Frame">
<p class="Poem-text ParaOverride-4"><span class="CharOverride-6">Lorem ipsum dolor sit amet,</span></p>
<p class="Poem-text ParaOverride-4"><span class="CharOverride-6">consectetur adipiscing elit.</span></p>
<p class="Poem-text ParaOverride-4"><span class="CharOverride-6">Nulla ut augue in dolor pellentesque</span></p>
<p class="Poem-text ParaOverride-4"><span class="CharOverride-6"> egestas sit amet egestas velit.</span></p>
<p class="Poem-text ParaOverride-4"><span class="CharOverride-6">Vivamus posuere nisl a justo lacinia ultricies'</span></p>
<p class="Poem-text ParaOverride-4"><span class="CharOverride-6">Integer nec ante id justo</span></p>
<p class="Poem-text ParaOverride-4"><span class="CharOverride-6">pretium finibus eu eu leo</span></p>
<p class="Poem-text ParaOverride-4"><span class="CharOverride-6">Cras iaculis tellus ac ex rutrum dignissim.</span></p>
<p class="Poem-text ParaOverride-4"><span class="CharOverride-6">Donec at leo sit amet leo feugiat</span></p>
<p class="Poem-text ParaOverride-4"><span class="CharOverride-6"> dapibus sed volutpat dui.</span></p>
<p class="Poem-text ParaOverride-4"><span class="CharOverride-6">Sed iaculis est nec mi auctor,</span></p>
<p class="Poem-text ParaOverride-4"><span class="CharOverride-6">ac semper ex posuere.</span></p>

Cleaner, better, brighter!

This is the cleaned-up version.

As you can see, in this case I opted for div classes. I felt that, as it was poetry, paragraph tags would not be semantically correct, because poetry is not written as paragraphs. Also, in “div vs p”, div comes out on top for me as it has more scope for styling. The “poemLine” class allows for a resolution-dictated line break that indents, keeping the flow of a line intact before moving on to the next. If this were a novel, a simple <p> </p> would suffice.

Final formatting

<div id="poem002">
<h3>Poet Name</h3>
<h4>Poem Title</h4>
<div class="stanza">
<div class="poemLine">Lorem ipsum dolor sit amet,</div>
<div class="poemLine">consectetur adipiscing elit.</div>
<div class="poemLine">Nulla ut augue in dolor pellentesque</div>
<div class="poemLine">egestas sit amet egestas velit.</div>
<div class="poemLine">Vivamus posuere nisl a justo lacinia ultricies'</div>
<div class="poemLine">Integer nec ante id justo</div>
<div class="poemLine">pretium finibus eu eu leo</div>
<div class="poemLine">Cras iaculis tellus ac ex rutrum dignissim.</div>
<div class="poemLine">Donec at leo sit amet leo feugiat</div>
<div class="poemLine">dapibus sed volutpat dui.</div>
<div class="poemLine">Sed iaculis est nec mi auctor,</div>
<div class="poemLine">ac semper ex posuere.</div>
</div><!-- .stanza -->
</div><!-- #poem002 -->
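
Getting from the exported paragraphs to the poemLine divs above is mostly mechanical, so the bulk of it can be scripted. What follows is a minimal Python sketch rather than our exact workflow. It assumes every poem line is a single `<p class="Poem-text ...">` element on one line, optionally wrapped in one span, and the function name `to_poem_lines` is invented for the example.

```python
import re

# One exported poem line: <p class="Poem-text ..."> with an optional single span.
# Assumption: each line sits on its own row with no nested mark-up inside it.
POEM_LINE = re.compile(
    r'<p class="Poem-text[^"]*">(?:<span[^>]*>)?(.*?)(?:</span>)?</p>'
)


def to_poem_lines(xhtml: str) -> str:
    """Rewrite exported Poem-text paragraphs as <div class="poemLine"> lines."""
    return POEM_LINE.sub(
        # strip() drops the stray leading spaces the export leaves inside spans
        lambda m: f'<div class="poemLine">{m.group(1).strip()}</div>',
        xhtml,
    )


exported = (
    '<p class="Poem-text ParaOverride-4"><span class="CharOverride-6">'
    "Nulla ut augue in dolor pellentesque</span></p>\n"
    '<p class="Poem-text ParaOverride-4"><span class="CharOverride-6">'
    " egestas sit amet egestas velit.</span></p>"
)

print(to_poem_lines(exported))
# <div class="poemLine">Nulla ut augue in dolor pellentesque</div>
# <div class="poemLine">egestas sit amet egestas velit.</div>
```

The stanza wrappers, headings and poem ids still have to be added by hand, because the export carries no information about where a stanza starts or ends.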

As you can see, there’s a marked difference. For comparison:

  • Total lines of CSS with local overrides left in: 1,734
  • Total lines of CSS with local override styles stripped out: 1,463
  • Total lines of CSS with all overrides stripped out, id box styles removed and all CSS rewritten to be more efficient: 202
[Chart: eBook conversion CSS comparison]

That comes to a saving of around 1,500 lines of CSS.
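
For anyone who wants to check the sums, the three line counts above break down as follows. This is just the arithmetic written out in Python; the variable names are mine.

```python
# Line counts quoted in the comparison above
with_overrides = 1734       # local overrides left in
overrides_stripped = 1463   # local override styles stripped at export
hand_written = 202          # everything stripped and CSS rewritten by hand

saved_by_export = with_overrides - overrides_stripped   # 271
saved_by_cleanup = overrides_stripped - hand_written    # 1261
saved_total = with_overrides - hand_written             # 1532

print(f"Saved by export settings alone: {saved_by_export} lines")
print(f"Saved by manual clean-up: {saved_by_cleanup} lines")
print(f"Total saved: {saved_total} lines")
print(f"Reduction: {100 * saved_total / with_overrides:.0f}%")  # 88%
```

In other words, the export settings account for fewer than 300 of the saved lines, and the rest comes from rewriting by hand.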

Obviously, some of that can be saved in the export, but not all of it. A large amount of it is down to clean-up and ‘hand-coding’ the CSS and XHTML. Which brings me to my last point. We are professionals: we take a lot of pride in our work and strive to give clients the best possible outcome. We leave it tidy so that someone else can edit it later if required.

There are cheap or even free apps that can do a lot of this for you, but they will most likely generate the kind of cruft shown above, and will probably not be future-proof. If you want a conversion that is light on mark-up, follows your printed design as closely as possible and validates impeccably… drop that copy of Calibre and get in touch.
