Chapter Detection/Table of Contents Tutorial
#1  ldolse 04-13-2011, 06:07 AM
Getting Calibre to appropriately detect Chapters and build a Table Of Contents (TOC) sometimes requires some relatively simple examination of your book's html source code. This is required for situations where your source ebook format lacks a TOC and you want to be able to navigate Chapters on your Reader - e.g. the TOC viewer on epub readers, or the Kindle's 5-way controller and inline TOC.

Calibre has default settings for detecting a Table of Contents, and while this will autodetect the TOC for some books, for many it just won't work. You'll need to get your hands dirty and look at html and edit the defaults.

User Manual Links for Conversion:
Structure Detection Settings
Table of Contents Panel
Using Images in Chapter Titles

Before you get started/Common Mistakes
If your ebook already has a well defined TOC then Calibre can convert this without a problem. A common mistake that users make is enabling the 'force use of auto-generated table of contents' under the conversion options for Table of Contents. This will cause Calibre to throw out the existing Table of Contents and require that you laboriously follow this tutorial for every book. Only enable that option if you know your book already has a bad Table of Contents and you follow this tutorial with the intention of having Calibre make a better one.

Kindle/Mobi Specific Issues
The Kindle has two types of TOCs. One is an NCX file similar to an epub TOC. This is used to create the tick marks on the Kindle's progress bar. The other is a human readable TOC that is linked to from the Table of Contents button/menu. Read here for further details on Kindle TOC support. Calibre will create both types of TOCs during conversion if properly configured. Read this post for a solution to avoid two user visible TOCs or to re-use a book's existing user visible TOC.

Step One, Research your Book
Under the conversion options, go to Search and Replace. Click one of the magic wands on the right half of the screen. If you have multiple source formats Calibre will ask you to choose one - be sure to choose the correct one. Your book's html code will pop up in a new window.

Start scanning through the html code for your chapter headings. You can generally find one quite easily, but if you're having trouble try searching for the plain text that you see when viewing the chapter heading in an ebook reader/web browser.

There are two basic situations you'll run into at this point - the book has clearly defined chapter headings, or it doesn't. There are different ways of handling each case.

Well defined chapter headings:
A well defined chapter heading will typically have code that looks something like this:
<div class="chapter"></div><div> <h3><a name="ch05" id="ch05">5</a> <br /><br/><br /></h3>
<p class="fl1">My nagging got the better o
In this case the chapter heading is just the number '5'. In this example, all the book's headings are just numbers like this. When you look through the html code you can see these are wrapped with '<h3>' heading tags:
<h3 class="calibre6"> <a name="ch05" class="calibre9" id="ch05">5</a> <br class="calibre3"/><br class="calibre3"/><br class="calibre3"/>
Other books could use <h1>, <h2>, <h4>, etc - this is why the source code needs to be examined - to figure out what's being used.
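If scanning the source by eye is tedious, a quick count of heading tags can hint at which level the book uses for chapters. This is only a rough sketch using Python's standard library, not part of Calibre; the names count_headings and SAMPLE are made up for illustration, and the sample html is invented.

```python
from collections import Counter
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

class HeadingCounter(HTMLParser):
    """Counts how often each heading tag (h1..h6) opens in the html."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lower case.
        if tag in HEADING_TAGS:
            self.counts[tag] += 1

def count_headings(html):
    """Hypothetical helper: return a Counter of heading tags in the html."""
    parser = HeadingCounter()
    parser.feed(html)
    parser.close()
    return parser.counts

# Invented sample: one title in h1, two numbered chapters in h3.
SAMPLE = '<h1>A Sample Book</h1><h3>1</h3><p>Some text.</p><h3>2</h3><p>More text.</p>'

for tag, count in count_headings(SAMPLE).most_common():
    print(tag, count)
```

A tag that shows up roughly once per chapter is a good candidate for the chapter xpath, but you still need to look at a few of the headings to confirm.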

There is a box in the structure detection panel of conversion where you can configure an xpath to detect chapters. The default is this:
//*[((name()='h1' or name()='h2') and re:test(., 'chapter|book|section|part\s+', 'i')) or @class = 'chapter']
Note that this expression only looks for h1 or h2 tags, but in our example we need h3 tags. It also has a regex that looks for the words chapter, book, section, or part, but we need numbers, which can be represented as '\d+'. If your book's chapters just use varying words then you could use '.*'

So we can just change that xpath to this:
//*[((name()='h1' or name()='h3') and re:test(., '\d+', 'i')) or @class = 'chapter']
And now Calibre will create a TOC. If your book uses <h4>, <h5> or something else, change the xpath appropriately.
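To see why the regex matters, here is a small sketch that tries the default pattern and the numeric pattern against a few heading texts. It uses Python's re module as a stand-in for the xpath re:test() function, on the assumption that both do a case-insensitive search anywhere in the text; heading_matches is a made-up helper name and the heading texts are invented.

```python
import re

# Patterns copied from the xpath expressions above.
DEFAULT_PATTERN = r"chapter|book|section|part\s+"
NUMERIC_PATTERN = r"\d+"

def heading_matches(pattern, text):
    """Approximates re:test(., pattern, 'i'): case-insensitive search anywhere in text."""
    return re.search(pattern, text, re.IGNORECASE) is not None

# Invented heading texts covering the cases discussed in this tutorial.
for text in ["5", "Chapter 2", "Part One", "Epilogue"]:
    print(
        repr(text),
        "default:", heading_matches(DEFAULT_PATTERN, text),
        "numeric:", heading_matches(NUMERIC_PATTERN, text),
    )
```

The bare '5' heading fails the default pattern and passes '\d+', which is the reason for the change. A heading like 'Epilogue' fails both, which is the kind of case where '.*' is the better choice.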

To match everything in an h3 tag:
//*[((name()='h1' or name()='h3') and re:test(., '.*', 'i')) or @class = 'chapter']
If all the chapter tags in the book are h3 tags or similar, and those tags are used nowhere else, then you could also click on the little magic wand icon next to the xpath, and just type 'h3' or its equivalent into the first box - even simpler.
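If it helps to see how the pieces of the xpath fit together, the sketch below imitates its logic in plain Python: an element counts as a chapter if it is one of the named tags and its text matches the regex, or if its class is exactly 'chapter'. This is an approximation written for this tutorial, not how Calibre evaluates the expression. ChapterFinder, find_chapters and SAMPLE are made-up names, the sample html is invented, and nested candidate elements are not handled.

```python
import re
from html.parser import HTMLParser

class ChapterFinder(HTMLParser):
    """Rough imitation of:
    //*[((name()='h1' or name()='h3') and re:test(., PATTERN, 'i')) or @class = 'chapter']
    """

    def __init__(self, tags, pattern):
        super().__init__()
        self.tags = set(tags)
        self.regex = re.compile(pattern, re.IGNORECASE)
        self.current = None  # (tag, matched_by_class, text_parts) while inside a candidate
        self.found = []      # (tag, text) for every detected chapter

    def handle_starttag(self, tag, attrs):
        if self.current is not None:
            return  # already inside a candidate; nested candidates are ignored
        by_class = dict(attrs).get("class") == "chapter"
        if by_class or tag in self.tags:
            self.current = (tag, by_class, [])

    def handle_data(self, data):
        if self.current is not None:
            self.current[2].append(data)

    def handle_endtag(self, tag):
        if self.current is None or tag != self.current[0]:
            return
        name, by_class, parts = self.current
        text = "".join(parts).strip()
        # Class matches need no regex; tag matches must also pass the regex.
        if by_class or self.regex.search(text):
            self.found.append((name, text))
        self.current = None

def find_chapters(html, tags=("h1", "h3"), pattern=r"\d+"):
    """Hypothetical helper: list (tag, text) pairs the expression would select."""
    finder = ChapterFinder(tags, pattern)
    finder.feed(html)
    finder.close()
    return finder.found

# Invented sample html.
SAMPLE = (
    '<h1>A Sample Book</h1>'
    '<h3><a name="ch05" id="ch05">5</a> <br /></h3>'
    '<p class="fl1">Body text with the number 12 in it.</p>'
    '<h3>6</h3>'
    '<div class="chapter">Epilogue</div>'
)

print(find_chapters(SAMPLE, pattern=r"\d+"))
print(find_chapters(SAMPLE, pattern=r".*"))
```

With '\d+' the h1 title is skipped because it contains no digits, while '.*' picks it up as well. The div is selected in both runs purely because of its class.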

Poorly defined chapter headings:
Here's an example of a poorly defined chapter heading:
<p class="MsoNormal" align="center" style="mso-margin-top-alt:auto;mso-margin-bottom-alt: auto;text-align:center;line-height:normal"> <span style="font-size:14.0pt; font-family:&quot;Times New Roman&quot;;mso-fareast-font-family:&quot;Times New Roman&quot;; color:black"></span>
<p class="MsoNormal" align="center" style="mso-margin-top-alt:auto;mso-margin-bottom-alt: auto;text-align:center;line-height:normal"> <span style="font-size:14.0pt; font-family:&quot;Times New Roman&quot;;mso-fareast-font-family:&quot;Times New Roman&quot;; color:black">Chapter 2</span>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto; line-height:normal"> <span style="font-size:14.0pt;font-family:&quot;Times New Roman&quot;; mso-fareast-font-family:&quot;Times New Roman&quot;;color:black">The incredulous look must have been plain on my face. As she realized how her offer sounded, her
In this case, the chapter is just in a <p> tag, which is the same way plain text is treated in most ebooks. Getting Calibre to create a TOC with the same technique we used before won't work.

Often the simplest solution for this type of chapter heading is to go into the Heuristic Processing panel of Calibre's conversion options and enable Heuristics. Heuristics will search for common types of chapter headings and wrap them with <h2> tags.

Now you can go into structure detection, click the magic wand next to the Chapter detection xpath, and just type '//h:h2' into the first box. Calibre should create a table of contents for this type of scenario.
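To give a feel for what this kind of heuristic does, here is a deliberately simplified sketch: it looks for paragraphs whose only visible text is something like 'Chapter 2' and rewrites them as <h2> headings. This is not Calibre's actual heuristics code, which handles far more cases. The function name wrap_chapter_paragraphs, the patterns, and the sample html (a shortened version of the example above with closing tags added) are all assumptions made for illustration.

```python
import re

# A whole <p>...</p> block, a generic tag stripper, and a "looks like a heading" test.
PARAGRAPH = re.compile(r"<p\b[^>]*>(.*?)</p>", re.IGNORECASE | re.DOTALL)
ANY_TAG = re.compile(r"<[^>]+>")
HEADING_TEXT = re.compile(r"^(chapter|part|book|section)\s+\w+$", re.IGNORECASE)

def wrap_chapter_paragraphs(html):
    """Hypothetical helper: turn paragraphs that contain only a chapter title into <h2>."""
    def replace(match):
        # Drop inner tags such as <span> so only the visible text is tested.
        text = ANY_TAG.sub("", match.group(1)).strip()
        if HEADING_TEXT.match(text):
            return "<h2>" + text + "</h2>"
        return match.group(0)  # leave ordinary paragraphs untouched
    return PARAGRAPH.sub(replace, html)

SAMPLE = (
    '<p class="MsoNormal" align="center"><span style="color:black">Chapter 2</span></p>\n'
    '<p class="MsoNormal"><span style="color:black">The incredulous look must have '
    'been plain on my face.</span></p>'
)

print(wrap_chapter_paragraphs(SAMPLE))
```

Once the heading is a real <h2>, a simple chapter xpath that targets h2 can find it, which is why the two steps work together.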

Image Only Chapter Headings
Some books only use images for Chapter headings. This can be difficult to handle and may require hand editing in Sigil by converting to epub first. Basically an image heading might look like this:
<p class="sb-chapter-image"> <span class="chapter-image"> <img alt="Alice_01.tif" class="generated-style-2" src="images/Alice_01_fmt.jpeg"/> </span> </p>
While you could write an xpath to detect this and create TOC entries/markers, there is no usable text here to create something human readable. This is where Sigil comes in. Wrap the image in <h1>, <h2>, or <h3> tags, depending on the level you want, and then use the 'title' attribute which will tell Sigil what text to use. The finished result would look something like this:
<h2 class="sb-chapter-image" title="Chapter 1"> <span class="chapter-image"> <img alt="Alice_01.tif" class="generated-style-2" src="images/Alice_01_fmt.jpeg"/> </span> </h2>
Sigil 0.3 will automatically build the TOC as you create these. In Sigil 0.4 you would need to click the 'Generate TOC from Headings' button. Calibre will then use this TOC during subsequent conversions.
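The idea behind the 'title' attribute can be sketched in a few lines: when building a TOC entry from a heading, prefer the title attribute if there is one, otherwise fall back to the heading's text. This is only an illustration of the idea, not Sigil's code; HeadingLabels and toc_entries are made-up names and the sample html is adapted from the example above.

```python
from html.parser import HTMLParser

class HeadingLabels(HTMLParser):
    """Collects (level, label) pairs for h1..h3, preferring the title attribute."""

    LEVELS = {"h1": 1, "h2": 2, "h3": 3}

    def __init__(self):
        super().__init__()
        self.entries = []
        self.current = None  # (tag, title, text_parts) while inside a heading

    def handle_starttag(self, tag, attrs):
        if self.current is None and tag in self.LEVELS:
            self.current = (tag, dict(attrs).get("title"), [])

    def handle_data(self, data):
        if self.current is not None:
            self.current[2].append(data)

    def handle_endtag(self, tag):
        if self.current is None or tag != self.current[0]:
            return
        name, title, parts = self.current
        # An image-only heading has no text, so the title attribute supplies the label.
        label = title or "".join(parts).strip()
        self.entries.append((self.LEVELS[name], label))
        self.current = None

def toc_entries(html):
    """Hypothetical helper: return the TOC labels a heading-based generator might use."""
    parser = HeadingLabels()
    parser.feed(html)
    parser.close()
    return parser.entries

SAMPLE = (
    '<h2 class="sb-chapter-image" title="Chapter 1"> <span class="chapter-image"> '
    '<img alt="Alice_01.tif" class="generated-style-2" src="images/Alice_01_fmt.jpeg"/> '
    '</span> </h2>'
    '<p>Some text.</p>'
    '<h2>Chapter 2</h2>'
)

print(toc_entries(SAMPLE))
```

Without the title attribute, the first heading would end up with an empty label, which is exactly the problem with image-only chapter headings.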

Nothing worked, I'm getting Desperate
If none of the above solutions works for you, convert to epub and edit your book in Sigil. During the conversion in Calibre it's a good idea to go into Calibre's conversion settings and temporarily change the 'Split files larger than' option under 'Epub Output' to 3000 or larger (depending on how large your book is - change this back when you're done). Using Sigil you can mark your Chapter headings manually (or possibly using Sigil's search and replace). Once you've finished, use Calibre to convert your new epub to your desired destination format - Calibre will preserve the TOC that was created by Sigil when it converts to the new format.
