Mobileread
cleanhtml and footnotes
#1  bobbibo 12-08-2019, 08:02 AM
Using cleanhtml to clean up a Word document with footnotes before moving it into Sigil.
The problem: cleanhtml converts the footnotes like this:

<a name="_ftnref1" title="" href="https://word2cleanhtml.com/#_ftn1">
<sup><strong><sup>[1]</sup></strong></sup>
</a>

<a name="_ftn1" title="" href="https://word2cleanhtml.com/#_ftnref1">

Of course, these links do not work in the resulting EPUB, because they point to word2cleanhtml.com instead of to the notes inside the book.
The original doc has over 100 footnotes, so adjusting them manually is just not doable!
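Because every broken link carries the same site prefix, a batch find-and-replace avoids editing 100+ notes by hand. Below is a minimal Python sketch. It assumes the exported file contains links exactly like the ones above (`href="https://word2cleanhtml.com/#_ftn1"` and `#_ftnref1`), and the helper name `fix_footnote_links` is hypothetical. It strips the site prefix so each link points inside the book, and swaps `name=` for `id=`, since `id` is the link-target attribute XHTML/EPUB expects. The same two patterns could likely be adapted for Sigil's Find & Replace in Regex mode.

```python
import re

# Absolute links back to the converter's site, e.g.
# href="https://word2cleanhtml.com/#_ftn1"; group 1 keeps just "#_ftn1".
SITE_LINK = re.compile(r'href="https?://(?:www\.)?word2cleanhtml\.com/(#[^"]*)"')
# Word's old-style note anchors: name="_ftn1", name="_ftnref1", ...
NOTE_NAME = re.compile(r'\bname="(_ftn(?:ref)?\d+)"')


def fix_footnote_links(html: str) -> str:
    """Hypothetical helper: make Word2CleanHtml note links work inside the book."""
    html = SITE_LINK.sub(r'href="\1"', html)  # keep only the #fragment
    return NOTE_NAME.sub(r'id="\1"', html)    # name= -> id= for link targets


print(fix_footnote_links(
    '<a name="_ftnref1" title="" href="https://word2cleanhtml.com/#_ftn1">'
))
# → <a id="_ftnref1" title="" href="#_ftn1">
```

Links to other sites are left alone, because the pattern only matches the word2cleanhtml.com prefix.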

#2  Doitsu 12-08-2019, 09:56 AM
Notjohn's favorite website is only suitable for simple books without footnotes. You'll usually get better results if you save MS Word documents as .docx files and convert them to .epub files with Calibre.

#3  Notjohn 12-08-2019, 01:29 PM
I had no problem whatever cleaning up a 140,000-word non-fiction book with a few hundred endnotes. Word's endnotes did need a bit of massaging before they looked professional. Like so much else in Word, its handling of notes may be acceptable in an office environment, but not in publishing.

Indeed, it was simple enough that I don't really remember the process, except that it took me three passes to get the whole book into shape in Sigil. The problem, as far as I recall, was that Word couldn't display a word count or run a spellcheck on a file that large.

#4  Quoth 12-08-2019, 02:40 PM
Quote Doitsu
Notjohn's favorite website is only suitable for simple books without footnotes. You'll usually get better results if you save MS Word documents as .docx files and convert them to .epub files with Calibre.
The same applies to LibreOffice Writer 5.x or 6.x, though I save in ODT format for editing.
Calibre works so well that I now upload the same EPUB2 to Amazon and Smashwords. I also upload a Calibre-created dual MOBI to Smashwords, plus an MS Word .doc exported from Writer for the other Smashwords formats.

Footnotes are tricky, especially in ebooks that need to work on older devices. I tell novelists to try not to channel early Terry Pratchett. In certain non-fiction texts, of course, they are unavoidable.

#5  Quoth 12-08-2019, 02:48 PM
Quote Notjohn
I had no problem whatever cleaning up a 140,000-word non-fiction book with a few hundred endnotes. Word's endnotes did need a bit of massaging before they looked professional. ... The problem, as far as I recall, was that Word couldn't display a word count or run a spellcheck on a file that large.
140K words isn't large at all. I'm sure I've run word count, spelling and grammar checks in Word 2002 on XP on documents of about that size with no problem. I switched entirely to Writer a couple of years ago.
I first used Word regularly in Office 4.3 on WfW 3.11, though I had used Word 2.0a on Windows before that.
I also had no difficulty with documents of that size using a WordStar clone on CP/M, and similar programs on DOS, with third-party spelling and grammar checkers. Up until 1991 I also used various genuine WordStar versions on CP/M and DOS, as well as WordPerfect and MS Word on DOS.

#6  Notjohn 12-09-2019, 01:34 PM
I remember now: given that Word clusters the endnotes in a single file, I just separated it out, cleaned it up myself, and added it to the end of the book.

I also moved the targets of the return links a bit earlier, to the beginning of the paragraph or at least the beginning of the sentence. Otherwise the ebook (on Kindle, anyhow) returns the reader to the note number itself, which regularly leaves a single word orphaned (widowed?) at the top of a "page".
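The markup for this isn't shown in the thread, so what follows is only one possible way to do it, sketched in Python. It assumes each in-text reference is an `<a>` with an `id` like `ft1` inside an ordinary, non-nested `<p>` (for example `<a href="#fn1" id="ft1">[1]</a>`), and that the note's return link points at `#ft1`. The hypothetical `move_return_targets` function removes the `id` from the reference and places an empty `<span id="ft1"></span>` at the start of the paragraph, so the return link lands there instead of on the note number. Moving the target to the start of a sentence would need real sentence detection, which this sketch does not attempt.

```python
import re

# A <p> element and its contents (assumes paragraphs are not nested).
PARA = re.compile(r"(<p\b[^>]*>)(.*?)(</p>)", re.DOTALL)
# The id of an in-text note reference, e.g. <a href="#fn1" id="ft1">.
# Group 1 is the tag up to the id; group 2 is the id value (assumed "ftN").
REF_ID = re.compile(r'(<a\b[^>]*?)\s+id="(ft\d+)"')


def _fix_paragraph(match: re.Match) -> str:
    open_tag, body, close_tag = match.groups()
    ids: list[str] = []

    def strip_id(ref: re.Match) -> str:
        ids.append(ref.group(2))  # remember the target id
        return ref.group(1)       # keep the <a ...> tag minus its id

    body = REF_ID.sub(strip_id, body)
    # Re-create each target as an empty span at the start of the paragraph.
    targets = "".join(f'<span id="{i}"></span>' for i in ids)
    return open_tag + targets + body + close_tag


def move_return_targets(html: str) -> str:
    """Hypothetical helper: move note-reference targets to paragraph starts."""
    return PARA.sub(_fix_paragraph, html)


print(move_return_targets(
    '<p>This is an example sentence.<a href="#fn1" id="ft1">[1]</a></p>'
))
# → <p><span id="ft1"></span>This is an example sentence.<a href="#fn1">[1]</a></p>
```

Paragraphs in the notes section are untouched, because their ids (`fn1`, ...) don't match the `ftN` pattern.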

That book had a lot of photos and maps, which may have bogged Word down. Or perhaps it was the limitations of the computer I had then.

Anyhow, there's no problem using Word2CleanHtml.com on a book with endnotes if you're willing to mess about with the HTML a little. (And if you're not willing to do that, I'm not sure I'd recommend Sigil at all, and certainly not for a first book.)

#7  Tex2002ans 12-09-2019, 07:23 PM
Quote bobbibo
Using cleanhtml to clean up a Word document with footnotes before moving it into Sigil.

[...]

Of course, these links do not work in the resulting EPUB, because they point to word2cleanhtml.com instead of to the notes inside the book.
The original doc has over 100 footnotes, so adjusting them manually is just not doable!
Do not use that crappy website. It's awful.

If you want an easy DOCX->EPUB conversion, just use Calibre.

You can then do your cleanup from there.

* * *

But ultimately, the single most important thing in Word is learning how to use Styles.

I linked to a few videos/resources on the topic in this post:

https://www.mobileread.com/forums/sh...55#post3848055

Once you create your DOCX with Styles, your resulting code will be SO much cleaner in any workflow. You could then even use Save As > Web Page, Filtered in Word and finagle the result using Sigil or Calibre's Editor.

* * *

And it's best to keep your final footnote code very simple:

This would go in your text:

Code
<p>This is an example sentence.<a href="#fn1" id="ft1">[1]</a></p>
And this would go at the bottom of your file:

Code
<p><a href="#ft1" id="fn1">[1]</a> This is a footnote.</p>
Note: In ebooks, brackets are also recommended over superscripts because they're easier to click, easier to read, and don't mess with line heights.
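To get from Word2CleanHtml's output to this simple form without hand-editing 100+ notes, a regex pass can rewrite both sides of each note. This is a minimal Python sketch, and `simplify_footnotes` is a hypothetical name. It assumes the in-text references look like the ones in the first post. It also assumes, since the thread only shows its opening tag, that each note-side `<a name="_ftnN" ...>` encloses the note number and is closed by `</a>`.

```python
import re

# In-text reference as exported by Word2CleanHtml, e.g.
# <a name="_ftnref1" title="" href="...#_ftn1"><sup>...[1]...</sup></a>
TEXT_REF = re.compile(r'<a\b[^>]*\bname="_ftnref(\d+)"[^>]*>.*?</a>', re.DOTALL)
# Note-side anchor <a name="_ftn1" ...>; ASSUMED to wrap the number and end in </a>.
NOTE_REF = re.compile(r'<a\b[^>]*\bname="_ftn(\d+)"[^>]*>.*?</a>', re.DOTALL)


def simplify_footnotes(html: str) -> str:
    """Hypothetical helper: rewrite Word-style notes as simple bracketed links."""
    # Reference in the text: <a href="#fnN" id="ftN">[N]</a>
    html = TEXT_REF.sub(r'<a href="#fn\1" id="ft\1">[\1]</a>', html)
    # The note itself: <a href="#ftN" id="fnN">[N]</a>
    return NOTE_REF.sub(r'<a href="#ft\1" id="fn\1">[\1]</a>', html)


print(simplify_footnotes(
    '<p>Text.<a name="_ftnref1" title="" href="https://word2cleanhtml.com/#_ftn1">'
    '<sup><strong><sup>[1]</sup></strong></sup></a></p>'
))
# → <p>Text.<a href="#fn1" id="ft1">[1]</a></p>
```

Because the replacement drops the original `href`, the word2cleanhtml.com URLs disappear as a side effect. Regexes are fragile on HTML, so it's worth spot-checking a few notes in Sigil's preview afterwards.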

Quote Doitsu
You'll usually get better results if you save MS Word documents as .docx files and convert them to .epub files with Calibre.
Agreed.

There are also plenty of other tools to help you convert cleanly:

1. If you have Microsoft Word, Toxaris's EPUBTools add-on is recommended:

https://www.mobileread.com/forums/sh...d.php?t=213372

This gives you extremely clean code.

(Note: It currently has a bug where italics disappear in footnotes. The next version will fix this.)

2. If you're using Sigil, DiapDealer has created a Sigil plugin: DOCXImport.