I appreciate the gesture, but I have to say I like 'em with a leetle more meat on the bones
your wish is our command oh great code breaker...
Nice work, thanks! One question though: is it normal that the exploded HTML file has only three lines? Line one is always "<html><head>", line two is "<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />", and line three is the rest. It's no problem to make some breaks with par, but the resulting HTML code is not very clearly arranged for manual editing.
All of the pre-.epub HTML-based e-book formats seem to do this: they strip out all unnecessary whitespace to save space. ConvertLIT tries to fix this for LIT files by adding whitespace to the generated HTML, but it gets it wrong often enough to be troublesome. For adding whitespace and otherwise cleaning up grody HTML, check out HTML Tidy.
Thanks for the tip, but I already knew of HTML Tidy and it won't generate a cleaned up version if the source file has errors – which includes most exploded Mobipocket html files.
Anyway, I had a closer look at the html code and it seems that running a search and replace for "> <" with ">\n<" does the trick. Maybe an idea for the next mobi2oeb version?
That's not quite safe. What if you have something like this:
Code
<font size=4>W</font><font size=2>ord</font>
Then nothing will happen for that line. It's "> <" (with a space) that would be replaced with ">\n<" (with a line break), which gives the same output.
"><" with no space between should of course not be separated by a line break.
Are there spaces in the output HTML? Seems odd there would be, if the creation tools are stripping unneeded whitespace characters.
Yes, there are. To me it doesn't look like the creation tools are stripping unneeded whitespace characters, but rather like either they are converting line breaks to spaces (which would seem odd to me) or the script used for exploding to HTML misinterprets line breaks as spaces (that's only a guess, of course).
Here's a small sample output from mobi2oeb from a self-made mobi file. Notice that wherever there is "> <", there should have been a line break in between:
Code
<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<guide></guide></head><body><br/><br/> <h1 align="center"><b>Book Title</b></h1> <br/> <h2 align="center">Author Name</h2> </body></html>
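For anyone who wants to try the search and replace by hand in the meantime, here is a minimal Python sketch of it. The function name `break_between_tags` is made up for illustration and is not part of mobi2oeb; the two test strings are the third line of the sample above and the `<font>` example from earlier in the thread. It assumes that a space between two tags can safely become a line break, which may not hold inside whitespace-sensitive elements such as `<pre>`.

```python
def break_between_tags(html: str) -> str:
    """Replace every "> <" with ">\\n<".

    Tags that touch directly ("><", no space) are left alone,
    so inline markup split across several tags is not affected.
    """
    return html.replace("> <", ">\n<")


# Third line of the mobi2oeb sample output quoted in the thread.
sample = (
    '<guide></guide></head><body><br/><br/> '
    '<h1 align="center"><b>Book Title</b></h1> <br/> '
    '<h2 align="center">Author Name</h2> </body></html>'
)

# The case raised as a possible problem: adjacent tags with no space.
adjacent = '<font size=4>W</font><font size=2>ord</font>'

print(break_between_tags(sample))
print(break_between_tags(adjacent) == adjacent)
```

The first print shows the sample split at each "> <"; the second compares the `<font>` line before and after, since it contains no "> <" for the replacement to match.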
OK, this will be in the next release.