Mobileread
Not Well formed 1.4.3
#1  JSWolf 01-07-2021, 05:51 PM
My system is Windows 10 Home 64-bit. I have the 64-bit version of Sigil 1.4.3 installed along with the epubcheck plugin.

I was loading an ePub 3 eBook and Sigil popped up the message...

Quote
This EPUB has HTML files that are not well formed or are missing a DOCTYPE, html, head, or body elements. Sigil can automatically fix these files, although this may result in minor data loss in extreme circumstances.

Do you want to automatically fix the files?
When I clicked no, I then checked the eBook for errors using epubcheck and I received no errors.

Is this a bug with Sigil? If so, can it be fixed? If it's not a bug, what in the eBook code is incorrect?

Here is a scrambled copy of the eBook. The only changes made are that the embedded fonts and all CSS code referencing them were removed. Otherwise, it's the unchanged scrambled code. I did check it with epubcheck and it passed.
[epub] Star Trek - James Swallow_scrambled.epub (614.6 KB, 48 views)

#2  KevinH 01-07-2021, 06:06 PM
No. If you read the error message, it tells you that the file is missing its DOCTYPE (assuming it has html, head, and body tags), which is required by the epub spec. There is an open issue on epubcheck to test for and report this. Calibre does not follow this aspect of the spec. Sigil does, and has for years: until a couple of releases ago, the auto mending done to move things into Sigil's standard layout always fixed it. Now that we no longer move things to standard locations, that auto fixing is no longer done.

#3  DNSB 01-07-2021, 10:50 PM
Mend and prettify or just Mend will add the missing doctypes. The CSSUndefinedClasses plugin is not happy with running against an epub with those errors so I've been using Mend to fix the issue.

Looking at your scrambled epub, the first block is before mend and prettify, the second is after.

Code
<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" lang="en-us" xml:lang="en-us"> <head>
Code
<?xml version="1.0" encoding="utf-8"?><!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" lang="en-us" xml:lang="en-us">
<head>
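
To list which files in an epub would trigger the warning for a missing DOCTYPE, something like the following could be used. This is only a minimal sketch, not Sigil's or epubcheck's actual check: the function names `has_doctype` and `files_missing_doctype` are made up for illustration, and it assumes the content documents are UTF-8 files ending in .xhtml, .html, or .htm.

```python
import re
import zipfile

# Matches "<!DOCTYPE html" (both the epub3 and the longer epub2 form start this way).
DOCTYPE_RE = re.compile(r"<!DOCTYPE\s+html\b", re.IGNORECASE)
XHTML_EXTENSIONS = (".xhtml", ".html", ".htm")  # assumption: content files use these


def has_doctype(markup):
    """Return True if a DOCTYPE appears before the root <html> element."""
    prolog = markup.split("<html", 1)[0]  # only inspect text ahead of the root tag
    return DOCTYPE_RE.search(prolog) is not None


def files_missing_doctype(epub_path):
    """Return the names of content files in the epub that lack a DOCTYPE."""
    missing = []
    with zipfile.ZipFile(epub_path) as zf:
        for name in zf.namelist():
            if name.lower().endswith(XHTML_EXTENSIONS):
                text = zf.read(name).decode("utf-8", errors="replace")
                if not has_doctype(text):
                    missing.append(name)
    return missing
```

Calling `files_missing_doctype("book.epub")` (the path is a placeholder) would return the internal paths of the files that lack a DOCTYPE. It does not check for missing html, head, or body elements, which the Sigil message also covers.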

#4  Ashjuk 01-08-2021, 04:34 AM
I get that warning message on pretty much every new book I open in Sigil.

I am puzzled as to why Sigil says the book is not well formed, because I can open a book (that's not been opened in Sigil) in Freda, ADE, on my Kobo, on my iPad and (shudder) Calibre, and none of them complain that the book is malformed.

From my experience it's only Sigil that brings up the warning about the missing DOCTYPE tag. Also, if it is so important, why is it that just about every book I come across is missing it? Even new releases appear not to have it, so I can only assume it's not that critical.

#5  DiapDealer 01-08-2021, 06:19 AM
Funny. I get the warning on pretty much none of the epubs I open.

Being able to open a book in an ereading program with no warning has never been an indicator of whether the epub in question was spec-compliant or not.

Either start hitting yes to the warning (and subsequently saving the epub after) or get used to seeing the warning. Those are your options.

#6  Ashjuk 01-08-2021, 07:02 AM
Quote DiapDealer
Funny. I get the warning on pretty much none of the epubs I open.

Being able to open a book in an ereading program with no warning has never been an indicator of whether the epub in question was spec-compliant or not.

Either start hitting yes to the warning (and subsequently saving the epub after) or get used to seeing the warning. Those are your options.
As you say, I just hit yes and let Sigil do its thing. I was just wondering, that's all; no criticism of Sigil.

#7  KevinH 01-08-2021, 09:54 AM
Sigil is making the most spec compliant and consistent epub it can. According to the spec, DOCTYPE is required and older versions of Sigil quietly fixed this on load as it had to move things to fit Sigil's standard form. Newer versions of Sigil no longer auto fix the missing DOCTYPE (but Mend will properly fix it) so it warns the user to fix things and offers to auto fix for them.

Those same e-readers will work just fine with the DOCTYPE. As I said, epubcheck has an open issue to fix this.

BTW, any epub2 that uses named entities (e.g. nbsp) but is missing the DOCTYPE is technically broken and will not work on most e-readers, because epub2's version of the DOCTYPE is where the named entities are defined.

That is why this is important to fix.

Calibre is not spec compliant on this issue but does replace all named entities with their numeric or character equivalents, which makes not having a DOCTYPE even on epub2 possible but technically against the rules.
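
The entity problem can be reproduced with a strict XML parser. The sketch below is illustrative only and is not Calibre's code: `named_to_numeric` and `parses_as_xml` are made-up helper names, and Python's standard `xml.etree.ElementTree` stands in for an e-reader's parser. It shows a named entity failing to parse when no DOCTYPE supplies a definition, and the same markup parsing once the entity is replaced by its numeric equivalent.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# These five are built into XML itself and need no DOCTYPE, so leave them alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(text):
    """Replace HTML named entities (e.g. &nbsp;) with numeric references."""
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # keep XML built-ins and unknown names as-is
        return "&#%d;" % name2codepoint[name]
    return ENTITY_RE.sub(repl, text)


def parses_as_xml(text):
    """Return True if the text is accepted by a strict XML parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


snippet = "<p>a&nbsp;b</p>"  # no DOCTYPE, so &nbsp; is undefined
print(parses_as_xml(snippet))                    # rejected: undefined entity
print(parses_as_xml(named_to_numeric(snippet)))  # accepted after conversion
```

Numeric references work without any DOCTYPE because they are part of XML itself, which is why replacing the named entities sidesteps the problem.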

#8  Ashjuk 01-09-2021, 04:18 AM
Thanks for the explanation, Kevin.

As I said, I was just curious why it seemed that it was only Sigil that picked up on the missing DOCTYPE.

#9  odamizu 01-09-2021, 04:00 PM
Is there a way to tell what changes will be made if you agree to the changes Mend wants to make? i.e., I often find there are Sigil features I'm not aware of and learn about reading these forums, so just checking if I'm missing something that's already there.

I will also admit that the part of the warning that says "Sigil can automatically fix these files, although this may result in minor data loss in extreme circumstances [emphasis added]" always gives me pause, making me want to know exactly what changes are being proposed.

Thank you

#10  BeckyEbook 01-09-2021, 05:06 PM
Quote odamizu
Is there a way to tell what changes will be made if you agree to the changes Mend wants to make?
You cannot preview what will be changed, but ...

If you want to see exactly what Sigil changes during the automatic fix:
1. Open the EPUB file
2. If you see the message, choose [No]
3. Save the checkpoint (Checkpoints > Create Checkpoint for Epub or [🡅] icon)
4. Close the EPUB file (without saving!)
5. Open the same EPUB file again
6. Select [Yes] when you see the message
7. Check what has changed (Checkpoints > Compare Epub against Checkpoint or [±] icon)
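
The same before/after idea can be sketched outside Sigil. This is not how Checkpoints works internally; it assumes you have saved two copies of the book yourself (one untouched, one after answering [Yes] and saving), and the function names are made up for illustration. It uses Python's standard `difflib` to print what differs in each content file.

```python
import difflib
import zipfile


def epub_text_files(epub_path):
    """Return {internal name: text} for the XHTML/HTML files in an epub."""
    with zipfile.ZipFile(epub_path) as zf:
        return {
            name: zf.read(name).decode("utf-8", errors="replace")
            for name in zf.namelist()
            if name.lower().endswith((".xhtml", ".html", ".htm"))
        }


def diff_texts(before, after, name="file"):
    """Return unified-diff lines describing how `after` differs from `before`."""
    return list(difflib.unified_diff(
        before.splitlines(), after.splitlines(),
        fromfile=name + " (before)", tofile=name + " (after)", lineterm=""))


def diff_epubs(before_path, after_path):
    """Print a diff for every content file present in both copies."""
    before = epub_text_files(before_path)
    after = epub_text_files(after_path)
    for name in sorted(before.keys() & after.keys()):
        for line in diff_texts(before[name], after[name], name):
            print(line)
```

For example, `diff_epubs("original.epub", "mended.epub")` (both paths are placeholders) would show lines such as an added `<!DOCTYPE html>` prefixed with `+`.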
