[Plugin] IDErrorCheck
#1  slowsmile 10-18-2017, 07:41 PM
Checks, repairs and reports all id errors in the epub

Plugin Type: Edit
MIT Licence(OSI)
Minimum Sigil requirement: v0.9.3 or higher
Python Requirements: Python 3.4+ (Bundled or External)
OS Requirements: Windows, Linux or OSX
*** Tested on Windows 7, 8 & 10 only ***
Current Version: "0.2.0"

* Select Manage Plugins from the Plugins menu. In the dialog box, select either the Bundled Python or the External Python (Python 3.4+ must be installed on your computer to run this plugin externally).

* Click Add Plugin and select the plugin's zip file. This will load and install the plugin into Sigil, which you can then run by selecting Plugins > Edit > IDErrorCheck.

This plugin was originally written with the sole intention of properly reporting and, if possible, fixing Epubcheck's infamous "colon" id error problems. This plugin now also does the following:

* Converts all "name" attributes to "id" attributes in the html files.

* Now checks and repairs all invalid id attribute values in the epub's html files. This covers illegal spaces, ids that start with a digit, and other illegal non-alphanumeric characters that commonly occur within id attribute values. (v0.1.5)

* Also checks and repairs all internal links that contain bad bookmarks associated with the above html id problems.(v0.1.5)

* Checks and repairs all book uuid values in the toc.ncx and content.opf. If an illegal book uuid value is found then another unique uuid will be automatically generated to replace it.(v0.1.5)

* Now checks and repairs all navPoint id values in the toc.ncx.(v0.1.5)

* Checks and logs all id errors occurring in the content.opf manifest or spine without fixing them.

* Will properly check, flag and identify Epubcheck's "colon" id errors and fix these errors.

* At the end of the plugin run, an error dialog will display a simple error list showing all relevant information about each id error including associated file, line number, reason and bad id.
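The check-and-report step in the list above can be sketched roughly as follows. This is not the plugin's actual code: the names `classify_id` and `find_id_errors`, the reason strings and the ASCII-only `SAFE_ID` pattern are all assumptions made for illustration (the real XML rule for id values also allows many non-ASCII characters).

```python
import re

# Assumption: a "safe" id starts with a letter or underscore and continues
# with letters, digits, '_', '-' or '.'. This is stricter than the XML rule.
SAFE_ID = re.compile(r'[A-Za-z_][A-Za-z0-9_.-]*\Z')

# Matches id="..." but not data-id="..." or xml:id="...".
ID_ATTR = re.compile(r'(?<![\w:-])id="([^"]*)"')


def classify_id(value):
    """Return a reason string for a bad id value, or None if it looks valid."""
    if SAFE_ID.match(value):
        return None
    if value == '':
        return 'empty id'
    if ' ' in value:
        return 'contains a space'
    if ':' in value:
        return 'contains a colon'
    if value[0].isdigit():
        return 'starts with a digit'
    return 'contains an illegal character'


def find_id_errors(filename, text):
    """Scan text line by line; return (file, line number, reason, bad id) tuples."""
    errors = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in ID_ATTR.finditer(line):
            reason = classify_id(match.group(1))
            if reason is not None:
                errors.append((filename, line_no, reason, match.group(1)))
    return errors
```

Because the line numbers come from the file as it was at scan time, anything that reflows the file afterwards makes them stale.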

Don't use the "Mend and prettify..." Sigil feature directly after using this plugin. Doing so will change and increase the number of lines in the html files so that any reported error line numbers generated by the plugin automatically become inaccurate and void.

Plugin Run
First load your epub into Sigil and then just run the plugin. If you only want to know which errors have not been fixed then just run the plugin twice. The first time you run the plugin the display log will show you errors that have been fixed or not fixed. The second time you run the plugin will only show you what has not been fixed.

Update: This plugin can now process epubs that contain svg images without giving svg errors in Epubcheck.

Change Log:


-- Fixed a bug and removed some unnecessary code in the checkOPFID() function. Thanks to Lucsart.
-- Fixed a bug where html "name" attributes were not initially being converted to "id" attributes before the id error checks. Thanks to Thasaidon.
-- Fixed a problem causing svg formatting errors in Epubcheck. SVG images can now be used in epubs without problems when using this plugin.
-- Now removes the 'name="calibre:cover"' line in the cover file meta tags which was causing Epubcheck problems. Thanks to Becky.
-- Plugin now does not check or change any "name" attributes or their values in the meta tags of all xhtml files. Thanks to DiapDealer.
-- The plugin now prepends an 'x' for all illegal numeric first char problems in ids (i.e. as it was before the last change). Thanks to Becky.
-- Plugin now repairs all illegal non-alphanum characters within ids and href ids in the xhtml files and toc.ncx only.
-- Plugin now checks ids in all tags in the xhtml files
-- Plugin now removes problematic and superfluous ids from navpoint hrefs
-- Plugin now removes problematic and superfluous ids from guide hrefs
-- Thanks to Becky for identifying these problems.
-- Changed epub error message from "Invalid Epub" to "Epub contains no data". Thanks to Doitsu.
-- Changed handling of illegal first char digit id errors. These errors are now fixed by prepending (not substituting) an 'x' char into the id value string. Thanks to AlanHK & DiapDealer.
-- Fixed plugin exit problem. Thanks to AlanHK
-- Tentative fix for Linux OS identification problem(untested). Thanks to Doitsu.
-- Initial release
[zip] (45.7 KB, 637 views)

#2  AlanHK 10-19-2017, 04:54 AM
Is this plugin's functionality now all included in your CustomCleanerPlus plugin?

A note: you seem to change IDs beginning with a digit by replacing that digit with an x.
Which will probably be fine, but could create duplicate IDs, e.g.:

id="1" id="2"
both become id="x"

I manually corrected IDs by prepending X. There must be a limit to the length of an ID string, so I guess you should check if adding a character would push it over that if you were really being careful.
Or just forget the original ID and regen them all.

#3  slowsmile 10-19-2017, 07:26 AM

Quote: "Is this plugin's functionality now all included in your CustomCleanerPlus plugin?"
No, this code hasn't been added to the CustomCleanerPlus plugin (CCP). The reason is that CCP is a cleaner for html files and epubs, which has nothing really to do with checking or fixing ids.

The just-released IDErrorCheck does swap in an 'x' char for first char digit errors only. It also substitutes an underscore in all id values that have illegal spaces. It also regens both book ids in the toc.ncx and content.opf files if they are bad. That's all it fixes. All other illegal id values -- such as those containing illegal non-alphanumeric chars -- are just reported. ID attribute errors in the content.opf are also not fixed -- just reported -- because of the complex rules and myriad dependencies between ids and hrefs within the content.opf and toc.ncx.
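As a rough illustration of the book id regen mentioned above (a sketch only, not the plugin's code): the helper name `ensure_book_uuid` is hypothetical, and treating "bad" as "not a canonical UUID, with or without a urn:uuid: prefix" is an assumption, since the exact rule the plugin applies is not stated here.

```python
import re
import uuid

# Assumption: a "good" book id is a canonical 8-4-4-4-12 hex UUID,
# optionally prefixed with 'urn:uuid:'.
UUID_RE = re.compile(
    r'(?:urn:uuid:)?[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\Z',
    re.IGNORECASE,
)


def ensure_book_uuid(value):
    """Return value unchanged if it looks like a UUID, else a freshly generated one."""
    candidate = value.strip()
    if UUID_RE.match(candidate):
        return candidate
    return 'urn:uuid:' + str(uuid.uuid4())
```

Whatever value comes back would need to be written to both the content.opf and the toc.ncx so that the two book ids stay identical.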

#4  DiapDealer 10-19-2017, 08:19 AM
I think what he's saying is that replacing any first-digits in an id with an 'x' could possibly result in identical ids in the same html file. Prepending the 'x' (instead of swapping) would at least guarantee that already unique ids would stay that way.
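A small sketch of the difference, using made-up helper names `substitute_x` and `prepend_x` (neither is taken from the plugin):

```python
def substitute_x(value):
    """Replace a leading digit with 'x' (the swapping approach)."""
    return 'x' + value[1:] if value[:1].isdigit() else value


def prepend_x(value):
    """Put an 'x' in front of a leading digit (the prepending approach)."""
    return 'x' + value if value[:1].isdigit() else value


ids = ['1', '2', '10']
substituted = [substitute_x(i) for i in ids]  # ['x', 'x', 'x0'] -> '1' and '2' collide
prepended = [prepend_x(i) for i in ids]       # ['x1', 'x2', 'x10'] -> still distinct
```

Prepending keeps the renamed ids distinct from one another. A clash is still possible if the file already contains an id such as "x1", so a careful tool would check the result against the existing ids as well.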

#5  slowsmile 10-19-2017, 07:07 PM
@DiapDealer...I'll try and put in the suggested change. This change will only apply to fixing the first char digit errors in the epub.

#6  slowsmile 10-19-2017, 08:36 PM
Plugin Update: The plugin has been updated(v0.1.2):

* Changed handling of illegal first char digit id errors. These errors are now fixed by prepending (not substituting) an 'x' char into the id value string. Thanks to AlanHK & DiapDealer.

#7  slowsmile 10-25-2017, 07:07 PM
Could someone please add this new plugin to the Sigil Plugin Index? Thanks in advance.

#8  KevinH 10-27-2017, 01:18 PM
Just added it.

#9  BeckyEbook 03-13-2018, 09:26 AM
The plugin replaces the id after the hash in links for illegal first-digit-start errors, but the incorrect ids themselves are not fixed.

Sample illegal ID:
<h1 id="123abc">Chapter 1</h1>
Sample link to illegal ID:
<a href="../Text/start.xhtml#123abc">Chapter 1</a>
First sample is not corrected.

Second is corrected to:
<a href="../Text/start.xhtml#x123abc">Chapter 1</a>
[epub] test-id.epub (12.5 KB, 143 views)
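To keep an id and the links pointing at it in step, the same repair has to be applied to both the id attribute and the href fragment. A minimal sketch of that idea follows; `repair_id` and `repair_markup` are hypothetical names, and replacing every illegal character with an underscore is an assumption rather than the plugin's documented behaviour.

```python
import re

ID_ATTR = re.compile(r'(?<![\w:-])id="([^"]*)"')
HREF_ATTR = re.compile(r'href="([^"#]*)#([^"]*)"')


def repair_id(value):
    """Assumed repair: illegal chars (incl. spaces) -> '_', then prepend 'x' if needed."""
    fixed = re.sub(r'[^A-Za-z0-9_.-]', '_', value)
    if fixed and not (fixed[0].isalpha() or fixed[0] == '_'):
        fixed = 'x' + fixed
    return fixed


def _fix_id(match):
    return 'id="%s"' % repair_id(match.group(1))


def _fix_href(match):
    path, fragment = match.group(1), match.group(2)
    if '://' in path:  # leave external links alone
        return match.group(0)
    return 'href="%s#%s"' % (path, repair_id(fragment))


def repair_markup(text):
    """Apply the same id repair to id attributes and to internal href fragments."""
    text = ID_ATTR.sub(_fix_id, text)
    return HREF_ATTR.sub(_fix_href, text)
```

With the two samples above, both the heading id and the link fragment come out as x123abc, so the link still resolves.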

#10  slowsmile 03-14-2018, 06:10 AM
@Becky...It's certainly true what you say. But here's what it says in the release notes:

* Checks and, if possible, repairs all invalid id attribute values in the epub's html files.

* Also checks and, if possible, repairs internal links that contain bad bookmarks associated with the above html id problems.

* Checks and, if possible, repairs all navPoint id values in the toc.ncx.

The above means that it will not fix every single id problem. I saw no point in fixing all id problems because giving you the line number and the reason for the id fail should really be enough for you to fix the id problem. And the main reason that I wrote this app was because Epubcheck did not describe id problems very well. This plugin was really just an attempt to give proper reasons for any id failure as well as point the user accurately to the problem line in the epub.

If you want to see the problem that Epubcheck has with describing bad ids then you could try running your test epub(with bad ids) through Epubcheck. Then you will see the problem with Epubcheck's strange error messaging, which always seems to involve phantom colons that aren't there.
