[GUI Plugin] Modify ePub
#1  kiwidude 10-22-2011, 10:27 AM
This plugin offers a way to perform certain modifications to your selected ePub files without performing a calibre conversion. This plugin was created a number of months ago and has a history documented in the Development forum on this thread.

Performing an ePub->ePub conversion will enforce a number of changes to your ePub, some of which can be undesirable for some users. Examples are the rewriting of CSS, margin modifications, file splitting in undesired places, changes to directory structure etc.

Instead, this plugin allows a user-specified subset of changes to be performed in isolation without otherwise touching the original ePub's file structure, CSS files etc. Frequently these changes have been performed manually by users either using the Tweak ePub feature (time consuming), by editing in Sigil (which introduces changes/side effects of its own), by doing ePub->ePub conversions, or by saving to disk and reimporting into calibre.

Users may also find it useful to install the Quality Check plugin, which offers the ability to identify ePubs in your library which qualify for many of the modifications this plugin can make.

Refer to the Help file accessed from the plugin dialog for full details on each of the modification options and when you might use them.

Main Features:
Special Notes:
Installation Notes:
  1. Download the attached zip file and install the plugin/restart Calibre/add to context menu as described in the Introduction to plugins thread.

Running from command line:
Paypal Donations:
Version History:


Version 1.7.3 - 25 Apr 2022 by chaley
Remove some python 3 code inadvertently left in after debugging.

Version 1.7.2 - 23 Apr 2022 by chaley
Fix "Remove broken TOC entries in NCX file" on Linux. Improved the error message shown when no ePubs were changed.

Version 1.7.0 - 19 Jan 2022 by chaley
Support for calibre 6

Version 1.6.3 - 2 March 2021 (test release), 27 July 2021 (release) by chaley
Further fixes for the exception in rare cases when replacing the cover.

Version 1.6.2 - 18 Dec 2020 by chaley
Fix exception in rare cases when replacing the cover

Version 1.6.1 - 5 Oct 2020 by chaley
Fix crash caused by presence of DRM

Version 1.6.0 - 30 Sep 2020 by chaley
Make plugin compatible with calibre 5 (Python 3)

Version 1.4.1 - 12 Mar 2020
Internal changes only, replaced indent tabs with 4 spaces

Version 1.4.0 - 17 Oct 2019
Minor adjustments to the "unpretty" option, including consideration of EPUB3 elements (section/nav) and removal of EMPTY "display: none" elements.
Bugfix: Expanded .xpgt link removal for better detection.
Resequenced modules so that deep parsing would not undo "unpretty" function's work.
Incorporated JimmXinu's fix to the list-based file removal logic.
Incorporated Terisa de morgan's option to move metadata jackets to the end of the book.
Enhanced pagemap removal function to work regardless of the filename.
Added option to only remove pagemaps and related artifacts generated by Google Play, leaving pagemaps from other sources intact. (Note that removing all pagemaps will override this option.)
Note: Neither pagemap removal routine affects pagelists incorporated into NCX files.
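For readers wondering what removing a pagemap involves at the file level, here is a minimal standard-library sketch. It assumes the usual Adobe layout: a manifest item with media type `application/oebps-page-map+xml` that the `<spine>` references through a `page-map` attribute. The function name `remove_pagemap` is hypothetical and this is not the plugin's own code; like the routines described above, it leaves any pageList inside the NCX alone.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
ET.register_namespace("", OPF_NS)  # keep the OPF default namespace on output

PAGEMAP_MEDIA_TYPE = "application/oebps-page-map+xml"


def remove_pagemap(opf_xml: str) -> tuple[str, list[str]]:
    """Hypothetical helper, not the plugin's code.

    Drops page-map manifest items and the spine's page-map attribute.
    Returns the rewritten OPF plus the hrefs of the removed items, so the
    caller can also delete those files from the archive.
    """
    root = ET.fromstring(opf_xml)
    manifest = root.find(f"{{{OPF_NS}}}manifest")
    removed = []
    for item in list(manifest):
        # Match on media type rather than file name.
        if item.get("media-type") == PAGEMAP_MEDIA_TYPE:
            removed.append(item.get("href"))
            manifest.remove(item)
    spine = root.find(f"{{{OPF_NS}}}spine")
    if spine is not None:
        # The spine refers to the page map by manifest id.
        spine.attrib.pop("page-map", None)
    return ET.tostring(root, encoding="unicode"), removed
```

Matching on the media type rather than a file name is one way to get the "regardless of the filename" behaviour mentioned for 1.4.0.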

Version 1.3.14 - 29 Nov 2017
Added option to remove Adobe pagemap files.

Version 1.3.13 - 05 Jul 2015
Added an option to disable the confirmation prompt shown each time the ePub is updated. Use at your own risk: if you make other changes to the book record at the same time, they may be lost.
Fix for Cancel on the progress dialog (submitted by Raúl)

Version 1.3.12 - 02 Oct 2014
Fixed minor bug in "stripkobo" option that missed some Kobo artifacts inside the HEAD element.
Fixed minor spacing bugs in "unpretty" option.
Enhancement to "stripkobo", "stripspans", and "unpretty" options: All three now remove </br> and </hr> tags and always make BR and HR self-closing elements. (This fixes invalid <br> and <hr> markup, if such is present.)
Moved "stripkobo", "stripspans", and "unpretty" into the "Known artifacts" category to balance the dialog box better.
Added some code to make the dialog box scrollable on smaller screens.
Help file: Filled in how one can detect the need to smarten punctuation. (Was previously blank.)
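The BR/HR clean-up described for 1.3.12 can be approximated with two regular expressions. This is a hedged sketch rather than the plugin's implementation (the name `normalize_br_hr` is hypothetical), and it assumes attribute values never contain a literal `>`.

```python
import re

# Closing tags such as </br> or </hr> are invalid: BR and HR are void elements.
_STRAY_CLOSE = re.compile(r"</(br|hr)\s*>", re.IGNORECASE)
# <br>, <br/>, <br />, <hr class="x">: capture the tag name and its attributes.
_VOID_TAG = re.compile(r"<(br|hr)\b([^>]*?)\s*/?>", re.IGNORECASE)


def normalize_br_hr(html: str) -> str:
    """Hypothetical helper: drop </br>, </hr> and make every BR/HR self-closing."""
    html = _STRAY_CLOSE.sub("", html)
    return _VOID_TAG.sub(lambda m: f"<{m.group(1)}{m.group(2)}/>", html)
```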

Version 1.3.11 - 13 Aug 2014
Add a "stripspans" option to allow removal of attributeless <span> elements from markup, as well as normalizing empty <x></x> elements to the <x/> form.
Add a "stripkobo" option to allow removal of the Kobo-specific code from kepub books, transforming them into standard EPUB books. This does NOT remove Kobo's DRM.
Note: Both of the above will also completely remove A, B, I, U, BIG, SMALL, EM, SPAN, and STRONG elements from the markup when those elements have neither attributes nor content.
Add an "unpretty" option to de-indent and otherwise reformat HTML elements in markup. This should have no effect on the rendered content; it only cleans the source code up a bit.
Fix for "Remove Adobe resource DRM meta tags" option to remove leading spaces and/or newlines, so these meta tags are completely removed instead of leaving blank lines.
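As a rough model of what "stripspans" does, the sketch below unwraps attributeless `<span>` elements (keeping their text) and drops the listed inline elements when they have neither attributes nor content. It assumes well-formed XHTML parsed with Python's `xml.etree.ElementTree`; all function names are hypothetical. A full parse and re-serialize also drops the DOCTYPE and writes remaining empty elements as `<x />`, exactly the kind of side effect the plugin tries to avoid, so treat this as an illustration of the transformation, not a drop-in.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", XHTML_NS)  # serialize without an ns0: prefix

# Inline elements dropped when they have neither attributes nor content.
DROP_WHEN_EMPTY = {"a", "b", "i", "u", "big", "small", "em", "span", "strong"}


def _local(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def _append_text_before(parent, idx, text):
    """Append text to whatever text slot precedes child position idx."""
    if not text:
        return
    if idx == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[idx - 1]
        prev.tail = (prev.tail or "") + text


def _unwrap(parent, child):
    """Replace child with its own content, keeping text and tail in order."""
    idx = list(parent).index(child)
    kids = list(child)
    _append_text_before(parent, idx, child.text)
    parent.remove(child)
    for offset, kid in enumerate(kids):
        parent.insert(idx + offset, kid)
    _append_text_before(parent, idx + len(kids), child.tail)


def strip_spans(xhtml: str) -> str:
    """Hypothetical helper modelling the "stripspans" option."""
    root = ET.fromstring(xhtml)
    # Reversed document order visits every element after its descendants,
    # so nested spans and newly emptied elements are handled first.
    for parent in reversed(list(root.iter())):
        for child in list(parent):
            if child.attrib:
                continue
            name = _local(child.tag)
            empty = not child.text and len(child) == 0
            if name == "span" or (empty and name in DROP_WHEN_EMPTY):
                _unwrap(parent, child)
    return ET.tostring(root, encoding="unicode")
```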

Version 1.3.10 - 28 Jul 2014
Support for upcoming calibre 2.0

Version 1.3.9 - 01 Sep 2013
Fix for users who do not have any Extra CSS in their defaults trying to use the Append Extra CSS option.

Version 1.3.8 - 30 Aug 2013
Add an "Append extra CSS" option to allow appending any CSS style information from Preferences->Common Options->Look & Feel->Extra CSS to each .css file in the ePub.
Respect the tweak "save_original_format_when_polishing" if set to make a .ORIGINAL_EPUB copy of the book before making modifications if no such copy exists.
After running Modify ePub ensure the book details panel is updated in case an ORIGINAL_EPUB was added
Fix for encrypted font ePubs being treated as DRM protected preventing Font removal
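A minimal sketch of the "Append extra CSS" idea, assuming the book's files have already been read into a dict mapping archive names to bytes and that the stylesheets are UTF-8. `append_extra_css` is a hypothetical name, and reading calibre's Extra CSS preference is left out entirely.

```python
def append_extra_css(files: dict[str, bytes], extra_css: str) -> list[str]:
    """Hypothetical helper: append extra_css to every .css entry, in place.

    files maps archive names to contents; stylesheets are assumed UTF-8.
    Returns the names of the stylesheets that were changed.
    """
    extra = extra_css.strip()
    if not extra:
        return []  # nothing configured: leave the book untouched
    changed = []
    for name, data in list(files.items()):
        if name.lower().endswith(".css"):
            text = data.decode("utf-8")
            if text and not text.endswith("\n"):
                text += "\n"
            files[name] = (text + extra + "\n").encode("utf-8")
            changed.append(name)
    return changed
```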

Version 1.3.7 - 15 Feb 2013
Fix for dependency on calibre code removed in 0.9.19

Version 1.3.6 - 09 Dec 2012
Fix for "Rewrite CSS margins" to ensure it only processes manifest xhtml files when replacing inline styles.

Version 1.3.5 - 22 Nov 2012
Add a separate script to allow Modify ePub to be run from the command line. Unzip it and refer to the readme.txt/script for help on how to use it.
Ensure that when running via the command line, the plugin still runs even if no opf file is present.

Version 1.3.4 - 16 Nov 2012
Workaround for a calibre "bug": if the user has both "Remove javascript" and "Smarten punctuation" checked, ensure that remove javascript runs first, so that smarten punctuation works correctly for quotes.

Version 1.3.3 - 08 Nov 2012
Fix the fix (for when "Update metadata" is not selected)... sigh...

Version 1.3.2 - 08 Nov 2012
Fix regression from last release where only selecting the "Update metadata" option would not apply changes.

Version 1.3.1 - 06 Nov 2012
Ensure that the "Remove non dc: metadata" option always runs after "Update metadata" if both are selected.
Reorganise some of the layout and groups.

Version 1.3.0 - 04 Nov 2012
Add an "Encode HTML in UTF-8" option to strip charset meta tags and re-encode in UTF-8, for books that do not display correctly in the calibre viewer
Change the UI appearance to look more balanced.
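Re-encoding comes down to decoding with the book's real encoding, removing the charset declarations, and writing UTF-8. The sketch below assumes the caller already knows the source encoding, and `reencode_utf8` is a hypothetical name. It also rewrites a leading XML declaration so it does not contradict the new bytes; that is this sketch's choice, not documented plugin behaviour.

```python
import re

# <meta charset="..."> or <meta http-equiv="Content-Type" content="...; charset=...">
_CHARSET_META = re.compile(r"<meta\b[^>]*\bcharset\s*=[^>]*>\s*", re.IGNORECASE)
# An XML declaration at the very start, which may name the old encoding.
_XML_DECL = re.compile(r"^<\?xml[^>]*\?>", re.IGNORECASE)


def reencode_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Hypothetical helper: strip charset meta tags and re-encode as UTF-8."""
    text = raw.decode(source_encoding)
    text = _CHARSET_META.sub("", text)
    # Keep the declaration truthful about the bytes we are about to write.
    text = _XML_DECL.sub('<?xml version="1.0" encoding="utf-8"?>', text, count=1)
    return text.encode("utf-8")
```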

Version 1.2.10 - 31 Aug 2012
Rewrite the playOrder to make sure it is an incremental sequence after actions that delete from the TOC.
Stop the indenting from mucking up self-closing tags in the NCX.
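Renumbering after deletions amounts to walking the navPoints in document order and assigning 1, 2, 3 and so on. This is a hedged sketch with a hypothetical function name; for brevity it ignores pageTarget and navTarget elements, which in a real NCX draw on the same playOrder sequence.

```python
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"
ET.register_namespace("", NCX_NS)


def renumber_play_order(ncx_xml: str) -> str:
    """Hypothetical helper: make navPoint playOrder a 1-based sequence."""
    root = ET.fromstring(ncx_xml)
    # iter() yields navPoints in document order, i.e. reading order.
    for n, point in enumerate(root.iter(f"{{{NCX_NS}}}navPoint"), start=1):
        point.set("playOrder", str(n))
    return ET.tostring(root, encoding="unicode")
```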

Version 1.2.9 - 04 Jul 2012
Alter the "Proceed" message text to hopefully make it clearer to new users.
Fix "Rewrite CSS margins" bug where if default margins are set to zero and an epub has margins specified it would error
Fix "Rewrite CSS margins" bug where if default margins are set to zero it should not add an @page directive
Change "Rewrite CSS margins" so that if default margins are zero it writes out margin attributes with a value of zero, rather than removing them
Change "Rewrite CSS margins" so that if default margins are negative then it omits the margin attribute from the style
Enhance "Rewrite CSS margins" so that if CSS file has no content it is deleted from the epub
Rename "Rewrite CSS margins" to "Modify @page and body style margins"
Bug fix for "Remove unused images" not detecting svg images in an svg section containing sibling tags
Fix for "Remove Adobe xpgt links" so that it includes removal of links using the @import format.
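Removing xpgt references without depending on attribute order (the 1.2.1 fix) while also covering the `@import` form (1.2.9) can be sketched with two regular expressions. The patterns and the name `remove_xpgt_refs` are assumptions for illustration; they also swallow whitespace after a removed reference, in the spirit of the 1.1.4 change.

```python
import re

# A <link> whose href ends in .xpgt, wherever href sits among the
# attributes, plus any whitespace that follows it.
_XPGT_LINK = re.compile(
    r"""<link\b[^>]*\bhref\s*=\s*["'][^"']*\.xpgt["'][^>]*>\s*""", re.IGNORECASE)
# CSS-style imports: @import url("x.xpgt"); or @import "x.xpgt";
_XPGT_IMPORT = re.compile(
    r"""@import\s+(?:url\()?\s*["']?[^"');]*\.xpgt["']?\s*\)?\s*;\s*""", re.IGNORECASE)


def remove_xpgt_refs(text: str) -> str:
    """Hypothetical helper: drop <link> and @import references to .xpgt files."""
    text = _XPGT_LINK.sub("", text)
    return _XPGT_IMPORT.sub("", text)
```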

Version 1.2.7 - 29 Jun 2012
When inserting covers, if guide points to a non-existent cover href, make sure the log does not error.
When updating CSS margins, add any @page declaration at the start rather than the end of the CSS file, to work around a Sigil bug

Version 1.2.6 - 24 Jun 2012
Add buttons to save and restore the current settings, so you can define your own defaults and switch to them easily

Version 1.2.5 - 15 Jun 2012
Bug fix for when using the Add/replace jacket and Insert/replace cover options together if book has no jacket currently

Version 1.2.4 - 05 Jun 2012
Add some non-standard guide types of "coverimagestandard" and "thumbimagestandard" to increase cover replacement coverage
If the guide has incorrect casing of an image href, auto-correct it
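Auto-correcting an href's casing can be modelled as a case-insensitive lookup that only succeeds when the match is unambiguous. This sketch assumes `names` holds paths relative to the same base as the href; `resolve_href_case` is hypothetical, not the plugin's function.

```python
from typing import List, Optional


def resolve_href_case(href: str, names: List[str]) -> Optional[str]:
    """Hypothetical helper: return the correctly cased name, or None."""
    if href in names:
        return href
    matches = [n for n in names if n.lower() == href.lower()]
    # Only correct when exactly one file differs from href by case alone.
    return matches[0] if len(matches) == 1 else None
```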

Version 1.2.3 - 05 Jun 2012
Further optimise the CSS margins feature to minimise which files get changed

Version 1.2.2 - 05 Jun 2012
Add a "Remove inline javascript and files" option to remove any javascript leftover from html conversions
Fix for CSS margins feature which was not always updating the css file in the epub after resetting margins

Version 1.2.1 - 01 Jun 2012
Fix for remove Adobe xpgt links so it no longer is dependent on link attribute ordering to find them

Version 1.2.0 - 01 Jun 2012
Change to require minimum calibre version 0.8.53 in order to utilise some calibre bug fixes/changes
Change to calibre API for deprecated dialog in 0.8.49 which caused issues that intermittently crashed calibre on Mac OS
Add an "Insert or replace cover" option to attempt to insert or replace a cover without doing a conversion
Add a "Remove cover" option to attempt to completely remove an identified cover from the ePub.
Rewrite "Remove unused image files" and "Remove broken cover images" features to use lxml rather than regex for better accuracy
Add protection for numerous options against trying to apply them to a DRM encrypted book
Better handle ebooks where the ncx file is not in same directory as opf manifest
If user chooses redundant options (e.g. "Remove all jackets" makes "Remove legacy jackets" redundant) do not run the redundant option

Version 1.1.7 - 17 May 2012
Re-release of 1.1.6 to cater for missing file

Version 1.1.6 - 17 May 2012
Bug fix for the last_modified column not being updated if multiple books modified
Add a "Remove broken cover images" option to remove html pages which contain only an image tag to a broken image.
Add a "Remove broken TOC entries in NCX" option to remove ncx entries that point to non-existent html pages
Fix for remove unused images to include svg and bmp files as possible image extensions
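A broken-TOC check can resolve each navPoint's `content/@src` (minus any fragment) against the NCX's folder and look it up in the archive. This standard-library sketch assumes POSIX-style archive names, and the function name is hypothetical. Note that it drops a broken navPoint together with any nested children, which may or may not match what the plugin does.

```python
import posixpath
import xml.etree.ElementTree as ET
from urllib.parse import unquote

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"
NS = {"ncx": NCX_NS}
ET.register_namespace("", NCX_NS)


def remove_broken_navpoints(ncx_xml, ncx_dir, archive_names):
    """Hypothetical helper: delete navPoints whose target file is missing.

    ncx_dir is the NCX file's folder inside the zip (e.g. "OEBPS"); hrefs
    in the NCX are relative to it. Returns (new_ncx, removed_srcs).
    """
    root = ET.fromstring(ncx_xml)
    existing = set(archive_names)
    removed = []
    # ElementTree has no parent pointers, so inspect each element's children.
    for parent in list(root.iter()):
        for point in parent.findall("ncx:navPoint", NS):
            content = point.find("ncx:content", NS)
            src = content.get("src", "") if content is not None else ""
            path = unquote(src.split("#", 1)[0])  # drop fragment, decode %20
            target = posixpath.normpath(posixpath.join(ncx_dir, path))
            if target not in existing:
                parent.remove(point)  # nested navPoints go with it
                removed.append(src)
    return ET.tostring(root, encoding="unicode"), removed
```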

Version 1.1.5 - 09 May 2012
Fix for Remove xpgt files and links to remove the xpgt file from the manifest
When performing any Modify action, update the last_modified column in calibre for the book.

Version 1.1.4 - 07 May 2012
Fix for remove unused images to check encrypted and unencrypted names, skip DRM ebooks
When using the Remove xpgt files and links option, remove trailing whitespace after the removed <link>
When no epubs are modified, ensure the log detail is available to review

Version 1.1.3 - 07 May 2012
Fix for remove unused images to better handle image paths with other characters like commas

Version 1.1.2 - 07 May 2012
Fix for remove unused images to better handle image paths with spaces

Version 1.1.1 - 05 May 2012
Fix for remove unused images to url encode image paths with spaces in them, and handle namespaced images
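The space and comma fixes above come down to markup spelling an image's file name percent-encoded. A crude substring-based orphan check that accounts for this might look like the following; it is only a sketch with hypothetical names. Substring matching is imprecise (a reference to `cover.jpg` also "contains" `over.jpg`), which illustrates why a parser-based approach, as adopted in 1.2.0, is more accurate.

```python
import posixpath
from urllib.parse import quote


def find_unused_images(image_names, html_texts):
    """Hypothetical helper: image archive names not mentioned in any markup."""
    unused = []
    for name in image_names:
        base = posixpath.basename(name)
        # Markup may spell "my pic,1.jpg" as "my%20pic%2C1.jpg".
        forms = {base, quote(base)}
        if not any(form in text for text in html_texts for form in forms):
            unused.append(name)
    return unused
```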

Version 1.1.0 - 05 May 2012
Move the "Remove margins from Adobe .xpgt files" into a new Adobe section on the UI
Add a "Remove Adobe .xpgt files and links" option for complete clean xpgt file removal
Add a "Remove Adobe resource DRM meta tags" option for stripping DRM <meta> resource identifiers from xhtml content.
Extend "Remove embedded fonts" to also remove @font-face declarations from the CSS and html files
Add a "Remove unused image files" option to remove orphaned images not referenced from the html content to save space.
Add a "Flatten TOC hierarchy in NCX file" option to move all the navPoints to a single level if they are nested.
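Flattening can be done by collecting every navPoint in document order and re-parenting them all directly under `<navMap>`. A standard-library sketch with a hypothetical function name; because document order is preserved, existing playOrder values stay valid, but indentation whitespace is not tidied.

```python
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"
NS = {"ncx": NCX_NS}
ET.register_namespace("", NCX_NS)


def flatten_navmap(ncx_xml: str) -> str:
    """Hypothetical helper: move all navPoints to a single level."""
    root = ET.fromstring(ncx_xml)
    nav_map = root.find("ncx:navMap", NS)
    # Snapshot every navPoint in reading (document) order first.
    points = list(nav_map.iter(f"{{{NCX_NS}}}navPoint"))
    # Detach nested navPoints from their parents...
    for point in points:
        for child in point.findall("ncx:navPoint", NS):
            point.remove(child)
    # ...and from the navMap itself, then re-append them all at one level.
    for child in nav_map.findall("ncx:navPoint", NS):
        nav_map.remove(child)
    nav_map.extend(points)
    return ET.tostring(root, encoding="unicode")
```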

Version 1.0.2 - 12 Feb 2012
Add ability to smarten punctuation of HTML files

Version 1.0.1 - 23 Nov 2011
When updating metadata, ensure that if calibre has no tags any dc:subject elements are removed
Improve the logging output when removing non dc: metadata elements
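Keeping dc:subject in step with calibre's tags can be modelled as "delete every dc:subject, then add one per tag", so an empty tag list leaves none behind. A sketch assuming an OPF 2 package with the usual Dublin Core namespace; `sync_subjects` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("", OPF_NS)
ET.register_namespace("dc", DC_NS)


def sync_subjects(opf_xml: str, tags: list[str]) -> str:
    """Hypothetical helper: replace all dc:subject elements with tags."""
    root = ET.fromstring(opf_xml)
    metadata = root.find(f"{{{OPF_NS}}}metadata")
    for subject in metadata.findall(f"{{{DC_NS}}}subject"):
        metadata.remove(subject)
    # No tags in calibre means no dc:subject elements at all.
    for tag in tags:
        ET.SubElement(metadata, f"{{{DC_NS}}}subject").text = tag
    return ET.tostring(root, encoding="unicode")
```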

Version 1.0.0 - 22 Oct 2011
Preparation for the deprecation of the db.format_abspath() function in a future calibre release for network backends
Merge in remaining CSS/margin changes from ldolse for initial release
Support keyboard shortcut for opening dialog

Version 0.3.5 - 26 Jun 2011
Fix an issue with css margin rewriting that used property names using '_' instead of '-'

Version 0.3.4 - 21 Jun 2011
Fix issue with some NCX files not parsing correctly causing error with OS artifact removal
Remove dependency on the Calibre epub-fix Container class to allow plugin to develop independently
Incorporate ldolse's rewrite CSS margin code to reset page/body margins

Version 0.3.1 - 12 Jun 2011
No longer look in manifest for NCX file, look for physical file instead to get around media-type variant issues
If cancel updating the ePubs, remove the temp directory
Additional mime type for xpgt files as supplied by ldolse

Version 0.3 - 06 Jun 2011
Add ability to remove embedded fonts
Add ability to update the metadata (including cover)
Add an error dialog if the user clicks ok with no options selected
Ensure rebuilding the ePub uses the Calibre zip code as per change to Tweak ePub
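The plugin rebuilds books with calibre's zip code; the sketch below is not that code but a standard-library illustration of the rule any ePub zip writer must follow: the `mimetype` entry comes first and is stored uncompressed. The dict-of-bytes input and the name `build_epub` are assumptions.

```python
import io
import zipfile


def build_epub(files: dict[str, bytes]) -> bytes:
    """Hypothetical helper: zip an unpacked ePub into bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # OCF: "mimetype" must be the first entry and must not be compressed.
        zf.writestr("mimetype", b"application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        for name, data in files.items():
            if name != "mimetype":
                zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()
```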

Version 0.2.2 - 03 Jun 2011
Treat iTunesArtwork the same as iTunes plist files
Add an option to remove OS artifacts of .DS_Store and thumbs.db files
Ensure that any xml elements inserted in the manifest are "tailed" correctly for indenting
When adding items to manifest, if a .htm* file check for xmlns indicating mimetype of xhtml+xml
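Spotting OS artifacts is a case-insensitive match on each entry's base name. A minimal sketch with a hypothetical name; removing any matching manifest entries would be a separate step.

```python
import posixpath

# Compared case-insensitively: Windows writes "Thumbs.db", macOS ".DS_Store".
OS_ARTIFACTS = {".ds_store", "thumbs.db"}


def split_os_artifacts(names):
    """Hypothetical helper: partition archive entry names into (keep, artifacts)."""
    keep, artifacts = [], []
    for name in names:
        is_artifact = posixpath.basename(name).lower() in OS_ARTIFACTS
        (artifacts if is_artifact else keep).append(name)
    return keep, artifacts
```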

Version 0.2.1 - 30 May 2011
Ensure Calibre bookmarks and iTunes files are removed from the manifest if present there

Version 0.2 - 30 May 2011
Add option to remove iTunesArtwork files
Add option to remove non dc: metadata elements
Add option to add/update calibre jackets
Rename Select none to Clear all on dialog

Version 0.1 - 26 May 2011
Initial release of Modify ePub plugin

[zip] Modify (78.5 KB, 1734 views)

#2  DoctorOhh 10-22-2011, 07:45 PM
Excellent, it will now be easier to refer folks to this plugin.

#3  bizzybody 10-23-2011, 12:22 AM
Here's a feature request, a function to edit or remove text color settings in the stylesheet.css file. I spent a lot of time trying different converters from epub to Mobi but none of them would get rid of what had the text locked to black (actually a really really dark grey). Finally, in response to a post in the epub forum, I was pointed to Sigil and looking for references to color in stylesheet.css. I found three instances of color: #231F20; deleted them and saved. Then the book converted without the text color locked.

I figure the best setup would be to find each color setting and show an example of what changing or removing each setting will do. Might not want to remove *all* the color settings, in this book the TOC links and chapter headings are set to "blue" while the rest of the text was 231F20. I left the "blue" and removed the others.

Boggles me why anyone would set a color to look like printed "black" ink when black text is the default on readers and reader software.

I read on my LED screen phone in white on black because it improves the battery life a bunch VS black on white. Black pixels on LED screens use zero power, VS white pixels on LCD using zero power, other than the backlight shining through. No backlight on LED.

#4  DoctorOhh 10-23-2011, 12:54 AM
Kiwidude, let me state it a little simpler. It would be nice if you could add the option to remove text color and remove background color to the plugin.

#5  kiwidude 10-25-2011, 06:51 PM
@bizzybody/dwanthny - modifying the css via this plugin is one of those things I personally have avoided to date, though I guess ldolse must be doing so with his additions to this plugin. It has to be done in a bulletproof way while minimising the changes to the css, which means not using third party css parsing code for instance. It should be possible using regular expressions but needs careful application and a lot of testing. I want people to have confidence that when they use this plugin it lives up to its billing of making the least changes possible to do their task, instead of the potential gotchas associated with alternatives.

You have to handle potentially multiple css files, and also inline css as well...

If I do take the leap at some point, certainly stripping colors could be an option to support. I would add to that wish list options like line-height, letter-spacing and font-family too; no doubt there are others I've forgotten.
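To illustrate the regular-expression route kiwidude mentions, here is a sketch that strips `color` declarations while leaving `background-color` and an allow-list of values alone (the default `{"blue"}` mirrors bizzybody's choice to keep the blue TOC links and headings). It is an assumption-laden example, not plugin code, and `strip_colors` is hypothetical. Apply it only to CSS text and `style` attribute values: on arbitrary HTML it would also hit prose such as "the color: red".

```python
import re

# One "color: <value>" declaration. The lookbehind stops it matching
# inside "background-color"; the value ends at ";", "}" or a closing quote.
_COLOR_DECL = re.compile(r'(?<![-\w])color\s*:\s*([^;}"]+?)\s*(;|(?=[}"]))',
                         re.IGNORECASE)


def strip_colors(css: str, keep=frozenset({"blue"})) -> str:
    """Hypothetical helper: remove color declarations not in keep."""
    def repl(m):
        value = m.group(1).strip().lower()
        return m.group(0) if value in keep else ""
    return _COLOR_DECL.sub(repl, css)
```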

#6  ldolse 10-25-2011, 09:45 PM
Quote kiwidude
@bizzybody/dwanthny - modifying the css via this plugin is one of those things I personally have avoided to date, though I guess ldolse must be doing so with his additions to this plugin. It has to be done in a bulletproof way while minimising the changes to the css, which means not using third party css parsing code for instance.

You have to handle potentially multiple css files, and also inline css as well...
I haven't taken the leap and done my whole library with the plugin yet, but I've modified the CSS of dozens if not a couple hundred ePubs using this plugin. The code works in all the ways you describe at this point - multiple css files are supported, inline styles in the html, etc. It would be trivial to modify other css styles - the other thing it seems people want is to automatically add/remove justification.

I think the trick would be to apply all the css modifications at once though, so it would require a bit of a re-design of the current function - basically a top level modify css, then a bunch of sub-functions - sort of like heuristics.

#7  Magnus 10-26-2011, 10:15 PM
I'm quite naive here... but I noticed that when I run the plugin I get:

Looking for non dc: elements in manifest
Removing child: {}meta

After I've modified the ePub, the result persists when I again run the plugin. Why is that? What have I overlooked?

#8  kiwidude 10-27-2011, 03:58 AM
@Magnus - what "result persists" are you referring to? Send me a PM with a link to the ePub and more detail on what it is doing that you don't expect it to.

#9  capnm 11-02-2011, 09:37 AM
Glad to see this made it to prime time! Thanks guys!

I looked at the issue of line spacing a while ago and discovered that a popular technique for setting line spacing is to manipulate the font size (set the paragraph class to, say, font-size 1.3em and line-height 1, then wrap all the text blocks in spans with font-size 0.75em), making it really hard to 'fix'.

#10  paulfiera 11-05-2011, 12:29 PM
It'd be really great to have the possibility of updating the jacket with all the metadata, including tags.

Sometimes I change the tags and I would like to update the whole epub jacket without doing a conversion.
