Mobileread
Kindlestrip Python script and AppleScript wrapper
#1  pdurrant 09-01-2010, 02:01 PM
KindleGen, Kindle Comic Creator and Kindle Previewer add the source files used in compiling the Kindle ebook as one of the (invisible) records in the ebook.

So I wrote a Python script that strips the sources record out of Kindle-format ebooks. For those on Macs, I also wrote an AppleScript wrapper and put the Python script inside the AppleScript bundle to make things easy.
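For the curious, here is a rough sketch of how a sources record could be located. It is not the kindlestrip code itself. It assumes the standard Palm database layout (record count at byte offset 76, then one 8-byte entry per record starting with a 4-byte offset) and assumes the sources record can be recognised by a leading "SRCS" tag. The function name find_srcs_records is made up for this example.

```python
import struct

def find_srcs_records(data):
    """Return (index, offset, length) for every record that starts with b'SRCS'.

    Assumes a Palm database container: a 2-byte big-endian record count at
    offset 76, followed by 8-byte record entries whose first 4 bytes are the
    record's offset from the start of the file.
    """
    (count,) = struct.unpack_from(">H", data, 76)
    offsets = [struct.unpack_from(">L", data, 78 + 8 * i)[0] for i in range(count)]
    offsets.append(len(data))  # the last record runs to the end of the file
    found = []
    for i in range(count):
        start, end = offsets[i], offsets[i + 1]
        if data[start:start + 4] == b"SRCS":
            found.append((i, start, end - start))
    return found
```

Reading a book with open(path, "rb").read() and passing the bytes in should list any such records. Actually removing one also means rewriting the record table and some header fields, which this sketch does not attempt.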

Kevin Hendricks has since updated the code to handle files from KindleGen 2.x, and I've also tweaked it a bit more to handle KindleGen 2.7.

If you're going to upload to the Amazon store, this script is usually unnecessary, as Amazon will strip the sources before delivery anyway.

Do not use this script to make files to be uploaded to KDP, unless you have to because of size constraints on uploaded

KindleGen now includes an option to not add the source files to the end of the generated book. So if you're using KindleGen and want a file without the sources added, don't use KindleStrip; specify this option in KindleGen instead to get guaranteed correctly formatted books.

If you're on a Mac you only need the AppleScript, as it includes the Python script in it. The AppleScript is a simple drag & drop operation: drag your KindleGen-generated file onto it, and it creates one named [oldname]_stripped.mobi.

As always, please comment with any bug reports or problems.
[zip] KindleStrip 1.36.app.zip (35.1 KB, 3742 views)
[zip] kindlestrip_v136.py.zip (4.8 KB, 7132 views)

#2  pdurrant 09-03-2010, 06:23 PM
Now at version 1.1. Writes out the stripped data as a zip file. The data in the Mobipocket file seems to have a 16-byte header, which is written out as hexadecimal to the standard output. Those using the AppleScript won't see this at all. I have no idea what the 16 bytes mean, so this probably isn't a loss.

#3  daffy4u 09-03-2010, 06:40 PM
Thanks pdurrant! I don't have any books to upload to Amazon but I always appreciate the efforts of those who push the Kindle limits to make it even more useful.

#4  ATDrake 09-24-2010, 01:23 AM
Just tried this on a couple of auto-generated mobis made via the new version of Kindle Previewer (1.5).

It now has "ePub support", by which it means that it automatically converts any ePubs dragged onto it to mobi and drops the file in the same folder, apparently on the lower -c1 compression setting. There's also a new simulation option for iPad, but no K3 mode yet. But the people trying to figure out Kindle Audio/Video now have a new testing tool for their efforts.

Anyway, the stripping works a treat and the extraction gives back almost exactly what went in, as far as I can tell. Did a few more tests with my lazily assembled Fictionwise cleanup conversions: html comes back as zipped html, and a zipped-up ePub in yields the exact same zipped-up ePub out.

Interestingly enough, if you originally pointed KindleGen at an opf (either custom or via unpacked epub), then no matter what the source structure, the unzipped-from-stripped version yields up the css, html, image, and misc (ncx, etc.) files rearranged into separate subdirectories with exactly those names.

Stripped file has immense space savings, often near-halving; sometimes more if there are a fair number of graphics involved in the source. Even pure text with no pictures is over a third smaller.

I have absolutely no idea why Amazon would remove the entirely logical -donotaddsource option unless they actually want to serve up plenty of bloated files via 3G and cut down on the marketable "Kindle can hold #### books!" space (and deduct extra from royalties paid out, of course), which seems rather counter-productive to me.

While we're on the subject of inexplicable KindleGen design decisions, might as well mention some more things I found out while using it:
  1. Plain old descendant selectors, a staple since CSS1, seem to be completely ignored. That's another black mark for KindleGen's (lack of) CSS support, and it means one will likely have to class every item one wants to target with a particular style not shared with its siblings, rather than classing a container parent element for the lot and letting specific descent rather than generic inheritance take place.
  2. If you forget to close a <div> with styling applied, all subsequent text seems to be rendered with the same styling, even if it occurs in separate files in the source, at least until it hits the next tag with a different style.
  3. If you have any superfluous tags in your NCX, even a mistakenly applied empty closing tag like say, </head>, then KindleGen will merrily ignore your painstakingly constructed <navMap> and happily build with nary a warning until you find out that your mobi has no chapter marks and spend far too long trying to figure out why.
Thanks again for writing this script! I'm sure people will be finding it very useful if Amazon's going to insist on always including the source files.

#5  ATDrake 09-24-2010, 02:17 PM
Also, I think I've figured out what the mysterious header bytes mean.

If your source was converted straight from a properly zipped ePub, then you get 53524353000000100000003000000001. If it came from any combination of un-prepackaged html/opf, it'll be 53524353000000100000002f00000001. If it's a no-source-files-added mobi to begin with, then the header bytes are 46434953000000140000001000000002.
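A quick way to look at those strings is to unpack them as a 4-character tag followed by three big-endian 32-bit integers. That field layout is only an assumption made for this sketch, and the name decode_header is made up; what the three numbers actually mean is not established here.

```python
import binascii
import struct

def decode_header(hex_string):
    """Split a 16-byte header, given as hex, into (tag, int, int, int).

    Assumes a 4-byte ASCII tag followed by three big-endian unsigned
    32-bit integers. This layout is a guess, not a documented format.
    """
    raw = binascii.unhexlify(hex_string)
    tag, a, b, c = struct.unpack(">4sLLL", raw)
    return tag.decode("ascii"), a, b, c

print(decode_header("53524353000000100000003000000001"))  # zipped ePub source
print(decode_header("53524353000000100000002f00000001"))  # loose html/opf source
print(decode_header("46434953000000140000001000000002"))  # no sources added
```

Read this way, the first two headers carry the tag SRCS and differ only in the third field (0x30 versus 0x2f), while the no-sources case carries the tag FCIS.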

And it seems that even the samples offered for the newer books at Amazon nowadays include the bloat (but only from the mobi conversion and cut off appropriately at the sample length), which looks like it's a useless expenditure to me.

Ah well, if they want to waste their server bandwidth for no good reason, that's entirely up to them. As long as they don't go back to charging that extra $2 Whispernet surcharge that they finally got rid of for Canadians.

#6  twedigteam 02-21-2011, 01:55 PM
If anyone still takes a gander at this thread, I'm having some issues running the Kindlestrip tool on OS X 10.6.6; a simple drag & drop of a .mobi file onto the AppleScript file doesn't actually cause anything to occur...taking a closer look, I'm wondering if the built-in Python files on my Mac are too outdated to run kindlestrip properly (I had no issue whatsoever using your ePub zip/unzip scripts, but I could be misled in that they don't use the Python language?). My version:

Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49)

Noticed that there is a more current build of 3.2, wondering if maybe this could be the issue? I'm sure also there is a way to run from Terminal, but I am certainly not at that level of familiarity with Python to do so....thanks in advance if anyone spots this...

#7  ATDrake 02-21-2011, 02:08 PM
I'm also on 10.6.6 and the AppleScript has been working for me for the past couple of months and again when I used it yesterday.

I used to have the standard Python 2.6-ish install, but then I went and got the 2.7.1 installer from Python.org (after the source failed to compile, grr).

Maybe your unzip utility sets the permissions wrongly?

In any case, to use it on the command-line, just do python PATH/TO/kindlestrip.py OriginalFile.mobi OutputFile.mobi OptionalStrippedData.zip

You can drag and drop the kindlestrip.py file onto the Terminal window and it will autofill its path, and the 3rd filename is optional if you don't care about looking at the stripped data.

You can also alias it in your .profile for convenience, aka:

alias kstrip="python PATH/TO/kindlestrip.py"

and then string together a series of commands to batch process a folder:

alias kstripbatch='for m in *.mobi; do kstrip "$m" "${m/.mobi/-stripped.mobi}"; done'
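If you'd rather not rely on shell aliases, the same batch idea can be sketched in Python. This is only an illustration: it assumes the command-line form given above (script, input file, output file), that "python" on your PATH is the interpreter you want, and it reuses the "-stripped" naming from the alias. The function names are made up for this example.

```python
import subprocess
from pathlib import Path

def stripped_name(path):
    """book.mobi -> book-stripped.mobi, kept in the same folder."""
    p = Path(path)
    return p.with_name(p.stem + "-stripped" + p.suffix)

def build_command(script, source, python="python"):
    """Argument list for one run: python kindlestrip.py IN.mobi OUT.mobi."""
    return [python, str(script), str(source), str(stripped_name(source))]

def strip_folder(script, folder="."):
    """Run the script on every .mobi in folder, skipping earlier outputs."""
    for source in sorted(Path(folder).glob("*.mobi")):
        if source.stem.endswith("-stripped"):
            continue  # don't re-strip files produced by a previous run
        subprocess.check_call(build_command(script, source))
```

Calling strip_folder("PATH/TO/kindlestrip.py") from the folder holding the books would then do the same job as kstripbatch.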

#8  pdurrant 02-21-2011, 04:03 PM
Quote twedigteam
If anyone still takes a gander at this thread, I'm having some issues running the Kindlestrip tool on OS X 10.6.6; a simple drag & drop of a .mobi file onto the AppleScript file doesn't actually cause anything to occur...taking a closer look, I'm wondering if the built-in Python files on my Mac are too outdated to run kindlestrip properly (I had no issue whatsoever using your ePub zip/unzip scripts, but I could be misled in that they don't use the Python language?). My version:

Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49)

Noticed that there is a more current build of 3.2, wondering if maybe this could be the issue? I'm sure also there is a way to run from Terminal, but I am certainly not at that level of familiarity with Python to do so....thanks in advance if anyone spots this...
I can't think why it wouldn't work for you. It works here. You don't need Python 3.x. Most of the python scripts around are written for Python 2.x where x>=5, including this one. It may well not work with 3.x at all.

What happens if you just double-click the applescript? (It should ask you to locate kindlestrip.py - just click cancel if it does.)

#9  twedigteam 02-21-2011, 08:25 PM
Quote ATDrake

In any case, to use it on the command-line, just do python PATH/TO/kindlestrip.py OriginalFile.mobi OutputFile.mobi OptionalStrippedData.zip
Worked like a charm. Clearly no issue with the code if this goes through. I'll retry the script on a coworker's system later in the week.

Once again, a tip of the hat...the help here is impressively reliable, and kudos on the tools....

#10  twedigteam 02-21-2011, 08:27 PM
Quote pdurrant
I can't think why it wouldn't work for you. It works here. You don't need Python 3.x. Most of the python scripts around are written for Python 2.x where x>=5, including this one. It may well not work with 3.x at all.

What happens if you just double-click the applescript? (It should ask you to locate kindlestrip.py - just click cancel if it does.)
Double-clicking does ask to locate the .py file, and I've tried every possible combination, including removing the scripts and re-downloading them. As I mentioned above, it works fine on the command line, so the AppleScript issue is just a local one.

....thanks again!
