Mobileread
Using perl scripts to produce .IMP ebooks and more...
#1  nrapallo 02-03-2008, 08:29 PM
I wanted to use command-line tools to convert .html directly into .IMP format, bypassing the need for the eBook Publisher GUI. Don't get me wrong, I think the eBook Publisher is a very powerful tool. It is the most effective way to deal with multiple input files, especially if they do not 'lend' themselves to being used in an ebook.

To achieve the best results, the .html should first be cleaned up by 'Tidy'. This will remove those annoying stray '?' characters, correct an ill-formed TOC and generally clean up the .html.
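As a rough illustration of the Tidy step, here is a small Python sketch that builds (and optionally runs) an HTML Tidy command. It assumes a 'tidy' executable is on the PATH; the helper names 'tidy_command' and 'run_tidy' are made up for this example, and the two flags are ordinary HTML Tidy options, not necessarily the settings used for the books discussed in this thread.

```python
import subprocess


def tidy_command(html_path, tidy_exe="tidy"):
    """Build an HTML Tidy command line that cleans html_path in place."""
    # -quiet suppresses the informational banner,
    # -modify writes the cleaned markup back to the input file.
    return [tidy_exe, "-quiet", "-modify", html_path]


def run_tidy(html_path, tidy_exe="tidy"):
    """Run Tidy; exit codes 0 (clean) and 1 (warnings only) count as success."""
    result = subprocess.run(tidy_command(html_path, tidy_exe))
    return result.returncode in (0, 1)
```

Since '-modify' overwrites the file, keep a copy of the original .html before trying this.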

I needed this primarily as I was converting single .html files from various sources (expanded .PRC/.PDB, exploded .LIT, Blackmask/Project Gutenberg .html, etc.). I found it cumbersome to use the eBook Publisher for just one file, especially if the .html filename was in the format 'authorname - title'.html. For these .html files, I had all the info I needed to properly create a .IMP ebook in the filename; all I had to do was choose the category and I would be finished!

Enter the perl scripts...

To use these perl scripts, you need the following:
1. The eBook Publisher software must already be installed; it is available from http://www.ebooktechnologies.com/support_publisher_download.htm . The perl scripts use 'SBPubX' interface calls to create, view and manipulate .opf and .IMP files.
2. Your computer must be able to execute perl scripts. For Windows, I had to install ActivePerl by ActiveState from http://www.activestate.com/store/activeperl/ .
BUILDIMP
A simple batch file called buildIMP.bat demonstrates how .IMP ebooks can be created using the workhorse routine 'Html2imp.pl'. The 'Html2imp.pl' perl script takes as input four parameters: 'Authorname' 'Title' 'Category' and 'htmlfilename'. If any of the parameters contain spaces, then quotes need to surround that parameter!

After executing the sample batch file, the .IMP ebook is produced along with the .opf project file used internally. This file can later be loaded into eBook Publisher for further processing, if necessary.
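For files named 'authorname - title'.html, the four parameters can be derived from the filename itself. The following Python sketch is not the contents of buildIMP.bat; it only assumes the argument order given above (Authorname, Title, Category, htmlfilename) and that 'perl' is on the PATH. The function names are hypothetical.

```python
import os


def split_author_title(html_path):
    """Split a filename of the form 'authorname - title.html' into (author, title)."""
    stem = os.path.splitext(os.path.basename(html_path))[0]
    # Split on the first ' - ' only, so a title may itself contain ' - '.
    author, sep, title = stem.partition(" - ")
    if not sep or not author.strip() or not title.strip():
        raise ValueError("expected 'authorname - title.html', got %r" % html_path)
    return author.strip(), title.strip()


def html2imp_command(html_path, category, script="Html2imp.pl"):
    """Build the argument list: Authorname, Title, Category, htmlfilename."""
    author, title = split_author_title(html_path)
    # A list passed to subprocess.run() needs no manual quoting,
    # even when the parameters contain spaces.
    return ["perl", script, author, title, category, html_path]
```

The resulting list can be handed to subprocess.run() once for each .html file in a folder, leaving the category as the only thing to choose by hand.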
EXAMINEOPF
This perl script is invoked by 'examineOPF.pl project.opf'. It displays some information about the .opf file, printed to stdout. If warranted, this output can be redirected to a log file.
INFOIMP
This perl script is invoked by 'infoIMP.pl ebook.imp'. It displays some information about the .IMP file, printed to stdout. If warranted, this output can be redirected to a log file.

A variant of this script is 'infoIMPcsv.pl' which will 'dump' the .IMP details to stdout in 'comma separated values' format. You should redirect the output to a file so it can be opened in Microsoft Excel or similar for further exploration.

Another variant is 'infoIMPtab.pl' which will 'dump' the .IMP details to stdout in 'tabbed text' format. Again, you should redirect the output to a file so it can be opened in Microsoft Excel or similar for further exploration.

Try these on a directory full of .IMP files and you will get a mini-database of .IMP details!
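One way to build such a mini-database is to loop over the directory and collect the output of 'infoIMPcsv.pl' into a single file. This Python sketch assumes 'perl' is on the PATH and that the script prints plain CSV rows to stdout, as described above; whether each run also prints a header row is not something I can vouch for, so check the combined file for repeated headers. The function names are made up.

```python
import os
import subprocess


def list_imp_files(directory):
    """Return the .imp files in directory, sorted; the extension match ignores case."""
    return sorted(
        name for name in os.listdir(directory)
        if name.lower().endswith(".imp")
        and os.path.isfile(os.path.join(directory, name))
    )


def dump_directory(directory, out_csv, script="infoIMPcsv.pl"):
    """Run the CSV variant on every .imp file and write all output to out_csv."""
    with open(out_csv, "w", encoding="utf-8", newline="") as out:
        for name in list_imp_files(directory):
            result = subprocess.run(
                ["perl", script, os.path.join(directory, name)],
                capture_output=True, text=True,
            )
            out.write(result.stdout)
```

Swapping in 'infoIMPtab.pl' for the script argument would produce the tabbed-text version instead.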
VALIDATEOPF
This perl script is invoked by 'validateOPF.pl project.opf'. It validates the .opf, printing all errors/warnings to stdout. Redirect to a log file, if warranted. This can be used to extract the error log of a complex .opf build for future study.
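Redirecting the output to a log file can also be scripted. A minimal Python sketch follows, with a hypothetical helper name; it works for any of the scripts above because it just captures whatever the command prints.

```python
import subprocess


def run_and_log(command, log_path):
    """Run command, send its stdout and stderr to log_path, return the exit code."""
    with open(log_path, "w", encoding="utf-8") as log:
        # stderr is merged into stdout so errors and warnings land in the same log.
        result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```

For example, run_and_log(["perl", "validateOPF.pl", "project.opf"], "project.log") would keep the validation report of one build for later study, assuming 'perl' is on the PATH.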
Please feel free to modify these to suit your needs and consider sharing your achievements for others to benefit.

-Nick

EDIT: 18-May-2008 added windows executables (see IMP_OPF_windows-executables.zip) of each perl script for those that can't/won't work with perl scripts directly.
[pl] Html2imp.pl (4.7 KB, 1198 views)
[pl] examineOPF.pl (2.6 KB, 961 views)
[pl] infoIMP.pl (1.8 KB, 995 views)
[pl] validateOPF.pl (2.4 KB, 976 views)
[zip] IMP_OPF_perl-scripts.zip (2.77 MB, 1242 views)
[zip] IMP_OPF_windows-executables.zip (8.70 MB, 2028 views)

#2  nrapallo 02-07-2008, 12:03 PM
New 'infoIMPdir.bat' showing how to get 'mini-database' of .IMP details. Just open the resulting .csv in Microsoft Excel or similar to further explore your .IMPs!

Just place the 'infoIMP*' files (from the first posting above) in any .IMP directory and execute the 'infoIMPdir.bat' provided below!

I know this is 'crude', but the results are worthwhile!

-Nick
[bat] infoIMPdir.bat (781 Bytes, 1254 views)
[zip] IMP_OPF_sample_infoIMP_files.zip (1.81 MB, 1285 views)

#3  nrapallo 02-12-2008, 12:47 AM
In the Content Forum under the (sticky) Mobiperl thread started by tompe, you will find post #219 that enables you to directly convert from mobipocket .mobi/.prc to .IMP formats via a perl script based on tompe's 'mobi2html'.

This new perl script is named 'mobi2imp.pl' and is available as a windows executable, 'mobi2imp.exe'.

MOBI2IMP
A simple batch file called mobi2IMP.bat demonstrates how .IMP ebooks can be converted directly from mobipocket .mobi/.prc using the workhorse routine 'mobi2imp'. 'Mobi2imp' takes as input two mandatory parameters: 'MobiSource' and 'ExplodeDir' and three optional parameters: 'Category' 'Authorname' and 'Title'. If any of the parameters contain spaces, then quotes need to surround that parameter!

To run this manually, just:
Code
perl mobi2imp.pl --verbose "Oliver Twist.prc" Oliver
or
Code
c:\> mobi2imp.exe --verbose "Oliver Twist.prc" Oliver
After executing the sample batch file, the .IMP ebook is produced along with the .opf project file used internally. This file can later be loaded into eBook Publisher for further processing, if necessary.
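For batch use, the command line can be assembled programmatically. This Python sketch is not the attached mobi2IMP.bat; it assumes the optional parameters follow the two mandatory ones in the order listed above (Category, Authorname, Title) and that switches such as '--verbose' come first, as in the manual example. The function name is hypothetical.

```python
def mobi2imp_command(mobi_source, explode_dir, category=None, author=None,
                     title=None, switches=(), exe="mobi2imp.exe"):
    """Build a mobi2imp argument list; switches first, then positional parameters."""
    optional = [category, author, title]
    # Drop trailing unset values; the optional parameters are positional,
    # so a gap in the middle cannot be expressed on the command line.
    while optional and optional[-1] is None:
        optional.pop()
    if any(value is None for value in optional):
        raise ValueError("optional parameters must be given in order: "
                         "Category, Authorname, Title")
    return [exe, *switches, mobi_source, explode_dir, *optional]
```

Called as mobi2imp_command("Oliver Twist.prc", "Oliver", switches=["--verbose"]), it reproduces the manual example above as an argument list for subprocess.run().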
Attached below is the 'mobi2imp.pl' code, 'mobi2imp.exe' as well as two sample conversions in the .zip file for anyone who wants to test it out.

You must have the eBook Publisher software previously installed as well as the proper perl lib setup**. This will allow those with many mobipocket .mobi/.prc files to migrate them to their ebookwise 1150 easily.
Note: ** using 'mobi2imp.pl' requires a tricky setup as I used, as a base, the 'Mobiperl' package prepared by tompe (see his website http://www.ida.liu.se/~tompe/mobiperl/ for detailed setup instructions).

While it is daunting to get all the right libs, it is very rewarding once everything is set up properly. After all this SETUP, it is easy. I promise!
This all started out at post #197 in the Mobiperl thread and has evolved into a functional perl script.

For a MINI-TUTORIAL, check here.

For the Mobi2imp Wiki, check here.

Enjoy!

-Nick
Previous changes...
Code
version 2 - Now 'Category Author Title' are optional and don't need to be provided (if the mobipocket ebook was 'well' composed).
version 3 - Now more forgiving of poorly constructed anchors (seen in feedbooks.com .prc's) and will insert the '<a name' tag as long as the 'filepos' points to the start of a tag i.e. "<". This will help retain most, if not, all hyperlinks!
version 4 - Things that changed:
- Now better warns that eBook Publisher must be installed first.
- now takes switches '--1200' and '--1100' to allow for the simultaneous creation of the REB 1200 and REB 1100 versions along with the EBW 1150 .IMP version.
- conversely, if the switch '--1150' is specified, then the EBW 1150 .IMP version is NOT created.
version 5 - Things that are allowed now:
- now allows you to change the text one font size larger ('medium') and one font size smaller (back to 'x-small') by using '--largerfont' and '--smallerfont' respectively.
- per JSWolf's request, you can now change margins from the default (2%) to '--nomargins' (0%), '--largemargins' (5%) and even '--hugemargins' (8%)
- you can change the default text-align from justify to '--nojustify' (i.e. left aligned).
- further to Kovidgoyal's recent 'mobi2oeb' post, can now produce OEBFF (.oeb) output with '--oeb'.
As a result, the output can be any and all at once of: '--1150' .IMP, '--1200' .IMP, '--1100' .rb and '--oeb' OEBFF!
version 6 - Changes:
- per DaleDe's request, you can now change margins from the default (2%) to '--tinymargins' (2px).
- no longer requires external program (nconvert.exe); all image 'fixing' done internally by GD.pm (thanks to tompe for this suggestion)!
version 7 - Changes:
- per DaleDe's suggestion, you can now add small indent with '--indent'.
- per JSWolf's request, you can now eliminate (blank line) paragraph separation with '--nopara' (may also need to indent para with '--indent').
- per DaleDe's suggestion, you can now get more info with '--verbose' or '--debug'.
- first attempt at a 'readme.txt' - you get this also by executing 'mobi2imp' without any parameters.
version 8 - Changes:
- can now override default .IMP naming of 'Author - Title'.ext, by using '--out MYIMPBOOKNAME' to specify .IMP filename produced (omit .ext)
- BUGFIX: now strips the <body> tag of any BD/mobi specific in-line styles before starting to 'fix' things.
EDIT 21 Feb 2008: version 9 - Changes:
- mobi2imp.exe (version 9) - windows executable (very stable now!)
- can now handle (text) .pdb files properly i.e. ereader 'TEXt'/'REAd' type
- now makes the BookDesigner notice at the end 'small print' by default
- can make that BD notice 'big print' with '--BDbig' (case sensitive)
- can make that BD notice start on a newpage using '--BDnewpage'
- can even remove that BD notice at the end with '--BDremove'
- to add flair, can use '--bgcolor #FF80FF' to set background color for every page
- BUGFIX: Only when using '--nopara' option, some <br />'s get ignored so another <br /> is added; if this creates issues, then '--noBRfix' will not add the second <br />.
TO DO:
- better documentation and even a tutorial would be nice
- ability to add a (default) 'cover' image to every conversion from .mobi to .imp exists, but not yet ready for the consequences
- ability to add running headers (a la GEBLibrarian) exists, but not yet fully implemented
- add more user defined settings along with some 'Mobiperl' fixes like TOC first, cover link, prefix title...
- add Windows GUI ala PDFRead 1.8
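The version 3 entry above describes inserting an '<a name' tag wherever a 'filepos' points to the start of a tag. Here is a rough Python sketch of that idea only; it is not the code in mobi2imp.pl. The anchor naming scheme ('filepos' plus the offset) and the function name are assumptions made for the example.

```python
def insert_filepos_anchors(html, fileposes):
    """Insert a named anchor at each byte offset that points at the start of a tag.

    html is the book text as bytes; fileposes is an iterable of integer offsets.
    Returns (new_html, skipped_offsets).
    """
    out = bytearray(html)
    skipped = []
    # Work from the highest offset down so the lower offsets stay valid
    # while anchors are being inserted.
    for pos in sorted(set(fileposes), reverse=True):
        if 0 <= pos < len(html) and html[pos:pos + 1] == b"<":
            out[pos:pos] = b'<a name="filepos%d"></a>' % pos
        else:
            # Offset lands inside text or past the end: leave the markup alone.
            skipped.append(pos)
    return bytes(out), sorted(skipped)
```

Links that referenced a given filepos could then be rewritten to point at the matching '#filepos...' name; the skipped list shows which hyperlinks could not be retained.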

EDIT: For a new GUI based Mobi2IMP with many improvements, see Mobi2IMP 9.4 with new Windows GUI & UTF-8
[zip] mobi2imp-exe.zip (1.57 MB, 1355 views)
[pl] mobi2imp.pl (25.7 KB, 1353 views)
[bat] mobi2IMP.bat (1.8 KB, 1348 views)
[zip] mobi2imp_sample_conversions.zip (4.91 MB, 1248 views)
[txt] Readme.txt (2.1 KB, 989 views)

#4  DaleDe 02-18-2008, 01:52 PM
I have added a description in the wiki for this tool. It is very simple so far and needs additional data but it is a start.

http://wiki.mobileread.com/wiki/Mobi2imp

Can the version be added somewhere in the pl file, please? I am starting to get confused as to what I have downloaded and what the latest is. (Maybe a --v option also to print this out.)

Over time the other perl scripts can be added to the wiki also but I just want to get something down today.

Dale

#5  nrapallo 02-18-2008, 02:06 PM
To do in 'mobi2imp' version 7 (started but not yet ready for release):

- add '--TOC switch' to add that TOC entry to the beginning of the file. abandoned.
- perl script/source code had version number, but now it is printed out. done.
- more documentation/tutorial in the works (thanks for the wiki entry)

This program is a testament to the solid foundation provided by tompe's 'mobi2html'. It made the .IMP specific changes so easy to merge from my original 'html2imp.pl'. I never thought it would take off this much, so fast.

As more users use it, I will make any 'necessary' corrections/modifications to aid in the direct conversion of .prc to .imp.

-Nick

#6  JSWolf 02-18-2008, 02:31 PM
Quote nrapallo
To do in 'mobi2imp' version 7 (started but not yet ready for release):

- add '--TOC switch' to add that TOC entry to the beginning of the file.
- perl script/source code had version number, but now it is printed out.
- more documentation/tutorial in the works (thanks for the wiki entry)

This program is a testament to the solid foundation provided by tompe's 'mobi2html'. It made the .IMP specific changes so easy to merge from my original 'html2imp.pl'. I never thought it would take off this much, so fast.

As more users use it, I will make any 'necessary' corrections/modifications to aid in the direct conversion of .prc to .imp.

-Nick
What about the paragraph spacing? Is that going to be fixed in the next version? I don't read IMP books, but I personally consider making eBooks with line spaces at every paragraph to be substandard and I won't do that to the readers.

#7  DaleDe 02-18-2008, 03:53 PM
I just built a book using the latest version 6 mobi2imp and it said it built a 1150 but it really built a 1200.

Dale

#8  DaleDe 02-18-2008, 04:03 PM
Quote JSWolf
What about the paragraph spacing? Is that going to be fixed in the next version? I don't read IMP books, but I personally consider making eBooks with line spaces at every paragraph to be substandard and I won't do that to the readers.
I do not think this will be too hard to correct, probably as a new option. As Nick said, the eBook Publisher default style is to add a space between paragraphs, but this can be overridden with a style change of the <p> element. He thought at first it was the <div>, but that is because he mixed up html0 (BD) with html as used in this script.

Dale

#9  JSWolf 02-18-2008, 06:04 PM
Quote DaleDe
I do not think this will be too hard to correct, probably as a new option. As Nick said, the eBook Publisher default style is to add a space between paragraphs, but this can be overridden with a style change of the <p> element. He thought at first it was the <div>, but that is because he mixed up html0 (BD) with html as used in this script.

Dale
The eBook I asked him to convert to test was one I made into a proper PRC using HTML exported from BD. I used Harry's directions to make it a proper PRC. I'm hoping that once mobi2imp is done and ready, I can use that to make better eBooks based on my PRC editions than from BD. I've already implemented the larger font fix for BD. Now all I need to do is wait for the script to be fixed.

#10  DaleDe 02-18-2008, 07:06 PM
Quote JSWolf
The eBook I asked him to convert to test was one I made into a proper PRC using HTML exported from BD. I used Harry's directions to make it a proper PRC. I'm hoping that once mobi2imp is done and ready, I can use that to make better eBooks based on my PRC editions than from BD. I've already implemented the larger font fix for BD. Now all I need to do is wait for the script to be fixed.
Hmm, is it <p> or <div>? If you run Nick's program, the html file is left behind. Could you take a look, please?

Dale