Mobileread
PDFRead 1.8.2 released!
#1  nrapallo 03-13-2008, 01:56 AM
PDFRead is a tool for converting non-DRMed PDF and DJVU documents for reading on eBook devices. It does this by creating an image out of each page, enhancing the image and then collating the images in a device-specific format.

The Windows GUI and Installer are NOT Mac-friendly, but the source code is Python, which can be made to work on Mac OS X using the detailed installation instructions: PDFRead 1.8.2 working on Mac OS X or Update to Mac OS X 10.5.x (Tiger) and pulling together useful links.

SUPPORTS:
Input Formats: .PDF / .DJVU / .TIFF / .CBZ / .CBR and also Imglist/Imgdir (i.e. .JPG / .PNG / .GIF / .TIF / .BMP)
Output Formats: .IMP / .RB / .OEB / .HTML / .LRF / .PRC
eBook readers: EBW1150 / REB1200 / REB1100 / PRS-500 / PRS-505 / KINDLE / CYBOOK GEN 3 / ILIAD

PDFRead was created by Ashish Kulkarni and announced, over a year ago, in the thread 'PDFRead 1.7 released'.

I (Nick Rapallo) have been hacking PDFRead v1.7 since fall 2007 (prior to my join date here) and became a developer along with Ashish Kulkarni.

REQUIRED: You must have the (free) eBook Publisher software already installed to enable the conversions to .imp and .oeb. You can install the eBook Publisher software by going here. Then choose to download and install the current version ( Win_eBookPub_2.2.5.exe ).

EDIT: Note you MUST enter your own Title, Author and Category in the GUI screen for the Conversion to begin, otherwise it won't start. (If you don't really need them, just enter 1, 2, 3 or T, A, C.)

I have implemented some enhancements and fixed minor bugs in PDFRead 1.8.2.

Changes in this release:
Changelog [2008-04-16] 1.8.2 (by NR)

• added an 'imgdir' In Format where you can select any image in a directory and have all images (files) in that directory loaded. This is similar to an 'imglist' but creates its own list of filenames without needing a (previously created) text file.
• for .prc output, removed current limitation on image sizes (480 max. width) and now use a modified 'html2mobi.exe' program. This should no longer cause large white margins. Cybook Gen 3 users are cautioned that images larger than 480x640 may crash your ereader. Please limit the Size H: and V:! An alternative solution exists with 'mobi2mobi --gen3imagefix' offered by tompe's MobiPerl.
• remembers last 'Output' directory upon startup, but you will need to edit destination filename or may overwrite previous output. To reset it, just type 'default'.
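
For Cybook Gen 3 users, the 480x640 ceiling mentioned above can be worked out before filling in Size H: and V:. A minimal sketch in Python (not PDFRead's own code; the function name `fit_within` is made up for illustration) that shrinks a page size to fit the limit while keeping the aspect ratio:

```python
def fit_within(width, height, max_w=480, max_h=640):
    """Return (w, h) scaled down to fit inside max_w x max_h.

    Sizes already inside the limit are returned unchanged (no upscaling).
    Integer arithmetic is used so the result never exceeds the limit.
    """
    if width <= max_w and height <= max_h:
        return width, height
    if width * max_h > height * max_w:
        # Width is the tighter constraint.
        return max_w, max(1, height * max_w // width)
    # Height is the tighter constraint (or both are equally tight).
    return max(1, width * max_h // height), max_h


print(fit_within(520, 640))   # the prc-mobi profile size from the table
print(fit_within(400, 600))   # already small enough, left alone
```

The 520x640 'prc-mobi' profile size is the interesting case here: it is wider than 480, so it would need to be reduced for a Cybook Gen 3.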
Previous changes:
[2008-03-30] 1.8.1 (by NR)

• can now install PDFRead to a different drive than your C: drive; just keep the same subdirectory structure for the GUI options file to be loaded properly.
• now uses, as a default, the input filename as the output filename (without file extension).
• added .prc output format support using opf2mobi.exe from Mobiperl by tompe on mobileread.com.
• added .cbz/.cbr input support for Comic books using unrar.exe and creating a (sorted) list of image filenames.
• tweaked and added Profiles for PRS500, PRS505, PRC-Mobi (Kindle and Cybook Gen 3), iLiad. Only the REB1200 profiles keep a 2 pixel left and right margin to avoid bleeding into the edge of the screen.
The default Profiles are:
Code
PROFILE     Hres Vres Layout Mode Rotate Colors Colorspace Format
ebw1150   : 319  446  landscape   left   16     gray       imp2
ebw1150-p : 319  446  landscape   left   16     gray       imp2
reb1200   : 468  595  landscape   left   16     gray       imp1
reb1200-p : 468  595  portrait    none   16     gray       imp1
reb1200C  : 468  595  landscape   left   256    rgb        imp1
reb1200Cp : 468  595  portrait    none   256    rgb        imp1
reb1100   : 312  472  landscape   left   2      gray       rb
prs500    : 583  753  landscape   right  4      gray       lrf
prs500-p  : 583  753  portrait    none   4      gray       lrf
prs505    : 583  753  landscape   right  8      gray       lrf
prs505-p  : 583  753  portrait    none   8      gray       lrf
prc-mobi  : 520  640  landscape   right  4      gray       prc
prc-mobi-p: 520  640  portrait    none   4      gray       prc
iLiad     : 768  935  landscape   right  16     gray       prc
iLiad-p   : 768  935  portrait    none   16     gray       prc
generic   : 600  800  landscape   left   256    rgb        html
generic-p : 600  800  portrait    none   256    rgb        html

Note: Profile appended with '-p' means portrait; with 'C' means Color.
• FIX: fixed simple bug in imglist routine which halts process using text file with image filenames
• FIX: portrait modes now ignore rotation preference
• cover page used in resulting ebook now includes author in addition to title and TOC (if any).
• changed default category to 'PDFRead_Converted'
• added PDFRead source to MobileRead Dev Hub

[2008-03-12] 1.8 (by AK and NR)

• improved Windows GUI; added more user options and now remembers most choices using the 'pdfread.ini' configuration file. (to revert to program's defaults, just erase 'pdfread.ini')
• added 'generic' Profile along with two new output formats: 'oeb' (OEBFF) and 'html'. ('html' opens the temp directory where the images are stored)
• added new landscape Layout Modes: 'landscape-third' (with three fixed pages); 'landscape-full' (with one fixed page); 'landscape-2col' (with four quadrants/pages);
• added new portrait Layout Modes: 'portrait-full' (with one fixed page); 'portrait-2col' (with four quadrants/pages);
• now strips output document file extension, and appends 'Out Format' extension to it automatically.
• tweaked Profiles and changed maximum display size for EBW1150 and REB1200. Now allows for a 2 pixel left and right margin to avoid bleeding into the edge of the screen (only for REB1200). The default Profiles are:
Code
• ebw1150 = {hres: 315, vres: 440, mode: landscape, rotate: left, colors: 16, colorspace: gray, format: imp2}
• reb1200 = {hres: 468, vres: 595, mode: landscape, rotate: left, colors: 16, colorspace: gray, format: imp1}
• reb1200-p = {hres: 468, vres: 595, mode: portrait, rotate: none, colors: 16, colorspace: gray, format: imp1}
• reb1200C = {hres: 468, vres: 595, mode: landscape, rotate: left, colors: 256, colorspace: rgb, format: imp1}
• reb1200Cp = {hres: 468, vres: 595, mode: portrait, rotate: none, colors: 256, colorspace: rgb, format: imp1}
• reb1100 = {hres: 310, vres: 468, mode: landscape, rotate: left, colors: 2, colorspace: gray, format: rb}
• prs500 = {hres: 565, vres: 754, mode: landscape, rotate: right, colors: 4, colorspace: gray, format: lrf}
• prs500-p = {hres: 565, vres: 754, mode: portrait, rotate: none, colors: 4, colorspace: gray, format: lrf}
• generic = {hres: 600, vres: 800, mode: portrait, rotate: none, colors: 256, colorspace: rgb, format: oeb}
• added command-line option '-r' to indicate rotation; '--colorspace' to specify gray or color output; '--color' to override number of colors used; '--overlap_h' and '--overlap_v' to override default overlap between pages.
• added 'colorspace' type to specify output color: gray (max. 16 shades) or rgb (max. 256 colors from 16M)
• added 'color' as an option to use images with fewer colors and thereby reducing output file size proportionately.
• fixed imglist option to allow for relative files to the directory where the text list resides; no longer need full pathnames. (DaleDe's suggestion)
• fixed problem with (broken) list generation introduced by eBook Publisher.
• placing an empty file called 'debug' in the PDFRead home directory will allow the temp directory to not be deleted at completion.
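
The 'debug' marker from the last item is just an empty file, so it can be created from a script as well as by hand. A small sketch, assuming you pass in your own PDFRead home directory (the function names and the path in the comment are hypothetical, and this is not part of PDFRead itself):

```python
from pathlib import Path


def enable_debug(pdfread_home):
    """Create an empty file named 'debug' so the temp directory is kept."""
    marker = Path(pdfread_home) / "debug"
    marker.touch(exist_ok=True)
    return marker


def disable_debug(pdfread_home):
    """Remove the 'debug' marker again (needs Python 3.8+ for missing_ok)."""
    (Path(pdfread_home) / "debug").unlink(missing_ok=True)


# Example call; replace the path with wherever PDFRead is installed:
# enable_debug(r"C:\Program Files\PDFRead")
```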

I will continue to maintain PDFRead; hopefully only minor bug fixes and/or enhancements will be needed.

TO DO:
- add GUI option to select between MinFilter 3 (orig) or MinFilter 5 (new) dilate.
- add Mini tutorial to get the best use out of converting into supported ebook formats for the various eBook reader devices.

Enjoy!

INSTALLATION (extras!):

EDIT: 14 Oct 2008 - FOR SONY USERS (fixed that Sony .lrf bug that stretched short pages!)
After executing the PDFRead Installer, from pdfread-MinFilter3-mod-bin.zip unzip the modified PDFRead 1.8.2.1 bin files (using the original MinFilter 3 dilate) into the bin directory and overwrite the existing files.
Should you wish to try the new MinFilter 5 dilate, unzip pdfread-MinFilter5-mod-bin.zip into the bin directory instead! (Note: you may have to increase the DPI to 500 when using the MinFilter 5 dilate to get acceptable results!)

EDIT: 8 Mar 2009 - FOR KINDLE/iLIAD/CYBOOK USERS (fixed that .jpg quality compromise imposed on .prc files!)
After executing the PDFRead Installer, from NRhtml2mobi.zip unzip the modified NRhtml2mobi.exe into the bin directory and overwrite the existing file.
It's a hack that may render the .prc unreadable on Palm PDAs or even the Cybook Gen 3. In those cases, use mobi2mobi with the --gen3imagefix switch as indicated above.

Note: A Kindle 2 specific resolution (480x622) has been found to work best, with no blank pages in between.

EDIT: 7 Jan 2011 - A (original).pdf** to (enhanced/cropped).pdf method has been devised, but not yet included within the PDFRead GUI program. It's available in this post. Contains a modified pdfread.exe executable that now limits the expansion of small cropped pages to a more reasonable level (finally!).

**Actually, you can use any Input Format (.PDF / .DJVU / .TIFF / .CBZ / .CBR ) in lieu of just .PDF!

Previous version downloads: 249
PDFRead-GUI.jpg PDFRead-GUI-1.8.1.jpg 
[zip] pdfread-1.8.2-Installer.zip (11.81 MB, 32520 views)
[zip] PDFRead-manual.zip (6.4 KB, 10993 views)
[txt] PDFRead-help.txt (2.3 KB, 7154 views)
[zip] PDFRead-1.8.2-Source-noGUI-noInstaller.zip (133.6 KB, 5533 views)
[txt] PDFRead-FAQ.txt (4.2 KB, 6772 views)

#2  nrapallo 03-13-2008, 04:40 PM
Reserved for updated Tutorial and FAQ.

FAQ: What does a 'Profile' do?
The 'Profile' box contains the default settings for each "device". If you select a profile from the drop-down box, then the various default options are loaded in. Afterwards, 'Processing' options can be selected to override these defaults. The 'Out Format' box selects the resulting ebook format to be generated. It also overrides the default settings.

The 'reb1200C' Profile retains 256 colors (but uses 16M colors prior to doing the conversion), while 'reb1200' (note: no 'C' suffix) produces a 16-shade grayscale image. A profile name with a '-p' suffix denotes portrait; without it, the profile defaults to landscape.
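
One way to picture "profile defaults, then overrides" is a dictionary of defaults that later choices are layered on top of. The sketch below is only an illustration: the values are copied from the 1.8.2 profile table in post #1, but the dict layout and the function name `resolve_settings` are assumptions, not PDFRead's internals.

```python
# Three rows copied from the 1.8.2 profile table; the structure is illustrative.
PROFILES = {
    "reb1200": {"hres": 468, "vres": 595, "mode": "landscape", "rotate": "left",
                "colors": 16, "colorspace": "gray", "format": "imp1"},
    "reb1200C": {"hres": 468, "vres": 595, "mode": "landscape", "rotate": "left",
                 "colors": 256, "colorspace": "rgb", "format": "imp1"},
    "prs505-p": {"hres": 583, "vres": 753, "mode": "portrait", "rotate": "none",
                 "colors": 8, "colorspace": "gray", "format": "lrf"},
}


def resolve_settings(profile, **overrides):
    """Start from the profile defaults, then apply any user overrides."""
    settings = dict(PROFILES[profile])  # copy, so the defaults stay intact
    settings.update({k: v for k, v in overrides.items() if v is not None})
    return settings


# Pick the REB1200 profile, then override the output format and color count.
print(resolve_settings("reb1200", format="oeb", colors=4))
```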
FAQ: When I view a 1150 .IMP why does it open up as a 1200 .IMP?
What can go wrong is a 'bug' that has reared its head before, namely that the 1150 .imp "thinks" it's a 1200 .imp. The solution, in the past, has been to re-install the eBook Publisher software. That usually cures the problem!
FAQ: Why does GEB (eBookwise) Librarian fail to properly transfer/load the resulting .IMP?
GEB Librarian can get confused if BOTH the 1150 .imp and 1200 .imp are in the same bookshelf when being uploaded. I think it has to do with the same 'Unique Edition ID' being used for both versions made from the same input. The solution here is to NOT put BOTH in the same bookshelf; use one for your 1150 ebooks and another for the 1200 ebooks.
FAQ: I am converting to .imp output format, but I get an error from the 'generate_imp' module. Why does it fail?
REQUIRED: You must have the (free) eBook Publisher software already installed to enable the conversions to .imp and .oeb.

You can install the eBook Publisher software by going here. Then choose to download and install the current version ( Win_eBookPub_2.2.5.exe ).
FAQ: Why does PDFRead create stray pages or small leftover bits?
When rotating your input, if you want the output to be split over just two page flips, then select 'landscape-half' as your Layout Mode. For everything on one page only, select 'landscape-full'. The default 'landscape' mode will split over the maximum number of page flips so that the full width is displayed. There is also 'landscape-2col' which will display quadrants of the page over four page flips.
FAQ: What is an 'imglist' in the 'In Format' drop-down box?
To use the 'imglist' input format, create a text file listing the filenames in any image directory; the result is a mini color photo album. The easiest way to get this list of filenames is to open the command prompt in your image directory and issue the DOS command:
Code
dir /b /on >list.txt
Then open list.txt from PDFRead. Happy Converting!
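
If you would rather not use the command prompt, the same kind of list can be written from Python. This is a sketch rather than anything shipped with PDFRead: the function name is made up, and unlike the plain dir command it keeps only the image types PDFRead lists as supported. It writes bare filenames, which relies on the imglist option accepting names relative to the directory where the text list resides.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".png", ".gif", ".tif", ".bmp"}


def write_image_list(image_dir, list_name="list.txt"):
    """Write the image filenames in image_dir, sorted by name, to list_name."""
    folder = Path(image_dir)
    names = sorted(
        (p.name for p in folder.iterdir()
         if p.is_file() and p.suffix.lower() in IMAGE_EXTS),
        key=str.lower,  # case-insensitive, similar to sorting by name in dir
    )
    (folder / list_name).write_text("\n".join(names) + "\n")
    return names


# Example call; point it at your own image directory:
# write_image_list(r"C:\scans\holiday")
```

Since you control the text file, you can also reorder or delete lines by hand before opening it in PDFRead.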
FAQ: What is an 'imgdir' in the 'In Format' drop-down box?
To use the 'imgdir' input format, just select any filename in an image directory to get a mini color photo album without the need for a text file as above for 'imglist'. The only drawback is that you cannot control the order in which the images are compiled: this method always sorts the filenames alphabetically. Image formats supported are .jpg, .png, .gif, .bmp and .tif.
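
Alphabetical sorting matters most when filenames contain numbers, because 'page10' sorts before 'page2'. The sketch below shows the effect and one common workaround, zero-padding the numbers. It only computes names and does not rename anything on disk; `pad_numbers` is a made-up helper, not a PDFRead function.

```python
import re


def pad_numbers(filename, width=4):
    """Zero-pad every run of digits in filename to the given width."""
    return re.sub(r"\d+", lambda m: m.group(0).zfill(width), filename)


names = ["page2.png", "page10.png", "page1.png"]

# Plain alphabetical order puts page10 between page1 and page2.
print(sorted(names))

# After padding, alphabetical order matches the numeric page order.
print(sorted(pad_numbers(n) for n in names))
```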
FAQ: The resulting text is too blurry! Can anything be done?
• You can choose a landscape mode to improve the clarity/resolution.

• You can reduce the amount of dilation (thickening of the text) by either turning it off with the 'no dilation' option or by increasing the DPI from 300 to say 600.

• If your results are still too blurry, try slowly increasing the 'Error Level' in addition to the above. With the GUI, you can run multiple tests (especially on a limited number of pages) until you get it just right. Then run it again with no page restriction.
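
To see why dilation thickens text, and why a larger window thickens it more, here is a tiny pure-Python minimum filter. It assumes that the 'MinFilter 3' and 'MinFilter 5' names used in post #1 refer to a minimum filter over a 3x3 or 5x5 window; that reading, and everything in the code, is an illustration and not PDFRead's actual implementation.

```python
def min_filter(pixels, size=3):
    """Replace each pixel by the minimum of its size x size neighbourhood.

    With dark text (low values) on a white page (high values), taking the
    minimum spreads the dark pixels outward, i.e. thickens the strokes.
    """
    r = size // 2
    h, w = len(pixels), len(pixels[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                pixels[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            ]
            row.append(min(window))
        out.append(row)
    return out


def count_dark(pixels):
    return sum(v == 0 for row in pixels for v in row)


# A white 7x7 page with a single black pixel in the middle.
page = [[255] * 7 for _ in range(7)]
page[3][3] = 0

print(count_dark(min_filter(page, 3)))  # the dot grows to a 3x3 block
print(count_dark(min_filter(page, 5)))  # the dot grows to a 5x5 block
```

The growth is a fixed number of pixels per stroke, which is consistent with the advice to raise the DPI: at a higher DPI the same few pixels are a smaller fraction of each letter.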
FAQ: Can PDFRead convert any .pdf?
If the .pdf is not built properly (you may have to input the number of pages in the output screen), or is encrypted or has security preventing printing/extraction, then it will not work with PDFRead. The .pdf must be free of any DRM prior to being given to PDFRead.
FAQ: Can PDFRead convert (original).pdf** to (enhanced/cropped).pdf?
To facilitate easy conversion of the intermediate images created by PDFRead into .pdf, I have written a simple batch (sam2pdfread.bat) file that can be placed in the temporary directory (created when an empty file 'debug' is placed in the PDFRead install directory).

To accomplish this (original).pdf to (enhanced/cropped).pdf, do the following:

1. Unzip this file and copy all the files into the PDFRead/bin directory in the default install location (or copy the *.exe programs to a directory in your Windows path).

2. Start PDFRead using the .pdf 'In Format' and the .html 'Out Format'.

3. When PDFRead is finished, the .html 'Out Format' will cause the temporary directory with the enhanced/cropped images to be opened.

4. Copy the 'sam2pdfread.bat' file to that temp dir and double-click it. That's it!

5. The resulting .pdf will have the name of the InputFilename (actually the resulting .html in that temp dir) and should be moved to a more permanent location. The temp dir can then be deleted.

Note: This doubles the storage required, as each image is converted to a .pdf while the original image is retained.

The 'sam2pdfread.bat' can be edited to tweak the parameters passed to sam2p and/or pdftk. Just experiment to find what works best for you!
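
For anyone curious about the general shape of such a script, here is a sketch that builds the two kinds of command involved: one sam2p call per image, then one pdftk call to join the pages. The contents of the real sam2pdfread.bat are not reproduced in this post, so this is an assumption about the approach using only the basic `sam2p <input> <output>` and `pdftk <files> cat output <result>` forms; any extra options the batch file passes are not shown. The function name is made up.

```python
from pathlib import Path


def build_commands(image_names, output_pdf):
    """Return (per-image sam2p commands, one pdftk merge command).

    Nothing is executed here; the commands are returned as argument lists.
    """
    images = sorted(image_names)
    page_pdfs = [str(Path(name).with_suffix(".pdf")) for name in images]
    convert = [["sam2p", img, pdf] for img, pdf in zip(images, page_pdfs)]
    merge = ["pdftk", *page_pdfs, "cat", "output", output_pdf]
    return convert, merge


convert, merge = build_commands(["0002.png", "0001.png"], "book.pdf")
for cmd in convert:
    print(" ".join(cmd))
print(" ".join(merge))
```

To actually run them, each list can be passed to `subprocess.run(cmd, check=True)` from inside the temp directory, provided sam2p and pdftk are on your path.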

EDIT: 7 Jan 2011 Fixed 'sam2pdfread.bat' to correct some sam2p housekeeping issues, and recompiled pdfread.exe to pad image filenames with leading zeros so that combining *.pdf works properly. The download now includes a newer version of pdfread.exe and library.zip, so you may want to back up the copies in your 'bin' folder before copying these files over.

EDIT: 7 Jan 2011 Also, while I was recompiling pdfread.exe I fixed a long-time irk of mine: the expansion of small crops into huge images. This time around, a small cropped image will not grow by more than 10% over its original size. A good tradeoff! All modified files are included in the 'sam2pdfread.rar' available here.
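
The idea behind that 10% limit can be sketched as a cap on the scale factor. This is one reading of the description above, not the code inside the recompiled pdfread.exe; the function name and parameters are made up for illustration.

```python
def capped_scale(crop_w, crop_h, target_w, target_h, max_growth=0.10):
    """Scale factor that fits the crop to the target, limited to +10% growth.

    Large crops are still shrunk to fit; small crops are enlarged by at
    most max_growth instead of being blown up to fill the screen.
    """
    fit = min(target_w / crop_w, target_h / crop_h)
    return min(fit, 1.0 + max_growth)


# A small 100x50 crop on a 600x800 target: uncapped it would grow 6x.
print(capped_scale(100, 50, 600, 800))

# A large 1200x1600 crop is simply halved to fit.
print(capped_scale(1200, 1600, 600, 800))
```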

**Actually, you can use any Input Format (.PDF / .DJVU / .TIFF / .CBZ / .CBR ) in lieu of just .PDF!

TIP: Just hover your mouse over any of the options and a nice tooltip help will pop up if you keep it there steady. Great way to learn the program hands-on!

#3  rpresser 03-29-2008, 09:02 PM
I tried to convert some files for my Ebookwise1150. I tried several different "output format" parameters: first IMP1, then IMP2. Both generated GEB1200 format IMPs, according to GEB eBook Librarian, and both were unreadable on my 1150. I tried this both from the GUI and from the commandline, same results.

I had not tried any earlier version of the program before this.

I then generated OEB format, and unpacked it in ETI's eBook Publisher, then generated Grayscale VGA-half .. these files worked fine on my 1150. So I have an "out".

The return of landscape is blessedly welcome....

[EDIT] Never mind, I figured out what I was doing wrong. I need to specify "gray" as the color as well as "IMP2" as the output format. All is well.

#4  Jadon 03-29-2008, 09:19 PM
Same for me. The program correctly makes ETI-2 sized PNGs when I choose IMP2 in the GUI, but it always puts them into an ETI-1 IMP at the last step. I get around it by choosing HTML and then building the IMP in Gemstar Publisher. It's done this for some versions.

#5  nrapallo 03-30-2008, 12:52 AM
Quote rpresser
I tried to convert some files for my Ebookwise1150. I tried several different "output format" parameters: first IMP1, then IMP2. Both generated GEB1200 format IMPs, according to GEB eBook Librarian, and both were unreadable on my 1150. I tried this both from the GUI and from the commandline, same results.

I had not tried any earlier version of the program before this.

I then generated OEB format, and unpacked it in ETI's eBook Publisher, then generated Grayscale VGA-half .. these files worked fine on my 1150. So I have an "out".

The return of landscape is blessedly welcome....

[EDIT] Never mind, I figured out what I was doing wrong. I need to specify "gray" as the color as well as "IMP2" as the output format. All is well.
The Profile box contains the default settings for each "device". If you select the 'ebw1150' profile, then the Colorspace should default to (16 shades) gray. The 'Out Format' box also matters, since it overrides the profile's default settings. The ebw1150 (or ETI-2) uses the 'imp2' format. Once these are straightened out, you should be able to generate an .imp that views properly on your EBW1150.

What can go wrong is a 'bug' that has reared its head before, namely that the 1150 .imp opens up as a 1200 .imp. The solution, in the past, has been to re-install the eBook Publisher software. That usually cures the problem!

Another problem could be that GEB Librarian can get confused if BOTH the 1150 .imp and 1200 .imp are in the same bookshelf when being uploaded. I think it has to do with the same 'Unique Edition ID' being used for both versions made from the same input. The solution here is to NOT put BOTH in the same bookshelf; use one for your 1150 ebooks and another for the 1200 ebooks.

As for landscape mode, if you want the output to be split over just two page turns, then select 'landscape-half' as your Layout Mode. For everything on one page only, select 'landscape-full'. BTW, the 'landscape' default will split over the maximum number of page turns so that the full width is displayed.

#6  nrapallo 03-30-2008, 12:53 AM
Quote Jadon
Same for me. The program correctly makes ETI-2 sized PNGs when I choose IMP2 in the GUI, but it always puts them into an ETI-1 IMP at the last step. I get around it by choosing HTML and then building the IMP in Gemstar Publisher. It's done this for some versions.
Ditto!

What can go wrong is a 'bug' that has reared its head before, namely that the 1150 .imp opens up as a 1200 .imp. The solution, in the past, has been to re-install the eBook Publisher software. That usually cures the problem!

#7  nrapallo 03-30-2008, 02:20 AM
Coming soon... (now that Mobi2IMP 9.2 has been released)

Changes implemented already in PDFRead 1.8.1:
- fix simple bug in imglist routine which halts process using text file with image filenames
- add .prc output format support using opf2mobi.exe
- add .cbz/.cbr input support for Comic books using unrar.exe and creating a (sorted) list of image filenames

TO DO:
- add PDFRead source to MobileRead Dev Hub
- add Mini-Tutorial to get the best use out of converting PDF/DJVU/TIFF/Imglist/CBZ/CBR into ebook format IMP/RB/LRF/PRC/OEB/HTML for devices like EBW1150/REB1200/PRS-500/PRS-505/KINDLE/CYBOOK GEN 3/ILIAD (run-on sentence?)

#8  nrapallo 03-30-2008, 02:49 PM
I have implemented many enhancements and fixed minor bugs in PDFRead 1.8.1 (see post#1 above in this thread)

Changes in this release:

Changelog [2008-03-30] 1.8.1 (by NR)

• can now install PDFRead to a different drive than your C: drive; just keep the same subdirectory structure for the GUI options file to be loaded properly.
• now uses, as a default, the input filename as the output filename (without file extension).
• added .prc output format support using opf2mobi.exe from Mobiperl by tompe on mobileread.com.
• added .cbz/.cbr input support for Comic books using unrar.exe and creating a (sorted) list of image filenames.
• tweaked and added Profiles for PRS500, PRS505, PRC-Mobi (Kindle and Cybook Gen 3), iLiad. Only the REB1200 profiles keep a 2 pixel left and right margin to avoid bleeding into the edge of the screen.
The default Profiles are:
Code
PROFILE     Hres Vres Layout Mode Rotate Colors Colorspace Format
ebw1150   : 319  446  landscape   left   16     gray       imp2
reb1200   : 468  595  landscape   left   16     gray       imp1
reb1200-p : 468  595  portrait    none   16     gray       imp1
reb1200C  : 468  595  landscape   left   256    rgb        imp1
reb1200Cp : 468  595  portrait    none   256    rgb        imp1
reb1100   : 312  472  landscape   left   2      gray       rb
prs500    : 583  753  landscape   right  4      gray       lrf
prs500-p  : 583  753  portrait    none   4      gray       lrf
prs505    : 583  753  landscape   right  8      gray       lrf
prs505-p  : 583  753  portrait    none   8      gray       lrf
prc-mobi  : 520  640  landscape   right  4      gray       prc
prc-mobi-p: 520  640  portrait    none   4      gray       prc
iLiad     : 768  935  landscape   right  16     gray       prc
iLiad-p   : 768  935  portrait    none   16     gray       prc
generic   : 600  800  landscape   left   256    rgb        oeb
generic-p : 600  800  portrait    none   256    rgb        oeb

Note: Profile appended with '-p' means portrait; with 'C' means Color.
• FIX: fixed simple bug in imglist routine which halts process using text file with image filenames
• FIX: portrait modes now ignore rotation preference
• cover page used in resulting ebook now includes author in addition to title and TOC (if any).
• changed default category to 'PDFRead_Converted'
• added PDFRead source to MobileRead Dev Hub

#9  themoores1us 04-08-2008, 03:43 AM
I have used this to convert several large PDF books (400-800 Pages) that I downloaded from Google Books. I have used your program in both Vista and XP to convert to LRF format for my Sony prs505 and it works very well in both. I have had a couple books that the program hung up on, usually after 30 or so pages for some reason ?? Thank You for a wonderful program.

Jim

#10  nrapallo 04-08-2008, 10:44 AM
Quote themoores1us
I have used this to convert several large PDF books (400-800 Pages) that I downloaded from Google Books. I have used your program in both Vista and XP to convert to LRF format for my Sony prs505 and it works very well in both. I have had a couple books that the program hung up on, usually after 30 or so pages for some reason ?? Thank You for a wonderful program.

Jim
Glad you found it useful!

I'm always looking to tweak the internal Profiles; just to make the program that much better!

I want to produce the largest possible image proportional to the device's screen without any resizing/zooming effects.

May I ask if the resulting LRF adequately "fills" the screen of the Sony PRS-505 or is there a bit too much (white) margin i.e. top/bottom or left/right?