Mobileread
[Tool] Multi-column PDF files on 6 inch display.
#1  Taesoo Kwon 11-08-2008, 06:58 AM
I developed a program that converts PDF documents, such as articles and technical papers, into a GIF sequence so that they are readable on the small screens of e-book devices. The program automatically detects contiguous, non-empty regions in a page and, based on that information, splits the page into multiple low-resolution pages. Unnecessary margins are also removed automatically.
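
For readers curious how this kind of detection can work, here is a minimal sketch of one common approach (projection profiles over a binarized page). It is not taken from PaperCrop's source; the page representation (a list of rows of 0/1 pixels) and the function names `non_empty_runs` and `trim_margins` are assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical page representation: 1 = ink, 0 = blank.
Page = List[List[int]]


def non_empty_runs(profile: List[bool], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) runs of True values.

    Runs separated by fewer than min_gap blank entries are merged,
    so small gaps (e.g. line spacing) do not split a region.
    """
    runs = []
    start = None
    for i, filled in enumerate(profile):
        if filled and start is None:
            start = i
        elif not filled and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))

    merged: List[Tuple[int, int]] = []
    for run in runs:
        if merged and run[0] - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], run[1])
        else:
            merged.append(run)
    return merged


def trim_margins(page: Page) -> Tuple[int, int, int, int]:
    """Return the half-open bounding box (top, bottom, left, right) of all ink.

    A completely blank page returns the full page extent.
    """
    rows = [any(r) for r in page]
    cols = [any(c) for c in zip(*page)]
    if not any(rows):
        return (0, len(page), 0, len(page[0]) if page else 0)
    top = rows.index(True)
    bottom = len(rows) - rows[::-1].index(True)
    left = cols.index(True)
    right = len(cols) - cols[::-1].index(True)
    return (top, bottom, left, right)
```

Applying `non_empty_runs` to the row profile gives candidate horizontal bands, and applying it to the column profile of each band gives candidate columns; a larger `min_gap` keeps lines of the same paragraph together.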

Download: PaperCrop

Screenshots:
Input pdf:
image »

Output pdf:
image »

Currently, only Windows is supported. (It works on other platforms through Wine, though.)
There may be some bugs.
Thanks,

- Version 0.24 uploaded. Source code is available too.
- Version 0.3 uploaded. (All 0.24 users should upgrade to this version. Sorry for the crash problem. Version 0.3 outputs to a PDF file. Could anybody please test the output PDF file on a Sony Reader?)
- Version 0.4 uploaded.

#2  =X= 11-09-2008, 10:10 PM
Nice tool. It would be great if this code could be added to PDFRead, which needs a feature like this.

=X=

#3  Taesoo Kwon 11-10-2008, 06:52 AM
Yes, I guess so. PDFRead supports a command-line mode, and PaperCrop uses Lua scripts for generating output. So anyone who is familiar with Lua can modify the .LUA files in the scripts folder so that the output images from PaperCrop are automatically converted to e-book files using PDFRead. But at the moment, I don't want to do the work myself, due to my laziness.
I can provide the PaperCrop source code to anyone who is interested.

#4  Pulp 11-10-2008, 11:20 AM
Thank you, this is a great tool!

#5  nrapallo 11-10-2008, 12:52 PM
Quote Taesoo Kwon
Yes, I guess so. PDFRead supports a command-line mode, and PaperCrop uses Lua scripts for generating output. So anyone who is familiar with Lua can modify the .LUA files in the scripts folder so that the output images from PaperCrop are automatically converted to e-book files using PDFRead. But at the moment, I don't want to do the work myself, due to my laziness.
I can provide the PaperCrop source code to anyone who is interested.
PDFRead v1.8.2 already supports converting two-column layouts using the layout modes:
Code
'landscape-2col' (with four quadrants/pages);
'portrait-2col' (with four quadrants/pages);
However, PDFRead is not too "smart" in how it determines where the columns start/end; it just picks the midpoint of the page and splits it there! Of course, this will be wrong if the column widths (and side margins) are not equal.
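
To make the difference concrete, here is a minimal sketch contrasting a midpoint split with a split at the detected gutter. It is not code from PDFRead or PaperCrop; `col_ink` (the amount of ink in each pixel column), `search_fraction`, and both function names are assumptions for illustration.

```python
from typing import List


def midpoint_split(width: int) -> int:
    """Split position when simply halving the page."""
    return width // 2


def gutter_split(col_ink: List[int], search_fraction: float = 0.5) -> int:
    """Return the centre of the widest blank run of columns.

    Only the central part of the page (search_fraction of its width) is
    searched, so the outer side margins are not mistaken for the gutter.
    Falls back to the midpoint if no blank column is found there.
    """
    width = len(col_ink)
    lo = int(width * (1 - search_fraction) / 2)
    hi = width - lo
    best_start, best_len = None, 0
    run_start = None
    # Iterate one past hi so a run that reaches the window edge is closed.
    for x in range(lo, hi + 1):
        blank = x < hi and col_ink[x] == 0
        if blank and run_start is None:
            run_start = x
        elif not blank and run_start is not None:
            if x - run_start > best_len:
                best_start, best_len = run_start, x - run_start
            run_start = None
    if best_start is None:
        return midpoint_split(width)
    return best_start + best_len // 2
```

With a wide left column and a narrow right column, the midpoint lands inside the left column's text, while the gutter-based split lands in the blank space between the columns.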

I had looked into programming using LUA when I was porting/tweaking some PSP homebrew programs, so it would be easy to re-use your programming logic.

However, PDFRead is due for a major overhaul, so I will hold off doing this just now. I'll wait to see what ashkulz (original author of PDFRead) does with any update to PDFRead and then go from there.

Nice effort though!

#6  soilwork 11-12-2008, 05:02 AM
I tried the program, and it looks and works excellently. In particular, it detects content well even when a wide table spans the whole page width while the rest of the content is arranged in two columns.

However, I have a few suggestions.

1) Pre-trim option
In most articles, headers/footers are not necessary, especially on a small-screen reading device. It would be great if you could implement a pre-trim option (just like the one in PDFLRF) that is applied before detecting the content.

2) Preventing text from being cut in the middle
I noticed that, in some cases, a line of text is cut in the middle. Since the program already does a great job of detecting content, can you apply a similar logic/process to prevent this from happening when cutting the detected content into smaller GIF/JPG/PNG images?

3) Easier way to enter precise segmentation parameter.
To make fine changes in the segmentation, I noticed that I have to use 'Tab' to highlight the slider and then use the left/right cursor keys to change the number in the smallest increment. It would be easier if
A. double-clicking on the slider highlighted it, and/or
B. double-clicking on the displayed number allowed users to enter the number directly.
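
Regarding suggestion 2, here is a minimal sketch of one way to keep cuts off text lines: when a region is taller than the screen, move each cut up to the nearest blank pixel row. This is only an illustration, not PaperCrop's logic; `row_ink` (True where a pixel row contains ink) and `safe_cuts` are hypothetical names.

```python
from typing import List


def safe_cuts(row_ink: List[bool], screen_height: int) -> List[int]:
    """Return row indices at which to cut a tall region.

    Each piece is at most screen_height rows. A cut at index c means
    row c starts the next piece, so cutting on a blank row never
    slices through a line of text. If a window contains no blank row,
    a hard cut at the full screen height is used instead.
    """
    cuts = []
    start = 0
    total = len(row_ink)
    while total - start > screen_height:
        target = start + screen_height
        cut = target
        # Walk upward from the target until a blank row is found.
        while cut > start and row_ink[cut]:
            cut -= 1
        if cut == start:
            cut = target  # no blank row available: fall back to a hard cut
        cuts.append(cut)
        start = cut
    return cuts
```

For a region with three-row text lines separated by single blank rows and a five-row screen, the cuts fall on the blank rows rather than at every fifth row.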

BTW, thanks for providing an excellent program.

#7  =X= 11-12-2008, 01:47 PM
I've just used the software on a very complicated layout and it worked quite nicely. I'm quite impressed.

Suggestions
* Add a feature in the UI to choose JPG/GIF/PNG output. Having to change the code is a bit cumbersome.
* Add a crop box that applies to all windows.
* It would be nice if customized crop settings were saved per page so that all the adjustments can be made before the bulk conversion is executed.
* It would be nice if the final product was an eBook. If not, maybe write a short tutorial here on how a user can create an eBook using calibre or comic2LRF. An LRF can be created by zipping up the files, giving the archive a CBZ extension, and running one of these tools on it.
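
As a rough illustration of the zipping step in the last suggestion, here is a minimal Python sketch that packs a folder of page images into a .cbz archive. The function name `make_cbz` and the folder layout are assumptions; the resulting file would then be passed to calibre or comic2LRF as described above.

```python
import zipfile
from pathlib import Path


def make_cbz(image_dir, cbz_path, extensions=(".gif", ".jpg", ".png")):
    """Zip all page images in image_dir into cbz_path.

    Files are added in sorted name order, so page images should use
    zero-padded numbers (e.g. page_001.gif) to keep the reading order.
    Returns the list of file names that were added.
    """
    images = sorted(
        p for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )
    # Images are already compressed, so store them without recompression.
    with zipfile.ZipFile(cbz_path, "w", zipfile.ZIP_STORED) as archive:
        for image in images:
            archive.write(image, arcname=image.name)
    return [p.name for p in images]
```

A CBZ file is just a ZIP archive with a different extension, which is why renaming the zipped images is enough for comic-oriented converters.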


Bugs/Issues
* Adjust crop settings with font size. On one of the pages the column spacing was quite tight, so I had to decrease the column width. However, the titles of the stories had larger fonts, where the spacing of the words equaled
* In some cases the text was cut in half.
* Crop overlap shows up on some pages, where part of the second column appears on the first column's screen. (The second column shows up fine on the following screens, but this is a bit distracting.)

=X=

#8  Taesoo Kwon 11-12-2008, 06:40 PM
Thank you for the suggestions, =X= and Soilwork.
I will implement several of the suggestions and bug fixes in the next version,
e.g. a crop box that applies to all windows, a pre-trim option, and font-size-dependent processing. (The last one is very difficult for me to implement. Currently all the processing is done at the pixel level, without using any PDF information such as font sizes, PDF crop boxes, and so on.)
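
For illustration, a pre-trim step at the pixel level could be as simple as cropping fixed fractions off each side before content detection runs. The sketch below is hypothetical (it is not the planned implementation); `pretrim` and its fraction parameters are assumed names, and the page is assumed to be a list of pixel rows.

```python
def pretrim(page, top=0.0, bottom=0.0, left=0.0, right=0.0):
    """Crop the given fractions off each side of a page.

    page is a list of rows of pixel values. Fractions are relative to
    the page height (top, bottom) or width (left, right). The same
    fractions can be applied to every page to drop headers and footers.
    """
    height = len(page)
    width = len(page[0]) if page else 0
    y0 = int(height * top)
    y1 = height - int(height * bottom)
    x0 = int(width * left)
    x1 = width - int(width * right)
    return [row[x0:x1] for row in page[y0:y1]]
```

Because this works purely on pixels, it needs no font or crop-box information from the PDF, but it also cannot tell a header from body text: the fractions have to be chosen by the user.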

I will also release the source code under the GPL. (Actually, this is mandatory, since I used some GPL libraries.)

At the moment, supporting e-book formats is not something I want to spend much time on (simply because I started this project for my own needs, and I don't need such functionality). Sorry.

#9  ProDigit 11-16-2008, 10:10 AM
I'm sorry for the double post, but you convert PDF to JPG at 800x600 pixels.
The screen itself has a small bar at the bottom.
Wouldn't it be better to convert to something like 790x600 pixels?
Just a question.
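
To put numbers on the question, here is a small sketch that computes the largest image size fitting the screen once the bar is subtracted. The 10-pixel bar height is only inferred from the 790 figure above, and `fit_size` with its defaults (a 600x800 portrait screen) is an assumption for illustration, not a measured device value.

```python
def fit_size(src_w, src_h, screen_w=600, screen_h=800, bar_h=10):
    """Largest (width, height) that fits the usable screen area.

    The aspect ratio of the source is preserved. Integer arithmetic is
    used throughout to avoid floating-point rounding surprises.
    """
    usable_h = screen_h - bar_h
    if src_w * usable_h >= src_h * screen_w:
        # The width is the limiting dimension.
        return screen_w, max(1, src_h * screen_w // src_w)
    # The height is the limiting dimension.
    return max(1, src_w * usable_h // src_h), usable_h
```

For a 1200x1600 source, reserving 10 pixels for the bar gives 592x790 instead of 600x800, so the page is scaled slightly smaller but nothing is hidden behind the bar.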

#10  =X= 11-16-2008, 04:48 PM
Moderators, can you please make this thread a sticky?
