Mobileread
soPdf - Better than Yet another PDF to LRF converter
#1  theguru 11-15-2008, 11:03 PM
I really liked the pdflrf tool from the "Yet another PDF to LRF converter" thread, but the moderator took it down for a GPL violation. It has been down for quite some time, apparently because the author is not interested in providing the source for his tool. pdflrf also has some issues of its own:
  1. pdflrf renders the PDF into images and then creates the LRF file.
    This makes a 4 MB PDF file grow to more than 40 MB.
  2. No text information is preserved because of the image conversion.
  3. Very slow.
  4. No source for the tool <-- biggest disadvantage
So I decided to write a tool for myself. soPdf is a PDF formatter for the Sony Reader. It is based on SumatraPDF's version of mupdf and fitz.

The advantages of soPdf over pdflrf
  1. Pdf to Pdf conversion
  2. Text and other contents of pdf are preserved
  3. Size of the output file is very close to size of input file
    and in some cases smaller than input file.
  4. Super fast conversion compared to pdflrf.
  5. Source available to make further changes !!!!!! <-- biggest advantage
The disadvantages over pdflrf
  1. Cannot yet convert comic books, although it can still split image PDFs into two.
  2. soPdf is in alpha stage (ver 0.1). There may be lots of bugs still to be found, including at least all of the mupdf bugs.
  3. ???
soPdf command line options
Code
about: soPdf
author: Navin Pai, soPdf ver 0.1 alpha
usage: soPdf -i file_name [options]
    -i file_name    input file name
    -p password     password for input file
    -o file_name    output file name
    -w              turn off white space cropping
                    default is on
    -m nn           mode of operation
                      0 = fit 2xWidth *
                      1 = fit 2xHeight
                      2 = fit Width
                      3 = fit Height
                      4 = smart fit Width (not yet implemented)
                      5 = smart fit Height (not yet implemented)
    -v nn           overlap percentage
                      nn = 2 percent overlap *
    -t title        set the file title
    -a author       set the file author
    -b publisher    set the publisher
    -c category     set the category
    -s subject      set the subject
    -e              proceed with errors
    -r              reverse landscape
    * = default values
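For anyone scripting batch conversions, here is a minimal Python sketch that assembles a soPdf argument list from the options listed above. The helper name `build_sopdf_args` is hypothetical, and the sketch assumes each option and its value are passed as separate arguments (for example `-m 2`), which the usage text suggests but does not spell out.

```python
def build_sopdf_args(input_pdf, output_pdf=None, mode=0, overlap=2,
                     crop_whitespace=True, title=None, author=None,
                     proceed_on_errors=False, reverse_landscape=False,
                     exe="soPdf"):
    """Build an argument list using only the flags from the usage text.

    Hypothetical helper: it does not run soPdf, it only builds the list.
    """
    # Modes 4 and 5 are listed as "not yet implemented", so reject them here.
    if mode not in (0, 1, 2, 3):
        raise ValueError("mode must be 0, 1, 2 or 3")
    args = [exe, "-i", input_pdf]
    if output_pdf:
        args += ["-o", output_pdf]
    if not crop_whitespace:
        args.append("-w")          # -w turns white space cropping off
    args += ["-m", str(mode), "-v", str(overlap)]
    if title:
        args += ["-t", title]
    if author:
        args += ["-a", author]
    if proceed_on_errors:
        args.append("-e")
    if reverse_landscape:
        args.append("-r")
    return args


if __name__ == "__main__":
    # Print the command for a "fit Width" conversion; pass the list to
    # subprocess.run() to actually launch the tool.
    print(" ".join(build_sopdf_args("book.pdf", "book_out.pdf", mode=2)))
```

The list form avoids shell quoting problems with titles or file names that contain spaces.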
The conversion algorithm is as follows:
  1. If the user specified Fit2xWidth or Fit2xHeight, simply make two copies of the PDF page from the source in the destination PDF file.
  2. Render the page and get the actual bounding box that encompasses all of the content on the page. This step removes the white space border of the page.
  3. If the page cannot be rendered by mupdf and the error option (-e) is specified, split the page without rendering by setting the MediaBox of the page.
  4. Try to split the page first by iterating over all the elements that can fit in half a page; if that does not work, split the page halfway with 2% overlap (this can be changed).
  5. If FitWidth or Fit2xWidth is specified, rotate the page by -90 degrees.
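To illustrate steps 2 and 4, here is a rough Python sketch of the two geometric ideas: finding the bounding box of the non-white content, and splitting that box in half with an overlap. It is not the soPdf source (which is built on mupdf); the function names are hypothetical, the page is modelled as a plain grid of grayscale values, and the reading of "2% overlap" as a shared strip equal to 2% of the content height is an assumption.

```python
def content_bbox(pixels, white=255):
    """Return (x0, y0, x1, y1) around all non-white pixels, or None.

    pixels is a list of rows of grayscale values; x1 and y1 are exclusive.
    """
    rows = [y for y, row in enumerate(pixels) if any(p != white for p in row)]
    if not rows:
        return None  # blank page: nothing to crop to
    width = len(pixels[0])
    cols = [x for x in range(width) if any(row[x] != white for row in pixels)]
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


def split_with_overlap(bbox, overlap_percent=2.0):
    """Split a box into top and bottom halves that share a thin strip.

    Assumption: the shared strip is overlap_percent of the box height,
    centred on the midpoint, so each half extends past it by half the strip.
    """
    x0, y0, x1, y1 = bbox
    mid = (y0 + y1) / 2
    pad = (y1 - y0) * overlap_percent / 100 / 2
    top = (x0, y0, x1, mid + pad)
    bottom = (x0, mid - pad, x1, y1)
    return top, bottom


if __name__ == "__main__":
    page = [
        [255, 255, 255, 255, 255],
        [255,   0, 255, 255, 255],
        [255, 255, 255,  10, 255],
        [255, 255, 255, 255, 255],
    ]
    print(content_bbox(page))
    print(split_with_overlap((0, 0, 100, 200)))
```

The overlap means a line of text that straddles the midpoint appears on both output pages instead of being cut in half.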
Source code for soPdf is available from Google Code.
http://sopdf.googlecode.com

To compile the source code you will need Visual Studio 8.0 (even the free edition will work). Visual Studio is not required if you just want to run the soPdf tool. If you have issues running the binary, make sure you have the VC runtime library installed; you can download it from the Microsoft website.

Coming soon | Update 0.1 Rev 12 | Update 0.1 Rev 10 | Update 0.1 Rev 7
[pdf] ebooktestin.pdf (867.6 KB, 7723 views)
[pdf] ebooktestout.pdf (904.2 KB, 7820 views)
[pdf] ebooktestreverseout.pdf (904.2 KB, 4985 views)
[zip] soPdf.zip (895.1 KB, 19165 views)

#2  godel10 11-16-2008, 07:42 AM
Thanks for your effort.

I am not a Windows user, so I wonder if anyone could upload an example of an input file and an output file.

#3  ProDigit 11-16-2008, 09:54 AM
I'd suggest you try recoding the PRS-505's manual again; it seems kind of buggy!

#4  ddavtian 11-16-2008, 11:55 AM
Does this mean I need VC runtime (have no idea what it means)?


Error: .\mupdf\pdf_xref.c(459) : pdf_loadindirect() - cannot load indirect object 1586
Error: .\mupdf\pdf_xref.c(442) : pdf_loadobject() - cannot load object 1586 into cache
Error: .\mupdf\pdf_xref.c(416) : pdf_cacheobject() - found object 1636 0 instead of 1586 0

#5  theguru 11-16-2008, 01:09 PM
Quote ProDigit
I'd suggest you try recoding the PRS-505's manual again; it seems kind of buggy!
This bug has been fixed.

#6  theguru 11-16-2008, 01:12 PM
Quote ddavtian
Does this mean I need VC runtime (have no idea what it means)?


Error: .\mupdf\pdf_xref.c(459) : pdf_loadindirect() - cannot load indirect object 1586
Error: .\mupdf\pdf_xref.c(442) : pdf_loadobject() - cannot load object 1586 into cache
Error: .\mupdf\pdf_xref.c(416) : pdf_cacheobject() - found object 1636 0 instead of 1586 0
It means that there is an error in your PDF file. Check whether the file can be loaded by the SumatraPDF viewer. If SumatraPDF cannot handle the file, then soPdf cannot handle it either.
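For the curious, the last error line says that the position where object 1586 was expected actually held the header of object 1636. In a PDF, an indirect object starts with a header of the form "number generation obj". The Python sketch below shows that kind of check in isolation; it is not mupdf's code, `check_object_at` is a hypothetical helper, and the offset is assumed to come from the file's cross-reference table.

```python
import re

# An indirect object begins with "<number> <generation> obj".
_OBJ_HEADER = re.compile(rb"\s*(\d+)\s+(\d+)\s+obj")


def check_object_at(data, offset, expected_num, expected_gen=0):
    """Return None if the expected object header is at offset, else a message."""
    m = _OBJ_HEADER.match(data, offset)
    if m is None:
        return "no object header at offset %d" % offset
    found = (int(m.group(1)), int(m.group(2)))
    if found != (expected_num, expected_gen):
        return "found object %d %d instead of %d %d" % (
            found + (expected_num, expected_gen))
    return None


if __name__ == "__main__":
    # Tiny made-up fragment: the object header starts at byte offset 9.
    data = b"%PDF-1.4\n1636 0 obj\n<< >>\nendobj\n"
    print(check_object_at(data, 9, 1586))
```

A mismatch like this usually means the cross-reference table in the file is damaged or stale, which is why trying the file in another viewer is a quick test.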

#7  =X= 11-16-2008, 03:52 PM
Quite an excellent app. This tool provides the feature I have been sorely looking for. I have some scripts that remove the margins, but none achieved this level of success. I have a feeling this tool will become my new favorite PDF tool.

This tool does struggle with the more complicated PDFs, but for those there are PDFLRF/PDFRead/PaperCrop.

Thanks.


One recommendation: since the tool is written in C++, there is no reason to tie it to one platform. There is a surprisingly large number of users on this board who use Linux/Mac OS X.


Thank you,
=X=

#8  theguru 11-16-2008, 05:22 PM
I am working on fixing the bugs for the complicated PDFs. And yes, it can easily be ported to any platform. There is no platform-specific code, and since the source is available, anyone interested in creating a port for Linux/Mac is welcome to do so.

#9  ProDigit 11-17-2008, 11:36 AM
So far I've only managed to get PDF to PDF working here.
So how do I convert it to LRF, or do you suggest keeping those documents in PDF?

(BTW thank you for the program, I've only had a brief look at it)

