I really liked the pdflrf tool from the "Yet another PDF to LRF converter" thread, but the moderator took it down for violating the GPL, and it has stayed down for quite some time because the author does not seem interested in providing the source for his tool. pdflrf also has some issues of its own:
- pdflrf renders the pdf into images and then creates the lrf file. This can make a 4mb pdf file grow into a file of more than 40mb.
- No text information is preserved because of the image conversion
- Very slow
- No source for the tool <-- biggest disadvantage
So I decided to write a tool for myself. soPdf is a pdf formatter for the Sony Reader. It is based on sumatrapdf's version of mupdf and fitz.
The advantages of soPdf over pdflrf
- Pdf to Pdf conversion
- Text and other contents of pdf are preserved
- Size of the output file is very close to size of input file and in some cases smaller than input file.
- Super fast conversion compared to pdflrf.
- Source available to make further changes !!!!!! <-- biggest advantage
The disadvantages over pdflrf
- Cannot yet convert comic books. It can still split image pdfs into two.
- soPdf is in alpha stage. (ver 0.1). There may be lots of bugs to be found yet. At least all of the mupdf bugs.
- ???
soPdf command line options
about: soPdf
    author: Navin Pai, soPdf ver 0.1 alpha
usage: soPdf -i file_name [options]
    -i file_name    input file name
    -p password     password for input file
    -o file_name    output file name
    -w              turn off white space cropping
                    default is on
    -m nn           mode of operation
                    0 = fit 2xWidth *
                    1 = fit 2xHeight
                    2 = fit Width
                    3 = fit Height
                    4 = smart fit Width (not yet implemented)
                    5 = smart fit Height (not yet implemented)
    -v nn           overlap percentage
                    nn = 2 percent overlap *
    -t title        set the file title
    -a author       set the file author
    -b publisher    set the publisher
    -c category     set the category
    -s subject      set the subject
    -e              proceed with errors
    -r              reverse landscape
    * = default values
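For anyone who wants to script batch conversions, here is a minimal Python sketch that assembles a soPdf command line from the options listed above. Only the flags (-i, -o, -m, -v, -t, -a, -e, -r) come from the usage text; the function name, parameter names and the "soPdf" executable name are my own assumptions for illustration.

```python
from typing import List, Optional


def build_sopdf_args(
    input_file: str,
    output_file: Optional[str] = None,
    mode: Optional[int] = None,
    overlap_percent: Optional[int] = None,
    title: Optional[str] = None,
    author: Optional[str] = None,
    proceed_with_errors: bool = False,
    reverse_landscape: bool = False,
    executable: str = "soPdf",  # assumed name of the binary on your PATH
) -> List[str]:
    """Build an argument list using the flags from the soPdf usage text."""
    # The usage text lists modes 0 to 5 (4 and 5 are not yet implemented).
    if mode is not None and mode not in range(6):
        raise ValueError("mode must be between 0 and 5")

    args = [executable, "-i", input_file]
    if output_file is not None:
        args += ["-o", output_file]
    if mode is not None:
        args += ["-m", str(mode)]
    if overlap_percent is not None:
        args += ["-v", str(overlap_percent)]
    if title is not None:
        args += ["-t", title]
    if author is not None:
        args += ["-a", author]
    if proceed_with_errors:
        args.append("-e")
    if reverse_landscape:
        args.append("-r")
    return args
```

The resulting list can be passed to subprocess.run(args, check=True). Passing a list instead of a single string avoids quoting problems with file names that contain spaces.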
The conversion algorithm is as follows
- If user specified Fit2xWidth or Fit2xHeight then simply make two copies of pdf page from source into destination pdf file.
- Render the page and get the actual boundary box that encompasses all of the content in the page. This step removes all the white space border of the page.
- If page cannot be rendered by mupdf and error option is specified then split the page w/o rendering by setting the MediaBox of the page.
- Try to split the file first by iterating all the elements that can fit in half a page and if that does not work then split the file half way with 2% overlap (this can be changed).
- If FitWidth or Fit2xWidth is specified then rotate the page by -90 deg.
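To make the cropping and the half-way split in the steps above a bit more concrete, here is a rough Python sketch. It is not the soPdf source. It assumes rectangles are (x0, y0, x1, y1) tuples, it uses the union of per-element boxes as a stand-in for the boundary box that soPdf gets by rendering the page, and it assumes the overlap is a band centred on the middle of the box whose height is the given percentage of the box height. All names are made up for illustration.

```python
from typing import Iterable, Tuple

# Assumed rectangle layout: (x0, y0, x1, y1) with x0 <= x1 and y0 <= y1.
Rect = Tuple[float, float, float, float]


def content_bbox(element_boxes: Iterable[Rect]) -> Rect:
    """Smallest rectangle containing every element box (drops the white border)."""
    boxes = list(element_boxes)
    if not boxes:
        raise ValueError("page has no content boxes")
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def split_in_half(box: Rect, overlap_percent: float = 2.0) -> Tuple[Rect, Rect]:
    """Split a box half way along y so that both halves share a small band."""
    x0, y0, x1, y1 = box
    height = y1 - y0
    # Assumption: the shared band is overlap_percent of the full box height.
    band = height * overlap_percent / 100.0
    middle = y0 + height / 2.0
    low_half = (x0, y0, x1, middle + band / 2.0)
    high_half = (x0, middle - band / 2.0, x1, y1)
    return low_half, high_half
```

For example, a 100 x 200 box with the default 2 percent overlap gives two halves that share a band 4 units high around y = 100. The smarter first attempt from the steps above (fitting whole elements into half a page) is not covered by this sketch.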
Source code for soPdf is available from google code.
http://sopdf.googlecode.com
To compile the source code you will need Visual Studio 8.0 (even the free edition will work). Visual Studio is not required if you just want to run the soPdf tool. If you are having issues running the binary then make sure you have the VC runtime library. You can download the VC runtime library from the Microsoft website.
Coming soon
- Output to image pdf - for complex pdf that renders slowly on the reader devices.
Update 0.1 Rev 12
- Added reverse landscape mode. Ever wished that you could hold your reader the other way around in landscape mode and scroll through the pages using your right thumb? Use reverse landscape mode and start reading from the last page onwards.
Update 0.1 Rev 10
- Proceed with error option. With this option, soPdf can now process any pdf file, even the ones mupdf cannot handle. If mupdf cannot load the contents then it simply splits the page into two w/o any processing. The disadvantage is that the white space border in this case is not removed but you can still get a pdf output file.
- Set subject of the pdf file option
- Fixed stack overflow when processing complex pdf files
- Better clipping algorithm
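The proceed with error behaviour from the Rev 10 notes above boils down to a simple fallback. Here is a small Python sketch of that decision, not the actual soPdf code: render_bbox is a hypothetical callable standing in for the mupdf rendering step, and the function and parameter names are assumptions.

```python
from typing import Callable, Tuple

# Assumed rectangle layout: (x0, y0, x1, y1).
Rect = Tuple[float, float, float, float]


def box_to_split(
    media_box: Rect,
    render_bbox: Callable[[], Rect],  # hypothetical stand-in for the render step
    proceed_with_errors: bool,
) -> Tuple[Rect, bool]:
    """Return the box to split and whether the white border was removed."""
    try:
        # Normal path: rendering succeeds and yields the cropped content box.
        return render_bbox(), True
    except Exception:
        if not proceed_with_errors:
            # Without the -e option the error is passed on to the caller.
            raise
        # With -e: fall back to the uncropped MediaBox, so the page can still
        # be split into two but keeps its white space border.
        return media_box, False
```

The second value in the result tells the caller whether cropping happened, which matches the disadvantage mentioned above: in the fallback case you still get an output file, just with the border left in.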
Update 0.1 Rev 7
- Work around a mupdf bug where it is not able to allocate oid and gid numbers. This prevented some of the files from being split properly.
Thanks for your effort.
I am not a Windows user, so I wonder if anyone could upload an example of an input file and an output file.
I'd suggest you try recoding the prs-505's manual again, it seems kind of buggy!
Does this mean I need VC runtime (have no idea what it means)?
Error: .\mupdf\pdf_xref.c(459) : pdf_loadindirect() - cannot load indirect object 1586
Error: .\mupdf\pdf_xref.c(442) : pdf_loadobject() - cannot load object 1586 into cache
Error: .\mupdf\pdf_xref.c(416) : pdf_cacheobject() - found object 1636 0 instead of 1586 0
Quote ProDigit
I'd suggest you try recoding the prs-505's manual again, it seems kind of buggy!
This bug has been fixed.
Quote ddavtian
Does this mean I need VC runtime (have no idea what it means)?
Error: .\mupdf\pdf_xref.c(459) : pdf_loadindirect() - cannot load indirect object 1586
Error: .\mupdf\pdf_xref.c(442) : pdf_loadobject() - cannot load object 1586 into cache
Error: .\mupdf\pdf_xref.c(416) : pdf_cacheobject() - found object 1636 0 instead of 1586 0
It means that there is an error in your pdf file. Check whether the pdf file can be loaded by the sumatrapdf viewer. If sumatrapdf cannot handle the file, then soPdf cannot handle it either.
Quite an excellent app. This tool provides the feature I have been sorely looking for. I have some scripts that remove the margins, but none provided this level of success. I have a feeling this tool will become my new favorite PDF tool.
This tool does struggle with the more complicated PDFs, but for those there are PDFLRF/PDFRead/PaperCrop.
Thanks.
One recommendation: since the tool is written in C++, there is no reason to tie it to one platform. There is a surprisingly large number of users on this board who use Linux/Mac OSX.
Thank you,
=X=
I am working on fixing the bugs for the complicated pdfs. And yes, it can be easily ported to any platform. There is no platform specific stuff in the code, and since the source is available, anyone who is interested in creating a port for Linux/Mac is welcome to do so.
So far I've only managed to get PDF to PDF working here.
So how do I convert it to LRF, or do you suggest keeping those documents in PDF?
(BTW thank you for the program, I've only had a brief look at it)