Mobileread
Tools to test pdf compatibility and speed?
#1  MarjaE 05-05-2019, 03:01 PM
I variously use k2pdfopt with -mode copy and -dev dx, and Ghostscript with -dCompatibilityLevel=1.4, to pre-process PDFs for my Kindle DX.

I sometimes use k2pdfopt with -mode copy, and sometimes also -dev dx, and then OCRmyPDF with -f --output-type pdfa-1 to add readable text to PDFs that have none or have corrupt text.

I occasionally use PDF to EPUB+, but it can screw up formatting, and it is a hassle if I need to cite page numbers.

I used to use Ghostscript with more aggressive options, but it would lose pages.

These tools *should* cover compatibility with the Kindle Dx, but Ghostscript sometimes yields very fast pdfs... and sometimes very slow ones that take several minutes to turn a page... forcing me to turn to k2 after all.

Is there any easy way to check for compatibility issues, missing or corrupt text, malformed pages, and/or speed?
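For the compatibility part of the question, one rough check that needs no extra tools is to read the version declared in the PDF header and flag anything newer than the 1.4 level used above. This is only a sketch: the function names are made up for illustration, and it looks at the header line alone, so it says nothing about corrupt text, malformed pages, or speed.

```shell
#!/bin/bash
# Print the PDF version declared in a file's header, e.g. "1.4".
# Only the header line is read; a /Version entry in the document
# catalog (which can override the header) is NOT checked.
pdf_version() {
    head -c 16 "$1" | LC_ALL=C sed -n '1s/^%PDF-\([0-9]\.[0-9]\).*/\1/p'
}

# Succeed if the header version is 1.4 or lower.
is_pdf14_or_lower() {
    local v major minor
    v=$(pdf_version "$1")
    [ -n "$v" ] || return 1          # no recognizable header
    major=${v%%.*}
    minor=${v##*.}
    [ "$major" -eq 1 ] && [ "$minor" -le 4 ]
}

# Print every argument whose header claims a newer version (or none).
check_versions() {
    local f
    for f in "$@"; do
        is_pdf14_or_lower "$f" || echo "check: $f"
    done
}

# Usage: check_versions *.pdf
```

Files that print nothing already declare 1.4 or lower, so only the flagged ones would need another pass through Ghostscript.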

#2  Doitsu 05-05-2019, 06:00 PM
Quote MarjaE
Is there any easy way to check for compatibility issues, missing or corrupt text, malformed pages, and/or speed?
The free community edition of cpdf will automatically fix some errors if you simply use the following command:

Code
cpdf in.pdf -o out.pdf
You can also use the linearization option to convert your PDF files to web optimized PDF files:

Code
cpdf -l in.pdf -o out.pdf
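Either command can be wrapped in a loop to handle several files in one go. A minimal sketch, assuming a cpdf executable is on the PATH; the fix_all name and the -fixed.pdf suffix are made up for illustration.

```shell
#!/bin/bash
# Run the repair command over every file passed as an argument.
# Assumes `cpdf` is on the PATH; an absolute path may be needed in
# environments with a minimal PATH.
fix_all() {
    local f out
    for f in "$@"; do
        out="${f%.pdf}-fixed.pdf"      # hypothetical naming scheme
        if cpdf "$f" -o "$out"; then
            echo "ok: $out"
        else
            echo "failed: $f" >&2
        fi
    done
}

# Usage: fix_all *.pdf
```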

#3  MarjaE 05-05-2019, 09:07 PM
Thank you, but without a testing tool, I'd have to run every file through yet another app.

By default cpdf is a command-line tool, and depending on how it refers to cpdflin, it may not work in Apple's Automator. I know OCRmyPDF does not work in Apple's Automator, because of how it refers to tesseract.

Even with drag-and-drop, it would be an issue for my repetitive stress injuries. Without drag-and-drop, it isn't suitable for more than a few files a week.

And the price is well out of my range.

#4  willus 05-08-2019, 09:00 PM
Quote MarjaE
These tools *should* cover compatibility with the Kindle Dx, but Ghostscript sometimes yields very fast pdfs... and sometimes very slow ones that take several minutes to turn a page... forcing me to turn to k2 after all.

Is there any easy way to check for compatibility issues, missing or corrupt text, malformed pages, and/or speed?
In my experience, the most common kind of "slow" PDF is one whose page images use a highly compressed bitmap format such as JPX (JPEG 2000). A lot of archived books use this. Running k2pdfopt with the -i option will show you the internals of the PDF. Do you have a couple of examples of the slow ones?
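As a rough cross-check on the JPX theory, one can also search the raw file for the /JPXDecode filter name, which is how PDF marks JPEG 2000 image streams. The sketch below is a plain byte search, not a PDF parse and not what k2pdfopt -i does; the function names are made up for illustration.

```shell
#!/bin/bash
# Count how many times the JPEG 2000 filter name appears in a PDF.
# This is a raw byte search, so treat the number as a hint only.
count_jpx() {
    { LC_ALL=C grep -a -o '/JPXDecode' "$1" || true; } | wc -l | tr -d ' '
}

# List the arguments that appear to contain JPX images.
list_jpx() {
    local f n
    for f in "$@"; do
        n=$(count_jpx "$f")
        [ "$n" -gt 0 ] && echo "$f: $n JPX image stream(s)"
    done
    return 0
}

# Usage: list_jpx *.pdf
```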

#5  MarjaE 05-09-2019, 07:19 PM
No, these are ones I'd bought from Bundle of Holding.

I already converted jpx to Kindle-readable formats and removed passwords, etc. It may be an issue with technically-readable but oversized images. Unfortunately trying to compress or remove images occasionally leads to mis-scaled pages, with only the lower left corner showing, missing pages, etc.

I use the following /bin/bash shell script in Automator on the Mac, passing input as arguments:

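for f in "$@"
do
    # Name the output after the input, e.g. book.pdf -> book-converted.pdf
    suffix="-converted.pdf"
    base=$(basename "$f" .pdf)
    outputfile="$base$suffix"
    # Rewrite as PDF 1.4; PostScript stdout is redirected to stderr
    /usr/local/bin/gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -sstdout=%stderr -dNOPAUSE -dQUIET -dBATCH -sOutputFile="$outputfile" "$f"
done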

I know the syntax is different in the regular terminal.

#6  MarjaE 05-13-2019, 12:35 AM
Apparently I can use -dFastWebView to linearize within Ghostscript. I haven't tested the results on the Kindle yet.
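A sketch of what that might look like, together with a quick check that the output really came out linearized. The check relies on a linearized PDF carrying a /Linearized entry in its first object, near the start of the file. The function names and the GS override variable are made up for illustration.

```shell
#!/bin/bash
# Rewrite a PDF with Ghostscript, asking pdfwrite for linearized output.
# GS is a hypothetical override for the gs path (e.g. /usr/local/bin/gs).
linearize() {
    "${GS:-gs}" -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dFastWebView=true \
        -dNOPAUSE -dQUIET -dBATCH -sOutputFile="$2" "$1"
}

# Succeed if the first 1024 bytes mention /Linearized.
# This is a byte search, not a full validation of the linearization.
is_linearized() {
    head -c 1024 "$1" | LC_ALL=C grep -a -q '/Linearized'
}

# Usage:
#   linearize in.pdf out.pdf && is_linearized out.pdf && echo "linearized"
```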

#7  MarjaE 05-25-2019, 02:01 PM
It didn't help. I think it's an issue with excessive images, even when they aren't jpx. But I get erratic results from using GS compression options to deal with the images.

#8  MarjaE 06-10-2019, 03:32 PM
I got what looks like an interesting fix here:

https://softwarerecs.stackexchange.com/questions/61671/tools-to-test-loading-rendering-speed-for-different-versions-of-same-pdf

I can't use it because I can't see the code window in wxdemo, because of the blinking cursor. But maybe some of you can use it.

#9  Doitsu 06-11-2019, 03:55 AM
Quote MarjaE
I got what looks like an interesting fix here:

https://softwarerecs.stackexchange.com/questions/61671/tools-to-test-loading-rendering-speed-for-different-versions-of-same-pdf

I can't use it because I can't see the code window in wxdemo, because of the blinking cursor. But maybe some of you can use it.
Did you install MuPDF via Homebrew? Otherwise the tool wouldn't work anyway, since PyMuPDF expects MuPDF to be installed on your system. You'll probably also need to install wxPython via Homebrew.

IMHO, the tool would be of limited use anyway, unless most of your pdf files have similar page counts, because all it does is log the time it takes to load a .pdf file. Obviously, it'll take longer to load a 200 page document than a 50 page document.
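One way around the page-count problem would be to divide the measured time by the number of pages, so that documents of different lengths can be compared. A minimal sketch with a made-up helper name; the numbers in the usage line are invented purely to show the arithmetic.

```shell
#!/bin/bash
# Integer milliseconds per page. Both arguments are plain integers:
# total load time in milliseconds, then page count.
ms_per_page() {
    local total_ms=$1 pages=$2
    [ "$pages" -gt 0 ] || return 1     # avoid division by zero
    echo $(( total_ms / pages ))
}

# Usage (invented figures): a 200-page file that loads in 4000 ms
# and a 50-page file that loads in 1500 ms.
#   ms_per_page 4000 200
#   ms_per_page 1500 50
```

With those invented figures the longer document takes more time overall but less time per page, which is the comparison the raw load time hides.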

#10  MarjaE 06-13-2019, 04:37 PM
Yes, I have mupdf. I got stuck at the wxdemo code window. I probably should uninstall both versions of wxpython if I can't use them.

At this point I am thinking of using ghostscript with:

-K262144 so memory and speed limitations show up. I think this is the total memory of the Kindle DX (256 MB x 1024 KB/MB; Ghostscript's -K takes its value in kilobytes, not bytes).

-sPageList=1-10 so pdf length isn't too much of a problem.

and something to measure and output the rendering time, or attach it to the file name.
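Put together, the plan above might look like the sketch below. It is untested on real files and rests on several assumptions: Ghostscript's -K value is counted in kilobytes, so 256 MB is written as -K262144; the pgmraw device at 150 dpi is only a stand-in for the Kindle's renderer; and the function names and the GS override variable are made up for illustration.

```shell
#!/bin/bash
# Render the first ten pages of a PDF with Ghostscript and print the
# elapsed whole seconds. The rendered output is discarded.
# GS is a hypothetical override for the gs path (e.g. /usr/local/bin/gs).
time_render() {
    local start=$SECONDS
    "${GS:-gs}" -K262144 -sPageList=1-10 -sDEVICE=pgmraw -r150 \
        -dNOPAUSE -dQUIET -dBATCH -sOutputFile=/dev/null "$1" >/dev/null 2>&1 \
        || echo "gs failed on $1" >&2
    echo $(( SECONDS - start ))
}

# Build a file name that carries the timing,
# e.g. book.pdf and 12 give book-12s.pdf
timed_name() {
    local base
    base=$(basename "$1" .pdf)
    echo "$base-${2}s.pdf"
}

# Usage:
#   secs=$(time_render book.pdf)
#   cp book.pdf "$(timed_name book.pdf "$secs")"
```

Timings from a desktop machine will be far lower than on the Kindle itself, so they are probably only useful for comparing different versions of the same file against each other.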
