Mobileread
Is there a way to detect buggy pdfs without manually checking each pdf?
#1  MarjaE 03-27-2020, 03:24 PM
Some pdfs have corrupt text encoding to begin with. I have to pre-process pdfs for my Kindle. Some pdfs end up with corrupt text encoding after pre-processing in Ghostscript.

If I select text from these pdfs, I get either gibberish, or blank spaces punctuated with ... well, occasional punctuation.

I usually find this out by trying to search in a pdf, or by selecting text in a pdf. Is there an easy way to detect pdfs with malformed or missing text, without manually opening and selecting passages from each pdf?
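
One rough way to automate the check described above is to dump each pdf's text layer and measure how much of it looks like ordinary words. The sketch below is only a heuristic, not a definitive test. It assumes poppler's `pdftotext` command is installed and on the PATH, and the function names and thresholds (`min_chars=200`, `min_ratio=0.5`) are made-up starting points, tuned for English-like text, that would need adjusting per collection.

```python
import string
import subprocess
import sys


def extract_text(pdf_path):
    """Return the text layer of a PDF via poppler's pdftotext (assumed installed)."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],  # "-" sends the extracted text to stdout
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def wordlike_ratio(text):
    """Fraction of whitespace-separated tokens that look like ordinary words."""
    tokens = text.split()
    if not tokens:
        return 0.0
    good = 0
    for token in tokens:
        core = token.strip(string.punctuation)
        # Assumption: a "word" is 2+ letters, all alphabetic, with at least one vowel.
        if len(core) >= 2 and core.isalpha() and any(c in "aeiouyAEIOUY" for c in core):
            good += 1
    return good / len(tokens)


def looks_suspicious(text, min_chars=200, min_ratio=0.5):
    """Flag text that is nearly empty or mostly non-word tokens (thresholds are guesses)."""
    if len(text.strip()) < min_chars:
        return True
    return wordlike_ratio(text) < min_ratio


if __name__ == "__main__":
    for path in sys.argv[1:]:
        try:
            text = extract_text(path)
        except (OSError, subprocess.CalledProcessError) as exc:
            print(f"ERROR  {path}: {exc}")
            continue
        status = "CHECK" if looks_suspicious(text) else "ok"
        print(f"{status:5}  {path}  wordlike={wordlike_ratio(text):.2f}")
```

Saved under a hypothetical name such as `check_pdfs.py`, it could be run as `python check_pdfs.py *.pdf`, and only the files marked CHECK would need opening by hand. Scanned pdfs with no text layer at all will also be flagged, since they extract to almost nothing.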

#2  Quoth 03-27-2020, 05:20 PM
No.
This is why I avoid them and only have ones needed for documentation and read them on a 10" Tablet.

Life is too short. If I was immortal I might run OCR on the images and proof them.

Once or twice I've fixed up PDFs of rarer old books that are totally unavailable otherwise, so that I could use them on a 7" eink.

#3  MarjaE 03-27-2020, 05:58 PM
I don't get to choose what formats other people publish in, so I can't avoid pdf.

#4  Quoth 03-28-2020, 06:18 AM
Quote MarjaE
I don't get to choose what formats other people publish in, so I can't avoid pdf.
Then use a 10" or better tablet for those. A decent one is cheaper than many 6" eink and can be close to half the price of an 8" eink.

#5  MarjaE 03-28-2020, 01:47 PM
Quote Quoth
Then use a 10" or better tablet for those. A decent one is cheaper than many 6" eink and can be close to half the price of an 8" eink.
Do you know of any 10" or better non-touchscreen tablet with an e-ink screen, button-based controls, and preferably a keyboard?

I ask because I have coordination problems and can't use touch devices, and I have visual processing problems and can't see very bright screens. I also get migraines from flashing, zooming, and similar animation.

On my computer, I pick software based on my ability to avoid too much animation, avoid blinking cursors, etc. I use Firefox and Waterfox with about:config hacks, user css, and add-ons to try to block as much problematic animation as possible, but still struggle. I use a Benq flicker-free monitor at 0% brightness, 30% contrast, 10% red, 20% green, 10% blue, because standard brightness ranges are too bright, red light is most likely to trigger seizures and migraines, and blue light is often said to be most likely to trigger eye strain.

Even compared with these extreme settings, e-ink is easier for me than glowing screens. Even with the occasional flashes during page reload, as long as I'm not flipping through things too quickly, it's still easier for me than glowing screens.

#6  MarjaE 03-29-2020, 11:45 PM
The old Iriver can load many of the Kindle-unreadable pdfs, although it can't display jpx images in them. Librerator can as well. Kpv is supposed to be better, but I don't have the coordination to run it.

k2pdfopt is still a good way to convert scanned pdfs. Ghostscript lets me convert jpx images, and if it weren't for the trade-off with losing text, I'd just keep using it.
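
The trade-off above (Ghostscript converts the jpx images but sometimes loses the text) could at least be caught automatically by comparing the text layer before and after conversion. This is a minimal sketch under stated assumptions: the two strings have already been extracted with some tool such as `pdftotext`, the helper names are hypothetical, and the word pattern only covers unaccented Latin letters.

```python
import re


def vocabulary(text):
    """Set of lowercase words of 3+ unaccented Latin letters found in the text."""
    return set(re.findall(r"[a-z]{3,}", text.lower()))


def survival_rate(original_text, processed_text):
    """Fraction of the original's distinct words that still appear after processing."""
    before = vocabulary(original_text)
    if not before:
        return None  # the original had no usable text layer, so nothing to compare
    after = vocabulary(processed_text)
    return len(before & after) / len(before)
```

A result near 1.0 suggests the text survived the conversion, a result near 0.0 suggests it was mangled or dropped, and `None` means the source pdf had no usable text to start with. Where to draw the line between those is a judgment call.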
