Mobileread
Tool to OCR an "image" PDF → add text as extra layer?
#1  Shohreh 11-10-2020, 04:07 PM
Hello,

Is there a tool that can…
1. OCR an "image" PDF, and
2. Include the text output as an additional layer in the PDF, so that the user can search it, and possibly select, copy, and paste text elsewhere, as if it were a "text" PDF?

Thank you.

#2  Doitsu 11-10-2020, 05:42 PM
Quote Shohreh
Is there a tool that can…
1. OCR an "image" PDF, and
2. Include the text output as an additional layer in the PDF, so that the user can search it, and possibly select, copy, and paste text elsewhere, as if it were a "text" PDF?
Besides Adobe Acrobat, pretty much any commercial OCR tool, e.g. ABBYY FineReader, can do this.
There are also a couple of free Linux tools that can do this, e.g. pdfsandwich, but most of them are neither easy to install nor exactly user-friendly.

#3  Shohreh 11-10-2020, 07:44 PM
Thanks for the info.

I tried a couple of open-source apps (NAPS2 and OCRmyPDF), and the output is pretty good.
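
For readers who want to try the OCRmyPDF route from a terminal, here is a minimal sketch. It assumes `ocrmypdf` is installed and on the PATH; the file names, the `ocr_pdf` helper, and the `DRY_RUN` switch are made up for this example and are not part of OCRmyPDF or of anything posted in this thread.

```shell
#!/bin/sh
# Sketch: add a searchable text layer to a scanned PDF with OCRmyPDF.
# ocr_pdf and DRY_RUN are hypothetical names used only in this example.

ocr_pdf() {
    src="$1"            # scanned ("image") PDF
    dst="$2"            # output PDF with the added text layer
    lang="${3:-eng}"    # Tesseract language code, defaults to English

    # -l picks the OCR language; --skip-text leaves pages that already
    # contain text untouched instead of aborting on them.
    set -- ocrmypdf -l "$lang" --skip-text "$src" "$dst"

    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"       # only print the command that would be run
        return 0
    fi
    "$@"
}

# Usage: ocr_pdf scan.pdf scan_ocr.pdf
```

Setting `DRY_RUN=1` prints the command without running it, which is handy for checking the arguments before processing a large scan.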

#4  willus 11-14-2020, 10:23 AM
Thanks for the tips on NAPS2 and OCRmyPDF; both look like great utilities. k2pdfopt can also do this, and it uses Tesseract as well:

k2pdfopt -mode copy -n- -ocr t file.pdf
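
To run the command above over several files, a small wrapper along these lines might help. This is only a sketch: it assumes `k2pdfopt` is on the PATH, reuses the options exactly as posted, and the `ocr_all` function and `DRY_RUN` switch are hypothetical names for this example. k2pdfopt may still prompt for input depending on its settings, so check its usage text before relying on this unattended.

```shell
#!/bin/sh
# Sketch: apply the k2pdfopt command from this post to each PDF given.
# ocr_all and DRY_RUN are hypothetical; they are not k2pdfopt features.

ocr_all() {
    for f in "$@"; do
        if [ -n "${DRY_RUN:-}" ]; then
            # Only show what would be run for this file.
            echo "k2pdfopt -mode copy -n- -ocr t $f"
        else
            k2pdfopt -mode copy -n- -ocr t "$f"
        fi
    done
}

# Usage: ocr_all *.pdf
```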

#5  charsee 12-15-2020, 11:23 AM
Quote willus
k2pdfopt -mode copy -n- -ocr t file.pdf
Do these commands go in the "Additional options" box?

#6  willus 12-19-2020, 01:47 PM
Quote charsee
Do these commands go in the "Additional options" box?
With the MS Windows GUI, you can set them as shown in the attached screenshot. The OCR option automatically turns off native mode.
screenshot.png 
