Mobileread
"Digitized by Google"
#1  jgray 09-07-2020, 02:17 AM
The following information is obviously for use with public domain books.

It seems that books that were scanned by Google are not stamped with a watermark. Instead, each page has an image placed on it. Software that removes watermarks will not remove these images.

I downloaded a sample book to see what I could do about this.

https://ia803203.us.archive.org/16/items/norwood-1699-the-sea-man-s-practice/Norwood%201699%20The_Sea_man_s_Practice.pdf

At the bottom of each page is an image that says "Digitized by Google". After some investigation, I found that this is really just one image object in the PDF, which is referenced again on every page. If we remove this image object, the image should disappear from all pages.
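One way to check the "one object, many references" idea is to count how often an object number is referenced in the uncompressed PDF. This is only a rough sketch: the function name is made up for illustration, it assumes generation number 0, and it treats the file as plain bytes, so stray matches inside binary stream data are possible.

```python
import re

def count_references(pdf_bytes, object_number):
    """Count indirect references of the form 'N 0 R' to the given object number."""
    # (?<!\d) keeps "125 0 R" from being counted as a reference to object 25.
    pattern = re.compile(rb"(?<!\d)" + str(object_number).encode() + rb"\s+0\s+R\b")
    return len(pattern.findall(pdf_bytes))
```

A count close to the number of pages would suggest the object is the per-page stamp. Pages may also reach the image through a shared resources dictionary, in which case the count could be much lower.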

The following may not work for every book scanned by Google, but since they automated the process, I'm assuming that all of their scanned books carry the same image.

1 - Using a free program, "GUIpdftk", uncompress the PDF.

2 - Using Notepad++ (or other text editor that can open large files), search for "/Width 1034", which is the width of the Google image. I found it in two consecutive places.

3 - In the sample PDF, objects 24 and 25 have the specified width, and the binary data (stream / endstream) appears to be exactly the same in both. In other scanned PDF files, the object numbers will probably be different.

NOTE: it seems that object 25 is the one that is used to stamp every page in this PDF. Its removal was sufficient. However, since object 24 looks to have the same image data, I felt it was best to remove it as well.

4 - Using the text editor, delete both objects entirely. This starts with "24 0 obj" and ends with the "endobj", after "25 0 obj". You are removing two similar (but not identical) code blocks from the PDF.

5 - Recompress the edited PDF with GUIpdftk. You should no longer see "Digitized by Google" on any pages.
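Steps 2 to 4 could also be scripted instead of done by hand in a text editor. The following is a rough Python sketch under some assumptions: the function names are hypothetical, the PDF must already be uncompressed, and the regex is naive, so it could misfire if binary stream data happens to contain the text "endobj". As with the manual edit, the file offsets change after deletion, so the result still needs to be passed through pdftk afterwards.

```python
import re

# Matches one indirect object: "N G obj ... endobj" (non-greedy, spanning lines).
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b.*?\bendobj", re.DOTALL)

def find_objects_with_width(pdf_bytes, width):
    """Return the object numbers whose body contains '/Width <width>'."""
    needle = re.compile(rb"/Width\s+" + str(width).encode() + rb"\b")
    return [int(m.group(1)) for m in OBJ_RE.finditer(pdf_bytes)
            if needle.search(m.group(0))]

def remove_objects(pdf_bytes, object_numbers):
    """Delete the listed objects, from 'N G obj' through 'endobj'."""
    targets = set(object_numbers)

    def drop(match):
        # Replace targeted objects with nothing; keep every other object as-is.
        return b"" if int(match.group(1)) in targets else match.group(0)

    return OBJ_RE.sub(drop, pdf_bytes)
```

If the file matches the description above, `find_objects_with_width(data, 1034)` on the uncompressed sample should report objects 24 and 25, and passing that list to `remove_objects` gives the bytes to write back out before recompressing.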

BTW, you can also remove the Google title page with GUIpdftk. Use the "Remove" button. Specifying "2-end" removed the Google title page, leaving the rest of the document. This works on the compressed or uncompressed PDF.
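For anyone who prefers the command line, the uncompress, recompress, and page-removal steps can be sketched with the pdftk command-line tool, assuming that is what GUIpdftk drives behind the scenes. The helper names and file names below are made up for illustration; the functions only build the argument lists, and `run` executes one if pdftk is installed and on the PATH.

```python
import subprocess

def uncompress_cmd(src, dst):
    """pdftk arguments to write an uncompressed, text-editable copy of src."""
    return ["pdftk", src, "output", dst, "uncompress"]

def compress_cmd(src, dst):
    """pdftk arguments to recompress an edited PDF."""
    return ["pdftk", src, "output", dst, "compress"]

def drop_first_page_cmd(src, dst):
    """pdftk arguments to keep pages 2 through the end (drops the title page)."""
    return ["pdftk", src, "cat", "2-end", "output", dst]

def run(cmd):
    """Run a pdftk command and raise CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)
```

For example, `run(drop_first_page_cmd("book.pdf", "book-notitle.pdf"))` would correspond to entering "2-end" in the GUI.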

#2  PoP 09-07-2020, 11:59 AM
I have a 617-page book which I once edited in "Adobe Acrobat Pro", removing these object references page by page. Needless to say, it was tedious and error-prone.

I tried your effortless method on my book, and it worked beautifully.

Thanks for sharing.