Mobileread
Help with PDF conversion
#1  eduard93 10-20-2021, 01:43 PM
Hello.

I'm unable to convert this PDF file to mobi (or any other ebook format).

Tried calibre, the underlying ebook-convert, and online converters; they all skip the text. I can see the text in PDF readers and the PDF passes PDF validation tools (pdfinfo), but opening it in a PDF editor like LibreOffice shows the cover and 1300 empty pages. pdftotext also returns nothing.

Any idea how to convert this file into an ebook?

#2  Tex2002ans 10-20-2021, 06:17 PM
That PDF has 1707 different "fonts".

If you copy/paste the text out, or try to search, you can tell it's all complete gibberish.

They ran it through some sort of program that swaps out all the characters. So on the surface it may LOOK like a "C" + "h" + "a" + "p", but the underlying text is nonsense.
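
A minimal sketch for checking the font count yourself, assuming poppler's `pdffonts` tool is installed and prints its usual layout (a header line, a dashed separator, then one line per font). The function names `count_fonts` and `fonts_in_pdf` are made up for illustration.

```python
import subprocess


def count_fonts(pdffonts_output: str) -> int:
    """Count font rows in the text printed by `pdffonts`.

    Assumes the usual layout: a header line, a dashed separator line,
    then one line per font.
    """
    lines = [ln for ln in pdffonts_output.splitlines() if ln.strip()]
    # Drop the header and separator rows; never go below zero.
    return max(0, len(lines) - 2)


def fonts_in_pdf(path: str) -> int:
    """Run `pdffonts` on a file (the tool must be installed and on PATH)."""
    result = subprocess.run(
        ["pdffonts", path], capture_output=True, text=True, check=True
    )
    return count_fonts(result.stdout)
```

A count in the hundreds or thousands for an ordinary novel is a hint that the text layer has been scrambled per page or per glyph, so plain text extraction is unlikely to work.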

You'll have to rerun that entire PDF through actual OCR.

You can use any OCR program you want.

The accuracy should be quite good, since the text is still vector (you can fully zoom in and it stays perfectly crisp).
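
If you'd rather script it than use a GUI tool, here is a rough sketch of the same idea: render each page to an image, then OCR it. It assumes the third-party packages pdf2image and pytesseract are installed, along with the poppler and tesseract binaries they wrap. This is not what produced the sample below; the helper names `ocr_pages` and `ocr_pdf` are hypothetical.

```python
from typing import Callable, Iterable, List


def ocr_pages(images: Iterable, recognize: Callable[[object], str]) -> List[str]:
    """Run an OCR callable over page images, returning one string per page."""
    return [recognize(img).strip() for img in images]


def ocr_pdf(path: str, dpi: int = 300, lang: str = "eng") -> str:
    """Rasterize every page of a PDF and OCR it.

    Assumption: pdf2image and pytesseract are available. They are imported
    lazily so the pure helper above can be used without them.
    """
    from pdf2image import convert_from_path
    import pytesseract

    # Renders all pages into memory as PIL images at the requested DPI.
    images = convert_from_path(path, dpi=dpi)
    pages = ocr_pages(
        images, lambda img: pytesseract.image_to_string(img, lang=lang)
    )
    # Separate pages with a blank line.
    return "\n\n".join(pages)
```

For a 1300-page file, rendering everything at once can use a lot of memory; `convert_from_path` accepts `first_page` and `last_page`, so processing in batches is probably safer.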

Here's the OCR I got out of Chapter 1 using Finereader 12:

Quote
Chapter 1

The story of an orphan adopted into a wealthy noble family - What a romantic setting, especially for a girl. If it were a novel or television drama she would be the heroine of her own Cinderella story. The reality was nothing like the stories. Real life isn't a novel or a drama.

When my mother died my estranged father, a wealthy businessman, adopted me. For the crime of suddenly appearing in their lives, my two older half-brothers bullied and harassed me from day one. They were cruel. They insulted me and even pulled pranks with my food. My half-brothers' torment became my new normal. Any hope of reprieve at school was guickly dashed. [...]

[...]

"Father! Look! I was accepted! I was accepted!" I nearly shouted withjoy.
Looks like:

but besides that, looks extremely accurate.

Nothing a little elbow grease couldn't fix up.
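
Part of that cleanup can be semi-automated. A minimal sketch, assuming you supply your own word list (the `known_words` argument is a placeholder, not a real dictionary): it flags tokens that are not in the list, which is how slips like "guickly" or "withjoy" in the sample above would surface.

```python
import re
from typing import Iterable, List, Set


def flag_unknown_words(text: str, known_words: Iterable[str]) -> List[str]:
    """Return words from `text` missing from `known_words`, in first-seen order."""
    known: Set[str] = {w.lower() for w in known_words}
    flagged: List[str] = []
    seen: Set[str] = set()
    # Treat runs of letters and apostrophes as candidate words.
    for token in re.findall(r"[A-Za-z']+", text):
        word = token.strip("'").lower()
        if word and word not in known and word not in seen:
            seen.add(word)
            flagged.append(word)
    return flagged
```

It only lists candidates; names and unusual words will show up too, so each hit still needs a human look.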

#3  eduard93 10-21-2021, 06:17 AM
@Tex2002ans thank you.

You're right, looks like OCR is the only way.
