Mobileread
ABBYY Fine Reader
#1  crutledge 12-01-2016, 09:48 AM
Either I have a complete misunderstanding, or the ABBYY people don't understand and answer my queries with references to their manual, which is useless as ......

I wish to control the "mode" (e.g. greyscale or RGB) and "format" (e.g. JPG or PNG) of illustrations.

I find references to these in the "Tools" menu, but no place to select values.

When I'm through editing in Fine Reader, I save the book as HTML.

If anyone can help, he or she will receive my blessings to the seventh generation, and I'm sure HarryT will ensure this gets to the proper forum.

#2  Tex2002ans 12-10-2016, 12:18 AM
Quote crutledge
I wish to control the "mode" (e.g. greyscale or RGB) and "format" (e.g. JPG or PNG) of illustrations.
I don't believe you can have Finereader automatically export the images to a specific image format. (If I recall correctly, it exports Color as JPG and B&W as PNG).

The only way I know of to specify the output image is to Right Click the thumbnail of the page on the left side and press "Save Selected Pages as Images":

show attachment »

then you can select whatever image type you want from the dropdown:

show attachment »

The only thing is this will export the entire page as an image... So you will have to do your image manipulation in an outside program.

#3  crutledge 12-10-2016, 04:05 AM
Thank you, sir.

You describe what I have also found. I was hoping for more control.

#4  famfam 12-29-2018, 07:05 AM
With older books set in letter-spaced type (Sperrschrift), FineReader's OCR often turns the emphasized words into words with spaces between the letters. When you make an EPUB from that, those words can no longer be found by searching. How can the words with spaces be converted automatically into words without spaces?

#5  HarryT 12-29-2018, 08:25 AM
I'm afraid that any OCR process is going to involve manual editing afterwards to get a usable file. OCR is pretty good, but it's far from perfect.

#6  DaleDe 12-29-2018, 11:54 AM
The wiki articles "OCR" and "OCR villains" list some things to watch for. A spell checker is often a good way to find the initial problems in OCR output. As HarryT said, you will need to proofread and manually fix errors.

Dale

#7  HarryT 12-31-2018, 05:33 AM
Quote DaleDe
The wiki articles "OCR" and "OCR villains" list some things to watch for. A spell checker is often a good way to find the initial problems in OCR output. As HarryT said, you will need to proofread and manually fix errors.

Dale
I've just added another entry to the "OCR villains" page which wasn't there, and that's the misinterpretation of the letter pair "cl" as "d", so you end up with "clock" as "dock", "close" as "dose", etc. That's one I've come across a lot.
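For anyone who wants to hunt for this one semi-automatically, here is a rough sketch in Python. It is only an illustration: the function name `cl_candidates` and the tiny vocabulary are made up for the example, and in practice you would load a real word list. It flags lowercase words where swapping a "d" for "cl" gives another known word, so you can check those against the scan by eye.

```python
def cl_candidates(words, vocabulary):
    """Map each suspicious word to the known words obtained by reading a 'd' as 'cl'.

    Both arguments are assumed to be lowercase; `vocabulary` is any set of
    valid words (hypothetical here, e.g. loaded from a dictionary file).
    """
    suspects = {}
    for word in words:
        alternatives = []
        for i, ch in enumerate(word):
            if ch == "d":
                # Try undoing the OCR error at this position only.
                candidate = word[:i] + "cl" + word[i + 1:]
                if candidate in vocabulary:
                    alternatives.append(candidate)
        if alternatives:
            suspects[word] = alternatives
    return suspects


if __name__ == "__main__":
    vocab = {"clock", "dock", "close", "dose", "door"}
    print(cl_candidates(["dock", "dose", "door"], vocab))
```

It only produces a list of candidates. Since "dock" and "dose" are real words too, the decision still has to be made by a human reading the context.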

#8  famfam 01-05-2019, 04:46 AM
@HarryT
@DaleDe

What I meant is this:
After saving the book from FineReader as an EPUB, words that were printed with letter-spacing (Sperrschrift) in the original come out as words with spaces between the letters (for example: W o r t or w o r d). In Sigil, I would now like to use a regex to find every such W o r t or w o r d, and then replace each one with the same word without the spaces.
Maybe someone has an idea how to simplify this with regex?
So the topic concerns FineReader as the problem (the cause of the error), but also Sigil (or regex) as the solution (the correction of the error). Strictly speaking, the thread belongs not only under FineReader but also under Sigil.

#9  Tex2002ans 01-05-2019, 06:37 AM
Quote famfam
After saving the book from FineReader as an EPUB, words that were printed with letter-spacing (Sperrschrift) in the original come out as words with spaces between the letters (for example: W o r t or w o r d). In Sigil, I would now like to use a regex to find every such W o r t or w o r d, and then replace each one with the same word without the spaces.
Maybe someone has an idea how to simplify this with regex?
I doubt there are many legitimate 4+ single characters by themselves:

Search: \b(\w) (\w) (\w) (\w)\b
Replace: \1\2\3\4

That should point you towards all of these spaced out words, so:

Find: a b c d
Replace: abcd

Or maybe you can start out with more \w... like 7 or 8 of them, then work your way down.
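If you want to try the idea outside Sigil first, here is a minimal Python sketch. The first function uses the exact four-letter search and replace from above. The second is my own generalisation (the names `join_spaced` and `min_letters` are made up for this example) that matches any run of at least `min_letters` single characters in one pass, so you don't have to step down from 8 to 4 by hand.

```python
import re

# The four-letter pattern from the post: four single word characters
# separated by single spaces, bounded by word boundaries.
FOUR_LETTERS = re.compile(r"\b(\w) (\w) (\w) (\w)\b")


def join_four(text):
    """Apply the Search/Replace pair above: \\1\\2\\3\\4."""
    return FOUR_LETTERS.sub(r"\1\2\3\4", text)


def join_spaced(text, min_letters=4):
    """Hypothetical generalisation: join runs of min_letters or more
    single characters separated by single spaces."""
    pattern = re.compile(r"\b\w(?: \w){%d,}\b" % (min_letters - 1))
    return pattern.sub(lambda m: m.group(0).replace(" ", ""), text)


if __name__ == "__main__":
    print(join_four("das W o r t hier"))
    print(join_spaced("G r ö ß e und W o r t"))
```

Two caveats. The fixed four-letter version only joins the first four letters of a longer spaced word, so "W o r t e" becomes "Wort e". And neither version can tell where one spaced word ends and the next begins, so two spaced words in a row would be merged into one; those still need checking by hand.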

#10  Doitsu 01-05-2019, 12:06 PM
Quote Tex2002ans
I doubt there are many legitimate 4+ single characters by themselves
Since there are no italic blackletter fonts, German printers had to use increased letter spacing for emphasis, i.e., there might be even longer words.

@famfam I found a suitable regex in a German forum and used it to create a simple throwaway plugin that should automatically remove all unwanted spaces. Please make a backup copy before running this plugin!

Note that if you uncomment the following line in plugin.py by removing the # sign:

Code
#unspaced_word = '<span class="italics">{}</span>'.format(unspaced_word)
the plugin will wrap all replaced words in <span> tags.
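
To illustrate what that optional line does, here is a standalone sketch. It is not the actual plugin.py: the regex, the function name `unspace` and the `wrap` flag are assumptions made for this example, and only the `.format()` line is taken from the plugin.

```python
import re

# Assumed pattern for this sketch: four or more single word characters
# separated by single spaces. The real plugin may use a different regex.
SPACED = re.compile(r"\b\w(?: \w){3,}\b")


def unspace(text, wrap=False):
    """Remove the spaces from letter-spaced words; optionally wrap each
    replaced word in a <span class="italics"> element."""
    def repl(match):
        unspaced_word = match.group(0).replace(" ", "")
        if wrap:
            # This is the line that is commented out in plugin.py.
            unspaced_word = '<span class="italics">{}</span>'.format(unspaced_word)
        return unspaced_word
    return SPACED.sub(repl, text)


if __name__ == "__main__":
    print(unspace("<p>ein W o r t</p>"))
    print(unspace("<p>ein W o r t</p>", wrap=True))
```

With the wrapping enabled, the original emphasis survives as markup, and you can style the `italics` class in the stylesheet however you like.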

And for completeness' sake, here are instructions for Calibre Editor:

BTW, you also might want to post your question in the German MR subforum.
Attachment: Spacer_v0.0.1.zip (1.2 KB)
