KOreader cannot handle certain dictionaries
#1  LittleBiG 11-15-2020, 11:34 AM
I have spotted two problems in how certain dictionaries are handled.
1. When a dictionary contains the same word as several separate headwords (split by meaning), KOreader can show only the last one. Example: if you search for the word "bang", the dictionary can contain 3 headwords: bang1, bang2, bhang. You will NEVER see the first two meanings, because KOreader shows only the last one.
Another dictionary contains 2 headwords for "bang". One is "bang" itself (containing all the meanings of "bang") and the other is the phrase "slap bang". KOreader will show only "slap bang", never the first one.
May I ask for the dictionary handling to be improved so that all headwords are shown in these cases?
2. In some dictionaries the line breaks are disregarded, which makes the text more difficult to read. Probably there is a type of line break which KOreader doesn't interpret as a new line.

#2  Galunid 11-15-2020, 12:23 PM

#3  LittleBiG 11-15-2020, 05:38 PM
It's a pity. Many dictionary files are built this way (on my reader, 5 of 9 dictionaries are affected). Other dictionary programs can handle these "faulty" files by adapting to this common "error": it is better to work around it and show a correct result somehow than to work in a faulty way and say the dictionary file is to blame. I hope that one day somebody who isn't content with this will do something about it. I also wonder whether sdcv is still under development, so that its developer could do something about it, or whether it has been abandoned.

#4  Galunid 11-16-2020, 03:07 AM
It seems somewhat maintained

#5  LittleBiG 11-16-2020, 05:25 AM
sdcv has an open issue about the first one:
However, the issue description says sdcv always returns the first result. In KOreader it is now the last result, not the first one.

And also my second issue is known:

KOreader used to handle the sorting of the dictionaries in a stone age way. Then somebody stepped up and improved it and now it is really comfortable. So

#6  Markismus 11-22-2020, 09:04 AM
@LittleBiG I’ve just created a function to deal with multiple entries for a Duden dictionary optimized for KOreader in my script Pocketbookdic.
Currently, it just prefixes with a superscript Roman numeral the definitions of entries with an identical keyword.
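
For readers who want to see the idea, here is a minimal Python sketch of that kind of numbering. It is not the Pocketbookdic code. The function names, the (headword, definition) tuple layout and the use of an HTML `<sup>` tag are all assumptions made for illustration.

```python
from collections import Counter

# Value/symbol pairs, enough for the small counts expected here (1 to 39).
ROMAN = [(10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]


def to_roman(n):
    """Convert a small positive integer to a Roman numeral."""
    out = []
    for value, symbol in ROMAN:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def number_duplicates(entries):
    """Prefix definitions that share a headword with <sup>I</sup>, <sup>II</sup>, ...

    `entries` is a list of (headword, definition) tuples (hypothetical layout).
    Headwords that occur only once are left untouched.
    """
    totals = Counter(headword for headword, _ in entries)
    seen = Counter()
    result = []
    for headword, definition in entries:
        if totals[headword] > 1:
            seen[headword] += 1
            definition = f"<sup>{to_roman(seen[headword])}</sup> {definition}"
        result.append((headword, definition))
    return result
```

With this, two separate "bang" entries would come out numbered I and II, while a single "slap bang" entry stays unchanged. The numbering only marks the duplicates; whether the reader then displays all of them is a separate matter.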

I’ve no idea why line breaks are disregarded. However, I’ve stumbled over both </br> and <br/>, and it seems plausible that at least one of them is not recognized as a correct line break tag.

If you can provide a link to a dictionary with a relevant entry, I am willing to test it and add a conversion of the tags to the KOreader-optimized part of the script.

#7  Galunid 11-23-2020, 10:27 AM
To my knowledge, </br> is an incorrect tag unless you mean <br></br>, so I think it's reasonable that it's not rendered correctly.

#8  Frenzie 11-24-2020, 08:36 AM
Our version of MuPDF currently requires well-formed XML. Once is finished that'll be more forgiving.
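
As an illustration of the well-formedness point, here is a rough Python sketch that rewrites the line break variants mentioned in this thread to `<br/>` and checks whether a definition then parses as XML. It is not part of KOreader, sdcv or Pocketbookdic. The function names are hypothetical, and wrapping the fragment in a `<div>` for the check is an assumption.

```python
import re
import xml.etree.ElementTree as ET

# "<br></br>" pairs are collapsed first so they become a single break.
_BR_PAIR = re.compile(r"<br\s*>\s*</br\s*>", re.IGNORECASE)
# Any remaining variant: <br>, <br/>, <br /> or a stray </br>.
_BR_ANY = re.compile(r"</?br\s*/?>", re.IGNORECASE)


def normalize_breaks(fragment):
    """Rewrite every line break variant as the self-closing <br/>."""
    fragment = _BR_PAIR.sub("<br/>", fragment)
    return _BR_ANY.sub("<br/>", fragment)


def is_well_formed(fragment):
    """Return True if the fragment parses as XML inside a wrapper element."""
    try:
        ET.fromstring(f"<div>{fragment}</div>")
        return True
    except ET.ParseError:
        return False
```

For example, `is_well_formed("first line</br>second line")` is False, while the same text passed through `normalize_breaks` first parses fine. A definition that uses HTML-only entities such as `&nbsp;` would still fail this check, so treat it as a quick test rather than a full cleanup.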
