Mobileread
Question about dictionaries
#1  mzel 10-23-2020, 01:37 PM
Is support for morphology (genders, conjugations, contractions, etc.) a function of the dictionary or of the KOReader application?

It is not much of a problem for English, but it is for many other languages.

#2  mergen3107 10-29-2020, 06:09 PM
AFAIK, StarDict (whose engine KOReader uses) does not support morphology. However, it has fuzzy search, which looks for the most similar-looking word if no exact match is found. Sometimes it works, sometimes it doesn't. In the latter case I just tap and hold the word title in the dict popup in KOReader and type my required word manually.

#3  Doitsu 10-30-2020, 04:25 PM
Quote mergen3107
AFAIK, StarDict (whose engine KOReader uses) does not support morphology.
StarDict does support inflections. The KOReader StarDict engine does not.

EDIT: I was wrong, KOReader has been supporting .syn files since late 2019.

#4  mzel 11-01-2020, 02:09 PM
I guess you are talking about the .syn (synonyms) file. It is not really inflection. It requires you to list all the possible forms of each word, as opposed to a list of rules for the language.
That means that if there are ~100 forms of a verb in Italian, you need to provide 100 forms for each of the verbs, as opposed to 100 rules for all the regular verbs combined.
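
To make the difference concrete, here is a minimal Python sketch. It assumes a tiny rule set (present-tense endings of regular Italian -are verbs only), and the names `ARE_PRESENT_ENDINGS` and `expand_regular_are` are hypothetical; nothing here comes from StarDict or KOReader. One short rule list covers every regular verb, while a .syn-style list has to store each generated form per verb.

```python
# Present-tense endings for regular Italian -are verbs.
# Illustrative subset only; a real rule set would be far larger.
ARE_PRESENT_ENDINGS = ["o", "i", "a", "iamo", "ate", "ano"]


def expand_regular_are(infinitive):
    """Return the present-tense forms of a regular -are verb."""
    if not infinitive.endswith("are"):
        raise ValueError("not an -are verb: " + infinitive)
    stem = infinitive[:-3]
    return [stem + ending for ending in ARE_PRESENT_ENDINGS]


# Rule-based approach: six rules, shared by all verbs.
# Enumerated (.syn-style) approach: six stored forms for every verb.
verbs = ["parlare", "amare", "cantare"]
enumerated = {verb: expand_regular_are(verb) for verb in verbs}

for verb, forms in enumerated.items():
    print(verb, "->", ", ".join(forms))
```

The same function could also be used the other way round: run it once over all headwords to generate the full list of forms that a .syn-style dictionary needs.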

#5  Doitsu 11-01-2020, 02:37 PM
Quote mzel
I guess you are talking about the .syn (synonyms) file.
I was indeed referring to .syn files, which the KOReader StarDict engine doesn't support.

Quote mzel
That means that if there are ~100 forms of a verb in Italian, you need to provide 100 forms for each of the verbs, as opposed to 100 rules for all the regular verbs combined.
AFAIK, there are no cross-platform Open Source dictionary engines that support defining POS-based morphology rules.
Having to define all forms for each entry may seem like a rather primitive method, but it works surprisingly well.

BTW, if you want to add inflections to your own StarDict dictionary, you might find Tvangeste's inflection word lists for English, French, Italian, German, Spanish, Portuguese, Polish and Russian helpful.

#6  NiLuJe 11-01-2020, 07:36 PM
Doesn't it? I recall a host of issues about sdcv being *slow* when dealing with synonyms, but handling them nonetheless.

#7  Galunid 11-02-2020, 10:02 AM
Yup, it should support it, at least according to the issue @NiLuJe mentioned: https://github.com/koreader/koreader/issues/5437

#8  Doitsu 11-02-2020, 12:40 PM
Quote NiLuJe
Doesn't it? I recall a host of issues about sdcv being *slow* when dealing with synonyms, but handling them nonetheless.
You are of course right. Apparently, KOReader has been supporting .syn files since 2019.

(I updated my initial post.)

#9  mzel 11-02-2020, 11:44 PM
Reading up on this forum and 3-4 others, I came to the conclusion that my options are:
1) Generating a .syn file out of the grammar rules from the link above or the .aff file from a GoldenDict dictionary
2) Trying to build a command-line GoldenDict for Kobo and writing a plugin for it in KOReader
3) Implementing those same rules in a Lua wrapper around sdcv and trying to find the closest match from KOReader
4) Finding a ready-made .syn file for the language - Italian in this case
5) Something else? Kindle was able to do a better job in this department. I mean the native Kindle reader with dictionaries built for it. The Italian-English dictionary was pretty good in this regard. The Italian-Russian one was not perfect, but still better than what we have now under KOReader. It uses the same initial vocabulary but handles inflections way better. I never tried to install the dictionaries under KOReader on Kindle.
6) Forgoing all of the above and using manual entry plus guesswork to arrive at the correct headword

All suggestions and comments are welcome
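
For illustration, option 3 could look roughly like the sketch below. It is written in Python rather than Lua, and everything in it is an assumption: the `REVERSE_RULES` table, the `candidate_headwords` helper, and the plain set of headwords that stands in for a real sdcv lookup. The idea is to undo an ending, rebuild a possible infinitive, and keep only the candidates the dictionary actually contains.

```python
# (inflected ending, infinitive ending) pairs for regular -are verbs.
# Hypothetical and far from complete; longer endings are listed first.
REVERSE_RULES = [
    ("iamo", "are"),
    ("ano", "are"),
    ("ate", "are"),
    ("o", "are"),
    ("i", "are"),
    ("a", "are"),
]


def candidate_headwords(word, headwords):
    """Return the headwords that `word` could be an inflected form of."""
    word = word.lower()
    if word in headwords:
        return [word]  # exact match, no rules needed
    found = []
    for ending, replacement in REVERSE_RULES:
        if word.endswith(ending) and len(word) > len(ending):
            candidate = word[: -len(ending)] + replacement
            if candidate in headwords and candidate not in found:
                found.append(candidate)
    return found


# A stand-in for the dictionary index; a real wrapper would query sdcv.
headwords = {"parlare", "amare"}
print(candidate_headwords("parliamo", headwords))
print(candidate_headwords("amano", headwords))
print(candidate_headwords("casa", headwords))
```

Filtering against the headword list matters because reversing rules blindly produces many non-words; only candidates that exist in the dictionary are worth showing.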

#10  Doitsu 11-03-2020, 03:17 AM
Quote mzel
4) Finding a ready-made .syn file for the language - Italian in this case
AFAIK, .syn files are dictionary-specific index files that can't be re-used. They're automatically generated by StarDict Editor when you compile a Babylon GLS source file.
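
A rough sketch of why that is, assuming the layout described in the StarDict file format notes as I understand them: each .syn record is a NUL-terminated UTF-8 synonym followed by a 32-bit big-endian number giving the position of the original word in that dictionary's .idx file. The helpers `pack_syn` and `parse_syn` are hypothetical names, and plain `sorted()` is a simplification of StarDict's own sort order.

```python
import struct


def pack_syn(entries):
    """Pack (synonym, idx_position) pairs into .syn-style bytes."""
    out = bytearray()
    # Simplification: real files are sorted with StarDict's own comparison.
    for word, idx_position in sorted(entries):
        out += word.encode("utf-8") + b"\x00"
        out += struct.pack(">I", idx_position)  # 32-bit, network byte order
    return bytes(out)


def parse_syn(data):
    """Parse .syn-style bytes back into (synonym, idx_position) pairs."""
    entries = []
    pos = 0
    while pos < len(data):
        end = data.index(b"\x00", pos)  # end of the synonym string
        word = data[pos:end].decode("utf-8")
        (idx_position,) = struct.unpack_from(">I", data, end + 1)
        entries.append((word, idx_position))
        pos = end + 1 + 4  # skip the NUL byte and the 4-byte number
    return entries


# The numbers only make sense relative to one specific .idx ordering:
# here 0 means "abbacchiato" and 1 means "parlare" in this toy dictionary.
entries = [("abbacchiata", 0), ("abbacchiate", 0), ("parlo", 1)]
print(parse_syn(pack_syn(entries)))
```

Because each record stores a position rather than the headword itself, the same bytes would point at different words in a dictionary with a different headword list.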
Quote mzel
5) Something else? Kindle was able to do a better job in this department.
You could unpack one of the free bilingual Oxford dictionaries that Amazon offers as optional downloads for eInk Kindle users with KindleUnpack and extract the inflection data.

Here's an example entry from the Italian-English Oxford dictionary:

Code
<idx:orth value="abbacchiato">
  <idx:infl>
    <idx:iform name="" value="abbacchiata"/>
    <idx:iform name="" value="abbacchiate"/>
    <idx:iform name="" value="abbacchiati"/>
  </idx:infl>
</idx:orth>
The Babylon GLS equivalent is:

Code
abbacchiato|abbacchiata|abbacchiate|abbacchiati
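
The conversion between the two can be sketched in a few lines of Python. This is only an illustration under some assumptions: the helper name `kindle_to_gls_headwords` is hypothetical, the markup is matched with regular expressions because the `idx:` prefix is not declared as a namespace in the snippet, and every `idx:orth` element is assumed to have an explicit closing tag. It produces only the headword line, not the definition that follows it in a GLS file.

```python
import re

# Matches one <idx:orth value="..."> ... </idx:orth> element.
ORTH_RE = re.compile(
    r'<idx:orth\s+value="([^"]*)"[^>]*>(.*?)</idx:orth>', re.DOTALL
)
# Matches the value attribute of each <idx:iform .../> inside it.
IFORM_RE = re.compile(r'<idx:iform\b[^>]*\bvalue="([^"]*)"')


def kindle_to_gls_headwords(markup):
    """Return one 'headword|form1|form2...' line per idx:orth element."""
    lines = []
    for headword, body in ORTH_RE.findall(markup):
        forms = IFORM_RE.findall(body)
        lines.append("|".join([headword] + forms))
    return lines


sample = """
<idx:orth value="abbacchiato"> <idx:infl>
<idx:iform name="" value="abbacchiata"/>
<idx:iform name="" value="abbacchiate"/>
<idx:iform name="" value="abbacchiati"/> </idx:infl>
</idx:orth>
"""

print(kindle_to_gls_headwords(sample))
# prints ['abbacchiato|abbacchiata|abbacchiate|abbacchiati']
```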