Mobileread
QuickDicBuilder: Custom dictionaries on the Tolino
#1  Peripathetic 02-09-2020, 04:47 PM
Dictionaries used by the Tolino app are stored under .tolino/dictionaries/ on the user data partition. The format used is that of QuickDic (*.quickdic).

Existing Dictionaries

The original QuickDic was an Android app written by Thad Hughes and eventually open-sourced. Dictionary files were hosted on Google Code and available for download but all of them got deleted and were apparently lost when Google shut down the website. A Web Archive snapshot of the project repository is available but files cannot be downloaded this way.

The project was later resurrected as QuickDic Restored by Reimar Döffinger. The author's repository contains a lot of dictionaries generated from Wiktionary, a sister project of Wikipedia, which was also the source of the original QuickDic data. However, as part of his work on the app, the author improved the dictionary format, which means that newer dictionaries (v007 instead of v006) are no longer compatible with the Tolino.

These Wiktionary-based dictionaries can be downloaded from the author's repository on GitHub. Make sure to download only the files labeled v006.

Creating Dictionaries: The Tool

DictionaryPC is the Java tool that accompanies the QuickDic app and generates QuickDic dictionaries.

GitHub user Gitsaibot authored shell scripts for generating QuickDic dictionaries specifically with the Tolino in mind (the .jar file there is exactly the same as in the original project).

Since it is a Java application, it needs a JRE to run (a portable version will do). It also requires the following libraries: Apache Commons Compress, Apache Commons Lang3, International Components for Unicode, and Xerces-J Impl.

For convenience, I packaged everything necessary to run it in a Windows environment into a single archive, which I named QuickDicBuilder. Here's how to use it:

Note: Thad Hughes and Reimar Döffinger are the original authors; I am only redistributing their work. For source code, please refer to the GitHub links above.

Creating Dictionaries: How to Use It

The dictionary generation tool is functional but not very well documented. Some extra information on how it is supposed to be used can be found by reading old, closed GitHub issues and its source code.

The utility supports several input formats: "Wiktionary", "tab_separated", and "Chemnitz". The last of these follows the format of several German dictionaries available here. Tab-separated is the most straightforward format to use. Perhaps it's best to illustrate how to use it by example.

Case #1: Dict.cc

Dict.cc dictionaries can be downloaded (for personal use) from:
https://www1.dict.cc/translation_file_request.php

I downloaded their Russian-English dictionary, and converted it to QuickDic format with the following command:

QuickDicBuilder --dictInfo="Dict.cc Russian-English" --dictOut="RU-EN_DictCC.quickdic" --input1="dictcc.ru-en.txt" --input1Charset=UTF8 --input1Format=tab_separated --input1Name="dictcc" --lang1="RU" --lang1Stoplist="StopLists\xx.txt" --lang2="EN"

I did not have a Russian stoplist, so I used an empty one. Stoplists contain frequently occurring words that should be dropped from the index. It would probably be better to use a real one.

This conversion is relatively easy because the format of the downloaded file follows what the utility expects as its "tab_separated" input.
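Before running the converter on a large file, it can help to check that the input really looks tab-separated. Below is a minimal Python sketch of such a check. It assumes that lines starting with "#" are comments and that any columns after the second one (such as a word class) can be ignored; both are my assumptions about the download, and I have not verified how DictionaryPC itself treats such lines. The function name is made up for illustration.

```python
def summarize_tab_separated(lines):
    """Count entry lines, skipped lines and suspicious lines.

    An entry is any line with at least two non-empty tab-separated fields.
    Blank lines and lines starting with '#' are counted as skipped
    (assumption: '#' marks a comment in the downloaded file).
    """
    counts = {"entries": 0, "skipped": 0, "suspicious": 0}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            counts["skipped"] += 1
            continue
        fields = line.split("\t")
        if len(fields) >= 2 and fields[0].strip() and fields[1].strip():
            counts["entries"] += 1
        else:
            counts["suspicious"] += 1
    return counts


# Usage on the file from the command above:
#   with open("dictcc.ru-en.txt", encoding="utf-8") as f:
#       print(summarize_tab_separated(f))
```

If the "suspicious" count is more than a handful, the file probably needs rearranging first, as in the next case.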

Case #2: CC-CEDICT

CC-CEDICT is a Chinese-English dictionary that can be downloaded from:
https://www.mdbg.net/chinese/dictionary?page=cc-cedict

Here, the conversion command was:

QuickDicBuilder --dictInfo="CC-CEDICT Chinese-English" --dictOut="CC-CEDICT.quickdic" --input1="cedict_ts.txt" --input1Charset=UTF8 --input1Format=tab_separated --input1Name="cc-cedict" --lang1="ZH" --lang1Stoplist="StopLists\xx.txt" --lang2="EN" --lang2Stoplist="StopLists\en.txt"

However, the input data needed to be rearranged first from:
TraditionalHeadword SimplifiedHeadword [Pronunciation] /Definition/
to:
TraditionalHeadword SimplifiedHeadword<Tab>Definition /Pronunciation/

For this purpose I used the following regular expression with sed:

sed -e "s/^ *\([^ ]*\) \([^ ]*\) *\[ *\(.*\) *\] *\/ *\(.*\) *\/.*$/\1 \2\t\4 \/\3\//g" cedict_ts.u8 > cedict_ts.txt
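For anyone without sed at hand, here is a rough Python sketch of the same rearrangement. It is not what I actually ran; the function names are made up for illustration. Unlike the sed command, it drops comment lines (those starting with "#") and lines that do not match instead of passing them through, and it uses non-greedy matching so that square brackets inside a definition are not mistaken for the pronunciation.

```python
import re

# Entry lines look like: HEADWORD1 HEADWORD2 [pronunciation] /gloss1/gloss2/
CEDICT_LINE = re.compile(r"^\s*(\S+) (\S+)\s*\[\s*(.*?)\s*\]\s*/\s*(.*?)\s*/\s*$")


def convert_line(line):
    """Return 'headwords<TAB>definitions /pronunciation/', or None if not an entry."""
    if line.startswith("#"):
        return None  # comment/header line (skipped here; sed would pass it through)
    match = CEDICT_LINE.match(line)
    if match is None:
        return None
    first, second, pronunciation, glosses = match.groups()
    return f"{first} {second}\t{glosses} /{pronunciation}/"


def convert_file(src="cedict_ts.u8", dst="cedict_ts.txt"):
    """Convert a whole CC-CEDICT file; file names follow the sed command above."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            converted = convert_line(line)
            if converted is not None:
                fout.write(converted + "\n")
```

Calling convert_file() in the folder with cedict_ts.u8 should produce a cedict_ts.txt that can be fed to the conversion command above.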

Results

This was done quickly just to check if it works but if you want to, you can download the dictionary files I generated.

#2  Morioh 02-11-2020, 09:42 AM
This looks really cool. Sadly, I don't have the technical expertise to create a ja-en dictionary from JMdict.

#3  Peripathetic 02-25-2020, 08:06 AM
Quote Morioh
This looks really cool. Sadly, I don't have the technical expertise to create a ja-en dictionary from JMdict.
JMdict is an XML file you'd have to parse. This would be an extra step.

But it seems the same data is also available as a "legacy" EDICT download:
http://ftp.monash.edu/pub/nihongo/edict.zip

The EDICT version is a plain-text file in a legacy Japanese encoding (EUC-JP). So all you'd have to do is convert it to UTF-8, and then you can transform it with regular expressions like I did with sed for CC-CEDICT.
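To make this concrete, here is a minimal Python sketch of both steps in one go. I have not tested it against the full file. It assumes the encoding is EUC-JP and that entries look like "HEADWORD [reading] /gloss1/gloss2/", with the bracketed reading missing for kana-only words; the function names and the default file names are made up for illustration.

```python
import re

# Assumed entry layout: HEADWORD [reading] /gloss1/gloss2/
# (the [reading] part is optional)
EDICT_LINE = re.compile(r"^(\S+)(?:\s+\[([^\]]*)\])?\s+/(.*)/\s*$")


def convert_entry(line):
    """Return 'headword<TAB>glosses /reading/', or None if the line does not match."""
    match = EDICT_LINE.match(line)
    if match is None:
        return None
    headword, reading, glosses = match.groups()
    suffix = f" /{reading}/" if reading else ""
    return f"{headword}\t{glosses}{suffix}"


def convert_file(src="edict", dst="edict.txt"):
    """Read the legacy file (assumed EUC-JP) and write UTF-8 tab-separated text."""
    with open(src, encoding="euc_jp", errors="replace") as fin, \
            open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            converted = convert_entry(line)
            if converted is not None:
                fout.write(converted + "\n")
```

The output follows the same "headword<Tab>definition /pronunciation/" layout I used for CC-CEDICT, so it should work with --input1Format=tab_separated.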

#4  Morioh 02-25-2020, 10:18 AM
Thank you for the mention, but I should have said that I'm next to completely code-illiterate.
So this is a pretty cool tool, but I cannot use it.
Still, I'm quite happy that even the Tolino has a dedicated way to make custom dictionaries, since someone can get a bit of fun and use out of this.
P.S. Actually, my Tolino is not even capable of selecting text, so it's pointless ^^

#5  oliverdb 04-05-2020, 06:45 AM
Hi,
first, I would like to thank @peripathetic for his/her amazing work. Thank you!

I managed to install TWRP, boot, and the modified EPubprob. I also modified the file assets/environments/app.properties.prod so as to keep the connection to the Thalia shop, as I still use it to keep my library there.

But I could not figure out where I should put the dictionaries! Everybody speaks about .tolino/dictionaries/ but I do not have that folder, and yet I know I have some dictionaries installed.

Can someone shed some light on this issue? It is probably very simple, but I cannot figure it out!

Regards,

#6  Peripathetic 04-05-2020, 04:51 PM
Quote oliverdb
But I could not figure out where I should put the dictionaries! Everybody speaks about .tolino/dictionaries/ but I do not have that folder, and yet I know I have some dictionaries installed.
When you connect the Tolino to your computer via USB, and you followed my customization guide to change the connection mode to MTP, it should look like this:

[Image: Tolino Dictionaries.png]

If you left the default settings, it will show up as a Mass Storage Device with its own drive letter (like D:).

#7  oliverdb 04-06-2020, 07:00 AM
Wow, it could not be easier. Thank you again!

#8  AnimalOfArt 05-24-2020, 06:24 AM
Is it possible to convert a StarDict dictionary to a file that Toligen can convert to QuickDic v6?

EDIT: Never mind. I used PyGlossary for this.

Now, thanks to PocketBookDic (https://github.com/Markismus/PocketBookDic), I have converted the Kindle Duden from mobi to StarDict, converted that with PyGlossary to a tab-delimited text file, and then with Toligen to QuickDic!

EDIT: Unfortunately, neither dictionary works.
