conversion pyglossary pdf
#21  pzack 09-10-2022, 08:43 PM
Dear David(DNSB),

Thank you for responding. I didn't notice the change in bracket style; however, I think what matters is that the headword(s) will be located before the first leading bracket, no matter the style.

I guess the code could be written to look for the first leading bracket in the two styles.
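
The idea of looking for the first leading bracket could be sketched roughly as below. This is only an illustration: the function name split_headword is hypothetical, and it assumes that the two bracket styles open with "[" and "{", which the thread does not confirm. The character class would need adjusting for other styles.

```python
import re

# Assumption: an entry's pronunciation starts with "[" or "{".
# Change this character class if the real text uses other brackets.
OPENING_BRACKET = re.compile(r"[\[{]")


def split_headword(entry):
    """Split one entry into (headword text, rest starting at the bracket)."""
    match = OPENING_BRACKET.search(entry)
    if match is None:
        # No bracket at all: treat the whole entry as headword text.
        return entry.strip(), ""
    return entry[:match.start()].strip(), entry[match.start():]


if __name__ == "__main__":
    print(split_headword("zymotique [zimotik] adj. (gr. zumôtikos, ...)"))
    print(split_headword("zythum {zitsm] ou zython [zit5] n.m."))
```

Everything before the first bracket is returned as the headword text, whichever of the two styles is found first.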

I hope that this is a help.


#22  pzack 09-10-2022, 09:00 PM
Dear Sarmat89,

Thank you for responding.

I must ask you to forgive my obtuseness when it comes to programming and code. For example, "unfold the lines first": how do I "unfold a line"? And "replacing code": where, and what code, am I replacing?

Please don't assume that I know the technical language that you are probably comfortable with.

May I ask you, since you have the real sample of the text to work with, to actually illustrate, using the provided text, what you suggest needs to be done?

You are communicating with a first-grader when it comes to this type of programming. I have to be led by the hand here.

Hopefully, you have the patience to walk me through this. I sense some impatience among some of the respondents, and I understand this, but I am not at the level of expertise of my respondents.

Very cordially,

#23  pzack 09-10-2022, 09:07 PM
Dear Sarmat89,

Adding to the message just sent to you: I also don't know what text editor you are using, or what program (and how to obtain it) would be doing the text modifications.

I have bloc-notes under win 11. Perhaps, I need a different text editor?


#24  DNSB 09-10-2022, 09:52 PM
I would suggest installing Notepad++ (not absolutely certain but is bloc-notes the same as notepad?).

As for unfolding lines, what is meant is to take:

zymotique [zimotik] adj. (gr. zumôtikos,
propre à faire fermenter, de zumôtos, fer-
menté, dér. de zumoün, faire fermenter, de
zum, levain ; 1855 [d'après Robert, 1977],
puis 1868, Souviron, 585). Qui se rapporte
aux ferments solubles.

and convert it to a single line:

zymotique [zimotik] adj. (gr. zumôtikos, propre à faire fermenter, de zumôtos, fer- menté, dér. de zumoün, faire fermenter, de zum, levain ; 1855 [d'après Robert, 1977], puis 1868, Souviron, 585). Qui se rapporte aux ferments solubles.
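
For readers who prefer a script to an editor, unfolding can be sketched in a few lines of Python. This is a minimal illustration, not a tool from the thread; the function name unfold is hypothetical, and it assumes that the wrapped lines of one entry have already been collected into a list.

```python
def unfold(lines):
    """Join the wrapped lines of one entry into a single line."""
    # Strip each line, skip empty ones, and join with single spaces.
    return " ".join(line.strip() for line in lines if line.strip())


if __name__ == "__main__":
    wrapped = [
        "zymotique [zimotik] adj. (gr. zumôtikos,",
        "propre à faire fermenter, de zumôtos, fer-",
        "menté, dér. de zumoün, faire fermenter, de",
        "zum, levain ; 1855 [d'après Robert, 1977],",
        "puis 1868, Souviron, 585). Qui se rapporte",
        "aux ferments solubles.",
    ]
    print(unfold(wrapped))
```

Note that a word hyphenated across a line break stays split ("fer- menté"), exactly as in the single line shown above; the sketch does not try to rejoin such words.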

#25  Markismus 09-11-2022, 07:55 AM
So the problem is of course in the assumptions.

Conversion to csv-file
I've used sublime3, because it supports Perl regex. However, with a bit of googling you'll find the slight differences in regex implementation in editors. I've also included the perl commands.

If you use the following substitutions in order, you get a csv-file.

Find --> Replace ALL, e.g. perl -pe 's/\n\n+/||/sg'
'\n\n+' --> '||' , masking of the lines separating articles
'\n' --> ' ' , removal of the <EOL>-characters inside an article
'\|\|' --> '\n' , insertion of <EOL>-character at the end of an article. The article is now on 1 line.
'^(\S+)' --> '$1|,|$1' , repeating the first word and introducing a delimiter, e.g. |,|. A complex delimiter is used because it will not occur naturally in the article.
'^(\S+)' --> '$1,' , splitting off the first word by introducing a comma

The last two replacements are alternatives.
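
The same sequence of substitutions can also be sketched in Python, which may be easier to run on Windows than shell one-liners. This is a rough equivalent under stated assumptions, not the script used in this post: the function names are hypothetical, the whole file is assumed to fit in memory as one string, and "||" is assumed not to occur in the dictionary text.

```python
import re


def to_single_line_articles(text):
    """Put every article on one line (the first three substitutions)."""
    text = text.strip()
    # 1. Mask the blank lines that separate articles.
    text = re.sub(r"\n\n+", "||", text)
    # 2. Remove the remaining line breaks inside an article.
    text = text.replace("\n", " ")
    # 3. Turn each mask back into a single line break.
    return text.replace("||", "\n")


def add_delimiter(text, delimiter="|,|"):
    """Repeat the first word of every line, followed by the delimiter."""
    return re.sub(
        r"^(\S+)",
        lambda m: m.group(1) + delimiter + m.group(1),
        text,
        flags=re.MULTILINE,
    )


if __name__ == "__main__":
    sample = (
        "zymotechnique [zimoteknik] adj. (de zymotechnie ;\n"
        "1872, Littré). Qui se rapporte à la zymotechnie.\n"
        "\n"
        "zymotique [zimotik] adj. (gr. zumôtikos,\n"
        "propre à faire fermenter\n"
    )
    print(add_delimiter(to_single_line_articles(sample)))
```

Because the substitutions here run over the whole text at once, the blank-line pattern can see two consecutive line breaks, which a tool that reads one line at a time cannot.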
I've added the original text-file and the intermediate results.
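You can recreate them with the commands below. The -0777 switch in the first command makes perl read the whole file at once; without it perl works line by line and the blank lines between articles can never match.
perl -0777 -pe 's/\n\n+/\|\|/g' < original.txt > output1.txt
perl -pe 's/\n/ /g' < output1.txt > output2.txt
perl -pe 's/\|\|/\n/g' < output2.txt > output3.txt
perl -pe 's/^(\S+)/$1,/' < output3.txt > output4.csv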
A final result in the classical csv-format is this:
zymogène, [zims3en] adj. (de zymo- et de -gène, du gr.gennân, engendrer, produire ; 1888, Larousse, comme qualificatif d’une substance qui produit un ferment soluble, par une transformation spontanée ; sens actuel, 1964, Larousse). Pouvoir zymogène, propriété des cellules de fabriquer leurs propres enzymes ; propriété des glandes spécialisées de produire les enzymes néces- saires à l'organisme.
©, n. m. (1964, Robert). Précurseur inactif d'un enzyme. (Syn. PROENZYME.)
zymotechnie, [zimotekni] n. f. (de zymo- et de -fechnie, du gr. tekhné, art [manuel], industrie, métier ; 1762, Acad.). Art de produire et de diriger une fermentation.
zymotechnique, [zimoteknik] adj. (de zymotechnie ; 1872, Littré). Qui se rapporte à la zymotechnie.
zymotique, [zimotik] adj. (gr. zumôtikos, propre à faire fermenter, de zumôtos, fer- menté, dér. de zumoün, faire fermenter, de zum, levain ; 1855 [d'après Robert, 1977], puis 1868, Souviron, 585). Qui se rapporte aux ferments solubles.
zythum, {zitsm] ou zython [zit5] n.m. (lat. zythum, bière, boisson faite avec de l'orge, du gr. zuthos, décoction d'orge, bière ; 1710, Richelet — additions — [zythum], et 1923, Larousse [zython]). Bière que les Égyptiens préparaient avec de l’orge fermentée.
So what's the problem? You now have an article with the key '©' that has a quite new meaning. Apparently, there are articles that have subsections separated from the main article in the same way that articles are separated.
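
One possible way to handle such subsections, sketched here only as an illustration, is to fold any line that starts with a known subsection marker back into the preceding article. The function name merge_subentries is hypothetical, and the assumption that "©" is the only marker comes from this one sample; the full text may well use others.

```python
def merge_subentries(lines, markers=("©",)):
    """Append lines that start with a subsection marker to the previous article."""
    merged = []
    for line in lines:
        if merged and line.startswith(markers):
            # Subsection: glue it onto the article it belongs to.
            merged[-1] = merged[-1] + " " + line
        else:
            merged.append(line)
    return merged


if __name__ == "__main__":
    articles = [
        "zymogène [zims3en] adj. (de zymo- et de -gène ...)",
        "©, n. m. (1964, Robert). Précurseur inactif d'un enzyme.",
        "zymotechnie [zimotekni] n. f. (de zymo- et de -fechnie ...)",
    ]
    for article in merge_subentries(articles):
        print(article)
```

This would have to run on the one-line-per-article text before the headword delimiter is inserted, so that "©" never ends up as a key.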

Using my script I've added to the txt-file a csv-extension and ran it using
perl zymogène.S-delimiter .txt.csv fr '|,|'
The results, in both xml and zipped binary form, are also uploaded.

The screen output (with '$isTestingOn = 1;' in the script) is like this:
[image: Screenshot from 2022-09-11 14-32-07.png]
[txt] zymogène.txt (1.3 KB, 42 views)
[txt] n-||.txt (1.2 KB, 46 views)
[txt] n- .txt (1.2 KB, 43 views)
[txt] n.txt (1.2 KB, 42 views)
[txt] zymogène.S-delimiter .txt (1.3 KB, 45 views)
[txt] zymogène.S-, .txt (1.2 KB, 44 views)
[zip] zymogène.S-delimiter (1.8 KB, 45 views)
[xml] zymogène.S-delimiter .txt_reconstructed.xml (2.2 KB, 63 views)

#26  pzack 09-11-2022, 12:07 PM
Dear David(DNSB)

It's Sunday and I don't know if you want to "work" on Sunday.

I see your example, thank you. Now, what is the reason for the one line, and how do I actually do this in Notepad++ and have it go through the more than 100,000 listed words and their attached definitions?

Looking at the text you see that some lengthy definitions are separated into paragraphs with space between paragraphs; how would the separate paragraphs be included in the one line?

After everything is put on one line for each headword, what would be the next step for getting pyglossary to convert the file to stardict? Would I be putting a tab somewhere? And if so, how would this be done?

Do I need a special "sub-editor" to work inside notepad++?

As for notepad++, it may be the same as bloc-notes, however, I will try to install notepad++.

I assume, then, that you prefer to have me work under win 11 with Notepad++ rather than Linux.

Whatever is the most simple is best for me.


#27  pzack 09-11-2022, 01:01 PM
Dear Markismus,

You have put not a little work in your response to me and I am very appreciative of your efforts to help me.

Let me try to understand what you are proposing:

Firstly, I need to build a csv file. You applied the four lines of perl code to convert the one word to a csv-formatted file. Thus, do I plug the original text file name into the first of your four lines of code (perl -pe 's/\n\n+/\|\|/sg' <original.txt> output1.txt) and then follow through to the fourth line, inserting the actual file names?

This then, would give me a complete csv file of the full text file of which you have the example?

Secondly, and I quote you:

Using my script I've added to the txt-file a csv-extension and ran it using

perl zymogène.S-delimiter .txt.csv fr '|,|'

I thought that we already built the csv file with your four lines of perl code. What txt file are you now adding a csv extension to? And what am I doing with the .xml and .zip files? Have I created these with your code?

Do I understand correctly that pyglossary will convert the csv file created? Does this side-step the tab-delimiting of the text file or was this accomplished in your code?

Where do I find "perl", and is this an instruction set to be used in a particular text editor? Is this under a Linux terminal? What text editor are you using? Is sublime3 the editor? I am a little confused about what I actually need in order to implement what you want me to do.

I hope that my understanding (or what little there is of it) is not completely off base!


#28  pzack 09-11-2022, 01:47 PM
Dear Markismus,

Adding to the just-sent message this Sunday, I have installed notepad++ and have installed Perl in it. This is under windows 11.


#29  pzack 09-11-2022, 04:44 PM
Dear Markismus,

I was finally able to install ActivePerl for windows. I copied your first line of code into the command line for perl to execute it and it gave me back this message:

[ActiveState/ActivePerl-5.28] C:\Users\k\ActivePerl-5.28>perl -pe 's/\n\n+/\|\|/sg' grandl.txt output1.txt
'\' n’est pas reconnu en tant que commande interne
ou externe, un programme exécutable ou un fichier de commandes.

Which means that '\' is not recognised as an internal or external command, an executable program or a batch file.

How do I execute the code, then, that you wrote?

I didn't want to impose upon you, but would you convert the full text file that I have into a stardict dictionary for me? Otherwise, I can continue on this way, with your guidance. It is a learning experience in any event.

Thus, I think I have Perl installed under windows but I am stuck executing the code that you wrote.


#30  Markismus 09-11-2022, 05:05 PM
You can post a link to the full txt-file.
