Mobileread
Antique languages in epub
#1  carmenchu 12-30-2020, 03:24 PM
Hello!
I hope somebody can help with this: is there a recommendation or best practice for dealing with words in an antique language in an epub?
And by 'antique language' I don't mean Latin or Classical Greek, but beauties like Hittite or Phrygian--so I suspect that both spelling and pronunciation are dubious, and a matter for specialists...
However, I feel that they ought to be kept out of the way of the spell-checker, and of text-to-speech if one were to use it.
Any help will be appreciated.

#2  Jellby 12-30-2020, 03:51 PM
Use the appropriate lang code. Spell checking and text-to-speech should obey it, and that could mean ignoring the word when no dictionary or voice exists for that language.

#3  carmenchu 12-30-2020, 04:07 PM
Please, are there language codes for Old Indic, Avestan, Old Church Slavonic, Old Norse, Old English, Old High German, Phrygian, Hittite, Luwian...? Where does one find them?
I was rather hoping for some comprehensive label along the lines of xml:lang="exclude" / "none" / "PIE" that would deal with all those bits--some are only roots: 'xxx-'

#4  jhowell 12-30-2020, 04:44 PM
I am not familiar with those languages but it occurs to me that some may be based on obsolete alphabets. If so there may not be existing fonts to support them or even Unicode code points defined for some of the needed characters.

#5  carmenchu 12-30-2020, 05:15 PM
Quote jhowell
I am not familiar with those languages but it occurs to me that some may be based on obsolete alphabets. If so there may not be existing fonts to support them or even Unicode code points defined for some of the needed characters.
Well, neither am I familiar, and in fact I am cleaning the book (from OCR) as I read. Some glyphs were rendered as GIF images (just as I used to render equations before MathML), but fortunately I was able to find all of those characters in Unicode--chiefly the Latin subset, and some Greek. I suspect that in such cases (most of the languages referred to in the book are preliterate) linguists try to give a 'transliteration' or a 'reconstructed pronunciation'. Most are no worse than *k’ṃtom (the m has a dot below), but I feel that there should be some epub mark-up to distinguish such words from 'normal language'... So I am asking, and hope that somebody knows.

#6  DNSB 12-30-2020, 05:31 PM
Are we looking at Hittite, Middle Hittite, Neo-Hittite or Old Hittite? The ISO 639-3 language code list has all four of those. Phrygian only has the one entry.

See ISO 639 Code Tables for the complete searchable list.

Wrapping a string in a span with the language code should prevent a spellchecker from attempting to spellcheck it unless it has a matching dictionary. Something like <span xml:lang="xpg">*k’ṃtom</span> for example.
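
If there are many such words, the wrapping can be scripted. Below is a minimal Python sketch, not a tested workflow: the helper name wrap_lang is hypothetical, and writing both lang and xml:lang with the same value is an assumption based on the usual advice for XHTML content documents in EPUB 3.

```python
from xml.sax.saxutils import escape, quoteattr


def wrap_lang(text: str, code: str) -> str:
    """Wrap text in a span that declares its language.

    Hypothetical helper: writes both lang and xml:lang with the same
    value (assumption: the target is an EPUB 3 XHTML content document).
    """
    attr = quoteattr(code)  # adds the surrounding quotes and escapes the value
    return f"<span lang={attr} xml:lang={attr}>{escape(text)}</span>"


print(wrap_lang("*k’ṃtom", "xpg"))
```

The text is escaped before it is wrapped, so a stray < or & in an OCR'd fragment does not break the XHTML.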

#7  carmenchu 12-30-2020, 06:10 PM
My, oh, my! Ask to learn: I never suspected that there would be so many codes for dead languages!
Now, a short and perhaps silly question: would it work just to mark those words (or roots, mostly) as <span xml:lang="pied">*k’ṃtom</span>, i.e., with a non-existent language code, or would it set off all the alarms in EPUBCheck? Really, the author hardly ever gives words in one particular language, but rather points out 'cognates' or common roots in related languages... It seems overkill to insert more than thirty different <span xml:lang="---">...</span> elements for mere fragments, most of which do not belong to actual languages, when all that is really required is the notice 'don't treat this as a common word'.

Anyway, my thanks for the link to the code tables: good to know that there is such a complete reference for out-of-the-way languages.

#8  DNSB 12-30-2020, 07:21 PM
I've never tried a non-existent language code but give it a try and see what happens.

Hey, if it's got room for Klingon, antique languages are nothing.

#9  Jellby 12-31-2020, 04:30 AM
There are some special codes for cases where no other existing code is appropriate. Of course, what a particular application will do or not do with such codes is... unknown (I think calibre does not accept 'zxx' as a book language, for example).
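
For reference, the special-purpose codes in ISO 639-2 are 'und', 'mis', 'mul' and 'zxx'. A small Python sketch of falling back to one of them when a fragment has no specific code follows. The function name choose_code and the default of 'und' are assumptions for illustration only, and whether 'und' or 'mis' suits a reconstructed root better is a judgment call.

```python
from typing import Optional

# ISO 639-2 special-purpose codes and their registered meanings.
SPECIAL_CODES = {
    "und": "undetermined",
    "mis": "uncoded languages",
    "mul": "multiple languages",
    "zxx": "no linguistic content; not applicable",
}


def choose_code(specific: Optional[str], fallback: str = "und") -> str:
    """Return the specific code if one is given, else a special-purpose fallback.

    Hypothetical helper: the fallback must be one of SPECIAL_CODES.
    """
    if fallback not in SPECIAL_CODES:
        raise ValueError(f"{fallback!r} is not a special-purpose code")
    return specific if specific else fallback


print(choose_code("hit"))        # a word with a known code keeps it
print(choose_code(None, "mis"))  # a bare root falls back to a special code
```

How a given reader, spellchecker or validator reacts to these codes still has to be tested case by case.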

#10  Jellby 12-31-2020, 04:40 AM
Quote carmenchu
Please, are there language codes for Old Indic, Avestan, Old Church Slavonic, Old Norse, Old English, Old High German, Phrygian, Hittite, Luwian...?
You could refer to:
https://www.loc.gov/marc/languages/language_name.html

Old Indic: USE Vedic: Assigned collective code [san]
Avestan: [ave]
Slavonic, Old Church: USE Church Slavic: [chu]
Old Norse: [non]
Old English: USE English, Old (ca. 450-1100): [ang]
Old High German: USE German, Old High (ca. 750-1050): [goh]
Phrygian: Assigned collective code [ine]
Hittite: [hit]
Luwian: Assigned collective code [ine]
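
The list above can be kept as a lookup table while tagging. This is a rough Python sketch: the names MARC_CODES, COLLECTIVE, code_for and needs_review are made up for illustration, and the codes are copied from the list as quoted. Entries that only received a collective code are flagged, since a more specific code may exist elsewhere (ISO 639-3 has xpg for Phrygian, as used earlier in the thread). Also note that for lang attributes BCP 47 generally prefers the two-letter form where one exists (for example sa, ae, cu).

```python
# Codes exactly as quoted from the Library of Congress list above.
MARC_CODES = {
    "old indic": "san",
    "avestan": "ave",
    "old church slavonic": "chu",
    "old norse": "non",
    "old english": "ang",
    "old high german": "goh",
    "phrygian": "ine",
    "hittite": "hit",
    "luwian": "ine",
}

# Names the list marks as "Assigned collective code".
COLLECTIVE = {"old indic", "phrygian", "luwian"}


def code_for(name: str) -> str:
    """Look up a code by language name, ignoring case and outer spaces."""
    return MARC_CODES[name.strip().lower()]


def needs_review(name: str) -> bool:
    """True if the list gave only a collective code for this language."""
    return name.strip().lower() in COLLECTIVE


for lang in ("Old Norse", "Phrygian"):
    print(lang, code_for(lang), "(collective)" if needs_review(lang) else "")
```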

I had some "fun" translating all language names in calibre...
