MITT HE 0.9
MITT HE-RZ/NT 0.9 -- Standard for romanized and native-script text styles for hebrew language.
Draft version. This document may contain some to-do notes, very incomplete explanations, and even some information that is not factually correct.
This document contains many rare Unicode characters, whose names are not mentioned. If you need to know the Unicode name of some character, you can copy the character from this document to some website that identifies Unicode characters.
The MITT text styles do not support the hebrew cantillation marks for their original purpose. The cantillation marks are used instead for indicating various details of hebrew grammar and orthography, as is explained in this documentation. (This is the case in judeo-arabic too, which uses the cantillation marks rėvīắ and sėgoltā for indicating the pronunciation variants of some hebrew letters.)
Letters marked with a hashtag # have a different design in the MITT fonts than you will see in this documentation with a standard font. Many of these design differences are explained in this document. A complete list of untypical letter designs related to the MITT text styles is available in the documentation of the MITT font.
This page is best viewed as the source code, in Notepad or another text editor.
Main versions of writing styles:
HE-RZ -- fluent romanized hebrew:
** HE-RZ-P Precise romanized hebrew, with full masoretic vowelization and double / hard / soft consonants - Indicates with diacritical marks, where traditional and modern mater lectionis would differ. - foreign names (and possibly also loan words) are written based on letters in original script (not pronunciation), not using q for k or ṱ for t, but indicating these consonants in foreign words - possibly also indicating (e.g. with footnotes or source code that is invisible to a human reader), how foreign names and some loan words would be spelled based on pronunciation
-- HE-RZ-V Vowelized romanized hebrew, with full masoretic vowelization and double / hard / soft consonants - typical traditional mater lectionis --- not indicating where traditional and modern mater lectionis would differ - foreign names (and possibly also loan words) are written based on letters in original script (not pronunciation), not using q for k or ṱ for t --- not indicating how foreign names and some loan words would be spelled based on pronunciation
-- HE-RZ-L Lowercase romanized hebrew, with lowercase letters only - identical with HE-RZ-V, but using lowercase letters only, and: - foreign names and some loan words are spelled based on pronunciation, and using q for k and ṱ for t
-- HE-RZ-S Simple romanized hebrew, with simplified vowelization, with double / hard / soft consonants - identical with HE-RZ-V, but using simplified vowelization
-- HE-RZ-B Basic romanized hebrew, with simplified vowelization, with double / hard / soft consonants - identical with HE-RZ-S, but using the following simplified consonants (to avoid using any characters outside of the Basic Latin Unicode block): ' = ạlef / ắyin, ch = ḫẹt, t = ṱẹt / tạw, kh = ķaf (kaf in a softened position), s = sạmeķ / ṡīn, tz = źadeh (double: ttz), sh = šīn (double: shh).
-- HE-RZ-T Unvowelized traditional romanized hebrew, unvowelized, with lowercase consonants only - identical with HE-RZ-L, but without vowelization or double / hard / soft consonants
-- HE-RZ-U Unvowelized modern romanized hebrew, unvowelized, with lowercase consonants only
   - identical with HE-RZ-N, but with exaggerated modern mater lectionis https://en.wikipedia.org/wiki/Ktiv_hasar_niqqud#Rules_for_spelling_without_niqqud
   - consonantal waw is marked with two waws in the middle of word, also if a preposition causes the waw to be technically not the first letter, e.g. in ha wạrōd -- however, prefix ṷ does not cause three consecutive waws
   - consonantal yod in middle of word is marked with two yods, but not after prepositions, and not before or after mater lectionis (e.g. in bėriyyōt) -- also yạyin has only two yods, so maybe three consecutive yods are generally avoided
   - every O and U is marked with waw
   - every I is marked with yod, except not:
      - as the vowel of a preposition, such as li-, mi-, ki-, wi- etc.
      - in a closed syllable (= before shwa nach): higdīl -- however, this rule is often not followed, e.g. universally in iššạh
   - strong E is marked with yod, when:
      - a guttural letter has caused a short I to become a strong E, e.g. in tęạvōn and tęạvẹd (which should start with pattern TiQQ-)
      - in other situations more commonly not, e.g. not in ẹzōr or mẹmad, but yes in tęvạh or hęṡẹg
   - traditional hebrew names are usually written without mater lectionis, e.g. Mošeh, Šlomoh, Kohẹn
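The consonant substitutions listed for HE-RZ-B above can be sketched in Python as follows. This is only an illustration, not part of the standard: the function name to_basic_consonants is hypothetical, the sketch assumes that doubled consonants are written as two identical letters in the source text (as in iššạh), it covers only lowercase letters whose romanized form appears in this document (ạlef and ắyin are left out), and it does not simplify the vowels.

```python
# Consonant substitutions of HE-RZ-B, taken from the list above.
# Doubled forms are replaced first, so that "šš" becomes "shh" and not "shsh".
DOUBLED = {"źź": "ttz", "šš": "shh"}
SINGLE = {
    "ḫ": "ch",  # ḫẹt
    "ṱ": "t",   # ṱẹt (merges with tạw)
    "ķ": "kh",  # softened kaf
    "ṡ": "s",   # ṡīn (merges with sạmeķ)
    "ź": "tz",  # źadeh
    "š": "sh",  # šīn
}


def to_basic_consonants(text: str) -> str:
    """Replace lowercase HE-RZ-S consonants with their HE-RZ-B spellings."""
    for src, dst in DOUBLED.items():
        text = text.replace(src, dst)
    for src, dst in SINGLE.items():
        text = text.replace(src, dst)
    return text


# The doubled šīn becomes "shh"; the vowels are left untouched by this sketch.
print(to_basic_consonants("iššạh"))
```

Because several letters merge (for example ṱẹt and tạw both become t), this substitution cannot be reversed without additional information.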
HE-YI -- fluent romanized yiddish:
** HE-YI-T Traditional romanized yiddish.
HE-BL -- hebrew (and yiddish) romanized with Basic Latin characters only (ASCII 32 - 126):
-- HE-BL-P Precise extended basic-latinized hebrew.
-- ... ...
-- HE-BL-U Unvowelized modern basic-latinized hebrew.
HE-NT -- hebrew in the native script:
** HE-NT-P Precise extended hebrew with two-case hebrew script, with full masoretic vowelization and double / hard / soft consonants - uppercase and lowercase letters - typical traditional mater lectionis - foreign names and some loan words are spelled based on pronunciation, and using q for k and ṱ for t
-- HE-NT-B Unvowelized traditional hebrew with two-case hebrew script --- without vowelization or double / hard / soft consonants - uppercase and lowercase letters - typical traditional mater lectionis - foreign names and some loan words are spelled based on pronunciation, and using q for k and ṱ for t
-- HE-NT-M Unvowelized modern hebrew with two-case hebrew script --- without vowelization or double / hard / soft consonants - uppercase and lowercase letters - exaggerated modern mater lectionis (also in many short vowels) - foreign names and some loan words are spelled based on pronunciation, and using q for k and ṱ for t
-- HE-NT-V Vowelized unicase hebrew with unicase hebrew script, with full masoretic vowelization and double / hard / soft consonants - typical traditional mater lectionis ạlef / yōd / wạw - foreign names and some loan words are spelled based on pronunciation, and using q for k and ṱ for t - one text case only (which is called "lowercase" here)
++ HE-NT-S Vowelized simple hebrew, with simplified vowelization, with double / hard / soft consonants (modify the description yet) - identical with HE-RZ-V, but using simplified vowelization
-- HE-NT-T Unvowelized unicase traditional hebrew with unicase hebrew script --- without vowelization or double / hard / soft consonants - typical traditional mater lectionis - foreign names and some loan words are spelled based on pronunciation, and using q for k and ṱ for t - one text case only (which is called "lowercase" here)
-- HE-NT-U Unvowelized unicase modern hebrew with unicase hebrew script --- without vowelization or double / hard / soft consonants - exaggerated modern mater lectionis (also in many short vowels) - foreign names and some loan words are spelled based on pronunciation, and using q for k and ṱ for t - one text case only (which is called "lowercase" here)
The hebrew script traditionally does not have a distinction between uppercase and lowercase letters. The text styles HE-NT-P, -D, -T, -C, -S and -Z indicate uppercase letters with Unicode characters masora circle ( ْ ) or vertical four dots (⁞), as is explained further below in this document. In these six text styles a masora circle is never used for any other purpose, to avoid confusions about its meaning in each case.
Typical texts in hebrew script can be automatically converted into HE-RZ-U format (or HE-RZ-L format, if the text is vowelized). These text styles prefer lowercase letters, because most of the letters in the text would become lowercase anyway, if the text is later manually converted into a text style that uses uppercase and lowercase letters.
Foreign non-semitic proper nouns are usually written based on pronunciation in the hebrew script. This hebrew practice is imitated also in those romanized standards whose primary emphasis is on replicating the hebrew script text as such in latin script, rather than on writing optimally convenient text in latin script.
Curly brackets can be used to mark words or longer text sections as literal quotes of another language, on which the rules of these text style conversions should not be enforced. Single curly brackets indicate that a literal quote should be transcribed letter by letter: e.g. "{Charlotte} mėvủttā Ź˴ąrlỏt". Double curly brackets indicate that the quote should remain in its original script, without transcribing it: e.g. "{{Charlotte}} mėvủttā Ź˴ąrlỏt".
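A converter could separate these quotes from the surrounding text before applying any conversion rules. The following Python sketch shows one possible way. The function name split_quotes and the segment labels "text", "transcribe" and "keep" are hypothetical, and nested brackets are not handled.

```python
import re

# {{...}} is tried before {...}, otherwise the inner brackets would match first.
QUOTE = re.compile(r"\{\{(.*?)\}\}|\{(.*?)\}")


def split_quotes(text: str) -> list[tuple[str, str]]:
    """Split text into (kind, content) segments.

    kind is "keep" for {{...}} (stay in the original script),
    "transcribe" for {...} (transcribe letter by letter),
    and "text" for everything else (normal conversion rules apply).
    """
    segments = []
    pos = 0
    for m in QUOTE.finditer(text):
        if m.start() > pos:
            segments.append(("text", text[pos:m.start()]))
        if m.group(1) is not None:
            segments.append(("keep", m.group(1)))
        else:
            segments.append(("transcribe", m.group(2)))
        pos = m.end()
    if pos < len(text):
        segments.append(("text", text[pos:]))
    return segments


print(split_quotes("{{Charlotte}} abc"))
print(split_quotes("{Charlotte} abc"))
```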
Only latin and cyrillic Unicode characters that do not force the text row to be higher than normal have been deemed acceptable for these standards. Sometimes a cyrillic text standard uses a latin Unicode character, even if a similar character could be achieved as a combination of a cyrillic letter (visually identical to the latin letter) and a separate diacritical mark. Single Unicode characters are always favoured, because they produce the expected visual look more reliably and precisely in various software.
The HE-BL text styles may not be esthetically very pleasant to read. Their purpose is to provide an easy and safe way to write and store hebrew text (with the probable intention to later display the text in a more fluently readable text style), using the most universally supported Basic Latin character set only (ASCII 32 - 126): a...z A...Z 0 1 2 3 4 5 6 7 8 9 . , : ; - ' " ! ? ( ) [ ] { } < > / | \ _ ~ = + * ^ ` @ # $ & %.
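Whether a text stays within this character range can be checked mechanically. A minimal Python sketch follows. The function names are hypothetical, and line breaks are outside the range 32 - 126, so the check is meant to be applied to one text row at a time.

```python
def non_basic_latin(text: str) -> list[str]:
    """Return the characters of text that fall outside ASCII 32 - 126."""
    return [ch for ch in text if not 32 <= ord(ch) <= 126]


def is_he_bl_safe(text: str) -> bool:
    """True if every character of text is in the range ASCII 32 - 126."""
    return not non_basic_latin(text)


print(is_he_bl_safe("shalom, 'olam!"))
print(non_basic_latin("šalom"))
```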
Sample text, for comparing these text styles: [TO DO...]
** HE-RZ-P Lorem ipsum dolor sit amet...
-- HE-RZ-V Lorem ipsum dolor sit amet...
-- HE-RZ-L Lorem ipsum dolor sit amet...
-- HE-RZ-U Lorem ipsum dolor sit amet...
** HE-NT-D Lorem ipsum dolor sit amet...
-- HE-NT-V Lorem ipsum dolor sit amet...
-- HE-NT-U Lorem ipsum dolor sit amet...
Logical convertibility between these text standards (based on the text itself only, without using any dictionary data or human help):
TO DO...
A dash --- or arrows --> and >-- are used in this diagram, if the conversion between two text styles is not supported, because the source text does not contain enough information for logically concluding the correct orthography in the target text style (in all possible scenarios). If such an unsupported conversion is requested, the default option is not to convert the source text at all. However, if it is deemed preferable to convert the text into the closest possible text style (most notably, when the script should change from native to romanized, or vice versa), the arrows point towards the text style that is the recommended substitute for the unsupported conversion. A dash --- is used in the diagram, when no other logically supported substitutes are available (in the same script) than the source text style itself.
Foreign proper nouns cannot be automatically converted between a literal format (replicated letter by letter) and a transliteration based on pronunciation. Such a conversion would be reliably possible only if both the literal and the pronounced form of the name are documented in the source text, using some kind of tags or footnotes. This table of logical convertibility between the text styles ignores this aspect of foreign proper nouns, and promises convertibility from a text style to another, if no other logical obstacles exist for the conversion than the writing of foreign proper nouns being based on different principles.
The sample texts above use the following notation: the form based on pronunciation is given in [{curly brackets inside square brackets}], and the literal form is given in {[square brackets inside curly brackets]}, after the spelling that is chosen for the main text. This makes it possible to automatically recognize which of the two formats is the literal one. These codes and alternative spellings are not intended to be seen by the human reader in the main text.
These text styles use a strict logical correlation between latin letters and hebrew script letters, so that the text can be converted back and forth between hebrew script and latin script, and the text should stay exactly the same through all these conversions, without any changes caused by the conversion process. However, foreign proper nouns are not always fully compatible with this system. In some scenarios it is possible that converting a foreign proper noun from latin script to hebrew script, and then back into latin script, produces a different spelling in latin script than the original form of the name. This can happen because these romanized hebrew text styles are optimized for fluent reading and strict logical compatibility with the hebrew script, not strict compatibility with the way other languages use the latin script.
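The two bracket notations can be told apart with simple pattern matching. The Python sketch below is only an illustration: the function name strip_alternatives is hypothetical, and the spellings in the usage lines are plain ASCII placeholders, not spellings defined by these text styles.

```python
import re

PRONOUNCED = re.compile(r"\[\{(.*?)\}\]")  # [{form based on pronunciation}]
LITERAL = re.compile(r"\{\[(.*?)\]\}")     # {[literal form]}


def strip_alternatives(text: str) -> tuple[str, list[str], list[str]]:
    """Return the visible main text and the two lists of tagged alternatives."""
    pronounced = PRONOUNCED.findall(text)
    literal = LITERAL.findall(text)
    visible = LITERAL.sub("", PRONOUNCED.sub("", text))
    return visible, pronounced, literal


# Placeholder spellings, for illustration only.
print(strip_alternatives("Charlotte[{Sharlot}] came"))
print(strip_alternatives("Sharlot{[Charlotte]} came"))
```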
HE-RZ-P text can be automatically converted into any other of these text types.
Fully vowelized HE-NT-V text in hebrew script (such as the Bible, for example) can be automatically converted into fully vowelized HE-RZ-L lowercase text. Some manual work would be required to convert this into HE-RZ-P text.
Unvowelized HE-NT-T or HE-NT-U texts in hebrew script (most of the modern texts in hebrew language) can be automatically converted into unvowelized HE-RZ-T or HE-RZ-U lowercase text. A considerable amount of manual work would then be required to convert this into HE-RZ-P text.
Technical reliability of these text styles, as Unicode characters:
To increase the grammatical information content of the text, these text styles use many uncommon diacritical marks and special characters, both in latin and in hebrew script. This causes a higher risk that some fonts or software will fail to display the text correctly and beautifully. One of the esthetic risks is that some letters in the text are displayed with a different font than the rest of the text. This happens if the primary font does not include some rare character. In that case the software will use any other font that contains the character. If none of the available fonts contains the character, then the software probably displays some generic character, such as a square or a question mark, for example.
(Terms such as ạlef, rạfeh, zarqā, rėvīắ, dạgẹš etc. are written in this document as precise transliterations from hebrew language, except when mentioning the name of a Unicode character that contains such a hebrew word: the names of Unicode characters are always given precisely in their official form, which may contain hebrew words that are written in a casual and simplified form.)
Punctuation in hebrew script:
The main punctuation symbols in biblical hebrew are maqqạf (־), which acts in the role of a hyphen, and sōf pạsūq (׃), which acts in the role of a period. Modern hebrew uses ordinary western/latin punctuation, except that maqqạf is usually favoured instead of an ordinary hyphen in printed literature. This situation is not completely unproblematic, however, because the Unicode characters for western/latin punctuation have no RTL (right-to-left) direction of their own and behave as LTR (left-to-right) characters at the edge of a row in an LTR text area, while hebrew text is RTL. Such a mixture of RTL and LTR characters functions perfectly most of the time, but sometimes it happens that an LTR punctuation mark at the end of a row of RTL text gets wrongly placed as the first letter of the row. It should look like this, so that the period is the last character in a text row, which is read from right to left:
.TFEL-OT-THGIR-MORF-TXET
But it looks like this instead, as the software has erroneously moved the period from the end to the beginning of the text row:
TFEL-OT-THGIR-MORF-TXET.
This problem happens when RTL text is written in a text area, whose writing direction setting is LTR. Unicode assumes a bit over-optimistically that people will always choose the correct writing direction setting for the text area. This is not a realistic assumption: many people do not even know what they should do, or the software where the text is written does not offer any way to do it. It was intended that these text styles in RTL hebrew script would use RTL characters only as punctuation, to avoid the problems that sometimes arise with LTR punctuation. However, a sufficient number of widely supported and correctly behaving Unicode characters was not found, which would look visually similar to the most commonly used latin punctuation, and which do not force the text row to be higher than is typical for hebrew script. Therefore, the chosen solution is to always write a Unicode right-to-left mark (U+200F, an invisible character, which can be written in HTML as &rlm; or &#8207;) after each LTR punctuation mark in these RTL text styles. This seems to solve the problem of displaced LTR punctuation as the last character of a text row. Unfortunately this means that the text contains invisible characters, which are necessary for the text to be displayed correctly in all possible circumstances, but people cannot see them and will not understand their purpose and necessity.
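Adding the right-to-left mark can be automated. The following Python sketch is a minimal illustration under stated assumptions: the function name add_rlm is hypothetical, and the set of punctuation marks is an assumed subset, because this document does not list which marks are affected.

```python
RLM = "\u200f"  # RIGHT-TO-LEFT MARK (invisible)

# Assumed subset of LTR punctuation; the document does not give a full list.
LTR_PUNCTUATION = set(".,:;!?")


def add_rlm(text: str) -> str:
    """Insert a right-to-left mark after each LTR punctuation mark that lacks one."""
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        if ch in LTR_PUNCTUATION and text[i + 1 : i + 2] != RLM:
            out.append(RLM)
    return "".join(out)


sample = "\u05e9\u05dc\u05d5\u05dd."  # a hebrew word followed by a period
print(len(sample), len(add_rlm(sample)))  # the result is one character longer
```

The function skips punctuation marks that are already followed by the mark, so running it twice on the same text does not add duplicates.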
Here is a list of some RTL punctuation, which were considered and tested for a while, before this approach was abandoned:
: => ׃ Colon: hebrew punctuation sof pasuq.
- => ־ Hyphen: hebrew punctuation maqaf.
— => ־־ Em dash: two hebrew punctuation maqafs.
_ => ߺ Low line (underscore): nko lajanyalan. (This character forces the text row height to be a bit higher than is typical in hebrew script.)
| => ׀ Vertical bar: hebrew punctuation paseq.
' => ׳ Apostrophe: hebrew punctuation geresh.
" => ״ Quotation mark: hebrew punctuation gershayim.
“ => ߵߵ Left double quotation mark: two nko low tone apostrophes.
” => ߴߴ Right double quotation mark: two nko high tone apostrophes.
` => ߵ Grave accent, used in the role of left single quotation mark: nko low tone apostrophe.
´ => ߴ Acute accent, used in the role of right single quotation mark: nko high tone apostrophe. (In some fonts this character is nearly identical with hebrew punctuation geresh. All these nko tone apostrophes can force the text row height to be a bit higher than is typical in hebrew script.)
Romanized transcription for hebrew RTL punctuation:
maqqạf (־‎) = MACRON (¯), in MITT fonts modified as a hyphen located on top level of lowercase letters (lower than a macron, but higher than ordinary hyphen).
sōf pạsūq (׃‎) = THEREFORE SYMBOL (∴)
gẹršaĭm (״) = RATIO (∶), e.g. DHẬ∶H. Also two consecutive dots (..) are interpreted as gẹršaĭm, if romanized text is converted to hebrew script.
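The three correspondences above can be written as simple character tables. The Python sketch below is an illustration only, with hypothetical function names. It implements the rule that two consecutive dots are also read as gẹršaĭm, but it does not try to distinguish them from an ellipsis of three dots.

```python
# Hebrew RTL punctuation and the romanized characters listed above.
TO_LATIN = str.maketrans({
    "\u05be": "\u00af",  # maqaf     -> MACRON
    "\u05c3": "\u2234",  # sof pasuq -> THEREFORE
    "\u05f4": "\u2236",  # gershayim -> RATIO
})
TO_HEBREW = str.maketrans({
    "\u00af": "\u05be",
    "\u2234": "\u05c3",
    "\u2236": "\u05f4",
})


def romanize_punctuation(text: str) -> str:
    """Replace hebrew RTL punctuation with its romanized counterpart."""
    return text.translate(TO_LATIN)


def hebraize_punctuation(text: str) -> str:
    """Reverse direction; two consecutive dots are also read as gershayim."""
    return text.replace("..", "\u2236").translate(TO_HEBREW)


print(hebraize_punctuation("DH..H") == "DH\u05f4H")
```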
The character ₗ can be used in romanized hebrew to separate two same consonants in some exceptional cases, where text in native hebrew script contains a vowelless non-doubled consonant, which is followed by the same consonant: e.g. Yiṡ-ṡạķạr => Yiṡₗṡạķạr.
Punctuation and diacritical marks that can modify the pronunciation of a letter:
Gereš (׳) is often used to modify the pronunciation of a consonant in the hebrew native script. In this role it is romanized with MIDDLE GRAVE ACCENT (˴) in these text styles.
However, gereš can also serve as a marker of numbers or abbreviations. To disambiguate between these two usages, many of these text styles replace gereš with a slightly smaller character, GREEK TONOS (΄), both in the native script and in romanization, when its role is not a modifier of pronunciation.
Various traditions favour different methods of indicating modified pronunciations of hebrew letters. The most widely used methods are dạgẹš (a small dot in the middle of a hebrew letter) as marker of the hard pronunciation of BeGaD-KeFaT letters, and gereš (׳) as marker of foreign pronunciations of many letters in modern hebrew. Much rarer methods are rạfeh ( ֿ ) as marker of the weak pronunciation of BeGaD-KeFaT letters, and rėvīắ ( ֗ ) or sėgoltā ( ֒ ) as marker of various pronunciations in judeo-arabic texts.
Many of the MITT text styles indicate alternative pronunciations of hebrew letters in the following, to some extent untraditional way: A consonant without any diacritics is always hard (unless a prefix causes it to be soft). Rạfeh indicates the softened or alternative pronunciation of many hebrew letters. Zarqā ( ֮ ) indicates the foreign pronunciation of many hebrew letters. Rėvīắ indicates a foreign letter, which is written with the same character as a hebrew letter, but behaves differently in pronunciation or transliteration.
The following tables explain, how all these various traditions and marking systems affect the letters of the hebrew alphabet, how they are romanized in the MITT text styles, and how compatibility is ensured between traditions that use the same diacritical mark for different purposes:
RAFEH as marker of weak or alternative pronunciation, in MITT text styles:
V v = softer pronunciation of bẹt after a vowel => bẹt + rạfeh
Ģ ģ = softer pronunciation of gīmẹl after a vowel, since approximately 0 CE (+/- 300 years) => gīmẹl + rạfeh -- Modified glyphs in MITT fonts: Ģ => G with hook below, ģ => g with horn in top right corner
Ð ƌ = softer pronunciation of dạlet after a vowel, since approximately 0 CE (+/- 300 years) => dạlet + rạfeh -- Modified glyphs in MITT fonts: ƌ => the top bar is half narrower than usual. In serif fonts the vertical bar preferably bends to the left, without a sharp angle in the right top corner.
Ⱬ ⱬ = hebrew letter zaĭn in a word root, which originally included archaic letter ⱬāĭn instead of zaĭn (some 3500 years ago). Equivalent of arabic letter ḓāl: voiced TH, as in english "then". => zaĭn + rạfeh
Ḥ ḥ = arabic ḥẵ (while ḫẹt without any diacritics is the equivalent of arabic ḫẵ, a harder consonant) => ḫẹt + rạfeh
Ẓ ẓ = foreign (e.g. arabic) letter ẓẵ (arabic ṱẵ with dot) => ṱẹt + rạfeh -- Modified glyphs in MITT fonts: Z with comma below
Ķ ķ = softer pronunciation of kaf after a vowel => kaf + rạfeh
Ç ç = foreign C cedilla => sạmeķ + rạfeh
ᶝ ᵜ = hebrew letter ắyin in a word root, which originally included archaic letter ġāĭn instead of ắyin (until ca. 200 BCE) => ắyin + rạfeh -- modified glyphs in MITT fonts: ᶝ => Similar to superscript "c" (modifier letter small c), with the upper end of line having a round dot, which is larger than a period in the same font. ᵜ => Similar to superscript "c" (modifier letter small c), with the upper end of line having a round dot, which is larger than a period in the same font, and positioned lower than usual, with the top edge level with the top edge of lowercase "e".
F f = softer pronunciation of pē after a vowel => pē + rạfeh
Ż ż = hebrew letter źạdeh in a word root, which originally included archaic letter żāt instead of źạdeh (some 3500 years ago). => źạdeh + rạfeh
Ŧ ŧ = archaic semitic letter ŧān = dotless śīn + rạfeh
Ƭ ʈ = softer pronunciation of tạw after a vowel, since approximately 0 CE (+/- 300 years), no longer used in most dialects of modern hebrew => tạw + rạfeh -- Modified glyphs in MITT fonts: T t with a short horizontal bar at the middle, on the left side of vertical pillar only.
Ƽ ƾ = softer pronunciation of tạw after a vowel, alternative characters, which emphasize the ashkenazi pronunciation "s" since the middle ages. -- Modified glyph in MITT fonts: Ƽ => "S" with the horizontal bar of "T" on top.
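As an illustration of the table above, the following Python sketch converts the six lowercase BeGaD-KeFaT consonants into hebrew script, writing the softened variants as the base letter followed by rạfeh. The function name is hypothetical, and the sketch ignores vowels, uppercase letters, word-final letter forms and the other rows of the table.

```python
RAFE = "\u05bf"  # HEBREW POINT RAFE

# Romanized hard consonant -> hebrew letter (no diacritic in the MITT styles).
HARD = {
    "b": "\u05d1",  # bet
    "g": "\u05d2",  # gimel
    "d": "\u05d3",  # dalet
    "k": "\u05db",  # kaf
    "p": "\u05e4",  # pe
    "t": "\u05ea",  # tav
}
# Romanized softened consonant -> the hard consonant it is derived from.
SOFT = {"v": "b", "ģ": "g", "ƌ": "d", "ķ": "k", "f": "p", "ʈ": "t"}


def consonant_to_native(letter: str) -> str:
    """Map one lowercase romanized BeGaD-KeFaT consonant to hebrew script."""
    if letter in SOFT:
        return HARD[SOFT[letter]] + RAFE
    return HARD[letter]


print(consonant_to_native("v") == "\u05d1\u05bf")  # bet + rafe
```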
Use SOME OTHER diacritical mark in native script:
Ḃ ḃ = bẹt after a vowel in a proper noun from the Tanakh era (whose post-Tanakh era softening might be deemed historically incorrect) => bẹt + ??? -- Modified glyphs in MITT fonts: Ḃ ḃ => B b with an open center area and open top edge, which has some visual similarities with letter V v.
Ƿ ƿ = pē after a vowel in a proper noun from the Tanakh era (whose post-Tanakh era softening might be deemed historically incorrect) => pē + ??? -- Modified glyphs in MITT fonts: Ƿ ƿ => P p with open right edge, which has some visual similarities with letter F f.
Ŝ ŝ = hebrew letter šin in a word root, which originally included archaic letter ŧān instead of šin (probably over 3000 years ago) = dotless śīn + middle dot above (?)
In traditional hebrew orthography, a dạgẹš (a small middle point) in the BeGaD-KeFaT letters always means that the consonant has a strong pronunciation. In addition to this, it can also mean that the consonant should be doubled (pronounced with a longer duration). The MITT master text styles remove this ambiguity, by using a dạgẹš only for indicating the doubling of consonants, never for any other purpose. Softened pronunciation of the BeGaD-KeFaT letters is indicated with a rạfeh. The same diacritical mark is used for indicating the primary alternative pronunciation of many other hebrew letters.
Ġāĭn and soft ḥẹt are ancient semitic letters, which existed in hebrew until approximately 200 BCE. Since then ġāĭn has been pronounced like ắyin, and ḥẹt like ḫẹt. When the grammar and pronunciation of hebrew were standardized by the masoretes 1000 years later, it was not considered necessary to indicate in the text of Tanakh, which ắyin was originally ġāĭn (e.g. Ġȧmorạh), and which ḫẹt was originally ḥẹt (most of them, e.g. ḥạlạv).
Ŧān is a very archaic proto-semitic letter, pronounced much like "th" in the english word "think", or the softened ẗaw in word šabbạẗ in post-Tanakh era hebrew. According to some theories, ŧān may have been the sound that the ephraimites were unable to pronounce in the famous ŧibbolet or šibbolet incident in Judges 12:6 (approximately 1400 BCE), and they pronounced a sạmeķ instead. Coincidentally, ashkenazi jews pronounce a sạmeķ instead of a softened ẗaw, which involves the same replacement of a thin "th" sound with an easier sound "s".
In the masoretic text, the sound that the ephraimites were unable to pronounce correctly is šin. The ŧān theory is not supported by all scholars. Nevertheless, the MITT text styles support the archaic semitic letter ŧān as a dotless śīn with rạfeh above, romanized as Ŧ ŧ.
ZARQA as marker of foreign pronunciation or orthography, in MITT text styles:
Ṽ ṽ = foreign letter V => bẹt + zarqā -- Modified glyphs in MITT fonts: V with macron
Ǧ ǧ = foreign sound Ǧ => gīmẹl + zarqā
Ḓ ḓ = foreign sound voiced TH (as in english "this") => dạlet + zarqā
Ŵ ŵ = foreign letter W => wạw + zarqā -- Modified glyphs in MITT fonts: W with macron
Ž ž = foreign sound Ž => zayin + zarqā
Ȟ ȟ = hebrew letter ḫẹt in a word root, which originally included the archaic ȟāt (harder ḫẹt) until ca. 200 BCE => ḫẹt + zarqā -- Modified glyphs in MITT fonts: H h with inverted breve below
J j = foreign letter J => yōd + zarqā
C c = foreign letter C => kaf + zarqā
Ł ł = foreign letter Ł => lạmed + zarqā
Ñ ñ = foreign letter Ñ => nūn + zarqā
ẞ ß = foreign letter ẞ => sạmeķ + zarqā
Ḟ ḟ = foreign letter F => pē + zarqā -- modified glyphs in MITT fonts: Ḟ ḟ => F with macron above, f with macron below
Č č = foreign sound Č => źạdeh + zarqā
X x = foreign letter X => qōf + zarqā
Ṫ ẗ = foreign sound thin TH (as in english "think") => tạw + zarqā -- Modified glyphs in MITT fonts: T with tilde above
Foreign pronunciations of hebrew letters are commonly marked with a gereš (׳) after the consonant. The MITT text styles use a zarqā above the letter to mark foreign pronunciations (and some foreign orthographical consonants). The latin letters C, J, W and X can have many different pronunciations in various languages: cent / cat / ciao, John / Jacques / Jose / Janßen, wait / show, mix / Xbox / Xerox / Xavier. In the MITT text styles a zarqā on hebrew letters wạw, yōd, kaf and qōf means latin letters W, J, C and X as orthographical elements, regardless of how they are pronounced.
Ð ƌ (dạlet + rạfeh) and Ḓ ḓ (dạlet + zarqā) are pronounced identically, but written with different orthography in the MITT text styles. It is important to choose the correct orthography in a master text style: when the text is converted to some other text style, most text styles prefer an ordinary dạlet (D d) instead of Ð ƌ, but Ḓ ḓ is expected to remain as such.
REVIA as marker of foreign letter, in MITT text styles:
Ḇ ḇ = foreign letter B => bẹt + rėvīắ
Ḡ ḡ = foreign letter G => gīmẹl + rėvīắ
Ḏ ḏ = foreign letter D => dạlet + rėvīắ
Ẕ ẕ = foreign letter Z => zayin + rėvīắ
Ț ț = foreign letter T, secondary variant => ṱẹt + rėvīắ
Ḵ ḵ = foreign letter K, primary variant => kaf + rėvīắ
Ș ș = foreign (e.g. arabic) letter șād => sạmeķ + rėvīắ
Ġ ġ = foreign (e.g. arabic) letter ġāĭn => ắyin + rėvīắ
Ṕ ṕ = foreign letter P => pē + rėvīắ -- modified glyphs in MITT fonts: Ṕ ṕ => P p with macron above
Ẑ ẑ = hebrew letter źạdeh in a word root, which originally included archaic letter ẑādeh instead of źạdeh (some 3500 years ago). Equivalent of arabic letter ḑād (șād with dot). => źạdeh + rėvīắ -- Modified glyphs in MITT fonts: Ẑ ẑ => Z z with grave above.
Ɋ ɋ = foreign letter K, secondary variant => qōf + rėvīắ -- Modified glyphs in MITT fonts: Ɋ ɋ => Q q with macron above.
Ṯ ṯ = foreign letter T, primary variant => tạw + rėvīắ
The MITT master text styles indicate these foreign letters with a rėvīắ in native hebrew script, and in most cases with a macron above or under the consonant in romanized text. The master text styles need to know, when these letters are grammatically foreign and not hebrew, to correctly manage conversions between text styles in some scenarios. For example, if the romanized hebrew text "Kẹn, Kevin!" is converted into native hebrew script (automatically, without human help), both words would begin with kaf in hebrew script, because K stands for kaf in the romanization standard. However, the most common hebrew text styles mark foreign K letter with a qōf, not kaf. The text conversion algorithm cannot reliably choose a qōf instead of kaf, unless foreign letter K is indicated with a special character: "Kẹn, Ḵevin!" This looks better than "Kẹn, Qevin!" when the text is displayed in latin script.
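The Kẹn / Ḵevin example can be expressed as a small decision rule. In the Python sketch below, the function name and the master parameter are hypothetical. The sketch only handles the single letter K in its hebrew and foreign variants, not whole words.

```python
KAF = "\u05db"    # HEBREW LETTER KAF
QOF = "\u05e7"    # HEBREW LETTER QOF
REVIA = "\u0597"  # HEBREW ACCENT REVIA


def k_to_native(letter: str, master: bool = True) -> str:
    """Convert romanized K or foreign Ḵ (either case) to hebrew script.

    master=True  -> MITT master style: foreign K is kaf + revia
    master=False -> common practice: foreign K is written with qof
    """
    if letter in ("K", "k"):
        return KAF
    if letter in ("Ḵ", "ḵ"):
        return KAF + REVIA if master else QOF
    raise ValueError(f"not a K letter: {letter!r}")


print(k_to_native("K") == KAF)
print(k_to_native("Ḵ", master=False) == QOF)
```

Without the special character Ḵ, the converter would have no way to tell the two cases apart, which is the point of the example above.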
Two variants are available for foreign K and T: the primary variant in MITT text styles is based on the historical ancestor of these latin letters: foreign letter K = latin Ḵ ḵ, native hebrew script kaf + rėvīắ / foreign letter T = latin Ṯ ṯ, native hebrew script tạw + rėvīắ. A secondary variant is closer to the common practice in modern hebrew: foreign letter K = latin Ɋ ɋ, native hebrew script qōf + rėvīắ / foreign letter T = latin Ț ț, native hebrew script ṱẹt + rėvīắ. The preferred orthography is defined in each text style standard. If the secondary variant is used in master text, converting it to another text style results in a character that is secondary for that text style.
GERESH as marker of foreign pronunciation, in modern hebrew. Romanized in the MITT text styles with non-combining MIDDLE GRAVE ACCENT (˴) before the letter (not after it, as in native script). However, if used in a hebrew word on a word-final ắyin, the marker is written after the ắyin (e.g. źevắ˴), because the ġāĭn sound would be pronounced after the last vowel, not before it.
˴G ˴g = foreign sound DZ [Ǧ ǧ] => gīmẹl + gereš
˴D ˴d = foreign sound voiced TH (as in "this") [Ḓ ḓ] => dạlet + gereš
˴W ˴w = foreign letter W [Ŵ ŵ] => wạw + gereš
˴Z ˴z = foreign sound SZ [Ž ž] => zayin + gereš
˴K ˴k = arabic letter ḫẵ [Ḫ ḫ] => ḫẹt + gereš
˴S ˴s = arabic letter șād [Ș ș] => sạmeķ + gereš
˴ᶜ ˴ˁ = arabic letter ġayn [Ġ ġ], secondary variant => ắyin + gereš -- Modified glyph in MITT fonts: ˁ => ᶜ positioned lower than usual, with its top edge level with the top edge of "e".
˴Ź ˴ź = foreign sound CH [Č č] => źạdeh + gereš
˴R ˴r = arabic letter ġayn [Ġ ġ], primary variant => rẹš + gereš
˴T ˴t = foreign sound voiceless TH (as in "thin") [Ṫ ẗ] => tạw + gereš
Gereš is the most common method for indicating these variants of pronunciation and orthography in modern hebrew. The MITT master text styles do not use the gereš for this purpose (neither in romanized text nor in hebrew native script), but these variants are used when the text is converted into a text style that prefers this orthography.
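Converting master text into a text style that prefers the gereš notation could look roughly like the following Python sketch. It covers only a few lowercase rows of the table above, the function name is hypothetical, and the special rule for word-final ắyin is not implemented.

```python
GERESH_MARK = "\u02f4"  # MODIFIER LETTER MIDDLE GRAVE ACCENT, written before the letter

# Master-style romanized letter -> base letter used with the gereš marker.
BASE = {"ǧ": "g", "ḓ": "d", "ŵ": "w", "ž": "z", "č": "ź", "ẗ": "t"}


def to_geresh_style(text: str) -> str:
    """Rewrite the covered master-style letters as marker + base letter."""
    return "".join(GERESH_MARK + BASE[ch] if ch in BASE else ch for ch in text)


print(to_geresh_style("ǧ") == "\u02f4g")
```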
REVIA as marker of alternative pronunciations in judeo-arabic texts. Romanized in the MITT text styles with non-combining DOT ABOVE (˙) before the letter (while rėvīắ is above the letter in native script). However, if used in a hebrew word on a word-final ắyin, the marker is written after the ắyin (e.g. źevắ˙), because the ġāĭn sound would be pronounced after the last vowel, not before it. The MITT text styles support indicating the judeo-arabic rėvīắ with non-combining DOT ABOVE after the hebrew letter (and its possible vowel), as rėvīắ is reserved for a different purpose in these text styles.
˙G ˙g = arabic letter ġayn [Ġ ġ], third variant => gīmẹl + rėvīắ [MITT: ˙]
˙D ˙d = arabic letter ḓāl [Ḓ ḓ] => dạlet + rėvīắ [MITT: ˙]
˙Z ˙z = arabic letter ẓẵ [Ẓ ẓ], secondary variant (arabic ṱẵ with dot) => zayin + rėvīắ [MITT: ˙]
* ˙Ḫ ˙ḫ = arabic letter ḫẵ [Ḫ ḫ] => ḫẹt + rėvīắ [MITT: ˙]
˙Ṱ ˙ṱ = arabic letter ẓẵ [Ẓ ẓ], primary variant (arabic ṱẵ with dot) => ṱẹt + rėvīắ [MITT: ˙]
* ˙ᶜ ˙ˁ = arabic letter ġayn [Ġ ġ], secondary variant => ắyin + rėvīắ [MITT: ˙ / * also in MITT primary standard ắyin + rėvīắ is arabic letter ġayn] -- Modified glyph in MITT fonts: ˁ => ᶜ positioned lower than usual, with its top edge level with the top edge of "e".
˙P ˙p = arabic letter fẵ [F f] => pē + rėvīắ [MITT: ˙]
˙Ź ˙ź = arabic letter ḑād [Ḑ ḑ] (arabic șād with dot) => źạdeh + rėvīắ [MITT: ˙]
˙R ˙r = arabic letter ġayn [Ġ ġ], primary variant => rẹš + rėvīắ [MITT: ˙]
˙T ˙t = arabic letter ẗẵ [Ṫ ẗ] => tạw + rėvīắ [MITT: ˙]
SEGOLTA as marker of alternative pronunciations in judeo-arabic texts. Romanized in the MITT text styles with non-combining DIAERESIS (¨) before the letter (while sėgoltā is above the letter in native script). The MITT text styles support indicating the judeo-arabic sėgoltā with non-combining DIAERESIS after the hebrew letter (and its possible vowel), as sėgoltā is reserved for a different purpose in these text styles.
¨S ¨s = arabic letter šīn [Š š] => dotless śīn + sėgoltā
¨T ¨t = arabic letter ẗẵ [Ṫ ẗ] => tạw + sėgoltā
Rėvīắ and sėgoltā are used in medieval judeo-arabic for writing arabic letters with the hebrew script.
Arabic letter ġayn has three variants with rėvīắ, and two variants with gereš. Arabic letter ẓẵ has two variants with rėvīắ. If the third variant of ġayn is used in a text that is converted into a text style that has only one or two variants of ġayn, it is converted into the primary ġayn variant of the target text style. The secondary variant of a letter in source text is replaced with the secondary variant of the target text style -- if the target text style has more than one variant.
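The fallback rule for letter variants can be sketched as a small function. This is a hedged illustration in Python, not a definitive implementation: the name `target_variant` is hypothetical, variants are numbered 1 (primary), 2 (secondary), 3 (third), and the sketch assumes that any variant which the target text style lacks falls back to the primary variant.

```python
def target_variant(source_rank: int, target_count: int) -> int:
    """Return the 1-based variant rank to use in the target text style.

    source_rank  -- rank of the variant used in the source text (1 = primary)
    target_count -- number of variants that the target text style defines
    """
    if source_rank < 1 or target_count < 1:
        raise ValueError("ranks and counts start at 1")
    if source_rank <= target_count:
        # The target style has a variant of the same rank: keep the rank.
        return source_rank
    # Assumption: a rank that the target style lacks becomes the primary variant.
    return 1
```

For example, a third ġayn variant converted into a text style with two variants gives `target_variant(3, 2)`, which is 1 (the primary variant of the target style).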
Uppercase and lowercase letters:
Hebrew script is traditionally unicase, without a separate set of uppercase and lowercase letters. A distinction of two cases is practically never indicated with letter size either, in small caps style. Text styles HE-NT-P, -D, -C and -S treat the letters of traditional hebrew script as lowercase, and use hebrew mark masora circle ( ْ ) to indicate uppercase letters. (These text styles do not use the masora circle for any other purpose than indicating uppercase letters.)
.הִ֯יא גָרָה בְ תֵ֯ל אָ֯בִיב, בִ רְחוֹב יְ֯הוּדָה הַ לֵ֯וִי
Hí gạrạh bė Tẹl Ạvīv, bi rėḫōv Yėhūdạh ha Lẹwī.
."הַ֯ קִצּוּר ד֯ה֯עָ֯״ה֯ אוֹמֵר "דָ֯וִד הַ מֶלֶך, עָלָיו הַ שָלוֹם
Ha qiźźūr DHẬ∶H ōmẹr "Dạwid ha meleķ, ậlãw ha šạlōm".
A whole word (or a longer text) can be indicated as uppercase with vertical four dots ⁞⁞ ... ⁞⁞⁞. (The tricolon ⁝ was also considered originally for this purpose, but it seems to be less widely supported by fonts.) These are LTR characters, and therefore each vertical four dots is always followed by a right-to-left mark in native hebrew script text, so that the text behaves correctly in all circumstances.
."הַ֯ קִצּוּר ⁞⁞ דהעָ״ה ⁞⁞⁞ אוֹמֵר "דָ֯וִד הַ מֶלֶך, עָלָיו הַ שָלוֹם
Ha qiźźūr ⁞⁞ DHẬ∶H ⁞⁞⁞ ōmẹr "Dạwid ha meleķ, ậlãw ha šạlōm".
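The rule about the right-to-left mark can be applied mechanically to native-script text. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, and "each vertical four dots" is read as each individual U+205E character (not each group of them).

```python
FOUR_DOTS = "\u205E"  # VERTICAL FOUR DOTS
RLM = "\u200F"        # RIGHT-TO-LEFT MARK


def add_rlm_after_four_dots(text: str) -> str:
    """Insert a right-to-left mark after every vertical four dots character
    that is not already followed by one."""
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        # text[i + 1:i + 2] is an empty string at the end of the text.
        if ch == FOUR_DOTS and text[i + 1:i + 2] != RLM:
            out.append(RLM)
    return "".join(out)
```

The function is safe to run more than once, because a mark that already exists is not duplicated.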
Writing some small words separately, or together with the next word:
Prepositions and the word "and" are traditionally written together with the next word in hebrew script. All these romanized text styles and most of the hebrew script text styles write them as separate words, however. If a definite article or other prefix causes the first letter of the next word to be doubled, this is not shown in romanized writing (and not in hebrew script text styles X, Y and Z either): hạ ạreź, ba midbạr, we gam. Sometimes a prefix is joined to the next word with a hyphen, for example to indicate that "ha" means something else than a definite article: ha-im, ha-rạītạh?
The hebrew script text styles X and Y use a thin space (which is approximately 50 % narrower than an ordinary space) between the main word and the preceding word "and", or a definite article, or other prefixes, and the first letter of the next word is not doubled with a dạgẹš due to the prefix or definite article. If these text styles are converted into more traditional text styles in hebrew script, the thin spaces are removed (joining the prefix or word "and" to the next word), and the first letter of the next word is doubled (if grammar requires so).
When romanized text styles are converted into hebrew script text styles, the conversion algorithm automatically detects prefixes that should be joined to the main word, and that might cause the first letter of the next word to be doubled. If the desired outcome differs from this assumption, any separately written romanized word can be forced to join the next word in hebrew script by using a thin space instead of an ordinary space between the words. Conversely, to prevent a word that looks like a common hebrew prefix from being joined to the next word in hebrew script, use two spaces between the prefix and the next word, or a three-per-em space (which looks similar to an ordinary space, but is a different Unicode character).
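The three kinds of word gap described in this paragraph can be told apart by their Unicode characters. The following Python sketch only classifies a gap; the automatic prefix detection itself is not shown. The function name and the returned labels are hypothetical.

```python
THIN_SPACE = "\u2009"          # forces joining to the next word
THREE_PER_EM_SPACE = "\u2004"  # prevents joining to the next word


def gap_instruction(gap: str) -> str:
    """Classify the gap between a romanized word and the next word.

    Returns "join", "separate" or "auto" (let the algorithm decide).
    """
    if gap == THIN_SPACE:
        return "join"
    if gap == "  " or gap == THREE_PER_EM_SPACE:
        return "separate"
    if gap == " ":
        return "auto"
    raise ValueError("unrecognized word gap: %r" % gap)
```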
The surrounding context generally does not affect how a word is written, even if it affects how it is pronounced: "wė kạtavtī" is written with a hard K, but is pronounced "wė ķạtavtī". The definite article and other one-syllable prefixes change their vowels based on the context, however: bė gan, ba gan, be ėmet, bi Yėrūšạlaĭm.
An untraditional word gap, which separates a prefix or "and" from the next word:
HE-RZ: - as a visible space = PUNCTUATION SPACE " " (in MITT fonts modified as identical with an ordinary space, in other fonts not much different from that) - as a hidden space = HAIR SPACE " " (modified in MITT fonts to have width 1 px)
HE-NT: - as a visible space = THIN SPACE " " (in MITT fonts modified as having 50 % width of an ordinary space, in other fonts typically 60 % - 70 %) - as a hidden space = HAIR SPACE " " (modified in MITT fonts to have width 1 px)
Discarded space variants: When copy-pasting from Word, FOUR-PER-EM SPACE " " becomes an ordinary SPACE, and ZERO WIDTH SPACE "​" completely disappears, as if it did not exist in the text. These changes might happen when text is copied out of Word. The correct characters may exist in the Word document. HAIR SPACE, THIN SPACE and PUNCTUATION SPACE retain their Unicode characters in Word. None of these problems were witnessed in Firefox web browser.
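A text can be checked for the two space characters that were found to be fragile. This Python sketch is an illustration with a hypothetical function name; it only reports positions and does not change the text.

```python
# Space characters that were lost or replaced when text was copied out of Word.
FRAGILE_SPACES = {
    "\u2005": "FOUR-PER-EM SPACE",
    "\u200B": "ZERO WIDTH SPACE",
}


def find_fragile_spaces(text: str) -> list:
    """Return (index, character name) for each fragile space in the text."""
    return [(i, FRAGILE_SPACES[ch]) for i, ch in enumerate(text) if ch in FRAGILE_SPACES]
```

HAIR SPACE (U+200A), THIN SPACE (U+2009) and PUNCTUATION SPACE (U+2008) are not reported, because they kept their Unicode characters.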
Final "sofit" forms of hebrew letters K, M, N, P and Ź
Five letters of the hebrew alphabet have a different "sōfīt" form, when the letter is the last consonant of a word: K, M, N, P and Ź. The sōfīt form is normally not indicated in romanized text. The text conversion algorithm assumes that each of these letters should have the sōfīt form as the last letter of a word that has more than one consonant. These letters take the main form in other positions in a word, and as an isolated single consonant (with or without vowels).
However, it is possible to force a hebrew letter to have the main form as the last letter in a word, and it is possible to force a hebrew letter to have the sōfīt form when it normally should have the main form. In hebrew script this would be done by typing the main or final letter form with the computer keyboard, whenever the user wishes to do so. In these romanized text styles, untypical usage of the main or sōfīt form is indicated with Unicode character MODIFIER LETTER END LOW TONE ( ˼ ), which practically means: "use the other form than should be normally used". Examples of hebrew letters that would be produced, if the following romanized characters are converted into hebrew script:
mi => mẹm + basic I / m˼i => mẹm sōfīt + basic I / gam => gīmẹl + short A + mẹm sōfīt / gam˼ => gīmẹl + short A + mẹm
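The default rule and the override character can be sketched as follows. This Python example is a simplified illustration, not the actual conversion algorithm: the function name is hypothetical, only the plain vowels a, e, i, o, u are recognized (a toy assumption), and soft variants of K and P are assumed to be normalized to their base letters beforehand.

```python
SOFIT_CAPABLE = {"k", "m", "n", "p", "ź"}
TOGGLE = "\u02FC"            # MODIFIER LETTER END LOW TONE: "use the other form"
PLAIN_VOWELS = set("aeiou")  # toy vowel set, for this illustration only


def consonant_forms(word: str) -> list:
    """Return (consonant, "main" or "sofit") for each consonant of one word."""
    consonants = []  # items are [letter, toggled]
    for ch in word.lower():
        if ch == TOGGLE:
            if consonants:
                consonants[-1][1] = True
        elif ch not in PLAIN_VOWELS:
            consonants.append([ch, False])

    result = []
    last = len(consonants) - 1
    for i, (letter, toggled) in enumerate(consonants):
        sofit = False
        if letter in SOFIT_CAPABLE:
            # Default: sofit only as the last consonant of a word
            # that has more than one consonant.
            default_sofit = i == last and len(consonants) > 1
            # The toggle character flips the default.
            sofit = default_sofit != toggled
        result.append((letter, "sofit" if sofit else "main"))
    return result
```

With the four examples above, "mi" gives a main mẹm, "m˼i" a mẹm sōfīt, "gam" a mẹm sōfīt, and "gam˼" a main mẹm.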
Extra-wide hebrew letters
Hebrew words are practically never divided across two text rows with hyphenation. This can leave a text row with a lot of empty space, which makes it an esthetic challenge to keep the text beautiful and visually balanced. The most common solution in modern book printing is to make the spaces between words wider, so that the text fills the full width of the column. In medieval and ancient times it was customary for hebrew book printers or scribes to make some letters wider than usual, instead of making some spaces wider than usual.
The Unicode standard includes eight extra-wide hebrew letters: ạlef, D, H, K, L, M sōfīt, R and T. To achieve full compatibility with the hebrew script without any loss of information, these romanized text standards indicate an extra-wide letter with two "ano teleia" punctuation marks after the consonant: Gam·· Rạḫẹl·· rōźạh·· lištōt··.
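The eight extra-wide letters and the romanized marker can be sketched in Python as follows. The function names are hypothetical. The sketch assumes that the marker may be stored either as two GREEK ANO TELEIA (U+0387) or as two MIDDLE DOT (U+00B7), because Unicode normalization converts the former into the latter, and it handles only a marker at the end of a word, as in the examples above.

```python
# Ordinary hebrew letter -> extra-wide presentation form.
WIDE_FORMS = {
    "\u05D0": "\uFB21",  # alef
    "\u05D3": "\uFB22",  # dalet
    "\u05D4": "\uFB23",  # he
    "\u05DB": "\uFB24",  # kaf
    "\u05DC": "\uFB25",  # lamed
    "\u05DD": "\uFB26",  # final mem
    "\u05E8": "\uFB27",  # resh
    "\u05EA": "\uFB28",  # tav
}
WIDE_MARKERS = ("\u0387\u0387", "\u00B7\u00B7")


def strip_wide_marker(word: str) -> tuple:
    """Return (word without the marker, True if the last letter is extra-wide)."""
    for marker in WIDE_MARKERS:
        if word.endswith(marker):
            return word[:-len(marker)], True
    return word, False


def widen(letter: str) -> str:
    """Return the extra-wide form of a hebrew letter, or the letter itself
    if no extra-wide form exists."""
    return WIDE_FORMS.get(letter, letter)
```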
Hebrew vowels have three theoretical lengths: rapid, short and long. In colloquial speech the longest vowel is often equal to the short vowel (especially in other positions than the last syllable), as the speaker wants to complete the communication more quickly. The MITT text styles mark long A vowels as "shortened" in scenarios where practically nobody ever pronounces the vowel as long, e.g.
The long A vowel is sometimes ambiguous in hebrew script, because a similar vowel point can also mean weak Ọ. Long A can be confirmed as the meaning by writing a meteg next to the vowel point. (Weak Ọ can be confirmed as the meaning only by using the separate Unicode character QAMATS QATAN, which looks identical to the long A vowel in most fonts.) The character ₗ is available as the romanized version of meteg, but it is normally not necessary to use this character, unless one wants to emphasize (for some exceptional reason) that a meteg exists in the text. The combination of long Ạ + meteg (Ạₗ ạₗ) is called strong A / Ā ḫạzạq / qạmạź ḫạzạq.
A šwā is "strong", when it is in a position where it should be pronounced as a rapid E according to traditional grammar and pronunciation. In any other position a šwā is "weak". However, many people leave the strong šwā silent in some situations, e.g. pėrī => prī, gėvīnạh => gvīnạh, dabbėrū => dabbrū. In this case the šwā is "silent" for the speaker, but it is nevertheless "strong" grammatically, because of its position in the word. The correct ortography of šwā in romanized hebrew can be regarded as a matter of taste in such cases. However, the MITT text styles recommend to never leave a strong šwā unwritten in verb forms.
A prefix can cause the strong šwā in the first syllable of a word to become weak, e.g. mėhīrūt => bi mḙhīrūt (bimhīrūt). The romanized MITT text styles generally recommend maintaining the ortography of words regardless of prefixes. Thus the word "mėhīrūt" retains a strong šwā in the expression "bi mėhīrūt", although speakers will leave the šwā silent in this scenario.
Some people pronounce the weak šwā in many scenarios, e.g. hạlķū => hạlḙķū, most notably between two nearly similar consonants, e.g. lạmadtī => lạmadḙtī. In this case the šwā is "voiced" for the speaker, but it is nevertheless "weak" grammatically, because of its position in the word. Thus the terms "strong" and "weak" describe the logical position of the šwā in a word, regardless of whether people pronounce it aloud or not.
In romanized hebrew the weak šwā is usually left unwritten. It can become customary to leave also the strong šwā unwritten, when it occurs between two consonants that are easy to pronounce together without a vowel between them. The word pėsanḙtẹr becomes much more convenient to read, when the strong šwā and weak šwā are left unwritten: psantẹr. However, the text is probably easier to read, if the strong šwā is systematically written in nearly all situations, e.g. "hẹm mėvaqqėšīm še kullėķem tištammėšū ba sėfạrīm", not "hẹm mvaqqšīm še kullķem tištammšū ba sfạrīm".
When related to vowel E, the terms "strong" and "weak" have a different meaning, which has nothing to do with šwā or the position of the vowel in a word. In archaic hebrew the "strong E" and "weak E" were two different sounds. In modern hebrew they have the same sound, but they still have different vowel points. The strong E, źẹreh, is often long (as in "hẹm"), but not always long (as in "šnẹy"). The weak E, segōl, is often short (as in "lạhem"), but not always short (as in "qạšeh").
"Added A" is an additional rapid A vowel, which is inserted between a vowel and a word-final ắyin, ḫẹt or double hē in the pronunciation of masoretic hebrew (and all later dialects of hebrew) since approximately 600 CE. This vowel is traditionally written with the vowel point of short A, but its pronunciation is rapid. When a word ends with a vowel + added A + ắyin, the romanized ortography is -ắ (instead of -ⱥˁ).
"Crammed I" is an additional I vowel, which is sometimes inserted between a vowel and the next consonant (against the normal rules of hebrew ortography) in the masoretic text of Tanakh: Yėrūšạlaɨm.
"Shortened" vowels are usually pronounced as short in modern hebrew, but in traditional ortography they are written as long: e.g. Yėrūšạlẹm => Yėrūšảlẹm, Banglạdeš => Banglảdeš.
"Trailing" vowels are basic vowels + a silent hē, as the last letters of a word.
"Full" vowels include a long or strong vowel, and the most typical mater lectionis consonant. Full A is long A + vowelless ạlef. Full E is strong E + vowelless ạlef. Full I is basic I + vowelless yōd. Full O is strong O on wạw. Full U is strong U attached to wạw. (When related to vowels O and U, the term "strong" refers to visual style and location of the vowel point, which is attached to the mater lectionis consonant, not under the previous consonant like the basic O and U vowel points are.)
"Notable" vowels are written with a mater lectionis consonant in unvowelized hebrew, but in vowelized text the traditional grammar expects them to be without a mater lectionis consonant. Notable A, I, O and U receive the same mater lectionis consonant in unvowelized text as a "full" vowel would have: short A + vowelless ạlef, long A + vowelless ạlef, basic I + vowelless yōd, strong O on wạw, and strong U attached to wạw. Notable E vowels receive a different mater lectionis consonant than full E has: weak E + vowelless ắyin, and strong E + vowelless yōd. (Notable weak E is obsolete in modern hebrew, but it can still be a reasonable ortographical choice for yiddish names, to write the yiddish name as such, without hebraizing it. Or for documenting what the original yiddish form of the hebraized name is.)
NOTE: There are two possible ways to write ắyin as a glottal stop after consonant: "ắ" and "ˁa" are synonyms, also "ô" and "ˁō" are synonyms -- these behave similarly in transliteration. Examples: kimˁaṱ (this style is my preference in most cases) / kimắṱ (this style is my preference in proper nouns).
Grammatically or traditionally short "a", which is advised to be written with ạlef in unvowelized modern hebrew.
Alephic yōd (yōd ȧlạfīt): Ị ị = (possible ạlef) + I + (possible ạlef).
Ị ị = primary yōd ȧlạfīt, a written yōd that represents pronounced ạlef + I (if preceded by a vowel or word break), ạlef + I + ạlef (if preceded by a vowel or word break, and followed by a vowel), or I + ạlef (if followed by a vowel). Used in some hebrew words of foreign origin, which contain the diphthong IA, IO or IU: Iṯạlyạh => Iṯạlịảh (expected: Iṯạliah), Serbyạh => Serbịảh (expected: Serbiah) / rạdyō => rạdịỏ (expected: rạdio), sṯūdyō => sṯūdịỏ (expected: sṯūdio) / Hānōy => Hảnỏị (expected: Hanoi) / Yōn => Ịon (expected: Ion) / Gāyūs => Gảịủs (expected: Gaius).
Ï ï = secondary yōd ȧlạfīt, a single character that represents typically written I + yōd, which is pronounced as long I + ạlef: Riyō => Rïỏ (expected: Rīỏ).
Ȉ ȉ = third yōd ȧlạfīt, a single character that represents typically written I + double yōd, which is pronounced as short I + ạlef: Pōnṱiyyūs => Pỏntȉủs (expected: Pontius), Priyyūs => Prȉủs (expected: Prius).
Ɨ ɨ = crammed I: in the masoretic Bible texts, a second vowel between two consonants (which is normally not possible): Yėrūšảlaɨm. In hebrew script, simply an ordinary I directly after another vowel point.
Weak Ọ is traditionally written with long A (in an unstressed syllable), which causes confusion among inexperienced readers. The Unicode standard has separate character encodings for long A and weak O, but font designers usually make them identical visually. The most common visual difference is that weak O is vertically much higher than long A -- despite its name describing it as "small" qạmạź. Another traditional way to disambiguate these two vowels is writing a meteg (vertical line) on the left side of the long A in an unstressed closed syllable. In the MITT fonts weak O is designed as a horizontal line with a dot below it. An alternative way to clearly indicate the weak O (in any font that supports the hebrew cantillation marks) is using etnaḫtā (an arrowhead pointing upwards) instead of qạmạź qạṱạn. This ortography is not widely known, however, and it may require encountering a few such words, before the reader will spontaneously understand its meaning.
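The alternative ortography with etnaḫtā can be produced from text that already uses the separate Unicode character for weak O. The Python sketch below is a minimal illustration with hypothetical function names. It cannot decide which ordinary qạmạź points are weak O; it can only count them as potentially ambiguous.

```python
QAMATS = "\u05B8"        # HEBREW POINT QAMATS (long A, or ambiguous weak O)
QAMATS_QATAN = "\u05C7"  # HEBREW POINT QAMATS QATAN (weak O)
ETNAHTA = "\u0591"       # HEBREW ACCENT ETNAHTA


def mark_weak_o_with_etnahta(text: str) -> str:
    """Replace every qamats qatan with etnahta, so that weak O is visible
    in any font that supports the cantillation marks."""
    return text.replace(QAMATS_QATAN, ETNAHTA)


def count_ambiguous_qamats(text: str) -> int:
    """Count ordinary qamats points, which may mean long A or weak O."""
    return text.count(QAMATS)
```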
Vowels in standard hebrew text:
(This list does not include foreign vowels -- such as german ä, ö and ü, or scandinavian å -- or rare special scenarios in hebrew, such as vowel changes caused by disambiguated grammar. Foreign vowels and special scenarios are discussed elsewhere in this document.)
!!! YIDDISH:
** YI-RZ-P Precise romanized yiddish, fully vowelized.
YI-RZ-T Traditional romanized yiddish.
YI-RZ-M Minimally vowelized romanized yiddish. (IS THIS NEEDED: DOES THIS DIFFER FROM -T?)
** YI-NT-P Precise fully vowelized yiddish. All such vowels are marked with hebrew vowel points, in which traditional yiddish text is potentially ambiguous.
YI-NT-T Traditional yiddish. Some but not all vowels are marked, as is typical in traditional yiddish text. Notable short A is printed as ạlef + short A. Notable long A is printed as ạlef + long A. Foreign B is printed as bẹyt + dạgeš. [Does W = U need its own romanization?] Romanized ortography is available for the yiddish diphthongs. Explain possibilities for Y and YI. Foreign F is printed as peh + rạfeh. Also note: yōd-hiriq means I, not YI.
YI-NT-M Minimally vowelized yiddish. (IS THIS NEEDED: DOES THIS DIFFER FROM -T?)
YI-NT-C Compact yiddish. Vowels are written with large vowel points on the text base line (rotated 90 degrees), instead of consonants ạlef, yōd and ắyin.
YI-NT-E Extra-compact yiddish. Similar to YI-NT-C, but word-initial ạlef and ắyin are usually omitted (which means that a word can begin with a vowel point without a carrier consonant). This completes the principle of yiddish ortography that (nearly) all vowels are alphabetical letters in their own right. In this case not full-sized consonants, but instead vowel points on the text line.
İ ı = yōd with high basic I, a rarely used special Unicode character, in which yōd has a basic I vowel that is positioned much higher than usual. This Unicode character is apparently used in yiddish only, and it is pronounced "I", not "YI".
#yi
YIDDISH TEXT SAMPLE:
YI-NT-T אַ גליק איז, וואָס הענרי ראָוז איז נאָך אַלץ לײַכטער צו רעדן ייִדיש איידער ענגליש
YI-NT-C
YI-NT-E
YI-RZ-P Romanized yiddish within its own text style: A gliq iz, wos Henry Rouz iz noķ alź layķṱer źu redn yidiš eyder engliš.
HE-RZ-P Yiddish quoted within romanized hebrew: A glỉq ỉz, ẇỏs Hẻnry Rỏụz ỉz nỏķ alź laẏķṱẻr źu rẻdn yĭdỉš eẏdẻr ḗnglỉš.
#ac
MASORETIC ARAMAIC TEXT SAMPLE:
AC-NT-P מַלכָּא לְ עָלְמִין חֱיִי! אֱמַר חֶלמָא לְ עַבדָיך, וָ פִשׁרָא נְחַוֵּא
AC-RM-P Malkā lė ậlėmīn ḫèyī! Èmar ḫelmā lė ắvdạȳķ, ẉė pišrā nėḫawwē.
The text styles and romanization standards of hebrew are suitable for masoretic aramaic of the Tanakh, with the difference that aramaic Șạḓē is Ș ș, not Ź ź as in hebrew.
#sy
SYRIAC ARAMAIC TEXT SAMPLE -- the sample should have also a word-final ayin and chet:
SY-NT-W ܗܳܠܶܝܢ ܐܶܢܶܝܢ ܓ݁ܶܝܪ ܬ݁ܠܳܬ݂ ܕ݁ܰܡܟ݂ܰܬ݁ܪܳܢ: ܗܰܝܡܳܢܽܘܬ݂ܳܐ ܘܣܰܒ݂ܪܳܐ ܘܚܽܘܒ݁ܳܐ
SY-RZ-W West syriac aramaic: Hạleyn eneyn geyr tėlạẗ damķattėrạn: haymạnūẗā wė savrā wė ḥūbbā.
Pairs of hard and soft consonants in syriac: B b / V v, G g / Ġ ġ, D d / Ḓ ḓ, K k / Ķ ķ, P p / F f, T t / Ṫ ẗ.
Vowels in east syriac: A a, Ạ ạ, Ā ā, Ė ė, E e, É é, EY ey, Ẹ ẹ, Ē ē, ẸY ẹy, Ī ī, Ō ō, Ū ū.
Vowels in west syriac: A a, Ạ ạ, Ā ā, Ė ė, E e, É é, EY ey, I i, Ō ō, Ū ū.
The romanization standards for syriac aramaic are nearly identical with those of hebrew, with the difference that syriac Șạḓē is Ș ș, not Ź ź as in hebrew.
Unicode characters specifically for yiddish:
HEBREW LETTER YOD WITH HIRIQ
HEBREW LIGATURE YIDDISH YOD YOD PATAH
HEBREW LETTER BET WITH RAFE
HEBREW LETTER KAF WITH RAFE
HEBREW LETTER PE WITH RAFE
ADD PRECISE SUPPORT FOR THESE YIDDISH FEATURES (OR THEN: * HANDLE (NEARLY) ALL OF THESE WITH A YIDDISH TEXT STYLE?):
* ạlef + short A = pronounced: short A + ạlef (solemn A) =>
* ạlef + long A = pronounced: ashkenazi long A + ạlef (full A) =>
? yōd + I = pronounced: I => Ĭ ĭ / or is it this one: Yİ yı = yōd with high basic I, a rarely used special Unicode character ?
= beyt with dagesh => foreign B
* wạw as U => maybe: Ų ų
? double wạw => I have this already (?)
* ắyin as E without vowel points => consider if this is necessary
= feh with rafeh => foreign F
= thaw = ashkenazi s-like T
* yōd => alephic yōd (when vowel I) / yōd (when consonantal)
    VOWEL     ẮYIN +    ĠĀĬN +
A   ONLY      VOWEL     VOWEL     COLLOQUIAL NAME          COLLOQUIAL IN HEBREW   FORMAL NAME IN HEBREW     EXAMPLES

   # Ⱥ ʌ       ---       ---      low shwa                 šwā nạmūķ              šwā nạmūķ                 ĭṡrėẹlī => ĭṡrʌẹlī
     Ȧ ȧ     # Ằ ằ     # Ǻ ǻ      rapid A                  Ā mạhīr                ḫȧṱaf pattạḫ              ȧšer, ằvōdạh, Ǻmōrạh
   # Ɐ ⱥ       ---       ---      guttural enhancer A      mėšappẹr gėrōnīt Ā     pattạḫ gạnūv              rūⱥḫ / šạvūắ (šạvūⱥ)
     Ə ə       ---       ---      neutral guttural enhancer       mėšappẹr gėrōnīt nẹyṯrạlī                 Noəḫ, Hōšẹə
     A a     # Ắ ắ     # Ầ ầ      short A                  Ā qạźạr                pattạḫ                    gam, ắm, Ầzzạh
     AH ah   # ẮH ắh   # ẦH ầh    trailing short A         Ā qạźạr nigrạr         pattạḫ nigrạr             mah-
     Ą ą       ---       ---      notable short A          Ā qạźạr bōlẹṱ          pattạḫ bōlẹṱ              Ḵąrl, Ṯhỏmąs, Yąnqel
     Ȃ ȃ       ---       ---      tranquil A               Ā šạlẹw                pattạḫ šạlẹw              ládonī => lȃ Ȧdonī
     Ả ả       ---       ---      shortened A              Ā mėquźźạr             qạmạź mėquźźạr            Yėrūšạlẹm => Yėrūšảlẹm
 + # ẟ ɷ       ---       ---    * - " - (ashkenazi)        Ȫ mėquźźợr             qợmợź mėquźźợr            Yėrūšạlẹm => Yėrūšɷlẹm
     Ã ã       ---       ---    * lengthened A             Ā mėōrạķ               pattạḫ mėōrạķ             kãmmạh, mãššehū
 + # Ỡ ỡ       ---       ---    * - " - (ashkenazi)        Ȫ mėōrợķ               pattợḫ mėōrợķ             kỡmmợh, mỡššehū
     Ạ ạ     # Ậ ậ     # Ấ ấ    * long A                   Ā ạrōķ                 qạmạź [gạdōl]             šạm, ậrīm
 + # Ợ ợ   + # Ồ ồ   + # Ở ở    * - " - (ashkenazi)        Ȫ ợrōķ                 qợmợź [gợdōl]             šợm, ồrīm
 + # Ɒ ɒ   + # Ȭ ȭ   + # Ṏ ṏ    * - " - (ashkenazi 2nd)    Ȫ ɒrōķ                 qɒmɒź [gɒdōl]             šɒm, ȭrīm
     Ạₗ ạₗ    # Ậₗ ậₗ     # Ấₗ ấₗ   * strong A                 Ā ḫạzạq                qạmạź ḫạzạq               šạₗmrạh / šợₗmrạh / šɒₗmrạh
   # Ḁ ḁ       ---       ---      added feminine marker A  Ā nōsạf šel nėqẹvạh    qạmạź nōsạf šel nėqẹvạh   taȧmīn => taȧmīnḁ
   # Ự ự       ---       ---      - " - (ashkenazi)        Ȫ nōsợf šel nėqẹvợh    qợmợź nōsợf šel nėqẹvợh   taȧmīn => taȧmīnự
   # Ǡ ǡ       ---       ---      A in place of E          Ā bi mėqōm Ê           qạmạź bi mėqōm segōl ō źẹreh
                                                                                                            takkeh => takkǡh
   # Ư ư       ---       ---      - " - (ashkenazi)        Ȫ bi mėqōm Ê           qợmợź bi mėqōm segōl ō źẹreh
                                                                                                            takkeh => takkưh
     Ȁ ȁ       ---       ---      postponed A              Ā nidḫeh               pattạḫ nidḫeh [kė qạmạź]  tėqạfatnī => tėqạftȁnī
     Ữ ữ       ---       ---      - " - (ashkenazi)        Ȫ nidḫeh               pattợḫ nidḫeh [kė qợmợź]  tėqợfatnī => tėqợftữnī
     Ã ã     # Ẫ ẫ     # Ǣ ǣ    * straight AY              ẠY yạšạr               qạmạź-yōd yạšạr           eḫãw, rẹẫw
 + # Ỡ ỡ     # Ǭ ǭ     # Ȱ ȱ    * - " - (ashkenazi)        ỢY yợšợr               qợmợź-yōd yợšợr           eḫỡw, rẹǭw
     Ă ă       ---       ---      simplified short YYA     YYĀ qạźạr mėfuššạṱ     yōd-dạgẹš-pattạḫ mėfuššạṱ îriyyat => îriăt
     Ặ ặ       ---       ---    * simplified long YYA      YYĀ ạrōķ mėfuššạṱ      yōd-dạgẹš-qạmạź mėfuššạṱ  šėtiyyạh => šėtiặh
 + # Ṍ ṍ       ---       ---    * - " - (ashkenazi)        YȪ mėfuššợṱ            yōd-qợmợź mėfuššợṱ        šėtiyyợh => šėtiṍh
 + # Ǽ ǽ       ---       ---    * - " - (ashkenazi 2nd)    YȪ mėfuššɒṱ            yōd-qɒmɒź mėfuššɒṱ        šėtiyyɒh => šėtiǽh
     ẠH ạh   # ẬH ậh   # ẤH ấh  * trailing long A          Ā ạrōķ nigrạr          qạmạź nigrạr              pinnạh, yėdūậh
   # Ʌ ᶏ       ---       ---    * notable long A           Ā ạrōķ bōlẹṱ           qạmạź bōlẹṱ               Bᶏden, Le Mᶏns
 + # Ɑ ᶐ       ---       ---    * - " - (ashkenazi)        Ȫ ợrōķ bōlẹṱ           qợmợź bōlẹṱ               Hᶐdel, Yᶐsel
     Á á       ---       ---      solemn (short) A         Ā (qạźạr) rėźīnī       pattạḫ rėźīnī             liqrát, Blūmá
     Ā ā       Â â     # Ǟ ǟ    * full A                   Ā mạlē                 qạmạź mạlē                bātī
 + # Ȫ ȫ   + # Ố ố   + # Ớ ớ    * - " - (ashkenazi)        Ȫ mợlē                 qợmợź mợlē                bȫtī
* The long A vowel is pronounced as O in tiberian hebrew since ca. 600 CE, and in ashkenazi hebrew since ca. 1350 CE. However, also foreign names can contain a long A, and they would not be pronounced as O. It is possible to have both the ashkenazi O and foreign long A in the same (romanized) text, if foreign long A is marked with the primary long A characters, and hebrew long A is marked with the ashkenazi variant characters. Shortened A may change its colour in ashkenazi pronunciation, if the long O of traditional ortography is shortened into A.
Lengthened A and straight AY use the same character à ã in romanized hebrew. If the next letter is vowelless wạw, it is straight AY. Otherwise it is lengthened A. A different ortography of the same scenario uses "silent Y", which is described at vowel I.
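The rule for telling lengthened A from straight AY can be sketched in Python. This is an illustration under stated assumptions: the function names are hypothetical, a wạw counts as vowelless when it is followed by the end of the word or by a character that is not a vowel, and a vowel is recognized only by the base letter (a, e, i, o, u) of its Unicode decomposition. Only the first Ã ã of a word is classified.

```python
import unicodedata

VOWEL_BASES = set("aeiou")  # simplified vowel test, for this illustration only


def is_vowel(ch: str) -> bool:
    """True if the base letter of the decomposed character is a, e, i, o or u."""
    base = unicodedata.normalize("NFD", ch)[0].lower()
    return base in VOWEL_BASES


def classify_tilde_a(word: str) -> str:
    """Classify the first A with tilde in the word."""
    word = unicodedata.normalize("NFC", word)
    index = next((k for k, ch in enumerate(word) if ch in "\u00C3\u00E3"), -1)
    if index < 0:
        raise ValueError("the word contains no A with tilde")
    following = word[index + 1:index + 2].lower()
    after = word[index + 2:index + 3]  # empty string at the end of the word
    vowelless_waw = following == "w" and (after == "" or not is_vowel(after))
    return "straight AY" if vowelless_waw else "lengthened A"
```

With the examples of the vowel table, "eḫãw" is classified as straight AY and "kãmmạh" as lengthened A.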
Simplified YA replaces the characters YYẠ with one character Ặ in romanized hebrew, to make the text lighter to read.
Ȃ ȃ = 1) As the last letter of a preposition, represents a short A, and indicates that the first consonant of the next word should not be doubled, in a preposition that normally causes doubling of the next consonant: haylạdīm => hȃ yėlạdīm (expected form, with doubling: hayyėlạdīm => ha yėlạdīm).
2) As the last letter of a preposition, in the scenario "-ȃ ȧ-" represents a short A + end of preposition (without a space in traditional text styles, with a space in some modern text styles) + the next word begins with a vowelless ạlef (-á-), instead of the expected "-a ȧ-": e.g. laᵓdonī (masoretic) => lȃ Ȧdonī // la Ȧdonẹy (also masoretic).
    VOWEL     ẮYIN +    ĠĀĬN +
E   ONLY      VOWEL     VOWEL     COLLOQUIAL NAME          COLLOQUIAL IN HEBREW   FORMAL NAME IN HEBREW     EXAMPLES

   # Ɵ ɵ       ---       ---      neutral shwa             šwā nẹyṯrạlī           šwā nẹyṯrạlī              lɵ, wɵ, bɵgạdīm
   # Ḙ ḙ       ---       ---      weak shwa                šwā ḫallạš             šwā nạḫ                   lạmadḙtī, hạlḙķū
     Ė ė       ---       ---      strong shwa              šwā ḫạzạq              šwā nậ                    lė, wė, bėgạdīm
   # ƺ ɝ       ---       ---      shwa in place of A       šwā bi mėqōm Ā         šwā bi mėqōm pattạḫ ō qạmạź
                                                                                                            tivraḫ => tivrɝḫḁ
     Ȅ ȅ       ---       ---      shwa in place of weak E  šwā bi mėqōm Ê ḫallạš  šwā bi mėqōm segōl        tėlammedķem => tėlammȅdḁķem
   # ⱻ ɘ       ---       ---      shwa in place of strong E
                                                           šwā bi mėqōm Ê ḫạzạq   šwā bi mėqōm źẹreh        tėdabbẹr => tėdabbɘrḁ
   # ᵫ ɞ       ---       ---      shwa in place of weak O  šwā bi mėqōm Ō ḫallạš  šwā bi mėqōm qạmạź qạṱạn  tizkọrķạ => tizkɞrḁķạ
   # Ȼ ȼ       ---       ---      shwa in place of (basic) O
                                                           šwā bi mėqōm Ō (rạgīl) šwā bi mėqōm ḫōlem        tiķtǫv => tiķtȼvḁ
   # È è     # Ḕ ḕ     # ƹ ᶓ      rapid E                  Ê mạhīr                ḫȧṱaf segōl               èmet, leèķol
     E e     # Ḗ ḗ     # Ế ế      weak E                   Ê ḫallạš               segōl                     šemeš, ḗzrạh
     EH eh   # ḖH ḗh   # ḔH ḕh    trailing weak E          Ê ḫallạš nigrạr        segōl nigrạr              rōźeh, ṱōḗh
     Ȇ ȇ       ---       ---      tranquil E               Ê šạlẹw                źẹreh šạlẹw               lēlohīm => lȇ Èlohīm
     Ẹ ẹ     # Ệ ệ     # Ḝ ḝ      strong E                 Ê ḫạzạq                źẹreh                     kẹn, ệdūt
     ẸH ẹh   # ỆH ệh   # ḜH ḝh    trailing strong E        Ê ḫạzạq nigrạr         źẹreh nigrạr              qėźẹh
     Ẻ ẻ       ---       ---      notable weak E           Ê ḫallạš bōlẹṱ         segōl bōlẹṱ               Pẻrlḗ
     Ę ę     # Ề ề     # Ễ ễ      notable strong E         Ê ḫạzạq bōlẹṱ          źẹreh bōlẹṱ               ręš, bęrẹķ
     É é       ---       ---      solemn (weak) E          Ê (ḫallạš) rėźīnī      segōl rėźīnī              kelé, tėvạrénạh
     Ē ē       Ê ê     # Ể ể      full E                   Ê mạlē                 źẹreh mạlē                rōfē, rēšīt
     Ĕ ĕ       ---       ---      simplified YE            YÊ ḫallạš mėfuššạṱ     yōd-segōl mėfuššạṱ        Ṯurḵiyeh => Ṯurḵiĕh
     Ẽ ẽ       ---       ---      simplified full YYE      YYỆ mạlē mėfuššạṱ      yōd-dạgẹš-źẹreh mạlē mėfuššạṱ
                                                                                                            Dạniyyēl => Dạniẽl
Ȇ ȇ = as the last letter of a preposition: in the scenario "-ȇ è-" indicates a strong E, traditionally no word space, and the next word beginning with a vowelless ạlef (-ē-), instead of the expected "-e è-": e.g. lēlohīm (masoretic) => lȇ Èlohīm // le Èlohīm (nonmasoretic, the expected form). Without a word gap: lēmor (masoretic) => lȇmor // leèmor (nonmasoretic, the expected form).
    VOWEL     ẮYIN +    ĠĀĬN +
I   ONLY      VOWEL     VOWEL     COLLOQUIAL NAME          COLLOQUIAL IN HEBREW   FORMAL NAME IN HEBREW     EXAMPLES

     Ȳ ȳ       ---       ---      silent yōd               yōd šėqẹṱạh            yōd šėqẹṱạh               pạnạȳw
     Ɨ ɨ       ---       ---      crammed I                Ȋ dạḫūs                ḫīrīq dạḫūs               Yėrūšạlaɨm (in Tanakh)
     I i     # Ḯ ḯ     # ƪ ʝ      basic I                  Ȋ rạgīl                ḫīrīq                     hitrạḫẹź, ḯm
     Į į       ---       ---      I in place of shwa       Ȋ bi mėqōm šwā         ḫīrīq bi mėqōm šwā        wa tėhī => wa tįhī
     Ȋ ȋ       ---       ---      tranquil I               Ȋ šạlẹw                ḫīrīq šạlẹw               mīmīnī => mȋ yėmīnī
     Ỉ ỉ     # ɫ ᵼ     # ẛ ɉ      notable I                Ȋ bōlẹṱ                ḫīrīq bōlẹṱ               ḫỉnnūķ, ᵼttōn
     Í í       ---       ---      solemn I                 Ȋ rėźīnī               ḫīrīq rėźīnī              ríšōn
     Ī ī       Î î     # Ɉ ɟ      full I                   Ȋ mạlē                 ḫīrīq mạlē                šīrīm, mōîl
     Ĩ ĩ       ---       ---      supreme I                Ȋ muḫlạṱ               ḫīrīq muḫlạṱ              hĩ
     Ị ị       ---       ---      alephic yōd              yōd ȧlạfīt             yōd ȧlạfīt                Gāyủs => Gảịủs, rạdịỏ
     Ï ï       ---       ---      alephic IY               IY ȧlạfīt              ḫīrīq-yōd ȧlạfīt          Riyỏ => Rïỏ
     Ȉ ȉ       ---       ---      alephic IYY              IYY ȧlạfīt             ḫīrīq-yōd-dạgẹš ȧlạfīt    Priyyūs => Prȉủs
     Ĭ ĭ       ---       ---      simplified YI            YĪ mėfuššạṱ            yōd-ḫīrīq mėfuššạṱ        Ĭṡrạẹl, maĭm / yittẹn
     AẎ aẏ   # ẮẎ ắẏ   # ẦẎ ầẏ    yiddish AY               AY yīdī                yōd-yōd-pattạḫ yīdīt      Baẏlá
     EẎ eẏ   # ḖẎ ḗẏ   # ẾẎ ếẏ    yiddish EY               ḖY yīdī                yōd kėfūlạh yīdīt         Feẏgḗ
     OẎ oẏ   # ṒẎ ṓẏ   # ỔẎ ổẏ    yiddish OY               ŌY yīdī                wạw-yōd yīdīt             Ṱoẏbá
Silent yōd is unpronounced in the suffix -ạȳw in post-Tanakh era hebrew: yạmạȳw, pạnạȳw. (The Tanakh-era pronunciation was -ayū: yạmayū, pạnayū.) Also in some other rare scenarios in the masoretic Tanakh text: ḫoṱíȳm (1 Samuel 14:33), qoríȳm (Psalms 99:6). The MITT text styles prefer the ortography à ã instead of ạȳ: see "straight AY", which is described at vowel A. Also an ordinary yōd is understandable, but not recommended: yạmạyw, pạnạyw, ḫoṱíym, qoríym.
Alephic yōd is a written yōd that represents pronounced ạlef + I, or ạlef + I + ạlef (if preceded by a vowel or word break, and followed by a vowel), or I + ạlef (if followed by a vowel). The most common scenario is when the ortography of modern hebrew simplifies the ... ... ... foreign pronounced diphthong OU into a mere O. Another use scenario is in roman numerals: e.g. XṼỊỊ.
Gāyủs => Gảịủs, rạdịỏ
Alephic IY. Alephic IYY.
Iṯạlyạh => Iṯạlịảh (expected: Iṯạliah), Serbyạh => Serbịảh
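The context rule for alephic yōd can be sketched in code. The sketch below is only an illustration under stated assumptions, not the MITT conversion algorithm: the function names are hypothetical, a character counts as a vowel if its base letter is a, e, i, o or u, any non-letter counts as a word break, and the explicit output spelling with ᵓ is invented here to make the implied ạlef visible.

```python
import unicodedata

ALEF = "ᵓ"  # glottal stop symbol used for a written-out ạlef

def is_vowel(ch):
    """True if the base letter of ch (diacritics removed) is a, e, i, o or u."""
    if not ch:
        return False
    return unicodedata.normalize("NFD", ch)[0] in "aeiouAEIOU"

def spell_out_alephic_yod(text):
    """Replace each alephic yōd with plain I plus the ạlef(s) implied by context."""
    text = unicodedata.normalize("NFC", text)
    out = []
    for i, ch in enumerate(text):
        if ch not in ("ị", "Ị"):
            out.append(ch)
            continue
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        # ạlef before the I: preceded by a vowel or a word break (assumption:
        # start of text or any non-letter is a word break)
        before = (not prev.isalpha()) or is_vowel(prev)
        # ạlef after the I: followed by a vowel
        after = is_vowel(nxt)
        plain = "I" if ch == "Ị" else "i"
        out.append((ALEF if before else "") + plain + (ALEF if after else ""))
    return "".join(out)
```

For example, `spell_out_alephic_yod("rạdịỏ")` yields I + ạlef (consonant before, vowel after), while `spell_out_alephic_yod("Gảịủs")` yields ạlef + I + ạlef.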
Ȋ ȋ = as the last letter of a preposition:
WITH PREFIX MI- AND WORD-INITIAL Y WITH SHWA: in the scenario "-ȋ yė-" indicates vowel I, traditionally no word space, and the next word beginning with a vowelless nondoubled yod: e.g. mīmīnī (masoretic) => mȋ yėmīnī // mi yėmīnī (nonmasoretic, expected form with doubled first yod)
WITH PREFIX MI- AND SOME OTHER WORD-INITIAL CONSONANT WITH SHWA: midvạreyķạ (nonmasoretic, theoretical example, not in use) => mȋ dėvạreyķạ // mi dėvạreyķạ (masoretic, expected doubled first consonant)
WITH PREFIX (OTHER THAN MI-) AND WORD-INITIAL Y WITH SHWA: līrūšạlẹm (masoretic) => lȋ Yėrūšạlẹm // li Yėrūšạlẹm (nonmasoretic, with shwa under yod) // le Yėrūšạlẹm (untraditional, modern colloquial)
WITH PREFIX (OTHER THAN MI-) AND SOME OTHER WORD-INITIAL CONSONANT WITH SHWA: lidvạray (masoretic) => li dėvạray // lȋ dėvạray (indication to have no shwa under D, nonmasoretic and ortographically questionable, understood but not preferred by the text conversion algorithms)
    VOWEL     ẮYIN +    ĠĀĬN +
O   ONLY      VOWEL     VOWEL     COLLOQUIAL NAME          COLLOQUIAL IN HEBREW   FORMAL NAME IN HEBREW     EXAMPLES

   # Ɔ ɔ       ---       ---      round shwa               šwā ậgol               šwā ậgol                  (bɔqạrīm)
     Ȯ ȯ     # Ờ ờ     # Ṑ ṑ      rapid O                  Ō mạhīr                ḫȧṱaf qạmạź               mọḫȯrạtaĭm
   # ʘ ⱺ     # Ȏ ȏ     # ⱷ ᴒ      notable rapid O          Ō mạhīr bōlẹṱ          ḫȧṱaf qạmạź bōlẹṱ         ḫⱺdạšīm
     Ọ ọ     # Ộ ộ     # Ǿ ǿ      weak O                   Ō ḫallạš               qạmạź qạṱạn               kọl, Ộmrī
     O o     # Ṓ ṓ     # Ổ ổ      basic O                  Ō rạgīl                ḫōlem                     Mošeh
     OH oh   # ṒH ṓh   # ỔH ổh    trailing O               Ō nigrạr               ḫōlem nigrạr              ẹyfoh
     Ǫ ǫ       ---       ---      notable weak O           Ō ḫallạš bōlẹṱ         qạmạź qạṱạn bōlẹṱ         źǫhȯraĭm
     Ỏ ỏ     # Ơ ơ     # ɸ ᴓ      notable O                Ō bōlẹṱ                ḫōlem bōlẹṱ               kạtỏm
     Ó ó       ---       ---      solemn O                 Ō rėźīnī               ḫōlem rėźīnī              ló
     Ō ō       Ô ô     # Ỗ ỗ      full O                   Ō mạlē                 ḫōlem mạlē                rōźeh
     Õ õ       ---       ---      supreme O                Ō muḫlạṱ               ḫōlem muḫlạṱ              yạvõ
     Ŏ ŏ       ---       ---    * simplified YYO           YYŌ mėfuššạṱ           yōd-dạgẹš-ḫōlem mạlē mėfuššạṱ
                                                                                                            Źiyyōn => Źiŏn
    VOWEL     ẮYIN +    ĠĀĬN +
U   ONLY      VOWEL     VOWEL     COLLOQUIAL NAME          COLLOQUIAL IN HEBREW   FORMAL NAME IN HEBREW     EXAMPLES

     U u     # Ṹ ṹ     # Ừ ừ      basic U                  Û rạgīl                qubbūź                    (kutnạh)
     Ủ ủ     # Ử ử     # Ứ ứ      notable U                Û bōlẹṱ                qubbūź bōlẹṱ              mủšlạm
     Ú ú       ---       ---      solemn U                 Ū rėźīnī               qubbūź rėźīnī             [hú]
     Ū ū       Û û     # Ṻ ṻ      full U                   Û mạlē                 šūrūq [mạlē]              ḫạšūv
     Ũ ũ       ---       ---      supreme U                Ū muḫlạṱ               šūrūq muḫlạṱ              hũ
     Ṷ ṷ       ---       ---      masoretic U              Ū mạsọrtī              šūrūq mạsọrtī             wu => ṷ
     Ụ ụ       ---       ---      alephic wạw              wạw ȧlạfīt             wạw ȧlạfīt                w => ụ: maụsỏlỉụm
     Ų ų       ---       ---      yiddish U                Ū yīdī                 wạw yīdīt                 Rųsląnd, Qųbą
     Ŭ ŭ       ---       ---    * simplified YYU           YYŪ mėfuššạṱ           yōd-dạgẹš-šūrūq mėfuššạṱ  heḫlẹṱiŭt
Masoretic U is a more stylish way to romanize the word "and", when its vowel is "u". -- ṷ and wu are synonyms, and behave similarly in transliteration. In this pronunciation the consonantal wạw, which orthographically begins the word, is left unpronounced.
Alephic wạw is a written wạw that represents pronounced ạlef + U (of any length), or ạlef + U + ạlef (if preceded by a vowel or word break, and followed by a vowel), or U + ạlef (if followed by a vowel). The most common scenario is when the ortography of modern hebrew simplifies the foreign pronounced diphthong OU into a mere O: e.g. Oakland = Ōqland => Ǫủḵland (Ǫaḵland), Superbowl = Sủṕerbōl / however: show = šỏủ, Crow = Qrỏủ => Crỏủ. In rare cases the foreign pronounced diphthong AU is simplified into a mere O: mōsỏlỉụm (expected: mawsỏlỉụm) => maụsỏlỉụm.
Yiddish U is written with a vowelless wạw in the native script.
Columns in this table:
The first column in this table shows the romanization of the vowel in most circumstances.
The second column shows the romanization after an ắyin that is not a distinctive glottal stop. In these forms the typical marker of ắyin, a grave or circumflex accent, is added to the basic form, to the extent that is practically possible in the standardized Unicode characters. This avoids having a glottal stop symbol ˁ in the text when no glottal stop is pronounced, and perhaps makes the text a bit lighter to read. It is acceptable to write ˁ + the basic vowel form instead of the symbols in the second column.
The third column shows the romanization after an ắyin that is not a distinctive glottal stop, in a word root in which the ắyin was originally ġāĭn until approximately 200 BCE (e.g. in Gomorrah = Ắmōrạh, Gaza = Ầzzạh). This is a very theoretical etymological purpose, which is not expected to affect how most people pronounce the word in modern speech. In these forms the typical marker of ġāĭn, a turned comma above the vowel, is added to the basic form. Not many such characters are available in the Unicode standard, so these characters require a customized special font to look attractive. It is acceptable to write ᶝ + the basic vowel form instead of the symbols in the third column.
The colloquial names of each vowel in english and hebrew are terminology that is introduced in this document for the first time anywhere. The traditional names of vowels are given in the sixth column, with some added clarifying words. The last column gives some examples of circumstances in which the vowel form occurs in hebrew language.
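The column descriptions above allow writing ˁ or ᶝ + the basic vowel form instead of the combined symbols of the second and third column. The sketch below shows how such an expansion could look. It is not the MITT conversion algorithm: the names ROWS, TABLE and expand_markers are hypothetical, and only a subset of rows from the E, I, O and U tables is included (a missing third-column form is marked with None).

```python
import unicodedata

AYIN = "ˁ"    # written before a basic vowel instead of a second-column form
GHAYIN = "ᶝ"  # written before a basic vowel instead of a third-column form

# (basic vowel, form after ắyin, form after ġāĭn), lowercase only.
# Subset of the table rows; None = row not covered by this sketch.
ROWS = [
    ("e", "ḗ", "ḕ"),
    ("ẹ", "ệ", "ḝ"),
    ("ē", "ê", "ể"),
    ("i", "ḯ", None),
    ("ī", "î", None),
    ("ọ", "ộ", "ǿ"),
    ("o", "ṓ", "ổ"),
    ("ō", "ô", "ỗ"),
    ("u", "ṹ", "ừ"),
    ("ū", "û", "ṻ"),
]

def _build_table():
    """Map every combined character (both cases) to marker + basic vowel."""
    table = {}
    for basic, ayin_form, ghayin_form in ROWS:
        basic = unicodedata.normalize("NFC", basic)
        for marker, form in ((AYIN, ayin_form), (GHAYIN, ghayin_form)):
            if form is None:
                continue
            form = unicodedata.normalize("NFC", form)
            table[form] = marker + basic
            table[form.upper()] = marker + basic.upper()
    return table

TABLE = _build_table()

def expand_markers(text):
    """Rewrite combined ắyin/ġāĭn vowels as marker + basic vowel."""
    text = unicodedata.normalize("NFC", text)
    return "".join(TABLE.get(ch, ch) for ch in text)
```

With the examples from the tables, `expand_markers("ḯm")` gives ˁ + "im" and `expand_markers("mōîl")` gives "mō" + ˁ + "īl". The input is normalized to NFC first, so that precomposed and decomposed spellings of the same letter are treated alike.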
+ Alternative (duplicate) characters:
OBSOLETE INFORMATION:
Ợ ợ >= Ạ ạ, Ồ ồ >= Ậ ậ, Ở ở >= Ấ ấ = alternative characters for long A, which emphasize the pronunciation "o" in tiberian hebrew since ca. 600 CE, and in ashkenazi hebrew since ca. 1350 CE.
Ơ ơ >= Ạ ạ, Ȭ ȭ >= Ậ ậ, Ṏ ṏ >= Ấ ấ = secondary variants for long A, which emphasize the tiberian and ashkenazi pronunciation "o", without using a dot below (which should emphasize that the vowel is long).
Ȫ ȫ >= Ā ā, Ố ố >= Â â, Ớ ớ >= Ǟ ǟ = alternative characters for full A, which emphasize the tiberian and ashkenazi pronunciation "o".
Ǭ ǭ >= Ʌ ᶏ = alternative characters for notable long A, which emphasize the tiberian and ashkenazi pronunciation "o".
Ỡ ỡ >= Ã ã = alternative characters for lengthened A and straight AY, which emphasize the tiberian and ashkenazi pronunciation "o".
Ṍ ṍ >= Ặ ặ = alternative characters for simplified YA, which emphasize the tiberian and ashkenazi pronunciation "o".
# Modified glyphs in MITT fonts:
The symbol # in the table indicates the presence of a letter that should ideally have a different design than it has in a typical font.
     ˁ + rapid A            Ằ ằ => A a with grave and dot above.
     ᶝ + rapid A            
     ashk. lengthened A     Ỡ ỡ => oval O o with tail, with tilde above.
     ˁ + long A             Ậ ậ => A a with grave above and dot below.
     ᶝ + long A             
     ashk. long A           Ợ ợ => oval O o with tail, with dot below.
     ashk. ˁ + long A       Ồ ồ => oval O o with tail, with grave above and dot below.
     ashk. ᶝ + long A       Ở ở => oval O o with tail, with reversed comma above and dot below.
     ashk. long A 2         Ơ ơ => oval O o with tail.
     ashk. ˁ + long A 2     Ȭ ȭ => oval O o with tail, with grave above.
     ashk. ᶝ + long A 2     Ṏ ṏ => oval O o with tail, with reversed comma above.
     notable short A        Ʌ ᶏ => A a with hook in the left bottom corner.
     ashk. notable long A   Ǭ ǭ => oval O o with tail, with hook in the right bottom corner.
     ashk. simplified YA    Ṍ ṍ => oval O o with tail, with breve above and dot below.
     ashk. full A           Ȫ ȫ => oval O o with tail, with macron above.
     ashk. ˁ + full A       Ố ố => oval O o with tail, with circumflex above.
     ashk. ᶝ + full A       Ớ ớ => oval O o with tail, with acute and reversed comma above (merged into one diacritical mark).
     ˁ + strong E           Ệ ệ => E e with grave above and dot below.
     ᶝ + strong E           
     ˁ + rapid O            Ờ ờ => O o with grave and dot above.
     ᶝ + rapid O            Ṑ ṑ => 
     ˁ + weak O             Ộ ộ => O o with grave above and dot below.
     ᶝ + weak O             Ổ ổ => 
Simplified vowels, in HE-RZ-S and HE-NT-S:
All lengths of the same vowel are written in a similar way, generally without diacritical marks. Šwā is grouped with the E vowels, and written as "E" in romanized text (but as strong E in native script). Long A retains its diacritical mark, a dot below (as this vowel is not "A" in ashkenazi pronunciation, and the length of vowel A often has a significant impact on the meaning of words). Vowels also retain those diacritical marks that indicate the presence of some consonant in hebrew script, including mater lectionis.
                    WITH    WITHOUT
                    ẮĬN     ẮĬN
A
     rapid A:       Ắ ắ     A a
     added A:       ---     Ɐ ⱥ
     short A:       Ắ ắ     A a
     long A:        Ậ ậ     Ạ ạ
                            Å å   -- alternative characters (without ắyin), to emphasize the tiberian and ashkenazi pronunciation "o".
     full A:        Â â     Ā ā
E
     weak šwā:      ---     E e   -- usually neither written nor pronounced.
     strong šwā:    ---     E e
     rapid E:       ---     E e
     weak E:        Ḗ ḗ     E e
     strong E:      Ḗ ḗ     E e
     full E:        Ê ê     Ē ē
I
     basic I:       Ḯ ḯ     I i
     full I:        Î î     Ī ī
O
     rapid O:       Ṓ ṓ     O o
     weak O:        Ṓ ṓ     O o
     basic O:       Ṓ ṓ     O o
     full O:        Ô ô     Ō ō
U
     basic U:       Ṹ ṹ     U u
     full U:        Û û     Ū ū
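The simplification table above can be read as a character mapping. The sketch below illustrates that reading for a few rows only. It is an assumption-laden example, not the MITT algorithm: the names SIMPLIFY and simplify are hypothetical, and the precise-style characters ȧ (rapid A) and ė (strong šwā) are inferred from their use elsewhere in this document. Long A (ạ) and the full vowels (ā ē ī ō ū) are left untouched, because the table keeps their marks.

```python
import unicodedata

# Precise vowel -> simplified vowel, lowercase only.
SIMPLIFY = {
    "ȧ": "a",  # rapid A (assumed character)
    "ė": "e",  # strong šwā, grouped with the E vowels (assumed character)
    "ẹ": "e",  # strong E
    "ȯ": "o",  # rapid O
    "ọ": "o",  # weak O
    # forms after ắyin collapse to the one simplified ắyin form per vowel
    "ệ": "ḗ",  # ắyin + strong E
    "ờ": "ṓ",  # ắyin + rapid O
    "ộ": "ṓ",  # ắyin + weak O
}

def _with_uppercase(mapping):
    """Normalize the keys and values and add the uppercase pairs."""
    table = {}
    for src, dst in mapping.items():
        src = unicodedata.normalize("NFC", src)
        dst = unicodedata.normalize("NFC", dst)
        table[src] = dst
        table[src.upper()] = dst.upper()
    return table

TABLE = _with_uppercase(SIMPLIFY)

def simplify(text):
    """Apply the simplified-vowel mapping; unknown characters pass through."""
    text = unicodedata.normalize("NFC", text)
    return "".join(TABLE.get(ch, ch) for ch in text)
```

For example, `simplify("kẹn")` returns "ken" and `simplify("kọl")` returns "kol", while "rōfē" is returned unchanged because full vowels keep their macron.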
Some foreign vowels and consonants:
Below is a guide for transliterating foreign proper nouns from latin script to hebrew script literally, letter by letter -- regardless of the language, or how the word is pronounced. Standards HE-NT-P and HE-NT-C use the primary variant only, which is not in parentheses. Standard HE-NT-S allows some ambiguity and allows the pronunciation to be taken into consideration; its acceptable alternatives are listed in parentheses, using the most basic hebrew alphabet only. Standards HE-NT-P and HE-NT-C do not use any mater lectionis in foreign proper nouns: letters ạlef, wạw or yōd are added only when it is absolutely necessary technically. All the other hebrew-script standards use mater lectionis in foreign proper nouns quite extensively.
A a = ַ
B b = בּ‎
C c = כ֘ (צ or ס or ק)
D d = ד
E e = ֶ
F f = פ
G g = ג
H h = ה
I i = ִ
J j = י֘ or (י)
K k = כּ or (ק)
L l = ל
M m = מ
N n = נ
O o = ֹ
P p = פּ‎
Q q = ק or (ק or )
R r = ר
S s = ס
T t = ת or (ט)
U u = ֻ
V v = ב
W w = ו or (וו)
X x = ק֘ or (קס)
Y y = י
Z z = ז or (ז or צ)
Ä ä = *
Å å = אָֹ (A and O crammed on the same consonant)
Ö ö = *
Ü ü = *
* Hebrew accent gershayim ֞ is used as umlaut, together with a hebrew vowel mark.
Unicode names of the vowel characters in this list: A = hebrew point patah, E = hebrew point segol, I = hebrew point hiriq, O = hebrew point holam, U = hebrew point qubuts.
Accent marks above a letter:
acute = HEBREW ACCENT QADMA ֨ ת֨
grave = HEBREW ACCENT GERESH ֜ צ֜מ֜נֿ֜בֻ֜
umlaut = HEBREW ACCENT GERSHAYIM ֞ צָ֞נֻ֞בֹ֞
circumflex = HEBREW ACCENT OLE ֫ צָ֫נֻ֫ב֫
caron = HEBREW POINT JUDEO-SPANISH VARIKA ﬞ צﬞבָﬞצ
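The letter-by-letter guide and the accent list above can be sketched as a small converter. This is only an illustration under assumptions, not the HE-NT-P algorithm: the function names are hypothetical, only the unambiguous primary variants are covered (C, J, X and Å raise an error), final letter forms are ignored, and a vowel that has no preceding consonant is given an ạlef as carrier, which is one reading of "added only when it is absolutely necessary technically". Characters are fetched by their Unicode names to avoid typing right-to-left text in the code.

```python
import unicodedata

def h(name):
    """Look up a Hebrew character by its Unicode name (without 'HEBREW ')."""
    return unicodedata.lookup("HEBREW " + name)

DAGESH = h("POINT DAGESH OR MAPIQ")
ALEF = h("LETTER ALEF")

CONSONANTS = {
    "b": h("LETTER BET") + DAGESH, "d": h("LETTER DALET"), "f": h("LETTER PE"),
    "g": h("LETTER GIMEL"), "h": h("LETTER HE"), "k": h("LETTER KAF") + DAGESH,
    "l": h("LETTER LAMED"), "m": h("LETTER MEM"), "n": h("LETTER NUN"),
    "p": h("LETTER PE") + DAGESH, "q": h("LETTER QOF"), "r": h("LETTER RESH"),
    "s": h("LETTER SAMEKH"), "t": h("LETTER TAV"), "v": h("LETTER BET"),
    "w": h("LETTER VAV"), "y": h("LETTER YOD"), "z": h("LETTER ZAYIN"),
}
VOWELS = {
    "a": h("POINT PATAH"), "e": h("POINT SEGOL"), "i": h("POINT HIRIQ"),
    "o": h("POINT HOLAM"), "u": h("POINT QUBUTS"),
}
ACCENTS = {
    "\u0301": h("ACCENT QADMA"),                # combining acute
    "\u0300": h("ACCENT GERESH"),               # combining grave
    "\u0308": h("ACCENT GERSHAYIM"),            # combining diaeresis (umlaut)
    "\u0302": h("ACCENT OLE"),                  # combining circumflex
    "\u030c": h("POINT JUDEO-SPANISH VARIKA"),  # combining caron
}

def to_hebrew(word):
    """Transliterate one latin-script word letter by letter."""
    out = []
    need_carrier = True  # no consonant yet that could carry a vowel point
    for ch in unicodedata.normalize("NFD", word.lower()):
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            need_carrier = False
        elif ch in VOWELS:
            if need_carrier:
                out.append(ALEF)  # assumption: ạlef carries the vowel point
            out.append(VOWELS[ch])
            need_carrier = True   # a directly following vowel needs its own carrier
        elif ch in ACCENTS:
            out.append(ACCENTS[ch])
        else:
            raise ValueError("letter not covered by this sketch: %r" % ch)
    return "".join(out)
```

For example, `to_hebrew("Eklund")` starts with ạlef + segol, because the word-initial E has no consonant to sit on, and `to_hebrew("Bö")` produces bẹt with dagesh, followed by holam and the gershayim accent.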
In rare situations the writer of unvowelized text may want to indicate a vowel, if this is necessary for clarifying an ambiguous expression. The text conversion algorithms remove all vowels from an unvowelized text style, unless a vowel is indicated to be displayed also in unvowelized text. This can be done with character ˼ after any vowel, or with subscript small A, E, I, O or U (ₐ ₑ ᵢ ₒ ᵤ) after any vowel (but preferably after the same vowel): e.g. E˼klund, Eₑklund.
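The effect of the keep-vowel markers can be illustrated with a simplified sketch. It is not the real conversion algorithm: it works on a latin-script string instead of producing hebrew script, it knows only the five plain vowels a, e, i, o, u, and the function name unvowelize is hypothetical.

```python
KEEP_MARK = "˼"
SUBSCRIPT_VOWELS = "ₐₑᵢₒᵤ"
PLAIN_VOWELS = "aeiouAEIOU"

def unvowelize(text):
    """Drop plain vowels, except those followed by a keep marker."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in PLAIN_VOWELS:
            if nxt and (nxt == KEEP_MARK or nxt in SUBSCRIPT_VOWELS):
                out.append(ch)  # marked vowel: keep it, consume the marker
                i += 2
            else:
                i += 1          # unmarked vowel: remove it
            continue
        out.append(ch)
        i += 1
    return "".join(out)
```

With the example above, `unvowelize("Eklund")` drops both vowels, while `unvowelize("E˼klund")` and `unvowelize("Eₑklund")` keep the initial E.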
Letters of the hebrew alphabet:
HEB > LAT : SPECIAL CASES
א > ᵙ ᵓ Ạlef is written as ᵓ only when it is an emphasized glottal stop after a consonant: tarᵓeh (you show). When a vowel (without ắyin or ġāĭn) begins the word, or is after another vowel, ạlef is logically present before the vowel, but the ạlef is not written in transliterated text. (Theoretically an ạlef can always be written, however, if the author of text wants to do so: "im" / "ᵓim", "rạạh" / "rạᵓạh" and "bā" / "bạᵓ" are synonyms, and behave similarly in transliteration to HE-NT-U etc.)
-- Modified glyphs in MITT fonts: ᵙ => ᵓ, ᵓ => positioned lower than usual, with its top edge level with the top edge of "e".
˻ = No ạlef marker. This character prevents the automatic inclusion of ạlef before a word-initial or solitary vowel (according to the rule that was explained above). If the romanized text "ī" is converted into hebrew script, it produces ạlef + short I (ḫīrīq) + yōd. However, the romanized text "˻ī" produces only short I (riding on a space character) + yōd.
Í í Ó ó Ú ú = vowelless ạlef after i, o or u without mater lectionis: ríšōn (first), ló (not)
Ĩ ĩ Õ õ Ũ ũ Ẃ ẃ Ý ý Ŕ ŕ = vowelless ạlef after yōd, wạw or rẹš: hĩ (she), bõ (come), hũ (he), šạẃ (vain), gẹý (valley of), wa yaŕ (and he saw). NOTE: These are synonyms, and behave similarly in transliteration: hĩ / hīᵓ, ló / loᵓ, šạẃ / šạwᵓ, gẹý / gẹyᵓ, wa-yaŕ / wa-yarᵓ.
Ɂ ɂ = full ạlef, for the theoretical scenario that a word-final vowelless ạlef is after some other vowelless consonant than W, Y or R (for which the orthography Ẃ ẃ, Ý ý, Ŕ ŕ is available): e.g. šạẃ >= šạwɂ, gẹý >= gẹyɂ, wa yaŕ >= wa yarɂ. This character is for romanized hebrew only: native script would use an ordinary ạlef.
Á á = short a + vowelless ạlef: e.g. liqrát, ḫaṱṱátķem
Ā ā = long ạ + vowelless ạlef = long "a", a common word ending
Ạₗ ạₗ = long Ạ + meteg (qạmạź ḫạzạq), a traditional way to indicate that a long Ạ is not a weak Ọ. (This is not necessary to indicate in any romanized text style, and is usually not indicated in native script either. This encoding is available in romanized text styles to clarify that native script should indicate this.)
Ȃ ȃ = 1) As the last letter of a preposition, represents a short A, and indicates that the first consonant of the next word should not be doubled, in a preposition that normally causes doubling of the next consonant: haylạdīm => hȃ yėlạdīm (expected form, with doubling: hayyėlạdīm => ha yėlạdīm). 2) As the last letter of a preposition, in the scenario "-ȃ ȧ-" represents a short A + end of preposition (without a space in traditional text styles, with a space in some modern text styles) + the next word begins with a vowelless ạlef (-á-), instead of the expected "-a ȧ-": e.g. laᵓdonī (masoretic) => lȃ Ȧdonī // la Ȧdonẹy (also masoretic). => Modified glyphs in MITT fonts: Ȃ ȃ => use a less strongly curved inverted breve, to be more clearly different from circumflex  â in small text size.
 â = ắyin + ạ + vowelless ạlef
Ɐ ⱥ = An additional short A vowel, which is added (since ca. 600 CE in masoretic hebrew, and in all later dialects of hebrew) before a word-final ḫẹt or hard / double hē, when the preceding vowel is other than A (of any length): gạvōⱥɦ, rūⱥḫ. In hebrew script ắyin also behaves in the same way, but in these romanized text styles that case is written as in arabic script, so that the additional A is after / under the ắyin, not before it: higgīắ. (Theoretically this could also be written as -ⱥˁ: higgīⱥˁ. The transliteration algorithms would handle it correctly, but this method of writing is never preferred by these text styles.) => in hebrew script: ֭ א֭
Ą ą = Grammatically or traditionally short "a" without ạlef, which is advised to be written with ạlef in unvowelized modern hebrew.
Ʌ ᶏ = Grammatically or traditionally long "a" without ạlef, which is advised to be written with ạlef in unvowelized modern hebrew. => Modified glyphs in MITT fonts: Ʌ ᶏ => "A" "a" with hook in the left bottom corner, and dot below.
Ǭ ǭ = Alternative characters for Ʌ ᶏ, which emphasize the tiberian and ashkenazi pronunciation "o". => Modified glyphs in MITT fonts: Ǭ ǭ => oval O o with tail, with hook in the right bottom corner.
Ả ả = Vowel A that is in reality pronounced short, but according to traditional ortography should be written as long: Yėrūšạlẹm => Yėrūšảlẹm, Banglạdeš => Banglảdeš. => in hebrew script: short A + meteg [masoretic text uses short A + meteg e.g. in laᵓdonī == la Ȧdonī]
Į į = replacement for šwā in traditional grammar, e.g. wa tėhī => wa tįhī (or wa tįhỷã).
Ɨ ɨ = crammed I: in the masoretic Bible texts, a second vowel between two consonants (which is normally not possible): Yėrūšảlaɨm. In hebrew script, simply an ordinary I directly after another vowel point.
Ē ē = ẹ + vowelless ạlef = long "e", a common word ending
Ê ê = ắyin + ẹ + vowelless ạlef
Ȇ ȇ = as the last letter of a preposition: in the scenario "-ȇ è-" indicates a strong E, traditionally no word space, and the next word beginning with a vowelless ạlef (-ē-), instead of the expected "-e è-": e.g. lēlohīm (masoretic) => lȇ Èlohīm // le Èlohīm (nonmasoretic, the expected form). => Modified glyphs in MITT fonts: Ȇ ȇ => use a less strongly curved inverted breve, to be more clearly different from circumflex Ê ê in small text size.
Ḙ ḙ = 1) weak šwā, usually neither written nor pronounced by most people: e.g. lạmadḙti, hạlḙķū, hạyḙtạh. 2) in the scenario "-eḙ-" indicates a different ortography "-eè-" for what is traditionally written as Ē ē in some specific words only (e.g. in Tanakh): lēmor (masoretic) => leḙmor // leèmor (nonmasoretic in this verb), lēķol (nonmasoretic in this verb) => leḙķol // leèķol (masoretic) -- other masoretic examples: rēšīt (not reḙšīt), be èmet, be Èdōm, mẹ Èlohay
Ę ę = two-dot ẹ (źẹrẹh), which is advised to be written with yōd (ẹy) in unvowelized modern hebrew
Ẻ ẻ = three-dot e (segōl), which is advised to be written with ắyin (eˁ) in hebrew script -- a common practice in proper nouns among yiddish-speakers
Symbols for indicating these spelling variants in hebrew script text HE-NT-P:
Ą ą Ʌ ᶏ Ę ę Ẻ ẻ => meteg under the ạlef, ắyin or yōd, otherwise normal
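The shorthand letters for a vowelless ạlef (Í í Ó ó Ú ú, Ĩ ĩ Õ õ Ũ ũ, Ẃ ẃ Ý ý Ŕ ŕ) are described above as synonyms of the explicit spelling with ᵓ. A conversion step could make use of this by first expanding the shorthand. The sketch below shows one possible way; the names are hypothetical, and the pairs õ / ōᵓ and ũ / ūᵓ are assumed by analogy with the listed pair hĩ / hīᵓ.

```python
import unicodedata

ALEF = "ᵓ"

# shorthand letter -> letter that precedes the vowelless ạlef (lowercase)
SHORTHAND = {
    "í": "i", "ó": "o", "ú": "u",  # after i, o, u without mater lectionis
    "ĩ": "ī", "õ": "ō", "ũ": "ū",  # after yōd or wạw used as mater lectionis
    "ẃ": "w", "ý": "y", "ŕ": "r",  # after consonantal wạw, yōd or rẹš
}

def _build_table():
    """Normalize the mapping and add the uppercase shorthand letters."""
    table = {}
    for short, plain in SHORTHAND.items():
        short = unicodedata.normalize("NFC", short)
        plain = unicodedata.normalize("NFC", plain)
        table[short] = plain + ALEF
        table[short.upper()] = plain.upper() + ALEF
    return table

TABLE = _build_table()

def expand_alef(text):
    """Rewrite the shorthand letters as plain letter + ᵓ."""
    text = unicodedata.normalize("NFC", text)
    return "".join(TABLE.get(ch, ch) for ch in text)
```

For example, `expand_alef("hĩ")` gives the explicit form with ī + ᵓ, and `expand_alef("wa-yaŕ")` gives the form ending in r + ᵓ, matching the synonym pairs listed above.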
ב > B b V v = softer pronunciation, after a vowel
Ṽ ṽ = foreign letter V. => modified glyphs in MITT fonts: Ṽ ṽ => V v with macron above
Ḇ ḇ = foreign letter B.
Ḃ ḃ = bẹt after a vowel in a proper noun from the Tanakh era (whose post-Tanakh era softening might be deemed historically incorrect) => bẹt + ??? -- Modified glyphs in MITT fonts: Ḃ ḃ => B b with an open center area and open top edge, which has some visual similarities with letter V v.
Symbols for indicating these spelling variants in hebrew script text HE-NT-P:
V v => bẹt + rạfeh
Ṽ ṽ => bẹt + zarqā
Ḇ ḇ => bẹt + rėvīắ
ג > G g Ģ ģ = softer pronunciation after a vowel, used since approximately 0 BC (+/- 300 years), not any more in mainstream western pronunciation -- Modified glyphs in MITT fonts: Ģ => G with hook below, ģ => g with horn in top right corner
Ḡ ḡ = foreign letter G (regardless of pronunciation, which can be G, DZ, etc.).
Ǧ ǧ = foreign sound DZ, e.g. in foreign proper nouns in modern hebrew
˴G ˴g = foreign sound DZ [Ǧ ǧ]
˙G ˙g = arabic letter ġayn [Ġ ġ], third variant
Symbols for indicating these spelling variants in hebrew script text HE-NT-P:
Ģ ģ => gīmẹl + rạfeh
Ǧ ǧ => gīmẹl + zarqā
Ḡ ḡ => gīmẹl + rėvīắ
˴G ˴g => gīmẹl + gereš
˙G ˙g => gīmẹl + judeo-arabic rėvīắ [MITT: non-combining ˙]
ד > D d Ð ƌ = softer pronunciation after a vowel, used since approximately 0 BC (+/- 300 years), not any more in mainstream western pronunciation. Not recommended for representing the foreign sound TH (see Ḓ ḓ), despite the similarity of pronunciation. -- Modified glyphs in MITT fonts: ƌ => the top bar is half narrower than usual. In serif fonts the vertical bar preferably bends to the left, without a sharp angle in the right top corner.
Ḓ ḓ = foreign sound voiced TH (as in "this"), e.g. in arabic letter ḓāl. Not recommended for representing the post-Tanakh era softer pronunciation of hebrew dạlet after a vowel (see Ð ƌ), despite the similarity of pronunciation.
Ḏ ḏ = foreign letter D.
˴D ˴d = foreign sound voiced TH (as in "this") [Ḓ ḓ]
˙D ˙d = arabic letter ḓāl [Ḓ ḓ]
Symbols for indicating these spelling variants in hebrew script text HE-NT-P:
Ð ƌ => dạlet + rạfeh
Ḓ ḓ => dạlet + zarqā
Ḏ ḏ => dạlet + rėvīắ
˴D ˴d => dạlet + gereš
˙D ˙d => dạlet + judeo-arabic rėvīắ [MITT: non-combining ˙]
ה > H h Ҥ ɦ = the recommended way to write double hh, in most text styles. -ⱯҤ -ⱥɦ = -ahh with "added A" (at the end of a word). -- Modified glyph in MITT fonts: Ҥ => "H", whose right vertical pillar has the shape of "ſ" (latin small letter long S).
ו > W w Ō ō = o + wạw = long "o", a more stylish way to write "ow"
Ū ū = u + wạw = long "u", a more stylish way to write "uw"
Ô ô = ắyin + o + wạw, a more stylish way to write "ˁow"
Û û = ắyin + u + wạw, a more stylish way to write "ˁuw"
Ṷ ṷ = a more stylish way to write the word "and", when its vowel is "u": ṷ and wu are synonyms, and behave similarly in transliteration.
WȨ wȩ = Primary replacement of WĖ wė in customized disambiguated spelling of a consecutive future verb, to indicate that the traditional word form would have WĖ wė, which is changed into WȨ wȩ in some text styles: e.g. wė lạmdū ("and he called") => wȩ lạmdū.
Ṵ ṵ = Secondary replacement of WĖ wė in customized disambiguated spelling of a consecutive future verb, to indicate that the traditional word form would have WĖ wė, which is changed into Ṵ ṵ in some text styles: e.g. wė lạmdū ("and he called") => ṵ lạmdū.
Ỏ ỏ = vowel O that is pronounced short or without mater lectionis in traditional ortography, which is advised to be written with wạw (ow = ō) in unvowelized modern hebrew: yiķtỏv, rạdịỏ, sṯūdịỏ. => in hebrew script: ֭ ו֭
Disambiguation of a lone W with dagesh in native script: usually it means wạw + dagesh marking vowel U (which is a common form of the word "and", ṷ). In the rare case that someone wants to write a word that contains the letters WW only, and nothing else, this would also be written wạw + dagesh in native script. To disambiguate this from ṷ, a silent šwā must be marked above the letter in native script: וּ֔
Ủ ủ = grammatically or traditionally short u, which is advised to be written with wạw (uw = ū) in unvowelized modern hebrew: plủs, Pontïủs.
Ų ų = yiddish U, which is written with vowelless wạw in native script
Ẇ ẇ = yiddish ligature WW, acts in the role of a single W sound
Ẁ ẁ = variant of W used in some disambiguated consecutive verb forms, e.g. wa ȧmartem => ẁạ ȧmartem, wė ạmar => ẁȧ ạmar.
Ẉ ẉ = variant of W used in some disambiguated or casual spellings, whose traditional form begins with ṷ: e.g. ṷ bạrūķ => ẉe bạrūķ, ṷ bẹraķtīķạ => ẉe bẹraķtīķạ / ẉạ bẹraķtīķạ / generally any "and" ṷ => ẉạ for historical correctness, to avoid starting with a vowel a word that does not begin with ạlef [or ắyin].
OẎ oẏ = yiddish ligature WY, pronounced "oy"
Ụ ụ = alephic waw, rare in hebrew, but common in yiddish: e.g. maụsỏlỉụm
Ŵ ŵ = foreign letter W (regardless of pronunciation, which can be W, U, etc.). => modified glyphs in MITT fonts: Ŵ ŵ => W w with macron above
Ẃ ẃ = vowelless ạlef after wạw: šạẃ (vain). NOTE: These are synonyms, and behave similarly in transliteration: šạẃ / šạwᵓ.
˴W ˴w [= Ŵ ŵ] = foreign letter W => wạw + gereš
Symbols for indicating these spelling variants in hebrew script text HE-NT-P:
Ẉ ẉ => in HE-NT-P: wạw + zarqā; in many other native script text styles: wạw + wạw (when doubled, use dạgẹš on the latter wạw only)
ז > Z z Ẕ ẕ = foreign letter Z.
Ž ž = foreign sound SZ.
Ⱬ ⱬ = hebrew letter zaĭn in a word root, which originally included archaic letter ⱬāĭn instead of zaĭn (some 3500 years ago). Equivalent of arabic letter ḓāl: voiced TH, as in english "then".
˴Z ˴z = foreign sound SZ [Ž ž]
˙Z ˙z = arabic letter ẓẵ [Ẓ ẓ], secondary variant (arabic ṱẵ with dot)
Symbols for indicating these spelling variants in hebrew script text HE-NT-P:
Ⱬ ⱬ => zaĭn + rạfeh
Ẕ ẕ => zayin + rėvīắ ֗
Ž ž => zayin + zarqā
˴Z ˴z => zayin + gereš
˙Z ˙z => zayin + judeo-arabic rėvīắ [MITT: non-combining ˙]
ח > Ḫ ḫ -ⱯḪ -ⱥḫ = -aḫ with "added A" (at the end of a word). NOTE: When ḫẹt is the last letter in a word, and the vowel immediately before is not "a" (of any length), in hebrew script ḫẹt takes a short "a" under it, which is pronounced before this letter, not after it. In transliteration this additional "a" is written before ḫẹt, as a special letter form (ⱥ), to clarify that no ạlef exists between the previous vowel and ḫẹt.
Ḥ ḥ = arabic ḥẵ (while ḫẹt without any diacritics is the equivalent of arabic ḫẵ, a harder consonant)
Ȟ ȟ = hebrew letter ḫẹt in a word root, which originally included the archaic ȟāt (harder ḫẹt) until ca. 200 BCE -- Modified glyphs in MITT fonts: H h with inverted breve below.
˴K ˴k = arabic letter ḫẵ [Ķ ķ]
˙Ḫ ˙ḫ = arabic letter ḫẵ [Ķ ķ]
Symbols for indicating these spelling variants in hebrew script text HE-NT-P:
Ḥ ḥ => ḫẹt + rạfeh
Ȟ ȟ => ḫẹt + zarqā
˴K ˴k => ḫẹt + gereš
˙Ḫ ˙ḫ => ḫẹt + judeo-arabic rėvīắ [MITT: non-combining ˙]
ט > Ṱ ṱ Ț ț = foreign letter T, secondary variant
Ẓ ẓ = arabic ẓẵ (arabic ṱẵ with dot) => modified glyphs in MITT fonts: Ẓ ẓ => Z z with comma below
˙Ṱ ˙ṱ = arabic letter ẓẵ [Ẓ ẓ], primary variant (arabic ṱẵ with dot)
Symbols for indicating these spelling variants in hebrew script text HE-NT-P:
Ț ț => ṱẹt + rėvīắ ֗
Ẓ ẓ => ṱẹt + rạfeh
˙Ṱ ˙ṱ => ṱẹt + judeo-arabic rėvīắ [MITT: non-combining ˙]
י > Y y Ī ī = i + yōd = long "i", a more stylish way to write "iy"
iy = 1) i + yōd in a word that contains nothing else: ī-yōšer => iy-yōšer -- these are synonyms, and behave similarly in transliteration. 2) different spelling to remove ambiguity: kī = for / that, kiy = if (archaic, biblical).
Ȋ ȋ = as the last letter of a preposition:
WITH PREFIX MI- AND WORD-INITIAL Y WITH SHWA: in the scenario "-ȋ yė-" indicates vowel I, traditionally no word space, and the next word beginning with a vowelless nondoubled yod: e.g. mīmīnī (masoretic) => mȋ yėmīnī // mi yėmīnī (nonmasoretic, expected form with doubled first yod)
WITH PREFIX MI- AND SOME OTHER WORD-INITIAL CONSONANT WITH SHWA: midvạreyķạ (nonmasoretic, theoretical example, not in use) => mȋ dėvạreyķạ // mi dėvạreyķạ (masoretic, expected doubled first consonant)
WITH PREFIX (OTHER THAN MI-) AND WORD-INITIAL Y WITH SHWA: līrūšạlẹm (masoretic) => lȋ Yėrūšạlẹm // li Yėrūšạlẹm (nonmasoretic, with shwa under yod) // le Yėrūšạlẹm (untraditional, modern colloquial)
WITH PREFIX (OTHER THAN MI-) AND SOME OTHER WORD-INITIAL CONSONANT WITH SHWA: lidvạray (masoretic) => li dėvạray // lȋ dėvạray (indication to have no shwa under D, nonmasoretic and ortographically questionable, understood but not preferred by the text conversion algorithms)
=> Modified glyphs in MITT fonts: Ȋ ȋ => use a less strongly curved inverted breve, to be more clearly different from circumflex Î î in small text size.
İ ı = yōd with high basic I, a rarely used special Unicode character, in which yōd has a basic I vowel that is positioned much higher than usual. This Unicode character is apparently used in yiddish only, and it is pronounced "I", not "YI".
Ỉ ỉ = grammatically or traditionally short i, which is advised to be written with yōd in unvowelized modern hebrew: qỉbbẹl, Mỉląnỏ.
Î î = ắyin + i + yōd, a more stylish way to write "ˁiy"
Ĭ ĭ = "yi" (yōd + short i) in e.g. Ĭṡrạẹl, maĭm -- however, in most cases "yi" is written as such: yittẹn
Ă ă = "yya" (double yōd + short a), except after vowel A (of any length): e.g. îriyyat => îriăt, but not: ḫayyat >= ḫaặt. -- yya and ă are synonyms, and behave similarly in transliteration.
Ặ ặ = "yyạ" (double yōd + long a), except after vowel A (of any length): e.g. Angliyyạh => Angliặh, but not: ḫayyạl >= ḫaặl. -- yyạ and ặ are synonyms, and behave similarly in transliteration.
Ṍ ṍ = Alternative characters for Ặ ặ, which emphasize the tiberian and ashkenazi pronunciation "o". => Modified glyphs in MITT fonts: Ṍ ṍ => oval O o with tail, with breve above and dot below.
Ẽ ẽ = "yyē" (double yōd + strong e + ạlef) in the biblical name Dạniẽl -- yyē and ẽ are synonyms, and behave similarly in transliteration.
Ȳ ȳ = silent yōd. Unpronounced in the suffix -ạȳw in post-Tanakh era hebrew: yạmạȳw, pạnạȳw. Also in some other rare scenarios in the masoretic Tanakh text: ḫoṱíȳm (1 Samuel 14:33), qoríȳm (Psalms 99:6). The MITT text styles prefer the ortography à ã instead of ạȳ (see below). Other text styles use an ordinary Y y: yạmạyw, pạnạyw.
à ã = 1) "ay" in the suffix -ayw => ãw: e.g. yạmãw, pạnãw. Ay and ã are synonyms, and behave similarly in transliteration. 2) as the last letter of a word, or before a word-final h: untraditional final A vowel, where there is no vowel at all in traditional grammar: e.g. taȧmīn => taȧmīnã. 3) when not the last letter of a word, and not before w: "tã" is an untraditional replacement for traditional "at", e.g. tėqạfatnī => tėqạftãnī. 4) if none of the criteria above matches: long vowel A, which is traditionally written as a short A: kammạh => kãmmạh.
Ỡ ỡ = Alternative characters for à ã (in each scenario 1, 2 and 3 above), which emphasize the tiberian and ashkenazi pronunciation "o". => Modified glyphs in MITT fonts: Ỡ ỡ => oval O o with tail, with tilde above.
EẎ eẏ = yiddish ligature YY, pronounced "ey"
AẎ aẏ = yiddish ligature AYY, pronounced "ay"
Ŷ ŷ = ligature YYY
Ý ý = vowelless ạlef after yōd: gẹý (valley of). NOTE: These are synonyms, and behave similarly in transliteration: gẹý / gẹyᵓ.
Ỷ ỷ = 1) replacement ī => ỷã in traditional grammar, e.g. wa tėhī => wa tįhī / wa tįhỷã. 2) Replacement for the traditional T t in the prefix of plural 3. feminine future verb forms, e.g. tạqomnạh => ỷạqomnạh, tėvōeynạh => ỷėvōeynạh.
Ị ị = primary yōd ȧlạfīt, a written yōd that represents pronounced ạlef + I (if preceded by a vowel or word break), ạlef + I + ạlef (if preceded by a vowel or word break, and followed by a vowel), or I + ạlef (if followed by a vowel). Used in some hebrew words of foreign origin, which contain the diphthong IA, IO or IU: Iṯạlyạh => Iṯạlịảh (expected: Iṯạliah), Serbyạh => Serbịảh (expected: Serbiah) / rạdyō => rạdịỏ (expected: rạdio), sṯūdyō => sṯūdịỏ (expected: sṯūdio) / Hānōy => Hảnỏị (expected: Hanoi) / Yōn => Ịon (expected: Ion) / Gāyūs => Gảịủs (expected: Gaius).
Ï ï = secondary yōd ȧlạfīt, a single character that represents typically written I + yōd, which is pronounced as long I + ạlef: Riyō => Rïỏ (expected: Rīỏ).
Ȉ ȉ = third yōd ȧlạfīt, a single character that represents typically written I + double yōd, which is pronounced as short I + ạlef: Pōnṱiyyūs => Pỏntȉủs (expected: Pontius), Piyyūs => Pȉủs (expected: Pius).
J j = foreign letter J (regardless of pronunciation, which can be Y, DZ, SZ, HH, etc.). Symbols for indicating these spelling variants in hebrew script text HE-NT-P: J j => yōd + zarqā
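The four scenarios listed in the Ã ã entry above form a small decision procedure, which can be sketched in Python. This is only an illustration, not part of the standard: the function name classify_a_tilde is hypothetical, it looks only at the first ã of a single word, and it assumes that scenario 1 means "ã directly before a word-final w".

```python
import unicodedata


def classify_a_tilde(word: str) -> int:
    """Return the scenario number (1-4) of the first ã in `word`."""
    # Normalize so that ã is always the single code point U+00E3.
    word = unicodedata.normalize("NFC", word).lower()
    index = word.find("\u00e3")  # ã, LATIN SMALL LETTER A WITH TILDE
    if index < 0:
        raise ValueError("word contains no ã")
    before = word[index - 1] if index > 0 else ""
    after = word[index + 1:]
    if after == "w":
        return 1  # suffix -ayw written as -ãw (assumed: w is word-final)
    if after in ("", "h"):
        return 2  # untraditional final A (last letter, or before final h)
    if before == "t" and not after.startswith("w"):
        return 3  # "tã" replacing traditional "at"
    return 4      # long A that is traditionally written as short A


for example in ("yạmãw", "taȧmīnã", "tėqạftãnī", "kãmmạh"):
    print(example, classify_a_tilde(example))
```

The order of the checks follows the order of the scenarios, so scenario 4 is reached only when none of the first three criteria matches.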
כ‬ > K k Ķ ķ = softer pronunciation, after a vowel Ḵ ḵ = foreign letter K, primary variant: indicates letter k in foreign names and loan words, which would be commonly transcribed as Q in hebrew script C c = foreign letter C (regardless of pronunciation, which can be S, K, CH, etc.). Symbols for indicating these spelling variants in hebrew script text HE-NT-P: Ķ ķ => kaf + rạfeh Ḵ ḵ => kaf + rėvīắ ֗ C c => kaf + zarqā
ל‬ > L l Ł ł = polish letter L with stroke. Ⱡ ⱡ = if the preceding consonant is ạlef, this special L causes ligature ạlef-lạmed, if the text is converted into native script. Symbols for indicating these spelling variants in hebrew script text HE-NT-P: Ł ł => lạmed + zarqā
מ > M m
נ > N n Ṋ ṋ = disambiguation of singular 3 masculine vs. plural 1 in some verb or noun suffixes, so that S3M retains the traditional ENN enn, while P1 is changed into ẸṊ ẹṋ. Ñ ñ = spanish letter Ñ. Symbols for indicating these spelling variants in hebrew script text HE-NT-P: Ñ ñ => nūn + zarqā
ס > S s ẞ ß = german letter eszett. Ç ç = foreign letter C cedilla Ș ș = arabic letter șād ˴S ˴s = arabic letter șād [Ș ș] Symbols for indicating these spelling variants in hebrew script text HE-NT-P: ẞ ß => sạmeķ + zarqā Ç ç => sạmeķ + rạfeh Ș ș => sạmeķ + rėvīắ ˴S ˴s => sạmeķ + gereš
ע > ᶜ ˁ Ắyin is written as ᶜ ˁ usually only when it is an emphasized glottal stop before or after consonant: yạdaˁtī (I knew), šivˁạh (seven). In nearly all other cases ắyin is written with a diacritical mark above a vowel, as explained below. As an exception, ắyin is written as ᶜ ˁ also between two I vowels, which can be uppercase or lowercase (because in sans-serif fonts these vowels are very narrow and tightly packed, and having visually complex accent marks on the tightly packed letters can look unclear and inconvenient): maźźīîm => maźźīˁīm, maźźiîm => maźźiˁīm, maźźiḯm => maźźiˁim. -- Modified glyph in MITT fonts: ˁ => ᶜ positioned lower than usual, with its top edge level with the top edge of "e".
ʿ = Horizontally reversed hook above a short vowel means that an ắyin is before the short vowel. -- Modified glyphs in MITT fonts: ʿ => horizontally reversed hook accent mark. Ắ ắ => A a with horizontally reversed hook above. [Ḗ ḗ => E e with horizontally reversed hook above.] Ḯ ḯ => I i with horizontally reversed hook above. [Ṓ ṓ => O o with horizontally reversed hook above.] Ṹ ṹ => U u with horizontally reversed hook above.
^ = Circumflex above a long vowel means that an ắyin is before the long vowel that includes an ạlef, yōd or wạw (ā, ē, ī, ō, ū).
ᶜ ˁ = ắyin as a significant glottal stop in the middle of a word: Yėšaˁyạhū (see below for an esthetically more pleasant spelling)
ᶝ ᵜ = ắyin in a word root, which originally included archaic logical grammatical consonant ġāĭn instead of ắyin (until ca. 200 BCE) -- modified glyphs in MITT fonts: ᶝ => Similar to superscript "c" (modifier letter small c), with the upper end of line having a round dot, which is larger than a period in the same font. ᵜ => Similar to superscript "c" (modifier letter small c), with the upper end of line having a round dot, which is larger than a period in the same font, and positioned lower than usual, with the top edge level with the top edge of lowercase "e".
Ỽ ɕ = When ắyin is a vowelless last letter of a word, a larger character Ỽ ɕ is preferred by these text styles, but the smaller characters ᶜ ˁ are supported also as the last character in a word: ǦAMĪᶜ => ǦAMĪỼ, ǧamīˁ => ǧamīɕ. (Theoretically it is possible to write ắyin as ᶜ ˁ or Ỽ ɕ in all other scenarios too, but it is not preferred by these text styles.)
ʢ ʕ = alternative ắyin, a Unicode character that does not have a tail descending below the baseline of text row, to leave space for vowel marks under the letter. Normally it is never necessary to use this character in text. -- No esthetic attention has been paid to these characters in MITT fonts, because these characters don't have many use scenarios, if any.
ᴥ ɕ = alternative ắyin in a word root, which originally included archaic logical grammatical consonant ġāĭn instead of ắyin (until ca. 200 BCE) -- No esthetic attention has been paid to these characters in MITT fonts, because these characters don't have many use scenarios, if any.
NOTE: There are two possible ways to write ắyin as a glottal stop after consonant: "ắ" and "ˁa" are synonyms, also "ô" and "ˁō" are synonyms -- these behave similarly in transliteration. Examples: kimˁaṱ (this style is my preference in most cases) / kimắṱ (this style is my preference in proper nouns).
NOTE: When ắyin is the last letter in a word: 1) If the vowel immediately before is not "a" (of any length), ắyin takes a short "a" under it: rẹắ (since ca. 600 CE in masoretic hebrew, and in all later dialects of hebrew). Also a guttural enhancer as the last character in a word is interpreted as having an ắyin (despite the absence of any diacritical mark that is typical for ắyin), but this is not a recommended primary orthography: šạvūⱥ, Hōšẹə. (See also the character Ỽ ɕ above.) 2) If the vowel immediately before is "a" (of any length), these transliteration standards write that vowel "under" the ắyin, but in reality the vowel is under the previous consonant: rắ (r + short a + ắyin), nậ (n + long ạ + ắyin). In all other circumstances except the end of a word, transliterated rắ would mean "r + ắyin + short a", and nậ would mean "n + ắyin + long ạ". (A clearer and more typical way to write these would be rˁa and nˁạ, emphasizing the glottal stop by writing ắyin as a separate character, not only a diacritical mark over the vowel. However, in proper nouns it can be esthetically preferable to write an ắyin discreetly as a diacritical mark, even when it causes a glottal stop.)
NOTE: Theoretically an ắyin can be written with a diacritical mark also when it is a glottal stop after consonant: šivˁạh => šivậh. This style is recommended for proper nouns only, if it is deemed esthetically preferable to avoid the character ˁ in a proper noun.
NOTE: It is also theoretically possible to always write ắyin with ˁ: îr => ˁīr, rōḗh => rōˁeh, yōdẹắ => yōdẹⱥˁ / yōdẹˁa. The diacritical mark over a vowel disappears or changes its shape, when ắyin becomes a separate character. (Otherwise we would have two ắyins: one as a separate character, and another as a diacritical mark.) The form "yōdẹⱥˁ" has a special letter form (ⱥ), to indicate that no ạlef exists before the vowel, which has been moved ahead of the ắyin analogously with the order in which these two are pronounced in reality. All these forms are theoretical, and not recommended in any other text than explanation of the hebrew grammar or writing system. The reason for this recommendation is purely esthetical.
-Ɐᶜ -ⱥˁ = -ắ (at the end of a word). A theoretical form, see the explanation above.
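The synonym pairs in the notes above (a vowel with an ắyin diacritic versus ˁ + plain vowel) can be sketched as a one-way expansion in Python. This is an illustration under assumptions, not a definition: the name expand_ayin is hypothetical, only the lowercase pairs that appear in the examples above are included, and the word-final case after an A vowel (where the vowel is pronounced before the ắyin) is not handled.

```python
import unicodedata


def nfc(text: str) -> str:
    """Normalize to NFC so every accented vowel is one code point."""
    return unicodedata.normalize("NFC", text)


AYIN = "\u02c1"  # ˁ, MODIFIER LETTER REVERSED GLOTTAL STOP

# Vowel carrying an ắyin diacritic -> the same vowel without it.
# Only pairs attested in the examples of this document are listed.
AYIN_VOWELS = {
    "ắ": "a",  # kimắṱ <=> kimˁaṱ
    "ḗ": "e",  # rōḗh => rōˁeh
    "ḯ": "i",  # maźźiḯm => maźźiˁim
    "ậ": "ạ",  # šivậh <=> šivˁạh
    "î": "ī",  # îr => ˁīr
    "ô": "ō",  # "ô" and "ˁō" are synonyms
}

_TABLE = str.maketrans({nfc(k): AYIN + nfc(v) for k, v in AYIN_VOWELS.items()})


def expand_ayin(word: str) -> str:
    """Rewrite every ắyin diacritic as a separate ˁ before the vowel."""
    return nfc(word).translate(_TABLE)


for example in ("îr", "rōḗh", "kimắṱ", "šivậh", "maźźīîm"):
    print(example, "=>", expand_ayin(example))
```

The reverse direction would need the exception for ắyin between two I vowels, so it is left out of this sketch.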
Ġ ġ = foreign (e.g. arabic) letter ġāĭn.
Ɠ ĝ = alternative foreign ġāĭn, a Unicode character that does not have a tail descending below the baseline of text row, to leave space for vowel marks under the letter. Normally it is not necessary to define such a technical detail of native hebrew script in romanized hebrew.
˴ᶜ ˴ˁ = arabic letter ġayn [Ġ ġ], secondary variant with gereš
˙ᶜ ˙ˁ = arabic letter ġayn [Ġ ġ], secondary variant with judeo-arabic rėvīắ Symbols for indicating these spelling variants in hebrew script text HE-NT-P: Ġ ġ => ắyin + rāfeh ˴ᶜ ˴ˁ => ắyin + gereš ˙ᶜ ˙ˁ => ắyin + judeo-arabic rėvīắ [MITT: non-combining ˙]
Ƹ ɜ = full ắyin, for the rare scenario that a word-final vowelless ắyin is after a vowelless consonant, or for writing without a guttural enhancer a word-final ắyin in ancient hebrew or aramaic, which should get an "added A" according to masoretic and later orthography: e.g. tẹroắ => tẹroɜ (aramaic in Daniel 2:40). This character is for romanized hebrew only: native script would use an ordinary ắyin. -- Modified glyphs in MITT fonts: Ƹ ɜ => similar to the cyrillic letter Э э.
ƻ ɚ = full ġāĭn, for the rare scenario that a word-final vowelless ġāĭn is after a vowelless consonant, or for writing without a guttural enhancer a word-final ġāĭn in ancient hebrew or aramaic, which should get an "added A" according to masoretic and later orthography. This character is for romanized hebrew only: native script would use an ordinary ġāĭn. -- Modified glyphs in MITT fonts: ƻ ɚ => similar to the cyrillic letter Э э, with dot above.
Ɇ ɇ = full ắyin + vowelless ạlef: e.g. ȧraˁᵓ => ȧraɇ (aramaic in Daniel 2:39). This character is for romanized hebrew only: native script would use an ordinary ắyin and ạlef. -- Modified glyphs in MITT fonts: Ɇ ɇ => similar to the cyrillic letter Э э, with acute above.
ƍ ᵹ = full ġāĭn + vowelless ạlef. This character is for romanized hebrew only: native script would use an ordinary ġāĭn and ạlef. -- Modified glyphs in MITT fonts: ƍ ᵹ => similar to the cyrillic letter Э э, with dot and acute above.
Ǥ ǥ = foreign ġāĭn + vowelless ạlef. This character is for romanized hebrew only: native script would use a foreign ġāĭn and an ordinary ạlef. -- Modified glyphs in MITT fonts: Ǥ ǥ => G g with dot and acute above.
פ > P p F f = softer pronunciation, after a vowel. Ṕ ṕ = foreign letter P. -- modified glyphs in MITT fonts: Ṕ ṕ => P p with macron above Ḟ ḟ = foreign letter F. -- modified glyphs in MITT fonts: Ḟ ḟ => F with macron above, f with macron below Ƿ ƿ = pē after a vowel in a proper noun from the Tanakh era (whose post-Tanakh era softening might be deemed historically incorrect) => pē + ??? -- Modified glyphs in MITT fonts: Ƿ ƿ => P p with open right edge, which has some visual similarities with letter F f. ˙P ˙p = arabic letter fẵ [F f] Symbols for indicating these spelling variants in hebrew script text HE-NT-P: F f => pē + rāfeh Ṕ ṕ => pē + rėvīắ ֗ ˙P ˙p => pē + judeo-arabic rėvīắ [MITT: non-combining ˙]
צ > Ź ź -- Modified glyphs in MITT fonts: Ź ź => Z z with a more vertical acute above than usual (tonos). Č č = foreign sound CH (Č), as in "such" Ż ż = hebrew letter źạdeh in a word root, which originally included archaic letter żāt instead of źạdeh (some 3500 years ago). => źạdeh + rạfeh Ẑ ẑ = hebrew letter źạdeh in a word root, which originally included archaic letter ẑādeh instead of źạdeh (some 3500 years ago). Equivalent of arabic letter ḑād (șād with dot). -- Modified glyphs in MITT fonts: Ẑ ẑ => Z z with grave above. ˴Ź ˴ź = foreign sound CH [Č č] ˙Ź ˙ź = arabic letter ḑād [Ḑ ḑ] (arabic șād with dot) Symbols for indicating these spelling variants in hebrew script text HE-NT-P: Ż ż => źạdeh + rạfeh Č č => źạdeh + zarqā Ẑ ẑ => źạdeh + rėvīắ ˴Ź ˴ź => źạdeh + gereš ˙Ź ˙ź => źạdeh + judeo-arabic rėvīắ [MITT: non-combining ˙]
ק > Q q Ɋ ɋ = foreign letter K, secondary variant -- Modified glyphs in MITT fonts: Ɋ ɋ => Q q with macron above. X x = foreign letter X (regardless of pronunciation, which can be KS, EKS, Z, HH, etc.). Symbols for indicating these spelling variants in hebrew script text HE-NT-P: Ɋ ɋ => qōf + rėvīắ ֗ X x => qōf + zarqā
ר > R r ˴R ˴r = arabic letter ġayn [Ġ ġ], primary variant with gereš ˙R ˙r = arabic letter ġayn [Ġ ġ], primary variant with judeo-arabic rėvīắ Ŕ ŕ = vowelless ạlef after rẹš: wa yaŕ (and he saw). NOTE: These are synonyms, and behave similarly in transliteration: wa-yaŕ / wa-yarᵓ. Symbols for indicating these spelling variants in hebrew script text HE-NT-P: ˴R ˴r => rẹš + gereš ˙R ˙r => rẹš + judeo-arabic rėvīắ [MITT: non-combining ˙]
שׁ‎‬ > Š š Ṡ ṡ = with dot on the left, pronounced like ordinary S. ¨S ¨s = arabic letter šīn [Š š] Ŧ ŧ = archaic semitic logical grammatical consonant ŧān = dotless śīn + rạfeh Ŝ ŝ = hebrew šin in a word root, which originally included archaic logical grammatical consonant ŧān instead of šin (probably over 3000 years ago) = dotless śīn + middle dot above Ś ś = dotless, ambiguous š / ṡ / ŝ / ŧ. Symbols for indicating these spelling variants in hebrew script text HE-NT-P: Š š => dotless śīn + šin dot above on the right Ṡ ṡ => dotless śīn + ṡin dot above on the left Ŧ ŧ => dotless śīn + rạfeh ¨S ¨s => dotless śīn + sėgoltā Ś ś => dotless plain character
ת‬ > T t Ƭ ʈ = softer pronunciation (thin TH) after a vowel, used since approximately 0 BC (+/- 300 years), but not any more in mainstream western pronunciation. -- modified glyphs in MITT fonts: ʈ => "t" whose horizontal bar is diagonal, so that the left end of the bar is lower than usual. Ƽ ƾ = softer pronunciation (thin TH) after a vowel, alternative characters, which emphasize the ashkenazi pronunciation "s" since the middle ages. -- Modified glyph in MITT fonts: Ƽ => "S" with the horizontal bar of "T" on top. Ṫ ẗ = foreign sound thin TH (as in english "think"). -- Modified glyph in MITT fonts: Ṫ => T with umlaut above. Ṯ ṯ = foreign letter T, primary variant ˴T ˴t = foreign sound voiceless TH (as in "thin") [Ṫ ẗ] ˙T ˙t = arabic letter ẗẵ [Ṫ ẗ] ¨T ¨t = arabic letter ẗẵ [Ṫ ẗ] Symbols for indicating these spelling variants in hebrew script text HE-NT-P: Ƭ ʈ => tạw + rafeh Ƽ ƾ => tạw + rafeh Ṫ ẗ => tạw + zarqā Ṯ ṯ => tạw + rėvīắ ֗ ˴T ˴t => tạw + gereš ˙T ˙t => tạw + judeo-arabic rėvīắ [MITT: non-combining ˙] ¨T ¨t => tạw + sėgoltā
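The "Symbols for indicating these spelling variants in hebrew script text HE-NT-P" entries above are in effect a lookup table from a romanized letter to a hebrew base letter plus one mark. A minimal Python sketch for a few of those entries follows. The names HE_NT_P and native_spelling are hypothetical, and the choice of Unicode code points is an assumption: this document names the marks (rạfeh, rėvīắ, zarqā) but does not state which code points the MITT fonts expect. Final letter forms are ignored.

```python
import unicodedata


def _ch(name: str) -> str:
    """Look up a character by its official Unicode name."""
    return unicodedata.lookup(name)


YOD = _ch("HEBREW LETTER YOD")
KAF = _ch("HEBREW LETTER KAF")
LAMED = _ch("HEBREW LETTER LAMED")
NUN = _ch("HEBREW LETTER NUN")
QOF = _ch("HEBREW LETTER QOF")

# Assumed code points for the marks named in this document.
RAFE = _ch("HEBREW POINT RAFE")      # rạfeh
REVIA = _ch("HEBREW ACCENT REVIA")   # rėvīắ
ZARQA = _ch("HEBREW ACCENT ZARQA")   # zarqā

# Lowercase romanized letter -> base letter + mark, as listed above.
HE_NT_P = {
    "j": YOD + ZARQA,
    "ķ": KAF + RAFE,
    "ḵ": KAF + REVIA,
    "c": KAF + ZARQA,
    "ł": LAMED + ZARQA,
    "ñ": NUN + ZARQA,
    "ɋ": QOF + REVIA,
    "x": QOF + ZARQA,
}
HE_NT_P = {unicodedata.normalize("NFC", k): v for k, v in HE_NT_P.items()}


def native_spelling(letter: str):
    """Return base letter + mark for a romanized variant, or None."""
    key = unicodedata.normalize("NFC", letter).lower()
    return HE_NT_P.get(key)


for letter in ("J", "Ḵ", "c", "Ñ"):
    marked = native_spelling(letter)
    print(letter, [unicodedata.name(ch) for ch in marked])
```

Letters that have no entry in the table (such as plain Q q) return None, so a caller can fall back to the ordinary letter mapping.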
Add ASCII 32-127 encoding text style (if possible!?).
CONSIDER in hebrew native script:
- see the drawn sketch sketch_streamlined_hebrew_nikkud.jpg
- HEBREW POINT QAMATS QATAN = dot under a horizontal bar.
- Short U = 2 dots diagonally rising -- as the default design for all purposes.
- Generic E = 2 dots diagonally falling?
(As a dedicated vowel point, which is understandable also in its typical design, and not used for other purposes.) Or then: šwā is separate from the generic E, written with two vertical dots. No dedicated vowel point is needed.
- also compare to babylonian or palestinian vowel point symbols
MITT HE-RZ/NT 0.9 -- Standard for romanized and native-script text styles for hebrew language. Ion Mittler, 10 march 2025. Released in the public domain under CC0-1.0 license (Creative Commons 0 version 1.0). http://creativecommons.org/publicdomain/zero/1.0/
Modern International Text Types — mitt.fi
Keyword variants for search engines: The standard MITTHERZNT (MITTHERZ / MITTHENT) defines the text styles MITT HE-RZ-P [MITTHERZP], MITT HE-RZ-V [MITTHERZV], MITT HE-RZ-L [MITTHERZL], MITT HE-RZ-S [MITTHERZS], MITT HE-RZ-T [MITTHERZT], MITT HE-RZ-U [MITTHERZU], MITT HE-NT-P [MITTHENTP], MITT HE-NT-E [MITTHENTE], MITT HE-NT-M [MITTHENTM], MITT HE-NT-V [MITTHENTV], MITT HE-NT-T [MITTHENTT] and MITT HE-NT-U [MITTHENTU].
MITT HE-DG 0.8
MITT HE-DG 0.8 -- Standard for disambiguated grammar for hebrew language.
The purpose of these standardized expressions is to make it possible to communicate the meaning more precisely and with less ambiguity than the traditional grammar allows (without long explanations).
DISAMBIGUATION OF SOME VERB FORMS AND PERSONAL SUFFIXES:
Disambiguation of singular 3 masculine vs. plural 1 in some verb or noun suffixes, so that S3M retains the traditional form ENN enn, while P1 is changed into ẸṊ ẹṋ:
Ṋ ṋ
SINGULAR 3. M mimmennū / eynennū
PLURAL 1. mimmennū => mimmẹṋū / eynennū => eynẹṋū
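The two examples above can be reproduced with a very small Python sketch. It is an assumption that the rule can be applied mechanically to any word ending in -ennū: this document shows it only for mimmennū and eynennū, and whether a given form is really plural 1 must be decided by the writer, not by the code. The name mark_plural_1 is hypothetical.

```python
import unicodedata


def nfc(text: str) -> str:
    """Normalize to NFC so every accented letter is one code point."""
    return unicodedata.normalize("NFC", text)


TRADITIONAL = nfc("ennū")    # shared by singular 3. M and plural 1.
DISAMBIGUATED = nfc("ẹṋū")   # plural 1. only


def mark_plural_1(word: str) -> str:
    """Rewrite a word that ends in -ennū as its plural 1. spelling."""
    word = nfc(word)
    if not word.endswith(TRADITIONAL):
        raise ValueError("word does not end in -ennū")
    return word[: -len(TRADITIONAL)] + DISAMBIGUATED


for example in ("mimmennū", "eynennū"):
    print(example, "=>", mark_plural_1(example))
```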
NOTE: Mimmennū is distinguished (at least) in the Babylonian tradition: one was mimenu with tsere, the other mimannu with (*segol > ) patach. Yeivin discusses it in his big book on Babylonian vocalisation.
Regarding the form mimmennu, the expected form with 1cp suffix truly is mimmennu. When attaching a suffix, the base form min reduplicates to *minmin-. With 3ms, *minmin+hu falls to mimmennu, which is a more unusual form (with regressive assimilation of nun to mem and progressive total assimilation of nun to heh). 1cp *minmin+nu is rather unexceptionally mimmennu.
Google Books link
S3M and P1 ... which are identical in Tiberian as mimmennu ... are different in Babylonian: i.e. mimmannu [in Codex Babylonicus Petropolitanus mimmanu, see Yeshayah 59:11] "from him" vs. mimmenu "from us" [see Yeshayah 53:3].
Disambiguation of singular 2 masculine vs. singular 3 feminine in some verb forms in the future / noncompleted tense, so that S2M retains the traditional form, while S3F adds vowel A at the end, and shortens the preceding vowel into shwa (unless this is a strong long vowel, which has mater lectionis or normally should have):
* SOME RARE (BUT VARYING AND CONTRADICTING) DISAMBIGUATIONS ARE CURRENTLY IN USE, GIVEN BY REVERSO -- though ambiguous variants are also used for them:
https://conjugator.reverso.net/conjugation-hebrew-verb-%D7%9C%D6%B0%D7%A9%D7%81%D6%B7%D7%9B%D6%B0%D7%A0%D6%B5%D7%A2%D6%B7.html
SINGULAR 2. M * tėšaķnắ / * tėfaᶜpắ / * tėnaźźaḫ / *! tėšaggẹắ / *! tėvaźźẹắ // tėqarqắ
SINGULAR 3. F * tėšaķnẹắ / * tėfaᶜpẹắ / * tėnaźźẹⱥḫ / *! tėšaggắ / *! tėvaźźắ // tėqarqắ => tėqarqɝᶜḁ
SINGULAR 2. M tėfaźlaḫ // tėraḫrẹⱥḫ // tigbaɦ // tėbaᶜbẹắ
SINGULAR 3. F tėfaźlaḫ => tėfaźlɝḫḁ // tėraḫrẹⱥḫ => tėraḫrɘḫḁ // tigbaɦ => tigbɝhḁ // tėbaᶜbẹắ => tėbaᶜbɘᶜḁ
SINGULAR 2. M tiqțol / taȧmīn / tạvõ / tạvĩ
SINGULAR 3. F tiqțol => tiqțȼlḁ / taȧmīn => taȧmīnḁ / tạvõ => tạvōḁ / tạvĩ => tạvīḁ
SINGULAR 2. M tizkọrķạ
SINGULAR 3. F tizkọrķạ => tizkɞrḁķạ
SINGULAR 2. M tėmullā / tėmallē / tōdeh / tivraḫ
SINGULAR 3. F tėmullā => tėmullɝḁ / tėmallē => tėmallɘḁ / tōdeh => tōdǡh / tivraḫ => tivrɝḫḁ
SINGULAR 2. M tėdubbar / tėdabbẹr / tiķtov / tīgắ
SINGULAR 3. F tėdubbar => tėdubbɝrḁ / tėdabbẹr => tėdabbɘrḁ / tiķtov => tiķtȼvḁ / tīgắ => tīgɝᶜḁ
SINGULAR 2. M wa tėhī / tėqạfatnī
SINGULAR 3. F wa tėhī => wa tįhī (or wa tįhỷḁ) / tėqạfatnī => tėqạftȁnī
SINGULAR 2. M tėlammėdẹm / tėlammedķem
SINGULAR 3. F tėlammėdẹm => tėlammėdǡm / tėlammedķem => tėlammȅdḁķem
SINGULAR 2. M takkeh / tukkạh
SINGULAR 3. F takkeh => takkǡh / tukkạh => tukkảh
SINGULAR 2. M takkehū / takkehạ
SINGULAR 3. F takkehū => takkǡhū / takkehạ => takkǡɦạ
SINGULAR 2. M * tukkạhū / * tukkạhạ
SINGULAR 3. F * tukkạhū => tukkảhū / * tukkạhạ => tukkảhạ
* Theoretical verb forms: it may not make much logical sense that a passive verb has an object suffix.
https://biblehub.com/text/ezekiel/23-36.htm [hă-ṯiš-pō-wṭ] S2M
https://biblehub.com/text/ezekiel/23-25.htm [tip-pō-wl] S3F [tê-’ā-ḵêl] S3F
NOTE: The 2ms and 3fs are ambiguous all the way back to Proto-Semitic (~6000 years ago). The imperfect verbal conjugation itself doesn't go back that far, but it developed from another system which already exhibits the same ambiguity.
The final additional A is written with an ordinary short A a, which is never the last letter of a word in its normal usage. However, use Ả ả in the ending -eh => -ảh, to ensure its uniqueness for disambiguation purposes only. Also use Ả ả, if the preceding vowel was ẹ and the shwa is left unwritten (which can happen with final consonant alef or ắyin).
Indicating the vowel that gets replaced with shwa:
As the shwa, use a variant of E that indicates the replaced vowel.
- ắ => -ɝᶜḁ -ẹắ => -ɘᶜḁ -ol => -ȼlḁ -eh => -ǡh
- aḫ => -ɝḫḁ -ẹⱥḫ => -ɘḫḁ
- ā => -ɝḁ -ē => -ɘḁ
- aɦ => -ɝhḁ
- ạh => -ảh
- at-=> - tȁ-
- possible vowels with final alef:
ā => ɝḁ == mark replaced ạ with: ƺ ɝ #
ē => ɘḁ == mark replaced ẹ with: ⱻ ɘ
ĩ => īḁ ----- no vowel gets replaced
õ => ōḁ ----- no vowel gets replaced
- possible vowels with final ắyin:
ắ => ɝᶜḁ == mark replaced a with: ƺ ɝ #
ẹắ => ɘᶜḁ == mark replaced ẹ with: ⱻ ɘ
- possible vowels with final ḫẹt:
aḫ => ɝḫḁ == mark replaced a with: ƺ ɝ #
ẹaḫ => ɘḫḁ == mark replaced ẹ with: ⱻ ɘ
- possible vowels with final weak hē:
ạh => ảh == mark replaced ạ with: Ả ả
eh => ǡh == mark replaced e with: Ǡ ǡ
- possible vowels with final strong (double) hē:
aɦ => ɝɦḁ == mark replaced a with: ƺ ɝ #
- possible vowels with other final consonants:
a? => ɝ?ḁ == mark replaced a with: ƺ ɝ #
at- => tȁ- mark relocated a with: Ȁ ȁ
e? => ȅ?ḁ == mark replaced e with: Ȅ ȅ
ẹ? => ɘ?ḁ == mark replaced ẹ with: ⱻ ɘ
ẹ? => ǡ? == mark replaced ẹ with: Ǡ ǡ -- (disambiguated from eh => ǡh by absence of hē)
o? => ȼ?ḁ == mark replaced o with: Ȼ ȼ
ọ?- => ɞ?ḁ- == mark replaced ọ with: ᵫ ɞ
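The rows for "other final consonants" above can be sketched as a Python function that replaces the last short vowel with its marker and appends the final ḁ. This is a simplified illustration: the name mark_s3f is hypothetical, the function covers only words that end in short vowel + one consonant (plus the attested ī case, where no vowel gets replaced), and it rejects words ending in a vowel, hē or ắyin, which have their own rows above. Forms with object suffixes (at- => tȁ-, ẹ? => ǡ?) are not handled.

```python
import unicodedata


def nfc(text: str) -> str:
    """Normalize to NFC so every accented letter is one code point."""
    return unicodedata.normalize("NFC", text)


# Replaced short vowel -> shwa variant that records it (rows above).
REPLACED_VOWEL = {nfc(k): v for k, v in
                  {"a": "ɝ", "e": "ȅ", "ẹ": "ɘ", "o": "ȼ"}.items()}
KEPT_LONG_VOWELS = nfc("ī")   # attested: taȧmīn => taȧmīnḁ
FINAL_A = nfc("ḁ")            # the added final A
SPECIAL_FINALS = "hɦᶜˁ"       # hē and ắyin follow other rows


def _is_vowel(ch: str) -> bool:
    """True if the base letter of `ch` is a plain latin vowel."""
    return unicodedata.normalize("NFD", ch)[0] in "aeiou"


def mark_s3f(word: str) -> str:
    """Disambiguated singular 3. F spelling for the generic rows."""
    word = nfc(word)
    if len(word) < 2:
        raise ValueError("word too short")
    vowel, final = word[-2], word[-1]
    if _is_vowel(final) or final in SPECIAL_FINALS:
        raise ValueError("ending is outside the generic rows")
    if vowel in REPLACED_VOWEL:
        return word[:-2] + REPLACED_VOWEL[vowel] + final + FINAL_A
    if vowel in KEPT_LONG_VOWELS:
        return word + FINAL_A   # strong long vowel is kept
    raise ValueError("vowel is not covered by this sketch")


for example in ("tėdubbar", "tėdabbẹr", "tiķtov", "taȧmīn"):
    print(example, "=>", mark_s3f(example))
```

Keeping the replaced vowel recoverable from the marker is the point of the table: each marker maps back to exactly one original vowel.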
- disambiguation of plural 2. / 3. feminine verb forms in future / noncompleted tense: 3. feminine is the less original of these forms, subject to disambiguation. Use the same character Ả ả as in short A vowels, which are written as long A in traditional orthography. Location in an -ah ending would disambiguate between such generic purpose and this grammatically disambiguating special purpose.
PLURAL 2. F tiqṱolnạh / taȧmẹnnạh
PLURAL 3. F tiqṱolnạh => ỷiqṱolnạh / taȧmẹnnạh => ỷaȧmẹnnạh tiqṱolnảh taȧmẹnnảh
PLURAL 2. F tạvõnạh / tėvōeynạh
PLURAL 3. F tạvõnạh => ỷạvõnạh / tėvōeynạh => ỷėvōeynạh tạvõnảh tėvōeynảh
PLURAL 2. F taggẹdnạh / tạqomnạh / tėqūmeynạh
PLURAL 3. F taggẹdnạh => ỷaggẹdnạh / tạqomnạh => ỷạqomnạh / tėqūmeynạh => ỷėqūmeynạh taggẹdnảh tạqomnảh tėqūmeynảh
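The plural 3. F rows above show two spellings for each verb: one with the prefix t replaced by ỷ, and one with the ending -ạh replaced by -ảh. A minimal Python sketch that produces both is given here. The name mark_plural_3f is hypothetical, and it is an assumption that every affected form starts with t and ends in -nạh, as all examples in the table do.

```python
import unicodedata


def nfc(text: str) -> str:
    """Normalize to NFC so every accented letter is one code point."""
    return unicodedata.normalize("NFC", text)


ENDING = nfc("nạh")          # traditional plural 2./3. F ending
MARKED_PREFIX = nfc("ỷ")     # replaces the prefix t
MARKED_ENDING = nfc("ảh")    # replaces the final ạh


def mark_plural_3f(word: str):
    """Return (prefix-marked, ending-marked) plural 3. F spellings."""
    word = nfc(word)
    if not (word.startswith("t") and word.endswith(ENDING)):
        raise ValueError("expected a form t...nạh")
    prefix_variant = MARKED_PREFIX + word[1:]
    ending_variant = word[:-2] + MARKED_ENDING   # drop ạh, add ảh
    return prefix_variant, ending_variant


for example in ("tiqṱolnạh", "tạqomnạh", "tėvōeynạh"):
    print(example, "=>", " / ".join(mark_plural_3f(example)))
```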
NOTE: 2f.pl and 3f.pl probably merged (in Hebrew) sometime by the first millennium BCE, but originally they were different. 3fp was yaqtulna, as in pretty much all other Semitic languages. References probably abound in any historical grammar.
https://www.rsuh.ru/binary/78809_9.1337448219.16827.pdf (page 442)
https://en.wikisource.org/wiki/Language_and_the_Study_of_Language/Lecture_VIII
https://bildnercenter.rutgers.edu/docman/rendsburg/121-ancient-hebrew-morphology/file
The 3rd feminine plural form was originally ישמרנה yismornah ‘they guard’, as may be determined from the comparative Semitic evidence, of which three examples remain in the Bible (Genesis 30:38, 1 Samuel 6:12, Daniel 8:22: https://biblehub.com/text/daniel/8-22.htm). Otherwise, the 2nd feminine plural form תשמרנה tismornah was imported, taking over the function of the 3rd person as well as the 2nd person.
https://biblehub.com/text/ezekiel/23-40.htm [ṯiš-laḥ-nāh] ? 2. OR 3. ARCHAIC
https://biblehub.com/text/ezekiel/23-49.htm [tiś-śe-nāh] ? 2. OR 3. ARCHAIC
https://biblehub.com/text/genesis/30-38.htm [way-yê-ḥam-nāh] * 3. ARCHAIC
https://biblehub.com/text/genesis/30-39.htm [wat-tê-laḏ-nā] 3. usual modern
https://biblehub.com/text/1_samuel/6-12.htm [wa-yiš-šar-nāh] * 3. ARCHAIC
https://biblehub.com/text/daniel/8-22.htm [wat-ta-‘ă-mō-ḏə-nāh] 3. usual modern
[ya-‘ă-mō-ḏə-nāh] * 3. ARCHAIC
https://biblehub.com/text/ezekiel/23-48.htm [ṯa-‘ă-śe-nāh] * 3. ARCHAIC
Disambiguation of the rare S2F suffix -ti:
But regarding verbal suffixes, there really has been some development. -ti apparently served as both 1cs and 2fs, and we have instances of the latter preserved (e.g. Jdg 5 שקמתי דבורה).
https://biblehub.com/text/ezekiel/23-21.htm [wat-tip̄-qə-ḏî]
http://hebraistyka.uw.edu.pl/prace_dyplomowe/The%20Vocalization%20of%20Verbs.pdf
In Babylonian, gutturals are treated mostly as regular consonants and do not require the lowering of a vowel like in Tiberian Hebrew, cf. יַחְשֹב ‘he will think’ in Tiberian as opposed to יִחשֹב in Babylonian.
[In Babylonian] an epenthetic vowel is inserted between the first two consonants of a word-internal consonantal cluster, e.g. תִקִרְבוּ ‘you (pl.) will bring closer’. On the contrary, in the Tiberian vocalization an epenthetic vowel, viz. the vocalic shewa, occurs between the last two consonants, i.e. תִקְרְבוּ (Khan 2013a).
e.g. yirachqu (B) <=> yerachaqu (T) ‘they will be sent away’
When a vowel before an epenthetic vowel is ḥireq, the epenthetic vowel has the quality of /i/, e.g. נִכִלמוּ ‘they were ashamed’. Accordingly, when a vowel preceding the epenthetic vowel is shureq, it admits of the /u/ quality, e.g. וְהשלכוּ ‘they were thrown’.
The occurrence rate of pataḥ (A) is much higher in the Babylonian tradition, since in this tradition, on one hand, the law of attenuation is much less operative, and on the other, Philippi’s law is more operative than in Tiberian Hebrew. Moreover, vowels which in Tiberian Hebrew have transformed into seghol are represented in the Babylonian tradition, due to the lack of this vowel, either by pataḥ or by ḥireq. Regarding the vowels in the environment of the gutturals, as has been already pointed out, in the Babylonian tradition no peculiarities occur. In most cases the gutturals were treated as regular consonants and did not require the lowering of the adjacent vowel.
DISAMBIGUATION OF CONSECUTIVE VERB FORMS:
The consecutive verbs were identical with "and" + ordinary verb until ca. 200 CE, when a specific pronunciation was invented for the consecutive past tense (e.g. wė yómer => wa yómer), but not for the consecutive future tense.
Ẁ ẁ Ẉ ẉ
Customized disambiguation Ṷ ṷ => WẠ Wạ wạ is marked with => ẈẠ Ẉạ ẉạ. OR INSTEAD: Ẉả Ẉả ẉả.
WĖ wė => Ṷ ṷ => WȨ wȩ / Ṵ ṵ [UNIQUE SYMBOLS: WȨ wȩ / Ṵ ṵ == disambiguated WĖ wė, NOT USED FOR OTHER PURPOSES]
Ẉ ẉ is always used in various u => w customized pronunciations, e.g. BEFORE BUMAF: ṷ bạrūķ // ẉe bạrūķ
          LITERAL NONCOMPLETED (present-future)
               Ordinary spelling = literal tense     Untypical spelling = reversed tense     Masoretic same spelling = reversed tense
                    wa ȧvạreķėķạ (M)                           wạ ȧvạrẹķ (M)              <=    * wa ȧvạrėķẹhū (M)
                    wė ehyeh (M)                               wạ ehyeh (M)
                    wė eqrā                                    wạ eqrā (M)
                    wė omar (M)                                wạ omar (M)
                    wi yėhī (M) / [ṷ yėhī]                     wȃ yėhī (M) / wa yehī (M)
                    wė yiqqạrē (M)                             wa yiqqạrē (M)
                    wė yiqrėū (M)                              wa yiqrā (M)
                    wi yėvạrẹķ (M) / [ṷ yėvạrẹķ]               wȃ yėvạreķ (M)
                    wi yėlammẹd (M) / [ṷ yėlammẹd]             wȃ yėlammėdū (M)
                    wė yómrū (M)                               wa yómer (M)
                    wė nihyeh (M)                              wa nihyeh (M)
          LITERAL COMPLETED (past-present)
               Ordinary spelling = literal tense     Masoretic same spelling = reversed tense     Custom disamb. reversed tense: WȨ wȩ / Ṵ ṵ, but wạ before initial shwa
                    wa ȧmartem                         ==      wa ȧmartem (M)                   =>      ẁạ ȧmartem
                    wė ạmar                            ==      wė ạmar (M)                              ẁȧ ạmar
                    wė qạrā (M, sometimes)             ==      wė qạrā (M)                      =>      wȩ qạrā / ṵ qạrā
                    wė hạyạh                           ==      wė hạyạh (M)                     =>      wȩ hạyạh / ṵ hạyạh
                    wė nivrėķū                         ==      wė nivrėķū (M)                   =>      wȩ nivrėķū / ṵ nivrėķū
                    ṷ bẹraķtīķạ / (ẉė bẹraķtīķạ)       ==      ṷ bẹraķtīķạ (M)                  =>      ẉạ bẹraķtīķạ
                    wė lạmdū                           ==      wė lạmdū (M)                     =>      wȩ lạmdū / ṵ lạmdū
AN AMBIGUOUS SCENARIO WITH OBJECT SUFFIX:
SINGULAR 2. M tėqạfatnī => tėqạftãnī
SINGULAR 3. F tėqạfatnī
VERB TENSES
* = Unstandardized or unusual expression, which attempts to precisely communicate the meaning of a verb tense that is not standardized or commonly used in hebrew language. If such verb tenses are used in strictly literal translation, the words written here in italics should be marked as not being literally based on the original text.
== he came hũ higgīắ negation: hũ ló higgīắ
== he comes hũ maggīắ negation: hũ ló maggīắ
== he will come hũ yaggīắ negation: hũ ló yaggīắ
==! he would come hũ hạyạh maggīắ negation: hũ ló hạyạh maggīắ
== it would be easy zeh yihyeh qal negation: zeh ló yihyeh qal
-- he is going to come hũ ômẹd lėhaggīắ negation: hũ ló ômẹd lėhaggīắ = he is about to / planning to come
-- he was going to come hũ ậmad lėhaggīắ negation: hũ ló ậmad lėhaggīắ = he was about / going to come
-- he is likely to come hũ ậlūl lėhaggīắ negation: hũ ló ậlūl lėhaggīắ = he will probably come
== saying be ọmrō / ọmrạɦ / ọmrạm = while currently saying
== saying ke ọmrō / ọmrạɦ / ọmrạm = in the manner of saying
-- the coming one [ ha (ȧšer) maggīắ ] (the participle meaning is not indicated clearly: this can be interpreted as simple present tense)
-- one who has come [ hạ (ȧšer) higgīắ ] (the participle meaning is not indicated clearly: this can be interpreted as simple past tense)
1. he had been coming hũ kvạr zėman mạh higgīắ negation: kvạr zėman mạh hũ ló higgīắ literally: he already some time came
2. he had been coming hũ ạz zėman mạh higgīắ negation: ạz zėman mạh hũ ló higgīắ literally: he then some time came
1. he has been coming hũ kvạr zėman mạh maggīắ negation: kvạr zėman mạh hũ ló maggīắ literally: he already some time comes
2. he has been coming hũ ắd kān hạyạh be bōō negation: hũ ắd kān ló hạyạh be bōō literally: he until here was in his coming
== he was coming hũ hạyạh be bōō / haggīô negation: hũ ló hạyạh be bōō / haggīô literally: he was in his coming
== he is coming hũ be bōō / haggīô / ọmrō negation: hũ ló be bōō / haggīô literally: he is in his coming
== he will be coming hũ yihyeh be bōō / haggīô negation: hũ ló yihyeh be bōō / haggīô literally: he will be in his coming
1. he had come hũ kvạr qōdem higgīắ negation: hũ ló higgīắ kvạr qōdem literally: he already earlier came
2. he had come hũ ạz qōdem higgīắ negation: hũ ló higgīắ ạz qōdem literally: he then earlier came
3. he had come hũ ạz kvạr qōdem higgīắ negation: hũ ạz ló higgīắ kvạr qōdem literally: he then already earlier came
4. he had come hũ ạz kvạr higgīắ mi qōdem negation: hũ ạz ló higgīắ mi qōdem literally: he then already came earlier
1. he has come hũ higgīắ mi qōdem negation: hũ ló higgīắ mi qōdem literally: he came earlier
2. he has come hũ ắd koh higgīắ negation: hũ ắd koh ló higgīắ literally: he until now / so far came
3. he has come [ hũ ắdaĭn higgīắ ] negation: hũ ắdaĭn ló higgīắ literally: he so far came
In historical verb forms, the prefix "and" is joined to the verb with a dash (-). An acceptable alternative spelling is to write the prefix "and" together with the verb. Both these writing styles indicate that the verb tense is reversed from completed to noncompleted, or vice versa:
wė dibber = and he spoke >==/==> wė-dibber / [ wedibber ] = and will speak
wi yėdabbẹr = he will speak >==/==> wa-yėdabbẹr / [ waydabbẹr ] = he spoke
MITT HE-DG 0.8 -- Standard for disambiguated grammar for hebrew language. Ion Mittler, 11 february 2025. Released in the public domain under CC0-1.0 license (Creative Commons 0 version 1.0). http://creativecommons.org/publicdomain/zero/1.0/
Modern International Text Types — mitt.fi