MITT AR 0.95
MITT AR-RZ/NT 0.95 -- Standard for romanized and native-script text styles for arabic language.
This document contains many rare Unicode characters, whose names are not given. If you need to know the Unicode name of a character, you can copy the character from this document to a website that identifies Unicode characters.
Letters marked with a hashtag # have a different design in the MITT fonts than you will see in this documentation with a standard font. Many of these design differences are explained in this document. A complete list of untypical letter designs related to the MITT text styles is available in the documentation of the MITT font.
This page is best viewed as the source code, in Notepad or another text editor.
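Character names can also be looked up locally instead of on a website. The following is a minimal sketch using the unicodedata module of the Python standard library; the function name describe is only illustrative, and the names that are returned depend on the Unicode version bundled with the Python installation.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs for every character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


if __name__ == "__main__":
    # Replace the escapes below with any characters copied from this document.
    for code_point, name in describe("\u0652\ufe7f\u205e"):
        print(code_point, name)
```

Characters without a name in the local Unicode database are reported as <unnamed> rather than raising an error.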
Main versions of writing styles:
AR-RZ -- fluent romanized arabic:
** AR-RZ-P Precise romanized arabic. Uses uppercase and lowercase letters, and full traditional vowelization and double consonants. The definite article, prepositions and the word "and" are written as separate words. Indicates with diacritical marks, where traditional and modern mater lectionis would differ in the short vowels of foreign names. Foreign non-arabic proper nouns are written literally (letter by letter), not based on pronunciation (as in all other of these romanized text styles, except AR-RZ-V). In foreign proper nouns uses letters that are outside of the basic arabic alphabet, such as C, E, G, J, O, P, X, Ä, Å, Ö or Ü.
-- AR-RZ-D Detailed romanized arabic. Otherwise identical with AR-RZ-P, but imitates the selection of letters that are available in the basic arabic alphabet, and refrains from using any letters outside of it. Any vowels and consonants outside of the basic arabic alphabet become assimilated with the basic arabic letters and ambiguous with them. Foreign non-arabic proper nouns are written based on pronunciation.
-- AR-RZ-T Together-written detailed romanized arabic. Otherwise identical with AR-RZ-D, but prefixes and the word "and" are written together with the next word, without a space between these. Does not use the abbreviations ȳ or ů.
-- AR-RZ-C Casual romanized arabic. Otherwise identical with AR-RZ-P, but case endings are left unwritten (-a, -an, -i, -in, -u, -un).
-- AR-RZ-S Simple romanized arabic. Otherwise identical with AR-RZ-D, but case endings are left unwritten. Uses modern mater lectionis in the short vowels of foreign names, without indicating this with any specific diacritical marks.
-- AR-RZ-Z Together-written simple romanized arabic. Otherwise identical with AR-RZ-S, but prefixes and the word "and" are written together with the next word, without a space between these. Does not use the abbreviations ȳ or ů.
-- AR-RZ-O One-case precise romanized arabic. Identical with AR-RZ-P, but uses lowercase letters only.
-- AR-RZ-L Lowercase detailed romanized arabic. Identical with AR-RZ-D, but uses lowercase letters only.
-- AR-RZ-G Lowercase together-written detailed romanized arabic. Identical with AR-RZ-T, but uses lowercase letters only.
-- AR-RZ-A Lowercase casual romanized arabic. Identical with AR-RZ-C, but uses lowercase letters only.
-- AR-RZ-I Lowercase simple romanized arabic. Identical with AR-RZ-S, but uses lowercase letters only.
-- AR-RZ-E Lowercase together-written simple romanized arabic. Identical with AR-RZ-Z, but uses lowercase letters only.
-- AR-RZ-U Unvowelized lowercase romanized arabic. Identical with AR-RZ-E, but without vowelization or double consonants.
AR-BL -- arabic romanized with Basic Latin characters only (ASCII 32 - 126):
-- AR-BL-P Precise basic-latinized arabic.
-- ... ...
-- AR-BL-U Unvowelized lowercase basic-latinized arabic.
AR-NT -- arabic in the native script:
** AR-NT-P Precise arabic with arabic script. Logically identical with AR-RZ-P.
-- AR-NT-D Detailed arabic with arabic script. Logically identical with AR-RZ-D.
-- AR-NT-T Together-written detailed arabic with arabic script. Logically identical with AR-RZ-T. Does not differentiate if ạlif with hamzaɦ on top has vowel A, U or sukūn.
-- AR-NT-C Casual arabic with arabic script. Logically identical with AR-RZ-C.
-- AR-NT-S Simple arabic with arabic script. Logically identical with AR-RZ-S.
-- AR-NT-Z Together-written simple arabic with arabic script. Logically identical with AR-RZ-Z. Does not differentiate if ạlif with hamzaɦ on top has vowel A, U or sukūn.
-- AR-NT-O One-case precise arabic with arabic script. Logically identical with AR-RZ-O.
-- AR-NT-L Lowercase detailed arabic with arabic script. Logically identical with AR-RZ-L.
-- AR-NT-G Lowercase together-written detailed arabic with arabic script. Logically identical with AR-RZ-G. Does not differentiate if ạlif with hamzaɦ on top has vowel A, U or sukūn.
-- AR-NT-A Lowercase casual arabic with arabic script. Logically identical with AR-RZ-A.
-- AR-NT-I Lowercase simple arabic with arabic script. Logically identical with AR-RZ-I.
-- AR-NT-E Lowercase together-written simple arabic with arabic script. Logically identical with AR-RZ-E. Does not differentiate if ạlif with hamzaɦ on top has vowel A, U or sukūn.
-- AR-NT-U Unvowelized lowercase arabic with arabic script. Logically identical with AR-RZ-U.
This list contains many different arabic text styles, because there are many possible personal preferences, how arabic text should be written: Should there be a distinction between uppercase and lowercase letters? Should non-arabic vowels and consonants in foreign proper nouns be assimilated to the sounds that are found in arabic language? Should prefixes be written as separate words, or together with the next word? Should the nominative markers -u and -un be written, or should they be left unwritten? Should the text generally include vowels? These 13 text styles in latin and arabic script try to offer all probable combinations of personal preferences, which might exist among people in real life.
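The preferences listed above can be read as five yes/no choices. The sketch below encodes the 13 styles in that way, derived from the style descriptions in the list of main versions. The class name Style and all field names are hypothetical, and finer points (such as the treatment of mater lectionis in foreign names, or the abbreviations ȳ and ů) are deliberately left out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Style:
    mixed_case: bool       # uppercase and lowercase letters are distinguished
    literal_foreign: bool  # foreign proper nouns are written letter by letter
    joined_prefixes: bool  # prefixes and "and" are joined to the next word
    case_endings: bool     # case endings such as -u and -un are written
    vowelized: bool        # vowels and double consonants are written


P = Style(True, True, False, True, True)
D = replace(P, literal_foreign=False)
T = replace(D, joined_prefixes=True)
C = replace(P, case_endings=False)
S = replace(D, case_endings=False)
Z = replace(S, joined_prefixes=True)

STYLES = {"P": P, "D": D, "T": T, "C": C, "S": S, "Z": Z}

# The lowercase styles O, L, G, A, I, E mirror P, D, T, C, S, Z in this order.
for lower, upper in zip("OLGAIE", "PDTCSZ"):
    STYLES[lower] = replace(STYLES[upper], mixed_case=False)

# U is E without vowelization or double consonants.
STYLES["U"] = replace(STYLES["E"], vowelized=False)
```

The same suffixes apply to the AR-RZ and AR-NT families, so one table can serve both scripts.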
The arabic script traditionally does not have a distinction between uppercase and lowercase letters. The text styles AR-NT-P, -D, -T, -C, -S and -Z indicate uppercase letters with Unicode characters arabic sukun (ْ ) or vertical four dots (⁞), as is explained further below in this document. In these six text styles a sukūn is never used for its traditional purpose (to indicate that a consonant has no vowel), to avoid confusions about the meaning of sukūn in each case. Also the wașlaɦ symbol is not used in these text styles for its original purpose, but instead for indicating modern mater lectionis on a short a in a foreign name.
(Terms such as ạlif, sukūn, wașlaɦ, hamzaɦ, fatḥaɦ, ḑammaɦ etc. are written in this document as precise transliterations from arabic language, except when mentioning the name of a Unicode character that contains such an arabic word: the names of Unicode characters are always given precisely in their official form, which may contain arabic words that are written in a casual and simplified form.)
Typical texts in arabic script can be automatically converted into AR-RZ-U format (or AR-RZ-L format, if the text is vowelized). The unicase arabic script is interpreted as lowercase, when converting text to latin script, because most of the letters would be lowercase, if the text is manually converted into a richer text style that uses uppercase and lowercase letters.
Foreign non-semitic proper nouns are usually written based on pronunciation in the arabic script. The arabic style of writing foreign proper nouns is imitated also in such of these romanized standards, whose primary emphasis is replicating the arabic script text as such in latin script, rather than writing optimally convenient text with the latin script.
Only such latin and arabic Unicode characters have been deemed acceptable for these standards, which do not force the text row to be any higher than normal. Sometimes an arabic text standard uses a latin Unicode character, even if a similar character were possible to achieve as a combination of an arabic letter (visually identical to the latin letter) and a separate diacritical mark. Single Unicode characters are always favoured, because they produce the expected visual look more reliably and precisely in various software.
The AR-BL text styles may not be esthetically very pleasant to read. Their purpose is to provide an easy and safe way to write and store arabic text (with the probable intention to later display the text in a more fluently readable text style), using the most universally supported Basic Latin character set only (ASCII 32 - 126): a...z A...Z 0 1 2 3 4 5 6 7 8 9 . , : ; - ' " ! ? ( ) [ ] { } < > / | \ _ ~ = + * ^ ` @ # $ & %.
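Whether a text really stays inside this character set can be checked mechanically. The sketch below is one possible check; the function name non_basic_latin is hypothetical. Line breaks are outside the range 32 - 126, so the check is meant to be applied to one text row at a time.

```python
def non_basic_latin(text):
    """Return the characters of text that fall outside ASCII 32 - 126."""
    return [ch for ch in text if not 32 <= ord(ch) <= 126]


if __name__ == "__main__":
    # Start of the AR-BL-P sample row of this document.
    row = "Ha+^dihi @l mar.at|u, Charlotte [{^Sa~rlo~t}], tusakkinu fi= @l Qa=hirat|i."
    offending = non_basic_latin(row)
    print("OK" if not offending else offending)
```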
Sample text, for comparing these text styles:
** AR-RZ-P Hȧḓihi ɛl marᵓaŧu, Charlotte [{Šąrlǫt}], tusakkinu fī ɛl Qāhiraŧi. Ụḫtahā mudarrisaŧủ lil-riyāḑiȳāti.
-- AR-RZ-D Hȧḓihi ɛl marᵓaŧu, Šąrlųt {[Șharlutti]}, tusakkinu fī ɛl Qāhiraŧi. Ụḫtahā mudarrisaŧủ lil-riyāḑiȳāti.
-- AR-RZ-T Hȧḓihi ɛl-marᵓaŧu, Šąrlųt {[Șharlutti]}, tusakkinu fī ɛl-Qāhiraŧi. Ụḫtahā mudarrisaŧủ lir-riyāḑiyyāti.
-- AR-RZ-C Hȧḓihi ɛl marᵓaɦ, Charlotte [{Šąrlǫt}], tusakkinu fī ɛl Qāhiraɦ. Ụḫtahā mudarrisaɦ lil-riyāḑiȳāt.
-- AR-RZ-S Hȧḓihi ɛl marᵓaɦ, Šārlūt {[Șharlutti]}, tusakkinu fī ɛl Qāhiraɦ. Ụḫtahā mudarrisaɦ lil-riyāḑiȳāt.
-- AR-RZ-Z Hȧḓihi ɛl-marᵓaɦ, Šārlūt {[Șharlutti]}, tusakkinu fī ɛl-Qāhiraɦ. Ụḫtahā mudarrisaɦ lir-riyāḑiyyāt.
-- AR-RZ-O hȧḓihi ɛl marᵓaŧu, charlotte [{šąrlǫt}], tusakkinu fī ɛl qāhiraŧi. ụḫtahā mudarrisaŧủ lil-riyāḑiȳāti.
-- AR-RZ-L hȧḓihi ɛl marᵓaŧu, šąrlųt {[șharlutti]}, tusakkinu fī ɛl qāhiraŧi. ụḫtahā mudarrisaŧủ lil-riyāḑiȳāti.
-- AR-RZ-G hȧḓihi ɛl-marᵓaŧu, šąrlųt {[șharlutti]}, tusakkinu fī ɛl-qāhiraŧi. ụḫtahā mudarrisaŧủ lir-riyāḑiyyāti.
-- AR-RZ-A hȧḓihi ɛl marᵓaɦ, charlotte [{šąrlǫt}], tusakkinu fī ɛl qāhiraɦ. ụḫtahā mudarrisaɦ lil-riyāḑiȳāt.
-- AR-RZ-I hȧḓihi ɛl marᵓaɦ, šārlūt {[șharlutti]}, tusakkinu fī ɛl qāhiraɦ. ụḫtahā mudarrisaɦ lil-riyāḑiȳāt.
-- AR-RZ-E hȧḓihi ɛl-marᵓaɦ, šārlūt {[șharlutti]}, tusakkinu fī ɛl-qāhiraɦ. ụḫtahā mudarrisaɦ lir-riyāḑiyyāt.
-- AR-RZ-U hḓh ᵻlmrɵɦ, šᵻrlwt {[șhrlt]}, tskn fy ᵻlqᵻhrɦ. ɵḫthᵻ mdrsɦ llryᵻḑyᵻt.
-- AR-BL-P Ha+^dihi @l mar.at|u, Charlotte [{^Sa~rlo~t}], tusakkinu fi= @l Qa=hirat|i. ,U^ktaha= mudarrisat|u` lil-riya=`diyya=ti.
-- AR-BL-U h^dh /lmr:h|, ^s/rlwt {[`shrlt]}, tskn fy /lq/hrh|. :^kth/ mdrsh| llry/`dy/t.
-- AR-NT-P ﹿهٰذِهِ الـمَرأةُ،‏ ﹿچهَرلٚتّٖ [{ﹿﺷَﭑرلٚوۢت}]،‏ تُسَكِّنُ فِي الـْقَاهِرَةِ.‏ ﹿا٘ختَهَا مُدَرِّسَةٌ لِلـرِّيَاضِيَّاتِ.‏
-- AR-NT-D ﹿهٰذِهِ الـمَرأةُ،‏ ﹿﺷَﭑرلوۢت {[ﹿصهَرلُتِّ]}،‏ تُسَكِّنُ فِي الـْقَاهِرَةِ.‏ ﹿا٘ختَهَا مُدَرِّسَةٌ لِلـرِّيَاضِيَّاتِ.‏
-- AR-NT-T ﹿهٰذِهِ المَرأةُ،‏ ﹿﺷَﭑرلوۢت {[ﹿصهَرلُتِّ]}،‏ تُسَكِّنُ فِي الْقَاهِرَةِ.‏ ﹿا٘ختَهَا مُدَرِّسَةٌ لِلرِّيَاضِيَّاتِ.‏
-- AR-NT-C ﹿهٰذِهِ الـمَرأة،‏ ﹿچهَرلٚتّٖ [{ﹿﺷَﭑرلٚوۢت}]،‏ تُسَكِّنُ فِي الـْقَاهِرَة.‏ ﹿا٘ختَهَا مُدَرِّسَة لِلـرِّيَاضِيَّات.‏
-- AR-NT-S ﹿهٰذِهِ الـمَرأة،‏ ﹿﺷَﺎرلُوت {[ﹿصهَرلُتِّ]}،‏ تُسَكِّنُ فِي الْـقَاهِرَة.‏ ﹿا٘ختَهَا مُدَرِّسَة لِلـرِّيَاضِيَّات.‏
-- AR-NT-Z ﹿهٰذِهِ المَرأة،‏ ﹿﺷَﺎرلُوت {[ﹿصهَرلُتِّ]}،‏ تُسَكِّنُ فِي الْقَاهِرَة.‏ ﹿا٘ختَهَا مُدَرِّسَة لِلرِّيَاضِيَّات.‏
-- AR-NT-O هٰذِهِ الـمَرأةُ،‏ چهَرلٚتّٖ [{ﺷَﭑرلٚوۢت}]،‏ تُسَكِّنُ فِي الـقَاهِرَةِ.‏ ا٘ختَهَا مُدَرِّسَةٌ لِلـرِّيَاضِيَّاتِ.‏
-- AR-NT-L هٰذِهِ الـمَرأةُ،‏ ﺷَﭑرلوۢت {[صهَرلُتِّ]}،‏ تُسَكِّنُ فِي الـقَاهِرَةِ.‏ ا٘ختَهَا مُدَرِّسَةٌ لِلـرِّيَاضِيَّاتِ.‏
-- AR-NT-G هٰذِهِ المَرأةُ،‏ ﺷَﭑرلوۢت {[صهَرلُتِّ]}،‏ تُسَكِّنُ فِي القَاهِرَةِ.‏ ا٘ختَهَا مُدَرِّسَةٌ لِلرِّيَاضِيَّاتِ.‏
-- AR-NT-A هٰذِهِ الـمَرأة،‏ چهَرلٚتّٖ [{ﺷَﭑرلٚوۢت}]،‏ تُسَكِّنُ فِي الـقَاهِرَة.‏ ا٘ختَهَا مُدَرِّسَة لِلـرِّيَاضِيَّات.‏
-- AR-NT-I هٰذِهِ الـمَرأة،‏ ﺷَﺎرلُوت {[صهَرلُتِّ]}،‏ تُسَكِّنُ فِي الـقَاهِرَة.‏ ا٘ختَهَا مُدَرِّسَة لِلـرِّيَاضِيَّات.‏
-- AR-NT-E هٰذِهِ المَرأة،‏ ﺷَﺎرلُوت {[صهَرلُتِّ]}،‏ تُسَكِّنُ فِي القَاهِرَة.‏ ا٘ختَهَا مُدَرِّسَة لِلرِّيَاضِيَّات.‏
-- AR-NT-U هذه المرأة،‏ ﺷﺎرلوت {[ﺻﻬرلت]}،‏ تسكن في القاهرة.‏ أَختها مدرسة للرياضيات.‏
Logical convertibility between these text standards (based on the text itself only, without using any dictionary data or human help):
               ROMANIZED ARABIC (AR-RZ)                            NATIVE SCRIPT ARABIC (AR-NT)
               --------------------------------------------------  --------------------------------------------------
               -P  -D  -T  -C  -S  -Z  -O  -L  -G  -A  -I  -E  -U  -P  -D  -T  -C  -S  -Z  -O  -L  -G  -A  -I  -E  -U
* AR-RZ-P  =>      OK  OK  OK  OK  OK  OK  OK  OK  OK  OK  OK  OK  OK  OK  OK  OK  OK  OK  OK  OK  OK  OK  OK  OK  OK
- AR-RZ-D  =>  ->      OK  ->  OK  OK  ->  OK  OK  ->  OK  OK  OK  ->  OK  OK  ->  OK  OK  ->  OK  OK  ->  OK  OK  OK
- AR-RZ-T  =>  ->  OK      ->  OK  OK  ->  OK  OK  ->  OK  OK  OK  ->  OK  OK  ->  OK  OK  ->  OK  OK  ->  OK  OK  OK
- AR-RZ-C  =>  ->  ->  ->      OK  OK  ->  ->  ->  OK  OK  OK  OK  ->  ->  ->  OK  OK  OK  ->  ->  ->  OK  OK  OK  OK
- AR-RZ-S  =>  ->  ->  ->  ->      OK  ->  ->  ->  ->  OK  OK  OK  ->  ->  ->  ->  OK  OK  ->  ->  ->  ->  OK  OK  OK
- AR-RZ-Z  =>  ->  ->  ->  ->  OK      ->  ->  ->  ->  OK  OK  OK  ->  ->  ->  ->  OK  OK  ->  ->  ->  ->  OK  OK  OK
- AR-RZ-O  =>  ->  ->  ->  ->  ->  ->      OK  OK  OK  OK  OK  OK  ->  ->  ->  ->  ->  ->  OK  OK  OK  OK  OK  OK  OK
- AR-RZ-L  =>  ->  ->  ->  ->  ->  ->  ->      OK  ->  OK  OK  OK  ->  ->  ->  ->  ->  ->  ->  OK  OK  ->  OK  OK  OK
- AR-RZ-G  =>  ->  ->  ->  ->  ->  ->  ->  OK      ->  OK  OK  OK  ->  ->  ->  ->  ->  ->  ->  OK  OK  ->  OK  OK  OK
- AR-RZ-A  =>  ->  ->  ->  ->  ->  ->  ->  ->  ->      OK  OK  OK  ->  ->  ->  ->  ->  ->  ->  ->  ->  OK  OK  OK  OK
- AR-RZ-I  =>  ->  ->  ->  ->  ->  ->  ->  ->  ->  ->      OK  OK  ->  ->  ->  ->  ->  ->  ->  ->  ->  ->  OK  OK  OK
- AR-RZ-E  =>  ->  ->  ->  ->  ->  ->  ->  ->  ->  ->  OK      OK  ->  ->  ->  ->  ->  ->  ->  ->  ->  ->  OK  OK  OK
- AR-RZ-U  =>  --  --  --  --  --  --  --  --  --  --  --  --      ->  ->  ->  ->  ->  ->  ->  ->  ->  ->  ->  ->  OK
* AR-NT-P  =>  OK  OK  OK  OK  OK  OK  OK  OK  OK  OK  OK  OK  OK      OK  OK  OK  OK  OK  OK  OK  OK  OK  OK  OK  OK
- AR-NT-D  =>  ->  OK  OK  ->  OK  OK  ->  OK  OK  ->  OK  OK  OK  ->      OK  ->  OK  OK  ->  OK  OK  ->  OK  OK  OK
- AR-NT-T  =>  ->  OK  OK  ->  OK  OK  ->  OK  OK  ->  OK  OK  OK  ->  OK      ->  OK  OK  ->  OK  OK  ->  OK  OK  OK
- AR-NT-C  =>  ->  ->  ->  OK  OK  OK  ->  ->  ->  OK  OK  OK  OK  ->  ->  ->      OK  OK  ->  ->  ->  OK  OK  OK  OK
- AR-NT-S  =>  ->  ->  ->  ->  OK  OK  ->  ->  ->  ->  OK  OK  OK  ->  ->  ->  ->      OK  ->  ->  ->  ->  OK  OK  OK
- AR-NT-Z  =>  ->  ->  ->  ->  OK  OK  ->  ->  ->  ->  OK  OK  OK  ->  ->  ->  ->  OK      ->  ->  ->  ->  OK  OK  OK
- AR-NT-O  =>  ->  ->  ->  ->  ->  ->  OK  OK  OK  OK  OK  OK  OK  ->  ->  ->  ->  ->  ->      OK  OK  OK  OK  OK  OK
- AR-NT-L  =>  ->  ->  ->  ->  ->  ->  ->  OK  OK  ->  OK  OK  OK  ->  ->  ->  ->  ->  ->  ->      OK  ->  OK  OK  OK
- AR-NT-G  =>  ->  ->  ->  ->  ->  ->  ->  OK  OK  ->  OK  OK  OK  ->  ->  ->  ->  ->  ->  ->  OK      ->  OK  OK  OK
- AR-NT-A  =>  ->  ->  ->  ->  ->  ->  ->  ->  ->  OK  OK  OK  OK  ->  ->  ->  ->  ->  ->  ->  ->  ->      OK  OK  OK
- AR-NT-I  =>  ->  ->  ->  ->  ->  ->  ->  ->  ->  ->  OK  OK  OK  ->  ->  ->  ->  ->  ->  ->  ->  ->  ->      OK  OK
- AR-NT-E  =>  ->  ->  ->  ->  ->  ->  ->  ->  ->  ->  OK  OK  OK  ->  ->  ->  ->  ->  ->  ->  ->  ->  ->  OK      OK
- AR-NT-U  =>  ->  ->  ->  ->  ->  ->  ->  ->  ->  ->  ->  ->  OK  --  --  --  --  --  --  --  --  --  --  --  --  
A dash -- or arrow -> is used in this diagram, if the conversion between two text styles is not supported, because the source text does not contain enough information for logically concluding the correct orthography in the target text style (in all possible scenarios). If such an unsupported conversion is requested, the default option is not to convert the source text at all. However, if it is deemed preferable to convert the text into the closest possible text style (most notably, when the script should change from native to romanized, or vice versa), the arrows point towards the text style that is the recommendable substitute for the unsupported conversion. A dash -- is used in the diagram, when no other logically supported substitutes are available (in the same script) than the source text style itself.
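One row of the diagram can be read into a lookup table along the following lines. This is only a sketch: the names TARGETS, MEANING and parse_row are hypothetical, and the row string below is the AR-RZ-U row of the diagram written compactly. The cell where source and target are the same style is blank in the diagram, so it is re-inserted before the cells are paired with the column headings.

```python
SUFFIXES = "P D T C S Z O L G A I E U".split()
TARGETS = [f"AR-RZ-{s}" for s in SUFFIXES] + [f"AR-NT-{s}" for s in SUFFIXES]

MEANING = {
    "OK": "conversion supported",
    "->": "not supported; the arrow points towards the recommended substitute",
    "--": "not supported; no other substitute in the same script",
}


def parse_row(line):
    """Turn one row of the diagram into (source, {target: marker})."""
    left, right = line.split("=>")
    source = left.strip("*- ")
    cells = right.split()
    # Re-insert the blank diagonal cell (source == target).
    cells.insert(TARGETS.index(source), "")
    if len(cells) != len(TARGETS):
        raise ValueError(f"unexpected number of cells in the row of {source}")
    return source, dict(zip(TARGETS, cells))


# The AR-RZ-U row: 12 dashes, 12 arrows, and one OK for AR-NT-U.
ROW_U = "- AR-RZ-U  =>  " + "--  " * 12 + "->  " * 12 + "OK"
source, row = parse_row(ROW_U)

if __name__ == "__main__":
    print(source, "to AR-NT-U:", MEANING[row["AR-NT-U"]])
```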
Foreign proper nouns cannot be automatically converted between a literal format (replicated letter by letter) and a transliteration based on pronunciation. Such a conversion would be reliably possible only if both the literal and the pronounced form of the name are documented in the source text, using some kind of tags or footnotes. This table of logical convertibility between the text styles ignores this aspect of foreign proper nouns, and promises convertibility from a text style to another, if no other logical obstacles exist for the conversion than the writing of foreign proper nouns being based on different principles.
The sample texts above use a notation in which the form based on pronunciation is given in [{curly brackets inside square brackets}], and the literal form is given in {[square brackets inside curly brackets]}, after the spelling that is chosen for the main text. This makes it possible to recognize automatically which of the two formats is the literal one. These codes and alternative spellings are not intended to be seen by the human reader in the main text.
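The bracket notation described above could be processed roughly as follows. This is a sketch under the assumption that annotations are never nested and never contain the closing bracket pair themselves; the names PRONOUNCED, LITERAL and split_annotations are hypothetical.

```python
import re

PRONOUNCED = re.compile(r"\s*\[\{(.*?)\}\]")  # [{...}] form based on pronunciation
LITERAL = re.compile(r"\s*\{\[(.*?)\]\}")     # {[...]} literal, letter-by-letter form


def split_annotations(text):
    """Return (main text without annotations, pronounced forms, literal forms)."""
    pronounced = PRONOUNCED.findall(text)
    literal = LITERAL.findall(text)
    main = LITERAL.sub("", PRONOUNCED.sub("", text))
    return main, pronounced, literal


if __name__ == "__main__":
    # Start of the AR-RZ-P sample row of this document.
    row = "Hȧḓihi ɛl marᵓaŧu, Charlotte [{Šąrlǫt}], tusakkinu fī ɛl Qāhiraŧi."
    main, pronounced, literal = split_annotations(row)
    print(main)
    print(pronounced, literal)
```

The whitespace before an annotation is removed together with it, so that the main text reads normally once the alternative spellings are hidden from the human reader.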
These text styles use a strict logical correlation between latin letters and arabic script letters, so that the text can be converted back and forth between arabic script and latin script, and the text should stay exactly similar through all these conversions, without any changes caused by the conversion process back and forth. However, foreign proper nouns are not always fully compatible with this system. In some scenarios it is possible that converting a foreign proper noun from latin script to arabic script, and then back into latin script, produces a different spelling in latin script than was the original form of the name. This can happen because these romanized arabic text styles are optimized for fluent reading and strict logical compatibility with the arabic script, not strict compatibility with the way how other languages use the latin script.
Technical reliability of these text styles, as Unicode characters:
While writing arabic script into this document with Notepad, Notepad++ and Microsoft Word in Windows 11, the following irregular behaviours were witnessed in these programs, which are among the most widely used text and code editing software in the world:
A few times it happened in Notepad that the arabic vowel "a" (fatḥaɦ, َ ) spontaneously fell from the consonant that it was riding, and became a separately printed character after the arabic consonant, riding on a dotted circle instead. Most notably (but not uniquely) in the letter combination ﺷﭑ , where letter šīn is followed by ạlifủ ɛl wașli, the vowel "a" on the šīn tended to fall from the consonant, becoming a separate character. When this happened, it usually happened in all similar letter combinations everywhere in this document. Also other arabic vowels behaved similarly in the same scenario, so the issue was not related to vowel "a" only: the arabic consonant refused to take any vowel on itself, on these occasions. Closing and reopening the text file did not fix this problem. However, the text looked perfectly OK, if it was copy-pasted from Notepad into any other software. But not if copy-pasted from the other software back to Notepad. So the problem was not in the text itself, but in the way how Notepad handled the text. To fix the problem in Notepad, it was necessary to rewrite the characters manually, and then the text would look OK for a while. It was possible to copy-paste the text into the same text file, and it still looked OK, but copy-pasting it into a different tab in the same Notepad program would again cause the vowel "a" to fall from its arabic consonant. Also closing the file and reopening it usually brought the problem back. There seems to be some irregularity in the way how Notepad processes arabic script text in different situations: the process is not always identical.
Another undesirable behaviour, which was witnessed in Notepad only, is that the Unicode character arabic sukun medial form (ﹿ) has two different forms: One where the taṱwīl (ـ), an empty section of the base text line, is on the same level as the base of most of the arabic letters usually are. And another form, where the taṱwīl is located much higher. Such a form of this character was not witnessed in other programs: if text with the taṱwīl located high was copy-pasted from Notepad into other programs, they displayed the character with the taṱwīl on the normal base level.
In some cases the sukūn medial form with a high taṱwīl joined some of the next arabic characters (until the next character that does not have a medial form). If this happened, Notepad usually changed the form of the sukūn medial form, so that its taṱwīl was on the level of the normal text base line. However, sometimes (especially inside square or curly brackets) Notepad showed all these joined letters in a smaller than usual size, and the base line of these letters was on the same higher than usual level as in the sukūn medial form. The rest of the word after these joined letters was written with ordinary-sized letters and with the base line at an ordinary height. If exactly the same word was written several times in the document, the joined letters were written with a high text base line in all cases that were inside square or curly brackets, but elsewhere in the document (not inside brackets) with larger letters and with the ordinary height of text base line. This behaviour was witnessed in Notepad only. If the text with a high base line was copy-pasted into other programs, they showed the text with an ordinary letter size and an ordinary height of text base line.
These joined letters with a high text base line had two further irregular behaviour patterns in Notepad. They were more prone than other text scenarios to dropping a vowel from a consonant to ride on a dotted circle. If such a set of letters was copy-pasted into a different tab in the same Notepad program, the joined letters were displayed in the wrong order, from left to right. But when any other RTL character was added on the same text row, these letters were restored into their correct order from right to left. Using the Undo at this moment rolled back the situation, removing the added RTL letter, but the joined letters still remained in the correct RTL order. Again Notepad evidently used two different ways (or in fact: three different ways) of processing exactly the same sequence of Unicode characters in arabic script, with exactly the same text directionality setting (LTR) in the text area.
Editing any text in arabic script was found to be quite impossible in Notepad++, at least when the text direction setting is LTR in the document. If a character in the arabic script text was added, changed or deleted, the change took place in a different part of the text than where the cursor was at the moment: a different letter was deleted, or the new character appeared at a different location on the text row. If an arabic character was edited near the end of text row, changes happened near the beginning of text row, and vice versa. If an arabic character at the start of text row was copied on clipboard, the clipboard would actually contain a letter from the end of text row. Notepad++ apparently mixed the concepts of the text direction setting (LTR) and the printing order of characters (RTL), and changed or copied the "xth letter" on the text row -- counting from the wrong end. For this reason, all changes in arabic script texts were done in Notepad, and then copy-pasted into Notepad++ as full text rows.
One of the most dramatic problems with arabic text was witnessed in Microsoft Word 365 (desktop installation), which did not have the support for arabic language configured: all brackets ( { [ ] } ) had the normal directionality, as in any LTR text, even when these characters were within arabic RTL text. Software usually reverses the various brackets in RTL text, so that the Unicode character "(" is displayed as ")", and so on. Microsoft Word failed to reverse these characters, even if the text directionality setting was RTL in the text area. The (arabic script) text should have looked like this: "I tasted éclairs (a local delicacy) in Paris." But Microsoft Word showed it like this: "I tasted éclairs )a local delicacy( in Paris." This problem is not limited to Microsoft Word, but happens in various software, which may fail to reverse the brackets in RTL text in various circumstances, for various reasons. Such problems should have been foreseen by Unicode, and the standard should never have invented the expectation that software "should" reverse the brackets in certain circumstances. When software "should" do something, the foreseeable reality is that some software will do the right thing, and some others will not -- so the same Unicode text will look perfect in some software, and terrible in some other software.
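The characters that software is expected to reverse in RTL text carry a specific property in the Unicode character database, and they can be listed with the unicodedata module of the Python standard library. The sketch below only reports which characters of a text have this property; it does not show how any particular program renders them. The function name mirrored_characters is hypothetical.

```python
import unicodedata


def mirrored_characters(text):
    """Return the characters of text that are marked as mirrored in bidirectional text."""
    return [ch for ch in text if unicodedata.mirrored(ch)]


if __name__ == "__main__":
    for ch in mirrored_characters("I tasted eclairs (a local delicacy) in Paris."):
        print(repr(ch), unicodedata.name(ch))
```

A text without any such characters cannot be affected by the bracket problem described above, which makes this check a simple way to find the rows that deserve a visual inspection.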
These examples highlight the fact that RTL scripts generally (and perhaps the arabic script particularly) are not supported 100 % reliably by even the most widely used text editing software in the western world, at least when the writing direction setting is not RTL in the document, or the software installation is not specifically configured to support arabic language. Therefore the arabic script text styles, which are defined in this document, will probably not perform 100 % reliably either, and various technical anomalies may arise also with these text styles.
To increase the grammatical information content of the text, these text styles use many uncommon diacritical marks and special characters, both in latin and in arabic script. This causes a higher risk that some fonts or software will fail to display the text correctly and beautifully. One of the esthetic risks is that some letters in the text are displayed with a different font than the rest of text. This happens if the primary font does not include some rare character. In that case the software will use any other font that contains the character. If none of the available fonts contains the character, then the software probably displays some generic character, such as a square or a question mark, for example.
Punctuation in arabic script:
Modern arabic uses some specific arabic punctuation -- such as the inverted arabic comma, reversed arabic question mark, and the inverted arabic semicolon -- but also ordinary western/latin punctuation, such as the comma, colon or exclamation mark. This situation is not completely unproblematic, because the Unicode characters for western/latin punctuation have LTR (left-to-right) writing direction, while arabic text is RTL (right-to-left). Such a mixture of RTL and LTR characters functions perfectly most of the time, but sometimes it happens that an LTR punctuation mark at the end of a row of RTL text gets wrongly placed as the first letter of the row. It should look like this, so that the period is the last character in a text row, which is read from right to left:
.CCCC-BB-AAA
But it looks like the text below instead, as the software has erroneously moved the period from the end to the beginning of the text row:
CCCC-BB-AAA.
This problem happens when RTL text is written in a text area, whose writing direction setting is LTR. Unicode assumes a bit over-optimistically that people will always choose the correct writing direction setting for the text area. This is not a realistic assumption: many people do not even know what they should do, or the software where the text is written does not offer any way to do it. It was originally planned that these text styles in RTL arabic script would use RTL characters only as punctuation, to avoid the problems that sometimes arise with LTR punctuation. However, a sufficient number of widely supported and correctly behaving Unicode characters was not found, which would look visually similar to the most commonly used latin punctuation, and which do not force the text row to be higher than is typical for arabic script.
Therefore, the chosen solution is to always write a Unicode right-to-left mark (U+200F, an invisible character) after each LTR punctuation mark in these RTL text styles. This seems to solve the problem of displaced LTR punctuation as the last character of a text row. Unfortunately this means that the text will contain invisible characters, which are necessary for the text to be displayed correctly in all possible circumstances, but people cannot see them, and many people will not perceive their existence, purpose and necessity. The best solution would be that RTL text uses RTL punctuation only (which would remove the need for any invisible characters or requirements for the writing direction settings of the text area), but unfortunately the Unicode standard does not include a satisfactory selection of RTL punctuation.
These arabic script text styles use the following arabic RTL punctuation: arabic comma (،), arabic semicolon (؛) and arabic question mark (؟). Also these punctuation marks are supported: arabic date separator (؍), arabic decimal separator (٫), arabic thousands separator (٬), arabic five-pointed star (٭), arabic percent sign (٪), arabic-indic per mille sign (؉), arabic-indic per ten thousand sign (؊), arabic numerals ٠ ١ ٢ ٣ ٤ ٥ ٦ ٧ ٨ ٩. Some situations have been witnessed, where the Unicode character for arabic percent sign was poorly supported by fonts in the western world. The arabic comma (،) as the last character of a text row has been witnessed to behave like LTR punctuation, sometimes jumping to the start of the row. Therefore these text styles write a Unicode right-to-left mark also after every arabic comma, to force it to behave correctly in all possible circumstances.
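Adding the right-to-left marks can be automated. The following is a minimal sketch, not a definitive implementation: the set MARKS is an assumption chosen for illustration (a few common latin punctuation marks plus the arabic comma) and should be adjusted to the punctuation that is actually used, and the function name add_rlm is hypothetical.

```python
RLM = "\u200f"  # RIGHT-TO-LEFT MARK, invisible

# Assumption: the punctuation marks that are to be followed by a right-to-left mark.
MARKS = set(".,:;!?") | {"\u060c"}  # U+060C is the arabic comma


def add_rlm(text):
    """Insert RLM after each character of MARKS, unless an RLM already follows it."""
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        if ch in MARKS and text[i + 1:i + 2] != RLM:
            out.append(RLM)
    return "".join(out)
```

Because an existing mark is never duplicated, the function can be run repeatedly on the same text without changing it further.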
Here is a list of some RTL punctuation, which were considered and tested for a while, before the approach of RTL punctuation only was abandoned:
: => ׃ Colon: hebrew punctuation sof pasuq.
; => ؛ Semicolon: arabic semicolon.
? => ؟ Question mark: arabic question mark.
- => ־ Hyphen: hebrew punctuation maqaf.
— => ־־ Em dash: two hebrew punctuation maqafs.
_ => ߺ‎ Low line (underscore): nko lajanyalan. (This character can force the text row height to be a bit higher than is typical in arabic script. Another possibility is to use two arabic tatweels "ــ", provided that they are kept from joining any arabic letters.)
* => ٭ Asterisk: arabic five-pointed star.
| => ׀ Vertical bar: hebrew punctuation paseq.
' => ׳ Apostrophe: hebrew punctuation geresh.
" => ״ Quotation mark: hebrew punctuation gershayim.
“ => ߵߵ Left double quotation mark: two nko low tone apostrophes.
” => ߴߴ Right double quotation mark: two nko high tone apostrophes.
` => ߵ Grave accent, used in the role of left single quotation mark: nko low tone apostrophe.
´ => ߴ Acute accent, used in the role of right single quotation mark: nko high tone apostrophe. (In some fonts this character is nearly identical with hebrew punctuation geresh. All these nko tone apostrophes force the text row height to be a bit higher than is typical in arabic script.)
The romanized MITT text styles may use Unicode character RATIO (∶) to indicate an abbreviation, if a foreign proper noun is an abbreviation and contains some abbreviation marker between the letters (not as the last letter).
Ạlif with hamzaɦ on top in arabic script:
Ạlif with hamzaɦ on top is an ambiguous character, which can mean three different things: the ạlif can have vowel A, vowel U, or sukūn (which means that there is no vowel). A vowel symbol or sukūn would be necessary on the ạlif, if we want to make the text unambiguous and precise -- as is required by many of these text styles. However, the Unicode standard does not include a presentation form for a combination of ạlif, hamzaɦ on top, and a sukūn or vowel sign. Some software support such a combination nevertheless, but many software do not, and the outcome might be that the vowel sign or sukūn is displayed separately, perhaps riding on a dotted circle ( ۫ ).
These text styles generally avoid using such Unicode characters, which are known to be poorly supported by various software in the western world. Therefore we have a problem: we cannot write these characters in the style that is expected by arabic grammar, as this is poorly supported by the Unicode standard and various software. We need to find some other way to write these three scenarios, so that each of them looks different, and is therefore unambiguous. To solve this problem, most of these native script text standards (except AR-NT-T, -Z, -G, -E and -U) use the conventions explained below -- some of which are not self-evident for a reader who has never heard about these conventions:
أ hamzaɦ on ạlif => ạlif + hamzaɦ + A
اّ ạlif + šaddaɦ => ạlif + hamzaɦ + U
ا٘ ạlif + nūn ġunnaɛ => ạlif + hamzaɦ + sukūn
It would be technically possible to write an ạlif with sukūn, and decide that it means ạlif + hamzaɦ + sukūn. However, some of these text styles use a sukūn for indicating uppercase letters (as is explained below), and therefore we avoid using a sukūn for any other purpose, under any circumstances.
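The three conventions for ạlif with hamzaɦ can be sketched as a small decoder. This is a hedged illustration only: the function name decode_alef_hamza and the returned labels are hypothetical, and the sketch assumes that the input is one ạlif cluster in which أ is stored as the single precomposed character U+0623.

```python
# A minimal sketch of the three ạlif conventions listed above.
# decode_alef_hamza and its return labels are hypothetical names.
from typing import Optional

ALEF = "\u0627"              # arabic letter alef
ALEF_HAMZA_ABOVE = "\u0623"  # arabic letter alef with hamza above
SHADDA = "\u0651"            # arabic shadda
NOON_GHUNNA = "\u0658"       # arabic mark noon ghunna


def decode_alef_hamza(cluster: str) -> Optional[str]:
    """Return 'A', 'U' or 'sukun' for the three conventions, else None."""
    if cluster == ALEF_HAMZA_ABOVE:
        return "A"        # hamzaɦ on ạlif => ạlif + hamzaɦ + A
    if cluster == ALEF + SHADDA:
        return "U"        # ạlif + šaddaɦ => ạlif + hamzaɦ + U
    if cluster == ALEF + NOON_GHUNNA:
        return "sukun"    # ạlif + nūn ġunna mark => ạlif + hamzaɦ + sukūn
    return None           # not one of the three conventions
```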
Uppercase and lowercase letters:
Arabic script is traditionally unicase, without a separate set of uppercase and lowercase letters. The visual shape of arabic letters is not favourable for a "small caps" style either, where the distinction between uppercase and lowercase would be indicated with letter size, while the letter shapes are identical. Text styles AR-NT-P, -D, -T, -C, -S and -Z treat the letters of traditional arabic script as lowercase, and use Unicode character arabic sukūn (ْ ) to indicate uppercase letters. (To avoid misunderstandings, these text styles never use sukūn for any other purpose than this.)
Software and fonts usually do not support adding a sukūn on a consonant that already has a vowel. (This would be logically contradictory, as the grammatical meaning of sukūn is that the consonant has no vowel.) These text styles always write a sukūn before the uppercase letter, not above it. If a vowelless consonant is marked as uppercase by adding a sukūn directly on the letter, these text styles nevertheless recognize it as an alternative way of marking the letter as uppercase (except on the L of a definite article, as is explained further below). The preferred style is to write the sukūn before the uppercase letter even when it would be technically possible to write it directly above the uppercase letter, because before the letter the sukūn usually stands out visually from the text more strongly. One of the main purposes of uppercase letters is to visually stand out from the mass of lowercase letters.
If the capitalized letter is the first letter of the word (and the word has no prefix), the preferable style would be that the sukūn rests on a space " " before the first letter of the word, as shown below:
شَارِعٌ ْ
Šāriˁủ
This style contains some esthetic risks, because fonts and software sometimes add a dotted circle ۫ under a solitary sukūn that is not riding on any arabic letter. This problem can be avoided by using a kharoshthi small circle (𐩑) instead, which does not have the tendency to get a dotted circle under it. This character is not completely safe either, however, because software may allow the word to break after this character, so that it is left alone on the previous row, and the rest of the word will be on the next row, as is shown below:
𐩑هِيَ تتُسَكِّنُ فِي الْـقَاهِرَةِ،‏ فِي 𐩑
شَارِعِ الْـجَمِيلِ.‏
To ensure that this will not happen, it would be possible to use the Unicode character word joiner "⁠" (⁠) between kharoshthi small circle and the next letter. Many fonts or software display this control character as a square ◻, however, so again the hard reality is that too many fonts and software support poorly this feature of the Unicode standard, which makes this feature too risky to be recommended for generic use.
Because of such esthetic risks of a sukūn riding on a space (and its potential visual RTL substitute, the kharoshthi small circle), these text styles prefer to use a sukūn riding on taṱwīl (ـ), an empty section of the base text line. When a sukūn needs to be added between two arabic letters that are joined together (drawn with one continuous line), the Unicode character arabic tatweel (ـ) is added between these letters and joins both of them, so that they retain their current joined shape. In all other situations, such as at the beginning of a word, these text styles use the Unicode character arabic sukun medial form (ﹿ), which has not joined any other arabic letters in some limited tests, and the letters next to this added character have retained their non-joined shape. (It might not be catastrophic, if a taṱwīl causes the arabic letters of a word to change their shape, but these text styles prefer that all letters of the arabic word retain the shape that they would have without the added sukūn or taṱwīl.)
ﹿهِيَ تتُسَكِّنُ فِي الـْقَاهِرَةِ،‏ فِي ﹿشَارِعِ الـْجَمِيلِ.‏
ﹿهِيَ تتُسَكِّنُ فِي الْـقَاهِرَةِ،‏ فِي ﹿشَارِعِ الْـجَمِيلِ.‏
ﹿهِيَ تتُسَكِّنُ فِي الْقَاهِرَةِ،‏ فِي ﹿشَارِعِ الْجَمِيلِ.‏
Hiya tusakkinu fī ɛl Qāhiraŧi, fī Šāriḯ ɛl Ǧamīli.
The text above contains four capitalized letters, two of which are indicated with an arabic sukun medial form (ﹿ), and two are indicated with a sukūn (ْ ) in a word that has a definite article. The three rows in arabic script demonstrate three different ways for indicating an uppercase letter after a definite article. These text style standards will correctly understand all three of these alternative styles of writing. On the first row the sukūn is riding on a taṱwīl (ـ), which is the Unicode character arabic tatweel, an empty section of the base text line between the definite article and the next word. (Normally arabic text does not have a taṱwīl in such a place, but the "separately written" styles of these text standards use it in these circumstances.) On the second and third rows the sukūn is riding on the L of the definite article.
Theoretically the most logical way of indicating an uppercase letter after a definite article is to write a sukūn on a taṱwīl (ـ), which is added between the definite article and the next word, similarly as on the first arabic script row above. The "together-written" text styles prefer to avoid adding any taṱwīls, however, and the sukūn is written on the L of the definite article, as on the third arabic script row above. By the general principle, this should mean that the letter L of the definite article is uppercase, but all these text styles interpret this special case so that the first letter of the next word is uppercase, not the L of the definite article. (It is theoretically possible to force the L of the definite article to be uppercase, by adding a taṱwīl before the L, and writing the sukūn on this taṱwīl.)
Also the "separately written" text styles prefer this method of writing the sukūn on the L of the definite article (as on the second arabic script row above), despite the fact that there is a taṱwīl available, on which the sukūn could be written. The reason for this preference is that above the L the sukūn will be located a bit higher, which makes the sukūn visually stand out more clearly from the text. It is preferable that the indicators of uppercase letters stand out from the text visually.
When a capitalized word is preceded by the word "and", or a preposition that does not end with a vowelless consonant, also the "together-written" text styles are forced to use an additional taṱwīl or space between these words. The preposition bi- is followed by a sukūn riding on a taṱwīl. The word "and" (wa) is followed by a kharoshthi small circle, without a taṱwīl. (Adding a taṱwīl and sukūn is reasonable only between two letters that have joined each other. The letter wāw never joins the next letter, so it is followed by a kharoshthi small circle and no taṱwīl. The "separately written" text styles would add also a six-per-em space " " between the wāw and kharoshthi small circle, as will be explained further below.)
ﹿوَﹿخَلِيل يُؤمِنُ بِـْسَمِيرِ.‏
Wa Ḫalīl yuᶟminu bi Samīri.
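The "marker before the letter" convention can be sketched as a small scanner. This is a simplified illustration under stated assumptions, not the conversion algorithm of these standards: the function name find_uppercase is hypothetical, the marker characters are removed from the output, and the special case of a sukūn riding on the L of a definite article is not handled.

```python
# A minimal sketch: find letters that are marked as uppercase by a marker
# written BEFORE the letter. find_uppercase is a hypothetical name, and the
# definite-article special case (sukūn on the L) is deliberately left out.
import unicodedata

SUKUN = "\u0652"              # arabic sukun
TATWEEL = "\u0640"            # arabic tatweel
SUKUN_MEDIAL = "\uFE7F"       # arabic sukun medial form
SMALL_CIRCLE = "\U00010A51"   # kharoshthi punctuation small circle


def find_uppercase(text):
    """Return (text without markers, indexes of the uppercase letters)."""
    out = []
    upper = []
    pending = False  # True after a marker, until the next letter is seen
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in (SUKUN_MEDIAL, SMALL_CIRCLE):
            pending = True
            i += 1
        elif ch == TATWEEL and text[i + 1:i + 2] == SUKUN:
            pending = True  # sukūn riding on a taṱwīl
            i += 2
        else:
            if pending and unicodedata.category(ch).startswith("L"):
                upper.append(len(out))
                pending = False
            out.append(ch)
            i += 1
    return "".join(out), upper
```

For example, the preposition bi- followed by a taṱwīl with sukūn and the letter sīn yields the three characters bā, kasraɦ and sīn, with index 2 (the sīn) reported as uppercase.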
A whole word (or a longer text) can be indicated as uppercase with vertical four dots ⁞⁞ ... ⁞⁞⁞, or in AR-BL: ;; ... ;;;. (Also tricolon ⁝ was originally considered for this purpose, but it is much less widely supported by fonts.) These are LTR characters, and therefore each vertical four dots is always followed by a right-to-left mark, to ensure that the text would behave correctly in all possible circumstances.
ﹿرَأينَا إعلَانً عَلَى الـبَوَابَةِ:‏ ⁞‏⁞‏ "‏اِحتَرِس مِنَ الـكَلبِ!‏"‏ ⁞‏⁞‏⁞‏
Raaynā ịˁlānả ắlaɛ ɛl bawābaŧi: ⁞⁞ "IḤTARIS MINA ḚL KALBI!" ⁞⁞⁞
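Finding the fully uppercase passages can be sketched with a regular expression. This is a hedged illustration: the function name uppercase_spans is hypothetical, and the sketch assumes that it is acceptable to drop every right-to-left mark before searching, which a real converter might not want to do.

```python
# A minimal sketch: extract passages enclosed by two opening and three
# closing vertical four dots. uppercase_spans is a hypothetical name.
import re

VFD = "\u205E"  # vertical four dots
RLM = "\u200F"  # right-to-left mark

# Two dots characters (not followed by a third), lazy content, three closing.
_SPAN = re.compile("\u205E{2}(?!\u205E)(.*?)\u205E{3}", re.DOTALL)


def uppercase_spans(text):
    """Return the passages that are marked as uppercase as a whole."""
    plain = text.replace(RLM, "")  # assumption: the marks can be discarded
    return [m.group(1).strip() for m in _SPAN.finditer(plain)]
```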
Writing some small words separately, or together with the next word:
The definite article, prepositions and the word "and" are traditionally written together with the next word in arabic script. Most of these romanized and arabic script text styles write them as separate words, however. The romanized text styles usually use an ordinary space " " between one of these small words and the next word. The arabic script text styles use a taṱwīl (ـ).
If a definite article or some other prefix causes the first letter of the next word to be doubled (as often happens with the "sun letters" T Ṫ D Ḓ R Z S Š Ș Ḑ Ṱ Ż L N), all these romanized text styles use a hyphen (-) between the definite article and the main word, without doubling the first letter of the main word. The hyphen serves as an indicator of the doubling of a letter.
Al ḫubzu wa ɛl-nabīḓu, lil-ṱaâmi. Al ḥamdu li-Lȧhi.
In this example the letters Ḫ and Ḥ are not doubled, because they are not "sun letters". The letters N, Ṱ and the last L are doubled, but this is indicated with a hyphen, rather than writing these letters twice. It is permissible to write the doubled letter twice, though (the conversion algorithms will understand such a style correctly, when converting the text to arabic script), but the conversion algorithms normally will not use this style, when producing romanized text:
Al ḫubzu wa ɛl-nnabīḓu, lil-ṱṱaâmi. Al ḥamdu li-LLȧhi / li-Llȧhi.
In arabic script the text would have exactly these letters and doublings (but not the hyphens). Therefore this style must be accepted as grammatically correct and permissible. This might not be the most stylish way of writing romanized arabic, however. When the capitalized first letter of a proper noun is doubled, we would always have to choose whether only the first letter is uppercase, or the first two letters. (See the two alternative spellings li-LLȧhi / li-Llȧhi above.)
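The role of the hyphen as a doubling indicator in the "separately written" romanized styles can be sketched as follows. This is an illustration only: the function name expand_doubling is hypothetical, the sketch treats every hyphen-minus as a doubling indicator, and it arbitrarily chooses a lowercase second letter after an uppercase one.

```python
# A minimal sketch: make the doubling that a hyphen-minus indicates explicit.
# expand_doubling is a hypothetical name. The Unicode character hyphen
# (U+2010) is left untouched, because only hyphen-minus indicates doubling.
import re
import unicodedata

_HYPHEN_DOUBLING = re.compile(r"-(\w)(\w?)")


def expand_doubling(text):
    """Write the letter after a hyphen-minus twice, unless it already is."""
    text = unicodedata.normalize("NFC", text)

    def repl(match):
        first, following = match.group(1), match.group(2)
        if following and following.lower() == first.lower():
            return match.group(0)  # already written twice, keep as-is
        # Assumption: the added second letter is always lowercase.
        return "-" + first + first.lower() + following

    return _HYPHEN_DOUBLING.sub(repl, text)
```

With plain ASCII stand-ins for the sample words, "el-nabi" becomes "el-nnabi" and "li-Lahi" becomes "li-Llahi", while "li-LLahi" is returned unchanged.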
The two text samples above use the text style AR-RZ-P, which is a "separately written" text style. The "together-written" romanized text styles bind the definite article, prepositions and the word "and" together into one unbroken string of text, similarly as in typical text with arabic script. The small words are separated from each other with hyphens, however:
Al-ḫubzu wa-ɛn-nabīḓu, liṱ-ṱaâmi. Al-ḥamdu li-Lȧhi.
Now we have hyphens also in cases where the first letter of the next word is not doubled. But it is still easy to notice nearly all of the doubled letters, because the "together-written" romanized text styles change the L in the definite article into the same letter that the next word begins with, if this letter is doubled. (Automatic conversion algorithms should understand such text correctly, and convert the definite article back into "al" or "el", if such text is converted into some other text style.)
When romanized text styles are converted into arabic script, the conversion algorithm should automatically detect prepositions and other small words, which should be joined to the main word, and which might cause the first letter of the next word to be doubled. If the desired outcome differs from this assumption, any separately written romanized word can be forced to be joined to the next word in arabic script by using a thin space (U+2009) or four-per-em space (U+2005) instead of an ordinary space between the words (AR-BL: underscore _). Conversely, to prevent a word that looks like a common prefix in arabic language from being joined to the next word in arabic script, use two spaces between the prefix and the next word, or a three-per-em space (U+2004).
To prevent the text style conversion algorithms from interpreting a hyphen as an indicator of a doubled letter and of words that should be joined together in arabic script, use the Unicode character hyphen (‐) instead of the ordinary hyphen-minus (-). These two characters look identical, but the conversion algorithms will recognize them as different Unicode characters. (Sorry for not using the term hyphen-minus everywhere in this document: whenever this standard speaks of a "hyphen", it actually means the Unicode character hyphen-minus, which is the ordinary hyphen that you probably get easily from your keyboard. Unicode character "hyphen" is a rare special character. Thanks to Unicode for using the popular term "hyphen" for a rare special character, and a rare special term "hyphen-minus" for the popular character.)
There is a risk, however, that some software (such as Notepad++) may show the afore-mentioned Unicode characters (thin space, four-per-em space, three-per-em space, and hyphen) as a square ◻.
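The separator rules above can be sketched as a small classifier. This is a hedged illustration: the function name join_hint and the returned labels are hypothetical, and the AR-BL underscore is left out of the sketch.

```python
# A minimal sketch: classify the separator between a small word and the
# next word. join_hint and its return labels are hypothetical names.
THIN_SPACE = "\u2009"
FOUR_PER_EM_SPACE = "\u2005"
THREE_PER_EM_SPACE = "\u2004"


def join_hint(separator):
    """Return how the separator steers joining in arabic script."""
    if separator in (THIN_SPACE, FOUR_PER_EM_SPACE):
        return "force-join"      # always join to the next word
    if separator in ("  ", THREE_PER_EM_SPACE):
        return "keep-separate"   # never join, even after a common prefix
    if separator == " ":
        return "auto"            # let the converter detect small words
    raise ValueError("unrecognized separator: %r" % separator)
```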
In traditional "together-written" arabic script text this sample text looks like this:
ﹿاَلخُبزُ وَالنَّبِيذُ،‏ لِلطَّعَامِ.‏ ﹿاَلحَمدُ لِلّٰهِ.‏
The "separately written" arabic script text styles add a taṱwīl (ـ) between a preposition and the next word. The word "and" is separated from the next word with a six-per-em space ( ). With these modifications, the sample text looks like this:
ﹿاَلـخُبزُ وَ الـنَّبِيذُ،‏ لِلـطَّعَامِ.‏ ﹿاَلـحَمدُ لِـلّٰهِ.‏
Vowels in standard arabic text:
(This list does not include foreign vowels -- such as german ä, ö and ü, or scandinavian å -- or rare special scenarios in arabic, such as vowel changes caused by disambiguated grammar. Foreign vowels and special scenarios are discussed elsewhere in this document.)
    VOWEL     ẮYN +
A   ONLY      VOWEL     COLLOQUIAL NAME          COLLOQUIAL IN ARABIC        FORMAL NAME IN ARABIC     EXAMPLES

     A a     # Ắ ắ      short A                  Ā qașīraɦ                   fatḥaɦ                    kataba, ắn
     Ả ả       ---      suffix AN                lāḥiqaŧủ ẠN                 fatḥatān                  laylaŧả saîdaŧả
     Ă ă       ---      masculine AN             ẠN muḓakkaraɦ               fatḥatān muḓakkaraɦ       šukră
     Ạ ạ       ---      alific A                 Ā ạlifiȳaɦ                  fatḥaɦ ạlifiȳaɦ           liạnna
   # Ḁ ḁ       ---      wawic A                  Ā wāwiȳaɦ                   fatḥaɦ wāwiȳaɦ            muḁqqataɦ
     Ȁ ȁ       ---      yaic A                   Ā yāȉȳaɦ                    fatḥaɦ yāȉȳaɦ             Ạndrīȁs
     AƐ aɛ   # ẮƐ ắɛ    shortened A              Ā maqșūraɦ                  fatḥaɦ maqșūraɦ           ịlaɛ
     Ą ą       ---      notable A                Ā malḥūżaɦ                  fatḥaɦ malḥūżaɦ           Kąnsąs
     Ȧ ȧ       ---      solemn A (dagger A)      Ā ǧalīlaɦ (Ā ḫanǧariȳaɦ)    ạlif ḫanǧariȳaɦ           hȧḓā
     -Ā -ā     Â â      long A                   Ā ṱawīlaɦ                   fatḥaɦ ṱawīlaɦ            māḓā, âlamủ
     Ā- ā-     ---      alific long A            Ā ṱawīlaɦ ạlifiȳaɦ          fatḥaɦ ṱawīlaɦ ạlifiȳaɦ   āḫaru
     Ã ã       ---      wawic long A             Ā ṱawīlaɦ wāwiȳaɦ           fatḥaɦ ṱawīlaɦ wāwiȳaɦ    
     Á á       ---      yaic long A              Ā ṱawīlaɦ yāȉȳaɦ            fatḥaɦ ṱawīlaɦ yāȉȳaɦ     
   # Ẵ ẵ     # Ẩ ẩ      supreme A                Ā ắżīmaɦ                    fatḥaɦ ắżīmaɦ             yẵ

    VOWEL     ẮYN +
I   ONLY      VOWEL     COLLOQUIAL NAME          COLLOQUIAL IN ARABIC        FORMAL NAME IN ARABIC     EXAMPLES

     I i     # Ḯ ḯ      short I                  Ī qașīraɦ                   kasraɦ                    min, ḯnda
     Ỉ ỉ       ---      suffix IN                lāḥiqaŧủ ỊN                 kasratān                  kulla waqtỉ
     Ị ị       ---      alific I                 Ī ạlifiȳaɦ                  kasraɦ ạlifiȳaɦ           ịllā
     Ḭ ḭ       ---      wawic I                  Ī wāwiȳaɦ                   kasraɦ wāwiȳaɦ            
     Ȉ ȉ       ---      yaic I                   Ī yāȉȳaɦ                    kasraɦ yāȉȳaɦ             ūlaȉka
     Į į       ---      notable I                Ī malḥūżaɦ                  kasraɦ malḥūżaɦ           Tąllįnn
     -Ī -ī     Î î      long I                   Ī ṱawīlaɦ                   kasraɦ ṱawīlaɦ            kabīr, yaîšu
     Ī- ī-     ---      alific long I            Ī ṱawīlaɦ ạlifiȳaɦ          kasraɦ ṱawīlaɦ ạlifiȳaɦ   īqāfu
     Ĩ ĩ       ---      wawic long I             Ī ṱawīlaɦ wāwiȳaɦ           kasraɦ ṱawīlaɦ wāwiȳaɦ    
     Í í       ---      yaic long I              Ī ṱawīlaɦ yāȉȳaɦ            kasraɦ ṱawīlaɦ yāȉȳaɦ     warāí

    VOWEL     ẮYN +
U   ONLY      VOWEL     COLLOQUIAL NAME          COLLOQUIAL IN ARABIC        FORMAL NAME IN ARABIC     EXAMPLES

     U u     # Ṹ ṹ      short U                  Ū qașīraɦ                   ḑammaɦ                    kuntu, ṹbūr
     Ủ ủ       ---      suffix UN                lāḥiqaŧủ ỤN                 ḑammatān                  baytủ
     Ụ ụ       ---      alific U                 Ū ạlifiȳaɦ                  ḑammaɦ ạlifiȳaɦ           ụmm
     Ṵ ṵ       ---      wawic U                  Ū wāwiȳaɦ                   ḑammaɦ wāwiȳaɦ            hȧṵlāi
     Ȕ ȕ       ---      yaic U                   Ū yāȉȳaɦ                    ḑammaɦ yāȉȳaɦ             
     Ų ų       ---      notable U                Ū malḥūżaɦ                  ḑammaɦ malḥūżaɦ           Frąnkfųrt
     -Ū -ū     Û û      long U                   Ū ṱawīlaɦ                   ḑammaɦ ṱawīlaɦ            qānūn, yaûdu (yaˁūdu)
     Ū- ū-     ---      alific long U            Ū ṱawīlaɦ ạlifiȳaɦ          ḑammaɦ ṱawīlaɦ ạlifiȳaɦ   ūbirā
     Ũ ũ       ---      wawic long U             Ū ṱawīlaɦ wāwiȳaɦ           ḑammaɦ ṱawīlaɦ wāwiȳaɦ    
     Ú ú       ---      yaic long U              Ū ṱawīlaɦ yāȉȳaɦ            ḑammaɦ ṱawīlaɦ yāȉȳaɦ     
     Ŭ ŭ       Ů ů      supreme U                Ū ắżīmaɦ                    ḑammaɦ ắżīmaɦ             qālŭ, samiů (samiˁŭ)

    VOWEL     ẮYN +
-   ONLY      VOWEL     COLLOQUIAL NAME          COLLOQUIAL IN ARABIC        FORMAL NAME IN ARABIC     EXAMPLES

     Ɛ ɛ       ---      wasla (silent letter)    wașlaɦ (ḥarfủ șāmitủ)       hamzaŧủ ɛl wașli          māȉdaŧủ ɛl matbaḫi
     Э э       ---      wawic hamzaɦ             hamzaɦ wāwiȳaɦ              hamzaɦ wāwiȳaɦ            
     Ẏ ẏ       ---      yaic hamzaɦ              hamzaɦ yāȉȳaɦ               hamzaɦ yāȉȳaɦ             
Wawic hamzaɦ = hamzaɦ riding a wāw, as the vowelless last letter of a word. Yaic hamzaɦ = hamzaɦ riding a yẵ, as the vowelless last letter of a word.
=> Modified glyphs in MITT fonts: Ắyn + short A = A a with horizontally reversed hook above. Wawic A = Ḁ ḁ => A a with tilde below.
The basic vowels in arabic language:
The short vowels in standard arabic are A (fatḥaɦ, َ ), I (kasraɦ, ِ ) and U (ḑammaɦ, ُ ). Long vowels with the most typical mater lectionis consonant are called "long A" (fatḥaɦ ṱawīlaɦ), "long I" (kasraɦ ṱawīlaɦ) and "long U" (ḑammaɦ ṱawīlaɦ). If the vowel I follows a consonant that is doubled with a šaddaɦ (ّ ), all these text styles use an open kasratān ( ࣲ ) instead of an ordinary kasraɦ to indicate the vowel "i". This is done because some software and fonts have been observed to always change the combination of šaddaɦ and kasraɦ into šaddaɦ and fatḥaɦ -- showing the wrong vowel, "a". An open kasratān helps to avoid this problem, so that the reader will see at least the correct vowel, even if written in an unusual way.
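The replacement of kasraɦ after šaddaɦ can be sketched as a single substitution. This is only an illustration: the function name protect_kasra_after_shadda is hypothetical, and the sketch assumes that the two marks may be stored in either order, and that the output order šaddaɦ + open kasratān is acceptable.

```python
# A minimal sketch: replace kasraɦ with open kasratān on a consonant that
# carries a šaddaɦ. protect_kasra_after_shadda is a hypothetical name.
import re

SHADDA = "\u0651"         # arabic shadda
KASRA = "\u0650"          # arabic kasra
OPEN_KASRATAN = "\u08F2"  # arabic open kasratan

# The two combining marks may be stored in either order.
_PAIR = re.compile(SHADDA + KASRA + "|" + KASRA + SHADDA)


def protect_kasra_after_shadda(text):
    """Write šaddaɦ + open kasratān instead of šaddaɦ + kasraɦ."""
    return _PAIR.sub(SHADDA + OPEN_KASRATAN, text)
```

A kasraɦ on a consonant without šaddaɦ is left unchanged.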
Some foreign vowels and consonants:
Some of these text styles also use vowels other than the three basic vowels of arabic language in foreign names. Below is a guide for transliterating foreign proper nouns from latin script to arabic script literally, letter by letter -- regardless of the language, or how the word is pronounced. The text styles AR-NT-P, -C, -O and -A use only the primary variant, which is not in parentheses. The other text styles allow some ambiguity and may take the pronunciation into consideration; their acceptable alternatives are listed in parentheses, using the most basic arabic alphabet only. Text styles AR-NT-P, -C, -O and -A do not use any mater lectionis in foreign proper nouns: letters ạlif, wāw or yẵ are added only when it is technically unavoidable. All the other arabic script standards use mater lectionis in foreign proper nouns quite extensively.
A a = َ
B b = ب
C c = چ‎ (ص or س or ك)
D d = د
E e = ٖ (ي ِ or ي َ)
F f = ف
G g = گ‎ ‎(ج)
H h = هـ
I i = ِ
J j = ې‎ (ي or ج)
K k = ك
L l = ل‬
M m = م
N n = ن
O o = ٚ ( و ُ or و َ)
P p = پ‎ (ب)
Q q = ق
R r = ر
S s = س‬
T t = ت‬
U u = ُ
V v = ڤ‎
W w = و
X x = ڳ‎ or ڱ‎ or ݢس‎ ‎(كس)
Y y = ي
Z z = ز (ز or ص)
Ä ä = ٞ ( َ)
Å å = ٗ ( َ)
Ö ö = ٛ ( ُ)
Ü ü = ٝ ( ُ)
AR-BL: Ä ä = ``A ``a, Å å = ^^A ^^a, Ö ö = ``O ``o, Ü ü = ``U ``u.
Unicode names of the vowel characters in this list: A = arabic fatha, E = arabic subscript alef, I = arabic kasra, O = arabic vowel sign small v above, U = arabic damma, Ä = arabic fatha with two dots, Å = arabic inverted damma, Ö = arabic vowel sign inverted small v above, Ü = arabic reversed damma.
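The vowel part of the list above can be sketched as a lookup by Unicode name, which avoids typing the combining marks directly. This is a hedged illustration: the names FOREIGN_VOWEL_NAMES and vowel_mark are hypothetical, and only the primary variants of the nine vowels are covered, not the consonants or the alternatives in parentheses.

```python
# A minimal sketch: look up the arabic vowel mark of a latin vowel by its
# Unicode name. FOREIGN_VOWEL_NAMES and vowel_mark are hypothetical names.
import unicodedata

FOREIGN_VOWEL_NAMES = {
    "A": "ARABIC FATHA",
    "E": "ARABIC SUBSCRIPT ALEF",
    "I": "ARABIC KASRA",
    "O": "ARABIC VOWEL SIGN SMALL V ABOVE",
    "U": "ARABIC DAMMA",
    "\u00C4": "ARABIC FATHA WITH TWO DOTS",                # Ä
    "\u00C5": "ARABIC INVERTED DAMMA",                     # Å
    "\u00D6": "ARABIC VOWEL SIGN INVERTED SMALL V ABOVE",  # Ö
    "\u00DC": "ARABIC REVERSED DAMMA",                     # Ü
}


def vowel_mark(letter):
    """Return the arabic vowel mark for one latin vowel (either case)."""
    key = unicodedata.normalize("NFC", letter).upper()
    return unicodedata.lookup(FOREIGN_VOWEL_NAMES[key])
```

A letter that is not one of the nine vowels raises a KeyError in this sketch.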
Subscript ạlif can be a surprising (and misleading) symbol for representing the foreign vowel E. It was the only arabic vowel mark in the Unicode standard (other than kasraɦ, kasratān and open kasratān, which are reserved for other purposes) that is printed under the consonant (not above it), is widely supported by fonts and software, can be reliably added on consonants (without the vowel often falling off the consonant, to ride on a dotted circle), is large enough to see easily (some small dots failed in this respect), and is not visually too similar to the afore-mentioned other vowel marks below the consonant. It would have been possible to use some rare arabic vowel mark, which is printed above the consonant, but this was deemed undesirable and misleading, because the foreign vowel E is typically assimilated to arabic vowel I, which is printed under the consonant.
Unicode fonts and software may not support adding any of these foreign vowels on an ạlif, wāw or yẵ, which has a hamzaɦ. This technical limitation is solved by defining that each of these non-arabic vowels (other than A, I or U) always includes a hamzaɦ, when the vowel is written on an ạlif. Wāw and yẵ are never used as carriers of hamzaɦ with foreign vowels.
ARA > LAT : SPECIAL CASES
ا > ᴵ ᶥ
أ‎ إ > ᵙ ᵓ
ء > ᶞ ˀ
The romanization of ạlif and hamzaɦ is handled here in one combined explanation, because the rules for writing these are complex and intertwined with each other in the arabic grammar. These romanized text styles define how to write with latin characters each form that ạlif and hamzaɦ can have in arabic script, so that the romanized text is logically identical with the arabic script text, and can be converted back into arabic script so that the outcome will be identical to the original text in arabic script. -- Modified glyphs in MITT fonts: ᵙ => ᵓ, ᵓ => positioned lower than usual, with its top edge level with the top edge of "e", ᶞ => ˀ, ˀ positioned lower than usual, with its top edge level with the top edge of "e".
=> ᴵ (uppercase), ᶥ (lowercase) = neutral ạlif ( ا ). (In this explanation the term "neutral" means that a letter in arabic script is without any diacritics: no hamzaɦ, no vowel, no sukūn, no šaddaɦ, no wașlaɦ, etc. Another term "vowelless" does not exclude the possibility that a sukūn might be present.) The most precise romanized text styles indicate a neutral ạlif with a diacritical mark over a vowel (e.g. marking the vowel as long), and practically never use these separate neutral ạlif characters (ᴵ ᶥ) in any typical scenario in arabic language. The simplified text styles AR-RZ-C, -S, -Z, -A, -I, -E, -U, which leave the case endings unwritten (-a, -an, -i, -in, -u, -un), use these separate characters to mark the additional neutral ạlif in masculine accusative case: ạbadă => ạbadᶥ, qāȉlă => qāȉlᶥ. The unvowelized text style AR-RZ-U prefers larger characters Ɨ ᵻ for neutral ạlif, but the smaller characters ᴵ ᶥ are supported too. AR-BL: \ /.
=> ˻ = no ạlif marker. This character prevents the automatic inclusion of ạlif before a word-initial or solitary vowel (according to the rule 2 above). If the romanized text "ī" is converted into arabic script, it produces ạlif + short I (kasraɦ) + yẵ. However, the romanized text "˻ī" produces only short I (riding on a space character) + yẵ.
=> ȣ (uppercase), ᴕ (lowercase) = vowelless ắyn + neutral ạlif, in simplified text styles, which leave the case endings unwritten, only indicating the presence of a neutral ạlif in masculine accusative case: ǦAMĪᶜĂ => ǦAMĪᶜᴵ => (ǦAMĪᴬ / ǦAMĪᴽ) => ǦAMĪȣ, ǧamīˁă => ǧamīˁᶥ => (ǧamīᵅ / ǧamīᵃ) => ǧamīᴕ. (The variants given in parentheses are supported alternative ways of writing. It is possible to write ȣ as ᴽ, ᴬ, ᶜᴵ or ỼƗ, and it is possible to write ᴕ as ᵃ, ᵅ, ˁᶥ or ɕᵻ. The unvowelized text style AR-RZ-U prefers the writing method ᶜƗ ˁᵻ. All other of these simplified text styles prefer the ligatures ȣ and ᴕ.)
=> ʖ (uppercase), ᶗ (lowercase) = vowelless hamzaɦ on line + neutral ạlif, in simplified text styles, which leave the case endings unwritten, only indicating the presence of a neutral ạlif in masculine accusative case: BIHĀĂ => BIHĀᶞᴵ => (BIHĀᶱ) => BIHĀʖ, bihāă => bihāˀᶥ => (bihāᵊ) => bihāᶗ. (The variants given in parentheses are supported alternative ways of writing. It is possible to write ʖ as ᶱ, ᶞᴵ or ɁƗ, and it is possible to write ᶗ as ᵊ, ˀᶥ or ɂᵻ. The unvowelized text style AR-RZ-U prefers the writing method ᶞƗ ˀᵻ. All other of these simplified text styles prefer the ligatures ʖ and ᶗ, because of their logical similarity to the preferred character Ɂ ɂ for hamzaɦ on line without a following neutral ạlif.) -- Modified glyph in MITT fonts: ˀ => positioned lower than usual, with its top edge level with the top edge of "e".
=> Ạlif without hamzaɦ: 1) Ā ā = A + neutral ạlif (long vowel A / k). AR-BL: A= a=. Theoretically these could be written as Ā = Aᴵ, ā = aᶥ, but these text styles never prefer to use the separate neutral ạlif characters, if avoiding them is possible. After a vowel or at the beginning of the word, this would become an ạlif maddaɦ ( ﺁ ) in arabic script: ạlif + long a written with maddaɦ, omitting the ạlif that should be after this vowel, because writing two consecutive ạlifs is not acceptable in arabic script. These romanized text styles do not indicate this aspect in any way, as it concerns the esthetical look of arabic script only. 2) As the first letter of a word: without a vowel (traditionally: with hamzaŧủ ɛl wașli) = Ɛ ɛ: ɛbnu, ɛl-Lȧh. AR-BL: # @. With an arabic basic vowel = A a, I i, U u: al, ibn, ustumiˁnā. (Note that at the end of a word AƐ aɛ means A + ạlif maqșūraɦ.) 3) See also at letter wāw: Ŭ ŭ, Ů ů, Ẃ ẃ.
=> Â â = ắyn + A + ạlif (ắyn + long vowel A). (This can also be written as ˁā, which is the form preferred by these text styles after a consonant, where the ắyn causes a clear glottal stop, while the form â is preferred at the beginning of a word and after a vowel, and in proper nouns possibly also after a consonant, for esthetic reasons.)
=> Ẵ ẵ = A + ạlif + vowelless hamzaɦ on line: yẵ, tẵ. (Theoretically this can also be written as āˀ or āɂ, but this is never preferred by these text styles. Unicode characters with suitable and quite similar diacritical marks are not available for vowels I, U or a short A. Therefore a vowelless hamzaɦ on line is written with character ˀ or ɂ after any other vowel than a long A.) => Modified glyphs in MITT fonts: Ẵ ẵ => Ā ā with hook below.
=> Ẩ ẩ = ắyn + A + ạlif + vowelless hamzaɦ on line. (This can also be written as ˁẵ, which is the form preferred by these text styles after a consonant, where the ắyn causes a clear glottal stop, except possibly in proper nouns, for esthetic reasons. Another theoretically possible form is ˁāˀ or ˁāɂ, but this is never preferred by these text styles.) => Modified glyphs in MITT fonts: Ẩ ẩ => move the hook below.
=> Ȧ ȧ = dagger ạlif (ạlif ḫanǧariȳaɦ): long vowel A written with a vertical line ( ٰ ) in arabic script, without a following ạlif, for historical reasons. Only in some words in modern language, such as: al-Lȧh, raḥmȧn, ḓȧlika, hȧṵlāi, hȧḓā, hȧkaḓā, lȧkin, samȧwāt, ẗalȧẗ. AR-BL: A+ a+.
=> Hamzaɦ on line: 1) As the last letter of a word Ɂ (uppercase) ɂ (lowercase), in other positions ᶞ (uppercase) ˀ (lowercase): šayˀủ, badˀi, samāɂ. AR-BL: ;: :; in all positions (except when unwritten). 2) Unwritten between two vowels. If a short basic arabic vowel (A, I or U) is written directly after any other vowel (which can also be long or a foreign vowel), a hamzaɦ on line exists between the vowels, unless a character between these vowels or a diacritical mark above the latter vowel indicates the presence of ắyn, ạlif, wāw or yẵ (the last three of these possibly carrying a hamzaɦ): ạšyāi, yuḑīu. (If the latter vowel is long or a foreign vowel, an ạlif with hamzaɦ exists between the vowels, not hamzaɦ on line, as is explained further below. In such circumstances, a hamzaɦ on line between the vowels would be indicated with character ᶞ ˀ: ịḥșāˀāt.)
=> Hamzaɦ riding a wāw: 1) ᵋ (uppercase), ᶟ (lowercase): ruᶟyaɦ. AR-BL: ;W ;w in all circumstances. 2) If after a vowel (not after a consonant, as a clear glottal stop), and followed by a long or short basic vowel, use a diacritical mark over the vowel, instead of a separately written character ᶟ. Short basic vowels: ᵋA => Ḁ, ᶟa => ḁ, [ᵋE => Ḛ], [ᶟe => ḛ], ᵋI => Ḭ, ᶟi => ḭ, [ᵋO => ƅ], [ᶟo => ʚ], ᵋU => Ṵ, ᶟu => ṵ: hȧṵlāi, muḁqqataŧỉ. Long basic vowels: ᵋĀ => Ã, ᶟā => ã, [ᵋĒ => Ẽ], [ᶟē => ẽ], ᵋĪ => Ĩ, ᶟī => ĩ, [ᵋŌ => Õ], [ᶟō => õ], ᵋŪ => Ũ, ᶟū => ũ. (Both of these ways of writing are permissible, and they behave similarly if text is converted into arabic script. All of these text styles recommend avoiding a separately written ᶟ, if an alternative way of writing with diacritical marks is available, except if the hamzaɦ is a clear glottal stop after a consonant. The variants above in square brackets [ ] might never happen in standard arabic text, but they are nevertheless defined and supported by these text styles.) 3) As the vowelless last letter of word: ᵋ ᶟ or Э ɜ. AR-BL: ;E ;e. => Modified glyphs in MITT fonts: Ḁ ḁ => A a with tilde below, ƅ ʚ => O o with tilde below.
=> Hamzaɦ riding a yẵ: 1) ᵉ (uppercase), ᵊ (lowercase). AR-BL: ;Y ;y in all circumstances. 2) If after a vowel (not after a consonant, as a clear glottal stop), and followed by a long or short basic vowel, use a diacritical mark over the vowel, instead of a separately written character ᵊ. Short basic vowels: ᵉA => Ȁ, ᵊa => ȁ, [ᵉE => Ȅ], [ᵊe => ȅ], ᵉI => Ȉ, ᵊi => ȉ, [ᵉO => Ȍ], [ᵊo => ȍ], ᵉU => Ȕ, ᵊu => ȕ: Ạndrīȁs, malāȉkaŧa. Long basic vowels: ᵉĀ => Á, ᵊā => á, [ᵉĒ => É], [ᵊē => é], ᵉĪ => Í, ᵊī => í, [ᵉŌ => Ó], [ᵊō => ó], ᵉŪ => Ú, ᵊū => ú: Ịsrāíl, warāí. (Both of these ways of writing are permissible, and they behave similarly if text is converted into arabic script. All of these text styles recommend avoiding a separately written ᵊ, if an alternative way of writing with diacritical marks is available, except if the hamzaɦ is a clear glottal stop after a consonant.) 3) As the vowelless last letter of word: Ẏ ẏ.
=> Ạlif with hamzaɦ below: 1) As the first letter of a word, or after a vowel: with short I vowel = Ị ị: ịḓā, ịlayhi. AR-BL: ,I .i. With short foreign E vowel = Ẹ ẹ. AR-BL: ,E .e. Long I vowel (Ī ī) and foreign vowels of any length similar to I in customary arabic transliteration (but not an ordinary E) would always include a hamzaɦ below in these circumstances, without any diacritical mark indicating it: Īliȳąhų, but not: el ebnu. 2) ᵙI ᵓi, ᵙE ᵓe as a glottal stop after a vowelless consonant. AR-BL: ,I .i, ,E .e. (Writing method 2 is permissible also in scenario 1, and the writing method Ị ị, Ẹ ẹ is permissible in scenario 2, but these text styles prefer to use the characters ᵙ ᵓ only in the case of a clear glottal stop after a consonant, though perhaps not in proper nouns, in which it might be preferable to avoid any such punctuation for esthetical reasons.) 3) The unvowelized text style AR-RZ-U uses the character Ị ị (which would usually include the basic arabic vowel I, but theoretically might instead include a foreign vowel quite similar to I). AR-BL: I i.
=> Ạlif with hamzaɦ above, without vowel: 1) As a glottal stop, with sukūn in the standardized arabic grammar = ᵙ (uppercase), ᵓ (lowercase): yaᵓtī. AR-BL: , (uppercase), . (lowercase). (These text styles never use a sukūn in these circumstances, though.) 2) The unvowelized text style AR-RZ-U uses character Ɵ ɵ (which can include a sukūn, A, U, or a foreign vowel quite similar to these, such as Ü): ɵḫthᵻ = ụḫtahā, ᵻlmrɵɦ = ɛl marᵓaŧu. AR-BL: ; :.
=> Ạlif with hamzaɦ above, with vowel A: 1) As the first letter of a word, or after a vowel: with short A vowel = Ạ ạ: ạwlādủ, liạnna. AR-BL: ,A .a. Long A vowel (Ā ā) and foreign vowels of any length similar to A in customary arabic transliteration (such as Ä) always include a hamzaɦ above in these circumstances, without any diacritical mark indicating it: ātiyă, raāhumā. 2) ᵙA ᵓa, as a glottal stop after a vowelless consonant: yasᵓalūhu. AR-BL: ,A .a. (Writing method 2 is permissible also in scenario 1, and the writing method Ạ ạ is permissible in scenario 2, but these text styles prefer to use the characters ᵙ ᵓ only in the case of a clear glottal stop after a consonant, though perhaps not in proper nouns, in which it might be preferable to avoid any such punctuation for esthetical reasons.)
=> Ạlif with hamzaɦ above, with vowel U: 1) As the first letter of a word, or after a vowel: with short U vowel = Ụ ụ: ụmm, ụḫt. AR-BL: ,U .u. Long U vowel (Ū ū) and foreign vowels of any length similar to U in customary arabic transliteration (such as Ü, but not an ordinary O) would always include a hamzaɦ above in these circumstances, without any diacritical mark indicating it. 2) ᵙU ᵓu, as a glottal stop after a vowelless consonant. AR-BL: ,U .u. (Writing method 2 is permissible also in scenario 1, and the writing method Ụ ụ is permissible in scenario 2, but these text styles prefer to use the characters ᵙ ᵓ only in the case of a clear glottal stop after a consonant, though perhaps not in proper nouns, in which it might be preferable to avoid any such punctuation for esthetical reasons.)
=> Ạlif with hamzaɦ above, with foreign vowel O: 1) As the first letter of a word, or after a vowel: with short O vowel = Ọ ọ. AR-BL: ,O .o. Long O vowel (Ō ō) would always include a hamzaɦ above in these circumstances, without any diacritical mark indicating it. 2) ᵙO ᵓo, as a glottal stop after a vowelless consonant. AR-BL: ,O .o. (Writing method 2 is permissible also in scenario 1, and the writing method Ọ ọ is permissible in scenario 2, but these text styles prefer to use the characters ᵙ ᵓ only in the case of a clear glottal stop after a consonant, though perhaps not in proper nouns, in which it might be preferable to avoid any such punctuation for esthetical reasons.)
=> AƐ aɛ = A + ạlif maqșūraɦ at the end of a word ( ﻰ َ ), which looks like yẵ without dots below: ịlaɛ. Supported alternative ways of writing, which are not used by the text style conversion algorithms: AE ae, or Æ æ. (AƐ aɛ will always be interpreted as A + ạlif maqșūraɦ, but AE ae only if these are the last letters in a word.) The unvowelized text style AR-RZ-U prefers to write ạlif maqșūraɦ as Ɛ ɛ, but supports also the characters E e and Æ æ. AR-BL: A# a@.
=> Ą ą (A with ogonek) = grammatically or traditionally short vowel A, which is advised to be written with mater lectionis ạlif in vowelless modern arabic. AR-BL: A~ a~. In AR-NT-P/D/T/C/O/L/G/A: ٱ َ (fatḥaɦ + arabic letter alef wasla). These text styles use wașlaɦ for this purpose only: to indicate modern mater lectionis on short vowel A.
=> ₐ (subscript small A) after any vowel indicates that ạlif should be used as mater lectionis here, if the text will be converted into arabic script. Aₐ aₐ is always written as Ą ą in these text styles, but foreign vowels similar to A do not have a Unicode character with ogonek (small hook below) available: they would get a subscript A instead, if it is deemed necessary to instruct the text style conversion algorithms to certainly use mater lectionis ạlif after the foreign vowel: Näₐslund, Gäₐvle. AR-BL: A_ a_. Also supported is the character ˼ after the vowel A (which can have various diacritics): Nä˼slund, Gä˼vle. AR-BL: _.
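The choice between the ogonek letter and the subscript A can be illustrated with a small Python sketch. The function name `mark_alif_mater` and its index-based interface are hypothetical and are not part of the standard. The sketch covers only the ạlif case described in this entry, and the vowel test (base letter a, e, i, o or u) is an assumption.

```python
import unicodedata

OGONEK_A = {"A": "Ą", "a": "ą"}   # Aₐ aₐ is always written as Ą ą
SUBSCRIPT_A = "\u2090"            # ₐ, used after every other vowel
VOWEL_BASES = frozenset("aeiou")


def mark_alif_mater(word: str, index: int) -> str:
    """Mark the vowel at `index` as taking mater lectionis alif."""
    word = unicodedata.normalize("NFC", word)
    vowel = word[index]
    base = unicodedata.normalize("NFD", vowel)[:1].lower()
    if base not in VOWEL_BASES:
        raise ValueError(f"{vowel!r} is not a vowel")
    # Plain A has an ogonek form; any other vowel keeps its shape
    # and gets the subscript appended.
    marked = OGONEK_A.get(vowel, vowel + SUBSCRIPT_A)
    return word[:index] + marked + word[index + 1:]


print(mark_alif_mater("Näslund", 1))  # Näₐslund
print(mark_alif_mater("Gävle", 1))    # Gäₐvle
```

A plain `a` at the given position would come out as `ą` instead, since the ogonek form is available for it.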
ب > B b
ت‬ > T t
ث > Ṫ ẗ AR-BL: `T `t (double = `T`T `t`t). -- Modified glyph in MITT fonts: Ṫ => T with umlaut above.
ج > Ǧ ǧ AR-BL: ^G ^g (double = ^G^G ^g^g).
ح‬ > Ḥ ḥ AR-BL: ^H ^h (double = ^H^H ^h^h).
خ > Ḫ ḫ AR-BL: ^K ^k (double = ^K^K ^k^k).
د > D d
ذ > Ḓ ḓ AR-BL: ^D ^d (double = ^D^D ^d^d).
ر > R r
ز > Z z
س‬ > S s => Ṡ ṡ alternative character, which indicates that the word root has originally included the archaic letter "ṡin", not "samekh".
ش‬ > Š š AR-BL: ^S ^s (double = ^S^S ^s^s).
ص‬ > Ș ș AR-BL: `S `s (double = `S`S `s`s).
ض > Ḑ ḑ AR-BL: `D `d (double = `D`D `d`d).
ط > Ṱ ṱ AR-BL: ^T ^t (double = ^T^T ^t^t).
ظ > Ẓ ẓ AR-BL: `Z `z (double = `Z`Z `z`z). => modified glyphs in MITT fonts: Ẓ ẓ => Z z with comma below
ع > ᶜ ˁ Ắyn is written with character ᶜ (uppercase) or ˁ (lowercase): 1) when it is a clear glottal stop before or after a consonant: yaˁrifu, ịǧˁalŭ. 2) When ắyn is between two I vowels, which can be uppercase or lowercase (because in sans-serif fonts these vowels are very narrow and tightly packed, and having visually complex accent marks on the tightly packed letters can look unclear and inconvenient): ǧamīḯ => ǧamīˁi, mawqiḯ => mawqiˁi. 3) When ắyn is before such a vowel, for which a diacritical mark is not available for indicating the presence of ắyn: ǧamīˁă. 4) When ắyn is a vowelless last letter of word. In this case a larger character Ỽ ɕ can be used, but the smaller characters ᶜ ˁ are also supported as the last character in word: ǦAMĪᶜ => ǦAMĪỼ, ǧamīˁ => ǧamīɕ. 5) Theoretically it is possible to write ắyn as ᶜ ˁ or Ỽ ɕ in all other scenarios too, but it is not preferred by these text styles. In AR-BL, ắyn is always written as ** *, in all circumstances. -- Modified glyph in MITT fonts: ˁ => ᶜ positioned lower than usual, with its top edge level with the top edge of "e".
=> ʿ = Horizontally reversed hook above a vowel indicates the presence of ắyn before a short basic vowel: ᶜA => Ắ, ˁa => ắ, ᶜI => Ḯ, ˁi => ḯ, ᶜU => Ṹ, ˁu => ṹ: ḯnda, ǧamīṹ, Ịšiắyā. This style of writing is used at the beginning of a word, after a vowel, and in proper nouns possibly in other scenarios too, as it might be preferable to avoid any punctuation in the names of persons, for esthetical reasons. -- Modified glyphs in MITT fonts: ʿ => horizontally reversed hook accent mark. Ắ ắ => A a with horizontally reversed hook above. [Ḗ ḗ => E e with horizontally reversed hook above.] Ḯ ḯ => I i with horizontally reversed hook above. [Ṓ ṓ => O o with horizontally reversed hook above.] Ṹ ṹ => U u with horizontally reversed hook above.
=> ^ (circumflex above a vowel) indicates the presence of ắyn before a long basic vowel: ᶜĀ => Â, ˁā => â, ᶜĪ => Î, ˁī => î, ᶜŪ => Û, ˁū => û: âlamu, baîdă, Šįmûn. This style of writing is used at the beginning of a word, after a vowel, and in proper nouns possibly in other scenarios too, as it might be preferable to avoid any punctuation in the names of persons, for esthetical reasons. (Ắyn is never written with circumflex as a separate character ^, and the conversion algorithms would not interpret such a character as ắyn. The separately written character for ắyn is ᶜ ˁ.)
غ > Ġ ġ AR-BL: `G `g (double = `G`G `g`g).
ف > F f
ق > Q q
ك‬ > K k
ل‬ > L l
م > M m
ن > N n Other written characters that include the pronounced sound N:
-- Ả ả = nunated A (fatḥatān, ً ), pronounced -an. AR-BL: A` a` (as the last characters in a word).
-- Ỉ ỉ = nunated I (kasratān, ٍ ), pronounced -in. AR-BL: I` i` (as the last characters in a word).
-- Ủ ủ = nunated U (ḑammatān, ٌ ), pronounced -un. AR-BL: U` u` (as the last characters in a word).
-- Ă ă = fatḥatān + neutral ạlif, the masculine accusative marker, pronounced -an (= ảᶥ). AR-BL: A^ a^ (as the last characters in a word).
هـ > H h => Ҥ ɦ = tẵ marbūṱaɦ ( ة ), marker of feminine noun ending. This romanized form is used when tẵ marbūṱaɦ is not followed by any vowel (which is typical in the casual, simple and unvowelized romanized text styles). AR-BL: H| h|. -- Modified glyph in MITT fonts: Ҥ => "H", whose right vertical pillar has the shape of "ſ" (latin small letter long S). => Ŧ ŧ = tẵ marbūṱaɦ, an alternative romanized character, which is used when there is a vowel after tẵ marbūṱaɦ (as is typical in other romanized text styles than casual, simple or unvowelized). AR-BL: T| t|.
و > W w => Ū ū = U + wāw (long vowel U). AR-BL: U= u=. Theoretically could also be written UW uw, but these text styles never prefer this. => Û û = ắyn + U + wāw (ắyn + long vowel U). (This can also be written as ᶜŪ ˁū, which is the form preferred by these text styles after a consonant, where the ắyn causes a clear glottal stop, while the form Û û is preferred at the beginning of a word and after a vowel, and in proper nouns possibly also after a consonant, for esthetic reasons.) => Ŭ ŭ = U + wāw + neutral ạlif, in masculine plural verb suffixes. (Theoretically this could be written as Ūᴵ ūᶥ, but these text styles never prefer this.) => Ů ů = ắyn + U + wāw + neutral ạlif, in masculine plural verb suffixes, an alternative way to write ᶜŬ ˁŭ (or theoretically, ᶜŪᴵ ˁūᶥ). The form Ů ů is used in the "separately written" text styles only, when the ắyn is not after consonant as a clear glottal stop. => Ẃ ẃ = vowelless wāw + neutral ạlif: raᵓaẃ. (Theoretically this could be written as Wᴵ wᶥ, but these text styles never prefer this.) => Ō ō = foreign O + wāw (long foreign O). AR-BL: O= o=. => Ô ô = ắyn + foreign O + wāw (ắyn + long foreign O). (This can also be written as ᶜŌ ˁō, which is the form preferred by these text styles after a consonant, where the ắyn causes a clear glottal stop, while the form Ô ô is preferred at the beginning of a word and after a vowel, and in proper nouns possibly also after a consonant, for esthetic reasons.) => Ǫ ǫ (O with ogonek) = short foreign O, which is advised to be written with mater lectionis wāw in arabic script. AR-BL: O~ o~. In AR-NT-P/D/T/C/O/L/G/A: وۢ ٚ (arabic vowel sign small v above + wāw + arabic small high meem isolated form). => Ų ų (U with ogonek) = grammatically or traditionally short U, which is advised to be written with mater lectionis wāw in arabic script. AR-BL: U~ u~. In AR-NT-P/D/T/C/O/L/G/A: وۢ ُ (ḑammaɦ + wāw + arabic small high meem isolated form). 
=> ᵤ (subscript small U) or ₒ (subscript small O) after any vowel indicates that wāw should be used as mater lectionis here, if the text will be converted into arabic script. Uᵤ uᵤ is always written as Ų ų and Oₒ oₒ (or Oᵤ oᵤ) is always written as Ǫ ǫ in these text styles. Foreign vowels similar to U or O do not have a Unicode character with ogonek (small hook below) available: they would get a subscript U or O instead, if it is deemed necessary to instruct the text style conversion algorithms to certainly use mater lectionis wāw after the foreign vowel: Lüᵤbeck, Gröₒningen. AR-BL: U_ u_ / O_ o_. Also supported is the character ˼ after the vowel O or U (which can have various diacritics): Lü˼beck, Grö˼ningen. AR-BL: _. => Ŵ ŵ = the default way to write "ww": bawwābaɦ => baŵābaɦ, al ạwwalu => al ạŵalu.
ي > Y y => Ī ī = I + yẵ (long i). AR-BL: I= i=. Theoretically could also be written IY iy, but these text styles never prefer this. => Î î = ắyn + I + yẵ (ắyn + long vowel I). (This can also be written as ᶜĪ ˁī, which is the form preferred by these text styles after a consonant, where the ắyn causes a clear glottal stop, while the form Î î is preferred at the beginning of a word and after a vowel, and in proper nouns possibly also after a consonant, for esthetic reasons.) => Ē ē = foreign E + yẵ (long foreign E). AR-BL: E= e=. => Ê ê = ắyn + foreign E + yẵ (ắyn + long foreign E). (This can also be written as ᶜĒ ˁē, which is the form preferred by these text styles after a consonant, where the ắyn causes a clear glottal stop, while the form Ê ê is preferred at the beginning of a word and after a vowel, and in proper nouns possibly also after a consonant, for esthetic reasons.) => Ę ę (E with ogonek) = short foreign "e", which is advised to be written with mater lectionis yẵ in arabic script. AR-BL: E~ e~. In AR-NT-P/D/T/C/O/L/G/A: يۢ ٖ (arabic subscript alef + yẵ + arabic small high meem isolated form). => Į į (I with ogonek) = grammatically or traditionally short I, which is advised to be written with yẵ in arabic script. AR-BL: I~ i~. In AR-NT-P/D/T/C/O/L/G/A: يۢ ِ (kasraɦ + yẵ + arabic small high meem isolated form). => ᵢ (subscript small I) or ₑ (subscript small E) after any vowel indicates that yẵ should be used as mater lectionis here, if the text will be converted into arabic script. Iᵢ iᵢ is always written as Į į and Eₑ eₑ (or Eᵢ eᵢ) is always written as Ę ę in these text styles. Foreign vowels similar to I or E do not have a Unicode character with ogonek (small hook below) available: they would get a subscript I or E instead, if it is deemed necessary to instruct the text style conversion algorithms to certainly use mater lectionis yẵ after the foreign vowel: Doïᵢna, Eₑkeberg. AR-BL: I_ i_ / E_ e_. 
Also supported is the character ˼ after the vowel E or I (which can have various diacritics): Doï˼na, E˼keberg. AR-BL: _. => Ȳ ȳ = the default way to write "yy": barriyyaɦ => barriȳaɦ, ḥaqīqiyyu => ḥaqīqiȳu, samāwiyyi => samāwiȳi, dīniyyīna => dīniȳīna. This abbreviation is used in the "separately written" text styles only. Not indicated in AR-BL.
MITT AR-RZ/NT 0.95 -- Standard for romanized and native-script text styles for arabic language. Ion Mittler, 10 March 2025. Released into the public domain under the CC0-1.0 license (Creative Commons Zero, version 1.0). http://creativecommons.org/publicdomain/zero/1.0/
Modern International Text Types — mitt.fi
Keyword variants for search engines: The standard MITTARRZNT (MITTARRZ / MITTARNT) defines the text styles MITT AR-RZ-P [MITTARRZP], MITT AR-RZ-D [MITTARRZD], MITT AR-RZ-T [MITTARRZT], MITT AR-RZ-C [MITTARRZC], MITT AR-RZ-S [MITTARRZS], MITT AR-RZ-Z [MITTARRZZ], MITT AR-RZ-O [MITTARRZO], MITT AR-RZ-L [MITTARRZL], MITT AR-RZ-G [MITTARRZG], MITT AR-RZ-A [MITTARRZA], MITT AR-RZ-I [MITTARRZI], MITT AR-RZ-E [MITTARRZE], MITT AR-RZ-U [MITTARRZU], MITT AR-NT-P [MITTARNTP], MITT AR-NT-D [MITTARNTD], MITT AR-NT-T [MITTARNTT], MITT AR-NT-C [MITTARNTC], MITT AR-NT-S [MITTARNTS], MITT AR-NT-Z [MITTARNTZ], MITT AR-NT-O [MITTARNTO], MITT AR-NT-L [MITTARNTL], MITT AR-NT-G [MITTARNTG], MITT AR-NT-A [MITTARNTA], MITT AR-NT-I [MITTARNTI], MITT AR-NT-E [MITTARNTE] and MITT AR-NT-U [MITTARNTU].