Harman Patil (Editor)

Phonetic symbols in Unicode

Unicode supports several phonetic scripts and notations through the existing writing systems and the addition of extra blocks with phonetic characters. These phonetic extras are derived from an existing script, usually Latin, Greek or Cyrillic. In Unicode there is no "IPA script". Apart from IPA, extensions to the IPA, and obsolete and nonstandard IPA symbols, these blocks also contain characters from the Uralic Phonetic Alphabet and the Americanist Phonetic Alphabet.

Phonetic scripts

The International Phonetic Alphabet (IPA) makes use of letters from other writing systems, as most phonetic scripts do. IPA notably uses Latin, Greek and Cyrillic characters. Combining diacritics also add meaning to the phonetic text. Finally, these phonetic alphabets make use of modifier letters, which are specially constructed for their phonetic meaning. A "modifier letter" is strictly intended not as an independent grapheme but as a modification of the preceding character, resulting in a distinct grapheme, notably in the context of the International Phonetic Alphabet. For example, ʰ should not occur on its own but modifies the preceding or following symbol. Thus, tʰ is a single IPA symbol, distinct from t. In practice, however, several of these "modifier letters" are also used as full graphemes, e.g. ʿ transliterating Semitic ayin or Hawaiian okina, or ˚ transliterating Abkhaz ә.
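The distinction between base letters, modifier letters and combining diacritics is visible in the Unicode character properties. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `describe` is made up for illustration.

```python
import unicodedata

def describe(ch):
    """Return (code point, general category, character name) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))

# "tʰ" is stored as two code points: a base letter followed by a modifier letter.
for ch in "t\u02b0":
    print(describe(ch))

# A combining diacritic such as U+0308 has general category Mn (nonspacing mark),
# whereas the modifier letter U+02B0 has general category Lm (modifier letter).
print(describe("\u0308"))
```

The general category is one way a program can tell a spacing modifier letter from a combining mark, even though both attach visually or semantically to a neighbouring letter.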

Consonants

The following tables indicate the Unicode code point sequences for phonemes as used in the International Phonetic Alphabet. A bold code point indicates that the Unicode chart provides an application note such as "voiced retroflex lateral" for U+026D ɭ LATIN SMALL LETTER L WITH RETROFLEX HOOK (HTML &#621;). An entry in bold italics indicates that the character name itself refers to a phoneme, such as U+0298 ʘ LATIN LETTER BILABIAL CLICK (HTML &#664;).
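The relationship between a hexadecimal code point, its character name and its decimal HTML numeric character reference can be checked programmatically. This is an illustrative sketch only; `html_reference` is a hypothetical helper, not part of any library.

```python
import html
import unicodedata

def html_reference(ch):
    """Build the decimal HTML numeric character reference for one character."""
    return f"&#{ord(ch)};"

for ch in ("\u026d", "\u0298"):
    ref = html_reference(ch)
    # html.unescape should turn the reference back into the same character.
    assert html.unescape(ref) == ch
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), ref)
```

The decimal number in the reference is the same value as the hexadecimal code point, so U+026D corresponds to 621 and U+0298 to 664.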

Vowels

The following figures depict the phonetic vowels and their Unicode / UCS code points. Vowels appearing in pairs in the figure to the right indicate rounded and unrounded variations respectively. Again, characters with Unicode names referring to phonemes are indicated by bold text. Those with explicit application notes are indicated by bold italic text. Those borrowed unchanged from another script (Latin, Greek or Cyrillic) are indicated by italics.

Unicode blocks

  • Basic Latin (0020–007E), IPA example: Open front unrounded vowel (0061)
  • Latin-1 Supplement (00A0–00FF), IPA example: Near-open front unrounded vowel (00E6)
  • Latin Extended-A (0100–017F), IPA example: Voiceless pharyngeal fricative (0127)
  • Latin Extended-B (0180–024F), IPA example: Tenuis dental click (01C0 0287)
  • IPA Extensions (0250–02AF), IPA example: Near-open central vowel (0250)
  • Spacing Modifier Letters (02B0–02FF), IPA example: Palatal ejective (0063 02BC)
  • Combining Diacritical Marks (0300–036F), IPA example: Near-close central unrounded vowel (026A 0308)
  • Greek and Coptic (0370–03FF), IPA example: Voiced bilabial fricative (03B2)
  • Latin Extended-C (2C60–2C7F), IPA example: Labiodental flap (2C71)
  • Phonetic Extensions (1D00–1D7F)
  • Phonetic Extensions Supplement (1D80–1DBF)
  • Superscripts and Subscripts (2070–209F)
  • Modifier Tone Letters (A700–A71F)
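As a rough illustration of how the block ranges listed above can be used, the sketch below maps each character of an IPA string to its block. The table is copied from the list; the names `BLOCKS`, `block_of` and `blocks_of` are hypothetical, and characters outside the listed ranges simply return `None`.

```python
# Block ranges as given in the list above (inclusive, hexadecimal).
BLOCKS = [
    (0x0020, 0x007E, "Basic Latin"),
    (0x00A0, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
    (0x0250, 0x02AF, "IPA Extensions"),
    (0x02B0, 0x02FF, "Spacing Modifier Letters"),
    (0x0300, 0x036F, "Combining Diacritical Marks"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x1D00, 0x1D7F, "Phonetic Extensions"),
    (0x1D80, 0x1DBF, "Phonetic Extensions Supplement"),
    (0x2070, 0x209F, "Superscripts and Subscripts"),
    (0x2C60, 0x2C7F, "Latin Extended-C"),
    (0xA700, 0xA71F, "Modifier Tone Letters"),
]

def block_of(ch):
    """Return the name of the listed block containing ch, or None."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return None

def blocks_of(text):
    """Return (code point, block name) pairs for every character in text."""
    return [(f"U+{ord(ch):04X}", block_of(ch)) for ch in text]

# The near-close central unrounded vowel from the list: 026A followed by 0308.
print(blocks_of("\u026a\u0308"))
# The palatal ejective from the list: 0063 followed by 02BC.
print(blocks_of("c\u02bc"))
```

A single IPA symbol can therefore draw on more than one block, which is why there is no one-to-one correspondence between blocks and phonetic notation.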

From Unicode blocks to scripts

    Phonetic scripts are encoded in six Unicode blocks.

    Spacing Modifier Letters (U+02B0–02FF)

    The characters in the "Spacing Modifier Letters" block are intended to form a unit with the preceding letter (which they "modify"). For example, the character U+02B0 ʰ MODIFIER LETTER SMALL H is not intended simply as a superscript h, but as the mark of aspiration placed after the letter being aspirated, as in pʰ, the aspirated voiceless bilabial plosive. The block contains:

  • Latin superscript modifier letters: (U+02B0–U+02B8): ʰ aspiration; ʱ breathy voice, murmured; ʲ palatalization; ʳ, ʴ, ʵ, ʶ r-coloring or r-offglides; ʷ labialization; ʸ palatalization, Americanist usage for U+02B2
  • Miscellaneous phonetic modifiers: (U+02B9–U+02D7): ʹ ʺ ʻ ʼ ʽ ʾ ʿ ˀ ˁ ˂ ˃ ˄ ˅ ˆ ˇ ˈ ˉ ˊ ˋ ˌ ˍ ˎ ˏ ː ˑ ˒ ˓ ˔ ˕ ˖ ˗
  • Spacing clones of diacritics: (U+02D8–U+02DD): ˘ breve; ˙ dot above; ˚ ring above; ˛ ogonek; ˜ small tilde; ˝ double acute accent
  • Additions based on 1989 IPA: (U+02DE–U+02E4): ˞ ˟ ˠ ˡ ˢ ˣ ˤ
  • Tone letters: (U+02E5–U+02E9): ˥ ˦ ˧ ˨ ˩
  • Extended Bopomofo tone marks: U+02EA ˪ yin departing tone mark; U+02EB ˫ yang departing tone mark
  • IPA modifiers: U+02EC ˬ voicing; U+02ED ˭ unaspirated
  • Other modifier letters: U+02EE ˮ modifier letter double apostrophe for Nenets
  • Uralic Phonetic Alphabet (UPA) modifiers: (U+02EF–U+02FF): ˯ ˰ ˱ ˲ ˳ ˴ ˵ ˶ ˷ ˸ ˹ ˺ ˻ ˼ ˽ ˾ ˿
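The sub-ranges above can be written down as a small lookup table, which also makes it easy to list the official character names in any group. This is a sketch under the assumption that the groups are contiguous as listed; in particular, U+02ED (MODIFIER LETTER UNASPIRATED) is assumed to sit with the IPA modifiers, and the names `SUBRANGES`, `subrange_of` and `listing` are hypothetical.

```python
import unicodedata

# Groups within Spacing Modifier Letters, following the list above.
SUBRANGES = [
    (0x02B0, 0x02B8, "Latin superscript modifier letters"),
    (0x02B9, 0x02D7, "Miscellaneous phonetic modifiers"),
    (0x02D8, 0x02DD, "Spacing clones of diacritics"),
    (0x02DE, 0x02E4, "Additions based on 1989 IPA"),
    (0x02E5, 0x02E9, "Tone letters"),
    (0x02EA, 0x02EB, "Extended Bopomofo tone marks"),
    (0x02EC, 0x02ED, "IPA modifiers"),  # assumes U+02ED belongs here
    (0x02EE, 0x02EE, "Other modifier letters"),
    (0x02EF, 0x02FF, "Uralic Phonetic Alphabet (UPA) modifiers"),
]

def subrange_of(cp):
    """Return the group label for a code point in the block, or None."""
    for start, end, label in SUBRANGES:
        if start <= cp <= end:
            return label
    return None

def listing(start, end):
    """Return (code point, character, name) for each code point in start..end."""
    return [
        (f"U+{cp:04X}", chr(cp), unicodedata.name(chr(cp)))
        for cp in range(start, end + 1)
    ]

# Print the Latin superscript modifier letters with their Unicode names.
for row in listing(0x02B0, 0x02B8):
    print(*row)
```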

Phonetic Extensions (U+1D00–1D7F)

    This block, together with Phonetic Extensions Supplement below, contains:

  • Small capitals "ɢ ɪ ɴ ɶ ʀ ʏ ʙ ʜ ʟ"
  • Turned small letters "ɐ ɥ ɯ ɹ ɺ ɻ ʇ ʌ ʍ ʎ ʞ ʮ ʯ"
  • Extra small capitals "ʁ ʛ ᴀ ᴁ ᴃ ᴄ ᴅ ᴆ ᴇ ᴊ ᴋ ᴌ ᴍ ᴎ ᴏ ᴐ ᴘ ᴙ ᴚ ᴛ ᴜ ᴠ ᴡ ᴢ ᴣ ᴦ ᴧ ᴨ ᴩ ᴪ"
  • Letters with palatal hooks "ƫ ᶀ ᶁ ᶂ ᶃ ᶄ ᶅ ᶆ ᶇ ᶈ ᶉ ᶊ ᶋ ᶌ ᶍ ᶎ ᶪ ᶵ"
  • Letters with retroflex hooks "ᶏ ᶐ ᶒ ᶓ ᶔ ᶕ ᶖ ᶗ ᶘ ᶙ ᶚ ᶩ ᶯ ᶼ"

Font support for IPA

    Font support for IPA is increasing, and IPA characters are now included in several fonts, such as the Times New Roman versions that come with various recent computer operating systems. Diacritics are not always properly rendered, however. Freely available IPA fonts include Gentium, several from SIL (such as Charis SIL and Doulos SIL), DejaVu Sans, and TITUS Cyberbit. Commercial typefaces include Brill, available from Brill Publishers, and Lucida Sans Unicode and Arial Unicode MS, which ship with various Microsoft products. These all include several ranges of characters in addition to the IPA. Modern Web browsers generally do not need any configuration to display these symbols, provided that a font capable of doing so is available to the operating system.

    Input by selection from a screen

    Further information: Unicode input, section "Selection from a screen"

    Many systems provide a way to select Unicode characters visually. ISO 14755 refers to this as a screen-selection entry method.

    Microsoft Windows has provided a Unicode version of the Character Map program since version NT 4.0, and in the consumer edition since XP. To open it, press ⊞ Win+R, type charmap, and press ↵ Enter. The program is limited to characters in the Basic Multilingual Plane (BMP). Characters are searchable by Unicode character name, and the table can be limited to a particular code block. More advanced third-party tools of the same type are also available (a notable freeware example is BabelMap).
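Whether a tool limited to the Basic Multilingual Plane can show a given character comes down to whether its code point is at most U+FFFF. A minimal sketch of that check follows; the function name `in_bmp` is made up for illustration, and the supplementary-plane code point is chosen arbitrarily as a counterexample.

```python
def in_bmp(ch):
    """Return True if the character's code point lies in the BMP (U+0000..U+FFFF)."""
    return ord(ch) <= 0xFFFF

# Characters from the blocks discussed in this article all fall inside the BMP.
for ch in "\u0250\u02b0\u1d00\u2c71\ua700":
    print(f"U+{ord(ch):04X}", in_bmp(ch))

# An arbitrary code point beyond U+FFFF, used only as a counterexample.
print(f"U+{0x10780:04X}", in_bmp(chr(0x10780)))
```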

    macOS provides a "character palette" with much the same functionality, along with searching by related characters, glyph tables in a font, etc. It can be enabled in the input menu in the menu bar under System Preferences → International → Input Menu (or System Preferences → Language and Text → Input Sources) or can be viewed under Edit → Emoji & Symbols in many programs.

    Equivalent tools – such as gucharmap (GNOME) or kcharselect (KDE) – exist on most Linux desktop environments.

    References

    Phonetic symbols in Unicode Wikipedia