A Unicode font (also known as a UCS font or Unicode typeface) is a computer font that contains a wide range of characters, letters, digits, glyphs, symbols, ideograms, logograms, etc., drawn from many different languages and scripts from around the world and collectively mapped to the standard Universal Character Set (UCS). Unlike most conventional computer fonts, which are specific to a particular language or legacy character encoding and contain only a small subset of the UCS characters, these fonts attempt to include many thousands of possible glyphs, so that they can be used as a single typeface across multilingual documents.
Background
The Unicode standard does not itself specify or create any font (typeface), that is, a collection of graphical shapes called glyphs. Rather, it assigns each abstract character a specific number (known as a code point) and also defines the required changes of shape depending on the context in which the glyph is used (e.g., combining characters, precomposed characters and letter-diacritic combinations). The choice of font, which governs how the abstract characters in the Universal Coded Character Set (UCS) are converted into a bitmap or vector output that can then be viewed on a screen or printed, is left up to the user. If the chosen font does not contain a glyph for a code point used in the document, a question mark, a box, or some other substitute character is typically displayed.
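The distinction between a precomposed character and a base letter followed by a combining character can be illustrated with a minimal Python sketch. It uses only the standard `unicodedata` module; the helper name `code_points` is introduced here purely for illustration.

```python
import unicodedata

# "é" as one precomposed code point, and as "e" plus a combining acute accent.
precomposed = "\u00e9"
decomposed = "e\u0301"

def code_points(text):
    """Return the code points of a string in U+XXXX notation (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]

print(code_points(precomposed))  # ['U+00E9']
print(code_points(decomposed))   # ['U+0065', 'U+0301']

# Both sequences denote the same abstract text; NFC normalization
# composes the two-code-point form into the precomposed one.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

Both sequences are expected to render identically, but the font and rendering software must either supply a glyph for the precomposed code point or position the combining mark over the base letter.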
Computer fonts use various techniques to display characters or glyphs. A bitmap font contains a grid of dots known as pixels forming an image of each glyph in each face and size. Outline fonts (also known as vector fonts) use drawing instructions or mathematical formulæ to describe each glyph. Stroke fonts use a series of specified lines (for the glyph's border) and additional information to define the profile, or size and shape of the line in a specific face and size, which together describe the appearance of the glyph.
Many fonts have kerning pairs, which implement better spacing between characters. Fonts also include embedded orthographic rules that cause certain combinations of letterforms (alternative symbols for the same letter) to be combined into special ligature forms (mixed characters). Operating systems, web browsers (user agents), and other software that makes extensive use of typography use a font to display text on screen or in print media, and can be programmed to use those embedded rules. Alternatively, they may use external script-shaping technologies (a rendering technology or “smart font” engine), and they can also be programmed to use either one large Unicode font or multiple different fonts for different characters or languages.
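The approach of using multiple different fonts for different characters can be sketched as a simple fallback chain. The following is a minimal illustration, not the algorithm of any particular operating system or browser: the font names and their coverage sets are hypothetical, and a real implementation would read the coverage from each font file.

```python
# Hypothetical fonts, each described only by the set of code points it covers.
FALLBACK_CHAIN = [
    ("HypotheticalLatin", set(range(0x0020, 0x007F))),  # printable ASCII
    ("HypotheticalCJK", set(range(0x4E00, 0xA000))),    # U+4E00..U+9FFF
]

def pick_font(ch, chain):
    """Return the name of the first font in the chain covering ch, or None."""
    cp = ord(ch)
    for name, covered in chain:
        if cp in covered:
            return name
    return None  # no font has a glyph: a substitute character would be shown

def assign_fonts(text, chain):
    """Pair every character of text with the font chosen for it."""
    return [(ch, pick_font(ch, chain)) for ch in text]

for ch, font in assign_fonts("A\u9aa8\U0001F600", FALLBACK_CHAIN):
    print(f"U+{ord(ch):04X} -> {font}")
```

In this sketch the third character falls outside both hypothetical fonts, so it is mapped to `None`, which corresponds to the substitute glyph (such as a box or question mark) shown when no font covers a code point.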
No single "Unicode font" includes all the characters defined in the present revision of the ISO 10646 (Unicode) standard, as more and more languages and characters are continually added to it. As a result, font developers and foundries incorporate those new characters in newer versions or revisions of a font, and correct any previous errors.
The UCS has over 1.1 million code points, but only the first 65,536 (Plane 0, the Basic Multilingual Plane, or BMP) had entered common use before 2000.
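The figures above follow from the layout of the code space: 17 planes of 65,536 code points each, ending at U+10FFFF. A short Python sketch of the arithmetic, with the function name `plane_of` chosen here for illustration:

```python
MAX_CODE_POINT = 0x10FFFF  # highest code point in the Unicode code space
PLANE_SIZE = 0x10000       # 65,536 code points per plane

def plane_of(code_point):
    """Return the plane number (0-16) that contains the given code point."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    return code_point // PLANE_SIZE

print(MAX_CODE_POINT + 1)                  # 1114112 code points in total
print((MAX_CODE_POINT + 1) // PLANE_SIZE)  # 17 planes
print(plane_of(0x9AA8))    # 0  (Basic Multilingual Plane)
print(plane_of(0x1F600))   # 1
print(plane_of(0x20000))   # 2
print(plane_of(0xE0001))   # 14
```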
See the Unicode planes article for more information on other planes, including: Plane 1: Supplementary Multilingual Plane (SMP), Plane 2: Supplementary Ideographic Plane (SIP), Plane 14: Supplementary Special-purpose Plane (SSP), Planes 15 and 16: reserved for Private Use Areas (PUA).
The first Unicode fonts (with a very large character set, and supporting many Unicode blocks) were Lucida Sans Unicode (released March 1993), Unihan font (1993), and Everson Mono (1995).
Issues
There are typographical ambiguities in Unicode, so that some of the unified Han characters (seen in Chinese, Japanese, and Korean) will be typographically different in different regions. For example, the glyph for code point U+9AA8 骨 differs between simplified Chinese and traditional Chinese. This has implications for the idea that a single typeface can satisfy the needs of all locales. The design of Unicode ensures that such differences do not create semantic ambiguity, but the use of incorrect forms is often considered visually awkward or aesthetically inappropriate by native readers of East Asian languages.
Application of Unicode fonts
Unicode is now the standard encoding for many new standards and protocols, and is built into the architecture of operating systems (Microsoft Windows, Apple Mac OS X, and many versions of Unix and Linux), programming languages (Ada, Perl, Python, Java, Common LISP, APL), libraries (IBM International Components for Unicode (ICU), along with the Pango, Graphite, Scribe, Uniscribe, and ATSUI rendering engines), font formats (TrueType and OpenType), and so on. Many other standards are also being upgraded to be Unicode-compliant.
Utility software
Here is a selection of some of the utility software which can identify the characters present in a font file:
List of Unicode fonts
Of the many Unicode fonts available, those listed below are the most commonly used worldwide on mainstream computing platforms.
Comparison of fonts
The number of characters included in the above versions of the fonts is listed below for each Unicode block. Basic Latin (128: 0000–007F) means that in the range called 'Basic Latin', there are 128 assigned codes, numbered 0000 to 007F. The cells then show the number of those codes that are covered by each font. Unicode blocks listed are valid for Unicode version 8.0.
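The per-block counts described above can be reproduced with a short Python sketch. The set of supported code points below is a hypothetical font invented for illustration, not data from any font in the comparison. The sketch also counts every code in the range, which matches the number of assigned codes for Basic Latin but would overcount for blocks that contain unassigned code points.

```python
def block_coverage(supported, first, last):
    """Return (covered, size) for the block spanning first..last inclusive."""
    size = last - first + 1
    covered = sum(1 for cp in range(first, last + 1) if cp in supported)
    return covered, size

# Hypothetical font covering only the printable ASCII characters U+0020..U+007E.
hypothetical_font = set(range(0x0020, 0x007F))

covered, size = block_coverage(hypothetical_font, 0x0000, 0x007F)
print(f"Basic Latin ({size}: 0000-007F): {covered}")  # Basic Latin (128: 0000-007F): 95
print("complete" if covered == size else "partial")   # partial
```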
Cells shaded green indicate complete coverage. Cells shaded blue are not complete, but are the most complete of the fonts listed. Empty cells indicate that no character exists in that block.
List of SMP Unicode fonts
Of the many Unicode fonts that include a significant number of SMP characters, the few listed below are the most commonly used around the world on mainstream computing platforms. Please also consult the above list of fonts, some of which also contain a large number of SMP characters.
Code point range: 10000–1D7FF
Unicode blocks listed are valid for Unicode version 8.0.
List of SIP Unicode fonts
Of the many Unicode fonts that include a large number of SIP characters, the few listed below are the most commonly used. Please also consult the list above of BMP fonts and the supplemental list of SMP fonts, as some of them also contain SIP characters.
Code point range: 20000–2FFFF
Unicode blocks listed are valid for Unicode version 8.0.
Code point range: E0000–EFFFF
Unicode blocks listed are valid for Unicode version 8.0.