Girish Mahajan (Editor)

Uniscribe

Uniscribe is the Microsoft Windows set of services for rendering Unicode-encoded text, especially text that requires complex text layout. The services are implemented in USP10.DLL, which became available to the public with Windows 2000 and Internet Explorer 5.0. In addition, the Windows CE platform has supported Uniscribe since version 5.0.

Although Uniscribe continues to be maintained, its intended replacement, DirectWrite, was introduced with Windows 7 and offers more features.

USP10.dll

USP is an initialism for Unicode Scripts Processor. The main purposes of Uniscribe include the following:

  1. arranging text from its input sequence into its visual sequence;
  2. substituting glyphs according to context (e.g. different forms of Arabic characters);
  3. ordering displayed text based on text flow direction (e.g. LTR vs. RTL, horizontal vs. vertical).
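
The first and third purposes can be illustrated with a small Python sketch. It is not Uniscribe's algorithm and does not call the Uniscribe API; it is a heavily simplified stand-in for the Unicode Bidirectional Algorithm, assuming a left-to-right paragraph and ignoring numbers, embedding levels, and mirrored characters. The function names `resolve_directions` and `to_visual_order` are hypothetical.

```python
import unicodedata
from itertools import groupby


def resolve_directions(text):
    """Assign "L" or "R" to every character (toy model, LTR paragraph assumed)."""
    dirs = []
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):      # strong right-to-left (Hebrew, Arabic)
            dirs.append("R")
        elif bidi == "L":            # strong left-to-right
            dirs.append("L")
        else:                        # spaces, punctuation, digits: treated as neutral here
            dirs.append(None)

    # A run of neutrals becomes RTL only if strong RTL characters sit on both
    # sides; otherwise it takes the (assumed) left-to-right paragraph direction.
    i, n = 0, len(dirs)
    while i < n:
        if dirs[i] is not None:
            i += 1
            continue
        j = i
        while j < n and dirs[j] is None:
            j += 1
        before = dirs[i - 1] if i > 0 else "L"
        after = dirs[j] if j < n else "L"
        fill = "R" if before == "R" and after == "R" else "L"
        for k in range(i, j):
            dirs[k] = fill
        i = j
    return dirs


def to_visual_order(text):
    """Return the characters in left-to-right display order."""
    dirs = resolve_directions(text)
    out, pos = [], 0
    for direction, group in groupby(dirs):
        length = len(list(group))
        run = text[pos:pos + length]
        # Right-to-left runs are drawn from their last character to their first.
        out.append(run[::-1] if direction == "R" else run)
        pos += length
    return "".join(out)
```

For a string that mixes Latin and Hebrew words, the Latin runs keep their order while each Hebrew run is reversed as a whole, so the Hebrew words also swap places on screen.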

Below are listed some common versions of usp10.dll, as well as the methods by which they are distributed.

Features are added according only to the "major.minor" part of the version number. The third part of the full version number identifies the target system for which Microsoft ported the DLL, and the last part is the build number on each target system version (which may change with regular system or software updates). Some hotfixes provide upgrades only for specific applications (notably in the Office installation directory) and are not suited for use in the Windows system directory, whose copy of the DLL should never be replaced and is often protected by the system.
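
The four-part scheme described above can be sketched in Python as follows. The class `Usp10Version`, the function `parse_version`, and the sample version strings are illustrative assumptions, not taken from any Microsoft tool or release list.

```python
from typing import NamedTuple


class Usp10Version(NamedTuple):
    major: int
    minor: int
    target: int   # identifies the target system the DLL was ported to
    build: int    # build number on that target system

    @property
    def feature_level(self):
        """Only major.minor determines which features are present."""
        return (self.major, self.minor)


def parse_version(text):
    """Parse a dotted four-part version string such as "1.626.7601.23894"."""
    parts = text.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated numbers, got {text!r}")
    return Usp10Version(*(int(p) for p in parts))
```

Two builds with the same `feature_level` but different `target` or `build` values would, under this scheme, offer the same features on different systems or update levels.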

File sizes may vary depending on specific localizations of the DLL (depending on the target system or application for which it was compiled); those given here are for the US-English localization.

Universal Shaping Engine

Scripts with complex text layout have contextual and non-linear requirements to correctly render their typography. These requirements include: ligatures, where two consecutive characters have to be combined into one shape (Latin, Devanagari); reordering, where some characters have to be displayed before the letter they follow in actual pronunciation (Bengali, Sinhala, and other Indic languages); and context-shaping, as in cursive scripts where some letters have to change shape depending on whether they occur in the beginning, middle, or the end of the word (Arabic, Mongolian).
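
The reordering requirement can be made concrete with a toy Python sketch for Bengali, where the vowel signs i (U+09BF), e (U+09C7), and ai (U+09C8) are stored after a consonant but drawn to its left. The function name `visual_order` is hypothetical, and the sketch assumes the sign moves only past the single preceding character; a real shaping engine reorders glyphs, not the stored text, and moves the sign before an entire consonant cluster.

```python
# Bengali vowel signs that are displayed to the left of their consonant.
PRE_BASE_VOWEL_SIGNS = {"\u09BF", "\u09C7", "\u09C8"}


def visual_order(text):
    """Return characters in display order, moving pre-base vowel signs
    in front of the character they follow in the stored text."""
    out = []
    for ch in text:
        if ch in PRE_BASE_VOWEL_SIGNS and out:
            # Insert before the immediately preceding character (toy rule).
            out.insert(len(out) - 1, ch)
        else:
            out.append(ch)
    return "".join(out)
```

With this rule, the stored sequence KA (U+0995) followed by vowel sign E (U+09C7) comes out with the vowel sign first, while a post-base sign such as AA (U+09BE) stays where it is.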

Uniscribe uses several script-specific shaping engines for handling typography in supported complex scripts; these are implemented in addition to a generic engine for non-complex scripts (such as Cyrillic, Greek, Latin, etc.). The currently used engines include Indic (Bengali, Devanagari, Gujarati, Gurmukhi, Kannada, etc.), Arabic, Hangul, Hebrew, Khmer, Myanmar, and Thai/Lao variants.

The complexity of the Unicode standard and ambiguities in the OpenType specification often result in incomplete or erroneous implementations of complex text layout. Script-specific shaping engines work on a case-by-case basis and do not consistently handle common features of OpenType fonts, which makes it difficult for OS programmers and font developers to support new scripts. Implementation errors are very hard or impossible to correct at a later stage without breaking backward compatibility for existing documents and fonts, and fixing them often requires new OpenType layout features and a redesign of existing fonts and typography rendering engines.

In Windows 10, a major refactoring was done to implement a generalized shaping model, the Universal Shaping Engine (USE). This engine is directly based on character properties defined in the Unicode standard, so any complex script with a suitable font can be supported without the time and effort required to create a dedicated shaping engine. Microsoft worked with the Unicode Technical Committee to make shaping requirements available in a machine-readable format, so that a complete definition of each supported script will be included in the Unicode standard and updating or adding new scripts will be significantly simplified.

USE builds on a generalized "universal cluster model" developed for the Indic scripts, which models a superset of human writing systems. The engine classifies each character of a complex script into several categories, base classes and subclasses. For example, a provisional Indic classification includes general, syllabic and positional categories, further divided into base (number, consonant, tone letter, dependent vowel, etc.), base vowel (independent vowel), number (Brahmi joining number), final, medial, and modifier consonants, as well as top, bottom, left and right consonants and vowels. Unicode character strings are converted into a collection of USE classes using well-defined rules, making glyph composition a standard procedure and allowing inter-character interactions not possible with current language features defined in the OpenType specification.
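
The idea of classifying characters and then grouping them into clusters can be sketched in Python. This is only a rough analogy: the real engine relies on Unicode's Indic syllabic and positional categories, which Python's `unicodedata` module does not expose, so the sketch substitutes the Unicode General Category. The class names `BASE`, `MARK`, `NUMBER`, and `OTHER` and both function names are hypothetical.

```python
import unicodedata


def coarse_class(ch):
    """Map a character to a coarse class using its Unicode General Category."""
    cat = unicodedata.category(ch)
    if cat.startswith("M"):   # Mn, Mc, Me: combining marks such as vowel signs
        return "MARK"
    if cat == "Nd":           # decimal digits
        return "NUMBER"
    if cat.startswith("L"):   # letters, including consonants and independent vowels
        return "BASE"
    return "OTHER"


def clusters(text):
    """Group each base character with the marks that follow it."""
    result = []
    for ch in text:
        if coarse_class(ch) == "MARK" and result:
            result[-1] += ch   # attach the mark to the current cluster
        else:
            result.append(ch)  # any non-mark starts a new cluster
    return result
```

Applied to Sinhala text, for instance, each consonant and its dependent vowel sign end up in one cluster, which is the unit a shaping engine would then position and substitute glyphs for.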

The Universal Shaping Engine was presented at the OpenType Developer Meeting in 2014; a compatible approach has also been implemented by the open source HarfBuzz text shaper. In Windows 10, the USE handles a total of 45 complex scripts: Balinese, Batak, Brahmi, Buginese, Buhid, Chakma, Cham, Duployan, Egyptian Hieroglyphs, Grantha, Hanunoo, Javanese, Kaithi, Kayah Li, Kharoshthi, Khojki, Khudawadi, Lepcha, Limbu, Mahajani, Mandaic, Manichaean, Meitei Mayek, Modi, Mongolian, N’Ko, Pahawh Hmong, Phags-pa, Psalter Pahlavi, Rejang, Saurashtra, Sharada, Siddham, Sinhala, Sundanese, Syloti Nagri, Tagalog, Tagbanwa, Tai Le, Tai Tham, Tai Viet, Takri, Tibetan, Tifinagh and Tirhuta.

Versions

Uniscribe has been available since Windows 2000, and later versions added functionality to the system, chiefly support for more writing systems. Early updates added display support for Arabic and Hebrew, followed by Thai and Vietnamese. Since Windows XP, more South Asian and Assyrian alphabets have been supported.
