Bridging the Divide: Supporting Minority and Historic Scripts in Fonts: Problems and Recommendations

Deborah Anderson (dwanders@sonic.net), UC Berkeley, United States of America

Introduction

Today, users of many modern minority and historic scripts in Unicode are not able to reliably send text electronically, because Unicode-enabled fonts and software are not available.

¹

In addition, some communities have access to Unicode fonts, but the fonts aren’t used, because they do not provide features deemed necessary, such as positioning of characters (e.g., Egyptian Hieroglyphs
[Richmond and Glass, 2016]) or variant glyphs (e.g., Old Italic [Anderson, 2017]). Instead, images are used, which are not searchable or, alternatively, “hacked” fonts are employed, which require each person to have the same, non-standard font to send text. Keyboards or other input mechanisms are also not available for many of these same scripts. As a result, the promise that Unicode will “enable people around the world to use computers in any language” (Unicode Consortium, 2018a), does not yet ring true for some communities.

This short paper will highlight font-related problems with specific examples and will provide suggestions on how to address them.

Problems

Creating a Unicode-enabled font for a language is often not a simple task, especially when the script for the language includes combining marks (which require correct positioning), or if the script has special rendering behavior, such as the consonant clusters found in South Asian scripts (Evans, 2017).

Font creation is made more challenging when typographic details on the script (and language) are not available. Since many recently approved scripts in Unicode are not well known, information on the typography is not readily available. Unfortunately, fine details are often not included in Unicode proposals for the scripts.
Interaction with the user community is critical in developing a suitable font, but some communities are difficult to contact. In addition, there can be differing views on the preferred shapes of glyphs. For a set of 51 Tamil numbers and fractions, for example, the community took 8 years to come to agreement on the preferred representative shapes. Specific cases will be cited, based on the author’s experience, including discussion of how to connect user communities with font providers.

Technical Issue: Glyph Variants

For some script users, access to glyph variants is important. This is true, for example, for the Old Italic Unicode block which unified several related alphabets of Italy, dating from approximately the 8 until 1c BCE. In Old Italic, the glyph in a particular alphabet may vary from that shown in the Unicode Standard.

The Old Italic block was encoded with the understanding that different fonts would be used for the different languages and alphabets (Unicode Consortium, 2017). How should the two forms of Faliscan (above) be handled in the same font then? How should a pan-Old Italic font handle the different alphabets (which use the same code points)?

This paper will describe the pros and cons of different options available, including use of:

Code points in Unicode’s Private Use Area (with the caveat that these code points would not be reliable for general interchange) (Unicode Consortium, 2018c).
A Unicode variation sequence, when a distinction needs to be captured in plain-text (Unicode Consortium, 2018d).
An OpenType font feature, such as character variants, stylistic alternates, stylistic sets, or localized forms (Microsoft Typography, 2018).
Language-specific fonts (i.e., Faliscan1 and Faliscan2 fonts for the two forms above).

Appendix A

Bibliography

Anderson, D. (2017). Dealing with Variants in Historic Scripts. Presentation at 41
^st Internationalization and Unicode Conference, Santa Clara, California, October, 2017.
Evans, L. (2017). Beyond Unicode Proposals: Encoding Characters and Scripts is Not Enough! Presentation at 41
^st Internationalization and Unicode Conference, Santa Clara, California, October 2017.
Google.com. (n.d.). Google Noto Fonts.
https://www.google.com/get/noto/ (accessed April 17, 2018).
Microsoft Typography. (2017a). Language system tags.
https://www.microsoft.com/typography/otspec/languagetags.htm (accessed April 17, 2018).
Microsoft Typography. (2017b). Script tags.
https://www.microsoft.com/typography/otspec/scripttags.htm (accessed April 17, 2018).
Microsoft Typography. (2018). OpenType® specification
.
https://www.microsoft.com/en-us/Typography/OpenTypeSpecification.aspx (accessed April 17, 2018).
Phillips, A., and Davis, M. (2009). Tags for Identifying Languages.
https://tools.ietf.org/html/bcp47 (accessed April 17, 2018).
Richmond, B. and Glass, A. (2016). Proposal to encode three control characters for Egyptian Hieroglyphs. Proposal submitted to the Unicode Technical Committee.
http://www.unicode.org/L2/L2016/16018r-three-for-egyptian.pdf (accessed April 17, 2018).
SIL International. (2017). ISO 639-3: ISO 639 Code Tables.
http://www-01.sil.org/iso639-3/codes.asp (accessed April 17, 2018).
Unicode Consortium. (2017). Old Italic. In: Unicode Consortium,
The Unicode Standard, Version 10.0.0. Mountain View, CA: The Unicode Consortium, 349-351. http://www.unicode.org/versions/Unicode10.0.0/ (accessed 24 April 2018).
Unicode Consortium. (2018a). The Unicode Consortium website.
http://unicode.org/
[accessed April 17, 2018).
Unicode Consortium. (2018b). CLDR – Unicode Common Locale Data Repository. Unicode Consortium website. http://cldr.unicode.org (accessed April 17, 2018).
Unicode Consortium. (2018c). Private-Use Characters, Noncharacters & Sentinels FAQ. Unicode Consortium website.
http://www.unicode.org/faq/private_use.html (accessed April 17, 2018).
Unicode Consortium. (2018d). Variation Sequences. Unicode Consortium website.

http://www.unicode.org/faq/vs.html (accessed April 17, 2018).

Notes

Especially true for scripts in Unicode versions 6.0 to 9.0 (2010 – 2016), where over 40% of the scripts have no fonts. (Unicode version 10.0 was released in June 2017, so support in fonts would not yet be expected). The Google Noto project aims to provide fonts for all approved scripts, but release of fonts is only up to fonts for Unicode version 6.2, released in 2012.

Bridging the Divide: Supporting Minority and Historic Scripts in Fonts: Problems and Recommendations

Appendix A

Leave a Comment Cancel reply