12/22/2023
Github git annex

Unicode® Standard Annex #38
Unicode Han Database (Unihan)

This document describes the organization and content of the Unihan database.

This document has been reviewed by Unicode members and other interested parties, and has been approved for publication by the Unicode Consortium. This is a stable document and may be used as reference material or cited as a normative reference by other specifications.

A Unicode Standard Annex (UAX) forms an integral part of the Unicode Standard, but is published online as a separate document. The Unicode Standard may require conformance to normative content in a Unicode Standard Annex, if so specified in the Conformance chapter of that version of the Unicode Standard. The version number of a UAX document corresponds to the version of the Unicode Standard of which it forms a part.

Please submit corrigenda and other comments with the online reporting form. Related information that is useful in understanding this annex is found in Unicode Standard Annex #41, “Common References for Unicode Standard Annexes.” For the latest version of the Unicode Standard, see. For a list of current Unicode Technical Reports, see. For more information about versions of the Unicode Standard, see. For any errata which may apply to this annex, see.

2.1.1 Extension of Unihan Properties to Non-Unihan Characters
2.1.2 Sorting Algorithm Used by the Radical-Stroke Charts
3.7.1 Simplified and Traditional Chinese Variants
4.2 Listing by Date of Addition to the Unicode Standard
4.3 Listing by Location within Unihan.zip
4.4 Listing of Characters Covered by the Unihan Database
4.5 Listing of Additional Sources Used by the Unihan Database

The Unihan database is the repository for the Unicode Consortium’s collective knowledge regarding the CJK Unified Ideographs contained in the Unicode Standard. It contains mapping data to allow conversion to and from other coded character sets and additional information to help implement support for the various languages which use the Han ideographic script.

Formally, ideographs are defined within the Unicode Standard via their mappings. That is, the Unicode Standard does not formally define what the ideograph U+4E00 is; rather, it defines it as being the equivalent of, say, 0x523B in GB/T 2312, 0x14421 in CNS 11643, 0x306C in JIS X 0208, and so on.

In practice, implementation of ideographs requires large amounts of ancillary data. Input methods require information such as pronunciations, as do collation algorithms. Data in character sets not included in the world of international standards bodies needs to be converted. Relationships between ideographs need to be defined to allow for fuzzy string matching. Beyond all this, it’s important to track not only what properties a given ideograph has, but who claims it has those properties.

Unlike characters in Western scripts such as Latin and Greek, whose basic property is their sound, which stays largely constant across languages, the basic property for Han ideographs is their meaning. This isn’t to say that ideographs are truly ideographic, in that they represent abstract ideas, but they generally have one root meaning from which the others derive, and generally retain the bulk of their semantic content across linguistic boundaries. Most ideographs are divided into a determinative, which gives a vague sense of meaning, and a phonetic, which gives a vague sense of pronunciation. The Unihan database therefore includes structural analyses and definitions for ideographs.

This document is a guide to that data, describing the mechanics of the Unihan database, the nature of its contents, and the status of the various fields. The database consists of a number of fields containing data for each Han ideograph in the Unicode Standard.
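The mapping-based definition of U+4E00 quoted above can be checked with a short Python sketch. It does not read the Unihan database itself. Instead it assumes that Python's built-in `gb2312` and `euc_jp` codecs are acceptable stand-ins for the GB/T 2312 and JIS X 0208 mappings, and that both use the EUC convention of setting the high bit of each byte. The helper name `seven_bit_code` is made up for this illustration. CNS 11643 is left out because the standard library has no codec for it.

```python
# Sketch: recover the 7-bit legacy code of an ideograph from its EUC encoding.
# Assumption: the "gb2312" and "euc_jp" codecs stand in for the GB/T 2312 and
# JIS X 0208 mappings; seven_bit_code is a hypothetical helper, not a Unihan API.

def seven_bit_code(char: str, codec: str) -> int:
    """Return the two-byte 7-bit code of `char` in an EUC-style codec."""
    raw = char.encode(codec)
    if len(raw) != 2:
        raise ValueError(f"{char!r} is not a two-byte character in {codec}")
    # EUC sets the high bit of each byte, so subtracting 0x8080 clears both.
    return int.from_bytes(raw, "big") - 0x8080


if __name__ == "__main__":
    ideograph = "\u4e00"  # U+4E00
    for codec in ("gb2312", "euc_jp"):
        print(codec, hex(seven_bit_code(ideograph, codec)))
```

The values this reports for U+4E00 can be compared directly with the 0x523B and 0x306C figures given in the text.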