User:Artsakenos/CCD-Guide

(----This will appear on the main page with the link to the subpage----) To understand the purpose of this work and to preserve its technical intent, please read the guidelines before editing.

(----This will be the subpage----)

Introduction and Rules

The purpose of this work is to provide a purely graphical decomposition of Chinese characters. The decomposition is not meant to be etymological. If a character is said to be composed of two simpler characters, it can theoretically be drawn by superposing those two simpler characters.

Character decomposition reflects etymology (historical composition) most of the time, but not always, because there have been historical variations. Variants are graphically different, so the decomposition should reflect that: there is not necessarily a graphical derivation between these variants.

There can be many ways to decompose a Chinese character; we should stay as close as possible to the one that shows the character's meaning and how it was created. Therefore, the best way is to refer to "Shuo Wen Jie Zi" (說文解字) and "Kangxi Zidian" (康熙字典). For example, 雖 is glossed in Shuo Wen Jie Zi as an animal "like a lizard, but larger", and should be decomposed into "虫" and "唯", where 虫 is the radical and 唯 gives the pronunciation. Other examples: "臨" should be decomposed into "臥" and "品", and "發" into "弓" and "癹". Rules should be set to get rid of misdecompositions. --Wihwang (talk) 10:58, 4 February 2009 (UTC)

This might depend on the purpose of the decomposition data. For instance, a computer program that generates stroke orders or graphical glyphs for Chinese characters would work properly for 雖 only with the graphical decomposition into 虽 and 隹. Perhaps it is necessary to distinguish, for some characters, between an etymological decomposition and a graphical one.
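The graphical decomposition of 雖 into 虽 and 隹 mentioned above can be applied recursively until only primitives remain. The following is a minimal sketch, not the project's own tooling: the function name `flatten` and the tiny `table` are hypothetical, and the entry 虽 = 口 + 虫 is added here only for illustration.

```python
def flatten(char, table):
    """Recursively expand a character into its graphical leaf parts.

    `table` maps a character to a tuple of its parts (hypothetical structure).
    Characters missing from the table, or listed as their own part,
    are treated as primitives.
    """
    parts = table.get(char)
    if parts is None or char in parts:
        return [char]
    leaves = []
    for part in parts:
        leaves.extend(flatten(part, table))
    return leaves


# Illustrative data only; the real file has one row per character.
table = {
    "雖": ("虽", "隹"),  # graphical decomposition discussed above
    "虽": ("口", "虫"),  # added for illustration
}

print(flatten("雖", table))
```

With an etymological table (雖 = 虫 + 唯) the same function would return a different list of parts, which is why the two kinds of decomposition may need to be kept apart.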

Caution

Do not break the tab separation: please leave this file in machine-readable form. Thank you.
If you intend to work on this file for some time (a couple of days?) by downloading and re-uploading it, please leave a note in order to avoid edit conflicts. Thank you.

Sources and References

Richard Sears's site: the most complete online etymology source
YellowBridge
Chinese Character
HSK Tools
Étymologie graphique en chinois

File format

1	2	3	4	5	6	7	8	9	10	11
勰	15	吅	劦	6		思	9		KSWP	力

 1. Chinese character (rows are sorted in Unicode order).
 2. Number of strokes in the character (not always reliable)
 3. Composition kind (see below)
 4. First character part (may be composed of several characters, if the composition does not exist as a single character).
 5. ...and Number of strokes in this first character part.
 6. Verification for the first part (empty = verification made; "?" = still to do).
 7. Second character part; "*" when there is no distinct second part (primitives or repetitions).
 8. ...and Number of strokes in this second character part.
 9. Verification for the second part.
10. Cangjie input code (for easy sorting)
11. Radical (or * if the character itself is the key)
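The eleven tab-separated fields listed above can be read into a small record type. This is only a sketch under stated assumptions: the names `Entry` and `parse_line` are hypothetical, and empty stroke-count fields are assumed to be possible and are mapped to `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    char: str                      # field 1
    strokes: Optional[int]         # field 2 (not always reliable)
    kind: str                      # field 3, composition kind
    part1: str                     # field 4, may hold several characters
    part1_strokes: Optional[int]   # field 5
    part1_verified: bool           # field 6: empty = verified, "?" = to do
    part2: str                     # field 7, "*" when no distinct second part
    part2_strokes: Optional[int]   # field 8
    part2_verified: bool           # field 9
    code: str                      # field 10
    radical: str                   # field 11, "*" if the character is the key


def _count(text: str) -> Optional[int]:
    """Convert a stroke count, tolerating an empty field (assumption)."""
    return int(text) if text.strip() else None


def parse_line(line: str) -> Entry:
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 tab-separated fields, got {len(fields)}")
    char, strokes, kind, p1, n1, v1, p2, n2, v2, code, radical = fields
    return Entry(char, _count(strokes), kind,
                 p1, _count(n1), v1 == "",
                 p2, _count(n2), v2 == "",
                 code, radical)


sample = "勰\t15\t吅\t劦\t6\t\t思\t9\t\tKSWP\t力"
print(parse_line(sample))
```

Splitting strictly on tabs is what makes the empty verification fields recoverable, which is one reason the tab layout must not be disturbed when editing.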

Composition kind

一 = Graphical primitive, no composition (second character is always *)
吅 = Horizontal composition (when repetition, the second character is *)
吕 = Vertical composition (when repetition, the second character is *)
回 = Inclusion of the second character inside the first (门, 囗, 匚...)
咒 = Vertical composition, the top part being a repetition.
弼 = Horizontal composition of three, the third being the repetition of the first.
品 = Repetition of three.
叕 = Repetition of four.
冖 = Vertical composition, separated by "冖".
+ = Graphical superposition or addition.
? = Unclear, seems compound but ...
* = Vertical combination, but atypical.
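The composition kinds above can be turned into the ordered list of parts that are actually drawn. The sketch below is one possible reading of the list, not a specification: the function name `components` is hypothetical, and it assumes that for 咒 the first part is the repeated top element, that for 弼 the first part is the outer element, and that for 冖 the first part sits above the cover.

```python
def components(char, kind, part1, part2):
    """Return the drawn parts of a record, expanding repetitions.

    Arguments correspond to fields 1, 3, 4 and 7 of a row.
    """
    if kind == "一":                 # primitive: nothing to expand
        return [char]
    if kind in ("吅", "吕"):         # two parts; "*" means part1 repeated
        return [part1, part1 if part2 == "*" else part2]
    if kind == "咒":                 # repeated top part over a bottom part
        return [part1, part1, part2]
    if kind == "弼":                 # first, second, first again
        return [part1, part2, part1]
    if kind == "品":                 # repetition of three
        return [part1] * 3
    if kind == "叕":                 # repetition of four
        return [part1] * 4
    if kind == "冖":                 # two parts separated by 冖
        return [part1, "冖", part2]
    return [part1, part2]            # 回, +, ?, * : keep both parts as given


# The first call uses the sample row from "File format"; the other two
# rows are hypothetical illustrations of the repetition rules.
print(components("勰", "吅", "劦", "思"))
print(components("林", "吅", "木", "*"))
print(components("森", "品", "木", "*"))
```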

Note: There is a standard way to describe decomposition rules (reported in User:Artsakenos/CCD-ISO10646), which is not used here for several reasons, e.g.: (i) there is no "three-character composition" like 罒 or 目; the composition is (nearly) always a 2+1 one. (ii) The "surround" kind is given by the surrounding character, so there is no need to state it with a separate code. (iii) The 冖 composition is not identified (actually it is the only one of a true 目 kind).

Statistics

The table contains 20902 decompositions, from 一 (U+4E00) to 龥 (U+9FA5).

Composition Kind Count: 品=48,吅=14958 (71.6%),*=4,+=196,冖=132,叕=4,弼=60,十=1,咒=77,一=283,吕=4725 (22.6%),回=412,?=2

Verification Part1: =19203 (91.9%),?=1699

Verification Part2: =18698 (89.5%),?=2204

Most common compounds: 木: 1025, 口: 725, 金: 706, 氵: 1032, 艹: 968
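Figures of this kind can be recomputed from the file whenever it changes. The following is a minimal sketch, assuming each row has already been split into its 11 tab-separated fields; the names `summarize` and `share` are hypothetical, and how the "most common compounds" were counted is not stated here, so that figure is not reproduced.

```python
from collections import Counter


def summarize(rows):
    """Count composition kinds and pending verifications.

    `rows` is an iterable of 11-field lists (fields 3, 6 and 9 are used).
    """
    rows = list(rows)
    kinds = Counter(row[2] for row in rows)
    todo1 = sum(1 for row in rows if row[5] == "?")
    todo2 = sum(1 for row in rows if row[8] == "?")
    return kinds, todo1, todo2


def share(count, total):
    """Format a count with its percentage of the total."""
    return f"{count} ({100 * count / total:.1f}%)"


# Consistency checks against the figures quoted above.
total = 0x9FA5 - 0x4E00 + 1          # size of the covered code point range
print(total)                         # 20902
print(share(14958, total))           # 14958 (71.6%)
print(share(4725, total))            # 4725 (22.6%)
```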

Notes

This project has been used by MDBG (see [1]).

TODOs and Suggestions

(----To appear in the talk page?----) Do you agree with adding one more field to record a set of tags that describe the typology of the graphical decomposition? (e.g., pictophonetic, associative, and, following the categorization work of Michelet, representing an idea, deriving from an image, ...)

Can the project be moved to its original Chinese_characters_decomposition URL?