This is the Commons:Ancient Chinese characters project's tutorial for creating scalable SVG images from ordinary gif files for the project.


Selecting the source images

  1. Choose a Chinese character, Japanese kanji or Korean hanja that has not yet been converted to SVG, or whose existing SVG could be improved.
  2. Pick one or even all of the styles seal, bigseal, bronze and oracle.
  3. Find data and pictures of the character in question at InternationalScientific (Richard Sears allowed use of his data[1]).
  4. In each category, select the picture you find most interesting and download it to your computer.
    • For each style, save the selected gif or png image on your computer. To distinguish between them you may follow this naming convention:
    • oracle => *-oracle.gif / png
    • bronze => *-bronze.gif / png
    • bamboo and silk => *-silk.png
    • bamboo and slip => *-slip.png
    • seal => *-seal.gif / png
    • bigseal => *-bigseal.gif

Please keep the code name of each image, e.g. "J12333" in J12333.gif; it will be needed later.
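The naming convention and the advice to keep the code name can be combined in a small helper. The following is only a sketch: the function name save_source and the log file codes.tsv are made up for illustration and are not part of the project's tooling. It copies a downloaded image to the conventional name and records the original code name next to it.

```shell
#!/bin/sh
# save_source SRC CHAR STYLE
# Copies SRC (e.g. j14138.gif) to CHAR-STYLE.ext and appends
# "new-name<TAB>code-name" to codes.tsv (a hypothetical log file).
save_source() {
    src=$1 char=$2 style=$3
    case $style in
        oracle|bronze|silk|slip|seal|bigseal) ;;
        *) echo "unknown style: $style" >&2; return 1 ;;
    esac
    ext=${src##*.}                      # file extension, gif or png
    code=$(basename "$src" ".$ext")     # code name, e.g. j14138
    dest="$char-$style.$ext"
    cp -- "$src" "$dest" &&
        printf '%s\t%s\n' "$dest" "$code" >> codes.tsv
}

# Example: save_source j14138.gif 木 oracle
```

The original download is left untouched, so the code name stays recoverable from the file name as well as from the log.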

Conversion from gif to SVG format

Conversion using Inkscape

Follow these steps to convert the gif image to SVG, e.g. *-seal.gif to *-seal.svg. For more information, please see the detailed picture guide below.

  1. Paste the gif image file into Inkscape and set the page size to 300px × 300px.
  2. Scale the gif image to 300px in height, uniformly, that is preserving its original proportions.
  3. Choose Path > Trace Bitmap from the menu bar (shortcut: Shift + Alt + B).
  4. Run a Single Scan with either Brightness cutoff = 0.950 or Color quantization = 2.
  5. Retouch the path to make up for the gif file's low quality and to correct obvious mistakes.
  6. Enlarge the SVG path to a height of 290px, again uniformly to preserve the original proportions.
  7. Center the SVG image relative to the page by choosing Object > Align and Distribute from the menu bar (shortcut: Shift + Ctrl + A). Do this both vertically and horizontally.
  8. If everything went well, delete the underlying gif image, which is still there under the path.
  9. Save your file according to the naming convention, e.g. *-seal.svg, where * is the character you chose in step 1 of "Selecting the source images".
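The scaling and centering in steps 6 and 7 amount to simple arithmetic, which can be checked outside Inkscape. The sketch below assumes the 300px × 300px page and the 290px target height given above; the function name fit_to_page is made up for illustration, and x and y are assumed to be the offsets of the path's top-left corner from the page's top-left corner.

```shell
#!/bin/sh
# fit_to_page WIDTH HEIGHT
# Prints the uniform scale factor that brings HEIGHT to 290px, the
# resulting size, and the offsets that center the path on a 300x300 page.
fit_to_page() {
    LC_ALL=C awk -v w="$1" -v h="$2" 'BEGIN {
        page = 300; target = 290
        if (h <= 0) { print "height must be positive" > "/dev/stderr"; exit 1 }
        s  = target / h            # uniform scale factor
        nw = w * s                 # width after scaling
        printf "scale=%.4f width=%.2f height=%.2f x=%.2f y=%.2f\n",
               s, nw, target, (page - nw) / 2, (page - target) / 2
    }'
}

fit_to_page 100 145
# prints: scale=2.0000 width=200.00 height=290.00 x=50.00 y=5.00
```

The sketch follows the height-based rule only; for a very wide glyph the scaled width could exceed the page, which would need a manual decision.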

Detailed picture guide

Steps described using Inkscape

Please see this tutorial or click on the thumbnail on the left side for a detailed step-by-step tutorial on how the conversion is done.

Creating SVG files matters because SVG images scale without loss of quality, while gif images do not. Converting these images to SVG is easy even with no experience in image manipulation, and Wikimedia benefits greatly from your work. Thank you for your contribution.

After the usual learning curve (about five characters), it will usually take from three to five minutes to convert a single character with shape improvements. Of course complex characters may need a little more attention and simple ones (like the one in the example image above) may need a little less attention.

Conversion using potrace

Inkscape is not the only program that can convert raster images to vector images. Another program that gives good results is 'potrace'. 'potrace' can convert 'bmp' images to SVG and is scriptable under Linux, so you can convert the GIF to BMP using ImageMagick's convert and then the BMP to SVG using 'potrace'.

A sample shell script that takes the base GIF file name (without the .gif extension) as its argument follows:

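# Usage: script BASENAME   (converts BASENAME.gif to BASENAME.svg)
convert "$1.gif" "$1.bmp"   # GIF -> BMP with ImageMagick
potrace -s "$1.bmp"         # BMP -> SVG, written to BASENAME.svg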

The Linux tutorial describes how to automate the task of generating centered SVG files from the GIFs.
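For more than a handful of files, the two commands can be wrapped in a loop. The following is a minimal sketch, not the script from the Linux tutorial: the function names run and convert_all and the DRY_RUN switch are made up for illustration. It uses only the potrace options -s (SVG output) and -o (output file name) and does not center or resize the result.

```shell
#!/bin/sh
# run CMD ARGS...
# Executes the command, or only prints it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# convert_all [DIR]
# Converts every *.gif in DIR (default: current directory) to SVG via BMP.
convert_all() {
    dir=${1:-.}
    for gif in "$dir"/*.gif; do
        [ -e "$gif" ] || continue          # pattern matched no files
        base=${gif%.gif}
        run convert "$gif" "$base.bmp" &&
            run potrace -s -o "$base.svg" "$base.bmp"
    done
}

# Example: DRY_RUN=1; convert_all ./downloads
```

Running with DRY_RUN=1 first shows which files would be touched before any conversion happens.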

Upload to Wikimedia Commons

The ACClicense template

Upload the converted SVG file to Wikimedia Commons and use the ACClicense template as follows. The template is an attempt to set up a universal template for the ACC project and for files from © 2003 Richard Sears.

When you upload an SVG ancient Chinese character, please copy and paste the "ACClicense" template, for example:

{{ACClicense|字|oracle|shang|strokes=|component1=}}   (for Shang oracle images)
{{ACClicense|字|oracle|zhouyuan|strokes=|component1=}}   (for Zhouyuan oracle images)
{{ACClicense|字|bronze|western|strokes=|component1=}}   (for Western Zhou bronzeware images)
{{ACClicense|字|bronze|shang|strokes=|component1=}}   (for Shang bronzeware images)
{{ACClicense|字|bronze|spring|strokes=|component1=}}   (for Spring and Autumn bronzeware images)
{{ACClicense|字|bronze|warring|strokes=|component1=}}   (for Warring States bronzeware images)
{{ACClicense|字|slip|chu|strokes=|component1=}}   (for Chu slip images)
{{ACClicense|字|silk|chu|strokes=|component1=}}   (for Chu silk images)
{{ACClicense|字|slip|qin|strokes=|component1=}}   (for Qin slip images)
{{ACClicense|字|silk slip|strokes=|component1=}}   (for other silk slips)
{{ACClicense|字|seal|strokes=|component1=}}   (for Shuowen seal images)
{{ACClicense|字|ancient|strokes=|component1=}}   (for Shuowen ancient images)
{{ACClicense|字|zhou|strokes=|component1=}}   (for Shuowen zhou images)
{{ACClicense|字|odd|strokes=|component1=}}   (for Shuowen odd images)
{{ACClicense|字|liushutong|strokes=|component1=}}   (for Liushutong images)

| Parameter | Mandatory? | Meaning |
| --- | --- | --- |
| #1 | yes | Character being described. This character becomes a category. To avoid that categorization, use cat=r with parameter #6 and strokes=. |
| #2 | yes | Type of script (may be one of: seal / bigseal / bronze / silk / oracle). |
| #3 | yes for oracle and bronze | The time period of the unearthed characters (oracle, bronze, silk, slip). Oracle, silk and slip will be auto-filled, so this parameter is just for bronze. May be one of: shang (Shang) / west/western/western Zhou/Western Zhou/zhouyuan (Western Zhou) / spring/spring and autumn/Spring and Autumn (Spring and Autumn) / war/warring/warring states/Warring States (Warring States). |
| #4 | (no) | The code name of the former GIF image in Richard Sears's data. It is not required, but welcome if your source is Richard Sears. |
| #5 | (no) | Various comments, if needed. Linguistic information is not welcome: other websites provide such data with more accuracy than we do. |
| #6 | (no)* | The sixth parameter, or rad=, enables categorization according to the system of the 214 Traditional Kangxi Radicals; it can be coded as a three-digit number, e.g. <078> or <078{{!}}sort-value> (leading zeroes are not necessary and may be omitted). |
| #7 | (no) | Free comment (Kangxi variant). |
| #8 | (no) | Enables categorization (use only if the character belongs to the 540 Traditional Shuowen Radicals or their variants); it should be coded as a three-digit number, e.g. <078> or <078{{!}}sort-value>. |
| #9 | (no) | Free comment (Shuowen variant). |
| kaiOrder= | (no) | The Kaishu number of a character in the Sinica Database, which can be found in Sinica's URL. It is not required and is only used when the character being described is different from the source. |
| fontcode= | (no) | The code name of the former PNG image in the Sinica Database, which can be found in Sinica's URL. It is not required, but welcome if your source is Sinica. |
| draw= | (no, disabled for Liushutong script) | Allows classification of the glyph according to its number of strokes, for comparison and research purposes. See the stroke tutorial if you intend to use it. This function is automatically disabled for Liushutong script, so please don't include it when you upload -bigseal.svg files. See also Commons:Maint:ACC:NoDraw. |
| component1= | yes | Allows classification of the glyph according to its components, for comparison and research purposes. It should be used for simple or compound characters. If no component is given, the character is indexed in the category:ACC needing decomposition category. |
| component2 to component5 | (no) | The other components, used for compound characters. |
| permission= | (no) | You can also provide your own free license(s). |
| strokes= | (no)* | The number of additional strokes; together with the radical number 6= it is used by template:Rcat for categorizing. |
| cat= | (no)* | <cat=n> for none; set <cat=r> to use template:Rcat for categorizing with parameter #6 and strokes. See also Commons:Maint:ACClicense for undefined cats. |
| i= | (no)* | Adds the parameters for template:Igen, e.g. i=I[nkscape]; i=P[otrace]. See also Commons:Maint:ACC:Unspec for undefined SVG tools. |

Parameters flagged with (no)* should be provided.
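
The many accepted spellings for parameter #3 can be easier to follow as a lookup. The sketch below only restates the alias list given for the bronze periods; it is not how the template itself is implemented, and the function name period_name is made up for illustration.

```shell
#!/bin/sh
# period_name ALIAS
# Maps an accepted spelling of parameter #3 to the period it stands for.
period_name() {
    case $1 in
        shang)
            echo "Shang" ;;
        west|western|"western Zhou"|"Western Zhou"|zhouyuan)
            echo "Western Zhou" ;;
        spring|"spring and autumn"|"Spring and Autumn")
            echo "Spring and Autumn" ;;
        war|warring|"warring states"|"Warring States")
            echo "Warring States" ;;
        *)
            echo "unrecognized period: $1" >&2
            return 1 ;;
    esac
}

# Example: period_name warring
```

A check like this can catch a misspelled period before upload.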

You do not have to type anything else while uploading your picture as the license and description are automatically added by using this template!

The HTML form to use is the basic form, which lets you enter the template above in the "Summary" field.

For the character used in the tutorial this would be:

{{ACClicense|木|oracle||j14138|strokes=4|component1=木}}

This is because the character was found by searching InternationalScientific for "木" and then using the oracle style picture provided there. The j14138 is the original file name of that picture, j14138.gif.

Note: {{ACClicense|木|oracle|oracle|j14138||075|strokes=4|component1=木|i=I}} would be correct, because

  • 075 木 shows the traditional Chinese radical 75
  • i=I the SVG image was created with Inkscape
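
When many files are uploaded, the template line can be assembled by a small helper instead of being typed by hand. This is only a sketch: the function name acclicense is made up for illustration, and the helper simply joins its arguments with "|" without checking them against the parameter table. An empty positional parameter is passed as an empty string.

```shell
#!/bin/sh
# acclicense PARAM...
# Joins positional parameters and name=value pairs into one template call.
acclicense() {
    if [ "$#" -lt 2 ]; then
        echo "need at least the character and the script type" >&2
        return 1
    fi
    out='{{ACClicense'
    for arg in "$@"; do
        out="$out|$arg"
    done
    printf '%s}}\n' "$out"
}

acclicense 木 oracle "" j14138 strokes=4 component1=木
# prints: {{ACClicense|木|oracle||j14138|strokes=4|component1=木}}
```

The printed line can then be pasted into the "Summary" field of the upload form.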

To inhibit categorization, strokes=0|cat=r can be coded instead of strokes=4:

  1. The template Rcat checks whether the category (in this case ) exists; if it does not exist yet, the "redlink" categorization does not occur.
(When the category is defined later, an 'empty' edit-and-save updates the categorization.)
  2. Because ACClicense knows the stroke count of all radicals (木 consists of four strokes), only the number of additional strokes must be specified: zero for images of radicals.

-- sarang사랑 12:08, 15 January 2010 (UTC)
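
The value for strokes= is the total stroke count minus the stroke count of the radical. The following is a small illustration of that arithmetic; the function name additional_strokes is made up, and the stroke counts must be supplied by the user.

```shell
#!/bin/sh
# additional_strokes TOTAL RADICAL
# Prints the number of strokes beyond those of the radical.
additional_strokes() {
    total=$1 radical=$2
    if [ "$total" -lt "$radical" ]; then
        echo "total must not be smaller than the radical's count" >&2
        return 1
    fi
    echo $((total - radical))
}

additional_strokes 4 4   # 木 is itself radical 75 (four strokes); prints 0
additional_strokes 8 4   # 林 has eight strokes under radical 木; prints 4
```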

Identifying character components

Character components are used to classify the characters in the Category:Ancient Chinese characters by components.

The component1 to component6 parameters are to be chosen this way:

  • Be sure to identify the component in the ancient Chinese character, which may be different from the modern one!
  • Most of the time, each character component may be separately indexed. Some exceptions are:
    • If a compound character appears in several (say, more than five) ancient characters, it may have a category of its own and be indexed separately (in that case, the compound character's category is itself part of its components' categories).
    • If a character is used in only one or two ancient characters, there is no need for a separate category. In that case, the "Misc" component name must be used.
  • If a component cannot be identified, do not skip it; replace it with a "?" component (question mark). Somebody else who knows better may find it later while browsing the Category:ACC containing ? and replace it with the correct value.